There are many ways to realize the word recently. I use the distance formula in Euclidean geometry.
The formula of the distance between two points x (x 1, y 1) and y (x2, y2) in two dimensions is sqrt ((x1-x2) 2+(y1-y2) 2).
Expand to n dimensions.
x(x 1,x2,…,xn),y(y 1,y2,…,yn)
sqrt [ ∑( x[i] - y[i] )^2 ] (i= 1,2,…,n)
Knn algorithm is to calculate the distance, that is, the operation between numbers. The images are in png and jpg formats, not numbers, so we need to convert them.
As the picture shows, there is a number 8. The first thing to make sure is that what I do in this step is the simplest transformation, because I assume that there is no clutter between the background and the figure, and the whole figure has only one number (0-9). If there are other situations, such as impure background color or other interference images, the transformation function needs to be redesigned.
The next step is the simplest transformation, which changes the white part (background) of the picture to 0 and the part with image to 1. The converted size should be appropriate, too small will affect the recognition accuracy, and too large will increase the calculation. So I used 32*32 in the book, and the converted result is as shown in the figure.
In this way, the picture becomes a computable number.
Next, we need to create a library that contains various examples of numbers 0-9 similar to the above figure. Because we want to compare the images to be recognized, we choose the first k nearest objects, and the comparison objects are our library. Suppose there are ten numbers from 0 to 9 in the library, and each number has 100 such instances represented by 0 and 1, then we have a *** 1000 instance.
The last step is comparison. Using Euclid's formula for calculating the geometric distance mentioned at the beginning, we must first convert this 32*32 square matrix into the coordinate representation of 1* 1024. Then, calculate the distance between the image to be recognized and 1000 samples in the library, and select the closest top k samples. For example, 50, the probability of the result number is divided by 50. For example, if the number 8 appears 40 times out of 50 times, the probability that the number to be recognized is 8 is 40/50 = 80%.
Personal understanding:
Only a single number can be recognized, and the background cannot be disturbed. If we want to identify multiple numbers or the background is disturbed, we need to consider the specific method of image conversion to 0 1 according to the specific situation.
Digital recognition depends very much on the images in the library, and the appearance of the images in the library seriously affects the image recognition (because we compare with the images in the library to find the closest top k), so the thickness, height and thinness of the numbers are decisive factors, and the possible appearance of the numbers must be fully considered when building the library.
The amount of calculation is relatively large, and the images to be recognized should be calculated one by one with all the examples in the library. If you use 32*32, it is already a 1024 dimension. If there is 1000 in the library, it is 1000 calculations between 1024 dimensional vectors. A clearer image and a richer library will only make the calculation more complicated.
For other numerical problems that can directly calculate the distance, Euclidean distance can be used or other formulas that can express the distance can be used. For non-numerical problems, appropriate transformation is needed, and the transformation method is very important. I think the information should not be lost first, and then it should be accurate and not vague. It is necessary to realize the one-to-one correspondence between before and after image transformation.
References:
Machine learning in actual combat [America] Peter Harrington People's Posts and Telecommunications Publishing House
Python source code
Import quantity
Import operating system
Import pictures from PIL
Import heapq
Import counters from the collection
Def pictureconvert (file name 1, file name 2, size = (32,32)):
#filename 1 image to be recognized, filename2 image to be recognized is converted into 0 1txt file for output, and the default image size is 32*32.
Image_file = Image.open (file name 1)
image _ file = image _ file . resize(size)
width,height = image_file.size
F 1 = Open (file name 1,' r')
F2 = Open (file name 2,' w')
For I (height) in the range:
For j (width) in the range:
pixel = image_file.getpixel((j,I))
Pixel = pixel [0]+pixel [1]+pixel [2]
if(pixel == 0):
Pixel = 0
Elif (pixels! = 765 and pixels! = 0):
Pixel = 1
# 0 stands for black (no image) and 255 stands for white (with image).
# 0/255 = 0,255/255 = 1
f2.write(str(pixel))
if(j == width- 1):
f2.write('\n ')
f 1.close()
f2.close()
Define image vector (file name):
#filename converts the 0 1 text file of the image to be recognized into a vector.
vector = numpy.zeros(( 1, 1024), numpy.int )
Use open (file name) as f:
For I(0.32) in the range:
linestr = f.readline()
For j in the range (0,32):
vector[0,32*i+j] = int(linestr[j])
Return? vector
Def compare (file name 1, file name 2):
#compare directly reads the repository identity.
#filename 1 resource directory, filename2 image to be recognized 0 1 text document path.
Trainingfilelist = os.listdir (file name 1)
M = len (list of training documents)
labelvector = []
trainingmatrix = numpy.zeros((m, 1024),numpy.int8)
For I in the range (0, m):
filenamestr = training filelist[I]
filestr = filenamestr.split(' . ')[0]
class number = int(filestr . split(' _ ')[0])
labelvector.append(classnumber)
trainingmatrix[i,:= img vector(filename 1+'/'+filenamestr)
Textvector = imgvector (file name 2)
result distance = numpy . zeros(( 1,m))
Result = []
For I in the range (0, m):
Result distance [0, i] = numpy.vdot(textvector[0], trainingmatrix[i]).
result indexes = heapq . nlargest(50,range(0,len(resultdistance[0])),resultdistance[0]。 Shoot)
For I:
result.append(labelvector[i])
Number = Counter (result). Most common (1)
Print ('probability that this number is', number [0][0] and' yes','% .2f%'% ((number [0] [1]/len (result)) * 100))
Def distinct (file name 1, file name 2, file name 3, size = (32,32)):
# filename 1 png, jpg and other original image paths, the original image of filename2 is converted into the file path of 0 1 text, and the resource path of filename3.
Pictureconvert (file name 1, file name 2, size)
Comparison (file name 3, file name 2)
URL 1 = "/Users/Wang/Desktop/number . png "
URL 2 = "/Users/Wang/Desktop/number . txt "
training library = "/Users/Wang/Documents/training digits "
Distinguish (url 1, url2, training library)