Recognizing Handwritten Digits with Machine Learning
Introduction
Using the DeepLearning package, this application trains a neural network to recognize the numbers in images of handwritten digits. The trained neural network is then applied to a number of test images.
The training and testing images are a very small subset of the MNIST database of handwritten digits; each is a 28 x 28 pixel image of a single handwritten digit, from 0 to 9. (A sample image of the digit zero is displayed in the worksheet.)
Ultimately, this application generates a vector of weights for each digit; think of the weights as a marking grid for a multiple-choice exam. When reshaped into a matrix and displayed as an image (as in the worksheet), the weight vector for the digit 0 shows red areas of positive weight and blue areas of negative weight.
When attempting to recognize the number in an image:
• If a pixel with a high intensity lands in the red area, the evidence is high that the handwritten digit is zero.
• Conversely, if a pixel with a high intensity lands in the blue area, the evidence is low that the handwritten digit is zero.
The DeepLearning package is a partial interface to TensorFlow, an open-source machine learning framework. To learn more about the machine learning techniques used in this application, please consult the references (the next section, however, features a brief overview).
Notes
Introduction
We first build a computational (or dataflow) graph. Then, we create a TensorFlow session to run the graph.
TensorFlow computations involve tensors; think of tensors as multidimensional arrays.
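For example, here is a minimal sketch of the graph-then-session workflow (it assumes the DeepLearning exports used later in this worksheet; Constant builds a constant tensor):
> |
with(DeepLearning):
SetEagerExecution(false):                      # build a dataflow graph rather than evaluating eagerly
t := Constant(Array([[1., 2.], [3., 4.]])):    # a 2 x 2 tensor
sess := GetDefaultSession():                   # the session that runs the graph
sess:-Run(t + t);                              # evaluate the graph node; returns the doubled matrix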
Images
Each 28 x 28 image is flattened into a list with 784 elements.
Once flattened, the training images are stored in a tensor x with a shape of [none, 784]. The first index is the number of training images ("none" means that we can use an arbitrary number of training images).
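For example, a single 28 x 28 grayscale image imported with ImageTools can be flattened as follows (a minimal sketch; img is a hypothetical name standing for any one of the imported training images):
> |
img_flat := convert(img, list):   # flatten the 28 x 28 Array of pixel intensities into a list
numelems(img_flat);               # returns 784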
Labels
Each training image is associated with a label.
• Labels are a 10-element list, where each element is either 0 or 1.
• All elements apart from one are zero.
• The location of the non-zero element is the "value" of the image.
So for an image that displays the digit 5, the label is [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]. This is known as a one-hot encoding.
All the labels are stored in a tensor y_ with a shape of [none, 10].
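For example, the one-hot label for the digit 5 can be built by rotating the label for zero five places to the right (this mirrors the label-generation code used later in this worksheet):
> |
ListTools:-Rotate([1, 0, 0, 0, 0, 0, 0, 0, 0, 0], -5);   # returns [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]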
Training
The neural network is trained via multinomial logistic regression (also known as softmax regression).
Step 1
Calculate the evidence that each image is in a selected class. Do this by performing a weighted sum of the pixel intensities of the flattened image:

evidence[i] = Σ_j W[i, j] · x[j] + b[i]

where
• W[i, j] and b[i] are the weight and the bias for digit i and pixel j. Think of W as a matrix with 784 rows (one for each pixel) and 10 columns (one for each digit), and b as a vector with 10 entries (one for each digit).
• x[j] is the intensity of pixel j.
Step 2
Normalize the evidence into a vector of probabilities with softmax:

y[i] = exp(evidence[i]) / Σ_k exp(evidence[k])
Step 3
For each image, calculate the cross-entropy of the vector of predicted probabilities and the actual probabilities (i.e. the labels):

H(y_, y) = -Σ_i y_[i] · log(y[i])

where
• y_ is the true distribution of probabilities (i.e. the one-hot encoded label)
• y is the predicted distribution of probabilities
The smaller the cross-entropy, the better the prediction.
Step 4
The mean cross-entropy across all training images is then minimized to find the optimum values of W and b.
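As a concrete illustration of Steps 1 to 3, the following sketch uses plain Maple (not the DeepLearning package) with made-up numbers for a three-class problem:
> |
evidence := [2.0, 1.0, 0.1]:                # Step 1: made-up weighted sums, one per class
total := add(t, t in exp~(evidence)):       # normalizing constant
y := exp~(evidence) /~ total;               # Step 2: softmax probabilities (sum to 1)
y_ := [1, 0, 0]:                            # one-hot label: the true class is the first
H := -add(y_[i]*log(y[i]), i = 1 .. 3);     # Step 3: cross-entropy; smaller is better
Since the label is one-hot, H reduces to -log(y[1]); a perfect prediction (y[1] = 1) gives a cross-entropy of zero.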
Miscellaneous
This application consists of
• this worksheet
• a very small subset of images from the MNIST handwritten digit database
in a single zip file. The images are stored in folders; the folders should be extracted to the same location as this worksheet.
Load Packages and Define Parameters
> |
restart:
with(DeepLearning):
with(DocumentTools):
with(DocumentTools:-Layout):
with(ImageTools):
> |
LEARNING_RATE := 0.01:
TRAIN_STEPS := 40:
N := 100:   # number of training images to load for each digit (maximum of 100; 100 assumed here)
L := 10:    # number of labels (there are 10 digits, so this is always 10)
T := 50:    # number of test images
Import Training Images and Generate Labels
Import the training images, where images[n] is a list containing the images for digit n.
> |
path := "C:/Users/Wilfried/Documents/Maple/Examples/ML/":
for j from 0 to L - 1 do
    images[j] := [seq(Import(cat(path, j, "/", j, " (", i, ").PNG")), i = 1 .. N)];
end do:
Generate the labels, where labels[n] contains the labels for the images of digit n.
> |
for j from 0 to L - 1 do
    labels[j] := ListTools:-Rotate~([[1, 0, 0, 0, 0, 0, 0, 0, 0, 0] $ N], -j)[]:
end do:
Display training images
> |
Embed([seq(images[i-1], i = 1 .. L)]);
Training
Flatten and collect images
> |
x_train := convert~([seq(images[i - 1][], i = 1 .. L)], list):
Collect labels
> |
y_train := [seq(labels[i - 1], i = 1 .. L)]:
Define placeholders x and y_ to feed the training images and labels into
> |
SetEagerExecution(false):
x := Placeholder(float[4], [none, 784]):
y_ := Placeholder(float[4], [none, L]):
Define weights and bias
> |
W := Variable(Array(1 .. 784, 1 .. L), datatype = float[4]):
b := Variable(Array(1 .. L), datatype = float[4]):
Define the classifier using multinomial logistic regression
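The corresponding input is missing here; a minimal sketch of this definition, assuming the package's SoftMax export and Maple's overloaded `.` (matrix product) and `+` operators on the tensors defined above:
> |
y := SoftMax(x . W + b):   # assumed reconstruction: softmax of the weighted sums plus biases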
Define the cross-entropy (i.e. the cost function)
> |
cross_entropy := ReduceMean(-ReduceSum(y_ * log(y), reduction_indices = [1])):
Get a TensorFlow session
> |
sess := GetDefaultSession():
Initialize the variables
> |
init := VariablesInitializer():
sess:-Run(init):
Define the optimizer to minimize the cross entropy
> |
optimizer := Optimizer(GradientDescent(LEARNING_RATE)):
training := optimizer:-Minimize(cross_entropy):
Run the optimizer many times
> |
for i from 1 to TRAIN_STEPS do
    sess:-Run(training, {x in x_train, y_ in y_train}):
    if i mod 10 = 0 then   # with TRAIN_STEPS = 40, report the loss every 10 steps
        print(cat("loss = ", sess:-Run(cross_entropy, {x in x_train, y_ in y_train})));
    end if:
end do:
Import Test Images and Predict Numbers
Randomize the order of the test images.
> |
i_rand := combinat:-randperm([seq(i, i = 1 .. 100)]);
Load and flatten test images.
> |
path := "C:/Users/Wilfried/Documents/Maple/Examples/ML/test_images":
x_test_images := [seq(Import(cat(path, "/", "test (", i, ").png")), i in i_rand[1 .. T])]:
x_test := convert~(x_test_images, list):
For each test image, generate 10 probabilities, one for each digit from 0 to 9
> |
pred := sess:-Run(y, {x in x_test})
For each test image, find the predicted digit associated with the greatest probability
> |
predList := seq( max[index]( pred[i, ..] ) - 1, i = 1 .. T )
> |
# The definition of the per-image values collected below was lost from this
# copy of the worksheet; a plausible reconstruction (an assumption) is the
# maximum predicted probability for each test image:
maxProb := [seq(max(pred[i, ..]), i = 1 .. T)]:
Val := Vector(10, 0):
Val_mean := Vector(10, 0):
for k from 1 to 10 do
    L1 := []:                                    # values for the images predicted as digit k-1
    for i from 1 to T do
        if predList[i] = k - 1 then L1 := [op(L1), maxProb[i]] end if:
    end do:
    Val(k) := evalf(L1, 3):                      # per-digit values, 3 significant digits
    Val_mean(k) := Statistics:-Mean(Array(L1)):  # mean value for digit k-1
end do:
Val, Val_mean
Consider the first test image
> |
Embed(x_test_images[1])
The ten probabilities associated with this image are
> |
pred[1, ..];
Confirm that the probabilities add up to 1
> |
add(i, i in pred[1, ..])
The maximum probability occurs at this index
> |
maxProbInd := max[index](pred[1, ..])
Hence the predicted number is
> |
maxProbInd - 1;
We now display all the predictions
> |
T1 := Table(Row(seq(predList[k], k = 1 .. 25)),
            Row(seq(predList[k], k = 26 .. 50))):
InsertContent(Worksheet(T1)):