Before looking at how handwritten digit classification works, let us first see where it is applied.
Applications of handwritten digit classification
- Recognize number plates of vehicles.
- Process numeric entries in forms filled in by hand.
- Process bank cheque numbers, amounts, and dates.
- Recognize digits on paper or in images.
In the handwritten digit classification problem, images are fed directly into the learning network instead of fixed, hand-designed feature vectors. The network then uses backpropagation to learn the features itself.
To design a network that generalizes well, we need to pay close attention to its architecture. The architecture is also the first place to look when tackling a new problem or improving an existing solution. For example, if an existing network is overfitting or takes too long to train, consider redesigning it so that it has fewer free parameters.
Why Handwritten digit classification?
- It is a simple problem.
- The images contain only black and white pixels.
- Digits can be separated from the background.
- The input to the network is a normalized image.
In shape recognition, local features are combined to detect an object. Local features can also be used to detect characters, but here these features are not designed by hand; they are learned by the network.
Problems with handwritten digits
Every person has a different writing style, so the size, width, and orientation of a handwritten digit may differ from person to person.
Digits can also be located at different positions in the image. The network handles this as follows:
- Scan the image with a local receptive field and store the resulting states in neurons. This is like sliding a kernel over the image and performing a convolution, followed by a squashing function, and storing the output. The collection of these outputs is called a feature map, which forms the next layer.
- The local receptive field uses the same weights at every position within a feature map. This technique is called weight sharing, and its benefit is that it reduces the number of free parameters.
- Using several feature maps extracts different features from the image, which may be needed in later steps.
- The idea of local receptive fields and convolutional feature maps can be applied to subsequent hidden layers to extract further features.
- Features extracted at higher layers are less affected by changes in location.
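The scanning step described above can be sketched in plain Python. This is only an illustration: the function name `feature_map`, the 6×6 toy image, and the 3×3 kernel are assumptions made for this example, and `tanh` stands in for the squashing function.

```python
import math

def feature_map(image, kernel, bias=0.0):
    """Slide one shared kernel over a square image (stride 1, no padding)
    and squash each response with tanh."""
    k = len(kernel)
    out_size = len(image) - k + 1
    fmap = []
    for r in range(out_size):
        row = []
        for c in range(out_size):
            # Weighted sum over the k x k local receptive field at (r, c).
            # The same kernel weights are reused at every position.
            total = bias
            for i in range(k):
                for j in range(k):
                    total += kernel[i][j] * image[r + i][c + j]
            row.append(math.tanh(total))
        fmap.append(row)
    return fmap

# Hypothetical kernel that responds to the pixel at the centre of its field.
kernel = [[0.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 0.0]]

# Toy image with one bright pixel, and a copy shifted by one row and column.
image = [[0.0] * 6 for _ in range(6)]
image[2][3] = 1.0
shifted = [[0.0] * 6 for _ in range(6)]
shifted[3][4] = 1.0

fmap = feature_map(image, kernel)
fmap_shifted = feature_map(shifted, kernel)

print(len(fmap), len(fmap[0]))         # size of the feature map
print(fmap[1][2], fmap_shifted[2][3])  # same response, shifted position
```

Because the weights are shared, the feature is detected wherever it appears: moving the bright pixel by one row and one column moves the response in the feature map by the same amount.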
The overall flow of data through the network:
The first hidden layer extracts some feature maps. It is followed by a hidden layer that performs local averaging and subsampling, which reduces the resolution of the feature maps. Then comes another round of feature extraction, followed again by averaging and reduction. Finally, the output layer is connected to the last hidden layer.
28×28 input image > Four 24×24 feature maps convolutional layer (5×5 size) > Average pooling layer (2×2 size), giving four 12×12 maps > Twelve 8×8 feature maps convolutional layer (5×5 size) > Average pooling layer (2×2 size), giving twelve 4×4 maps > Directly fully connected to the output
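The sizes in this pipeline can be checked with a little arithmetic. The sketch below assumes stride-1 convolutions without padding, non-overlapping 2×2 pooling, and 12 feature maps in the second convolutional layer, as in the implementation that follows. The helper names `conv_out` and `pool_out` are made up for this example.

```python
def conv_out(size, kernel):
    # Output width of a stride-1 convolution without padding.
    return size - kernel + 1

def pool_out(size, window):
    # Output width of non-overlapping pooling.
    return size // window

sizes = [28]                          # input image is 28 x 28
sizes.append(conv_out(sizes[-1], 5))  # first convolution
sizes.append(pool_out(sizes[-1], 2))  # first average pooling
sizes.append(conv_out(sizes[-1], 5))  # second convolution
sizes.append(pool_out(sizes[-1], 2))  # second average pooling

# Features entering the fully connected layer:
# 12 feature maps of size sizes[-1] x sizes[-1].
flattened = 12 * sizes[-1] * sizes[-1]

print(sizes)      # [28, 24, 12, 8, 4]
print(flattened)  # 192
```

The final value, 192, matches the `in_features=192` used for the output layer in the implementation.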
Implementation of LeNet-1

    import torch
    import torchvision
    from torchvision import transforms, datasets
    import torch.nn as nn
    import torch.nn.functional as F
    import torch.optim as optim
    import numpy as np

Download the training and test datasets.
- transforms.ToTensor(): converts the images into Torch tensors.

    train = datasets.MNIST("", train=True, download=True,
                           transform=transforms.Compose([
                               transforms.ToTensor()
                           ]))
    test = datasets.MNIST("", train=False, download=True,
                          transform=transforms.Compose([
                              transforms.ToTensor()
                          ]))

Loading the data, setting batch size and shuffling the data.

    trainset = torch.utils.data.DataLoader(train, batch_size=8, shuffle=True)
    testset = torch.utils.data.DataLoader(test, batch_size=8, shuffle=True)


    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            # first layer: 1 input channel -> 4 feature maps, 5x5 kernel
            self.conv1 = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=5)
            # third layer: 4 -> 12 feature maps, 5x5 kernel
            self.conv2 = nn.Conv2d(in_channels=4, out_channels=12, kernel_size=5)
            # output layer: 12 * 4 * 4 = 192 features -> 10 digit classes
            self.fc_out = nn.Linear(in_features=192, out_features=10)

        def forward(self, x):
            # applying activation on convolution result
            x = F.relu(self.conv1(x))
            # applying averaging, second layer
            x = F.avg_pool2d(x, kernel_size=2)
            x = F.relu(self.conv2(x))
            x = F.avg_pool2d(x, kernel_size=2)
            # flatten each sample before the fully connected layer
            s = self.get_dimension_size(x)
            x = self.fc_out(x.view(-1, s))
            return F.log_softmax(x, dim=1)

        def get_dimension_size(self, x):
            # number of features per sample (all dimensions except the batch)
            size = x.size()[1:]
            num_features = 1
            for s in size:
                num_features *= s
            return num_features

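To see how few free parameters this network has, the count can be worked out by hand from the layer sizes. The sketch below uses plain Python and the layer shapes from the `Net` class. The helper names `conv_params` and `linear_params` are made up for this example, and the fully connected comparison is a hypothetical alternative, not part of the model.

```python
def conv_params(in_channels, out_channels, kernel_size):
    # Each output map has one shared kernel per input channel plus one bias.
    return out_channels * (in_channels * kernel_size * kernel_size + 1)

def linear_params(in_features, out_features):
    # One weight per input feature plus one bias, for every output unit.
    return out_features * (in_features + 1)

conv1 = conv_params(1, 4, 5)       # first convolutional layer
conv2 = conv_params(4, 12, 5)      # second convolutional layer
fc_out = linear_params(192, 10)    # output layer
total = conv1 + conv2 + fc_out

# Hypothetical comparison: a fully connected layer that maps the 28*28 input
# to the same 4 * 24 * 24 outputs as conv1, without weight sharing.
dense_first_layer = linear_params(28 * 28, 4 * 24 * 24)

print(conv1, conv2, fc_out, total)  # 104 1212 1930 3246
print(dense_first_layer)            # 1808640
```

With PyTorch available, `sum(p.numel() for p in Net().parameters())` should report the same total for the class above.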
Training the model

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(device)

    net = Net().to(device)
    print(net)

    optimizer = optim.Adam(net.parameters(), lr=0.001)
    EPOCHS = 20
    loss_list = []

    for epoch in range(EPOCHS):
        # make sure the network is in training mode
        net.train()
        for data in trainset:
            X, y = data
            X, y = X.to(device), y.to(device)
            # clear gradients left over from the previous batch
            net.zero_grad()
            output = net(X)
            # negative log-likelihood loss on the log-softmax output
            loss = F.nll_loss(output, y)
            loss.backward()
            optimizer.step()
        # record the loss of the last batch of this epoch
        loss_list.append(loss.item())

Defining the evaluation function

    def evaluation(dataloader):
        total, correct = 0, 0
        # keeping the network in evaluation mode
        net.eval()
        # gradients are not needed while evaluating
        with torch.no_grad():
            for data in dataloader:
                inputs, labels = data
                # moving the inputs and labels to the same device as the network
                inputs, labels = inputs.to(device), labels.to(device)
                outputs = net(inputs)
                # the index of the largest log-probability is the predicted digit
                _, pred = torch.max(outputs, 1)
                total += labels.size(0)
                correct += (pred == labels).sum().item()
        # accuracy as a percentage
        return 100 * correct / total

Saving the trained model
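One common way to do this in PyTorch is to save the model's `state_dict` with `torch.save` and restore it later with `load_state_dict`. The sketch below is a minimal example under stated assumptions: a small `nn.Linear` layer stands in for the trained `net`, and the file name `lenet1_example.pth` is hypothetical.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Stand-in for the trained network; in this tutorial it would be `net`.
model = nn.Linear(192, 10)

# Hypothetical file location, chosen only for this example.
path = os.path.join(tempfile.gettempdir(), "lenet1_example.pth")

# Save only the learned parameters, not the whole Python object.
torch.save(model.state_dict(), path)

# Later: build a model with the same architecture and load the parameters.
restored = nn.Linear(192, 10)
restored.load_state_dict(torch.load(path))
restored.eval()  # switch to evaluation mode before running inference
```

Saving the `state_dict` rather than the whole model object keeps the file independent of how the class definition is imported, at the cost of having to construct the model again before loading.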
A relevant research paper.