I noticed on the website there were images of lenet being able to classify an
image with multiple digits on it. At least by looking at the code, it seems as
though the lenet demo only gives labels of 0-9. Is there a way to modify this
so an image with 2 digits may be classified?
Recognizing multiple characters requires two modifications:
1. You need to train lenet5 so that it can recognize characters even if they
are surrounded by other (possibly touching) characters. You also need to
train it so that it responds "none of the above" when its input window is not
centered on a character. This can be done by artificially adding
"flanking" characters to every character shown during training, and by adding
training samples with off-center characters and training the network to output
"none of the above" for them.
2. Once this network is trained, you can simply enlarge its input to cover
the multiple characters. The network will automatically resize its internal
layers and replicate its output layer. You will get one output vector for each
32x32 window on the input, shifted every 4 pixels. Then, you need to write
some code to interpret that sequence of outputs.
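As a rough idea of what "code to interpret that sequence of outputs" could look like, here is a minimal greedy sketch. It is not the method from the paper, just a simple heuristic: pick the best class per window, merge runs of the same class, and drop "none of the above". The names Detection, kNoneLabel and decode are made up for this example, and it assumes the network has an 11th output for "none of the above" and that a higher score means a more likely class.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One recognized character and the left edge of the window that produced it.
struct Detection {
    int label;     // digit 0-9
    int x_offset;  // left edge of the input window, in input pixels
};

// Assumption for this sketch: class index 10 means "none of the above".
constexpr int kNoneLabel = 10;

// Index of the largest score in one output vector
// (assumes higher score = more likely class).
int argmax(const std::vector<float>& scores) {
    assert(!scores.empty());
    return static_cast<int>(
        std::max_element(scores.begin(), scores.end()) - scores.begin());
}

// Greedy decoding: best class per window, consecutive windows with the same
// class are merged into one detection, "none of the above" windows are dropped.
// `shift` is the number of input pixels between neighbouring windows.
std::vector<Detection> decode(const std::vector<std::vector<float>>& outputs,
                              int shift) {
    assert(shift > 0);
    std::vector<Detection> result;
    int previous = kNoneLabel;
    for (std::size_t i = 0; i < outputs.size(); ++i) {
        const int label = argmax(outputs[i]);
        if (label != kNoneLabel && label != previous) {
            result.push_back({label, static_cast<int>(i) * shift});
        }
        previous = label;
    }
    return result;
}
```

Note that this merges a repeated digit (like "11") into one detection unless a "none of the above" window falls between the two characters, which is one reason the off-center training in step 1 matters.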
The process is explained in our 1998 paper "Gradient-Based Learning Applied to
Document Recognition", available at http://yann.lecun.com
profshadoko, is there a way to change the default behaviour of shifting every
4 pixels to, let's say, 2 pixels?
The shifting is determined by the product of the subsampling ratios of the
subsampling layers in the entire network.
LeNet5 has two subsampling layers with a ratio of 2 each, which makes a total
subsampling ratio of 4 (one output vector every 4 input pixels). To change
that to 2, you may want to change the subsampling ratio of the second
subsampling layer to 1.
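The arithmetic can be checked with a small sketch. It walks the layers along one axis and tracks the shift (product of the steps) and the input window seen by one output unit. Layer, Footprint, footprint and lenet5_layers are names invented for this example, not part of the library; it assumes 5-wide convolutions with step 1 and subsampling layers whose kernel width equals their ratio.

```cpp
#include <cassert>
#include <vector>

// One layer along a single spatial axis.
struct Layer {
    int kernel;  // filter or subsampling width
    int step;    // 1 for convolutions, the ratio for subsampling layers
};

struct Footprint {
    int window;  // input pixels seen by one output unit
    int shift;   // input pixels between neighbouring output units
};

// Accumulate window size and shift from the input towards the output.
Footprint footprint(const std::vector<Layer>& layers) {
    Footprint f{1, 1};
    for (const Layer& l : layers) {
        assert(l.kernel >= 1 && l.step >= 1);
        f.window += (l.kernel - 1) * f.shift;  // each extra tap reaches `shift` pixels further
        f.shift *= l.step;                     // shifts multiply through the network
    }
    return f;
}

// Assumed layout: conv 5, subsample sub1, conv 5, subsample sub2, conv 5.
std::vector<Layer> lenet5_layers(int sub1, int sub2) {
    return {{5, 1}, {sub1, sub1}, {5, 1}, {sub2, sub2}, {5, 1}};
}
```

With ratios (2, 2) this gives a 32-pixel window and a shift of 4; with the second ratio set to 1 the shift drops to 2. The window figure only holds for the standard configuration, since the last layer's kernel is assumed to stay 5 wide here.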
A more efficient way would be to use another architecture instead of
net-cscscf, perhaps net-csccf or net-ccc.
To change the shift, you need to change the lenet5 parameters (I used the C++
version of lenet). Change

    lenet5<t_net>(prm, 32, 32, 5, 5, 2, 2, 5, 5, 2, 2, 120, 10);

to

    lenet5<t_net>(prm, 32, 32, 5, 5, 1, 1, 5, 5, 2, 2, 120, 10);

and after that the shift will be 2 pixels. But it's not a good idea:
recognition will be twice as slow )))
It is not necessary to change the shift to 2; lenet still recognizes well
even when the shift is 6 pixels )))

    lenet5<t_net>(prm, 32, 32, 5, 5, 3, 3, 5, 5, 2, 2, 120, 10);

Maybe it's not a good approach for the net-cscscf architecture, but it works!
))) And it doesn't use 100% of the CPU.
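To get a feel for the trade-off between shifts of 2, 4 and 6 pixels, here is a small sketch that counts how many window positions fit along an input of a given width. window_count is a name made up for this example, and it treats the network as an idealized sliding 32-pixel window; the real library may round differently at each layer.

```cpp
#include <cassert>

// Number of window positions along one axis for an idealized sliding window.
// Returns 0 when the input is narrower than the window.
int window_count(int input_width, int window, int shift) {
    assert(window > 0 && shift > 0);
    if (input_width < window) {
        return 0;
    }
    return (input_width - window) / shift + 1;
}
```

For a 64-pixel-wide input and a 32-pixel window, this gives 17 positions at a shift of 2, 9 at a shift of 4, and 6 at a shift of 6.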