lenet5 demo to classify double digits


  • Anonymous

    I noticed on the website there were images of lenet being able to classify an
    image with multiple digits on it. At least by looking at the code, it seems as
    though the lenet demo only gives labels of 0-9. Is there a way to modify this
    so an image with 2 digits may be classified?

  • Yann LeCun

    Recognizing multiple characters requires two modifications:
    1. You need to train lenet5 so that it can recognize characters even when
    they are surrounded by other (possibly touching) characters. You also need
    to train it to respond "none of the above" when its input window is not
    centered on a character. This can be done by artificially adding
    "flanking" characters to every character shown during training, and by
    adding training samples with off-center characters for which the network
    is trained to output all -1.
    2. Once this network is trained, you can simply enlarge its input to cover
    the multiple characters. The network will automatically resize its internal
    layers and replicate its output layer. You will get one output vector for each
    32x32 window on the input, shifted every 4 pixels. Then, you need to write
    some code to interpret that sequence of outputs.
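
    As a rough illustration of step 2, the sketch below counts how many
    32x32 windows fit along one axis of an enlarged input and turns the
    resulting sequence of output vectors into a digit string. The names
    num_windows and decode, and the threshold of 0, are assumptions made for
    this example. The greedy decoding is a much simpler heuristic than the
    method described in the paper; it only assumes that the "none of the
    above" response has every output below the threshold.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Number of window positions along one axis (hypothetical helper).
// Windows of size `window` are placed every `stride` pixels.
std::size_t num_windows(std::size_t input_size,
                        std::size_t window = 32,
                        std::size_t stride = 4) {
    assert(stride > 0);
    if (input_size < window) return 0;
    return (input_size - window) / stride + 1;
}

// Greedy decoding of a left-to-right sequence of output vectors.
// Each vector holds one score per digit class (index 0..9).
// A window counts as "no character" when no score exceeds `threshold`.
// Consecutive windows that agree on the same digit are merged into one.
std::string decode(const std::vector<std::vector<double>>& outputs,
                   double threshold = 0.0) {
    std::string result;
    int previous = -1;  // -1 stands for "no character"
    for (const auto& scores : outputs) {
        int best = -1;
        double best_score = threshold;
        for (std::size_t k = 0; k < scores.size(); ++k) {
            if (scores[k] > best_score) {
                best_score = scores[k];
                best = static_cast<int>(k);
            }
        }
        if (best != -1 && best != previous) {
            result += static_cast<char>('0' + best);
        }
        previous = best;
    }
    return result;
}
```

    With these assumptions, a 64-pixel-wide input gives 9 window positions
    along its width. Note that a repeated digit such as "33" is only
    recovered if at least one "no character" window falls between the two
    characters.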

    The process is explained in our 1998 paper "Gradient-Based Learning Applied
    to Document Recognition", available at http://yann.lecun.com


  • Anonymous

    profshadoko, is there a way to change the default behaviour of shifting every
    4 pixels to, let's say, 2 pixels?

  • Yann LeCun

    The shift is determined by the product of the subsampling ratios of all the
    subsampling layers in the network.
    LeNet5 has two subsampling layers with a ratio of 2 each, which gives a total
    subsampling ratio of 4 (one output vector every 4 input pixels). To change
    that to 2, you can change the subsampling ratio of the second subsampling
    layer to 1.
    A more efficient way would be to use another architecture instead of
    net-cscscf, perhaps net-csccf or net-ccc.
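
    The rule can be written down in a few lines. This is only an
    illustrative sketch: output_stride is a hypothetical helper, not a
    function of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Shift (in input pixels) between two neighbouring output vectors:
// the product of the subsampling ratios of all subsampling layers.
std::size_t output_stride(const std::vector<std::size_t>& subsampling_ratios) {
    std::size_t stride = 1;
    for (std::size_t ratio : subsampling_ratios) {
        assert(ratio >= 1);
        stride *= ratio;
    }
    return stride;
}
```

    For example, output_stride({2, 2}) is 4 for the standard LeNet5, and
    output_stride({2, 1}) is 2 once the second subsampling ratio is set to 1.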

    • Yann

  • 2010-05-23

    To change the shift, you need to change the lenet5 parameters (I used the
    C++ version of lenet). Change
    lenet5<t_net>(prm, 32, 32, 5, 5, 2, 2, 5, 5, 2, 2, 120, 10);
    to
    lenet5<t_net>(prm, 32, 32, 5, 5, 1, 1, 5, 5, 2, 2, 120, 10);
    and the shift will be 2 pixels. But it's not a good idea: recognition will
    be twice as slow :)

    It is not necessary to change the shift to 2; lenet still recognizes well
    even with a shift of 6 pixels:
    lenet5<t_net>(prm, 32, 32, 5, 5, 3, 3, 5, 5, 2, 2, 120, 10);
    Maybe that's not a good approach for the net-cscscf architecture, but it
    works, and it doesn't use 100% of the CPU :)
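
    To see what these parameter changes do to the layer sizes, here is a
    small sketch that traces one axis of a 32x32 input through the two
    convolution/subsampling pairs. It assumes that the arguments after prm
    are, in order, the input size, the first kernel size, the first
    subsampling ratio, the second kernel size and the second subsampling
    ratio, that convolutions shrink a map by (kernel - 1), and that
    subsampling divides the size with rounding down. Lenet5Geometry, Trace
    and trace are hypothetical names for this example; how the library
    itself handles sizes that do not divide evenly is not covered here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical mirror of the geometry arguments in the calls above
// (one axis only, since height and width use the same values).
struct Lenet5Geometry {
    std::size_t input;  // input height/width, 32 in the calls above
    std::size_t k0;     // first convolution kernel size
    std::size_t s0;     // first subsampling ratio
    std::size_t k1;     // second convolution kernel size
    std::size_t s1;     // second subsampling ratio
};

struct Trace {
    std::size_t c0, sub0, c1, sub1;  // map size after each layer
    std::size_t stride;              // product of the subsampling ratios
    bool exact;                      // true if every subsampling divides evenly
};

Trace trace(const Lenet5Geometry& g) {
    assert(g.s0 >= 1 && g.s1 >= 1);
    assert(g.k0 >= 1 && g.k1 >= 1);
    assert(g.input >= g.k0);
    Trace t{};
    t.c0 = g.input - g.k0 + 1;   // first convolution
    t.sub0 = t.c0 / g.s0;        // first subsampling (rounded down)
    assert(t.sub0 >= g.k1);
    t.c1 = t.sub0 - g.k1 + 1;    // second convolution
    t.sub1 = t.c1 / g.s1;        // second subsampling (rounded down)
    t.stride = g.s0 * g.s1;
    t.exact = (t.c0 % g.s0 == 0) && (t.c1 % g.s1 == 0);
    return t;
}
```

    Under these assumptions the original call gives maps of 28, 14, 10 and 5
    with a shift of 4; the (1, 1) variant gives 28, 28, 24 and 12 with a
    shift of 2; and the (3, 3) variant gives a shift of 6, but 28 is not
    divisible by 3, so the sizes no longer divide evenly.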