Hi,
I'm getting an error at the testing step, after training completes. I'm training a net for neck recognition for an AR project, using the dog hipsterizer and MMOD examples as a starting point. I'm using dlib with CUDA support on Ubuntu 16.04 on a GTX 660, with the cropper batch size reduced from 150 to 25 to avoid out-of-memory errors on my card.
When the program reaches this line:
I get this error message:
I'm only starting to learn dlib and machine learning in general, so most probably I'm doing something wrong. I can see from the error that some of the values are not what the failing expression expects, but I have no idea what that means.
Cheers,
Catalin
Below is the program listing. I can also provide the training set and the serialized net file if needed.
#include <iostream>
#include <string>
#include <cstring>
#include <vector>

#include <dlib/dnn.h>
#include <dlib/data_io.h>
#include <dlib/image_processing.h>

using namespace std;
using namespace dlib;

void show_usage()
{
    cout << "Usage:" << endl << "Nectracking <option> <path>" << endl;
    cout << "Options:" << endl;
    cout << "\t-train\t\tTrains the detector using the set found at <path>" << endl;
    cout << "\t-track\t\tStarts the detector using the test images found at <path>" << endl;
}

// ----------------------------------------------------------------------------------------

std::vector<std::vector<double>> get_interocular_distances (
    const std::vector<std::vector<full_object_detection>>& objects
);
/*!
    ensures
        - returns an object D such that:
            - D[i][j] == the distance, in pixels, between the eyes for the face
              represented by objects[i][j].
!*/

// ----------------------------------------------------------------------------------------

template <long num_filters, typename SUBNET> using con5d = con<num_filters,5,5,2,2,SUBNET>;
template <long num_filters, typename SUBNET> using con5  = con<num_filters,5,5,1,1,SUBNET>;

template <typename SUBNET> using downsampler = relu<affine<con5d<32, relu<affine<con5d<32, relu<affine<con5d<16,SUBNET>>>>>>>>>;
template <typename SUBNET> using rcon5 = relu<affine<con5<45,SUBNET>>>;

using net_type = loss_mmod<con<1,9,9,1,1,rcon5<rcon5<rcon5<downsampler<input_rgb_image_pyramid<pyramid_down<6>>>>>>>>;

// ----------------------------------------------------------------------------------------

int main(int argc, char* argv[])
{
    bool train = false;
    string path;

    if (argc < 3)
    {
        show_usage();
    }
    else
    {
        std::vector<string> args;
        for (auto i = 1; i < argc; ++i)
            args.push_back(argv[i]);
        path = args[1];
        if (strcmp(args[0].c_str(), "-train") == 0)
        {
            train = true;
        }
        else if (strcmp(args[0].c_str(), "-track") != 0)
        {
            show_usage();
            return 1;
        }
    }

    if (train)
    {
        std::vector<matrix<rgb_pixel>> images_train;
        std::vector<std::vector<mmod_rect>> neck_boxes_train;
        load_image_dataset(images_train, neck_boxes_train, path);
        cout << "num training images: " << images_train.size() << endl;

        mmod_options options(neck_boxes_train, 80*80);
        cout << "detection window width,height: " << options.detector_width << "," << options.detector_height << endl;
        cout << "overlap NMS IOU thresh: " << options.overlaps_nms.get_iou_thresh() << endl;
        cout << "overlap NMS percent covered thresh: " << options.overlaps_nms.get_percent_covered_thresh() << endl;

        // Now we are ready to create our network and trainer.
        net_type net(options);
        dnn_trainer<net_type> trainer(net);
        trainer.set_learning_rate(0.1);
        trainer.be_verbose();
        trainer.set_synchronization_file("neck_track_sync", std::chrono::minutes(5));
        trainer.set_iterations_without_progress_threshold(300);

        // Now let's train the network.  We are going to use mini-batches of 25
        // images (reduced from the example's 150 to fit in GPU memory).  The
        // images are random crops from our training set (see
        // random_cropper_ex.cpp for a discussion of the random_cropper).
        std::vector<matrix<rgb_pixel>> mini_batch_samples;
        std::vector<std::vector<mmod_rect>> mini_batch_labels;
        random_cropper cropper;
        dlib::rand rnd;
        // Run the trainer until the learning rate gets small.  This will
        // probably take several hours.
        while (trainer.get_learning_rate() >= 1e-4)
        {
            cropper(25, images_train, neck_boxes_train, mini_batch_samples, mini_batch_labels);
            // We can also randomly jitter the colors and that often helps a
            // detector generalize better to new images.
            for (auto&& img : mini_batch_samples)
                disturb_colors(img, rnd);

            trainer.train_one_step(mini_batch_samples, mini_batch_labels);
        }
        // wait for training threads to stop
        trainer.get_net();
        cout << "done training" << endl;

        // Save the network to disk
        net.clean();
        serialize("neck_network.dat") << net;

        // Now that we have a detector we can test it.  The first statement tests
        // it on the training data.  It will print the precision, recall, and
        // then average precision.  This statement should indicate that the
        // network works perfectly on the training data.
        cout << "training results: " << test_object_detection_function(net, images_train, neck_boxes_train) << endl;

        cout << "Now let's train the shape predictor:" << endl;

        std::vector<std::vector<full_object_detection>> shapes_train;
        load_image_dataset(images_train, shapes_train, path);

        // Now make the object responsible for training the model.
        shape_predictor_trainer sp_trainer;
        // This algorithm has a bunch of parameters you can mess with.  The
        // documentation for the shape_predictor_trainer explains all of them.
        // You should also read Kazemi's paper which explains all the parameters
        // in great detail.  However, here I'm just setting a few of them
        // differently than their default values.  I'm doing this because we
        // have a very small dataset.  In particular, setting the oversampling
        // to a high amount (300) effectively boosts the training set size, so
        // that helps this example.
        sp_trainer.set_oversampling_amount(300);
        // I'm also reducing the capacity of the model by explicitly increasing
        // the regularization (making nu smaller) and by using trees with
        // smaller depths.
        sp_trainer.set_nu(0.05);
        sp_trainer.set_tree_depth(5);
        sp_trainer.set_cascade_depth(20);
        sp_trainer.set_feature_pool_region_padding(0.2);
        // Some parts of the training process can be parallelized.  The trainer
        // will use this number of threads when possible.
        sp_trainer.set_num_threads(2);

        // Tell the trainer to print status messages to the console so we can
        // see how long the training will take.
        sp_trainer.be_verbose();

        // Now finally generate the shape model.
        shape_predictor sp = sp_trainer.train(images_train, shapes_train);

        // Now that we have a model we can test it.  This function measures the
        // average distance between a landmark output by the shape_predictor and
        // where it should be according to the truth data.  Note that there is
        // an optional 4th argument that lets us rescale the distances.  Here we
        // are causing the output to scale each face's distances by the
        // interocular distance, as is customary when evaluating face
        // landmarking systems.
        cout << "mean training error: "
             << test_shape_predictor(sp, images_train, shapes_train, get_interocular_distances(shapes_train)) << endl;

        // Finally, we save the model to disk so we can use it later.
        serialize("sp.dat") << sp;
    }
    else
    {
        std::vector<matrix<rgb_pixel>> images_train;
        std::vector<std::vector<mmod_rect>> neck_boxes_train;
        load_image_dataset(images_train, neck_boxes_train, path);
        cout << "num testing images: " << images_train.size() << endl;

        net_type net;
        deserialize("neck_network.dat") >> net;

        // Test the deserialized detector on the loaded dataset.  It will print
        // the precision, recall, and then average precision.
        cout << "training results: " << test_object_detection_function(net, images_train, neck_boxes_train) << endl;
    }

    return 0;
}

// ----------------------------------------------------------------------------------------

double interocular_distance (
    const full_object_detection& det
)
{
    dlib::vector<double,2> l, r;
    double cnt = 0;
    // Find the center of the left eye by averaging the points around the eye.
    for (unsigned long i = 36; i <= 41; ++i)
    {
        l += det.part(i);
        ++cnt;
    }
    l /= cnt;

    // Find the center of the right eye by averaging the points around the eye.
    cnt = 0;
    for (unsigned long i = 42; i <= 47; ++i)
    {
        r += det.part(i);
        ++cnt;
    }
    r /= cnt;

    // Now return the distance between the centers of the eyes.
    return length(l-r);
}

std::vector<std::vector<double>> get_interocular_distances (
    const std::vector<std::vector<full_object_detection>>& objects
)
{
    std::vector<std::vector<double>> temp(objects.size());
    for (unsigned long i = 0; i < objects.size(); ++i)
    {
        for (unsigned long j = 0; j < objects[i].size(); ++j)
        {
            temp[i].push_back(interocular_distance(objects[i][j]));
        }
    }
    return temp;
}

// ----------------------------------------------------------------------------------------
Last edit: Catalin Moldovan 2016-10-14
You can't use affine layers like that during training. And really you shouldn't ever have affine layers in training, because it doesn't make sense mathematically. Use bn layers like the dnn_mmod_ex.cpp example shows.
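To make the distinction concrete, here is a sketch of how the training and inference definitions differ, using the 45-filter `rcon5` alias from the listing above (names with the `_train`/`_infer` suffixes are mine, for illustration, not dlib's):

```cpp
#include <dlib/dnn.h>

// Training-time alias, as in dnn_mmod_ex.cpp: bn_con is a batch-norm layer
// with learnable parameters and per-batch statistics.
template <typename SUBNET>
using rcon5_train = dlib::relu<dlib::bn_con<dlib::con<45,5,5,1,1,SUBNET>>>;

// Inference-time alias, as in the dog hipsterizer: affine is a fixed
// per-channel linear transform.  dlib fills it in from the learned batch-norm
// statistics when you deserialize a bn_con-trained network into a net type
// defined with affine.
template <typename SUBNET>
using rcon5_infer = dlib::relu<dlib::affine<dlib::con<45,5,5,1,1,SUBNET>>>;
```

So the two example programs are deliberately different: train with the bn_con version, serialize, and then deserialize into the affine version for deployment.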
Thanks for the answer Davis, but being new to ML I have to ask: isn't the net used in training the same net used for detection? I mean, if I serialize a net, shouldn't the net I deserialize into be the same one? This is the reason I used the net definition from dnn_mmod_dog_hipsterizer to train on my images: my assumption was that the net I serialize should have exactly the same definition as the one I deserialize into. But the nets in dnn_mmod_ex and dnn_mmod_dog_hipsterizer differ in their use of bn_con and affine layers.
Also, if it's not too much to ask, could you recommend a good read for someone who is just starting to learn about machine learning and who, as you might have noticed, definitely doesn't understand the concepts yet?
Last edit: Catalin Moldovan 2016-10-17
You should read the introductory dnn example programs that come with dlib.
They talk about this subject. Beyond that, you should read the papers
cited in the dlib documentation for deep learning.
Hi Davis,
your advice really helped me a lot; I'm grateful. I'd read the examples before, but in a hurry, and I missed some essential parts. Now I'm like a kid with a shiny new toy who wants to show it to everyone. I'm really starting to get a feel for just how powerful your library is. Thanks for sharing it with the community.
Sweet :)
@davisking What's the difference between bn_con and affine layers? Why doesn't using affine layers in training make sense mathematically?
One layer does batch norm and the other just applies a fixed linear transform to its outputs. The latter is not useful in training, since it will be stacked adjacent to some other linear transformation like a convolution, and two linear things combine into another linear thing, so no additional modeling capacity is gained.