libeblearn tutorial: energy-based learning in C++
By Pierre Sermanet and Yann LeCun (New York University)
The eblearn (energy-based learning) C++ library libeblearn contains machine learning algorithms that can be used for computer vision. The library has a generic and modular architecture, allowing easy prototyping and building of different algorithms (supervised or unsupervised learning) and configurations from basic modules. These algorithms have been used for a variety of applications, including robotics in the DARPA Learning Applied to Ground Robots (LAGR) project.
Energy-based learning
What is energy-based learning? More resources on energy-based models:
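Briefly (this is the standard energy-based formulation from the literature, not a description of eblearn internals): the machine assigns a scalar energy E(W, Y, X) to every configuration of input X, answer Y and trainable parameters W. Inference returns the answer of lowest energy, and learning shapes E so that correct answers receive lower energies than incorrect ones:

```latex
% Inference in an energy-based model: pick the lowest-energy answer
Y^{*} = \operatorname*{arg\,min}_{Y} E(W, Y, X)
```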
Modular Architecture and Building Blocks
Eblearn was designed to be modular, so that any module can be arranged in different ways with other modules to form different layers and different machines. There are 2 main types of modules (declared in EblArch.h):
module_1_1 (one input, one output) and module_2_1 (two inputs, one output). Each module derives from one of these two base classes and implements the fprop, bprop, bbprop and forget methods:
Note that the inputs and outputs the modules accept are state_idx objects: temporary buffers containing the results of a module's processing. For example, an fprop call fills the x slot of the output state_idx, whereas a bprop call fills the dx slot of the input state_idx (using the dx slot of the output state_idx). Next we describe some modules and show how they are combined with other modules. We first present a few basic modules, which are then used to form layers, which in turn form machines. Note that all the modules described here (basic modules, layers and machines) derive from module_1_1 or module_2_1, which means you can write and combine your own modules in your own ways; you are not restricted to the combinations described here.
Basic module examples
constant addition, linear, convolution and subsampling modules
These basic modules (found in EblBasic.h) are used in the LeNet architecture to perform the basic operations:
non-linear modules
These modules (EblNonLinearity.h) perform non-linear operations:
Layer module examples
These layers (EblLayers.h) are built by stacking the basic modules described previously on top of each other to form more complicated operations:
Machine module examples
Just as layers are built by assembling basic modules, machines (EblMachines.h) can be built by assembling layers, for instance the following machines:
Trainable machine modules
The modules described in the previous sections need to be encapsulated in a module_2_1 with a loss-function module in order to be trained in a supervised fashion. For example, to train the nn_machine_cscscf machine, we combine it with a euclidean cost module:
Module Replicability
Modules usually operate on a specific number of dimensions; for example, convolution_module_2D only accepts inputs with 3 dimensions (it applies a 2D convolution on dimensions 1 and 2, while dimension 0 is used according to a connection table). Thus if extra dimensions are present (e.g. 4D or 5D inputs), one might want to loop over the extra dimensions and call convolution_module_2D on each 3D subset. We call this replicability, because the module is replicated over the 3rd and 4th dimensions (the output also has 2 extra dimensions). To make a module replicable, use the DECLARE_REPLICABLE_MODULE_1_1 macro (in EblArch.h). It will automatically declare your module_1_1 as replicable and loop over extra dimensions if present. For example, here is the code to declare the linear_module as replicable:
DECLARE_REPLICABLE_MODULE_1_1(linear_module_replicable, linear_module, (parameter &p, intg in, intg out), (p, in, out));
GUI display
If the QT library is present on your system, the eblearn project will automatically compile the GUI libraries corresponding to libidx and libeblearn, producing the libeblearngui library, which provides display functions for the eblearn modules. For instance, it implements the display_fprop methods for each module_1_1, allowing every stage of the internal representations of the neural network to be displayed by calling that method on the top-level module (see figure c). For more details about the GUI features, refer to the GUI section of the libidx tutorial. Briefly, libidxgui provides a global object "gui" which can open new windows (gui.new_window()), draw matrices (gui.draw_matrix()), draw text (gui << at(42, 42) << "42" << endl) and offers some other functionality.
Supervised Learning
Eblearn provides supervised learning algorithms as well as semi-supervised and unsupervised ones. They can be used independently or combined; we focus on supervised algorithms only in this section.
1.
Build your dataset
What data should you provide to the network?
Creating datasets from image directories
Once you have grouped all your images into separate directories for each class, call the dscompile tool to transform them into a dataset object (a matrix of images with their corresponding labels). This tool extracts the label information from the directory structure: at the top level, each directory is a class named after the directory name. All images found in its subdirectories (regardless of their names or hierarchy) are assigned that label and added to the dataset. For example, with the following directory structure, dscompile will automatically build a dataset with 2 labels, "car" and "human", containing 5 images in total:
/data$ ls -R *
car:
auto.png  car01.png

human:
human1  human2

human/human1:
img01.png

human/human2:
img01.png  img02.png
dscompile will create the following files:
For more details, see the tool manuals of dscompile, dssplit, dsmerge, and dsdisplay.
Loading datasets into a LabeledDataSource object
2. Build your network
Eblearn neural networks are built by combining modules, as introduced at the beginning of this tutorial. Depending on your application, you may want to write your own modules or combine existing modules in different ways. For simple applications (see the Perceptron demo), you may want just one fully-connected layer in your machine. For more complex tasks, some architectures are already built in, such as the lenet5 and lenet7 architectures. The lenet5 and lenet7 classes derive from the nn_machine_cscscf class and only specify particular sizes for the inputs, kernels and outputs. As described earlier, the lenet5 architecture is capable of learning handwritten character recognition (10 categories), while lenet7 was used for object recognition (5 categories). Your application will probably differ from the MNIST or NORB applications, but if your goal is object recognition you may want to reuse the nn_machine_cscscf class with the lenet5 or lenet7 parameters, changing a few of them (see the constructors of lenet5 and lenet7 for more details). The main changes you will have to make are:
Remember that once you have chosen your network parameters and trained your network, you have to reuse the exact same parameters when running it. The trained network is saved in a single parameter file, and to be reused correctly it needs to be loaded with the exact same network architecture it was trained with. You now have your dataset and a network architecture; you are ready to train it.
3. Train your network
a. Make your network trainable
First you need to make your network trainable. Currently it is a module_1_1 object (as shown in figure b) that is runnable but not trainable (e.g. lenet7). To make it trainable, you need to encapsulate it in a module_2_1 with a loss-function module, which computes an error distance to the target label (e.g. the plane label) so that the network knows how much it should correct its weights in order to give the right answer. For instance, the supervised_euclidean_machine class is a module_2_1 object that takes a module_1_1 in its constructor along with the output targets for each class. When presented with an input image and a label, it computes the network output via the fprop method and computes the euclidean distance between the network output and the target output. Then, to learn from the errors (or minimize the energies), the bprop method is called and backpropagates the gradients of the errors all the way back through the network.
b. Create a trainer
In the supervised case, the training procedure can be handled by the supervised_trainer class. This class takes a trainable machine (a module_2_1) and a LabeledDataSource to train and test on the dataset.
c. Compute the second derivatives (bbprop)
The second derivatives are used to set individual learning rates for each parameter of the network, both to speed up the learning process and to improve the quality of the learning. They are computed over, say, a hundred iterations once before training starts, and are back-propagated through the bbprop methods.
To compute the second derivatives, call the compute_diaghessian method of the trainer as follows:
thetrainer.compute_diaghessian(train_ds, iterations, 0.02);
d. Train and test
After computing the second derivatives, you can iteratively train and test the network. By testing on both the training and testing sets after each training iteration, you will get a sense of the convergence of the training. Here is an example of training for 100 iterations, displaying the training-set and testing-set results at each step:
for (int i = 0; i < 100; ++i) {
  thetrainer.train(train_ds, trainmeter, gdp, 1);
  cout << "training: " << flush;
  thetrainer.test(train_ds, trainmeter, infp);
  trainmeter.display();
  cout << " testing: " << flush;
  thetrainer.test(test_ds, testmeter, infp);
  testmeter.display();
}
Here is a typical output of what you should see when training your network:
$ ./mnist /d/taf/data/mnist
* MNIST demo: learning handwritten digits using the eblearn C++ library *
Computing second derivatives on MNIST dataset: diaghessian inf: 0.985298 sup: 49.7398
Training network on MNIST with 2000 training samples and 1000 test samples
training: [ 2000] size=2000 energy=0.19 correct=88.80% errors=11.20% rejects=0.00%
 testing: [ 2000] size=1000 energy=0.163 correct=90.50% errors=9.50% rejects=0.00%
training: [ 4000] size=2000 energy=0.1225 correct=93.25% errors=6.75% rejects=0.00%
 testing: [ 4000] size=1000 energy=0.121 correct=92.80% errors=7.20% rejects=0.00%
training: [ 6000] size=2000 energy=0.084 correct=95.45% errors=4.55% rejects=0.00%
 testing: [ 6000] size=1000 energy=0.098 correct=94.70% errors=5.30% rejects=0.00%
training: [ 8000] size=2000 energy=0.065 correct=96.45% errors=3.55% rejects=0.00%
 testing: [ 8000] size=1000 energy=0.095 correct=95.20% errors=4.80% rejects=0.00%
training: [10000] size=2000 energy=0.0545 correct=97.15% errors=2.85% rejects=0.00%
 testing: [10000] size=1000 energy=0.094 correct=95.80% errors=4.20% rejects=0.00%
4.
Run your network
Multi-resolution detection: Classifier2D
While the Trainer class takes a module_1_1 and trains it on a dataset, the Classifier2D class takes a trained network as input (loading a 'parameter' saved in an Idx file) to detect objects in images of any size and at different resolutions. It resizes the input image to different sizes based on the resolution parameters it is passed, and applies the network at each scale. Finally, values in the outputs of the network that are higher than a certain threshold yield a positive detection at the corresponding position in the image and at a specific scale.
// parameter, network and classifier
// load the previously saved weights of a trained network
parameter theparam(1);
// input to the network will be 96x96 and there are 5 outputs
lenet7_binocular thenet(theparam, 96, 96, 5);
theparam.load_x(mono_net.c_str());
Classifier2D cb(thenet, sz, lbl, 0.0, 0.01, 240, 320);
// find category of image Idx
Further Reading
Here are resources that might be helpful in understanding in more detail how supervised convolutional neural networks work:
Unsupervised Learning
Semi-supervised Learning