This tutorial explains classification in detail; if you only want a general idea, you can check the MNIST demo.
In this tutorial, we will build a handwritten digit classifier, using the datasets from the previous tutorial as the training and test data. If you did not go through that tutorial (or do not want to), the compiled dataset used here can be downloaded as mnist_compiled_data.zip.
Unzip it to a folder of your choice (for example: /home/rex/eb_dataset/).
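For example, assuming the downloaded archive sits in your current directory, it could be extracted with the standard unzip command:

unzip mnist_compiled_data.zip -d /home/rex/eb_dataset/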
We will build a convolutional network that distinguishes the different digits from each other and assigns the correct label to each input digit image in our test set.
If you do not know what a convolutional network (convnet) is, we recommend reading the convnets section for a quick starter on the concepts behind them.
We are going to build a convnet with the architecture shown in Figure 1, commonly referred to as LeNet-5.
Let us first list what we already know about how we want our convnet to be structured.
In this step, we are going to write a very simple configuration file that passes several parameters to a generic training binary. The trainer reads the conf file and trains the network accordingly. We adopted this method because it is much easier and less cumbersome to write and modify a conf file than to write your own C++ trainer and recompile it every time you change the architecture or parameters. We are going to do the following next:
1. Create a file named tutorial2.conf and open it with your favorite text editor.
The conf file we are going to write has a very simple syntax, similar to Unix shell scripting:
Comments start with a #
Variables are defined without a $ but are referenced with ${}. For example:
a = hi     # defining variable 'a' to be the string "hi"
b = ${a}   # here b will become "hi"
b = a      # here b will become the string "a"
c = 10     # c takes the value 10
Isn't that simple :)
To describe the convnet architecture shown in Figure 1, we use the syntax scheme described above and write our configuration file.
The key to describing the architecture is the arch variable. The training utility looks for this variable and derives the architecture from it. Since we want our architecture to be cscscf, i.e. convolution, bias, non-linearity, subsampling, convolution, bias, non-linearity, subsampling, convolution, bias, non-linearity, and fully connected (as described in the image), let us write our arch variable to reflect that. In our convnet architecture, after each convolution layer we also add an "absolute" layer and a subtractive normalization layer, which usually improve performance.
Layers are separated by commas:
arch=conv0,addc0,tanh,abs0,wstd0,l2pool1,addc1,tanh,conv2,addc2,tanh,abs2,wstd2,l2pool3,addc3,tanh,conv5,addc5,tanh,linear6,addc6,tanh,linear7,addc7,tanh
As you can see, this is a very long line and quickly becomes ugly. Instead, we can use our variable magic and write a cleaner version that produces the same result:
arch = ${features},${classifier}
features = ${c0},${s1},${c2},${s3}
classifier = ${c5},${f7}

nonlin = tanh   # type of non-linearity
pool = l2pool   # subs (is another option)

# main branch layers
c0 = conv0,addc0,${nonlin},abs0,wstd0
s1 = ${pool}1,addc1,${nonlin}
c2 = conv2,addc2,${nonlin},abs2,wstd2
s3 = ${pool}3,addc3,${nonlin}
c5 = conv5,addc5,${nonlin}
f7 = linear7,addc7,${nonlin}
Now we have to describe each layer's properties. For example, conv0, as seen in the image, has a kernel size of 5×5 for its convolutions and a stride of 1×1, and l2pool1 has a subsampling size of 2×2 (i.e. it halves the size of the feature map).
Describing these properties is simple. For each layer, you define each of its properties with a variable following the naming scheme [layer name]_[property] = [property value].
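For instance, the 5×5 kernel and 1×1 stride of conv0 and the 2×2 subsampling of l2pool1 mentioned above translate to the following lines (the same lines also appear in the full listing below):

conv0_kernel = 5x5     # convolution kernel size (hxw) of layer conv0
conv0_stride = 1x1     # convolution stride (hxw) of layer conv0
l2pool1_kernel = 2x2   # subsampling kernel size (hxw) of layer l2pool1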
For our architecture, all the properties can be described with the following code.
# main branch parameters
classifier_hidden = 16              # number of hidden units in 2-layer classifier

conv0_kernel = 5x5                  # convolution kernel sizes (hxw)
conv0_stride = 1x1                  # convolution strides (hxw)
conv0_table =                       # convolution table (optional)
conv0_table_in = 1                  # conv input max, used if table file not defined
conv0_table_out = 6                 # features max, used if table file not defined
conv0_weights =                     # manual loading of weights (optional)
addc0_weights =                     # manual loading of weights (optional)
wstd0_kernel = ${conv0_kernel}      # normalization kernel sizes (hxw)

subs1_kernel = 2x2                  # subsampling kernel sizes (hxw)
subs1_stride = ${subs1_kernel}      # subsampling strides (hxw)
l2pool1_kernel = 2x2                # subsampling kernel sizes (hxw)
l2pool1_stride = ${l2pool1_kernel}  # subsampling strides (hxw)
addc1_weights =                     # manual loading of weights (optional)

conv2_kernel = 5x5                  # convolution kernel sizes (hxw)
conv2_stride = 1x1                  # convolution strides (hxw)
#conv2_table = ${tblroot}/table_6_16_connect_60.mat # conv table (optional)
conv2_table_in = thickness          # use current thickness as max table input
conv2_table_out = 16                # features max, used if table file not defined
conv2_weights =                     # manual loading of weights (optional)
addc2_weights =                     # manual loading of weights (optional)
wstd2_kernel = ${conv2_kernel}      # normalization kernel sizes (hxw)

subs3_kernel = 2x2                  # subsampling kernel sizes (hxw)
subs3_stride = ${subs3_kernel}      # subsampling strides (hxw)
l2pool3_kernel = 2x2                # subsampling kernel sizes (hxw)
l2pool3_stride = ${l2pool3_kernel}  # subsampling strides (hxw)
addc3_weights =                     # manual loading of weights (optional)

linear5_in = ${linear5_in_${net}}   # linear module input features size
linear5_out = noutputs              # use number of classes as max table output
linear6_in = thickness              # linear module input features size
linear6_out = ${classifier_hidden}
linear7_in = thickness              # use current thickness
linear7_out = noutputs              # use number of classes as max table output

conv5_kernel = 5x5                  # convolution kernel sizes (hxw)
conv5_stride = 1x1                  # convolution strides (hxw)
conv5_table_in = thickness          # use current thickness as max table input
conv5_table_out = 120               # features max, used if table file not defined
With this, we have finished describing our architecture.
We still need to describe a few more parameters. First, we have to give the trainer the path to our dataset files (which we compiled with dscompile in the previous tutorial).
# training ####################################################################
classification = 1                    # load datasets in classification mode, regression otherwise
dataset_path = /home/rex/eb_dataset   # replace this with where your dataset files are
train = ${dataset_path}/mnist_train_data.mat          # training data
train_labels = ${dataset_path}/mnist_train_labels.mat # training labels
# train_size = 10000                  # limit number of samples
val = ${dataset_path}/mnist_test_data.mat             # validation data
val_labels = ${dataset_path}/mnist_test_labels.mat    # validation labels
# val_size = 1000                     # limit number of samples
We also have to describe a few more advanced parameters; if you do not understand some of these at this point, that is okay.
# energies & answers ########################################################### trainer = trainable_module1 # the trainer module trainable_module1_energy = l2_energy # type of energy answer = class_answer # how to infer answers from network raw outputs # hyper-parameters eta = .0001 # learning rate reg = 0 # regularization reg_l1 = ${reg} # L1 regularization reg_l2 = ${reg} # L2 regularization reg_time = 0 # time (in samples) after which to start regularizing inertia = 0.0 # gradient inertia anneal_value = 0.0 # learning rate decay value anneal_period = 0 # period (in samples) at which to decay learning rate gradient_threshold = 0.0 iterations = 2 # number of training iterations ndiaghessian = 100 # number of sample for 2nd derivatives estimation epoch_mode = 1 # 0: fixed number 1: show all at least once #epoch_size = 4000 # number of training samples per epoch. comment to ignore. epoch_show_modulo = 400 # print message every n training samples sample_probabilities = 0 # use probabilities to pick samples hardest_focus = 1 # 0: focus on easiest samples 1: focus on hardest ones ignore_correct = 0 # If 1, do not train on correctly classified samples min_sample_weight = 0 # minimum probability of each sample per_class_norm = 1 # normalize probabiliy by class (1) or globally (0) shuffle_passes = 1 # shuffle samples between passes balanced_training = 1 # show each class the same amount of samples or not random_class_order = 0 # class order is randomized or not when balanced no_training_test = 0 # do not test on training set if 1 no_testing_test = 0 # do not test on testing set if 1 max_testing = 0 # limit testing to this number of samples save_pickings = 0 # save sample picking statistics binary_target = 0 # use only 1 output, -1 is negative, +1 positive test_only = 0 # if 1, just test the data and return save_weights = 0 # if 0, do not save weights after each iteration keep_outputs = 1 # keep all outputs in memory training_precision = double #float save_weights = 1 # if 0, do not save weights when training # training display ############################################################# display = 1 # display results show_conf = 0 # show configuration variables or not show_train = 1 # enable/disable all training display show_train_ninternals = 1 # number of internal examples to display show_train_errors = 0 # show worst errors on training set show_train_correct = 0 # show worst corrects on training set show_val_errors = 1 # show worst errors on validation set show_val_correct = 1 # show worst corrects on validation set show_hsample = 5 # number of samples to show on height axis show_wsample = 18 # number of samples to show on height axis show_wait_user = 0 # if 1, wait for user to close windows
Now we are ready to train :)
To train the system, open a terminal and call eblearn's train utility (assuming it is in your PATH; otherwise use its full path), passing the configuration file we wrote in this tutorial as its argument:
train tutorial2.conf