This tutorial explains classification in detail; if you just want a general idea, you can check the MNIST demo.
In this tutorial, we will build a handwritten digit classifier using the datasets from the previous tutorial as the train/test data. If you did not go through that tutorial, or do not want to, the compiled dataset we are going to use here can be downloaded as mnist_compiled_data.zip.
Unzip it to a folder of your choice (for example: /home/rex/eb_dataset/).
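For example, assuming you downloaded the archive to your current directory, you can extract it from a terminal like this (the destination folder is just an example):
unzip mnist_compiled_data.zip -d /home/rex/eb_dataset/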
We will build a convolutional network that distinguishes the different digits from each other and assigns the correct label to each input digit image in our test set.
If you do not know what a convolutional network (convnet) is, we recommend that you read the convnet section here for a quick starter on the concepts behind convnets.
In this tutorial, we are going to build a convnet with the architecture given in Figure 1, commonly referred to as LeNet-5.

Let us first lay out what we already know about how we want our convnet to look.
In this step, we are going to write a very simple configuration file that passes several parameters to a generic training binary. The trainer reads the conf file and trains the network accordingly. We adopted this method because it is much easier and less cumbersome to write and modify a conf file than to write your own C++ trainer and recompile it every time you change your architecture or parameters. We are going to do the following next:
1. Create a file named tutorial2.conf and open it with your favorite text editor.
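For example, from a terminal (gedit is only an illustration; any text editor works):
touch tutorial2.conf
gedit tutorial2.conf &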
The conf file that we are going to write has a very simple syntax, similar to Unix shell scripting:
Comments start with a #
Variables are defined without a $ but are referenced with ${}. For example:
a = hi #defining variable 'a' to be the string "hi"
b = ${a} #here b will become "hi"
b = a #here b will become the string "a"
c = 10 # c takes the value 10
Isn't that simple :)
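Purely as an illustration of how substitution composes (these variable names are made up for this example and do not appear in our conf file), a value can be built from several other values:
greeting = hello
name = world
message = ${greeting},${name} # message becomes the string "hello,world"
This is exactly the trick we will use below to keep the long arch definition readable.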
To describe the convnet architecture shown in Figure 1, we use this syntax to write our configuration file.
The key to describing the architecture is the arch variable. The training utility looks for that variable and derives the architecture from it. Since we want our architecture to be cscscf, i.e. convolution, bias, non-linearity, subsampling; convolution, bias, non-linearity, subsampling; convolution, bias, non-linearity; and fully connected (as described in the image), let us write our arch variable to reflect that. In our convnet architecture, after each convolution layer we also add an "absolute" layer and a subtractive normalization layer, which usually improve performance.
Each layer is separated from the next by a comma:
arch = conv0,addc0,tanh,abs0,wstd0,l2pool1,addc1,tanh,conv2,addc2,tanh,abs2,wstd2,l2pool3,addc3,tanh,conv5,addc5,tanh,linear7,addc7,tanh
As you can see, this is a very long and ugly line. Hence, we can use our variable magic and write a cleaner-looking version that produces the same result.
arch = ${features},${classifier}
features = ${c0},${s1},${c2},${s3}
classifier = ${c5},${f7}
nonlin = tanh # type of non-linearity
pool = l2pool # pooling type (subs is another option)
# main branch layers
c0 = conv0,addc0,${nonlin},abs0,wstd0
s1 = ${pool}1,addc1,${nonlin}
c2 = conv2,addc2,${nonlin},abs2,wstd2
s3 = ${pool}3,addc3,${nonlin}
c5 = conv5,addc5,${nonlin}
f7 = linear7,addc7,${nonlin}
Now we have to describe each layer's properties. For example, conv0, as seen in the image, has a convolution kernel size of 5×5 and a stride of 1×1. l2pool1 has a subsampling size of 2×2 (i.e. it halves the size of the feature map).
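As an aside, and assuming 32×32 input images (the size classically used with LeNet-5; raw MNIST digits are 28×28 and are usually padded), the feature map sizes shrink through the network as follows:
32x32 -> conv0 (5x5) -> 28x28 -> l2pool1 (2x2) -> 14x14 -> conv2 (5x5) -> 10x10 -> l2pool3 (2x2) -> 5x5 -> conv5 (5x5) -> 1x1
This is why conv5, with its 5×5 kernel, outputs 1×1 feature maps and effectively behaves like a fully connected layer.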
Describing these properties is simple. For each layer, you define each of its properties with a variable following the naming scheme [layer name]_[property] = [property value].
For our architecture, the properties can be described with the following code.
# main branch parameters
classifier_hidden = 16 # number of hidden units in 2-layer classifier
conv0_kernel = 5x5 # convolution kernel sizes (hxw)
conv0_stride = 1x1 # convolution strides (hxw)
conv0_table = # convolution table (optional)
conv0_table_in = 1 # conv input max, used if table file not defined
conv0_table_out = 6 # features max, used if table file not defined
conv0_weights = # manual loading of weights (optional)
addc0_weights = # manual loading of weights (optional)
wstd0_kernel = ${conv0_kernel} # normalization kernel sizes (hxw)
subs1_kernel = 2x2 # subsampling kernel sizes (hxw)
subs1_stride = ${subs1_kernel} # subsampling strides (hxw)
l2pool1_kernel = 2x2 # subsampling kernel sizes (hxw)
l2pool1_stride = ${l2pool1_kernel} # subsampling strides (hxw)
addc1_weights = # manual loading of weights (optional)
conv2_kernel = 5x5 # convolution kernel sizes (hxw)
conv2_stride = 1x1 # convolution strides (hxw)
#conv2_table = ${tblroot}/table_6_16_connect_60.mat # conv table (optional)
conv2_table_in = thickness # use current thickness as max table input
conv2_table_out = 16 # features max, used if table file not defined
conv2_weights = # manual loading of weights (optional)
addc2_weights = # manual loading of weights (optional)
wstd2_kernel = ${conv2_kernel} # normalization kernel sizes (hxw)
subs3_kernel = 2x2 # subsampling kernel sizes (hxw)
subs3_stride = ${subs3_kernel} # subsampling strides (hxw)
l2pool3_kernel = 2x2 # subsampling kernel sizes (hxw)
l2pool3_stride = ${l2pool3_kernel} # subsampling strides (hxw)
addc3_weights = # manual loading of weights (optional)
# linear5_in = ${linear5_in_${net}} # unused: linear5 is not part of this architecture
# linear5_out = noutputs # unused: linear5 is not part of this architecture
linear6_in = thickness # linear module input features size
linear6_out = ${classifier_hidden}
linear7_in = thickness # use current thickness
linear7_out = noutputs # use number of classes as max table output
conv5_kernel = 5x5 # convolution kernel sizes (hxw)
conv5_stride = 1x1 # convolution strides (hxw)
conv5_table_in = thickness # use current thickness as max table input
conv5_table_out = 120 # features max, used if table file not defined
With this, we have finished describing our architecture.
We still have to describe a few more parameters. First, we have to give the trainer the path to our dataset files (the ones we compiled with dscompile in the previous tutorial).
# training #####################################################################
classification = 1 # load datasets in classification mode, regression otherwise
dataset_path = /home/rex/eb_dataset #replace this with where your dataset files are
train = ${dataset_path}/mnist_train_data.mat # training data
train_labels = ${dataset_path}/mnist_train_labels.mat # training labels
# train_size = 10000 # limit number of samples
val = ${dataset_path}/mnist_test_data.mat # validation data
val_labels = ${dataset_path}/mnist_test_labels.mat # validation labels
# val_size = 1000 # limit number of samples
We also have to set a few advanced parameters; if you do not understand some of these at this point, that is okay.
# energies & answers ###########################################################
trainer = trainable_module1 # the trainer module
trainable_module1_energy = l2_energy # type of energy
answer = class_answer # how to infer answers from network raw outputs
# hyper-parameters
eta = .0001 # learning rate
reg = 0 # regularization
reg_l1 = ${reg} # L1 regularization
reg_l2 = ${reg} # L2 regularization
reg_time = 0 # time (in samples) after which to start regularizing
inertia = 0.0 # gradient inertia
anneal_value = 0.0 # learning rate decay value
anneal_period = 0 # period (in samples) at which to decay learning rate
gradient_threshold = 0.0
iterations = 2 # number of training iterations
ndiaghessian = 100 # number of samples for 2nd derivatives estimation
epoch_mode = 1 # 0: fixed number of samples, 1: show all samples at least once
#epoch_size = 4000 # number of training samples per epoch. comment to ignore.
epoch_show_modulo = 400 # print message every n training samples
sample_probabilities = 0 # use probabilities to pick samples
hardest_focus = 1 # 0: focus on easiest samples 1: focus on hardest ones
ignore_correct = 0 # If 1, do not train on correctly classified samples
min_sample_weight = 0 # minimum probability of each sample
per_class_norm = 1 # normalize probability by class (1) or globally (0)
shuffle_passes = 1 # shuffle samples between passes
balanced_training = 1 # show each class the same number of samples or not
random_class_order = 0 # class order is randomized or not when balanced
no_training_test = 0 # do not test on training set if 1
no_testing_test = 0 # do not test on testing set if 1
max_testing = 0 # limit testing to this number of samples
save_pickings = 0 # save sample picking statistics
binary_target = 0 # use only 1 output, -1 is negative, +1 positive
test_only = 0 # if 1, just test the data and return
keep_outputs = 1 # keep all outputs in memory
training_precision = double # float is another option
save_weights = 1 # if 0, do not save weights after each iteration
# training display #############################################################
display = 1 # display results
show_conf = 0 # show configuration variables or not
show_train = 1 # enable/disable all training display
show_train_ninternals = 1 # number of internal examples to display
show_train_errors = 0 # show worst errors on training set
show_train_correct = 0 # show worst corrects on training set
show_val_errors = 1 # show worst errors on validation set
show_val_correct = 1 # show worst corrects on validation set
show_hsample = 5 # number of samples to show on height axis
show_wsample = 18 # number of samples to show on width axis
show_wait_user = 0 # if 1, wait for user to close windows
Now we are ready to train :)
To train the system, open a terminal and call eblearn's train utility with the configuration file we wrote in this tutorial as its argument.
train tutorial2.conf
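If the train binary is not on your PATH, you may need to call it from the directory where eblearn's binaries were built (the path below is only an assumption about your install; adjust it accordingly):
cd ~/eblearn/bin
./train /path/to/tutorial2.conf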