answer_module

An answer_module serves two purposes:

  1. Transform the raw network outputs into the final answer. For example, class_answer outputs a single discrete class id (1-of-n) rather than n continuous values.
  2. Interface a datasource class to feed a trainable network in a specific way. E.g., the class_answer transforms the sample's label from a discrete value into a 1-of-n target vector and feeds the sample and the target vector to the trainable network.

Conceptually, these two functions could be separated into different modules, but they usually share parameters, so keeping them in a common class reduces code complexity.

Thus an answer module is an interface for both the input and output of a network. It is used as an input/output interface for classification, but only as an output interface in the detection case.
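The two roles described above can be illustrated with a small standalone sketch. It does not use the library's classes: ClassAnswer, to_answer and to_target are hypothetical names, the 0/1 target encoding is an assumption (the library may use other target values), and taking the softmax probability as the confidence is also an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a class answer: a discrete id plus a confidence.
struct ClassAnswer {
  std::size_t class_id;  // discrete class label
  double confidence;     // value in (0, 1]
};

// Output role: reduce n continuous outputs to one discrete id.
// Assumption: confidence is the softmax probability of the winning class.
ClassAnswer to_answer(const std::vector<double> &raw) {
  assert(!raw.empty());
  const auto max_it = std::max_element(raw.begin(), raw.end());
  double sum = 0.0;
  for (double v : raw) sum += std::exp(v - *max_it);  // shifted for stability
  // exp(max - max) == 1, so the winning probability is 1 / sum.
  return {static_cast<std::size_t>(max_it - raw.begin()), 1.0 / sum};
}

// Input role: expand a discrete label into a 1-of-n target vector.
// Assumption: targets are encoded as 0 (off) and 1 (on).
std::vector<double> to_target(std::size_t label, std::size_t n_classes) {
  assert(label < n_classes);
  std::vector<double> target(n_classes, 0.0);
  target[label] = 1.0;
  return target;
}
```

Both functions depend on the number of classes and on the encoding convention, which is the kind of shared parameter that motivates keeping the two roles in one class.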

Existing answer modules

The following answer modules are already defined:

  • class_answer: the answer is a 2-layer map composed of the discrete class label in layer 1, and the corresponding continuous confidence ([0, 1]) in layer 2.
  • regression_answer: the answer is equal to the raw network output.
  • vote_answer: provides a single voted answer given multiple network answers.
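As an illustration of what a voted answer could look like, here is a minimal sketch of plain majority voting over discrete class ids. The voting rule actually used by vote_answer is not described here, so majority voting, the tie-breaking rule, and the name majority_vote are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper: return the class id that appears most often.
// Ties are resolved in favor of the smallest id (an arbitrary choice).
std::size_t majority_vote(const std::vector<std::size_t> &answers) {
  assert(!answers.empty());
  std::map<std::size_t, std::size_t> counts;  // class id -> number of votes
  for (std::size_t id : answers) ++counts[id];
  std::size_t best_id = counts.begin()->first;
  std::size_t best_count = counts.begin()->second;
  // std::map iterates in ascending key order, so a strict '>' keeps the
  // smallest id when several ids have the same count.
  for (const auto &kv : counts) {
    if (kv.second > best_count) {
      best_id = kv.first;
      best_count = kv.second;
    }
  }
  return best_id;
}
```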

Writing a custom answer module

Depending on whether you need your answer module for training or for detection only, you will need to implement both the input and output interfaces, or only the output interface. During training, the trainable_module class calls the appropriate interfaces to feed the network and evaluate the results.

  1. input interface: this interface is implemented via the fprop1 and fprop2 methods. Each takes a datasource object as input and prepares the network input for flow 1 or flow 2, respectively. Typically, an image is propagated through flow 1 and a discrete label through flow 2, but other scenarios are possible. For example, in a siamese network, where the network learns to produce similar or dissimilar outputs for a pair of images, one image is propagated through each flow. An answer module designed for video processing might also concatenate several frames in fprop1() rather than use a single one.
    virtual void fprop1(labeled_datasource<T,Tds1,Tds2> &ds, mstate<Tstate> &out);
    virtual void fprop2(labeled_datasource<T,Tds1,Tds2> &ds, mstate<Tstate> &out);
  2. output interface: this interface is implemented via the fprop(in, out) method:
    virtual void fprop(Tstate &in, Tstate &out);
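To show how the three methods fit together, here is a simplified standalone sketch of a regression-style module. It does not derive from the library's answer_module: State, Sample, AnswerSketch and RegressionSketch are hypothetical stand-ins for Tstate, a datasource sample and the module classes, and passing the label through flow 2 as a single scalar target is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins, NOT the library's types.
using State = std::vector<double>;  // plays the role of Tstate
struct Sample {                     // plays the role of one datasource sample
  State image;
  std::size_t label;
};

// Simplified interface mirroring the three methods listed above.
class AnswerSketch {
public:
  virtual ~AnswerSketch() = default;
  virtual void fprop1(const Sample &s, State &out) = 0;  // input for flow 1
  virtual void fprop2(const Sample &s, State &out) = 0;  // input for flow 2
  virtual void fprop(const State &in, State &out) = 0;   // raw output to answer
};

// Regression-style module: the answer is the raw network output.
class RegressionSketch : public AnswerSketch {
public:
  // Flow 1 carries the sample itself.
  void fprop1(const Sample &s, State &out) override { out = s.image; }
  // Flow 2 carries the target (assumption: the label as one scalar).
  void fprop2(const Sample &s, State &out) override {
    out.assign(1, static_cast<double>(s.label));
  }
  // Output interface: copy the raw output unchanged.
  void fprop(const State &in, State &out) override { out = in; }
};
```

A detection-only module would need a meaningful implementation of fprop(in, out) only, since the two input methods are called during training.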
answer.txt · Last modified: 2012/06/19 17:01 by sermanet