Creating an end-to-end face detector

Now, let us create a practical face detector with decent performance. Creating the detector involves the following steps.

  1. Downloading the datasets
  2. Creating the datasets using dscompile utility
  3. Training the convnet using train utility
  4. Bootstrapping the convnet to reduce false-positives using detect utility
  5. Doing detection and seeing the results on test images

1. Downloading the datasets

  • Open a terminal (Linux or OSX) and change directory to eblearn/demos/face . Enter the command
    sh download_datasets.sh

    The script downloads several public face datasets and background images (which contain no faces) and creates a folder structure suitable for dscompile, similar to the one shown in tutorial 1.

2. Creating the datasets using dscompile

  • If you aren't familiar with the dscompile utility, please look at Tutorial 1: What is a dataset?
  • We now have our face and background images in data/face and data/bg directories.
  1. First create the compiled face dataset with the dscompile command
    dscompile data/face -dname faceset -precision float -outdir prepared_data \
       -dims 32x32x1 -channels Yp -resize mean -kernelsz 7x7 \
       -maxdata 15000 -forcelabel face
    • Here, the options are:
      • -channels Yp : converts the face images into globally and locally normalized grayscale images.
      • -dims 32x32x1 : resizes the images to 32×32 single-channel images.
      • -resize mean : uses mean resizing. Other options are bilinear and gaussian.
      • -kernelsz 7x7 : uses a 7×7 kernel for the local normalization.
      • -maxdata 15000 : limits the number of images to 15000.
      • -forcelabel face : since the internal folder names in the face folder are dataset names rather than class names, we force all images inside the folder to have the class name “face” (see the sketch below).
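    • To make -forcelabel concrete, the downloaded face folder is assumed to be organized roughly like this (the actual subfolder names come from the individual source datasets and will differ):
      data/face/
        dataset_a/img0001.png   # subfolders are named after the source datasets,
        dataset_b/img0047.png   # not after classes, hence -forcelabel face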
  2. Next, create the background dataset using the two commands
    dscompile data/bg -type patch -precision float -outdir prepared_data/bg \
    -scales 4,5,5.5,6,7 -dims 32x32x1 -channels Yp -resize mean -kernelsz 7x7 \
    -maxperclass 6 -maxdata 15000 -nopadded -forcelabel bg 
    dscompile prepared_data/bg -dname backgroundset -precision float \
     -outdir prepared_data/ -forcelabel bg -nopp
    • In the first command, we create a “patch dataset”, which extracts random image patches at different scales from the background images. This doesn't produce a full dataset, but only .mat files of patches; hence we compile the actual dataset with the second command.
      • The options are:
      • -scales 4,5,5.5,6,7 : extracts patches at different scale factors. The scale factor is relative to the “dims” variable (see the worked example below).
    • In the second command, we compile a regular dataset from the individual <idx> matrix files produced by the first command.
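    • As a worked example of -scales (assuming the scale factor simply multiplies the output dims):
      scale 4   → 128×128 patches, resized down to 32×32
      scale 5.5 → 176×176 patches, resized down to 32×32
      scale 7   → 224×224 patches, resized down to 32×32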
  3. Next, merge the background and face datasets together with
    dsmerge prepared_data face+bg backgroundset faceset
  4. Next, split this set into training and validation datasets with 500 samples per class in the validation set using
    dssplit prepared_data face+bg face+bg_val face+bg_train -maxperclass 500
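  • For convenience, the four preparation commands above can be collected into a single shell script (a sketch; the file name prepare_data.sh is only an example):
    #!/bin/sh
    # Sketch: compile, merge and split the face/background datasets,
    # using exactly the commands from steps 1-4 above.
    dscompile data/face -dname faceset -precision float -outdir prepared_data \
       -dims 32x32x1 -channels Yp -resize mean -kernelsz 7x7 \
       -maxdata 15000 -forcelabel face
    dscompile data/bg -type patch -precision float -outdir prepared_data/bg \
       -scales 4,5,5.5,6,7 -dims 32x32x1 -channels Yp -resize mean -kernelsz 7x7 \
       -maxperclass 6 -maxdata 15000 -nopadded -forcelabel bg
    dscompile prepared_data/bg -dname backgroundset -precision float \
       -outdir prepared_data/ -forcelabel bg -nopp
    dsmerge prepared_data face+bg backgroundset faceset
    dssplit prepared_data face+bg face+bg_val face+bg_train -maxperclass 500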

3. Training the convnet using train

  • If you followed tutorial 2, you have seen how to define a conf file describing the architecture and the training variables. We do the same here in eblearn/demos/face/face.conf .
  • There is nothing in this conf file that is new since tutorial 2, and the file is self-explanatory.
  • Train the system by executing the command
     train face.conf 
  • After this initial training, you can already visualize the results by skipping to section 5. Then iterate between sections 4 and 5 to improve results by reducing false positives.
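  • Training periodically saves weight snapshots such as net00030.mat, which the following sections reference. As a quick sanity check (assuming the snapshots are written to the current directory):
    ls net*.mat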

4. Bootstrapping the convnet to reduce false-positives using detect

  • Often, training the classifier once is not enough to get good practical performance. The reason is that the background space is enormous, yet we sampled only a few thousand background patches at random, so we missed many background patterns that the classifier can confuse with faces.
  • Hence, we run the trained network over many background images that contain no faces, and whatever the detector mistakes for a face is added, labeled as background, to the next round of training. This process is generally called bootstrapping your system.
  • To do bootstrapping,
  1. Add the following lines to the end of face.conf
    # BOOTSTRAPPING VARIABLES ##########################################
    weights=net00030.mat
    classes=${val_classes}
    input_dir=data/bg/bgimgs
    # bootstrapping
    bootstrapping = 1          # enable bootstrapping extraction or not
    bootstrapping_save = 1     # save bootstrapped samples directly as a dataset
    display_bootstrapping = 0  # display bootstrapping extraction
    display = 0
    display_sleep = 0
    gt_neg_max = 3             # maximum number of negatives to be saved per classifier and image
    
    # negative bootstrapping
    bootstrapping_max = 5000   # limit to this number of extracted samples (optional)
    gt_extract_pos = 0         # extract positive samples or not
    gt_extract_neg = 1         # extract negative samples or not
    gt_neg_threshold = .01     # minimum confidence of extracted negative samples
    gt_neg_gt_only = 0         # only extract negatives when positives are present in image
    input_random = 1           # taking random images is better for bootstrapping
    gt_name = face_neg    # name of saved bootstrapping dataset
    bbox_scalings = 1x1        # scaling factors of detected boxes
    output_dir=bootstrap_output
    # END OF BOOTSTRAPPING VARIABLES ########################################
  2. Now run face.conf with the detect utility
    detect face.conf
  3. Then, merge the created bootstrapping dataset with the training set
    rm -f prepared_data/bootstrap*.mat
    mv bootstrap_output/detections*/bootstrapping_face_neg_*.mat prepared_data/
    rm -rf bootstrap_output
    dsmerge prepared_data face+bg_train+fp1 face+bg_train bootstrapping_face_neg
  4. Change the variable train_dsname in face.conf from face+bg_train to face+bg_train+fp1
  5. Run training again on the bootstrapped dataset using
    train face.conf
  6. Repeat steps 2–5 three times, each time merging the newly generated bootstrapping dataset with the dataset used in the previous iteration, and changing the variable in step 4 accordingly. For example, for the 3rd repetition of these steps, you would merge using the command
    dsmerge prepared_data face+bg_train+fp1+fp2+fp3 \
      face+bg_train+fp1+fp2 bootstrapping_face_neg

    and change the variable train_dsname to face+bg_train+fp1+fp2+fp3
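  • The three bootstrapping rounds can also be scripted. The sketch below uses only the commands from the steps above; the sed line assumes face.conf defines train_dsname on a single line (GNU sed syntax; on OSX use sed -i ''):
    #!/bin/sh
    # Sketch: automate the three negative-bootstrapping rounds described above.
    prev=face+bg_train
    for i in 1 2 3; do
      next=${prev}+fp${i}
      detect face.conf                  # extract false positives with current weights
      rm -f prepared_data/bootstrap*.mat
      mv bootstrap_output/detections*/bootstrapping_face_neg_*.mat prepared_data/
      rm -rf bootstrap_output
      dsmerge prepared_data ${next} ${prev} bootstrapping_face_neg
      sed -i "s/^train_dsname *=.*/train_dsname = ${next}/" face.conf
      train face.conf                   # retrain on the augmented dataset
      prev=${next}
    done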

  • NOTE: After you finish your bootstrapping, remove the bootstrapping section of variables from face.conf

5. Doing detection and seeing the results on test images

Now you have successfully created a face detector. Using the detect utility, you can detect faces in test images. To do this:

  • First make sure you have removed the bootstrapping section of variables added in the previous section.
  • Set weights=net00030.mat , the set of weights that was learned by the train utility.
  • Set classes=${val_classes} , which points to the classes file containing the number and names of the classes used by the classifier. In this case, the classes variable is set to the validation set's classes matrix.
  • Then, modify the variable input_dir to a directory with the images that you want to evaluate on. For example, you can use eblearn/tools/data/face/, which contains nens.gif.
  • Also, set the variable threshold reasonably high, say 0.9 or more. The threshold is the minimum confidence a bounding box must reach to be retained, i.e. the detector only reports what it thinks is a face if it is at least 0.9 confident out of 1.0.
  • You can set the display_sleep variable to a reasonable delay, so that you can see the results before the next image is evaluated.
  • For more detailed documentation on the detect utility, see the detect page.
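  • Putting the above together, the detection-related lines in face.conf look roughly like this (the variable names come from this tutorial; the input_dir value and the units of display_sleep are assumptions):
    weights = net00030.mat               # weights saved by train
    classes = ${val_classes}             # classes file of the validation set
    input_dir = ../../tools/data/face/   # example: a directory of test images
    threshold = 0.9                      # keep only boxes with confidence >= 0.9
    display_sleep = 1000                 # pause between displayed images (assumed ms)

    Then run the detector as before with
    detect face.conf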

Good luck, and use our forums for any help: EBLearn Forums
