detect
detect finds objects in images at multiple scales and outputs the
corresponding bounding boxes, given a trained model and variable
definitions found in a configuration file.
Calls
./detect pedestrians.conf
- When variable 'input_dir' is not defined and the inputs are image
files (camera = directory), one can pass the images directory path
as the second argument, e.g.:
./detect pedestrians.conf /images/pedestrians/
- When the input is a video file (camera = video),
one must pass the file path as the second argument, e.g.:
./detect pedestrians.conf /videos/pedestrians.flv
Outputs
All outputs are saved in a directory named 'out_[timestamp]'
created in the calling directory.
- When variable 'save_detections' equals 1, each detected region of the
original image is saved as an individual image file in subdirectory
'detections/originals', and the exact input to the model (usually
preprocessed and scaled) is saved as a matrix file in subdirectory
'detections/preprocessed'.
- When variable 'save_video' equals 1, original images with bounding
boxes overlaid are saved in subdirectory 'video'.
- When variable 'bbox_saving' is non-zero, bounding box coordinates
are saved in file 'bbox.txt'. Different values of 'bbox_saving'
select different saving formats:
- 1: save in all formats (in different files)
- 2: save in eblearn format
- 3: save in caltech format
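For instance, a run with 'save_detections' and 'save_video' set to 1 and
'bbox_saving' set to 2 would leave a layout along these lines (the exact
timestamp format is only indicative):
  out_20120514.173042/
    bbox.txt                  (bounding boxes, eblearn format)
    detections/originals/     (detected regions cropped from input images)
    detections/preprocessed/  (exact model inputs, saved as matrices)
    video/                    (input images with bounding boxes overlaid)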
Required configuration variables
- weights:
weights file of the trained model.
- classes:
names of the output classes, stored in a matrix file.
- camera:
the type of input. Possible values are:
- directory: inputs are images from a directory, whose path can be
defined by variable 'input_dir' or passed as the second command-line
argument.
- video: take a video file as input, whose path must be passed as the
second command-line argument.
- v4l2: use the v4l2 camera interface (Linux only). The user must also
define variable 'device' to select the v4l2 device, e.g.
'device = /dev/video0'.
- kinect: use the Kinect camera.
- opencv: use OpenCV's camera interface as video input.
- shmem: use shared memory as the video feed.
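A minimal configuration for image-directory input could thus look like the
following sketch (configuration files use 'name = value' assignments, as in
the 'device' example above; the model paths here are hypothetical):
  weights = ${HOME}/models/pedestrians_net.mat
  classes = ${HOME}/models/pedestrians_classes.mat
  camera = directory
  input_dir = /images/pedestrians/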
Optional configuration variables
- input_dir:
the directory containing images, when using images as inputs
(i.e. when 'camera' equals 'directory').
The remaining optional variables are listed below with example values
taken from a pedestrians configuration, grouped by topic:
- meta_name: ${name}_${machine} # name of this meta job
- meta_max_jobs: 12 # maximum number of jobs to run at the same time
- meta_output_dir: ${root}/../out/ # directory where to write outputs
- meta_gnuplot_params: "set grid ytics;set ytics;set mytics;set grid mytics;set logscale y; set mxtics; set grid xtics; " # extra gnuplot parameters
# analyze processes output or not. if 0, the meta_trainer will have no notion
# of iteration and will only send 1 report at the very end.
# if 1, meta_trainer will try to find iteration and other variable values
# in each process' output.
- meta_analyze: 1
- meta_send_email: 1 # emailing results or not
- meta_email: ${myemail} # email to use (use environment variable "myemail")
# iterations at which to send an email
- meta_email_iters: 0,1,2,3,4,5,7,10,15,20,30,50,75,100,200
- meta_email_period: 1 # send email with this freq (if email_iters not defined)
- meta_watch_interval: 30 # interval sec to analyze outputs and check who's alive
# variables to minimize, process and iteration with lowest value will
# be used to report best weights, or start consequent training
- meta_minimize: test_errors,errors,test_energy,energy,1FPPI,.01FPPI
- meta_ignore_iter0: 1 # do not take results for i: 0 into account
- meta_sticky_vars: job,config,classes # vars to keep around at each iteration
- meta_watch_vars: #job,1FPPI,.01FPPI # restrict variable watching to those
- meta_nbest: 5 # number of best answers to show/send
- meta_send_best: 0 # if 1 send best answers files minimizing meta_minimize's value
- meta_send_logs: 0 # send logs of all jobs or not
- meta_no_conf_id: 0 # do not use conf ids for naming
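Note that the meta_* variables above are not read by detect itself: they
drive eblearn's meta tool (the meta_trainer mentioned in the comments),
which launches and monitors several jobs from a single configuration.
Assuming the metarun binary shipped with the eblearn tools (name to be
checked against your installation), such a sweep would be started with:
  ./metarun pedestrians.conf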
################################################################################
# local program configuration
- name: inria
- machine: ${HOSTNAME}a
- ebl: ${HOME}/eblpierre/ # eblearn root
- root: ${HOME}/${machine}data/ped/${name}/ds/ # datasets root
#root2: ${ebl}/demos/pedestrians/trained/ # trained weights root
# network high level switches ##################################################
- net_type: cscsc #cscscf
- net: sr1c #mr1c mr2c # net architecture
- manual_load: 1 # manually load weights for individual modules
- color: 1
# architecture #################################################################
- arch: ${arch_${net}_${net_type}} # global architecture
- mirror: 0 # mirror instead of zero-padding (default)
- nonlin: tanh # stdsig # non-linearity module
# machines
- arch_mr1_cscsc: ${c0},${s1},${c2},${s3},branch5,merge21,${f5}
- arch_mr2_cscsc: ${c0},${s1},branch2,${c2},${s3},branch5,merge22,${f5}
- arch_sr1c_cscsc: branch1,${c0},${s1},merge1,${c2},${s3},${c5}
- arch_mr1c_cscsc: branch1,${c0},${s1},merge1,${c2},${s3},branch5,merge21,${f5}
#arch_mr2c_cscsc: branch1,${c0},${s1},merge1,branch2,${c2},${s3},branch5,merge22,${f5}
#arch_mr2c_cscscf: branch1,${c0},${s1},merge1,branch2,${c2},${s3},branch5,merge22,${f6},${f7}
- arch_mr2c_cscsc: branch1,${c0},${s1},merge1,branch2,${c2},${s3},merge22,${f5}
- arch_mr2c_cscscf: branch1,${c0},${s1},merge1,branch2,${c2},${s3},merge22,${f6},${f7}
- branch1: resize00,${c00} # color branch
- branch2: ${s20} # 1st multi-scale branch (biggest)
#branch2: ${s20},branch3 # 1st multi-scale branch (biggest)
- branch3: ${s21},${branch4} # 1st multi-scale branch (medium)
- branch4: ${s22} # 1st multi-scale branch (smallest)
- branch5: ${s50},branch6 # 2nd multi-scale branch (medium)
- branch6: ${s51} # 2nd multi-scale branch (smallest)
# main branch layers
- c0: conv0,addc0,${nonlin},diag0,abs0,wstd0
- s1: subs1,addc1,${nonlin}
- c2: conv2,addc2,${nonlin},diag2,abs2,wstd2
- s3: subs3,addc3,${nonlin}
- c5: conv5,addc5,${nonlin}
- f5: linear5,addc5,${nonlin}
- f6: linear6,addc6,${nonlin}
- f7: linear7,addc7,${nonlin}
# color branch layers
- c00: conv00,addc00,${nonlin}00,diag00,abs00,wstd00
# multi scale branches layers
- s20: subs00,addc,${nonlin}
- s21: ${s20}
- s22: ${s20}
- s50: ${s20}
- s51: ${s20}
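As an illustration of how the architecture string is assembled, with
'net: sr1c', 'net_type: cscsc' and 'nonlin: tanh', 'arch' expands step by
step into:
  arch = ${arch_sr1c_cscsc}
       = branch1,${c0},${s1},merge1,${c2},${s3},${c5}
       = branch1,
         conv0,addc0,tanh,diag0,abs0,wstd0,   (c0)
         subs1,addc1,tanh,                    (s1)
         merge1,
         conv2,addc2,tanh,diag2,abs2,wstd2,   (c2)
         subs3,addc3,tanh,                    (s3)
         conv5,addc5,tanh                     (c5)
where branch1 itself expands to resize00,conv00,addc00,tanh00,diag00,
abs00,wstd00 (the color branch), whose output merge1 concatenates back
into the main trunk.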
# branches parameters
- branch1_type: narrow # feed UV color into branch
- branch1_narrow_dim: 0
- branch1_narrow_size: 2 # U and V
- branch1_narrow_offset: 1
- branch2_type: copy # feed all features to multi-scale branch
- branch3_type: copy # keep 1st layer scale 1 data
- branch4_type: copy # keep 1st layer scale 2 data
- branch5_type: copy # keep 2nd layer scale 2 data
- branch6_type: copy # keep 2nd layer scale 3 data
- merge1_type: concat
- merge1_branches: branch1
- merge1_concat_dim: 0
- merge21_type: flat
- merge21_branches: branch5,branch6
- merge22_type: flat
#merge22_branches: branch2,branch3,branch4,branch5,branch6
- merge22_branches: branch2
- merge22_inh: 16
- merge22_inw: 8
- merge22_branches_inh: 20
- merge22_branches_inw: 12
- merge22_strideh: 1
- merge22_stridew: ${merge22_strideh}
- merge22_branches_strideh: 1
- merge22_branches_stridew: ${merge22_branches_strideh}
# merge22_inh: 16
# merge22_inw: 8
# merge22_branches_inh: 20,10,5,8,4
# merge22_branches_inw: 12,6,3,4,2
# merge22_strideh: 4
# merge22_stridew: ${merge22_strideh}
# merge22_branches_strideh: 4,2,1,2,1
# merge22_branches_stridew: ${merge22_branches_strideh}
- conv00_kerh: 5 # convolution kernel's height
- conv00_kerw: 5 # convolution kernel's width
- conv00_strideh: 1 # convolution stride in height
- conv00_stridew: 1 # convolution stride in width
- conv00_table: ${table00} # convolution table (optional)
- conv00_table_in: 2 # conv input max, used if table file not defined
- conv00_table_out: 6 # features max, used if table file not defined
- conv00_weights: ${wroot0c}${sp0}_layer1_convolution_kernel.mat
- addc00_weights: ${wroot0c}${sp0}_layer2_bias_bias.mat
- diag00_weights: ${wroot0c}${sp0}_layer4_diag_coeff.mat
- wstd00_kerh: ${conv00_kerh} # normalization kernel's height
- wstd00_kerw: ${conv00_kerw} # normalization kernel's width
- subs00_kerh: 2 # subsampling kernel's height
- subs00_kerw: 2 # subsampling kernel's width
- subs00_strideh: ${subs00_kerh} # subsampling stride in height
- subs00_stridew: ${subs00_kerw} # subsampling stride in width
- resize00_hratio: 0.349206349 # 44 / 126
- resize00_wratio: 0.358974359 # 28 / 78
# main branch parameters
- inputh: 126 # input's height
- inputw: 78 # input's width
- conv0_kerh: 7 # convolution kernel's height
- conv0_kerw: 7 # convolution kernel's width
- conv0_strideh: 1 # convolution stride in height
- conv0_stridew: 1 # convolution stride in width
- conv0_table: ${table0_color0} # convolution table (optional)
- conv0_table_in: 1 # conv input max, used if table file not defined
- conv0_table_out: ${table0_max} # features max, used if table file not defined
- conv0_weights: ${wroot0}${sp0}_layer1_convolution_kernel.mat
- addc0_weights: ${wroot0}${sp0}_layer2_bias_bias.mat
- diag0_weights: ${wroot0}${sp0}_layer4_diag_coeff.mat
- wstd0_kerh: 7 # normalization kernel's height
- wstd0_kerw: 7 # normalization kernel's width
- subs1_kerh: 3 # subsampling kernel's height
- subs1_kerw: 3 # subsampling kernel's width
- subs1_strideh: ${subs1_kerh} # subsampling stride in height
- subs1_stridew: ${subs1_kerw} # subsampling stride in width
- addc1_weights: # weights to be loaded if manual_load: 1
- conv2_kerh: 9 # convolution kernel's height
- conv2_kerw: 9 # convolution kernel's width
- conv2_strideh: 1 # convolution stride in height
- conv2_stridew: 1 # convolution stride in width
- conv2_table: ${table1} # convolution table (optional)
- conv2_table_in: thickness # use current thickness as max table input
- conv2_table_out: ${table1_max} # features max, used if table file not defined
- conv2_weights: ${wroot1c}${sp0}${sp1}_layer1_convolution_kernel.mat
- addc2_weights: ${wroot1c}${sp0}${sp1}_layer2_bias_bias.mat
- diag2_weights: ${wroot1c}${sp0}${sp1}_layer4_diag_coeff.mat
- wstd2_kerh: ${conv2_kerh} # normalization kernel's height
- wstd2_kerw: ${conv2_kerw} # normalization kernel's width
- subs3_kerh: 2 # subsampling kernel's height
- subs3_kerw: 2 # subsampling kernel's width
- subs3_strideh: ${subs3_kerh} # subsampling stride in height
- subs3_stridew: ${subs3_kerw} # subsampling stride in width
- addc3_weights: # weights to be loaded if manual_load: 1
- linear5_in: ${linear5_in_${net}} #thickness # linear module input features size
- linear5_out: noutputs # use number of classes as max table output
- linear6_in: ${linear6_in_${net}} #thickness # linear module input features size
- linear6_out: 10 # linear module output features size
- linear7_in: ${linear6_out} #thickness # linear module input features size
- linear7_out: noutputs # use number of classes as max table output
- conv5_kerh: 16 # convolution kernel's height
- conv5_kerw: 8 # convolution kernel's width
- conv5_strideh: 1 # convolution stride in height
- conv5_stridew: 1 # convolution stride in width
- conv5_table_in: thickness # use current thickness as max table input
- conv5_table_out: noutputs # features max, used if table file not defined
- linear5_in_mr1: 10752
- linear5_in_mr2: 20832
- linear5_in_mr1c: 11424
- linear5_in_mr2c: 17824 #23394
- linear6_in_mr2c: 17824 #8704
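As a sanity check, the feature-map geometry implied by these values can be
traced by hand for the sr1c architecture (a convolution shrinks a map by
kernel size minus 1, a subsampling divides it by its stride):
  main (Y) trunk:     126x78 -> conv0 7x7 -> 120x72 -> subs1 3x3/3 -> 40x24
  color (UV) branch1: 126x78 -> resize00 (44/126, 28/78) -> 44x28
                      -> conv00 5x5 -> 40x24
  merge1 (feature concatenation): 40x24
  trunk continued:    conv2 9x9 -> 32x16 -> subs3 2x2/2 -> 16x8
                      -> conv5 16x8 -> 1x1 output
The 16x8 map entering conv5 matches 'merge22_inh'/'merge22_inw' above, and
a branch forked after merge1 then passed through subs00 (2x2/2) yields
20x12, matching 'merge22_branches_inh'/'merge22_branches_inw'.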
# manual loading ##############################################################
- sp0: 12 #0703 2007 # 0301 1205 #03 07 12 # sparsity layer0
- sp1: 12 #03 07 12 # sparsity layer1
#wroot0: /home/pierre/eblpierre/tools/demos/pedestrians/inria/koray/pedmachines/machine5732
#wroot0: ${HOME}/koray/color/machine57
#wroot0: /data/koray/pedmachines/machine5732
#wroot1: /data/koray/pedmachines2/machine95732
- wroot0: /data/koray/pedmachines_Y/machine50732
- wroot0c: /data/koray/pedmachines_UV/machine50706
- wroot1c: /data/koray/pedmachines_YUV/machine
# tables #######################################################################
- tblroot: ${ebl}/tools/data/tables/ # location of table files
# conv00
- table00: ${tblroot}/table_2_6_connect_6_fanin_0_density_0.5_uv0_u3_v6.mat
# conv0
- table0_max: 32 #64 # full table output max (overridden if table file defined)
- table0_color0: # no color, use table0_max for full table
- table0_color1: ${tblroot}/table_3_32_connect_32_fanin_0_density_0.33_yuv0_y26_u29_v32.mat
# conv1
- tbl: 3
- table1_max: 64 # 64 96 128 256
- tbl1_mr1: ${tblroot}/table_32_64_connect_1664_fanin_26_density_0.81_random.mat
- tbl1_mr2: ${tblroot}/table_32_64_connect_1664_fanin_26_density_0.81_random.mat
- tbl1_2: ${tblroot}/table_32_96_connect_2496_fanin_26_density_0.81_random.mat
- tbl1_sr1c: ${tblroot}/table_38_68_connect_2040_fanin_30_density_0.79_random.mat
- tbl1_mr1c: ${tblroot}/table_38_68_connect_2040_fanin_30_density_0.79_random.mat
- tbl1_mr2c: ${tblroot}/table_38_68_connect_2040_fanin_30_density_0.79_random.mat
#tbl_mrc: ${tbl1_3}
#tbl_mr: ${tbl1_1}
- table1: ${tbl1_${net}}
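The table file names encode their geometry and can be checked by hand: in
table_32_64_connect_1664_fanin_26_density_0.81_random.mat, 32 is the number
of input features, 64 the number of output features, 1664 the total number
of connections (64 outputs x 26 fan-in), and 0.81 the connection density
(1664 / (32 x 64) = 0.8125).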
# preprocessing ################################################################
- preprocessing: 1 # 0: none 1: contrast normalization (optional)
- resize: mean # bilinear
- normalization_size: 7 # 9
# training #####################################################################
- ds: 1 # dataset id
- dsname: ${name}_${resize}${inputh}x${inputw}_ker${normalization_size}_bg
- train: ${dsname}_train_${ds} # training set
- val: ${dsname}_val_${ds} # validation set
#root: /data/pedestrians/small/
#train: small
#val: small
- reg: 0 .000001
#reg: 0 .0001
#reg: .000001
#eta: .000001 .00001 # learning rate
- eta: .000005 .00001 .00002 # learning rate (several values: see note after this training section)
#eta: .0000001 .0000005 .000001 .0000025 .000005 .0000075 #.00001 #.00002 # learning rate
#eta: .0000005 .000005 .00001 # learning rate
- reg_l1: ${reg} #.0001 # L1 regularization
- reg_l2: ${reg} #.0001 # L2 regularization
- iterations: 20 # number of training iterations
- ndiaghessian: 400 #800 1200 # number of samples for 2nd derivatives estimation
- epoch_mode: 0 # 0: fixed number 1: show all at least once
- epoch_size: 4000 # number of training samples per epoch. comment to ignore.
- epoch_show_modulo: 100 # print message every n training samples
- sample_probabilities: 0 # use probabilities to pick samples
- hardest_focus: 1 # 0: focus on easiest samples 1: focus on hardest ones
- ignore_correct: 1 # If 1, do not train on correctly classified samples
- min_sample_weight: 0 #.1 .5 1 # minimum probability of each sample
- per_class_norm: 1 # normalize probability by class (1) or globally (0)
- shuffle_passes: 1 # shuffle samples between passes
- balanced_training: 1 # show each class the same amount of samples or not
- no_training_test: 1 # do not test on training set if 1
- target_factor: 1 # factor by which to multiply the -1 and 1 targets
- save_pickings: 1 # save sample picking statistics
- binary_target: 0 # use only 1 output, -1 for negative examples, +1 for positive
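Note on multi-valued variables: entries such as 'reg: 0 .000001' and
'eta: .000005 .00001 .00002' list several candidate values. When the
configuration is passed to the meta tool, every combination is expanded
into its own job (here 2 x 3 = 6 training jobs, subject to meta_max_jobs);
when running a single process directly, keep a single value per variable.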
# training display #############################################################
- show_train: 1 # enable/disable all training display
- show_train_ninternals: 0 # number of internal examples to display
- show_train_errors: 0 # show worst errors on training set
- show_val_errors: 1 # show worst errors on validation set
- show_val_correct: 1 # show worst corrects on validation set
- show_hsample: 5 # number of samples to show on height axis
- show_wsample: 18 # number of samples to show on width axis
# retraining ###################################################################
- retrain: 0
- retrain_weights: # ${root1}/${job_name_retraining}_net040.mat
# detection ####################################################################
- net_min_height: ${inputh}
- net_min_width: ${inputw}
- nthreads: 1 # number of detection threads
- ipp_cores: 1 # number of cores used by IPP
- weights: ${root2}/${weights_file}
- classes: ${root2}/${job_name}_classes.mat
- threshold: .1 # confidence detection threshold
- gain: 1
- input_height: -1 # use -1 to use original size
- input_width: -1 # use -1 to use original size
- input_min: 0 # minimum height or width for minimum scale
- input_max: 1200 # maximum height or width for maximum scale
# multi-scaling type. 0: manually set each scale's size, 1: manually set each
# scale step, 2: number of scales between min and max, 3: step factor between
# min and max, 4: 1 scale, the original image size. A worked example follows
# this detection block.
- scaling_type: 3
- scaling: 1.1 # scaling ratio between scales
- min_scale: .75 # min scale as factor of minimal network size
- max_scale: 1.3 # max scale as factor of original resolution
- input_random: 1 # randomize input list (only works for 'directory' camera).
- input_npasses: 1 # passes on the input list (only works for 'directory' cam).
- hzpad: .3 # vertical zero padding on each side as ratio of network's min input
- wzpad: .3 # horizontal zero padding on each side as ratio of network's min input
#mem_optimization: 1
- bbox_saving: 2 # 0: none 1: all styles 2: eblearn style 3: caltech style
- max_object_hratio: 0 #13.5 # image's height / object's height, 0 to ignore
- smoothing: 0 # smooth network outputs
- background_name: bg # name of background class (optional)
# non-maximum suppression (nms) ################################################
- nms: 1 # 0: no pruning 1: ignore overlapping bb 2: pedestrian custom
- bbox_file: #bbox.txt # ignore processing, feed pre-computed bboxes to nms
- bbhfactor: .7 # height factor to apply to bounding boxes
- bbwfactor: .5 # width factor to apply to bounding boxes
- confidence_type: 2 # 0: sqrdist 1: single output 2: max other (recommended)
- max_bb_overlap: .5 # minimum ratio with smallest bbox to declare overlap
- min_hcenter_dist: .4 # centers closer than this ratio over height cancel out
- min_wcenter_dist: .2 # centers closer than this ratio over width cancel out
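As a worked example of the scaling variables, take a hypothetical 640x480
input with the values above (scaling_type 3, step factor 1.1): the largest
scale is bounded by max_scale times the original resolution (1.3 x 480 =
624 pixels high, with input_max = 1200 as a further cap), the smallest by
min_scale times the 126-pixel network minimum height (about 94 pixels),
and consecutive scales differ by a factor of 1.1, i.e. roughly
log(624/94) / log(1.1) ~ 20 scales (the exact count depends on the
detector's rounding).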
# detection display ############################################################
- skip_frames: 0 # skip this number of frames before each processed frame
- save_detections: 0 # output saving and display
- save_max: 25000 # Exit when this number of objects has been saved
- save_max_per_frame: 10 # Only save the first n objects per frame
- save_video: 0 # save each classified frame and make a video out of it
- save_video_fps: 5
- use_original_fps: 0
- display: 1
- display_zoom: 1 # zooming
- display_min: -1.7 # minimum data range to display (optional)
- display_max: 1.7 # maximum data range to display (optional)
- display_in_min: 0 # input image min display range (optional)
- display_in_max: 255 # input image max display range (optional)
- display_bb_transparency: .5 # bbox transp factor (modulated by confidence)
- display_threads: 0 # each thread displays on its own
- display_states: 0 # display internal states of 1 resolution
- show_parts: 0 # show parts composing an object or not
- silent: 0 # minimize outputs to be printed
- sync_outputs: 1 # synchronize output between threads
- minimal_display: 1 # only show classified input
- display_sleep: 0 # sleep in milliseconds after displaying
- ninternals: 1
# demo display variables
- queue1: 0
- qstep1: 1
- qheight1: 5
- qwidth1: 2
- queue2: 0
- qstep2: 50
- qheight2: 5
- qwidth2: 5
- precamera: 0 # pre-camera (used before regular camera)
- precamdir: ${root2}/
- camera: directory # camera options: opencv shmem video directory
# specify a custom image search pattern (optional)
- file_pattern: #".*[.](png|jpg|jpeg|PNG|JPG|JPEG|bmp|BMP|ppm|PPM|pnm|PNM|pgm|PGM|gif|GIF)"
# limit of input video duration in seconds, 0 means no limit
- input_video_max_duration: 0
# step between input frames in seconds, 0 means no step
- input_video_sstep: 0
# evaluation ###################################################################
- set: Test
- evaluate: 1
- evaluate_cmd: "${visiongrader} ${visiongrader_params}"
- visiongrader: ${HOME}/visiongrader/src/main.py
- visiongrader_params: "${input_params} ${groundtruth_params} ${compare_params} ${curve_params} ${ignore} "
- input_params: "--input bbox.txt --input_parser eblearn --sampling 50 "
- annotations: ${root}/../INRIAPerson/${set}/annotations/
- groundtruth_params: "--groundtruth ${annotations} --groundtruth_parser inria --gt_whratio .43 "
- compare_params: "--comparator overlap50percent --comparator_param .5 "
- curve_params: "--det --saving-file curve.pickle --show-no-curve "
- input_dir: /home/sermanet/${machine}data/ped/inria/INRIAPerson/${set}/pos
- ignore: "--ignore ${HOME}/visiongrader/datasets/pedestrians/inria/ignore/${set}/ "