detect finds objects in images at multiple scales and outputs corresponding bounding boxes given a trained model defined by a configuration file. Example uses of the detect tool can be found in the face detection demo or the MNIST demo.
To use detect, call:
detect pedestrians.conf
When the variable 'input_dir' is not defined and the inputs are image files (camera = directory), one can pass the image directory as a second argument, e.g.:
detect pedestrians.conf /images/pedestrians/
When the input is a video file (camera = video), one must pass the file path as a second argument, e.g.:
detect pedestrians.conf /videos/pedestrians.flv
The following variables configure the way detect works. Some variables are required, but most of them are optional.
Detect works by running a fixed-size network on different scales of the same image. For that, the architecture needs to contain a resize_module which also knows how to pre-process the image. During the training phase, your training data was most likely pre-processed already, so we need to tell the resize module to use the same pre-processing that was used for the training data.
Note that the resize module does not have to be the first module of the architecture, as long as it is present. One can have other modules beforehand, for example to apply some transformation to the image before scaling.
Here is an example of a resize module using a YnUV normalization with kernel size 7×7:
arch = resizepp0,${my_architecture}
resizepp0_pp = rgb_to_ynuv0
rgb_to_ynuv0_kernel = 7x7
rgb_to_ynuv0_global_norm = 0
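In this example, 'resizepp0' is prepended to the rest of the network described by ${my_architecture}, and its pre-processing is set to the same YnUV normalization (7x7 kernel, global normalization disabled) that was presumably applied to the training data.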
The following variables are required by detect in your configuration file.
weights = my_weights.mat # weights file of the trained model
classes = classes.mat    # names of each output class in a matrix file
camera = directory       # the type of input: directory, video, v4l2, kinect, opencv, shmem
input_dir = /data/       # image directory if camera = directory
threshold = .1           # confidence threshold in [0,1]
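Putting the required variables together with the resize module above, a minimal configuration file might look like the following sketch (the weights, classes and ${my_architecture} entries are placeholders for your own trained model):
# pedestrians.conf -- a minimal sketch
arch = resizepp0,${my_architecture}
resizepp0_pp = rgb_to_ynuv0
rgb_to_ynuv0_kernel = 7x7
rgb_to_ynuv0_global_norm = 0
weights = my_weights.mat   # produced by your training run
classes = classes.mat
camera = directory
input_dir = /images/pedestrians/
threshold = .1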
The following optional variables allow tuning of detect. If not specified, detect will run with their default values.
The 'camera' variable allows selecting the source of images as follows:
file_pattern = ".*[.](png|jpg|jpeg|PNG|JPG|JPEG|bmp|BMP|ppm|PPM|pnm|PNM|pgm|PGM|gif|GIF)"
input_random = 1 # randomize input list (only works for 'directory' camera)
# limit of input video duration in seconds, 0 means no limit
input_video_max_duration = 0
# step between input frames in seconds, 0 means no step
input_video_sstep = 0
One can force the images grabbed by any camera to be a certain size as follows (optional):
input_height = 240 # use -1 to use original size
input_width = 320  # use -1 to use original size
nthreads = 1           # number of detection threads
input_gain = 1         # factor applied to input image
input_npasses = 1      # passes on the input list (only works for 'directory' camera)
bbox_saving = 2        # 0 = none, 1 = all styles, 2 = eblearn style, 3 = caltech style
max_object_hratio = 0  # image's height / object's height, 0 to ignore
smoothing = 0          # smooth network outputs
outputs_threshold = -1     # thresholds raw outputs before smoothing or other processing
outputs_threshold_val = -1 # replacement value when below outputs_threshold
dump_outputs = 0       # if 1, save detect output matrices in dump/
background_name = bg   # name of background class (optional)
#mem_optimization = 1  # temporarily disabled
ipp_cores = 1          # number of cores used by IPP
# multi-scaling type:
# 0 = manually set each scale's sizes
#     also set: scales = 100x100,150x150 for all images
#     or scales = 100x100,150x150;110x110,160x160 for each image
# 1 = manually set each scale step
# 2 = number of scales between min and max
# 3 = step factor between min and max
# 4 = 1 scale, the original image size
scaling_type = 3 # type of scaling
scaling = 1.1    # scaling ratio between scales
min_scale = .75  # min scale as factor of minimal network size
max_scale = 1.3  # max scale as factor of original resolution
# limiting multi-scale to minimum and maximum height or width sizes
input_min = 0    # minimum height or width for minimum scale
input_max = 1200 # maximum height or width for maximum scale
# padding (to detect partially cut objects at image boundaries)
hzpad = .3 # height's 0-padding on each side as ratio of network's min height
wzpad = .3 # width's 0-padding on each side as ratio of network's min width
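With scaling_type = 3 as above, detect generates a pyramid in which consecutive scales differ by the 'scaling' factor (here 1.1), within the bounds set by min_scale, max_scale, input_min and input_max. To bypass automatic scale generation entirely, one could instead fix the scales by hand, as in this sketch taken from the comments above:
scaling_type = 0         # manually set each scale's sizes
scales = 100x100,150x150 # evaluate exactly these two scales on every image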
nms = 1             # 0: none, 1: regular, 2: voting, 3: voting + regular
pre_threshold = 0   # bbox threshold before nms
post_threshold = .5 # bbox threshold after nms
pre_hfact = 1       # bbox height factor before nms
pre_wfact = 1       # bbox width factor before nms
post_hfact = 1      # bbox height factor after nms
post_wfact = 1      # bbox width factor after nms
woverh = 1          # set width to be h * woverh if different than 1.0
max_overlap = .5    # regular nms, bboxes match when overlap is less than this
max_hcenter_dist = .3 # regular nms, bboxes match when hcenter distance is below this
max_wcenter_dist = .3 # regular nms, bboxes match when wcenter distance is below this
vote_max_overlap = .5 # voting nms, bboxes match when overlap is less than this
vote_max_hcenter_dist = .5 # voting nms, match when hcenter distance is below this
vote_max_wcenter_dist = .3 # voting nms, match when wcenter distance is below this
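For instance, to combine voting with regular nms and keep only confident boxes after suppression, one might write the following sketch (the values are illustrative, not recommendations):
nms = 3             # voting followed by regular nms
post_threshold = .6 # discard boxes below this confidence after nms
vote_max_overlap = .5
max_overlap = .5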
The following variables configure the display and saving of detections.
skip_frames = 0      # skip this number of frames before each processed frame
save_detections = 0  # output saving and display
detections_dir = detections # override name of directory with detection results
save_bbox_period = 100 # period at which the bbox file is saved (reduces disk I/O)
save_max = 25000     # exit when this number of objects has been saved
save_max_per_frame = 10 # only save the first n objects per frame
save_video = 0       # save each classified frame and make a video out of it
save_video_fps = 5
use_original_fps = 0
display = 1          # enable/disable display altogether
display_zoom = 1     # zooming
display_min = -1.7   # minimum data range to display (optional)
display_max = 1.7    # maximum data range to display (optional)
display_in_min = 0   # input image min display range (optional)
display_in_max = 255 # input image max display range (optional)
display_bb_transparency = .5 # bbox transparency factor (modulated by confidence)
display_threads = 0  # each thread displays on its own
display_states = 0   # display internal states of 1 resolution
show_parts = 0       # show parts composing an object or not
show_extracted = 0   # show positive input patches
silent = 0           # minimize printed outputs
sync_outputs = 1     # synchronize output between threads
minimal_display = 1  # only show classified input
display_sleep = 0    # sleep in milliseconds after displaying
ninternals = 1       # demo display variables
bbox_show_conf = 1   # display confidence in each bbox or not
bbox_show_class = 1  # display class name in each bbox or not
no_gui_quit = 1      # if 1, wait for the user to close the GUI before exiting
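As an example, a headless batch run that saves detections to disk instead of displaying them might combine these variables as follows (a sketch; the values are illustrative):
display = 0          # disable the GUI entirely
silent = 1           # minimize printed outputs
save_detections = 1  # write results under 'detections_dir'
save_max_per_frame = 10
no_gui_quit = 0      # exit as soon as the input is exhausted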
The following variables show how to call an independent evaluation program which can print a detection score.
set = Test
evaluate = 1
evaluate_cmd = "${visiongrader} ${visiongrader_params}"
visiongrader = ${HOME}/visiongrader/src/main.py
visiongrader_params = "${input_params} ${groundtruth_params} ${compare_params} ${curve_params} ${ignore} "
input_params = "--input bbox.txt --input_parser eblearn --sampling 50 "
annotations = ${root}/../INRIAPerson/${set}/annotations/
groundtruth_params = "--groundtruth ${annotations} --groundtruth_parser inria --gt_whratio .43 "
compare_params = "--comparator overlap50percent --comparator_param .5 "
curve_params = "--det --saving-file curve.pickle --show-no-curve "
input_dir = /home/sermanet/${machine}data/ped/inria/INRIAPerson/${set}/pos
ignore = "--ignore ${HOME}/visiongrader/datasets/pedestrians/inria/ignore/${set}/ "
To ensure that bounding boxes are correctly computed, set the 'bbox_decision' variable and visually check that all found bounding boxes are aligned with the image corners (or outside them if padding is used). Also set 'nms' to 0 so that all boxes are kept:
# changes the type of decision for accepting positive bounding boxes
# 0: accept bboxes with confidence higher than threshold
# 1: accept bboxes at each corner of each scale
# 2: accept bboxes at the bottom right corner of each scale
bbox_decision = 2
nms = 0