Dataset Files

Datasets used for training are single files containing all the data (single-file datasets are easier and safer to handle than scattered files). A dataset named 'mydata' is typically composed of the following files:

  1. mydata_data.mat: the input samples. This file is a single matrix of size NxLxCxHxW, where N is the number of samples, L the number of “layers” per sample, C the number of channels per layer, H the height and W the width.
  2. mydata_labels.mat: the label value corresponding to each sample. This file is a matrix of size Nx1.
  3. mydata_classes.mat: the class-name strings for each possible label value, an MxP matrix, where M is the number of classes and P the maximum length of the strings.
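
The relationship between these three matrices can be sketched in NumPy. This is only a toy in-memory illustration of the shapes described above; the actual .mat files are produced by the dataset tools below, and all dimensions and class names here are made up:

```python
import numpy as np

# Toy dimensions: N samples, L layers, C channels, H x W pixels.
N, L, C, H, W = 4, 1, 3, 32, 32

# mydata_data.mat: a single matrix holding all input samples.
data = np.zeros((N, L, C, H, W), dtype=np.float32)

# mydata_labels.mat: one label value per sample (Nx1).
labels = np.array([[0], [1], [1], [0]], dtype=np.int64)

# mydata_classes.mat: M class names padded to the maximum length P (MxP).
names = ["face", "background"]
P = max(len(s) for s in names)
classes = np.array([list(s.ljust(P)) for s in names])  # shape (M, P)

print(data.shape, labels.shape, classes.shape)
```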


Dataset Tools

Depending on the data, some of these files may not be produced. They are created using the following tools:

  1. dscompile: given a directory of images, where the top-level folders are the class names, extracts and preprocesses them into dataset files. Preprocessing the data once at dataset creation avoids repeating the preprocessing of each sample during training, hence speeding up training.
  2. dssplit: given an existing dataset, splits it into 2 datasets, where the first is limited to a given number of samples and the second receives the remaining samples.
  3. dsmerge: given 2 datasets, merges them into 1.
  4. dsdisplay: displays each sample of a dataset along with its corresponding label.
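
The logic of dssplit and dsmerge can be sketched in NumPy. This is only an illustration of their behavior on in-memory matrices, not the actual tools; the function name and arguments are made up:

```python
import numpy as np

def split_dataset(data, labels, max_samples):
    """Mimic dssplit: a first set capped at max_samples,
    a second set receiving all remaining samples."""
    first = (data[:max_samples], labels[:max_samples])
    second = (data[max_samples:], labels[max_samples:])
    return first, second

# A toy dataset of 10 samples with shape NxLxCxHxW.
data = np.arange(10 * 1 * 3 * 8 * 8).reshape(10, 1, 3, 8, 8)
labels = np.arange(10).reshape(10, 1)

(train_d, train_l), (rest_d, rest_l) = split_dataset(data, labels, 7)
print(train_d.shape[0], rest_d.shape[0])  # 7 3

# Mimic dsmerge: concatenating along the sample axis restores the original.
merged = np.concatenate([train_d, rest_d])
print(merged.shape[0])  # 10
```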


Extraction Script

Several calls to the dataset tools may be necessary, for example to create training, validation and testing sets with positive and negative examples. For face detection data extraction, these calls are grouped in the following script: dsprepare.py.

Bounding boxes

When using dscompile in 'regular' mode (the default), the object is assumed to fill the entire image. If objects of interest are located in sub-regions of the image, one can specify bounding boxes around them. For this, the 'pascal' mode (-type pascal) reads an XML annotation file associated with each image. The XML follows the format of the PASCAL Object Recognition challenge.

Here is an XML example for image 'image0.png', located in folder '/data/images/', of size 640×480 (width×height), cropped from its top-left corner at 10×10 to its bottom-right corner at 400×400, with 2 objects, 'person' and 'car' (each has a visible region, as opposed to the 'bndbox' region, which contains the entire object, potentially including occluded parts):

<annotations>
<folder>/data/images/</folder>
<filename>image0.png</filename>
<size>
	<width>640</width>
	<height>480</height>
	<depth>3</depth>
</size>
<crop>
	<xmin>10</xmin>
	<ymin>10</ymin>
	<xmax>400</xmax>
	<ymax>400</ymax>
</crop>
<object>
	<name>person</name>
	<bndbox>
		<xmin>467.593577</xmin>
		<ymin>156.004541</ymin>
		<xmax>485.583166</xmax>
		<ymax>188.118130</ymax>
	</bndbox>
	<visible>
		<xmin>467.593577</xmin>
		<ymin>156.004541</ymin>
		<xmax>485.583166</xmax>
		<ymax>188.118130</ymax>
	</visible>
</object>
<object>
	<name>car</name>
	<bndbox>
		<xmin>566.166985</xmin>
		<ymin>157.585411</ymin>
		<xmax>608.988303</xmax>
		<ymax>208.727668</ymax>
	</bndbox>
	<visible>
		<xmin>566.166985</xmin>
		<ymin>157.585411</ymin>
		<xmax>608.988303</xmax>
		<ymax>208.727668</ymax>
	</visible>
</object>
</annotations>
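
Such an annotation file can be read with Python's standard xml.etree.ElementTree module. A minimal sketch, parsing a shortened version of the example above from a string rather than from disk:

```python
import xml.etree.ElementTree as ET

# A shortened annotation in the format shown above (one object only).
xml_text = """<annotations>
  <folder>/data/images/</folder>
  <filename>image0.png</filename>
  <object>
    <name>person</name>
    <bndbox>
      <xmin>467.593577</xmin><ymin>156.004541</ymin>
      <xmax>485.583166</xmax><ymax>188.118130</ymax>
    </bndbox>
  </object>
</annotations>"""

root = ET.fromstring(xml_text)
filename = root.findtext("filename")
boxes = []
for obj in root.iter("object"):
    name = obj.findtext("name")
    box = obj.find("bndbox")
    coords = [float(box.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax")]
    boxes.append((name, coords))
print(filename, boxes)
```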
dataset_extraction.txt · Last modified: 2011/12/25 09:50 (external edit)