EBLearn can run faster by using code optimizations provided by external libraries.
For all optimizations, speedups increase as inputs become larger. For convolutions, gains are biggest with 9×9 and 11×11 kernels.
The expected speedups for each optimization are described below.
Using the TH tensor backend for SSE optimizations is recommended. It gives up to a 100% increase in speed on large inputs and can be useful for speeding up both training and detection.
- To install it, follow the instructions at https://github.com/soumith/TH
- Then point EBLearn to your TH installation by adding the following lines (for example at the top of tools/scripts/FindCustom.cmake, as for IPP below), replacing the path with your TH installation directory:
SET(THC_FOUND TRUE)
SET(THC_INCLUDE_DIR "your-th-installed-dir/include/")
SET(THC_LIBRARIES_DIR "your-th-installed-dir/lib/")
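After adding these lines, rebuild EBLearn so that CMake picks up the TH backend. A minimal sketch, assuming a standard out-of-source CMake build in a build/ directory (adjust to your checkout layout):
cd eblearn
mkdir -p build && cd build
cmake ..
make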
If you are using float precision, Intel IPP is recommended for detection (we recommend training in double precision, though).
If IPP is installed on your system, CMake will attempt to find it automatically. If that fails, you can enable IPP by pointing to the installed directories: edit tools/scripts/FindCustom.cmake and add the following lines at the top of the file.
SET(IPP_FOUND TRUE)
INCLUDE_DIRECTORIES("/opt/intel/ipp/current/em64t/include")
LINK_DIRECTORIES("/opt/intel/ipp/current/em64t/lib")
MESSAGE(STATUS "Found Intel IPP")
replacing the paths above with your installed IPP directories.
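You can verify that the change took effect by rerunning the CMake configure step and looking for the status message above; a quick sketch, assuming an out-of-source build directory named build:
cd build
cmake .. | grep "Found Intel IPP"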
Multicore optimizations using OpenMP (experimental, probably unstable)
OpenMP gives speedups on convolution modules roughly equal to the number of cores on your machine, given a large enough input size. However, these optimizations are recent, have not been added to all modules, and are not well tested.
To use OpenMP optimizations, make sure that your compiler supports OpenMP and set the environment variable USEOPENMP:
export USEOPENMP=1
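Once built with OpenMP, the number of threads can be controlled with the standard OMP_NUM_THREADS variable (a generic OpenMP setting, not specific to EBLearn); for example, to restrict execution to 4 threads:
export OMP_NUM_THREADS=4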
UPDATE: Due to some code changes, the CUDA code might NOT compile in recent revisions. Please try trunk revision 2522 for working CUDA. If you have further questions, please email liao500km <at> gmail <dot> com (Qianli).
To use GPU optimizations (for now, only the convolution, lppooling, power, and tanh modules are optimized), first make the CUDA toolkit visible in your environment, replacing the paths below with your CUDA installation directory:
# cuda
export LD_LIBRARY_PATH=/home/sc3104/nvidia_cuda_latest/cuda/lib64/:$LD_LIBRARY_PATH
export PATH=/home/sc3104/nvidia_cuda_latest/cuda/bin:$PATH
Then enable the CUDA build:
export USECUDA=1
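You can check that the toolkit is reachable with the standard CUDA tools (shown here only as a sanity check):
nvcc --version # the CUDA compiler should now be on your PATH
nvidia-smi # lists the GPUs visible on the system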
Next, in your configuration file, you can enable the GPU globally:
use_gpu=1 # 1 enables, 0 disables
gpu_id=0 # which GPU to use on the system; if this variable is not specified, the default GPU is used
or disable it globally and turn it on only for specific modules, by prefixing the variables with the module's name:
use_gpu=0 # 1 enables, 0 disables
conv051_use_gpu=1
conv051_gpu_id=3 # use the GPU with id 3 on the system
Note: GPU convolutions are only supported for float precision, and only for convolutions that have either a full connection table or a connection table with a fixed fan-in, i.e. where each output connects to a fixed number of inputs.
Example maketable calls for generating such tables:
first layer (full table): $maketable -full 3 32
second layer (random table with a fan-in of 4): $maketable -random 32 256 -fanin 4
With -random 32 256 -fanin 4, each of the 256 outputs connects to exactly 4 of the 32 inputs.