EBLearn can run faster by using code optimizations provided by external libraries.
For all optimizations, speedups increase as inputs become larger. For convolutions, gains are biggest with 9×9 and 11×11 kernels.
The expected speedups for each optimization are described below.
Using the TH tensor backend for SSE optimizations is recommended. It gives up to a 100% increase in speed on large inputs and can be useful for speeding up both training and detection.
- To install it, follow the instructions at https://github.com/soumith/TH
- Then point EBLearn to your TH installation by adding the following lines (for example at the top of tools/scripts/FindCustom.cmake, as for IPP below), replacing the path with your TH installation directory:
SET(THC_FOUND TRUE)
SET(THC_INCLUDE_DIR "your-th-installed-dir/include/")
SET(THC_LIBRARIES_DIR "your-th-installed-dir/lib/")
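After adding these lines, rebuild EBLearn so that CMake picks up the TH backend. A minimal sketch, assuming a standard out-of-source CMake build in a build/ directory (adjust to your checkout layout):
cd eblearn
mkdir -p build && cd build
cmake ..
make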
If you are using float precision, Intel IPP is recommended for detection (we recommend training in double precision, though).
If IPP is installed on your system, CMake will attempt to find it automatically. If that fails, you can enable IPP by pointing to the installed directories: edit tools/scripts/FindCustom.cmake and add the following lines at the top of the file.
SET(IPP_FOUND TRUE)
INCLUDE_DIRECTORIES("/opt/intel/ipp/current/em64t/include")
LINK_DIRECTORIES("/opt/intel/ipp/current/em64t/lib")
MESSAGE(STATUS "Found Intel IPP")
replacing the paths above with your installed IPP directories.
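You can verify that the change took effect by rerunning the CMake configure step and looking for the status message above; a quick sketch, assuming an out-of-source build directory named build:
cd build
cmake .. | grep "Found Intel IPP"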
Multicore optimizations using OpenMP (experimental, probably unstable)
OpenMP gives speedups on convolution modules roughly equal to the number of cores on your machine, given a large enough input size. However, these optimizations are recent, have not been added to all modules, and are not well tested.
To use OpenMP optimizations, make sure that your compiler supports OpenMP and set the environment variable USEOPENMP:
export USEOPENMP=1
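Once built with OpenMP, the number of threads can be controlled with the standard OMP_NUM_THREADS variable (a generic OpenMP setting, not specific to EBLearn); for example, to restrict execution to 4 threads:
export OMP_NUM_THREADS=4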
UPDATE: Due to some code changes, the CUDA code might NOT compile in recent revisions. Please try trunk revision 2522 for working CUDA. If you have further questions, please email liao500km <at> gmail <dot> com (Qianli).
To use GPU optimizations (for now, only the convolution, lppooling, power, and tanh modules are optimized), first make the CUDA toolkit visible in your environment, replacing the paths below with your CUDA installation directory:
# cuda
export LD_LIBRARY_PATH=/home/sc3104/nvidia_cuda_latest/cuda/lib64/:$LD_LIBRARY_PATH
export PATH=/home/sc3104/nvidia_cuda_latest/cuda/bin:$PATH
Then enable the CUDA build:
export USECUDA=1
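You can check that the toolkit is reachable with the standard CUDA tools (shown here only as a sanity check):
nvcc --version # the CUDA compiler should now be on your PATH
nvidia-smi # lists the GPUs visible on the system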
Next, in your configuration file, you can enable the GPU globally:
use_gpu=1 # 1 enables, 0 disables
gpu_id=0 # which GPU to use on the system; if this variable is not specified, the default GPU is used
or disable it globally and turn it on only for specific modules, by prefixing the variables with the module's name:
use_gpu=0 # 1 enables, 0 disables
conv051_use_gpu=1
conv051_gpu_id=3 # use the GPU with id 3 on the system
Note: GPU convolutions are only supported for float precision, and only for convolutions that have either a full connection table or a connection table with a fixed fan-in, i.e. where each output connects to a fixed number of inputs.
Example maketable calls for generating such tables:
first layer (full table): $maketable -full 3 32
second layer (random table with a fan-in of 4): $maketable -random 32 256 -fanin 4
With -random 32 256 -fanin 4, each of the 256 outputs connects to exactly 4 of the 32 inputs.