Caffe can take many types of inputs, depending upon the input layer that we use.
Some of the input methods that are available are:
- Data
- MemoryData
- HDF5Data
- ImageData etc.
In the model file, the very first layer that you find will typically be Layer type: Data, which uses lmdb or leveldb as its input method. Converting a set of images to these databases is pretty easy, as Caffe already provides the tools to convert the images.
The Layer type: MemoryData reads data directly from memory, which is extremely helpful when camera frames are passed as Caffe input during the Test phase. Using this layer for training is not recommended.
The Layer type: ImageData takes a text file as input. The text file lists every image with its complete path and its class number. Caffe uses OpenCV to read the images in this layer, and it also takes care of all the transformations to the image. Thus, instead of reading an image with OpenCV and then passing it to a MemoryData layer, use of ImageData is recommended.
The format of the .txt file from which the ImageData layer reads the images must be:
/path/to/the/image/imageName.jpg classNumber
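A list file in this format can be generated with a short script. The following is a minimal sketch, assuming the images are stored in one sub-folder per class under a common root directory; that layout, the function names `build_image_list` and `write_image_list`, and the extension filter are all assumptions made for illustration, not something Caffe requires.

```python
import os

# Assumed set of image extensions; adjust to match your dataset.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")


def build_image_list(root_dir):
    """Return 'path classNumber' lines for images under root_dir/<class_name>/."""
    lines = []
    # Sort class folders so class numbers stay stable between runs.
    class_names = sorted(
        d for d in os.listdir(root_dir)
        if os.path.isdir(os.path.join(root_dir, d))
    )
    for label, class_name in enumerate(class_names):
        class_dir = os.path.join(root_dir, class_name)
        for file_name in sorted(os.listdir(class_dir)):
            if file_name.lower().endswith(IMAGE_EXTENSIONS):
                path = os.path.abspath(os.path.join(class_dir, file_name))
                lines.append("%s %d" % (path, label))
    return lines


def write_image_list(root_dir, list_path):
    """Write the list file and return the number of entries written."""
    lines = build_image_list(root_dir)
    with open(list_path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return len(lines)
```

Calling `write_image_list("/path/to/images", "train.txt")` would then produce one line per image, with class numbers assigned in alphabetical order of the folder names.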
Use of LMDB or LevelDB is highly recommended, because ImageData may not work well if the image path or name contains spaces, or if any of the images are corrupt.
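If ImageData is used anyway, the list file can be checked for these problems before training. The sketch below is an illustrative helper (`check_image_list` is a hypothetical name, not part of Caffe). It flags lines that do not split into exactly a path and a class number, which is what happens when a path contains spaces, as well as missing or empty files. It does not decode the images, so it cannot detect every corrupt file.

```python
import os


def check_image_list(lines):
    """Return (line_number, problem) tuples for suspicious list entries."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        parts = line.split()
        if len(parts) != 2:
            # More than two tokens usually means the path contains spaces.
            problems.append((number, "expected 'path classNumber'"))
            continue
        path, label = parts
        if not label.isdigit():
            problems.append((number, "class number is not a non-negative integer"))
        elif not os.path.isfile(path):
            problems.append((number, "file not found"))
        elif os.path.getsize(path) == 0:
            problems.append((number, "file is empty"))
    return problems
```

A stricter check could additionally try to load each file with OpenCV and treat a `None` result from `cv2.imread` as a corrupt image.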
Details of the various layers can be found here.
Memory is allocated on the GPU depending upon the model and the batch size. If a memory overflow occurs, you could try reducing the batch size. Caffe has easily handled training on the ImageNet database of 1.2 million images. Thus, with a suitable batch size, training should work without any issues.
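To see why the batch size matters, it helps to work out how large the input blob alone becomes. The following is a rough sketch, assuming single-precision (4-byte) values; the helper names are hypothetical. It covers only the input data blob, so the real GPU footprint is considerably larger, because every intermediate layer output also grows with the batch size.

```python
BYTES_PER_FLOAT = 4  # assumption: blobs hold single-precision floats


def input_blob_bytes(batch_size, channels, height, width):
    """Size in bytes of one batch x channels x height x width input blob."""
    return batch_size * channels * height * width * BYTES_PER_FLOAT


def largest_fitting_batch(start_batch, channels, height, width, budget_bytes):
    """Halve the batch size until the input blob fits in budget_bytes."""
    batch = start_batch
    while batch > 1 and input_blob_bytes(batch, channels, height, width) > budget_bytes:
        batch //= 2
    return batch


# A batch of 256 RGB images of 227x227 pixels needs about 151 MiB for the input alone.
size = input_blob_bytes(256, 3, 227, 227)
print(size)  # 158297088
```

Halving the batch size halves this figure, which is why reducing it is the first thing to try when the GPU runs out of memory.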