Test labels for regression caffe, float not allowed?

Asked 2/8, 2015 at 18:5 Answered 18/5, 2016 at 6:6

Solved neural-network regression deep-learning caffe

I am doing regression using caffe, and my test.txt and train.txt files are like this:

/home/foo/caffe/data/finetune/flickr/3860781056.jpg 2.0  
/home/foo/caffe/data/finetune/flickr/4559004485.jpg 3.6  
/home/foo/caffe/data/finetune/flickr/3208038920.jpg 3.2  
/home/foo/caffe/data/finetune/flickr/6170430622.jpg 4.0  
/home/foo/caffe/data/finetune/flickr/7508671542.jpg 2.7272

My problem is that caffe does not seem to allow float labels like 2.0. When I use float labels, for example when reading the 'test.txt' file, caffe only recognizes

a total of 1 images

which is wrong.

But when I change, for example, the 2.0 to 2 in the file and leave the following lines the same, caffe now gives

a total of 2 images

implying that the float labels are responsible for the problem.
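The two counts above are what one would expect if the list file were consumed as a stream of alternating tokens, a filename followed by an integer label, with reading stopping at the first failure. That is an assumption about how the file is read, not something confirmed in this thread. A minimal Python simulation of that behaviour (`count_parsed_entries` is a hypothetical helper, and the short file names are placeholders):

```python
import re

_WS = re.compile(r"\s*")
_TOKEN = re.compile(r"\S+")
_INT = re.compile(r"[+-]?\d+")

def count_parsed_entries(text):
    """Count (filename, int label) pairs read token by token until a read fails."""
    pos, count = 0, 0
    while True:
        pos = _WS.match(text, pos).end()       # skip whitespace before the filename
        name = _TOKEN.match(text, pos)         # filename: next whitespace-delimited token
        if name is None:
            break
        pos = _WS.match(text, name.end()).end()
        label = _INT.match(text, pos)          # integer label: digits only, stops at '.'
        if label is None:
            break
        pos = label.end()
        count += 1
    return count

float_labels = "a.jpg 2.0\nb.jpg 3.6\nc.jpg 3.2\n"
first_fixed = "a.jpg 2\nb.jpg 3.6\nc.jpg 3.2\n"
print(count_parsed_entries(float_labels))  # 1
print(count_parsed_entries(first_fixed))   # 2
```

Under this assumption the leftover ".0" is taken as the next filename, the following path fails to parse as an integer, and reading stops after one entry.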

Can anyone help me solve this problem? I definitely need to use float labels for regression, so does anyone know of a workaround or solution? Thanks in advance.

EDIT: For anyone facing a similar issue, "use caffe to train Lenet with CSV data" might be of help. Thanks to @Shai.

Nacred answered 2/8, 2015 at 18:5 Comment(1)

what do you mean by "reading"? are you using the convert_imageset utility? – Outroar 4/8, 2015 at 11:11

When using the image dataset input layer (with either lmdb or leveldb backend) caffe only supports one integer label per input image.

If you want to do regression, and use floating point labels, you should try and use the HDF5 data layer. See for example this question.

In Python you can use the h5py package to create HDF5 files.

import h5py, os
import caffe
import numpy as np

SIZE = 224 # fixed size to all images
with open( 'train.txt', 'r' ) as T :
    lines = T.readlines()
# If you do not have enough memory split data into
# multiple batches and generate multiple separate h5 files
X = np.zeros( (len(lines), 3, SIZE, SIZE), dtype='f4' ) 
y = np.zeros( (len(lines),1), dtype='f4' )
for i,l in enumerate(lines):
    sp = l.split(' ')
    img = caffe.io.load_image( sp[0] )
    img = caffe.io.resize( img, (SIZE, SIZE, 3) ) # resize to fixed size
    # you may apply other input transformations here...
    # Note that the transformation should take img from size-by-size-by-3 and transpose it to 3-by-size-by-size,
    # for example:
    transposed_img = img.transpose((2,0,1))[::-1,:,:] # RGB->BGR
    X[i] = transposed_img
    y[i] = float(sp[1])
with h5py.File('train.h5','w') as H:
    H.create_dataset( 'X', data=X ) # note the name X given to the dataset!
    H.create_dataset( 'y', data=y ) # note the name y given to the dataset!
with open('train_h5_list.txt','w') as L:
    L.write( 'train.h5' ) # list all h5 files you are going to use
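The comment in the code above suggests splitting the data into several h5 files when memory is short. A minimal sketch of how the entries might be grouped, assuming paths contain no spaces; `parse_list_line`, `plan_batches`, and the `train_000.h5` naming scheme are hypothetical and not part of caffe or h5py:

```python
def parse_list_line(line):
    """Split a 'path label' line on any whitespace; the label is parsed as a float."""
    path, label = line.split()[:2]
    return path, float(label)

def plan_batches(lines, batch_size, prefix="train"):
    """Group parsed entries into chunks, one chunk per planned h5 file."""
    entries = [parse_list_line(l) for l in lines if l.strip()]
    plan = []
    for start in range(0, len(entries), batch_size):
        name = "{}_{:03d}.h5".format(prefix, start // batch_size)
        plan.append((name, entries[start:start + batch_size]))
    return plan

lines = [
    "/home/foo/caffe/data/finetune/flickr/3860781056.jpg 2.0  \n",
    "/home/foo/caffe/data/finetune/flickr/4559004485.jpg 3.6  \n",
    "/home/foo/caffe/data/finetune/flickr/3208038920.jpg 3.2  \n",
]
plan = plan_batches(lines, batch_size=2)

# The list file names every planned h5 file, one per line.
list_file_text = "\n".join(name for name, _ in plan) + "\n"
print(list_file_text)
```

Each chunk would then be filled and written with the same `create_dataset` calls as above, so that only one chunk's `X` and `y` are held in memory at a time.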

Once you have all h5 files and the corresponding text files listing them, you can add an HDF5 input layer to your train_val.prototxt:

 layer {
   type: "HDF5Data"
   top: "X" # same name as given in create_dataset!
   top: "y"
   hdf5_data_param {
     source: "train_h5_list.txt" # do not give the h5 files directly, but the list.
     batch_size: 32
   }
   include { phase:TRAIN }
 }

Clarification:
When I say "caffe only supports one integer label per input image" I do not mean that the leveldb/lmdb containers are limited; I mean the caffe tools, specifically the convert_imageset tool.
On closer inspection, it seems that caffe stores data of type Datum in leveldb/lmdb, and the "label" property of this type is defined as an integer (see caffe.proto). Thus, when using the caffe interface to leveldb/lmdb, you are restricted to a single int32 label per image.

Outroar answered 4/8, 2015 at 11:39 Comment(29)

Thanks @Outroar for this elaborate answer. I will try this and report back. Thnks again. – Nacred 5/8, 2015 at 14:17

@Nacred thanks for the upvote. Regarding "accepting" the answer - you better check that it works for you before "accepting"... don't you think? – Outroar 5/8, 2015 at 14:21

Actually... I had put this question on the caffe users forum and their github issue forum also (which I shouldn't have as that is for development only) and I got the same answer there ... so I am pretty sure it would work ...anyway you are right about not accepting it yet. – Nacred 5/8, 2015 at 14:30

@Nacred please link these questions, so people looking at each source can find all the relevant answers quickly and efficiently. – Outroar 5/8, 2015 at 14:32

I have linked that question in the edit, I am sort of new here, so it might not be what you had in mind. Please point out if something else needs to be done. – Nacred 5/8, 2015 at 14:43

Hi @Outroar now I want to know is there any easy way of creating hdf5 files from matlab only? – Nacred 9/8, 2015 at 9:28

@Nacred should be quite easy. Matlab has full support of hdf5. See here for more details. – Outroar 9/8, 2015 at 9:36

@Nacred you should be careful, though: python and caffe store matrices in a row-major fashion, while Matlab is column-major. You might need to "transpose" your arrays in Matlab. – Outroar 9/8, 2015 at 9:37
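To illustrate the row-major versus column-major point, here is a small NumPy sketch. It assumes the column-major buffer is stored unchanged and handed to a row-major reader with its dimensions reversed, which is how arrays written from Matlab usually appear on the Python side; treat that as an assumption to check on your own files.

```python
import numpy as np

# A 2-by-3 array laid out in memory the way Matlab would (column-major).
m = np.array([[1, 2, 3],
              [4, 5, 6]], order="F")
buf = m.ravel(order="F")        # memory order: 1, 4, 2, 5, 3, 6

# A row-major reader given the same buffer with reversed dimensions
# sees the transpose of the original array.
seen = buf.reshape(3, 2)
print(seen)
```

So an array meant to arrive as N x C x H x W in caffe would be built as W x H x C x N on the Matlab side.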

Shouldn't the line: y = np.zeros( (1,len(lines)), dtype='f4' ) be y = np.zeros( (len(lines)), dtype='f4' ) ? – Kumamoto 12/11, 2015 at 15:36

@angela I'm not sure, I think both options are valid. – Outroar 12/11, 2015 at 15:39

I was saying that because you index the y vector later as so: y[i] = float(sp[1]). This gives you an exception because the first dimension is 1, and i may be different from 1. Did you mean y[1, i] = float(sp[1])? – Kumamoto 12/11, 2015 at 15:41

I am asking this because I'm having trouble with the HDF5 data layer, and I'm hoping this is my issue :) – Kumamoto 12/11, 2015 at 15:41

@angela y[1,i] is an error because the first entry is y[0,i] - python index starts with 0 not 1. If you have a specific problem please ask a new question. It is difficult to guess your problem from your comments and help you solve it. – Outroar 12/11, 2015 at 15:46

Yep, I meant y[0,i]. What I was asking is whether you intentionally gave the labels array 2 dimensions (1, len(lines)) as you did here: y = np.zeros( (1,len(lines)), dtype='f4' ), or whether it was a mistake since you later index that same y vector with a single index i – Kumamoto 12/11, 2015 at 15:51

@angela I did it on purpose(I'm used to Matlab...) but I'm not certain this is crucial – Outroar 12/11, 2015 at 16:26

You get an exception if you do it the way you say in your answer, since you cannot index y[i] if i>0. In any case, I posted my actual question here, in case you want to have a look: https://mcmap.net/q/671216/-caffe-hdf5-not-learning – Kumamoto 12/11, 2015 at 17:27
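The indexing issue raised in these comments can be reproduced with NumPy alone. A minimal sketch, using the label values from the question:

```python
import numpy as np

labels = [2.0, 3.6, 3.2, 4.0, 2.7272]
n = len(labels)

# Shape (1, n): the first axis has length 1, so y_row[i] fails for i > 0.
y_row = np.zeros((1, n), dtype="f4")
try:
    y_row[1] = labels[1]
    raised = False
except IndexError:
    raised = True
print(raised)

# Shape (n, 1): one row per sample, so y_col[i] is valid for every i.
y_col = np.zeros((n, 1), dtype="f4")
for i, value in enumerate(labels):
    y_col[i] = value
print(y_col.shape)
```

With the labels stored as number of samples by label dimension, `y[i] = float(sp[1])` works as written in the answer.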

How should I specify the shape of y? – Neils 22/9, 2016 at 12:58

@GuWang the shape of y is number of samples-by-dimension of y – Outroar 22/9, 2016 at 13:2

@Outroar I mean should I provide just the true label like 0 or 1, or the one hot coding of y? – Neils 22/9, 2016 at 13:5

@GuWang if you are doing classification with a "SoftmaxWithLoss" layer, then y can be a scalar per sample image. If you use other loss layers then you might need to change the way you feed y to caffe. – Outroar 22/9, 2016 at 13:13

@Shai: in the case of img = caffe.io.resize( img, (SIZE, SIZE, 3) ), where do you add the H and W values? I don't have a square image and I'd like to use this functionality to verify if my own function works properly. – Beatabeaten 12/1, 2017 at 11:3

@Beatabeaten img = caffe.io.resize( img, (H, W, 3) ) see io.py. – Outroar 12/1, 2017 at 11:9

Can we use ImageData layer for regression? as you said we can not use multiple label for lmdb what about ImageData? – Rodney 21/2, 2018 at 17:56

@saeedmasoomi as far as I know, the ImageData layer also supports only a single integer label per input image. Why not use the HDF5Data layer? – Outroar 21/2, 2018 at 19:15

@Outroar could you share code about how to handle pixel level labels? Let's say in this case your data=X is the image of size h x w x 3 (three channels) and your label (data=Y) is of the size h x w x 1 – Fancied 31/5, 2018 at 7:1

@Fancied it's the same code as in this answer, only y[i] should be 2D (in fact, 3D with channel dim=1) instead of a float scalar. – Outroar 31/5, 2018 at 7:12
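As a rough sketch of what "3D with channel dim=1" could look like for the arrays before they are written to HDF5 (the sizes and the random label map are illustrative stand-ins, not values from this thread):

```python
import numpy as np

n, h, w = 4, 6, 8                                  # illustrative sizes
X = np.zeros((n, 3, h, w), dtype="f4")             # images: N x 3 x H x W
y = np.zeros((n, 1, h, w), dtype="f4")             # labels: N x 1 x H x W

label_map = np.random.rand(h, w).astype("f4")      # stand-in for one H x W label image
y[0] = label_map[np.newaxis, :, :]                 # add the channel axis of size 1
print(X.shape, y.shape)
```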

@Outroar I see - how does caffe generalize where in the case of classification the label is 1D where in the case of semantic segmentation the label is 3D. How does caffe know how to do the right thing with the labels in both cases? – Fancied 31/5, 2018 at 7:21

@Outroar moreover, lets say in another case, the each label is represented by a color so lets say the label (data=Y) is also h x w x 3. How again does caffe understand this and do the right training? – Fancied 31/5, 2018 at 7:24

@Fancied caffe does not "understand" anything, it just processes whatever inputs you feed it. If your hdf5 files contain X as 3D and Y as 3D, caffe will process them accordingly. During "forward" caffe loads x and y from file (in the "HDF5Data" layer) and reshapes the rest of the net according to the shapes of x and y it read. The rest of the processing/"understanding" is up to you and the net you design. – Outroar 31/5, 2018 at 7:47

Shai's answer already covers saving float labels to HDF5 format. In case LMDB is required/preferred, here's a snippet on how to create an LMDB from float data (adapted from this github comment):

import lmdb
import numpy as np
import caffe

def scalars_to_lmdb(scalars, path_dst):
    """Write one float scalar per record into the LMDB at path_dst."""
    db = lmdb.open(path_dst, map_size=int(1e12))

    with db.begin(write=True) as in_txn:
        for idx, x in enumerate(scalars):
            content_field = np.array([x])
            # get shape (1,1,1)
            content_field = np.expand_dims(content_field, axis=0)
            content_field = np.expand_dims(content_field, axis=0)
            content_field = content_field.astype(float)

            dat = caffe.io.array_to_datum(content_field)
            # keys must be bytes under Python 3
            in_txn.put('{:0>10d}'.format(idx).encode('ascii'), dat.SerializeToString())
    db.close()

Unicuspid answered 21/9, 2015 at 15:12 Comment(2)

I'm afraid using caffe.io.array_to_datum is problematic, as label field in datum is defined as integer – Outroar 2/5, 2016 at 5:1

@Shai, true, the ground truth is saved to the data field of the datum. This requires generating separate lmdb for the input and ground truth respectively. – Unicuspid 2/5, 2016 at 8:36

I ended up transposing, switching the channel order, and using unsigned ints rather than floats to get results. I suggest reading an image back from your HDF5 file to make sure it displays correctly.

First read the image as unsigned ints:

img = np.array(Image.open('images/' + image_name))

Then change the channel order from RGB to BGR:

img = img[:, :, ::-1]

Finally, switch from Height x Width x Channels to Channels x Height x Width:

img = img.transpose((2, 0, 1))

Merely changing the shape will scramble your image and ruin your data!
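A tiny NumPy check of that warning, using a 2 x 2 "image" with 3 channels so the values are easy to follow:

```python
import numpy as np

img = np.arange(2 * 2 * 3).reshape(2, 2, 3)    # H x W x C

chw = img.transpose((2, 0, 1))                 # correct: axes are moved
wrong = img.reshape(3, 2, 2)                   # incorrect: same shape, values regrouped

print(np.array_equal(chw[0], img[:, :, 0]))    # True: channel 0 is intact
print(np.array_equal(wrong, chw))              # False: reshape scrambled the pixels

back = chw.transpose((1, 2, 0))                # undo with the inverse transpose
print(np.array_equal(back, img))               # True
```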

To read back the image:

from skimage.viewer import ImageViewer

with h5py.File(h5_filename, 'r') as hf:
    images_test = hf.get('images')
    targets_test = hf.get('targets')
    for i, img in enumerate(images_test):
        print(targets_test[i])
        # undo the transforms above: C x H x W -> H x W x C, then BGR -> RGB
        viewer = ImageViewer(img.transpose((1, 2, 0))[:, :, ::-1])
        viewer.show()

Here's a script I wrote which deals with two labels (steer and speed) for a self-driving car task: https://gist.github.com/crizCraig/aa46105d34349543582b177ae79f32f0

Prunella answered 2/5, 2016 at 4:48 Comment(0)

Besides @Shai's answer above, I wrote a MultiTaskData layer supporting float typed labels.

Its main idea is to store the labels in the float_data field of Datum; the MultiTaskDataLayer then parses them as labels for any number of tasks, according to the values of task_num and label_dimension set in net.prototxt. The related files include: caffe.proto, multitask_data_layer.hpp/cpp, io.hpp/cpp.

You can easily add this layer to your own caffe and use it like this. This is an example for a face expression label distribution learning task, in which "exp_label" can be a float-typed vector such as [0.1, 0.1, 0.5, 0.2, 0.1] representing the probability distribution over 5 face expression classes:

    name: "xxxNet"
    layer {
        name: "xxx"
        type: "MultiTaskData"
        top: "data"
        top: "exp_label"
        data_param { 
            source: "expression_ld_train_leveldb"   
            batch_size: 60 
            task_num: 1
            label_dimension: 8
        }
        transform_param {
            scale: 0.00390625
            crop_size: 60
            mirror: true
        }
        include:{ phase: TRAIN }
    }
    layer { 
        name: "exp_prob" 
        type: "InnerProduct"
        bottom: "data"  
        top: "exp_prob" 
        param {
            lr_mult: 1
            decay_mult: 1
        }
        param {
            lr_mult: 2
            decay_mult: 0
        }
        inner_product_param {
            num_output: 8
            weight_filler {
                type: "xavier"
            }
            bias_filler {
                type: "constant"
            }
        }
    }
    layer {  
        name: "exp_loss"  
        type: "EuclideanLoss"  
        bottom: "exp_prob" 
        bottom: "exp_label"
        top: "exp_loss"
        include:{ phase: TRAIN }
    }
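For reference, the quantity an "EuclideanLoss" layer reports is, as I understand the caffe documentation, the sum of squared differences between its two bottoms divided by twice the batch size. A NumPy sketch of that formula (`euclidean_loss` is a hypothetical helper for illustration, not a caffe function):

```python
import numpy as np

def euclidean_loss(pred, label):
    """Sum of squared differences divided by 2N, where N is the batch size."""
    pred = np.asarray(pred, dtype=np.float64)
    label = np.asarray(label, dtype=np.float64)
    n = pred.shape[0]
    return np.sum((pred - label) ** 2) / (2.0 * n)

pred = [[1.0, 2.0], [3.0, 4.0]]
label = [[1.0, 0.0], [0.0, 4.0]]
print(euclidean_loss(pred, label))  # 3.25
```

Both inputs must have the same shape, which is why the label dimension and num_output match in the net above.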

Seato answered 18/5, 2016 at 6:6 Comment(0)
