TensorFlow: store training data in GPU memory

I am pretty new to TensorFlow; I used to use Theano for deep learning development. I have noticed a difference between the two: where input data can be stored.

Theano supports shared variables to store input data in GPU memory, reducing the data transfer between CPU and GPU.
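
For comparison, here is a minimal sketch of that Theano pattern (it assumes Theano is configured with device=gpu and floatX=float32 so the shared value lives in GPU memory; the array shape is made up for illustration):

import numpy as np
import theano

# With device=gpu and floatX=float32, the shared value stays resident on
# the GPU across function calls, so it is copied from host memory only once.
data = np.random.rand(1000, 784).astype('float32')
shared_data = theano.shared(data)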

In TensorFlow, we need to feed data into a placeholder, and the data can come from CPU memory or from files.
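
A minimal sketch of the feed mechanism I mean (TF 1.x; the names x, y, and batch are just for illustration):

import numpy as np
import tensorflow as tf

# Each sess.run copies the fed batch from host (CPU) memory to the device.
x = tf.placeholder(tf.float32, shape=[None, 784])
y = tf.reduce_mean(x)

with tf.Session() as sess:
  batch = np.random.rand(128, 784).astype(np.float32)
  print(sess.run(y, feed_dict={x: batch}))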

My question is: is it possible to store input data in GPU memory in TensorFlow, or does it already do this in some magic way?

Thanks.

Arel answered 2/6, 2016 at 15:37 Comment(10)
Here's a full example of that: mnist fully_connected_preloaded.py – Orthodontia
@YaroslavBulatov Thanks! – Arel
@YaroslavBulatov Not sure whether you're aware, but the code you provided runs one epoch in 28 seconds, which is terrible (and that is on a GPU). Furthermore, I cannot find even a single well-performing TensorFlow example on the internet, which is very strange compared to other deep learning frameworks such as Theano and Torch. Is it because TensorFlow is really slower than the others? If not, why does nobody from the creators try to solve this problem, while all the new TensorFlow users are complaining about it? – Weekend
Soumith Chintala has benchmarks with code which compare TF favorably against Caffe/Torch; you could start with those models. – Orthodontia
@Weekend Here's the link to benchmarks of convnets: github.com/soumith/convnet-benchmarks. Also, I ported the Torch lbfgs.lua example script to TensorFlow and got it to run faster with full-size batches; here's a comparison: github.com/yaroslavvb/lbfgs. Matching performance on smaller batches is harder: because TensorFlow is designed to scale to distributed systems and future hardware chips, there are multiple levels of indirection with some constant overhead, which dominates in tiny computations. I.e., a script that multiplies 2 numbers is thousands of times slower in TF than in numpy. – Orthodontia
@YaroslavBulatov Thank you for all the valuable information you provided. – Weekend
@YaroslavBulatov I know this is an old question, but turning on log_device_placement in the first example you link to shows that the queueing operations generated by tf.train.slice_input_producer reside on the CPU. Queueing slices on the CPU would seem to negate the advantage of storing the data on the GPU, since the slices would be transferred to CPU and back. Am I missing something? – Forepeak
You are correct, queues don't have GPU support. For better performance on GPU, use tf.data instead of queues (see the sketch after these comments). – Orthodontia
@YaroslavBulatov According to my error messages, tf.data.Dataset.from_tensor_slices and some of the Iterator functionality don't currently have GPU kernels either. That's how I ended up here. – Forepeak
I see. This seems to be an uncommon case; usually data reading is not a bottleneck, so the data lives on the CPU. – Orthodontia
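
Following the tf.data suggestion in the comments above, here is a minimal sketch (TF 1.x; tf.data.experimental.prefetch_to_device needs a recent 1.x release, earlier versions had it under tf.contrib.data, and the data array is made up for illustration):

import numpy as np
import tensorflow as tf

features = np.random.rand(10000, 784).astype(np.float32)  # made-up data
dataset = tf.data.Dataset.from_tensor_slices(features)
dataset = dataset.shuffle(10000).batch(128)
# Stage batches in GPU memory ahead of the compute that consumes them.
dataset = dataset.apply(tf.data.experimental.prefetch_to_device('/gpu:0'))
next_batch = dataset.make_one_shot_iterator().get_next()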

If your data fits on the GPU, you can load it into a constant on the GPU from, e.g., a numpy array:

import tensorflow as tf  # numpy_dataset: an in-memory numpy array
with tf.device('/gpu:0'):
  # the dataset is baked into the graph as a constant resident on the GPU
  tensorflow_dataset = tf.constant(numpy_dataset)

One way to extract minibatches would then be to slice that tensor at each step with tf.slice instead of feeding it:

  # index is the row offset of the minibatch: take batch_size rows, all columns
  batch = tf.slice(tensorflow_dataset, [index, 0], [batch_size, -1])

There are many possible variations around that theme, including using queues to prefetch the data to GPU dynamically.
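
Putting the pieces together, a minimal end-to-end sketch of this approach (the data, batch_size, and the reduce_mean standing in for a real model are all made up for illustration):

import numpy as np
import tensorflow as tf

numpy_dataset = np.random.rand(10000, 784).astype(np.float32)
batch_size = 128

with tf.device('/gpu:0'):
  # the whole dataset lives in GPU memory as a graph constant
  tensorflow_dataset = tf.constant(numpy_dataset)

# only a scalar offset is fed at each step, not the data itself
index = tf.placeholder(tf.int32, shape=[])
batch = tf.slice(tensorflow_dataset, [index, 0], [batch_size, -1])
loss = tf.reduce_mean(batch)  # stand-in for a real model

with tf.Session() as sess:
  for step in range(numpy_dataset.shape[0] // batch_size):
    sess.run(loss, feed_dict={index: step * batch_size})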

Thinker answered 2/6, 2016 at 16:01 Comment(1)
Thank you very much! I will look into that. – Arel

It is possible, as has been indicated, but make sure that it is actually useful before devoting too much effort to it. At least at present, not every operation has GPU support, and the list of operations without such support includes some common batching and shuffling operations. There may be no advantage to putting your data on GPU if the first stage of processing is to move it to CPU.

Before trying to refactor code to use on-GPU storage, try at least one of the following:

1) Start your session with device placement logging enabled, to see which ops execute on which devices:

import tensorflow as tf
config = tf.ConfigProto(log_device_placement=True)
sess = tf.Session(config=config)

2) Try to manually place your graph on the GPU by putting its definition in a with tf.device('/gpu:0'): block. This will raise an exception for any op that lacks GPU support.
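
A minimal sketch combining both checks (the two ops are trivial stand-ins for a real graph):

import tensorflow as tf

with tf.device('/gpu:0'):
  a = tf.constant([1.0, 2.0])
  b = a * 2.0  # raises at run time if an op placed here had no GPU kernel

# log_device_placement prints each op's assigned device;
# allow_soft_placement=True would instead fall back to CPU silently.
config = tf.ConfigProto(log_device_placement=True)
with tf.Session(config=config) as sess:
  print(sess.run(b))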

Forepeak answered 19/10, 2017 at 21:41 Comment(0)
