How to prepare my own data for TensorFlow?

I installed TensorFlow on Ubuntu 14.04 and completed the MNIST For ML Beginners tutorial, which I understood.

Now I am trying to use my own data. My training data is T[1000][10]. The labels are L[2], with values 1 or 0.

How can I access my data the way the tutorial accesses mnist.train.images?

Ornelas answered 24/4, 2016 at 10:28 Comment(4)
Did you check out input_data.py? I think you will get some ideas from the file. - Teammate
I checked it: github.com/tensorflow/tensorflow/blob/r0.8/tensorflow/examples/… But I don't understand how to install and parse the data. - Ornelas
The script automatically downloads and imports the dataset. I want to do it myself. - Ornelas
I posted an answer for you. Let me know if it makes sense to you. - Teammate

In input_data.py, these two functions do the main job.

1. Download

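import os
from urllib.request import urlretrieve  # Python 3; on Python 2 use urllib.urlretrieve

# SOURCE_URL is a module-level constant in input_data.py (the base URL of the MNIST files).

def maybe_download(filename, work_directory):
    """Download the data from Yann's website, unless it's already here."""
    # Create the target directory on first use.
    if not os.path.exists(work_directory):
        os.mkdir(work_directory)
    filepath = os.path.join(work_directory, filename)
    # Download only if the file is not already cached locally.
    if not os.path.exists(filepath):
        filepath, _ = urlretrieve(SOURCE_URL + filename, filepath)
        statinfo = os.stat(filepath)
        print('Successfully downloaded', filename, statinfo.st_size, 'bytes.')
    return filepath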

2. Image to NumPy array

import gzip
import numpy

# _read32 is a helper in input_data.py that reads one big-endian 32-bit integer.

def extract_images(filename):
    """Extract the images into a 4D uint8 numpy array [index, y, x, depth]."""
    print('Extracting', filename)
    with gzip.open(filename) as bytestream:
        # Header: magic number, image count, rows, columns.
        magic = _read32(bytestream)
        if magic != 2051:
            raise ValueError(
                'Invalid magic number %d in MNIST image file: %s' %
                (magic, filename))
        num_images = _read32(bytestream)
        rows = _read32(bytestream)
        cols = _read32(bytestream)
        # Body: one unsigned byte per pixel, image after image.
        buf = bytestream.read(rows * cols * num_images)
        data = numpy.frombuffer(buf, dtype=numpy.uint8)
        data = data.reshape(num_images, rows, cols, 1)
        return data
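
To see what extract_images expects on disk, here is a small sketch that builds a tiny file with the same layout in memory and parses it back. The _read32 helper is not shown above, so the version here is only an assumption about what such a helper could look like, and the 2x2x3 pixel data is made up for illustration.

```python
import gzip
import io
import struct

import numpy


def _read32(bytestream):
    """Read one big-endian unsigned 32-bit integer (assumed helper)."""
    dt = numpy.dtype('>u4')
    return int(numpy.frombuffer(bytestream.read(4), dtype=dt)[0])


# Build a tiny gzip-compressed file in memory with the same layout:
# magic number 2051, image count, rows, columns, then raw pixel bytes.
num_images, rows, cols = 2, 2, 3
pixels = bytes(range(num_images * rows * cols))  # made-up pixel values 0..11
raw = struct.pack('>IIII', 2051, num_images, rows, cols) + pixels
buffer = io.BytesIO()
with gzip.GzipFile(fileobj=buffer, mode='wb') as f:
    f.write(raw)
buffer.seek(0)

# Parse it back in the same order extract_images reads the fields.
with gzip.GzipFile(fileobj=buffer, mode='rb') as bytestream:
    magic = _read32(bytestream)
    n = _read32(bytestream)
    r = _read32(bytestream)
    c = _read32(bytestream)
    data = numpy.frombuffer(bytestream.read(n * r * c), dtype=numpy.uint8)
    data = data.reshape(n, r, c, 1)

print(magic, data.shape)
```

If your own data is not stored in this binary layout, you do not need this parsing step at all; any code that ends with a NumPy array of the right shape will do.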

Based on your dataset and location, you can call:

local_file = maybe_download(TRAIN_IMAGES, train_dir)
train_images = extract_images(local_file)

See the full source code at https://github.com/nlintz/TensorFlow-Tutorials/blob/master/input_data.py.
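
If your data is already in memory as T[1000][10] with 0/1 labels, you can probably skip the download and extract steps and wrap the arrays directly. The sketch below is not the DataSet class from input_data.py; it is a simplified, hypothetical stand-in that only mimics the images, labels, and next_batch names used in the tutorial, and the random arrays stand in for your real T and L.

```python
import numpy


def dense_to_one_hot(labels, num_classes):
    """Turn class indices like [1, 0, 1] into one-hot rows."""
    one_hot = numpy.zeros((labels.shape[0], num_classes), dtype=numpy.float32)
    one_hot[numpy.arange(labels.shape[0]), labels] = 1.0
    return one_hot


class DataSet(object):
    """Hypothetical minimal stand-in for the tutorial's mnist.train object."""

    def __init__(self, images, labels):
        assert images.shape[0] == labels.shape[0]
        self.images = images
        self.labels = labels
        self._position = 0

    def next_batch(self, batch_size):
        """Return the next batch, reshuffling when the data runs out."""
        if self._position + batch_size > self.images.shape[0]:
            perm = numpy.random.permutation(self.images.shape[0])
            self.images = self.images[perm]
            self.labels = self.labels[perm]
            self._position = 0
        start = self._position
        self._position += batch_size
        return (self.images[start:self._position],
                self.labels[start:self._position])


# Placeholder data: replace with your own T (1000 x 10) and L (1000 values, 0 or 1).
rng = numpy.random.RandomState(0)
T = rng.rand(1000, 10).astype(numpy.float32)
L = rng.randint(0, 2, size=1000)

train = DataSet(T, dense_to_one_hot(L, 2))
batch_xs, batch_ys = train.next_batch(100)
print(batch_xs.shape, batch_ys.shape)
```

With this, train.images and train.next_batch(100) can be used where the tutorial uses mnist.train.images and mnist.train.next_batch(100), provided the placeholders in your model are sized 10 and 2 instead of 784 and 10.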

Teammate answered 24/4, 2016 at 22:56 Comment(3)
Thanks for your answer. My input data has two files. One of them contains words (MNIST uses image sequences), the other contains labels (0 or 1). I can't enumerate my input. - Ornelas
What do you mean by "I can't enumerate my input"? See def extract_labels(filename, one_hot=False) in the example file. - Teammate
My word file has non-English characters. I can't extract a matrix such as "data" in the extract_images function. - Ornelas
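
For a word file with non-English characters, one possible approach is to read it as UTF-8 text and map each distinct word to an integer id, which gives a numeric matrix comparable to "data". This is only a sketch under several assumptions that the thread does not confirm: one sample per line, whitespace-separated words, rows padded or cut to 10 ids, and one label per line in the second file. The in-memory strings stand in for the two files.

```python
import io

import numpy

# Hypothetical stand-ins for the two files. With real files, use
# io.open(path, encoding='utf-8') instead of io.StringIO.
words_file = io.StringIO(u"güzel film\nkötü film çok kötü\n")
labels_file = io.StringIO(u"1\n0\n")

MAX_WORDS = 10  # assumed row width, matching T[1000][10]
vocab = {}      # word -> integer id; id 0 is reserved for padding
rows = []
for line in words_file:
    # Give each new word the next free id, reuse ids for known words.
    ids = [vocab.setdefault(word, len(vocab) + 1) for word in line.split()]
    # Cut long lines and pad short ones with zeros.
    ids = ids[:MAX_WORDS] + [0] * (MAX_WORDS - len(ids))
    rows.append(ids)

data = numpy.array(rows, dtype=numpy.int32)
labels = numpy.array([int(line) for line in labels_file], dtype=numpy.int64)
print(data.shape, labels.shape)
```

Because the text is decoded as Unicode before splitting, characters such as ü or ç need no special handling.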
