How to read json files in Tensorflow?
I'm trying to write a function that reads JSON files in TensorFlow. The JSON files have the following structure:

{
    "bounding_box": {
        "y": 98.5,
        "x": 94.0,
        "height": 197,
        "width": 188
    },
    "rotation": {
        "yaw": -27.97019577026367,
        "roll": 2.206029415130615,
        "pitch": 0.0
    },
    "confidence": 3.053506851196289,
    "landmarks": {
        "1": {
            "y": 180.87722778320312,
            "x": 124.47326660156205
        },
        "0": {
            "y": 178.60653686523438,
            "x": 183.41931152343795
        },
        "2": {
            "y": 224.5936889648438,
            "x": 141.62365722656205
        }
    }
}

I only need the bounding box information. There are a few examples of how to write read_and_decode functions, and I'm trying to adapt them into a function for JSON files, but I still have a lot of open questions:

def read_and_decode(filename_queue):

  reader = tf.WhichKindOfReader() # ??? 
  _, serialized_example = reader.read(filename_queue)
  features = tf.parse_single_example( 
      serialized_example,

      features={

          'bounding_box':{ 

              'y': tf.VarLenFeature(<whatstheproperdatatype>) ???
              'x': 
              'height': 
              'width': 

          # I only need the bounding box... - do I need to write 
          # the format information for the other features...???

          }
      })

  y=tf.decode() # decoding necessary?
  x=
  height=
  width= 

  return x,y,height,width

I've searched the internet for hours, but can't find anything really detailed on how to read JSON in TensorFlow.

Maybe someone can give me a clue.

Haiphong answered 14/7, 2016 at 18:42 Comment(0)

Update

The solution below does get the job done but it is not very efficient, see comments for details.

Original answer

You can use standard python json parsing with TensorFlow if you wrap the functions with tf.py_func:

import json
import numpy as np
import tensorflow as tf

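def get_bbox(json_bytes):
    # tf.py_func passes string tensors as bytes, so decode before parsing.
    obj = json.loads(json_bytes.decode('utf-8'))
    bbox = obj['bounding_box']
    # Fixed order: x, y, height, width, as float32.
    return np.array([bbox['x'], bbox['y'], bbox['height'], bbox['width']], dtype='f')

def get_multiple_bboxes(json_batch):
    # One bounding box per JSON string, wrapped in a list because
    # tf.py_func expects a list of outputs.
    return [[get_bbox(x) for x in json_batch]]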

raw = tf.placeholder(tf.string, [None])
[parsed] = tf.py_func(get_multiple_bboxes, [raw], [tf.float32])

Note that tf.py_func returns a list of tensors rather than a single tensor, which is why the left-hand side of the assignment is written as the list [parsed]. Otherwise, parsed would get the shape [1, None, 4] rather than the desired shape [None, 4] (where None is the batch size).

Using your data you get the following results:

json_string = """{
    "bounding_box": {
        "y": 98.5,
        "x": 94.0,
        "height": 197,
        "width": 188
     },
    "rotation": {
        "yaw": -27.97019577026367,
        "roll": 2.206029415130615,
        "pitch": 0.0},
        "confidence": 3.053506851196289,
        "landmarks": {
            "1": {
                "y": 180.87722778320312,
                "x": 124.47326660156205},
            "0": {
                "y": 178.60653686523438,
                "x": 183.41931152343795},
            "2": {
                "y": 224.5936889648438,
                "x": 141.62365722656205
}}}"""
my_data = np.array([json_string, json_string, json_string])

init_op = tf.initialize_all_variables()
with tf.Session() as sess:
    sess.run(init_op)
    print(sess.run(parsed, feed_dict={raw: my_data}))
    print(sess.run(tf.shape(parsed), feed_dict={raw: my_data}))

Output:

[[  94.    98.5  197.   188. ]
 [  94.    98.5  197.   188. ]
 [  94.    98.5  197.   188. ]]
[3 4]
Assimilative answered 21/9, 2016 at 8:4 Comment(9)
While this works, you have to worry about the GIL though. - Santiago
@Santiago I wasn't aware that such a thing existed. Would you mind elaborating a bit? What is the worst-case scenario of my solution? - Assimilative
To make a long story short, while each of those lines of Python code is running, it blocks any other Python code from executing. This means your py_func cannot run concurrently with any other Python code, unless you avoid the GIL. - Santiago
@Santiago Would it be better to pre-process all the JSON objects into TFRecords before training, store them on disk, and read them in batches during training? tf.TFRecordReader is written in C++ and therefore avoids the GIL, right? Thanks for letting me know! - Assimilative
Yeah, that will work out great actually. Alternatively, you can prepare the labels, put them into a tf.constant() array or variable, and fetch them from there. - Santiago
@Santiago: Can you explain the latter part, with tf.constant() as a solution for loading JSON data? - Coaction
@Assimilative: Is there any way I can read JSON files from within TensorFlow? - Coaction
@Coaction I'm not sure I understand what you mean... Reading files vs. just parsing JSON stored in string variables? - Assimilative
@Backlin: I meant reading files. Currently I have a two-input model: the first input is an image and the second is an array read from a JSON or flatbuffer file. I want to make it a single-input model, with the JSON/flatbuffer read from within the TensorFlow model. (I'm okay with a protocol buffer too.) - Coaction
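The tf.constant() suggestion in the comments above is not spelled out, so here is a minimal sketch of the preparation step it seems to imply, assuming all JSON documents fit in memory. The function name parse_bboxes and the constant BBOX_KEYS are made up for this illustration. The parsing happens once, in plain Python, before any graph is run, so the GIL is not involved during training.

```python
import json

# Hypothetical helper names; the column order matches get_bbox in the answer.
BBOX_KEYS = ("x", "y", "height", "width")

def parse_bboxes(json_strings):
    """Return one [x, y, height, width] row of floats per JSON document."""
    rows = []
    for text in json_strings:
        bbox = json.loads(text)["bounding_box"]
        rows.append([float(bbox[key]) for key in BBOX_KEYS])
    return rows

sample = '{"bounding_box": {"y": 98.5, "x": 94.0, "height": 197, "width": 188}, "confidence": 3.05}'
labels = parse_bboxes([sample, sample])
print(labels)  # [[94.0, 98.5, 197.0, 188.0], [94.0, 98.5, 197.0, 188.0]]
```

The resulting nested list has one row per document and could then be handed to tf.constant(labels, dtype=tf.float32). That last call is left out of the block so the sketch runs without TensorFlow installed.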

This might be skirting the issue, but you could preprocess your data with a command-line tool like jq (https://stedolan.github.io/jq/tutorial/) into a line-based data format, like CSV. That would possibly be more efficient, too.
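If jq is not at hand, the same preprocessing can be sketched with Python's standard library. This is only an illustration of the idea, not the answerer's method: the function name json_files_to_csv, the column order, and the header row are all assumptions.

```python
import csv
import json

# Assumed column order for the CSV; adjust to whatever the reader expects.
FIELDS = ("x", "y", "height", "width")

def json_files_to_csv(json_paths, csv_path):
    """Write one CSV row of bounding-box values per JSON file; return the row count."""
    count = 0
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(FIELDS)  # header row
        for path in json_paths:
            with open(path, encoding="utf-8") as handle:
                bbox = json.load(handle)["bounding_box"]
            writer.writerow([bbox[name] for name in FIELDS])
            count += 1
    return count
```

A line-based file like this can be read without going through Python at training time, for example with tf.TextLineReader and tf.decode_csv in TensorFlow 1.x. The header row would have to be skipped when reading.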

Rexford answered 14/7, 2016 at 21:5 Comment(0)
