Convert python sequence with multiple datatypes to tensor

I'm using TensorFlow r1.7 and python3.6.5. I am also very new to TensorFlow, so I'd like easy to read explanations if possible.

I'm trying to convert my input data into a dataset of tensors with this function tf.data.Dataset.from_tensor_slices(). I pass my tuple with mixed datatypes into this function. However, when running my code I get this error: ValueError: Can't convert Python sequence with mixed types to Tensor.

I want to know why I am receiving this error, and how I can convert my data to a dataset of tensors even with mixed datatypes.

Here's a printout of the top 5 entries in my tuple.

(13501, 2, None, 51, '2232', 'S35', '734.72', 'CLA', '240', 1035, 2060, 1252, 1182, 10, '967.28', '338.50', None, 14, 102, 3830)
(15124, 2, None, 57, '2641', 'S35', '234.80', 'DDA', '240', 743, 1597, 4706, 156, 0, None, None, None, 3, 27, 981)
(40035, 2, None, None, '21', 'K00', '60.06', 'CHK', '520', 76, 1863, 12, None, 1, '85.06', '25.00', None, 1, 5, 245)
(42331, 3, None, 62, '121', 'S50', '1859.01', 'ACT', '420', 952, 1583, 410, 255, 0, None, None, None, 6, 117, 1795)
(201721, 3, None, 42, '2472', 'S35', '1413.84', 'CLA', '350', 868, 1746, 963, 264, 0, None, None, None, 18, 65, 4510)

As you can see, I have a mix of integers, floats, and strings in my input data.

Here is a traceback of the error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/miikey101/Documents/Khalen_Case_Loader/tensorflow/k_means/k_means.py", line 10, in prepare_dataset
    dataset = tf.data.Dataset.from_tensor_slices(dm_data)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 222, in from_tensor_slices
    return TensorSliceDataset(tensors)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1017, in __init__
    for i, t in enumerate(nest.flatten(tensors))
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1017, in <listcomp>
    for i, t in enumerate(nest.flatten(tensors))
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 950, in convert_to_tensor
    as_ref=False)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1040, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 235, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 185, in constant
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 131, in convert_to_eager_tensor
    return ops.EagerTensor(value, context=handle, device=device, dtype=dtype)
ValueError: Can't convert Python sequence with mixed types to Tensor.
Empiric answered 13/4, 2018 at 20:43 Comment(0)

In tensorflow you can't have a tensor with more than one data type.

Quoting the documentation:

It is not possible to have a tf.Tensor with more than one data type. It is possible, however, to serialize arbitrary data structures as strings and store those in tf.Tensors.

Hence a workaround could be to create a tensor with data type tf.string and, where needed, cast each field to the desired data type.
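A minimal sketch of that workaround, using a shortened copy of the first two rows from the question (only the first seven fields). The helper names `row_to_strings` and `build_string_dataset` are made up for illustration, and `tf.strings.to_number` is the name in newer TensorFlow releases; older 1.x releases expose the same op as `tf.string_to_number`, so adjust to your version.

```python
# Shortened sample: first seven fields of the first two rows in the question.
rows = [
    (13501, 2, None, 51, '2232', 'S35', '734.72'),
    (15124, 2, None, 57, '2641', 'S35', '234.80'),
]


def row_to_strings(row, missing=""):
    """Turn every field into a str; None becomes the `missing` placeholder."""
    return [missing if value is None else str(value) for value in row]


# Every row is now a list of str, so the whole table has a single dtype.
string_rows = [row_to_strings(row) for row in rows]


def build_string_dataset(string_rows):
    """Hypothetical helper: build a tf.string dataset, then cast chosen fields back."""
    import tensorflow as tf  # imported here so the pure-Python part runs without TF

    dataset = tf.data.Dataset.from_tensor_slices(string_rows)

    def parse(fields):
        # `fields` is a 1-D tf.string tensor holding one row.
        record_id = tf.strings.to_number(fields[0], out_type=tf.int32)
        amount = tf.strings.to_number(fields[6], out_type=tf.float32)
        # Only cast fields that never hold the empty placeholder;
        # an empty string cannot be parsed as a number.
        return record_id, fields[5], amount

    return dataset.map(parse)
```

Calling `build_string_dataset(string_rows)` should then yield one `(int32, string, float32)` triple per row.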

Outstrip answered 14/4, 2018 at 7:44 Comment(1)
I see, thank you. I converted all my strings to integer representations, then all the integers to floats and I was able to successfully convert the data into tensors. - Empiric

You want a tensor for each of your features (columns). Only if it's a multi-dimensional feature (like an image, a video, list of strings, vector) would you have more dimensions in the tensor and even then they would all have the same datatype.

tf.data.Dataset.from_tensor_slices() will accept your input as a dictionary of lists (key is the name of the feature, value is a list of the values in that feature), or as a tuple of lists with one list per feature. I can't remember if it accepts Pandas DataFrames directly, but if it doesn't you can easily convert one to a dictionary of lists with df.to_dict('list').
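As a rough sketch of the dictionary-of-lists layout: the rows below are a trimmed selection of fields from the question's sample, and the column names and the helpers `rows_to_columns` and `make_dataset` are invented for illustration, since the question does not name its columns.

```python
# Trimmed sample: fields 0, 1, 5 and 6 of the first three rows in the question.
rows = [
    (13501, 2, 'S35', '734.72'),
    (15124, 2, 'S35', '234.80'),
    (40035, 2, 'K00', '60.06'),
]
names = ["id", "group", "code", "amount"]  # hypothetical column names


def rows_to_columns(rows, names):
    """Transpose row tuples into {feature name: list of values}."""
    columns = zip(*rows)  # one tuple per column
    return {name: list(column) for name, column in zip(names, columns)}


features = rows_to_columns(rows, names)

# Each column must be homogeneous; here the amounts arrive as strings,
# so they are converted to floats before building the dataset.
features["amount"] = [float(a) for a in features["amount"]]


def make_dataset(features):
    """Hypothetical helper: one tensor per feature, sliced along the first axis."""
    import tensorflow as tf  # imported here so the pure-Python part runs without TF

    return tf.data.Dataset.from_tensor_slices(features)
```

Each element of `make_dataset(features)` would then be a dictionary with one scalar tensor per feature, each with its own dtype.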

However, you can't input None values. You will have to substitute some value for them before converting to a tensor. Classic approaches are the median value, zero, the most common value, a "missing"/"unknown" placeholder for strings or categories, or imputation.
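A small sketch of two of those approaches in plain Python, applied to two columns taken from the five sample rows in the question. The function names `fill_numeric` and `fill_categorical` are made up for this example, and which strategy suits which column depends on what the column means.

```python
from statistics import median


def fill_numeric(values, strategy="median"):
    """Replace None in a numeric column with the median of the present values, or with zero."""
    present = [v for v in values if v is not None]
    if strategy == "median":
        fill = median(present)  # raises StatisticsError if every value is None
    elif strategy == "zero":
        fill = 0
    else:
        raise ValueError("unknown strategy: " + strategy)
    return [fill if v is None else v for v in values]


def fill_categorical(values, placeholder="unknown"):
    """Replace None in a string/category column with a placeholder label."""
    return [placeholder if v is None else v for v in values]


# Fourth field of the five sample rows (one value missing).
column_3 = [51, 57, None, 62, 42]
# Fifteenth field of the five sample rows, kept as strings here.
column_14 = ['967.28', None, '85.06', None, None]

filled_3 = fill_numeric(column_3)
filled_14 = fill_categorical(column_14)
```

After this step neither list contains None, so each can be passed on as one feature column.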

Drinker answered 7/3, 2021 at 12:22 Comment(0)
