Running the model out of the box generates these files in the data dir :
ls
dev-v2.tgz newstest2013.en
giga-fren.release2.fixed.en newstest2013.en.ids40000
giga-fren.release2.fixed.en.gz newstest2013.fr
giga-fren.release2.fixed.en.ids40000 newstest2013.fr.ids40000
giga-fren.release2.fixed.fr training-giga-fren.tar
giga-fren.release2.fixed.fr.gz vocab40000.from
giga-fren.release2.fixed.fr.ids40000 vocab40000.to
Reading the src of translate.py :
https://github.com/tensorflow/models/blob/master/tutorials/rnn/translate/translate.py
tf.app.flags.DEFINE_string("from_train_data", None, "Training data.")
tf.app.flags.DEFINE_string("to_train_data", None, "Training data.")
To utilize my own training data I created dirs my-from-train-data & to-from-train-data and add my own training data to each of these dirs, training data is contained in the files mydata.from & mydata.to
my-to-train-data contains mydata.from
my-from-train-data contains mydata.to
I could not find documentation as to using own training data or what format it should take so I inferred this from the translate.py src and contents of data dir created when executing translate model out of the box.
Contents of mydata.from :
Is this a question
Contents of mydata.to :
Yes!
I then attempt to train the model using :
python translate.py --from_train_data my-from-train-data --to_train_data my-to-train-data
This returns with an error :
tensorflow.python.framework.errors_impl.NotFoundError: my-from-train-data.ids40000
Appears I need to create file my-from-train-data.ids40000 , what should it's contents be ? Is there an example of how to train this model using custom data ?