Technically, it's a JSONL file: each line of the file is a separate JSON object, so the file as a whole won't validate as JSON.
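To make the JSONL point concrete, here's a minimal sketch of a line-by-line check. The helper name check_jsonl and the sample records are made up for illustration, and skipping blank lines is my own choice; the keys your real dataset needs depend on the model you fine-tune.

```python
import json


def check_jsonl(lines):
    """Return (line_number, message) pairs for lines that are not JSON objects."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are skipped here (an assumption of this sketch)
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, str(exc)))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
    return problems


# Invented sample records, one JSON object per line.
sample = [
    '{"prompt": "Hello", "completion": " world"}',
    '{"prompt": "Foo", "completion": " bar"}',
]

# Each line parses on its own...
print(check_jsonl(sample))

# ...but the lines joined into one document are not valid JSON.
try:
    json.loads("\n".join(sample))
    whole_file_is_json = True
except json.JSONDecodeError:
    whole_file_is_json = False
print(whole_file_is_json)
```

For a real file you could pass the open file object directly, e.g. check_jsonl(open("mydata.jsonl")), since iterating a file yields its lines.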
From the OpenAI Fine Tuning doc:
import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")
openai.File.create(
    file=open("mydata.jsonl", "rb"),
    purpose='fine-tune'
)
This is the step where you upload your JSONL training dataset to OpenAI.
What the fine-tuning doc doesn't explicitly mention is that each of these calls returns an object containing information you need later. For instance, openai.File.create() returns a file object whose id (the file ID) is required for the next step, creating the fine-tuning job.
Here's the Python I use in my training script:
if args.train:
    # Upload the training file; the returned object carries the file ID.
    with open(args.train, "rb") as training_file:
        results = openai.File.create(
            file=training_file,
            purpose='fine-tune'
        )
    print("upload results: " + str(results) + "\n")
    print("file_id: " + results.id)
    # Start the fine-tuning job from the uploaded file's ID.
    results = openai.FineTuningJob.create(training_file=results.id, model=config.BASE_MODEL)
    print("fine-tuning results: " + str(results) + "\n")
    print("\nUse the following command to check the status of your fine-tuning job:")
    print(f"python train.py --state {results.id}")
To check the status of the job, pass its ID (results.id above) to retrieve. Once the job finishes, the returned object will also give you the model ID:
if args.state:
    results = openai.FineTuningJob.retrieve(args.state)
    print("fine-tuning state: " + str(results) + "\n")
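If you'd rather not re-run the status command by hand, a small polling loop is one option. This is only a sketch: wait_for_job, its parameters, and the fake job objects in the demo are my own inventions, not part of the openai library. It assumes the job object exposes status and fine_tuned_model attributes and that "succeeded", "failed", and "cancelled" are the terminal statuses. The retrieve function is passed in so the loop can be tried without an API key.

```python
import time
from types import SimpleNamespace

# Statuses assumed to mean the job will not change any further.
TERMINAL_STATES = {"succeeded", "failed", "cancelled"}


def wait_for_job(retrieve, job_id, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Call retrieve(job_id) repeatedly until the job reaches a terminal status.

    retrieve is any callable that returns an object with a .status attribute,
    for example openai.FineTuningJob.retrieve.
    """
    for _ in range(max_polls):
        job = retrieve(job_id)
        if job.status in TERMINAL_STATES:
            return job
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} not finished after {max_polls} polls")


# Demo with made-up job objects standing in for real API responses.
_fake_responses = iter([
    SimpleNamespace(status="running", fine_tuned_model=None),
    SimpleNamespace(status="succeeded", fine_tuned_model="ft:example-model"),
])
job = wait_for_job(
    lambda job_id: next(_fake_responses),  # stand-in for the real retrieve call
    "ftjob-example",                       # hypothetical job ID
    sleep=lambda seconds: None,            # don't actually wait in the demo
)
print(job.status, job.fine_tuned_model)
```

In the real script the call would look something like wait_for_job(openai.FineTuningJob.retrieve, args.state), and job.fine_tuned_model on the returned object is the model ID to use afterwards.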