OpenAI GPT-3 API: Fine tune a fine tuned model? [closed]
The OpenAI documentation for the model attribute in the fine-tune API is a bit confusing:

model

The name of the base model to fine-tune. You can select one of "ada", "babbage", "curie", "davinci", or a fine-tuned model created after 2022-04-21.

My question: is it better to fine-tune a base model or a fine-tuned model?

I created a fine-tuned model from ada with file mydata1K.jsonl:

ada + mydata1K.jsonl --> ada:ft-acme-inc-2022-06-25

Now I have a bigger file of samples, mydata2K.jsonl, that I want to use to improve the fine-tuned model. In this second round of fine-tuning, is it better to fine-tune ada again or to fine-tune my fine-tuned model ada:ft-acme-inc-2022-06-25? I'm assuming the latter is possible because my fine-tuned model was created after 2022-04-21.

ada + mydata2K.jsonl --> better-model

or

ada:ft-acme-inc-2022-06-25 + mydata2K.jsonl --> even-better-model?
Drida answered 26/6, 2022 at 0:35
I found the answer on the OpenAI forum: community.openai.com/t/continuous-fine-tuning-best-practices/… "If you have already fine-tuned a model for your task and now have additional training data that you would like to incorporate, you can continue fine-tuning from the model. This creates a model that has learned from all of the training data without having to re-train from scratch." – Drida

UPDATE

It looks like fine-tuning a fine-tuned model is no longer supported, as stated in the official OpenAI documentation:

Can I continue fine-tuning a model that has already been fine-tuned?

No, we do not currently support continuing the fine-tuning process once a job has finished. We plan to support this in the near future.


As stated in the official OpenAI documentation:

If you have already fine-tuned a model for your task and now have additional training data that you would like to incorporate, you can continue fine-tuning from the model. This creates a model that has learned from all of the training data without having to re-train from scratch.

To do this, pass in the fine-tuned model name when creating a new fine-tuning job (e.g., -m curie:ft-<org>-<date>). Other training parameters do not have to be changed, however if your new training data is much smaller than your previous training data, you may find it useful to reduce learning_rate_multiplier by a factor of 2 to 4.
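A minimal sketch of what that looked like with the legacy (pre-1.0) OpenAI Python SDK. The model name and file ID below are hypothetical, and the learning_rate_multiplier value is an illustrative choice, not a documented default:

```python
# Sketch: continuing fine-tuning from an already fine-tuned model.
# Assumes the legacy openai<1.0 SDK; the file ID and model name are made up.

def continued_finetune_params(finetuned_model, training_file,
                              learning_rate_multiplier=None):
    """Build the kwargs for a fine-tune job that starts from a fine-tuned model."""
    params = {
        "training_file": training_file,  # ID of an uploaded JSONL file
        "model": finetuned_model,        # a fine-tuned model, not a base model
    }
    if learning_rate_multiplier is not None:
        # Per the docs, consider reducing this by a factor of 2-4 when the
        # new dataset is much smaller than the original one.
        params["learning_rate_multiplier"] = learning_rate_multiplier
    return params

params = continued_finetune_params("ada:ft-acme-inc-2022-06-25", "file-abc123",
                                   learning_rate_multiplier=0.05)
# import openai
# openai.FineTune.create(**params)  # legacy endpoint; removed in newer SDKs
```

The actual API call is commented out; the point is that the only required change versus a fresh fine-tune is passing the fine-tuned model name instead of a base model name.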

Which option to choose?

You're asking about two options:

  • Option 1: ada + bigger-training-dataset.jsonl
  • Option 2: ada:ft-acme-inc-2022-06-25 + additional-training-dataset.jsonl

The documentation says nothing about which option yields better results.

However...

Choose Option 2

Why?

When training a fine-tuned model, the total tokens used will be billed according to our training rates.

If you choose Option 1, you'll pay for some tokens in your training dataset twice: first when fine-tuning with the initial training dataset, and again when fine-tuning with the bigger one (i.e., bigger-training-dataset.jsonl = initial-training-dataset.jsonl + additional-training-dataset.jsonl).

It's better to continue fine-tuning from a fine-tuned model because you'll pay only for tokens in your additional training dataset.
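To make the billing difference concrete, here is a back-of-the-envelope comparison. The token counts are made up, and the rate is the historical ada fine-tuning price; other models were billed at different rates:

```python
# Illustrative cost comparison; token counts are hypothetical, and the rate
# is the historical ada fine-tuning price (USD per 1K tokens).
ADA_TRAINING_RATE = 0.0004

initial_tokens = 1_000_000     # tokens in the first training dataset (already trained)
additional_tokens = 1_000_000  # new tokens you want to incorporate

# Option 1: re-train ada on the combined dataset -> pay for ALL tokens again.
option1_cost = (initial_tokens + additional_tokens) / 1000 * ADA_TRAINING_RATE

# Option 2: continue from the fine-tuned model -> pay only for the new tokens.
option2_cost = additional_tokens / 1000 * ADA_TRAINING_RATE

print(option1_cost, option2_cost)  # Option 1 costs twice as much here
```

With equal-sized initial and additional datasets, Option 1 doubles the training bill; the gap grows the larger the already-trained dataset is relative to the new data. (Epoch count multiplies both sides equally, so it doesn't change the comparison.)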

Read more about fine-tuning pricing calculation.

Scintillator answered 4/1, 2023 at 13:21
Hello, I have a question: if I find a few errors in the training data I gave to the first model (say 5% of the dataset), should I retrain a new model, or extend the training of this one with only that 5% of the data that was wrong and is now corrected? – Favata

@Favata Hi! Fine-tuning might not be the best option for your case. First, fine-tuning is not even the right approach if you want an exact answer (i.e., a fact) to a specific question; to see why, read my previous answers (this and this). Second, this is probably one of the biggest downsides of fine-tuning: you can't just "replace" wrong or outdated data. – Scintillator

@Favata [continuing] You have two options, as you've already figured out: either continue fine-tuning the fine-tuned model, or fine-tune from scratch. If you chose fine-tuning over semantic search (i.e., the embeddings approach) in the first place, then I suggest you pick the second option (fine-tuning from scratch). Why? Because if you fine-tune the fine-tuned model, it will have learned from two sets of data (the wrong or outdated one plus the new one), which will probably make it hallucinate even more! – Scintillator

Thank you for your detailed response. I'm taking a first pass at a classification model using an LLM, as shown in the OpenAI documentation. – Favata

@Favata Then follow the example in the documentation. I've never done classification; I just know that the Classifications API endpoint is deprecated. As for your question, I don't have hands-on experience there, so I can't suggest whether you should continue fine-tuning a fine-tuned model or fine-tune from scratch for classification purposes. – Scintillator

© 2022 - 2024 — McMap. All rights reserved.