My AutoML Image classification model has been training for longer than the requested node hours

I am new to Vertex AI. I am trying out its AutoML feature for a single-label image classification problem. I followed all the recommended steps to upload my images (JPEGs with a resolution of 720 x 1024) and split the data manually (1001 images for label 1 and 870 for label 2, split approximately 80%, 10%, and 10% into training, validation, and test sets). I set the requested node hours to 8 and started training. The training ran for more than 11 and a half hours, at which point I cancelled it. Before this, I had requested 20 node hours for the same dataset, and the training kept going for more than 1 day and 11 hours. So my questions are:

  1. Why does the training continue for more time than requested through the node hour budget?
  2. Is there a way to check how the model training is progressing? Any logs for training, intermediate values of model performance through the epochs, etc. that can be accessed? Basically, I have no idea if and how the training is progressing.
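To put numbers on the overrun described above, here is a minimal Python sketch. It assumes that elapsed wall-clock hours can be compared directly against the node-hour budget, which is a simplification: node hours count nodes multiplied by hours, and the elapsed times are only lower bounds ("more than"). The helper names are hypothetical and are not part of the Vertex AI SDK; the only SDK-related fact used is that the budget is expressed in milli node hours, where 1,000 means 1 node hour.

```python
def to_milli_node_hours(node_hours: float) -> int:
    """Convert a node-hour budget to milli node hours (1 node hour = 1,000)."""
    return round(node_hours * 1000)


def overrun_hours(budget_node_hours: float, elapsed_hours: float) -> float:
    """Hours of wall-clock time beyond the requested budget (hypothetical helper)."""
    return elapsed_hours - budget_node_hours


def overrun_ratio(budget_node_hours: float, elapsed_hours: float) -> float:
    """Elapsed wall-clock time as a multiple of the requested budget."""
    return elapsed_hours / budget_node_hours


# The two runs from the question: (budget in node hours, elapsed hours so far).
# Elapsed values are lower bounds, since both runs were still going.
runs = [
    (8, 11.5),      # cancelled after more than 11.5 hours
    (20, 24 + 11),  # ran for more than 1 day and 11 hours
]

for budget, elapsed in runs:
    print(
        f"budget={budget} node hours "
        f"({to_milli_node_hours(budget)} milli node hours), "
        f"elapsed>={elapsed} h, "
        f"overrun>={overrun_hours(budget, elapsed)} h, "
        f"ratio>={overrun_ratio(budget, elapsed):.2f}x"
    )
```

Under this simplified comparison, both runs exceeded their budgets by a wide margin, which is what prompts the first question.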

I have read through the Vertex AI documentation that I could find, as well as other blogs and posts on the internet, but could not find any specific answers. Some posts say their image classification models trained with Vertex AI AutoML within a few hours, on datasets of thousands of images.

I would very much appreciate any help with this. Thanks.

Salinometer answered 7/11, 2022 at 18:27 Comment(4)
Were you able to find an answer to this? I have the same questions, and the official docs are very lacking. - Coif
Did you let it run to the end? And I assume cancelling the training will lead to costs, but no model? - Amidase
The training will always take longer than the requested budget, because tasks such as setup, teardown, and dataset validation are not counted toward node hours. But the difference between 1 day 11 hours and 20 hours is quite a lot. - Stenger
Same for me: with only 60 training JPG files, I set a budget of 8 node hours, but it has now been more than 4 days and it is still training. There is no official documentation on this. I have to cancel the job manually. - Fredenburg
