I am new to Vertex AI and am trying out its AutoML feature for a single-label image classification problem. I followed all the recommended steps to upload my images (JPEGs with a resolution of 720 x 1024) and split the data manually (1001 images for label 1 and 870 for label 2, split approximately 80%, 10% and 10% into training, validation and test sets). I set the requested node hours to 8 and started training. The training ran for more than eleven and a half hours, at which point I cancelled it. Before this, I had requested 20 node hours for the same dataset, and that training kept going for more than 1 day and 11 hours. So my questions are:
- Why does the training continue for longer than the requested node-hour budget?
- Is there a way to check how the model training is progressing? Are there any training logs, intermediate model performance values across epochs, etc. that can be accessed? Basically, I have no idea whether or how the training is progressing.
I have read through the Vertex AI documentation that I could find, as well as other blogs and posts on the internet, but could not find any specific answers. Some posts say that their Vertex AI AutoML image classification models, with image counts in the thousands, trained within a few hours.
I would greatly appreciate any help with this. Thanks.
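For reference, here is a rough sketch of the numbers behind the setup described above. The helper names `split_counts` and `to_milli_node_hours` are made up for illustration and are not part of any Vertex AI API, and the exact per-split counts are an assumption, since my split was only approximately 80/10/10. As far as I understand, the API expresses the training budget in milli node hours (the `budget_milli_node_hours` parameter in the Python SDK), so 8 node hours would correspond to 8000.

```python
# Sketch of the dataset split and training budget described in the question.
# The helper names are illustrative only, not part of any Vertex AI API.

def split_counts(total, fractions=(0.8, 0.1, 0.1)):
    """Return (train, validation, test) counts that sum to `total`."""
    train = round(total * fractions[0])
    validation = round(total * fractions[1])
    test = total - train - validation  # remainder goes to the test set
    return train, validation, test


def to_milli_node_hours(node_hours):
    """Convert node hours to milli node hours (1 node hour = 1000)."""
    return int(round(node_hours * 1000))


# Image counts per label, as stated in the question.
label_counts = {"label 1": 1001, "label 2": 870}

# Assumed exact split; the real split was only approximately 80/10/10.
splits = {label: split_counts(n) for label, n in label_counts.items()}

# Requested budget of 8 node hours, expressed in milli node hours.
budget = to_milli_node_hours(8)

for label, (train, validation, test) in splits.items():
    print(f"{label}: train={train}, validation={validation}, test={test}")
print(f"budget: {budget} milli node hours")
```

The same conversion applied to my earlier attempt gives `to_milli_node_hours(20) == 20000`.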