YOLO not starting to train

6

I am trying to train YOLO on a custom dataset. Everything seems to run without errors, but training never actually starts.

I followed the tutorial at https://github.com/AlexeyAB/darknet twice, and both times I got the same result:

./darknet detector train data/obj.data cfg/yolo-obj.cfg yolov4.conv.137

[yolo] params: iou loss: ciou (4), iou_norm: 0.07, cls_norm: 1.00, scale_x_y: 1.05
nms_kind: greedynms (1), beta = 0.600000 
Total BFLOPS 59.563 
avg_outputs = 489778 
Loading weights from yolov4.conv.137...
 seen 64, trained: 0 K-images (0 Kilo-batches_64) 
Done! Loaded 137 layers from weights-file 
Learning Rate: 0.001, Momentum: 0.949, Decay: 0.0005
Resizing, random_coef = 1.40 

 608 x 608 
 Create 64 permanent cpu-threads 

 mosaic=1 - compile Darknet with OpenCV for using mosaic=1

I also tried without the pre-trained weights, but the training process doesn't start that way either:

./darknet detector train data/obj.data cfg/yolo-obj.cfg
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, cls_norm: 1.00, scale_x_y: 1.05
nms_kind: greedynms (1), beta = 0.600000 
Total BFLOPS 59.563 
avg_outputs = 489778 
Learning Rate: 0.001, Momentum: 0.949, Decay: 0.0005
Resizing, random_coef = 1.40 

 608 x 608 
 Create 64 permanent cpu-threads 

 mosaic=1 - compile Darknet with OpenCV for using mosaic=1

What am I missing?

Quadragesimal answered 10/6, 2020 at 5:10 Comment(2)

Did you run make to compile darknet? – Parrott 11/6, 2020 at 17:35

Open the YOLO configuration file (.cfg) and search for cutmix. You will see the line mosaic=1 there; change the 1 to 0 and train again. – Parrott 11/6, 2020 at 18:17
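The edit described in this comment can also be scripted. Below is a minimal Python sketch, assuming the cfg file uses plain key=value lines. The function name disable_mosaic is hypothetical and not part of Darknet.

```python
def disable_mosaic(cfg_text: str) -> str:
    """Return cfg_text with any active 'mosaic=1' line changed to 'mosaic=0'.

    Commented-out lines such as '#mosaic=1' are left untouched.
    """
    out = []
    for line in cfg_text.splitlines(keepends=True):
        key, sep, value = line.partition("=")
        if sep and key.strip() == "mosaic" and value.strip() == "1":
            # Preserve the original line ending (\n or \r\n).
            ending = line[len(line.rstrip("\r\n")):]
            line = "mosaic=0" + ending
        out.append(line)
    return "".join(out)
```

To apply it, read cfg/yolo-obj.cfg, pass its text through disable_mosaic, and write the result back before training again.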

7

If you want to use OpenCV, you need to re-compile Darknet, but first set the following in the Makefile:

 OPENCV=1

If you don't need OpenCV, then do as @TaQuangTu suggested. Once you have fixed this line, run the build.sh script again and it should work fine.

I'd also suggest changing the following lines if you intend to train on a GPU:

GPU=1
CUDNN=1
CUDNN_HALF=1
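
To double-check which of these flags are actually set before rebuilding, a small sketch like the following can read them back from the Makefile text. The helper name read_build_flags is hypothetical, and the sketch assumes the flags appear as plain NAME=value lines as shown above.

```python
# Flags discussed in this answer.
FLAGS = ("GPU", "CUDNN", "CUDNN_HALF", "OPENCV")


def read_build_flags(makefile_text: str) -> dict:
    """Return the first NAME=value setting found for each known flag."""
    flags = {}
    for line in makefile_text.splitlines():
        name, sep, value = line.partition("=")
        name = name.strip()
        # Only the first assignment of each flag is recorded.
        if sep and name in FLAGS and name not in flags:
            flags[name] = value.strip()
    return flags
```

For example, calling read_build_flags on the contents of the Makefile should report OPENCV as "1" once the change above has been made.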

Octofoil answered 21/9, 2020 at 18:12 Comment(1)

I can confirm that without OpenCV it won't train anything. I followed Rômulo's proposal, enabled OpenCV, and it worked for me. – Wakefield 14/11, 2020 at 19:26

4

My friend, I just solved this problem, and I think I have found the reason. If your train.txt and test.txt files are empty, that is the cause. Open "creating-train-and-test-txt-files.py" and search for the keyword "jpeg". It appears only twice; change both occurrences to "jpg" and replace the file in your Google Drive. Finally, restart the Colaboratory session. Your training should then no longer quit after "608 x 608 Create 64 permanent cpu-threads".
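
A likely reason this fix works is that the script only lists files whose extension matches the one it searches for. A folder of .jpg images would then produce empty list files when the script looks for .jpeg. That is an inference, not something confirmed from the script's source here. The following is a hedged sketch of an extension check that accepts both spellings. The names IMAGE_EXTENSIONS, is_image and select_images are hypothetical and not taken from that script.

```python
from pathlib import Path

# Extensions treated as images; an assumption, extend as needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def is_image(filename: str) -> bool:
    """True if the file extension, compared case-insensitively, is an image type."""
    return Path(filename).suffix.lower() in IMAGE_EXTENSIONS


def select_images(filenames):
    """Return the image file names from an iterable of names, sorted."""
    return sorted(name for name in filenames if is_image(name))
```

Matching on a set of lowercase extensions avoids the situation where a single hard-coded spelling silently matches nothing.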

Best wishes from China.

Gothar answered 30/3, 2021 at 13:55 Comment(0)

0

This problem is usually caused by empty train.txt and test.txt files. Please check these two files.
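
One way to check the two files is to count the image paths in each. Here is a minimal sketch; data/train.txt and data/test.txt are assumed locations, so substitute the train= and valid= paths from your own obj.data. The function names are hypothetical.

```python
from pathlib import Path


def count_entries(list_text: str) -> int:
    """Count non-blank lines (image paths) in the text of a list file."""
    return sum(1 for line in list_text.splitlines() if line.strip())


def check_list_file(path: str) -> int:
    """Return the number of image paths in the file, or 0 if it is missing."""
    p = Path(path)
    if not p.is_file():
        return 0
    return count_entries(p.read_text())


if __name__ == "__main__":
    # Assumed paths; use the ones referenced by your obj.data.
    for name in ("data/train.txt", "data/test.txt"):
        n = check_list_file(name)
        note = "" if n else "  <-- empty or missing"
        print(f"{name}: {n} image paths{note}")
```

If either file reports zero paths, Darknet has no images to load, which would match the symptom in the question.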

Arvie answered 26/2, 2021 at 12:6 Comment(0)

0

How have you installed OpenCV?

For a simple fix, you can try this:

sudo apt install libopencv-dev python3-opencv

Also make sure you have cmake:

sudo apt install cmake

This should install OpenCV 3.2 and cmake 3.10 on your system. Then try running darknet.

Finally, set this in the Makefile:

OPENCV=1

Porscheporsena answered 26/2, 2021 at 22:2 Comment(0)

0

Use this to enable OpenCV:

$ git clone https://github.com/AlexeyAB/darknet.git
$ cd darknet
$ sed -i 's/OPENCV=0/OPENCV=1/' Makefile
$ make

https://github.com/AlexeyAB/darknet/blob/master/Makefile#L4

Ferula answered 7/2, 2023 at 13:44 Comment(0)

0

Check your resource utilization when training starts and see whether it runs out of RAM.

If so, try this solution:

CFG parameters in the [net] section:

batch - number of samples (images, letters, ...) which will be processed in one batch

subdivisions - number of mini_batches in one batch; size mini_batch = batch/subdivisions, so the GPU processes mini_batch samples at once, and the weights will be updated for batch samples (1 iteration processes batch images)

Based on this, I tried various combinations for the mini_batch size, such as:

batch=64, subdivisions=8

"OR"

batch=64, subdivisions=16

and so on...

I found that my Colab only works with mini_batch=2, so I set subdivisions to half of batch, for example:

batch=64
subdivisions=32

"OR"

batch=32
subdivisions=16

Or any other...

It also raises an error when I use:

batch=1, subdivisions=1
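
The relationship quoted above, mini_batch = batch / subdivisions, is easy to tabulate for the combinations tried here. The following is a small sketch. The helper name mini_batch_size is hypothetical, and rejecting non-divisible pairs is a choice of this sketch rather than documented Darknet behavior.

```python
def mini_batch_size(batch: int, subdivisions: int) -> int:
    """Return batch / subdivisions, the number of images processed at once."""
    if batch <= 0 or subdivisions <= 0:
        raise ValueError("batch and subdivisions must be positive")
    if batch % subdivisions != 0:
        # Guard chosen for this sketch; not documented Darknet behavior.
        raise ValueError("batch should be a multiple of subdivisions")
    return batch // subdivisions


if __name__ == "__main__":
    # Combinations mentioned in the answer above.
    for batch, subdivisions in [(64, 8), (64, 16), (64, 32), (32, 16), (1, 1)]:
        size = mini_batch_size(batch, subdivisions)
        print(f"batch={batch} subdivisions={subdivisions} -> mini_batch={size}")
```

A smaller mini_batch means fewer images are held in memory at once, which fits the observation that only the settings giving mini_batch=2 worked on Colab.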

Mcnamee answered 4/4, 2023 at 14:15 Comment(0)
