Originally created by: testdummyvt
Hello everyone! I’ve been experimenting with the YOLOv9 Tiny model and wanted to share my latest progress.
You can find the implementation of the YOLOv9-NMS-Free version here: https://github.com/testdummyvt/libreyolo
Due to limited computational resources, I chose not to train the entire model from scratch. Instead, I used the following approach:
I performed a validation run on 5,000 images using the following environment:
```
Initializing COCO evaluator...
Warning: Failed to initialize COCO evaluator: Images directory not found: /home/testdummy/datasets/coco/coco/images/val
Falling back to legacy DetMetrics
Warming up model (3 iterations)...
Validating on 5000 images...
Device: cuda
Batch size: 16
Validating: 100%|██████████████████████████████████████████████████| 313/313 [00:26<00:00, 11.95it/s]
==================================================
Validation Results
==================================================
metrics/precision: 0.6357
metrics/recall: 0.4153
metrics/mAP50: 0.5449
metrics/mAP50-95: 0.4173
--------------------------------------------------
Images processed: 5000
Total time: 26.18s
Speed: 5.2ms/image
==================================================
Results for Yolov9-NMS-Free: {'metrics/precision': 0.6357262341447409, 'metrics/recall': 0.41528902822833136, 'metrics/mAP50': 0.5448938380897804, 'metrics/mAP50-95': 0.4173021161704325, 'speed/preprocess_ms': 0.251129674911499, 'speed/inference_ms': 1.773804712295532, 'speed/postprocess_ms': 2.016630506515503, 'speed/total_ms': 5.236816024780273, 'speed/total_s': 26.184080123901367, 'speed/images_seen': 5000}
```
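A quick arithmetic check on the timing fields in the results dict: the three per-stage timings add up to roughly 4.04 ms, while speed/total_ms (which equals total_s divided by images_seen) is about 5.24 ms, so roughly 1.2 ms per image is not attributed to any stage in this log. The snippet below only redoes that arithmetic with values copied from the output above; what the remaining time covers is not stated in the log.

```python
# Timing values copied verbatim from the results dict printed above.
speed = {
    "preprocess_ms": 0.251129674911499,
    "inference_ms": 1.773804712295532,
    "postprocess_ms": 2.016630506515503,
    "total_ms": 5.236816024780273,
    "total_s": 26.184080123901367,
    "images_seen": 5000,
}

# Sum of the three reported per-image stages (ms).
stage_sum_ms = speed["preprocess_ms"] + speed["inference_ms"] + speed["postprocess_ms"]

# Wall-clock time per image derived from the total run time (ms).
per_image_ms = speed["total_s"] * 1000 / speed["images_seen"]

# Per-image time not covered by any of the three stages (ms).
unattributed_ms = speed["total_ms"] - stage_sum_ms

print(f"stage sum:    {stage_sum_ms:.2f} ms")
print(f"per image:    {per_image_ms:.2f} ms")
print(f"unattributed: {unattributed_ms:.2f} ms")
```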
To be honest, these results look almost too good to be true 😅. I would appreciate it if someone could help verify whether these metrics are accurate.
I used the following script to generate the validation results:
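```python
from libreyolo import LibreYOLO9NMSFree

if __name__ == "__main__":
    # Checkpoint from training run v9_nms_free_exp_07
    model_weights = "/home/testdummy/projects/code/libreyolo/runs/train/v9_nms_free_exp_07/weights/best.pt"
    # Load the tiny ("t") NMS-free YOLOv9 variant with these weights
    model = LibreYOLO9NMSFree(model_weights, size="t")

    # Validate on the COCO "val" split at 640 px input size
    results = model.val(
        data="/home/testdummy/datasets/coco/data.yaml",
        batch=16,
        imgsz=640,
        conf=0.25,  # confidence threshold
        iou=0.6,    # IoU threshold
        split="val",
        save_json=True,
        verbose=True,
    )
    print(f"Results for Yolov9-NMS-Free: {results}")
```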
Weight file I used: best.pt (huggingface.co)
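Since the log shows that the COCO evaluator failed to initialize and the numbers came from the legacy DetMetrics fallback, one way to cross-check them is to recompute mAP50-95 independently from the predictions. Below is a minimal pure-Python sketch of COCO-style AP for a single class: detections are matched greedily in score order, precision is sampled at 101 recall points, and the result is averaged over IoU thresholds 0.50:0.05:0.95. It is not libreyolo's metric code and not pycocotools: it ignores crowd annotations, area ranges, and the maxDets=100 cap, and the function names (iou, ap_at_iou, map50_95) are hypothetical. The full COCO metric averages this over the 80 classes, and pycocotools' COCOeval remains the reference implementation.

```python
from typing import Dict, List, Tuple

# Boxes are (x1, y1, x2, y2); detections are (image_id, score, box).
Box = Tuple[float, float, float, float]
Detection = Tuple[int, float, Box]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def ap_at_iou(gts: Dict[int, List[Box]], dets: List[Detection], thr: float) -> float:
    """101-point interpolated AP for a single class at one IoU threshold."""
    n_gt = sum(len(boxes) for boxes in gts.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in gts.items()}

    # Greedy matching: higher-scoring detections claim ground truths first.
    tp_flags: List[bool] = []
    for img, _score, box in sorted(dets, key=lambda d: d[1], reverse=True):
        best_iou, best_j = thr, -1
        for j, gt_box in enumerate(gts.get(img, [])):
            overlap = iou(box, gt_box)
            if not matched[img][j] and overlap >= best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0:
            matched[img][best_j] = True
        tp_flags.append(best_j >= 0)

    # Precision and recall after each detection in ranked order.
    precisions: List[float] = []
    recalls: List[float] = []
    tp = 0
    for k, is_tp in enumerate(tp_flags, start=1):
        tp += int(is_tp)
        precisions.append(tp / k)
        recalls.append(tp / n_gt)

    # Precision envelope: make precision non-increasing as recall grows.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])

    # Average the envelope at recall 0.00, 0.01, ..., 1.00 (0 where unreachable).
    total = 0.0
    for i in range(101):
        r = i / 100
        total += next((p for p, rc in zip(precisions, recalls) if rc >= r), 0.0)
    return total / 101


def map50_95(gts: Dict[int, List[Box]], dets: List[Detection]) -> float:
    """Mean AP over IoU thresholds 0.50, 0.55, ..., 0.95 (single class)."""
    thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]
    return sum(ap_at_iou(gts, dets, t) for t in thresholds) / len(thresholds)


if __name__ == "__main__":
    gts = {1: [(0, 0, 10, 10)]}
    dets = [(1, 0.9, (0, 0, 10, 6.8))]  # IoU 0.68: matched only at 0.50 to 0.65
    print(map50_95(gts, dets))
```

Running the per-class function on the JSON predictions and the COCO val annotations, then comparing against the logged mAP50-95, would show whether the fallback metric is in the right range.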
Originally posted by: testdummyvt
I tried the Roboflow model leaderboard benchmark code: code
Yolov9t-NMS-Free
Yolov9t
```
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.304
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.420
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.328
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.136
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.333
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.427
```
It scored lower than what is reported in the original YOLOv9 repo:
Originally posted by: testdummyvt
Update:
OK, now that I've got COCOeval working, I get the following results, which make more sense:
Yolov9t-NMS-Free model results:
Yolov9t model results for comparison:
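The first validation log above shows the COCO evaluator giving up because /home/testdummy/datasets/coco/coco/images/val did not exist, after which validation fell back to the legacy DetMetrics with only a warning. A small pre-flight check can make that kind of path problem fail loudly before a long validation run. This is a hypothetical helper, not part of libreyolo, and the `<root>/images/<split>` layout is an assumption inferred from the path in the warning.

```python
from pathlib import Path

# Assumption: image extensions to count; adjust for your dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def check_images_dir(dataset_root: str, split: str = "val") -> int:
    """Return the number of images in <dataset_root>/images/<split>.

    Hypothetical helper: the directory layout mirrors the path in the
    evaluator warning, not any documented libreyolo convention. Raises
    FileNotFoundError with the resolved path if the directory is missing.
    """
    images_dir = Path(dataset_root).expanduser() / "images" / split
    if not images_dir.is_dir():
        raise FileNotFoundError(f"Images directory not found: {images_dir}")
    # Count only files whose extension looks like an image.
    return sum(
        1
        for p in images_dir.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

For example, `check_images_dir("/home/testdummy/datasets/coco/coco")` checks exactly the directory named in the warning; the repeated `coco/coco` segment is worth comparing against what data.yaml actually points to.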
Originally posted by: EHxuban11
Hello, super interesting work!!!!!
I'm going to try it out. I can also train a model with your code on an A100 or an RTX 5080. What GPU resources do you have?
Btw, it looks like https://huggingface.co/testdummyvt/yolov9_nmsfree_exp/blob/v1_0_exps/weights/best.pt is a private repo. I was thinking about continuing to train your model, unfrozen, to push it further.
Originally posted by: testdummyvt
@EHxuban11 I made the repo public now.
I have an RTX 3090, and it runs at 70% of the original clock speed (I got it like that), so each epoch takes a long time to train.
It would be nice to see how it performs on a larger GPU.
Originally posted by: EHxuban11
Hello @testdummyvt, I ran the training on the RTX 5080 with exactly the same parameters as you (30 epochs, etc.), except for a batch size of 48 instead of 32. I got:
0.352 mAP50-95 for YOLOv9t NMS-free
0.372 mAP50-95 for YOLOv9t with NMS
Very promising, especially because we can run it for longer, unfrozen, etc.
I'm now proceeding with a smaller unfrozen run, without augmentations. Let's see if the 2-point mAP50-95 gap closes. I'll report back!
Originally posted by: EHxuban11
Hello @testdummyvt, coming back with not-so-great results from the unfrozen run. As you can see in the image, validation mAP peaked at a similar level to the frozen run I did yesterday (0.3526), but then it went down: the model has been overfitting. I think we can just use your freezing approach and train the four sizes (t, s, m, c) in preparation for putting them on HF.
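Since the unfrozen run's validation mAP peaked and then declined, one common guard is patience-based early stopping on the validation metric, keeping the checkpoint from the best epoch. Below is a minimal sketch over a list of per-epoch validation mAP values; `best_epoch` and `patience` are hypothetical names, not libreyolo options, and the history in the example is made up.

```python
from typing import List, Optional, Tuple


def best_epoch(val_maps: List[float], patience: int = 5) -> Tuple[int, float, Optional[int]]:
    """Scan per-epoch validation mAP values (epochs are 0-indexed).

    Returns (best_epoch, best_map, stop_epoch), where stop_epoch is the epoch
    at which training would stop after `patience` epochs without improvement,
    or None if that never happens within the history.
    """
    if not val_maps:
        raise ValueError("val_maps must not be empty")
    best_i, best_v = 0, val_maps[0]
    for i, v in enumerate(val_maps[1:], start=1):
        if v > best_v:
            # New best: this epoch's checkpoint is the one worth keeping.
            best_i, best_v = i, v
        elif i - best_i >= patience:
            # No improvement for `patience` epochs: stop here.
            return best_i, best_v, i
    return best_i, best_v, None


if __name__ == "__main__":
    # Made-up history that rises, peaks at epoch 3, then declines.
    history = [0.20, 0.25, 0.28, 0.29, 0.285, 0.28, 0.27]
    print(best_epoch(history, patience=3))
```

With `patience=3` this returns `(3, 0.29, 6)`: keep the epoch-3 checkpoint and stop at epoch 6 instead of training further into the overfitting regime.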
Regarding fine-tuning, do you know whether this model can be fine-tuned easily? I can run some tests with RF100 datasets. Now that performance has been demonstrated in terms of mAP, I think the main thing missing is to demonstrate that people can fine-tune it easily.