Post

Military Aircraft Detection

Military aircraft detection in images and videos using YOLOv8.

Military Aircraft Detection in Images and Videos using YOLOv8

Introduction

In this project, I used a YOLOv8 model to detect and classify military aircraft in images and videos. The goal is to demonstrate the power of Object Detection in the field of Computer Vision, particularly for the recognition of complex and moving objects, especially objects with similar characteristics like military aircraft.

The notebook associated with this project is available at this Kaggle link.

YOLOv8

YOLOv8

YOLOv8 is an Object Detection model based on the YOLO (You Only Look Once) neural network. It is the eighth version of this model, which has been improved to be faster and more accurate than its predecessors. Developed by Ultralytics, YOLOv8 offers significant improvements in terms of accuracy and speed compared to its previous versions.

Main Features of YOLOv8:

  • Real-time Detection: Capable of processing live videos to detect objects instantly.
  • High Accuracy: Uses advanced deep learning techniques to provide precise predictions.
  • Efficiency: Designed to be used even on machines with limited resources, such as laptops without a powerful GPU.

Advantages of YOLOv8:

  • Speed: Optimized for speed, which is crucial for applications like aerial surveillance.
  • Versatility: Can be applied to various types of objects and contexts, including aircraft detection in images and videos.
  • Ease of Use: Integrated with popular development tools and well-documented, making it easy to implement.

By using YOLOv8, this project aims to demonstrate how modern computer vision technologies can be effectively applied for detecting specific objects in images and videos, contributing to fields such as security and aerial surveillance.

Dataset

We have a rich dataset containing 14,500 images, each containing one or more aircraft. For each image, an associated CSV file provides detailed annotations of the aircraft present, including the coordinates of their positions (xmin, ymin, xmax, ymax) and their classification.

Example of a CSV File
filenamewidthheightclassxminyminxmaxymax
000aa01b25574f28b654718db0700f7220481365F358521771998503
000aa01b25574f28b654718db0700f7220481365JAS39169769549893
000aa01b25574f28b654718db0700f7220481365JAS391259084401009
000aa01b25574f28b654718db0700f7220481365B5227790112881177

This data provides a solid foundation for training and testing our YOLOv8 model, enabling it to accurately recognize various types of military aircraft in different contexts.

Sample Images from the Dataset

Here are some sample images from the dataset used to train and test the YOLOv8 model, along with their corresponding annotations:

C5

This image shows a C5 Galaxy aircraft, a heavy military transport aircraft used by the US Air Force. Military aircraft can have various shapes and sizes, making their detection and classification challenging for Object Detection models.

Mirage2000

This image shows a Mirage 2000 aircraft, a fighter jet designed by the French company Dassault Aviation in the late 1970s. The Mirage 2000 is primarily used by the French Air Force, which received 315 units, while 286 others were exported to eight different countries.

The dataset contains all types of images—some clearer than others, images with one or multiple aircraft, and others with aircraft from different classes. This allows us to test the model’s ability to detect and classify aircraft in varied contexts.

Model Training

To train a YOLOv8 model, it is essential to have a labeled dataset that will serve as the basis for the model’s learning. In this project, we used a dataset containing 14,500 images of military aircraft, each annotated with the coordinates of the aircraft present and their classification.

Train Validation Test Split

Before starting the training, we split our dataset into three distinct parts: a training set (70%), a validation set (15%), and a test set (15%). This division allows us to check the model’s performance on unseen data and ensure it generalizes well to images it has not seen before.

graph TD
    B1 -->|70%| B[Ensemble d'entraînement]
    C1 -->|15%| C[Ensemble de validation]
    D1 -->|15%| D[Ensemble de test]

    subgraph Dataset
        direction LR
        B1[████████████████████████████████████████████████████████████████████████████████████████████████████████]
        C1[██████████████████]
        D1[██████████████████]
    end

Training Steps

1. Model Configuration

A YAML configuration file is created to specify:

  • The paths to the training, validation, and test sets.
  • The number of classes to detect.
  • The class names.
Example Configuration:
1
2
3
4
5
train: ../data/train.txt
val: ../data/val.txt
test: ../data/test.txt
nc: 3
names: ['F35', 'JAS39', 'B52']

2. Model Architecture

YOLOv8 uses a convolutional neural network (CNN) architecture with several layers:

  • Convolutional layers: To extract features from the images.
  • Activation layers: To introduce non-linearity.
  • Pooling layers: To reduce dimensionality.
  • Prediction layers: To generate predictions for bounding boxes and classes.

Each layer is designed to capture specific information from the images and combine it to produce accurate predictions.

3. Training Process

During training, the model goes through the following steps:

  • Forward propagation: The image passes through the model’s layers, producing predictions.
  • Loss calculation: The difference between the predictions and the actual annotations is calculated. YOLO’s loss function combines classification errors, localization errors, and missing objects.
  • Backward propagation: The loss gradients are calculated and used to update the model’s weights through the gradient descent algorithm.
  • Validation: After each epoch, the model is evaluated on the validation set to adjust hyperparameters and prevent overfitting.

Model Validation

After training, the model is evaluated on the test set to measure its performance. The following metrics are used to assess the quality of the predictions:

  • Box(P): Precision of the bounding box detections.
  • R: Recall, measuring the model’s ability to find all relevant instances.
  • mAP50: Mean Average Precision at 50% IoU (Intersection over Union), measuring the average precision at a 50% IoU threshold.
  • mAP50-95: Mean Average Precision at different IoU thresholds, from 50% to 95%.

These metrics evaluate the model’s ability to detect and classify military aircraft with precision and recall, while minimizing false positives and false negatives.

Here is an example code to evaluate the model on the test set:
1
2
3
4
5
6
!yolo val \
model='/kaggle/input/yolov7-military-plane/yolov8-m-best.pt' \
data='/kaggle/working/ultralytics/data/mad.yaml'\
augment \
batch=12 \
imgsz=1280

This code is used to evaluate the model on the validation set, using the trained model and the parameters specified in the YAML configuration file.

Here are the results of the model evaluation on the validation set:

Validation Results (Scroll to see all classes)
ClassImagesInstancesBox(P)RmAP50m
all202434080.9440.860.9380.89
A1048820.9610.9270.9530.902
A400M46670.9380.8660.9510.883
AG60034350.9941.00.9950.981
AV8B37650.9630.9850.9860.97
B149670.9290.910.9510.912
B252680.950.8440.960.855
B5250640.9790.9220.9670.929
Be20032350.9811.00.9950.928
C130931800.870.9280.9450.885
C21071460.9660.9660.9910.973
C1759880.90.820.9270.844
C550500.9440.880.9470.925
E247670.9380.9050.9360.898
E716181.00.9760.9950.974
EF200056840.9290.7260.8850.832
F11735461.00.8220.9040.852
F1438680.9220.8240.8840.843
F151021960.90.9340.9570.914
F161352230.8890.780.8730.793
F18952070.9450.8650.9480.869
F2256900.9050.8420.9180.893
F351131470.9130.8620.9370.871
F458860.9440.8490.9160.854
JAS3951810.9350.8520.9430.885
MQ934360.8980.7320.8990.852
Mig3139650.9470.8250.9380.897
Mirage200030750.950.8530.870.841
P334630.9710.5290.7990.756
RQ452650.9180.8640.9480.869
Rafale59980.8870.8570.9310.89
SR7125420.8960.7860.9550.887
Su3448620.9370.8710.9530.907
Su5741720.9710.9290.9790.945
Tu16040540.9590.9260.970.921
Tu9526360.9650.7580.910.877
Tornado43630.9610.7740.9240.882
U238440.9760.9340.9870.956
US284900.9810.9440.9810.942
V22681000.9880.8470.9470.853
XB7020200.980.90.9380.88
YF2313180.9680.8330.9530.941
Vulcan46690.9610.7540.9110.851
J2047760.9060.7760.8960.856

Overall Performance:

The metrics show that the model has good precision (0.944) and recall (0.86).

  • The mAP50 and mAP50-95 values indicate high performance for most aircraft classes, with scores close to or above 0.9, demonstrating the model’s effectiveness in detecting aircraft in images.

  • However, it’s important to note that some classes have lower scores, which could be due to:

    • Variations in the training data.
    • Specific characteristics of certain aircraft.

Opportunities for Improvement:

These results suggest that we can further improve the model by:

  • Adjusting the hyperparameters.
  • Collecting more data for underrepresented classes.

Validation Results Visualization

To better understand the model’s performance, we can visualize the results of the predictions on the validation set.

Confusion Matrix

Confusion Matrix

The confusion matrix shows the model’s performance in classifying aircraft classes. The diagonal elements represent the true positives, while the off-diagonal elements represent the false positives and false negatives. The matrix provides insights into the model’s ability to distinguish between different aircraft classes and identify common errors.

Precision-Recall Curve

Precision-Recall Curve

The precision-recall curve illustrates the trade-off between precision and recall for different classification thresholds. A higher area under the curve (AUC) indicates better performance, with a balance between precision and recall. The curve helps evaluate the model’s ability to detect aircraft while minimizing false positives.

Testing the Model on Images

Test on Test Dataset

After training and validating the model, we can test it on real images to evaluate its performance under real-world conditions. Here is an example of code to test the model on some images from the test dataset

Click to see the code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
from ultralytics import YOLO
import cv2
import matplotlib.pyplot as plt
import glob
import random
import pandas as pd

model = YOLO('/kaggle/input/yolov7-military-plane/yolov8-m-best.pt')

test_image_paths = sorted(glob.glob('/kaggle/working/ultralytics/data/test/images/*.jpg'))
test_annotation_paths = sorted(glob.glob('/kaggle/working/ultralytics/data/test/labels/*.txt'))

sampled_indices = random.sample(range(len(test_image_paths)), 5)
sampled_image_paths = [test_image_paths[i] for i in sampled_indices]
sampled_annotation_paths = [test_annotation_paths[i] for i in sampled_indices]

results = model(sampled_image_paths)

for img_path, ann_path, result in zip(sampled_image_paths, sampled_annotation_paths, results):
    image = cv2.imread(img_path)[:, :, ::-1]
    plt.figure(figsize=(10, 10))
    plt.imshow(image)
    ax = plt.gca()

    with open(ann_path, 'r') as f:
        annotations = f.readlines()
    
    print(f"Real Annotations for {img_path}:")
    for annotation in annotations:
        class_num, x_center, y_center, b_width, b_height = map(float, annotation.split())
        class_name = model.names[int(class_num)]
        print(f"Class: {class_name}")

    for box in result.boxes:
        x1, y1, x2, y2 = box.xyxy[0].cpu().numpy() 
        rect = plt.Rectangle((x1, y1), x2 - x1, y2 - y1, fill=False, color='red')
        ax.add_patch(rect)
        plt.text(x1, y1, model.names[int(box.cls[0])], color='white', fontsize=12, bbox=dict(facecolor='red', alpha=0.5))

    plt.axis('off')
    plt.show()

The code loads the trained YOLOv8 model, selects a few random images from the test dataset, makes predictions on these images, and displays the results. The ground truth annotations are also printed for comparison with the model’s predictions.

Test on Real Images

In addition to the test dataset images, we can also test the model on out-of-sample images to assess its ability to generalize to new data. I selected an image of a Rafale from Google Images to test the model.

Here is the Rafale image used for the test and the model’s prediction:

Rafale

We can see that the model correctly detected and classified the two Rafale aircraft in the image, with accurate bounding boxes and correct class predictions.

Testing the Model on Videos

In addition to images, the YOLOv8 model can also be used to detect aircraft in videos. I tested the model on a promotional video of the Rafale on YouTube. Here is the link to the video: Rafale Video

After downloading the video, I extracted frames at regular intervals and used the YOLOv8 model to detect the aircraft in each frame. Here is an example of code to detect aircraft in a video:

Click to see the code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
import cv2
import numpy as np
from ultralytics import YOLO

F22_video_path = "/kaggle/input/videos/F-22 Raptor with Wall of Fire - Dayton Air Show 72323.mp4"

def process_video(video_path, output_path):
    cap = cv2.VideoCapture(video_path)
    
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = int(cap.get(cv2.CAP_PROP_FPS))
    new_width = 640
    new_height = int(new_width * height / width)

    fourcc = cv2.VideoWriter_fourcc(*'VP90') 
    out = cv2.VideoWriter(output_path, fourcc, fps, (new_width, new_height))

    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        
        resized_frame = cv2.resize(frame, (new_width, new_height))

        results = model(resized_frame)

        annotated_frame = results[0].plot()

        out.write(annotated_frame)

    cap.release()
    out.release()

output_path = '/kaggle/working/annotated_F22_video.mp4'

process_video(F22_video_path, output_path)

Here is an example of the video annotated with YOLOv8 detections:

The video shows YOLOv8 detections on the Rafale presentation video, with bounding boxes and class predictions for each detected aircraft. The model is capable of detecting moving aircraft in the video, demonstrating its ability to process video sequences in real-time.

However, we can observe that the model sometimes struggles to detect aircraft when they are partially hidden or seen from behind. For example, in the video, the model has difficulty detecting Rafale aircraft when they are viewed from behind, and it tends to confuse them with other aircraft, particularly the EF2000 and the Tornado.

This highlights the importance of training data quality and the diversity of examples to improve the model’s performance under various conditions.

We can try the model on another video where the aircraft visibility is lower to see if the model can still detect them.

This is an example of a video annotated with YOLOv8 detections on a F22 presentation video. The model performed relatively well in detecting the aircraft in the video, but it encountered difficulties in correctly classifying the aircraft. It seems that the model confused the F22 with other aircraft, notably the F35. This may be due to similarities in the visual characteristics of the aircraft or variations in the training data.

By looking at the results of the model on the videos, we can see that the model is capable of detecting aircraft in videos, but it could be better by hadding a temporal component to the model to improve the tracking of the aircraft, so the model can better understand the context of the video and track the aircraft more accurately, by looking at the previous probabilities of the classes.

Conclusion

In this project, I used a YOLOv8 model to detect and classify military aircraft in images and videos. The model was trained on a dataset containing 14,500 images of military aircraft, with detailed annotations for each image.

After training and validation, the model demonstrated strong performance in terms of precision and recall, with high scores for most aircraft classes. The tests on real images and videos also showed that the model is capable of detecting aircraft under various conditions, although it may sometimes face challenges with partially hidden aircraft or when they are captured from unusual angles.

This post is licensed under CC BY 4.0 by the author.