commit 98103806ef

112  GUIDE_TRAIN_TEST_YOLO.md  Normal file
@@ -0,0 +1,112 @@
# YOLOv8 Training & Testing Guide

This guide details how to prepare your dataset, train the YOLOv8 model for potholes and road signs, and test the trained model.

## 1. Dataset Preparation

YOLOv8 requires data in a specific format.

### A. Data Structure

Organize your dataset folder like this:
```
datasets/
    road_signs_potholes/
        train/
            images/
                img1.jpg
                ...
            labels/
                img1.txt
                ...
        val/
            images/
                ...
            labels/
                ...
        data.yaml
```
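If your images and labels start out in one flat folder, a small script can shuffle them into the `train/` and `val/` layout above. This is a minimal stdlib-only sketch, not part of the repo; the function name and paths are hypothetical:

```python
import random
import shutil
from pathlib import Path

def split_dataset(src_images: str, src_labels: str, dst_root: str,
                  val_ratio: float = 0.2, seed: int = 0):
    """Copy image/label pairs into train/ and val/ subfolders of dst_root."""
    images = sorted(p for p in Path(src_images).iterdir()
                    if p.suffix.lower() in {".jpg", ".jpeg", ".png", ".bmp"})
    random.Random(seed).shuffle(images)  # deterministic shuffle for reproducibility
    n_val = int(len(images) * val_ratio)
    for i, img in enumerate(images):
        split = "val" if i < n_val else "train"
        # Copy the image and, if present, its matching .txt label
        for sub, src in (("images", img),
                         ("labels", Path(src_labels) / f"{img.stem}.txt")):
            dst = Path(dst_root) / split / sub
            dst.mkdir(parents=True, exist_ok=True)
            if src.exists():
                shutil.copy2(src, dst / src.name)
```

With `val_ratio=0.2`, roughly 20% of the pairs land in `val/` and the rest in `train/`, matching the 80/20 split used in this project.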
### B. Label Format

Each `.txt` file in `labels/` corresponds to an image.
Format: `class_id center_x center_y width height` (normalized 0-1).

Example `img1.txt`:

```
0 0.5 0.5 0.2 0.3
1 0.1 0.1 0.05 0.1
```

**Class ID Mapping (Example):**

- `0`: Traffic Sign
- `1`: Pothole
- `2`: Manhole
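The arithmetic behind those normalized values is simple: divide the box center and size by the image dimensions. A tiny sketch (the helper name is ours, not part of the repo):

```python
def to_yolo(x, y, w, h, img_w, img_h):
    """Convert a pixel-space box (top-left x, y, width, height) to
    normalized YOLO format (center_x, center_y, width, height)."""
    return ((x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h)

# A 200x300 box with its top-left corner at (400, 350) in a 1000x1000 image
# reproduces the first example line above:
print(to_yolo(400, 350, 200, 300, 1000, 1000))  # (0.5, 0.5, 0.2, 0.3)
```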
### C. Creating `data.yaml`

Create a `data.yaml` file inside your dataset folder (or anywhere accessible).

```yaml
path: ../datasets/road_signs_potholes # dataset root dir
train: train/images # train images (relative to 'path')
val: val/images # val images (relative to 'path')

# Classes
names:
  0: Traffic Sign
  1: Pothole
  2: Manhole
```
## 2. Training the Model

We have provided a script `backend/models/train_yolo.py`.

**Command:**
Open your terminal in `d:\Time-Pass-Projects\pothole-roadsign detection`.

```bash
# Activate your environment if needed
python backend/models/train_yolo.py
```

_Note: You will need to edit `backend/models/train_yolo.py` slightly to point to your actual `data.yaml` path if you haven't already, or pass it as an argument if you modify the script to accept args._

**Training Process:**

1. The script downloads `yolov8n.pt` (nano) as a starting point.
2. It runs for 50 epochs (adjustable).
3. **Result:** Weights are saved in `runs/detect/train/weights/best.pt`.
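If you take the "accept args" route mentioned in the note, a minimal `argparse` layer is enough; this is a sketch of what could be added to `train_yolo.py`, not code that ships with the repo:

```python
import argparse

def parse_args(argv=None):
    """Minimal CLI so the data.yaml path and epoch count need not be hard-coded."""
    parser = argparse.ArgumentParser(description="Train YOLOv8 on a custom dataset")
    parser.add_argument("--data", default="../datasets/road_signs_potholes/data.yaml",
                        help="path to data.yaml")
    parser.add_argument("--epochs", type=int, default=50,
                        help="number of training epochs")
    return parser.parse_args(argv)

# In train_yolo.py's __main__ you would then call:
# args = parse_args()
# train_yolo(args.data, epochs=args.epochs)
```

After that, `python backend/models/train_yolo.py --data path/to/data.yaml --epochs 100` works without editing the file.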
## 3. Testing the Model

Once trained, you should test it visually.

### A. Locate your specific model

Find `runs/detect/trainX/weights/best.pt`.
Copy this file to `backend/models/best.pt` (or update the paths in the scripts).

### B. Run the Test Script

We have created `backend/test_model.py` for quick verification.

```bash
python backend/test_model.py
```

_Make sure to update the `video_path` or `image_path` in `test_model.py` to point to a real file._

## 4. Integration

After you are satisfied with `best.pt`:

1. Move `best.pt` to `backend/models/`.
2. Update `backend/pipelines/video_processor.py` line 10:

```python
self.yolo = YOLOManager("backend/models/best.pt")
```
1  backend/.gitignore  vendored  Normal file
@@ -0,0 +1 @@
/venv
43  backend/README.md  Normal file
@@ -0,0 +1,43 @@
# Pothole & Road Sign Detection Backend

This project implements a two-stage detection pipeline using YOLOv8 (for detection & tracking) and CLIP (for zero-shot classification).

## Setup

1. **Install Dependencies**:
   ```bash
   pip install -r backend/requirements.txt
   ```
2. **Training YOLO (Crucial Step)**:
   You MUST train a YOLO model on your custom dataset (potholes, manholes, traffic signs) for this to work effectively.
   - Prepare your `dataset.yaml`.
   - Run the training script:
     ```python
     from backend.models.train_yolo import train_yolo
     train_yolo("path/to/dataset.yaml", epochs=50)
     ```
   - This will generate `best.pt`. Move this file to `backend/models/best.pt`.

## Running the API

Start the FastAPI server:

```bash
cd backend
python main.py
```

The server will start at `http://0.0.0.0:8000`.

## API Usage

**Endpoint**: `POST /detect/video`

- **Body**: `multipart/form-data`, key `file` (video file).
- **Response**: JSON summary of unique objects detected.
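A client call to the endpoint above can be made with the Python standard library alone; this is a hedged sketch (the helper names are ours), hand-rolling the `multipart/form-data` body so no third-party HTTP client is needed:

```python
import uuid
import urllib.request

def build_multipart(field: str, filename: str, data: bytes,
                    content_type: str = "video/mp4"):
    """Build a multipart/form-data body and its Content-Type header value."""
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode() + data + f"\r\n--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"

def detect_video(path: str, url: str = "http://localhost:8000/detect/video") -> bytes:
    """POST a video file to the detection endpoint; returns the raw JSON bytes."""
    with open(path, "rb") as f:
        body, ctype = build_multipart("file", path, f.read())
    req = urllib.request.Request(url, data=body, headers={"Content-Type": ctype})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

# Example (requires the server to be running):
# print(detect_video("test_video.mp4"))
```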
## Configuration

- Modify `backend/pipelines/video_processor.py` to change the `yolo_model_path` to your trained model path (e.g., `backend/models/best.pt`).
- You can also adjust the CLIP candidate labels in `VideoProcessor.__init__`.
59  backend/main.py  Normal file
@@ -0,0 +1,59 @@
from fastapi import FastAPI, File, UploadFile, HTTPException
import shutil
import os
import uuid
from backend.pipelines.video_processor import VideoProcessor

app = FastAPI(title="Pothole & Road Sign Detection API")

# Initialize the processor (loading models takes time, so do it on startup).
# In production, use lifespan events or dependency injection.
print("Initializing Video Processor...")
try:
    processor = VideoProcessor()
except Exception as e:
    print(f"Warning: Could not initialize processor (check model paths). Error: {e}")
    processor = None

UPLOAD_DIR = "uploads"
os.makedirs(UPLOAD_DIR, exist_ok=True)

@app.get("/")
def health_check():
    return {"status": "running", "models_loaded": processor is not None}

@app.post("/detect/video")
async def detect_video(file: UploadFile = File(...)):
    if processor is None:
        raise HTTPException(status_code=503, detail="Models not loaded.")

    # Save the uploaded file
    file_id = str(uuid.uuid4())
    file_location = os.path.join(UPLOAD_DIR, f"{file_id}_{file.filename}")

    with open(file_location, "wb") as buffer:
        shutil.copyfileobj(file.file, buffer)

    try:
        # Run processing
        results = processor.process_video(file_location)

        # Cleanup
        os.remove(file_location)

        return {
            "video_id": file_id,
            "processed": True,
            "unique_objects": results
        }
    except Exception as e:
        # Cleanup on error
        if os.path.exists(file_location):
            os.remove(file_location)
        raise HTTPException(status_code=500, detail=str(e))

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
62  backend/models/clip_manager.py  Normal file
@@ -0,0 +1,62 @@
from transformers import CLIPProcessor, CLIPModel
import torch
from PIL import Image

class CLIPManager:
    def __init__(self, model_id: str = "openai/clip-vit-base-patch32"):
        """
        Initializes the CLIP model and processor.

        Args:
            model_id (str): Hugging Face model ID.
        """
        print(f"Loading CLIP model: {model_id}...")
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model = CLIPModel.from_pretrained(model_id).to(self.device)
        self.processor = CLIPProcessor.from_pretrained(model_id)
        print(f"CLIP loaded on {self.device}.")

    def classify_image(self, image: Image.Image, candidate_labels: list[str]):
        """
        Classifies an image against a list of text labels.

        Args:
            image (PIL.Image): The cropped image to classify.
            candidate_labels (list[str]): List of strings to compare against.

        Returns:
            dict: {label: score} sorted by confidence.
        """
        if not candidate_labels:
            return {}

        inputs = self.processor(text=candidate_labels, images=image, return_tensors="pt", padding=True).to(self.device)

        with torch.no_grad():
            outputs = self.model(**inputs)

        logits_per_image = outputs.logits_per_image  # image-text similarity scores
        probs = logits_per_image.softmax(dim=1)  # softmax gives the label probabilities

        # Convert to a dictionary
        scores = probs.cpu().numpy()[0]
        result = {label: float(score) for label, score in zip(candidate_labels, scores)}

        # Sort by score, descending
        sorted_result = dict(sorted(result.items(), key=lambda item: item[1], reverse=True))
        return sorted_result

    def get_best_match(self, image: Image.Image, candidate_labels: list[str], threshold: float = 0.5):
        """
        Returns the single best match if it exceeds the threshold.
        """
        results = self.classify_image(image, candidate_labels)
        if not results:
            return None, 0.0

        best_label = list(results.keys())[0]
        best_score = list(results.values())[0]

        if best_score >= threshold:
            return best_label, best_score
        return "Uncertain", best_score
39  backend/models/train_yolo.py  Normal file
@@ -0,0 +1,39 @@
from ultralytics import YOLO

def train_yolo(data_yaml_path: str, model_size: str = "yolov8n.pt", epochs: int = 50):
    """
    Trains a YOLOv8 model on a custom dataset.

    Args:
        data_yaml_path (str): Path to the dataset.yaml file.
        model_size (str): Pre-trained model to start from (e.g., yolov8n.pt, yolov8s.pt).
        epochs (int): Number of training epochs.
    """
    print(f"Loading {model_size}...")
    model = YOLO(model_size)

    print(f"Starting training for {epochs} epochs using {data_yaml_path}...")
    model.train(data=data_yaml_path, epochs=epochs, imgsz=640)

    print("Training complete. Validating...")
    metrics = model.val()
    print(f"Validation metrics: {metrics}")

    print("Exporting model...")
    path = model.export(format="onnx")
    print(f"Model exported to {path}")

if __name__ == "__main__":
    # Example usage:
    # Ensure you have a data.yaml file configured for your dataset
    # train_yolo("path/to/data.yaml")

    # Relative path, assuming the script is run from the 'backend' folder,
    # so the dataset sits at '../datasets/...'
    dataset_path = "../datasets/road_signs_potholes/data.yaml"

    # Or an absolute path if needed:
    # dataset_path = "d:/Time-Pass-Projects/pothole-roadsign detection/datasets/road_signs_potholes/data.yaml"

    print(f"Using dataset: {dataset_path}")
    train_yolo(dataset_path, epochs=100)  # increased epochs for better results on small data
35  backend/models/yolo_manager.py  Normal file
@@ -0,0 +1,35 @@
from ultralytics import YOLO

class YOLOManager:
    def __init__(self, model_path: str = "yolov8n.pt"):
        """
        Initializes the YOLO model for inference.

        Args:
            model_path (str): Path to the trained YOLO model weights (.pt file).
        """
        print(f"Loading YOLO model from {model_path}...")
        self.model = YOLO(model_path)

    def track(self, frame, conf: float = 0.25, iou: float = 0.5):
        """
        Runs YOLO tracking on a single frame.

        Args:
            frame: Numpy array (image).
            conf (float): Confidence threshold.
            iou (float): IoU threshold.

        Returns:
            Results object from Ultralytics.
        """
        # persist=True is crucial for tracking to work across frames
        results = self.model.track(frame, persist=True, conf=conf, iou=iou, tracker="bytetrack.yaml", verbose=False)
        return results[0]

    def detect(self, frame):
        """Standard detection without tracking."""
        results = self.model.predict(frame, verbose=False)
        return results[0]
165  backend/pipelines/video_processor.py  Normal file
@@ -0,0 +1,165 @@
import cv2
from backend.models.yolo_manager import YOLOManager
from backend.models.clip_manager import CLIPManager
from backend.utils.image_utils import is_blurry, crop_image, convert_cv2_to_pil

class VideoProcessor:
    def __init__(self, yolo_model_path="yolov8n.pt", clip_model_id="openai/clip-vit-base-patch32"):
        self.yolo = YOLOManager(yolo_model_path)
        self.clip = CLIPManager(clip_model_id)

        # Buffer storing the best shot for each track ID.
        # Format: {track_id: {'crop': np.array, 'area': float, 'frame_idx': int, 'bbox': list}}
        self.active_tracks = {}

        # Store final results
        self.final_results = []

        # CLIP candidate labels
        self.pothole_labels = ["pothole", "shadow", "patch work", "manhole", "road crack"]
        self.sign_labels = ["stop sign", "yield sign", "speed limit 30", "speed limit 40", "speed limit 50", "speed limit 60", "pedestrian crossing", "no u-turn", "traffic light", "keep right"]

        # Frame counter
        self.frame_count = 0

    def process_video(self, video_path: str):
        cap = cv2.VideoCapture(video_path)
        if not cap.isOpened():
            print(f"Error opening video: {video_path}")
            return []

        width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

        print(f"Processing video {video_path} ({width}x{height})...")

        while cap.isOpened():
            ret, frame = cap.read()
            if not ret:
                break

            self.frame_count += 1

            # 1. Run YOLO tracking
            results = self.yolo.track(frame)

            if results.boxes is None or results.boxes.id is None:
                continue

            boxes = results.boxes.xyxy.cpu().numpy()
            track_ids = results.boxes.id.cpu().numpy()
            class_ids = results.boxes.cls.cpu().numpy()  # 0, 1, 2 depending on training

            current_frame_ids = set()

            for box, track_id, cls in zip(boxes, track_ids, class_ids):
                track_id = int(track_id)
                current_frame_ids.add(track_id)

                x1, y1, x2, y2 = box
                w_box = x2 - x1
                h_box = y2 - y1
                area = w_box * h_box

                # Check whether this is the "best shot" so far
                if track_id not in self.active_tracks:
                    self.active_tracks[track_id] = {
                        'crop': crop_image(frame, box),
                        'area': area,
                        'frame_idx': self.frame_count,
                        'class_id': int(cls),
                        'bbox': box,
                        'processed': False
                    }
                else:
                    # Update if the area is bigger and the track is not processed yet
                    if area > self.active_tracks[track_id]['area'] and not self.active_tracks[track_id]['processed']:
                        self.active_tracks[track_id].update({
                            'crop': crop_image(frame, box),
                            'area': area,
                            'frame_idx': self.frame_count,
                            'bbox': box
                        })

                # Trigger classification if the object is near the edge
                # (leaving the frame), using a 50-pixel margin
                if x1 < 50 or y1 < 50 or x2 > width - 50 or y2 > height - 50:
                    self._classify_and_store(track_id)

            # Cleanup tracks that are no longer present (exited the frame).
            # Note: real ByteTrack keeps 'lost' tracks alive for a few frames;
            # for simplicity we classify a track as soon as it disappears from
            # the current frame. We iterate over a copy of the keys to avoid a
            # RuntimeError from mutating the dict while looping.
            for tid in list(self.active_tracks.keys()):
                if tid not in current_frame_ids:
                    # It's gone from view (or mostly gone)
                    self._classify_and_store(tid)
                    # Remove from active tracks to save memory
                    if self.active_tracks[tid].get('processed'):
                        del self.active_tracks[tid]

        cap.release()

        # Process any remaining tracks
        for tid in list(self.active_tracks.keys()):
            self._classify_and_store(tid)

        print("Processing complete.")
        return self.final_results

    def _classify_and_store(self, track_id):
        track_data = self.active_tracks.get(track_id)
        if not track_data or track_data.get('processed'):
            return

        crop = track_data['crop']

        # Blur check: if the crop is too blurry we could skip it or mark it
        # low confidence. For now we process it anyway.
        # if is_blurry(crop): ...

        # Prepare for CLIP
        pil_image = convert_cv2_to_pil(crop)

        # Which label list to use could be decided from the YOLO class
        # (e.g. 0=Sign, 1=Pothole) if we trust it; that mapping depends on
        # your training. Here we simply classify against both lists and
        # take the highest-confidence label.
        candidates = self.sign_labels + self.pothole_labels
        best_label, score = self.clip.get_best_match(pil_image, candidates, threshold=0.5)

        obj_type = "Traffic Sign" if best_label in self.sign_labels else "Road Damage"

        result = {
            "id": track_id,
            "type": obj_type,
            "subtype": best_label,
            "confidence": float(score),
            "frame_idx": track_data['frame_idx'],
            # In a real app, you might save the crop to disk and return a URL
            # "crop_path": save_to_disk...
        }

        self.final_results.append(result)
        self.active_tracks[track_id]['processed'] = True

if __name__ == "__main__":
    # Test run
    processor = VideoProcessor()
    # processor.process_video("test_video.mp4")
11  backend/requirements.txt  Normal file
@@ -0,0 +1,11 @@
ultralytics
transformers
torch
fastapi
uvicorn
opencv-python-headless
pillow
numpy
ftfy
regex
tqdm
109  backend/runs/detect/train/args.yaml  Normal file
@@ -0,0 +1,109 @@
task: detect
mode: train
model: yolov8n.pt
data: ../datasets/road_signs_potholes/data.yaml
epochs: 100
time: null
patience: 100
batch: 16
imgsz: 640
save: true
save_period: -1
cache: false
device: cpu
workers: 8
project: null
name: train
exist_ok: false
pretrained: true
optimizer: auto
verbose: true
seed: 0
deterministic: true
single_cls: false
rect: false
cos_lr: false
close_mosaic: 10
resume: false
amp: true
fraction: 1.0
profile: false
freeze: null
multi_scale: 0.0
compile: false
overlap_mask: true
mask_ratio: 4
dropout: 0.0
val: true
split: val
save_json: false
conf: null
iou: 0.7
max_det: 300
half: false
dnn: false
plots: true
end2end: null
source: null
vid_stride: 1
stream_buffer: false
visualize: false
augment: false
agnostic_nms: false
classes: null
retina_masks: false
embed: null
show: false
save_frames: false
save_txt: false
save_conf: false
save_crop: false
show_labels: true
show_conf: true
show_boxes: true
line_width: null
format: torchscript
keras: false
optimize: false
int8: false
dynamic: false
simplify: true
opset: null
workspace: null
nms: false
lr0: 0.01
lrf: 0.01
momentum: 0.937
weight_decay: 0.0005
warmup_epochs: 3.0
warmup_momentum: 0.8
warmup_bias_lr: 0.1
box: 7.5
cls: 0.5
dfl: 1.5
pose: 12.0
kobj: 1.0
rle: 1.0
angle: 1.0
nbs: 64
hsv_h: 0.015
hsv_s: 0.7
hsv_v: 0.4
degrees: 0.0
translate: 0.1
scale: 0.5
shear: 0.0
perspective: 0.0
flipud: 0.0
fliplr: 0.5
bgr: 0.0
mosaic: 1.0
mixup: 0.0
cutmix: 0.0
copy_paste: 0.0
copy_paste_mode: flip
auto_augment: randaugment
erasing: 0.4
cfg: null
tracker: botsort.yaml
save_dir: D:\Time-Pass-Projects\pothole-roadsign detection\backend\runs\detect\train
71  backend/test_model.py  Normal file
@@ -0,0 +1,71 @@
from backend.models.yolo_manager import YOLOManager
import cv2
import os

def test_model(model_path="backend/models/best.pt", source="test_video.mp4"):
    """
    Tests the YOLO model on a video or image.
    """
    if not os.path.exists(model_path):
        print(f"Model not found at {model_path}. Using standard yolov8n.pt for demo.")
        model_path = "yolov8n.pt"

    yolo = YOLOManager(model_path)

    # Check whether the source is an image or a video
    # (str() so that the webcam index 0 does not break splitext)
    ext = os.path.splitext(str(source))[1].lower()
    if ext in ['.jpg', '.jpeg', '.png', '.bmp']:
        frame = cv2.imread(source)
        if frame is None:
            print(f"Could not read image: {source}")
            return

        results = yolo.detect(frame)
        res_plotted = results.plot()
        cv2.imshow("YOLO Detection", res_plotted)
        cv2.waitKey(0)
        cv2.destroyAllWindows()

    else:
        # Video (or webcam index)
        cap = cv2.VideoCapture(source)
        if not cap.isOpened():
            print(f"Could not open video: {source}")
            return

        print("Press 'q' to exit.")
        while True:
            ret, frame = cap.read()
            if not ret:
                break

            # Use 'track' or 'detect'
            results = yolo.track(frame)

            # Plot results on the frame
            annotated_frame = results.plot()

            cv2.imshow("YOLO Tracking", annotated_frame)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break

        cap.release()
        cv2.destroyAllWindows()

if __name__ == "__main__":
    # CHANGE THIS to your test file
    TEST_FILE = "d:/path/to/your/test/video_or_image.jpg"

    if not os.path.exists(TEST_FILE):
        if TEST_FILE == "0":
            # Webcam
            test_model(source=0)
        else:
            print(f"File {TEST_FILE} not found.")
            TEST_FILE = input("Enter path to image/video (or 0 for webcam): ").strip('"')
            if TEST_FILE == "0":
                test_model(source=0)
            else:
                test_model(source=TEST_FILE)
    else:
        test_model(source=TEST_FILE)
52  backend/utils/image_utils.py  Normal file
@@ -0,0 +1,52 @@
import cv2
import numpy as np
from PIL import Image

def is_blurry(image: np.ndarray, threshold: float = 100.0) -> bool:
    """
    Checks if an image is blurry using the Laplacian variance method.

    Args:
        image (np.ndarray): The image to check (BGR format).
        threshold (float): The variance threshold below which the image is considered blurry.

    Returns:
        bool: True if blurry, False otherwise.
    """
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    variance = cv2.Laplacian(gray, cv2.CV_64F).var()
    return variance < threshold

def crop_image(frame: np.ndarray, bbox: list[float], padding_percent: float = 0.1) -> np.ndarray:
    """
    Crops an image based on a bounding box with optional padding.

    Args:
        frame (np.ndarray): The full image frame.
        bbox (list): [x1, y1, x2, y2]
        padding_percent (float): Percentage of padding to add around the box.

    Returns:
        np.ndarray: The cropped image.
    """
    h, w, _ = frame.shape
    x1, y1, x2, y2 = bbox

    width = x2 - x1
    height = y2 - y1

    pad_w = width * padding_percent
    pad_h = height * padding_percent

    # Apply padding, ensuring we stay within the frame boundaries
    new_x1 = max(0, int(x1 - pad_w))
    new_y1 = max(0, int(y1 - pad_h))
    new_x2 = min(w, int(x2 + pad_w))
    new_y2 = min(h, int(y2 + pad_h))

    return frame[new_y1:new_y2, new_x1:new_x2]

def convert_cv2_to_pil(cv2_image: np.ndarray) -> Image.Image:
    """Converts a cv2 BGR image to a PIL RGB image."""
    color_converted = cv2.cvtColor(cv2_image, cv2.COLOR_BGR2RGB)
    return Image.fromarray(color_converted)
66  backend/utils/label_helper.py  Normal file
@@ -0,0 +1,66 @@
import cv2
import sys
import os

def get_yolo_coordinates(image_path):
    """
    Opens an image, lets the user draw a box, and prints the YOLO format coordinates.
    """
    if not os.path.exists(image_path):
        print(f"Error: File {image_path} not found.")
        return

    # Load image
    img = cv2.imread(image_path)
    if img is None:
        print("Error: Could not read image.")
        return

    height, width, _ = img.shape

    print("---------------------------------------------------------")
    print(f"Loaded {image_path} ({width}x{height})")
    print("INSTRUCTIONS:")
    print("1. Draw a box around the object using your mouse.")
    print("2. Press ENTER or SPACE to confirm the box.")
    print("3. Press 'c' to cancel.")
    print("---------------------------------------------------------")

    # Select ROI (fromCenter=False, showCrosshair=True)
    r = cv2.selectROI("Draw Box (Press Enter to Confirm)", img, fromCenter=False, showCrosshair=True)
    cv2.destroyAllWindows()

    # r is (x, y, w, h) in pixels
    x_pixel, y_pixel, w_pixel, h_pixel = r

    if w_pixel == 0 or h_pixel == 0:
        print("No box selected.")
        return

    # Convert to normalized YOLO format: center_x, center_y, width, height
    center_x = (x_pixel + (w_pixel / 2)) / width
    center_y = (y_pixel + (h_pixel / 2)) / height
    norm_w = w_pixel / width
    norm_h = h_pixel / height

    # Limit precision to 6 decimal places
    print("\nSUCCESS! Here is your YOLO label line:")
    print("---------------------------------------------------------")
    print(f"<class_id> {center_x:.6f} {center_y:.6f} {norm_w:.6f} {norm_h:.6f}")
    print("---------------------------------------------------------")
    print("Replace <class_id> with:")
    print("0 -> if it is a Traffic Sign")
    print("1 -> if it is a Pothole")
    print("2 -> if it is a Manhole")
    print("---------------------------------------------------------")

if __name__ == "__main__":
    if len(sys.argv) > 1:
        path = sys.argv[1]
    else:
        path = input("Enter the path to your image: ").strip('"')

    get_yolo_coordinates(path)
||||
BIN
backend/yolov8n.pt
Normal file
BIN
backend/yolov8n.pt
Normal file
Binary file not shown.
31  datasets/road_signs_potholes/README.md  Normal file
@@ -0,0 +1,31 @@
# Dataset Structure Guide

This folder contains the structure required for YOLOv8 training.

## What goes where?

1. **Images**:
   - Put your training images (80% of data) in: `train/images/`
   - Put your validation images (20% of data) in: `val/images/`
   - Supported formats: `.jpg`, `.png`, `.bmp`.

2. **Labels**:
   - For every image `image1.jpg`, you need a text file `image1.txt` in the corresponding `labels/` folder.
   - Example:
     - `train/images/road_01.jpg`
     - `train/labels/road_01.txt`

3. **data.yaml**:
   - This file configures the dataset paths and class names.
   - It is the entry point for the training script.

## Label Format

YOLO expects a `.txt` file with one line per object:
`<class_id> <x_center> <y_center> <width> <height>`

- **class_id**: Integer (0, 1, 2...) from `data.yaml`.
- **coordinates**: Normalized between 0 and 1.

Example:
`0 0.5 0.5 0.2 0.4` -> Class 0, centered in the middle, 20% width, 40% height.
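Since YOLO silently treats images without a label file as background, it can help to check the image/label pairing before training. A minimal stdlib sketch (the function name is ours, not part of the repo):

```python
from pathlib import Path

def find_unlabeled(images_dir: str, labels_dir: str) -> list[str]:
    """Return stems of images in images_dir with no matching .txt in labels_dir."""
    label_stems = {p.stem for p in Path(labels_dir).glob("*.txt")}
    exts = {".jpg", ".jpeg", ".png", ".bmp"}
    return sorted(p.stem for p in Path(images_dir).iterdir()
                  if p.suffix.lower() in exts and p.stem not in label_stems)

# Example, run from this folder:
# print(find_unlabeled("train/images", "train/labels"))
```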
16  datasets/road_signs_potholes/data.yaml  Normal file
@@ -0,0 +1,16 @@
# Train/Val/Test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: D:/Time-Pass-Projects/pothole-roadsign detection/datasets/road_signs_potholes # dataset root dir
train: train/images
val: val/images

# Classes
names:
  0: Traffic Sign
  1: Pothole
  2: Manhole

# Key:
# data.yaml is the "map" for YOLO.
# 1. It tells YOLO where to find the images for training and validation.
# 2. It tells YOLO how many classes there are and what their names are (0, 1, 2...).
# 3. YOLO reads this file first to verify everything exists.
BIN  datasets/road_signs_potholes/train/images/40.png  Normal file
Binary file not shown. After Width: | Height: | Size: 29 KiB
BIN  datasets/road_signs_potholes/train/images/60.png  Normal file
Binary file not shown. After Width: | Height: | Size: 73 KiB
BIN  datasets/road_signs_potholes/train/images/manhole.png  Normal file
Binary file not shown. After Width: | Height: | Size: 88 KiB
Binary file not shown. After Width: | Height: | Size: 51 KiB
BIN  datasets/road_signs_potholes/train/images/pothole1.png  Normal file
Binary file not shown. After Width: | Height: | Size: 431 KiB
BIN  datasets/road_signs_potholes/train/images/pothole2.png  Normal file
Binary file not shown. After Width: | Height: | Size: 222 KiB
1  datasets/road_signs_potholes/train/labels/40.txt  Normal file
@@ -0,0 +1 @@
0 0.481481 0.488739 0.757202 0.815315
1  datasets/road_signs_potholes/train/labels/60.txt  Normal file
@@ -0,0 +1 @@
0 0.492045 0.492081 0.979545 0.979638
1  datasets/road_signs_potholes/train/labels/manhole.txt  Normal file
@@ -0,0 +1 @@
2 0.485294 0.511561 0.858289 0.734104
@@ -0,0 +1 @@
0 0.470684 0.442231 0.700326 0.749004
1  datasets/road_signs_potholes/train/labels/pothole1.txt  Normal file
@@ -0,0 +1 @@
1 0.511981 0.741117 0.321086 0.172589
1  datasets/road_signs_potholes/train/labels/pothole2.txt  Normal file
@@ -0,0 +1 @@
1 0.679221 0.808594 0.454545 0.296875
1  datasets/road_signs_potholes/val/labels/road_003.txt  Normal file
@@ -0,0 +1 @@
0 0.4 0.4 0.15 0.15