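Object Detection with YOLOv8 (Open Images V7): Source Code, Explanation, and User Guide

[Overview] A real-time object detection program built on YOLOv8 and the Open Images V7 dataset. It can identify 600 object classes and detects and tracks objects in video files or camera footage. CLAHE preprocessing improves visibility in low-light scenes, the ByteTrack algorithm keeps tracking through occlusions, and TTA (test-time augmentation) is implemented with the aim of improving detection accuracy. Five model sizes (n, s, m, l, x) are available, and detection results are saved automatically.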

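Program User Guide

When to use this program

This tool automatically identifies and tracks a wide variety of objects in video files or camera footage in real time. Typical uses include traffic volume surveys, analysis of customer movement in stores, and analysis of sports footage.

Main features

Real-time object detection and tracking: detects objects in the video and tracks each one by assigning it a unique ID.
Model selection: choose from five models (n, s, m, l, x) that trade off processing speed against accuracy, depending on the use case.
Input source selection: choose the input video from three sources: a local video file, a webcam, or a sample video.
Result visualization and saving: displays the position, class name, and tracking ID of each detected object on screen, and automatically saves the results for all frames to a text file (result.txt) on exit.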

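Basic usage

Starting the program:
Run the program from a console.
Selecting a model:
Following the first prompt, enter the key of the model to use (n, s, m, l, or x) and press Enter. Pressing Enter with no input selects s, the default.
Selecting an input source:
Following the next prompt, enter the number of the input source (0: video file, 1: camera, 2: sample video) and press Enter.
Checking the results:
A window opens showing the processed frames, where you can watch detection and tracking in real time. Detection information for each frame is also printed to the console.
Exiting the program:
With the result window active, press the q key to exit the program.

Useful features

Built-in processing for better accuracy: CLAHE, which raises detection accuracy in dark scenes, and TTA, which reduces missed detections, are enabled by default.
Detailed log file: after the program exits, result.txt is created in the current directory. It records the settings used and the detailed detection results for each frame, and can be used for later analysis.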
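For later analysis, the per-frame lines in result.txt can be read back into Python. The following is a minimal sketch that assumes the line format written by format_detection_output in the source code on this page (count=N; followed by entries separated by " | "). The names parse_result_line and DETECTION_RE are hypothetical and are not part of the program.

```python
import re

# Pattern for one detection entry; the "ID=" field is present only when tracking is enabled.
DETECTION_RE = re.compile(
    r"class=(?P<name>.+?),(?:ID=(?P<track_id>-?\d+),)?"
    r"conf=(?P<conf>\d+\.\d+),box=\[(?P<box>-?\d+,-?\d+,-?\d+,-?\d+)\]"
)


def parse_result_line(line):
    """Parse one per-frame line, e.g. 'count=1; class=Car,ID=3,conf=0.812,box=[10,20,110,220]'."""
    line = line.strip()
    if line == "count=0":
        return []
    # Drop the leading "count=N; " part, then split the entries on " | ".
    _, _, body = line.partition("; ")
    detections = []
    for entry in body.split(" | "):
        match = DETECTION_RE.fullmatch(entry)
        if match is None:
            continue  # skip anything that does not look like a detection entry
        x1, y1, x2, y2 = (int(v) for v in match.group("box").split(","))
        track_id = match.group("track_id")
        detections.append({
            "name": match.group("name"),
            "track_id": int(track_id) if track_id is not None else None,
            "conf": float(match.group("conf")),
            "box": (x1, y1, x2, y2),
        })
    return detections
```

Header lines in result.txt (settings and per-class counts) do not match the pattern, so the function returns an empty list for them.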

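Prerequisites

This section covers the minimum setup. For machine learning or deep learning work, it is convenient to additionally install NVIDIA CUDA, Visual Studio, Cursor, and similar tools. These are explained in detail on a separate page, https://www.kkaneko.jp/cc/dev/aiassist.html, which you can consult as needed.

Installing Python 3.12

Skip this step if Python is already installed.

Start a Command Prompt with administrator privileges (Windows key or Start menu > type cmd > right-click > "Run as administrator") and run the following. Administrator privileges are required because winget's --scope machine option installs software system-wide.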

REM Install Python system-wide
winget install --scope machine --id Python.Python.3.12 -e --silent --accept-source-agreements --accept-package-agreements
REM Add Python to PATH. setx does not change the current session, so update PATH here first and write it once.
set "PYTHON_PATH=C:\Program Files\Python312"
set "PYTHON_SCRIPTS_PATH=C:\Program Files\Python312\Scripts"
set "OLD_PATH=%PATH%"
echo "%PATH%" | find /i "%PYTHON_PATH%" >nul
if errorlevel 1 set "PATH=%PATH%;%PYTHON_PATH%"
echo "%PATH%" | find /i "%PYTHON_SCRIPTS_PATH%" >nul
if errorlevel 1 set "PATH=%PATH%;%PYTHON_SCRIPTS_PATH%"
if not "%PATH%"=="%OLD_PATH%" setx PATH "%PATH%" /M >nul

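[Related external pages]

Official Python site: https://www.python.org/

Installing the AI editor Windsurf

An AI editor is recommended for editing and running Python programs. This section explains how to install Windsurf.

Start a Command Prompt with administrator privileges (Windows key or Start menu > type cmd > right-click > "Run as administrator") and run the following to install Windsurf system-wide. Administrator privileges are required because winget's --scope machine option installs software system-wide.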

winget install --scope machine --id Codeium.Windsurf -e --silent --accept-source-agreements --accept-package-agreements

[Related external pages]

Official Windsurf site: https://windsurf.com/

Installing the required packages

Start a Command Prompt with administrator privileges and run the following commands:


pip install -U torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
pip install ultralytics opencv-python numpy pillow boxmot
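After the two pip commands finish, it can help to confirm that every package is importable before running the program. The following is a small sketch using only the standard library. The function name find_missing_modules is hypothetical, and the list uses import names rather than package names (opencv-python is imported as cv2, pillow as PIL).

```python
import importlib.util

# Import names for the packages installed above
# (opencv-python is imported as cv2 and pillow as PIL).
REQUIRED_MODULES = ["torch", "torchvision", "torchaudio", "ultralytics",
                    "cv2", "numpy", "PIL", "boxmot"]


def find_missing_modules(module_names):
    """Return the names in module_names that cannot be found in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

This only checks that the modules can be located. It does not verify versions or whether the CUDA build of PyTorch can see a GPU.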

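Object Detection Program with YOLOv8 (Open Images V7)

Overview

This program detects and tracks objects in real time from video or camera footage. Each input frame is preprocessed with CLAHE (contrast-limited adaptive histogram equalization), YOLOv8 then performs object detection, and the ByteTrack algorithm tracks individual objects across frames. The program also implements TTA (test-time augmentation) with the aim of improving detection accuracy.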
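To make the "contrast-limited" part of CLAHE concrete, here is a simplified sketch in plain Python. It is not OpenCV's implementation: the program itself calls cv2.createCLAHE, which equalizes each tile of a grid separately and interpolates between tiles, whereas this sketch treats the whole luminance channel as a single tile. The function names are hypothetical, and pixel values are assumed to be integers from 0 to 255.

```python
def clipped_equalization_lut(pixels, clip_limit=3.0, levels=256):
    """Build a lookup table by equalizing a histogram whose bins are clipped first."""
    if not pixels:
        return list(range(levels))  # nothing to equalize: identity mapping

    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Clip each bin at clip_limit times the average bin height (at least 1).
    max_count = max(1, int(clip_limit * len(pixels) / levels))
    excess = 0
    for i in range(levels):
        if hist[i] > max_count:
            excess += hist[i] - max_count
            hist[i] = max_count

    # Spread the clipped-off counts evenly over all bins.
    share, remainder = divmod(excess, levels)
    for i in range(levels):
        hist[i] += share + (1 if i < remainder else 0)

    # Map each level through the cumulative distribution of the clipped histogram.
    lut, cumulative = [], 0
    scale = (levels - 1) / len(pixels)
    for count in hist:
        cumulative += count
        lut.append(min(levels - 1, round(cumulative * scale)))
    return lut


def enhance_luminance(pixels, clip_limit=3.0):
    """Apply the clipped equalization to a flat list of luminance values."""
    lut = clipped_equalization_lut(pixels, clip_limit)
    return [lut[p] for p in pixels]
```

Clipping the histogram limits how steep the mapping can become, so a dark frame is brightened without amplifying noise as strongly as plain histogram equalization would. A lower clip_limit gives a gentler stretch.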

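Key technologies

YOLOv8

A real-time object detection model developed by Ultralytics [1]. A single neural network predicts the bounding boxes and classes of objects simultaneously.

ByteTrack

An object tracking algorithm that does not immediately discard low-confidence detections as background but uses them when associating detections with existing tracks [2]. This allows tracking to continue when occlusion (an object being temporarily hidden) occurs.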
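The two-stage association idea can be sketched in plain Python. This is a simplified illustration, not the boxmot implementation used by the program: it substitutes greedy IoU matching for the Hungarian algorithm, omits Kalman-filter motion prediction as well as track creation and removal, and the function names and threshold values (0.5, 0.1, 0.3) are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(track_boxes, detections, min_iou):
    """Greedily pair tracks with detections in order of decreasing IoU.

    Returns (matches, unmatched_track_ids); matches maps track id -> detection.
    """
    candidates = sorted(
        ((iou(box, det["box"]), tid, idx)
         for tid, box in track_boxes.items()
         for idx, det in enumerate(detections)),
        key=lambda item: item[0], reverse=True,
    )
    matches, used = {}, set()
    for score, tid, idx in candidates:
        if score < min_iou:
            break  # all remaining candidates overlap even less
        if tid in matches or idx in used:
            continue
        matches[tid] = detections[idx]
        used.add(idx)
    unmatched = [tid for tid in track_boxes if tid not in matches]
    return matches, unmatched


def associate(track_boxes, detections, high_thresh=0.5, low_thresh=0.1, min_iou=0.3):
    """Two-stage association: high-score detections first, then low-score ones."""
    high = [d for d in detections if d["conf"] >= high_thresh]
    low = [d for d in detections if low_thresh <= d["conf"] < high_thresh]

    # Stage 1: match all tracks against the high-score detections.
    matches, unmatched = greedy_match(track_boxes, high, min_iou)

    # Stage 2: tracks left over get a second chance with the low-score detections.
    remaining = {tid: track_boxes[tid] for tid in unmatched}
    second_matches, still_unmatched = greedy_match(remaining, low, min_iou)
    matches.update(second_matches)
    return matches, still_unmatched
```

A partially occluded object often yields a low-score detection. In the second stage that detection can still be claimed by an existing track, so the track keeps its ID instead of being dropped.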

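Technical features

Use of a model pretrained on the Open Images V7 dataset
The program uses a YOLOv8 model pretrained on the Open Images V7 dataset, which can identify 600 object classes.
Preprocessing with CLAHE
Each frame is converted to the YUV color space and CLAHE is applied to the luminance channel, improving the visibility of objects in low-light environments.
Implementation of TTA (Test-Time Augmentation)
At inference time, both the original image and a horizontally flipped copy are fed to the model, and the two sets of detections are merged to reduce missed detections.
Dynamic resource selection
Depending on the execution environment, the program automatically selects an available GPU (CUDA) or the CPU as the inference device.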
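The TTA merge step described above can be illustrated with a plain-Python sketch. The program performs this step with torchvision on tensors; the sketch below uses lists and dictionaries instead, and suppressing duplicates per class is an assumption of the sketch. The function names are hypothetical. The mirroring formula follows the program (frame_width - 1 - x).

```python
def unflip_box(box, frame_width):
    """Map an (x1, y1, x2, y2) box found on a horizontally flipped frame back to the original frame."""
    x1, y1, x2, y2 = box
    # The left and right edges swap roles so that x1 <= x2 still holds.
    return (frame_width - 1 - x2, y1, frame_width - 1 - x1, y2)


def box_iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_detections(original, flipped, frame_width, iou_threshold=0.4):
    """Combine detections from the original and flipped frames with per-class NMS."""
    candidates = list(original)
    for det in flipped:
        candidates.append({**det, "box": unflip_box(det["box"], frame_width)})
    # Highest confidence first, so the best box of each duplicate group is kept.
    candidates.sort(key=lambda d: d["conf"], reverse=True)

    kept = []
    for det in candidates:
        duplicate = any(
            k["class"] == det["class"] and box_iou(k["box"], det["box"]) > iou_threshold
            for k in kept
        )
        if not duplicate:
            kept.append(det)
    return kept
```

An object found in both passes collapses into a single box, while an object found only in the flipped pass is added to the result, which is how the merge can reduce missed detections.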

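Implementation highlights

The program is specialized for real-time object detection and tracking, and has the following implementation characteristics.

The user can select the model size (n, s, m, l, x) at run time.
Three input sources are supported: a video file, a camera connected to the PC, and a sample video on the web.
Detection results (class name, confidence, tracking ID) are drawn on the video, and a summary and detailed log of the results are written to a text file.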

References

[1] Ultralytics. (2023). YOLOv8. GitHub. https://github.com/ultralytics/ultralytics
[2] Zhang, Y., et al. (2021). ByteTrack: A Simple and Strong Baseline for Multi-Object Tracking. arXiv preprint arXiv:2110.06864. https://arxiv.org/abs/2110.06864

Source code


"""
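Program name: Object detection with YOLOv8, with ByteTrack tracking and TTA (Open Images V7, 600 classes)
Key technology: YOLOv8 (Ultralytics)
Source: Ultralytics. (2023). YOLOv8. GitHub. https://github.com/ultralytics/ultralytics
Key capability: Real-time object detection with a single neural network. The whole image is processed in one pass, predicting bounding boxes and class probabilities simultaneously
Pretrained models: yolov8n/s/m/l/x-oiv7.pt - YOLOv8 models (selected by the user), pretrained on the Open Images V7 dataset (600 classes), optimized for inference, https://github.com/ultralytics/assets/releases
Usage restrictions for the technology and pretrained models: AGPL-3.0 license (open source). Commercial use requires an Enterprise License (see the official Ultralytics site). Users must check the usage restrictions themselves.
Design:
  Related technologies:
    - OpenCV: image and video processing, camera control
    - CLAHE (Contrast Limited Adaptive Histogram Equalization): improves image quality in low-light environments
    - PyTorch: deep learning framework, automatic GPU/CPU selection
    - ByteTrack: object tracking with a Kalman filter and the Hungarian algorithm (boxmot package version)
    - TTA (Test Time Augmentation): runs inference on several image transformations and merges the results
  Input and output: Input: video (the user chooses from the menu "0: video file, 1: camera, 2: sample video". For 0, a file is chosen with a tkinter dialog. For 1, the camera is opened with OpenCV. For 2, https://raw.githubusercontent.com/opencv/opencv/master/samples/data/vtest.avi is used). Output: real-time display in an OpenCV window; detection results saved to result.txt
  Processing steps: 1. Get video frame → 2. CLAHE preprocessing → 3. Apply TTA → 4. Run YOLOv8 inference → 5. Extract bounding boxes → 6. ByteTrack tracking → 7. Draw results
  Preprocessing and postprocessing: Preprocessing: contrast enhancement with CLAHE. Postprocessing: ByteTrack tracking to stabilize detections and manage IDs
  Additional processing: TTA - merging inference results from the horizontally flipped image
  Settings that need tuning: CONF_THRESH (confidence threshold, default 0.2) - controls detection sensitivity; lower values detect more objects. TTA_ENABLED (enables/disables TTA, default True)
Future work: automatic optimization of the confidence threshold - dynamically learning and applying the best threshold per scene through time-series analysis of detection results
Other notes: all 600 Open Images V7 classes can be detected
Setup: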
pip install -U torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
pip install ultralytics opencv-python numpy pillow boxmot
"""
import cv2
import numpy as np
import torch
import torchvision
from ultralytics import YOLO
import tkinter as tk
from tkinter import filedialog
import urllib.request
import time
import sys
import io
from datetime import datetime
from PIL import Image, ImageDraw, ImageFont
from boxmot import ByteTrack
import threading

sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8', line_buffering=True)

# Automatic GPU/CPU selection
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Device: {str(device)}')

# Optimization when running on a GPU
if device.type == 'cuda':
    torch.backends.cudnn.benchmark = True

MODEL_INFO = {
    'n': {'name': 'Nano', 'params': '3.2M', 'mAP': '18.4%', 'desc': 'fastest'},
    's': {'name': 'Small', 'params': '11.2M', 'mAP': '27.7%', 'desc': 'default'},
    'm': {'name': 'Medium', 'params': '25.9M', 'mAP': '33.6%', 'desc': 'medium'},
    'l': {'name': 'Large', 'params': '43.7M', 'mAP': '36.6%', 'desc': 'high accuracy'},
    'x': {'name': 'Extra Large', 'params': '68.2M', 'mAP': '37.8%', 'desc': 'highest accuracy'}
}

CONF_THRESH = 0.2
IOU_THRESH = 0.45
NMS_THRESHOLD = 0.4
IMG_SIZE = 1280
CLAHE_CLIP_LIMIT = 3.0
CLAHE_TILE_SIZE = (8, 8)
WINDOW_NAME = "Open Images V7 600-Class Detection"
TTA_ENABLED = True
TTA_CONF_BOOST = 0.05
USE_TRACKER = True

clahe = cv2.createCLAHE(clipLimit=CLAHE_CLIP_LIMIT, tileGridSize=CLAHE_TILE_SIZE)

tracker = ByteTrack() if USE_TRACKER else None


class ThreadedVideoCapture:
    """Threaded VideoCapture (always returns the most recent frame)"""
    def __init__(self, src, is_camera=False):
        if is_camera:
            self.cap = cv2.VideoCapture(src, cv2.CAP_DSHOW)
            fourcc = cv2.VideoWriter_fourcc('M', 'J', 'P', 'G')
            self.cap.set(cv2.CAP_PROP_FOURCC, fourcc)
            self.cap.set(cv2.CAP_PROP_FPS, 60)
        else:
            self.cap = cv2.VideoCapture(src)

        self.grabbed, self.frame = self.cap.read()
        self.stopped = False
        self.lock = threading.Lock()
        self.thread = threading.Thread(target=self.update, args=())
        self.thread.daemon = True
        self.thread.start()

    def update(self):
        """Keep grabbing frames in the background"""
        while not self.stopped:
            grabbed, frame = self.cap.read()
            with self.lock:
                self.grabbed = grabbed
                if grabbed:
                    self.frame = frame

    def read(self):
        """Return the most recent frame"""
        with self.lock:
            return self.grabbed, self.frame.copy() if self.grabbed else None

    def isOpened(self):
        return self.cap.isOpened()

    def get(self, prop):
        return self.cap.get(prop)

    def release(self):
        self.stopped = True
        self.thread.join()
        self.cap.release()


def bgr_to_rgb(color_bgr):
    return (color_bgr[2], color_bgr[1], color_bgr[0])


def generate_class_colors(num_classes):
    colors = []
    for i in range(num_classes):
        hue = int(180.0 * i / num_classes)
        hsv = np.uint8([[[hue, 255, 255]]])
        bgr = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)[0][0]
        colors.append((int(bgr[0]), int(bgr[1]), int(bgr[2])))
    return colors


CLASS_COLORS = generate_class_colors(600)

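# Font used for on-frame labels (Meiryo, a Japanese font that ships with Windows)
FONT_PATH = 'C:/Windows/Fonts/meiryo.ttc'
FONT_SIZE_MAIN = 16
FONT_SIZE_SMALL = 12
try:
    font_main = ImageFont.truetype(FONT_PATH, FONT_SIZE_MAIN)
    font_small = ImageFont.truetype(FONT_PATH, FONT_SIZE_SMALL)
except OSError:
    # Font not found (e.g. not on Windows): labels are skipped by the "font_main is None" checks
    font_main = None
    font_small = None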

frame_count = 0
results_log = []
class_counts = {}
model = None


def initialize_model(model_choice):
    model_name = f'yolov8{model_choice}-oiv7.pt'
    model = YOLO(model_name)
    model.to(device)
    model.eval()
    return model, model_name


def run_model_inference(model, frame, conf, iou, img_size, device_obj):
    results = model(frame, conf=conf, iou=iou, imgsz=img_size, verbose=False, device=device_obj)
    return results


def normal_inference(frame, model, conf):
    results = run_model_inference(model, frame, conf, IOU_THRESH, IMG_SIZE, device)
    curr_dets = []
    for r in results:
        if r.boxes is not None:
            for box in r.boxes:
                x1, y1, x2, y2 = box.xyxy[0].cpu().numpy()
                conf_score = float(box.conf[0].cpu().numpy())
                cls = int(box.cls[0].cpu().numpy())
                name = model.names[cls]
                curr_dets.append({
                    'x1': int(x1), 'y1': int(y1),
                    'x2': int(x2), 'y2': int(y2),
                    'conf': conf_score,
                    'class': cls,
                    'name': name
                })
    return curr_dets


def apply_tta_inference(frame, model, conf):
    frame_width = frame.shape[1]

    flipped_frame = cv2.flip(frame, 1)

    results = model([frame, flipped_frame], conf=conf, iou=IOU_THRESH,
                    imgsz=IMG_SIZE, verbose=False, device=device)

    all_boxes = []
    all_confs = []
    all_classes = []

    if results[0].boxes is not None and len(results[0].boxes) > 0:
        boxes_orig = results[0].boxes.xyxy
        confs_orig = results[0].boxes.conf
        classes_orig = results[0].boxes.cls
        all_boxes.append(boxes_orig)
        all_confs.append(confs_orig)
        all_classes.append(classes_orig)

    if len(results) > 1 and results[1].boxes is not None and len(results[1].boxes) > 0:
        boxes_flipped = results[1].boxes.xyxy.clone()
        confs_flipped = results[1].boxes.conf
        classes_flipped = results[1].boxes.cls

        if boxes_flipped.shape[0] > 0:
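            # Convert detections on the horizontally flipped image back to the original image coordinates
            # The ordering x1 < x2 must be preserved
            x1_flipped = boxes_flipped[:, 0].clone()
            x2_flipped = boxes_flipped[:, 2].clone()
            # New coordinates in the original image coordinate system
            boxes_flipped[:, 0] = frame_width - 1 - x2_flipped  # new x1 (left edge)
            boxes_flipped[:, 2] = frame_width - 1 - x1_flipped  # new x2 (right edge)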

        all_boxes.append(boxes_flipped)
        all_confs.append(confs_flipped)
        all_classes.append(classes_flipped)

    if len(all_boxes) == 0:
        return []

    all_boxes = torch.cat(all_boxes, dim=0)
    all_confs = torch.cat(all_confs, dim=0)
    all_classes = torch.cat(all_classes, dim=0)

    valid_indices = all_confs > conf
    if valid_indices.sum() > 0:
        all_boxes = all_boxes[valid_indices]
        all_confs = all_confs[valid_indices]
        all_classes = all_classes[valid_indices]

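        # Per-class NMS so that overlapping boxes of different classes do not suppress each other
        nms_indices = torchvision.ops.batched_nms(all_boxes, all_confs, all_classes.long(), iou_threshold=NMS_THRESHOLD)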
        final_boxes = all_boxes[nms_indices].cpu().numpy()
        final_confs = all_confs[nms_indices].cpu().numpy()
        final_classes = all_classes[nms_indices].cpu().numpy()

        detections = []
        for i in range(len(final_confs)):
            conf_boost = TTA_CONF_BOOST if TTA_ENABLED else 0
            detections.append({
                'x1': final_boxes[i][0], 'y1': final_boxes[i][1],
                'x2': final_boxes[i][2], 'y2': final_boxes[i][3],
                'conf': min(1.0, final_confs[i] + conf_boost),
                'class': int(final_classes[i])
            })

        for det in detections:
            det['name'] = model.names[det['class']]

        for det in detections:
            det['x1'] = int(det['x1'])
            det['y1'] = int(det['y1'])
            det['x2'] = int(det['x2'])
            det['y2'] = int(det['y2'])

        return detections

    return []


def apply_tta_if_enabled(frame, model, conf):
    if not TTA_ENABLED:
        return normal_inference(frame, model, conf)
    return apply_tta_inference(frame, model, conf)


def apply_bytetrack(detections, frame):
    global tracker

    if len(detections) > 0:
        dets_array = np.array([[d['x1'], d['y1'], d['x2'], d['y2'], d['conf'], d['class']]
                               for d in detections])
    else:
        dets_array = np.empty((0, 6))

    tracks = tracker.update(dets_array, frame)

    tracked_dets = []
    if len(tracks) > 0:
        for track in tracks:
            if len(track) >= 7:
                x1, y1, x2, y2, track_id, conf, cls = track[:7]
                name = model.names[int(cls)]
                tracked_dets.append({
                    'x1': int(x1), 'y1': int(y1),
                    'x2': int(x2), 'y2': int(y2),
                    'track_id': int(track_id),
                    'conf': float(conf),
                    'class': int(cls),
                    'name': name
                })
    return tracked_dets


def apply_tracking_if_enabled(detections, frame):
    if not USE_TRACKER:
        return detections
    return apply_bytetrack(detections, frame)


def process_detection_results(detections):
    global class_counts

    for det in detections:
        name = det['name']
        if name not in class_counts:
            class_counts[name] = 0
        class_counts[name] += 1

    return detections


def draw_detection_results(frame, detections):
    for det in detections:
        color_seed = det['class']
        color = CLASS_COLORS[color_seed % len(CLASS_COLORS)]
        cv2.rectangle(frame, (det['x1'], det['y1']),
                      (det['x2'], det['y2']), color, 2)

    if font_main is not None:
        texts_to_draw = []
        for det in detections:
            color_seed = det['class']
            color = CLASS_COLORS[color_seed % len(CLASS_COLORS)]
            track_id = det.get('track_id', 0) if USE_TRACKER else 0
            if USE_TRACKER and track_id > 0:
                label = f"ID:{track_id} {det['name']}: {det['conf']:.2f}"
            else:
                label = f"{det['name']}: {det['conf']:.2f}"

            texts_to_draw.append({
                'text': label,
                'org': (det['x1'], det['y1']-20),
                'color': bgr_to_rgb(color),
                'font_type': 'main'
            })
        frame = draw_texts_with_pillow(frame, texts_to_draw)

    tta_status = "TTA:ON" if TTA_ENABLED else "TTA:OFF"
    tracker_status = "ByteTrack:ON" if USE_TRACKER else "ByteTrack:OFF"
    info_text = f"Objects: {len(detections)} | Frame: {frame_count} | Classes: {len(set(d['name'] for d in detections)) if detections else 0} | {tta_status} | {tracker_status}"
    cv2.putText(frame, info_text, (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2)

    return frame


def format_detection_output(detections):
    if len(detections) == 0:
        return 'count=0'
    else:
        parts = []
        for det in detections:
            x1, y1, x2, y2 = det['x1'], det['y1'], det['x2'], det['y2']
            class_name = det['name']
            conf = det['conf']
            if USE_TRACKER and 'track_id' in det:
                parts.append(f'class={class_name},ID={det["track_id"]},conf={conf:.3f},box=[{x1},{y1},{x2},{y2}]')
            else:
                parts.append(f'class={class_name},conf={conf:.3f},box=[{x1},{y1},{x2},{y2}]')
        return f'count={len(detections)}; ' + ' | '.join(parts)


def draw_texts_with_pillow(bgr_frame, texts):
    """
    Draw text onto a BGR frame using Pillow
    texts: list of dict with keys {text, org, color, font_type}
    """
    if font_main is None:
        return bgr_frame

    img_pil = Image.fromarray(cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2RGB))
    draw = ImageDraw.Draw(img_pil)

    for item in texts:
        text = item['text']
        x, y = item['org']
        color = item['color']
        font_type = item.get('font_type', 'main')
        font = font_main if font_type == 'main' else font_small
        draw.text((x, y), text, font=font, fill=color)

    return cv2.cvtColor(np.array(img_pil), cv2.COLOR_RGB2BGR)


def detect_objects(frame):
    global model

    yuv_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2YUV)
    yuv_frame[:, :, 0] = clahe.apply(yuv_frame[:, :, 0])
    enh_frame = cv2.cvtColor(yuv_frame, cv2.COLOR_YUV2BGR)

    curr_dets = apply_tta_if_enabled(enh_frame, model, CONF_THRESH)

    return curr_dets


def process_video_frame(frame, timestamp_ms, is_camera):
    detections = detect_objects(frame)

    tracked_dets = apply_tracking_if_enabled(detections, frame)

    processed_dets = process_detection_results(tracked_dets)

    frame = draw_detection_results(frame, processed_dets)

    result = format_detection_output(processed_dets)

    return frame, result


def video_frame_processing(frame, timestamp_ms, is_camera):
    """Video frame processing (standard form)"""
    global frame_count
    current_time = time.time()
    frame_count += 1

    processed_frame, result = process_video_frame(frame, timestamp_ms, is_camera)
    return processed_frame, result, current_time


def display_program_header():
    print('=' * 60)
    print('=== YOLOv8 Object Detection Program ===')
    print('=' * 60)
    print('Overview: detects objects in real time with CLAHE and TTA applied')
    print('Function: object detection with YOLOv8 (Open Images V7 dataset, 600 classes)')
    print('Techniques: CLAHE (contrast enhancement) + ByteTrack tracking + TTA (Test Time Augmentation)')
    print('Controls: press q to quit')
    print('Output: results are printed for each frame and saved to result.txt on exit')
    print()


display_program_header()

print("\n=== YOLOv8 model selection ===")
print('Select the YOLOv8 model to use:')
for key, info in MODEL_INFO.items():
    print(f'{key}: {info["name"]} ({info["params"]} params, mAP {info["mAP"]}) - {info["desc"]}')
print()

model_choice = ''
while model_choice not in MODEL_INFO.keys():
    model_choice = input("Choice (n/s/m/l/x) [default: s]: ").strip().lower()
    if model_choice == '':
        model_choice = 's'
        break
    if model_choice not in MODEL_INFO.keys():
        print("Invalid choice. Please try again.")

print(f"\nLoading the YOLOv8 model...")
model, model_name = initialize_model(model_choice)
print(f"\nNumber of detectable classes: {len(model.names)}")
print(f"Model info: {MODEL_INFO[model_choice]['name']} ({MODEL_INFO[model_choice]['params']} params, mAP {MODEL_INFO[model_choice]['mAP']})")
print("Model loaded")

if TTA_ENABLED:
    print("\nTest Time Augmentation (TTA): enabled")
    print("  - merges inference results from the horizontally flipped image")
    print(f"  - confidence boost: {TTA_CONF_BOOST}")
    print(f"  - NMS threshold: {NMS_THRESHOLD}")
else:
    print("\nTest Time Augmentation (TTA): disabled")

if USE_TRACKER:
    print("\nByteTrack: enabled")
    print("  - motion prediction with a Kalman filter")

print("\n=== YOLOv8 real-time object detection (Open Images V7, 600 classes) ===")
print("0: Video file")
print("1: Camera")
print("2: Sample video")

choice = input("Choice: ")

is_camera = (choice == '1')

if choice == '0':
    root = tk.Tk()
    root.withdraw()
    path = filedialog.askopenfilename()
    if not path:
        raise SystemExit(1)
    cap = cv2.VideoCapture(path)
elif choice == '1':
    cap = ThreadedVideoCapture(0, is_camera=True)
else:
    print("Downloading the sample video...")
    SAMPLE_URL = 'https://raw.githubusercontent.com/opencv/opencv/master/samples/data/vtest.avi'
    SAMPLE_FILE = 'vtest.avi'
    urllib.request.urlretrieve(SAMPLE_URL, SAMPLE_FILE)
    cap = cv2.VideoCapture(SAMPLE_FILE)

if not cap.isOpened():
    print('Could not open the video file or camera')
    raise SystemExit(1)

# Get the frame rate and compute the timestamp increment
if is_camera:
    actual_fps = cap.get(cv2.CAP_PROP_FPS)
    print(f'Camera fps: {actual_fps}')
    timestamp_increment = int(1000 / actual_fps) if actual_fps > 0 else 33
else:
    video_fps = cap.get(cv2.CAP_PROP_FPS)
    timestamp_increment = int(1000 / video_fps) if video_fps > 0 else 33

print('\n=== Video processing started ===')
print('Controls:')
print('  q key: quit the program')

start_time = time.time()
last_info_time = start_time
info_interval = 10.0
timestamp_ms = 0
total_processing_time = 0.0

try:
    while True:
        ret, frame = cap.read()
        if not ret:
            break

        timestamp_ms += timestamp_increment

        processing_start = time.time()
        processed_frame, result, current_time = video_frame_processing(frame, timestamp_ms, is_camera)
        processing_time = time.time() - processing_start
        total_processing_time += processing_time
        cv2.imshow(WINDOW_NAME, processed_frame)

        if result:
            if is_camera:
                timestamp = datetime.fromtimestamp(current_time).strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]
                print(f'{timestamp}, {result}')
            else:
                print(f'Frame {frame_count}: {result}')

            results_log.append(result)

        # Periodic status output (camera mode only, every info_interval seconds)
        if is_camera:
            elapsed = current_time - last_info_time
            if elapsed >= info_interval:
                total_elapsed = current_time - start_time
                actual_fps = frame_count / total_elapsed if total_elapsed > 0 else 0
                avg_processing_time = (total_processing_time / frame_count * 1000) if frame_count > 0 else 0
                print(f'[Info] elapsed: {total_elapsed:.1f} s, frames processed: {frame_count}, measured fps: {actual_fps:.1f}, average processing time: {avg_processing_time:.1f} ms')
                last_info_time = current_time

        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

finally:
    print('\n=== Program finished ===')
    cap.release()
    cv2.destroyAllWindows()

    if results_log:
        with open('result.txt', 'w', encoding='utf-8') as f:
            f.write('=== YOLOv8 object detection results ===\n')
            f.write(f'Frames processed: {frame_count}\n')
            f.write(f'Model used: {model_name}\n')
            f.write(f'Model info: {MODEL_INFO[model_choice]["name"]} ({MODEL_INFO[model_choice]["params"]} params, mAP {MODEL_INFO[model_choice]["mAP"]})\n')
            f.write(f'Device used: {str(device).upper()}\n')
            if device.type == 'cuda':
                f.write(f'GPU: {torch.cuda.get_device_name(0)}\n')
            f.write(f'Image processing: CLAHE applied (YUV color space)\n')
            f.write(f'TTA (Test Time Augmentation): {"enabled" if TTA_ENABLED else "disabled"}\n')
            if TTA_ENABLED:
                f.write(f'  - NMS threshold: {NMS_THRESHOLD}\n')
                f.write(f'  - Confidence boost: {TTA_CONF_BOOST}\n')
            f.write(f'ByteTrack: {"enabled" if USE_TRACKER else "disabled"}\n')
            f.write(f'Confidence threshold: {CONF_THRESH}\n')
            f.write(f'\nDetected classes:\n')
            for class_name, count in sorted(class_counts.items()):
                f.write(f'  {class_name}: {count} detections\n')
            f.write('\n')
            f.write('\n'.join(results_log))
        print(f'\nResults saved to result.txt')
        print(f'Number of detected classes: {len(class_counts)}')