YOLOE によるオープンボキャブラリ物体検出・追跡

【目次】

概要
事前準備
YOLOE オープンボキャブラリ物体検出プログラム
使用方法と動作原理
実験・探求のアイデア

Python開発環境，ライブラリ類

ここでは、最低限の事前準備について説明する。機械学習や深層学習を行う場合は、NVIDIA CUDA、Visual Studio、Cursorなどを追加でインストールすると便利である。これらについては別ページ https://www.kkaneko.jp/cc/dev/aiassist.htmlで詳しく解説しているので、必要に応じて参照してください。

Python 3.12 のインストール

Pythonのインストールを行い、Pythonのプログラムを実行する環境を整える。扱う環境は、Windows搭載パソコンである。金子研究室では、Python 3.12.10を推奨する。

[Windows での Python 3.12 のインストール手順を見るには、ここをクリック]

Windows での Python 3.12 のインストール

以下のいずれかの方法でPython 3.12をインストールする。Pythonがインストール済みの場合、この手順は不要である。

方法 1：winget によるインストール

【インストールコマンドの実行方法】

管理者権限でコマンドプロンプトを起動する（手順：Windowsキーまたはスタートメニュー → cmd と入力 → 右クリック → 「管理者として実行」）。そして、コマンド全体をコマンドプロンプトにコピー＆ペーストする。

--scope machine を指定することで、システム全体（全ユーザー向け）にインストールされる。このオプションの実行には管理者権限が必要である。インストール完了後、コマンドプロンプトを再起動するとPATHが反映される。

REM Python 3.12 をシステム領域にインストール
winget install --id Python.Python.3.12 -e --scope machine --silent --accept-source-agreements --accept-package-agreements --override "/quiet InstallAllUsers=1 PrependPath=1 Include_test=0 Include_pip=1 Include_launcher=1 InstallLauncherAllUsers=1 TargetDir=\"C:\Program Files\Python312\""

REM Python と Scripts を PATH 先頭に追加
powershell -NoProfile -Command "$p='C:\Program Files\Python312'; $s=\"$p\Scripts\"; $c=[Environment]::GetEnvironmentVariable('Path','Machine'); if((Test-Path $p) -and (';'+$c+';' -notlike \"*;$p;*\") -and (';'+$c+';' -notlike \"*;$s;*\")){[Environment]::SetEnvironmentVariable('Path',\"$p;$s;$c\",'Machine')}"

方法 2：インストーラーによるインストール

Python公式サイト（https://www.python.org/downloads/）にアクセスし、「Download Python 3.x.x」ボタンからWindows用インストーラーをダウンロードする。
ダウンロードしたインストーラーを実行する。
初期画面の下部に表示される「Add python.exe to PATH」にチェックを入れてから「Customize installation」を選択する。このチェックを入れ忘れると、コマンドプロンプトから python コマンドを実行できない。
「Install Python 3.xx for all users」にチェックを入れ、「Install」をクリックする。

インストールの確認

コマンドプロンプトで以下を実行する。

python --version

バージョン番号（例：Python 3.12.x）が表示されればインストール成功である。「'python' は、内部コマンドまたは外部コマンドとして認識されていません。」と表示される場合は、インストールが正常に完了していない。

Python の開発環境 Visual Studio Code のインストールと Python 用の設定

Python の開発環境Visual Studio Code（プログラムを編集するソフトウェア。以下、VS Code）を整える。

[Windows での Visual Studio Code のインストールと Python 用の設定手順を見るには、ここをクリック]

Windows での Visual Studio Code のインストールと Python 用の設定手順

1. VS Code と拡張機能のインストール

以下のコマンドにより，既存の VS Code を削除し，全ユーザー共有の設定で再インストールしたうえで，拡張機能（VS Code に機能を追加するソフトウェア）をまとめて導入する．

【インストールコマンドの実行方法】

管理者権限でコマンドプロンプトを起動する（手順：Windows キーまたはスタートメニュー → cmd と入力 → 右クリック → 「管理者として実行」）。そして，コマンド全体をコマンドプロンプトにコピー＆ペーストする。

インストールコマンド


REM ============================================================
REM Microsoft Visual Studio Code
REM ============================================================
winget uninstall -e --id Microsoft.VisualStudioCode --silent --disable-interactivity --accept-source-agreements
rmdir /s /q C:\ProgramData\vscode-extensions 2>nul
rmdir /s /q "%APPDATA%\Code" 2>nul
rmdir /s /q "%USERPROFILE%\.vscode" 2>nul
rmdir /s /q "%LOCALAPPDATA%\Microsoft\vscode-update" 2>nul

REM VS Code をシステム領域に新規インストール
winget install --scope machine --id Microsoft.VisualStudioCode -e --silent --accept-source-agreements --accept-package-agreements

REM 全ユーザー共有の拡張機能フォルダ
mkdir C:\ProgramData\vscode-extensions 2>nul
icacls "C:\ProgramData\vscode-extensions" /grant "Everyone:(OI)(CI)M" /T

REM スタートメニューのショートカットを --extensions-dir 付きで再作成
rmdir /s /q "C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Visual Studio Code" 2>nul
del "C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Visual Studio Code.lnk" 2>nul
powershell -NoProfile -Command "$s=New-Object -ComObject WScript.Shell; $lnk=$s.CreateShortcut('C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Visual Studio Code.lnk'); $lnk.TargetPath='C:\Program Files\Microsoft VS Code\Code.exe'; $lnk.Arguments='--extensions-dir \"C:\ProgramData\vscode-extensions\"'; $lnk.Save()"
REM ショートカットの検証
powershell -NoProfile -Command "$s=New-Object -ComObject WScript.Shell; $lnk=$s.CreateShortcut('C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Visual Studio Code.lnk'); Write-Host 'TargetPath:' $lnk.TargetPath; Write-Host 'Arguments:' $lnk.Arguments"

REM ファイル / フォルダ右クリックの「Code で開く」を登録
reg add "HKLM\SOFTWARE\Classes\*\shell\VSCode\command" /ve /d "\"C:\Program Files\Microsoft VS Code\Code.exe\" --extensions-dir \"C:\ProgramData\vscode-extensions\" \"%1\"" /f
reg add "HKLM\SOFTWARE\Classes\Directory\shell\VSCode\command" /ve /d "\"C:\Program Files\Microsoft VS Code\Code.exe\" --extensions-dir \"C:\ProgramData\vscode-extensions\" \"%1\"" /f
reg add "HKLM\SOFTWARE\Classes\Directory\Background\shell\VSCode\command" /ve /d "\"C:\Program Files\Microsoft VS Code\Code.exe\" --extensions-dir \"C:\ProgramData\vscode-extensions\" \"%V\"" /f

REM --extensions-dir 付きで起動する code.cmd ラッパを作成
REM （%* を echo で書くと対話的 cmd で失われるため、PowerShell で [char]37+'*' を書き出す）
powershell -NoProfile -Command "$pct=[char]37; $q=[char]34; $c='@echo off'+[char]13+[char]10+$q+'C:\Program Files\Microsoft VS Code\bin\code.cmd'+$q+' --extensions-dir '+$q+'C:\ProgramData\vscode-extensions'+$q+' '+$pct+'*'+[char]13+[char]10; [IO.File]::WriteAllText('C:\ProgramData\vscode-extensions\vscode.cmd',$c,[Text.Encoding]::ASCII)"

REM 拡張機能のインストール
set "CODE=C:\Program Files\Microsoft VS Code\bin\code.cmd"
"%CODE%" --extensions-dir "C:\ProgramData\vscode-extensions" --uninstall-extension GitHub.copilot
"%CODE%" --extensions-dir "C:\ProgramData\vscode-extensions" --uninstall-extension GitHub.copilot-chat
"%CODE%" --extensions-dir "C:\ProgramData\vscode-extensions" --install-extension ms-python.python
"%CODE%" --extensions-dir "C:\ProgramData\vscode-extensions" --install-extension ms-python.vscode-pylance
"%CODE%" --extensions-dir "C:\ProgramData\vscode-extensions" --install-extension ms-python.debugpy
"%CODE%" --extensions-dir "C:\ProgramData\vscode-extensions" --install-extension MS-CEINTL.vscode-language-pack-ja
"%CODE%" --extensions-dir "C:\ProgramData\vscode-extensions" --install-extension saoudrizwan.claude-dev
"%CODE%" --extensions-dir "C:\ProgramData\vscode-extensions" --install-extension rust-lang.rust-analyzer
"%CODE%" --extensions-dir "C:\ProgramData\vscode-extensions" --install-extension tamasfe.even-better-toml
"%CODE%" --extensions-dir "C:\ProgramData\vscode-extensions" --install-extension anthropic.claude-code
"%CODE%" --extensions-dir "C:\ProgramData\vscode-extensions" --install-extension almenon.arepl
"%CODE%" --extensions-dir "C:\ProgramData\vscode-extensions" --list-extensions --show-versions
echo === セットアップ完了 ===

2. Python インタプリタの選択

同一マシンに複数の Python がインストールされている場合，VS Code で使用する Python 本体（インタプリタ：Python プログラムを解釈・実行するソフトウェア）を選択する必要がある．

コマンドパレット（コマンド名で機能を呼び出す VS Code の入力欄）を開く（Ctrl+Shift+P）
Python: Select Interpreter と入力する
表示される一覧から，使用する Python（例：C:\Program Files\Python312\python.exe）を選択する．

Git のインストール（Windows 上） [クリックして展開]

管理者権限のコマンドプロンプトで以下を実行する．管理者権限は，winget の --scope machine オプションでシステム全体にインストールするために必要となる．

REM Git をシステム領域にインストール
winget install --scope machine --id Git.Git -e --silent --disable-interactivity --force --accept-source-agreements --accept-package-agreements --override "/VERYSILENT /NORESTART /NOCANCEL /SP- /CLOSEAPPLICATIONS /RESTARTAPPLICATIONS /COMPONENTS=""icons,ext\reg\shellhere,assoc,assoc_sh"" /o:PathOption=Cmd /o:CRLFOption=CRLFCommitAsIs /o:BashTerminalOption=MinTTY /o:DefaultBranchOption=main /o:EditorOption=VIM /o:SSHOption=OpenSSH /o:UseCredentialManager=Enabled /o:PerformanceTweaksFSCache=Enabled /o:EnableSymlinks=Disabled /o:EnableFSMonitor=Disabled"

必要なライブラリのインストール

コマンドプロンプトを管理者として実行（手順：Windowsキーまたはスタートメニュー > cmd と入力 > 右クリック > 「管理者として実行」）し、以下を実行する。

REM PyTorch をインストール（GPU対応版）
set "CUDA_TAG=cu128"
set "PYTHON_PATH=C:\Program Files\Python312"
"%PYTHON_PATH%\Scripts\pip" install --no-user -U numpy torch torchvision torchaudio --index-url https://download.pytorch.org/whl/%CUDA_TAG%
pip install ultralytics opencv-python numpy
pip install --no-cache-dir "git+https://github.com/ultralytics/CLIP.git

YOLOE オープンボキャブラリ物体検出・追跡プログラム

概要

YOLOEオープンボキャブラリ物体検出・追跡プログラムは、テキスト、視覚、内部ボキャブラリプロンプトによる任意物体の検出・追跡・セグメンテーションを実現するリアルタイム処理システムである[1]。動画の各フレームから物体を検出し、フレーム間の追跡によりオブジェクトIDを割り当て、軌跡・滞在時間・移動距離を計測する機能を持つ。

主要技術

YOLOE (Real-Time Seeing Anything)

2025年のICCV会議で発表された最新のオープンボキャブラリ物体検出・セグメンテーションモデルである[1]。従来のYOLOシリーズが固定カテゴリに制限されていた問題を解決し、テキストプロンプト、視覚プロンプト、プロンプトフリーの3つのモードで任意のオブジェクトクラスを検出可能である[2]。YOLO-Worldv2-Sと比較して3倍少ない訓練コストで3.5 AP向上、1.4倍の推論高速化を実現している[1]。

PyTorch深層学習フレームワーク

GPU加速対応のTensorライブラリであり、CUDA対応によるパフォーマンス最適化を提供する。本プログラムではYOLOEモデルの推論実行とGPU/CPU自動検出機能を担う。

OpenCV画像処理ライブラリ

カメラ制御、画像描画処理、動画入出力管理を担当する。セグメンテーションマスクの重畳描画、バウンディングボックス描画、軌跡表示等の可視化機能を実装している。

技術的特徴

オープンボキャブラリ検出機能

カスタムクラス定義により特定ドメインに特化した物体検出が可能である。プログラム内では42種類のカスタムクラス（「person wearing red shirt」「small car」等）を定義し、詳細な物体識別を実現している。

事前学習モデル

Objects365とLVISデータセットで事前学習されたモデルを使用している[3][4]。Objects365は365オブジェクトカテゴリ、200万画像、3000万以上のバウンディングボックスを持つ大規模データセットである[3]。LVISは1203オブジェクトカテゴリ、16万画像、200万インスタンス注釈を持つファインボキャブラリデータセットである[4]。

リアルタイム処理性能

640x640ピクセルへのリサイズと正規化による前処理、及び推論時オーバーヘッドゼロの再パラメータ化機能により、リアルタイム検出を実現している[1]。

実装の特色

距離ベース物体追跡アルゴリズム

フレーム間でのユークリッド距離計算による物体マッチングを行う。追跡可能最大距離100ピクセル、追跡失敗許容フレーム数30フレームのパラメータにより追跡を実現している。計算効率向上のため平方根演算を省略した距離の二乗値による比較を採用している。

追跡情報管理

各追跡オブジェクトについて物体ID、初回検出時刻、最終検出時刻、累積移動距離、軌跡履歴を記録する。軌跡表示は直近10フレーム分に制限し、メモリ使用量を最適化している。

複数入力形式対応

動画ファイル、リアルタイムカメラ、サンプル動画（OpenCVリポジトリからの自動ダウンロード）の3つの入力源に対応している。ファイル選択にはTkinterダイアログを使用し、カメラ制御にはDirectShowに対応したOpenCV機能を利用している。

結果出力システム

リアルタイム表示によるリアルタイムフィードバックと、プログラム終了時のresult.txtファイル出力による結果保存を両立している。処理結果には物体ID、滞在時間、移動距離情報が含まれる。

参考文献

[1] Wang, A., Liu, L., Chen, H., Lin, Z., Han, J., & Ding, G. (2025). YOLOE: Real-Time Seeing Anything. arXiv preprint arXiv:2503.07465. https://arxiv.org/abs/2503.07465

[2] Ultralytics. (2025). YOLOE: Real-Time Seeing Anything - Ultralytics YOLO Documentation. https://docs.ultralytics.com/models/yoloe/

[3] Shao, S., Li, Z., Zhang, T., Peng, C., Yu, G., Zhang, X., Li, J., & Sun, J. (2019). Objects365: A large-scale, high-quality dataset for object detection. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 8430-8439).

[4] Gupta, A., Dollar, P., & Girshick, R. (2019). LVIS: A dataset for large vocabulary instance segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 5356-5364).

ソースコード

# プログラム名: YOLOE オープンボキャブラリ物体検出・追跡プログラム
# 特徴技術名: YOLOE (Real-Time Seeing Anything)
# 出典: Ultralytics YOLOE公式ドキュメント - https://docs.ultralytics.com/models/yoloe/
# 特徴機能: オープンボキャブラリ物体検出とセグメンテーション機能。テキスト、視覚、内部ボキャブラリプロンプトによる任意物体検出・追跡が可能
# 学習済みモデル: yoloe-11s-seg.pt (小型セグメンテーション), yoloe-11m-seg.pt (中型), yoloe-11l-seg.pt (大型) - Objects365 + LVIS データセット対応、Ultralyticsから自動ダウンロード
# 方式設計:
#   - 関連利用技術:
#     - PyTorch: 深層学習フレームワーク、CUDA対応によるGPU加速
#     - OpenCV: 画像処理、カメラ制御、描画処理、動画入出力管理
#   - 入力と出力: 入力: 動画（ユーザは「0:動画ファイル，1:カメラ，2:サンプル動画」のメニューで選択．0:動画ファイルの場合はtkinterでファイル選択．1の場合はOpenCVでカメラが開く．2の場合はhttps://raw.githubusercontent.com/opencv/opencv/master/samples/data/vtest.aviを使用）、出力: OpenCV画面でリアルタイム表示（検出したオブジェクトをバウンディングボックスとセグメンテーションマスクで表示、物体ID・軌跡・滞在時間も表示）、1秒間隔でprint()による処理結果表示（物体ID・滞在時間・移動距離を含む）、プログラム終了時にresult.txtファイルに保存
#   - 処理手順: 1.フレーム取得、2.YOLOE推論実行、3.オープンボキャブラリによる任意物体検出・セグメンテーション、4.フレーム間物体追跡・ID割り当て、5.信頼度閾値による選別、6.バウンディングボックス・セグメンテーションマスク・追跡情報描画
#   - 前処理、後処理: 前処理：YOLOE内部で自動実行（640x640リサイズ、正規化）。後処理：オープンボキャブラリ機能による領域テキストアライメント処理。セグメンテーションマスクの重畳描画。物体追跡による継続性維持。信頼度による閾値フィルタリング実施
#   - 追加処理: CUDA/CPU自動検出機能により、GPU搭載環境では自動的に加速。検出結果の信頼度ソートにより重要な検出を優先表示。セグメンテーションマスクの色分け表示。フレーム間物体追跡による軌跡記録・滞在時間計測・移動距離計算
#   - 調整を必要とする設定値: CONF_THRESHOLD（オブジェクト検出信頼度閾値、デフォルト0.3）- 値を上げると誤検出が減少するが検出漏れが増加
# 将来方策: CONF_THRESHOLDの動的調整機能。フレーム毎の検出数を監視し、検出数が閾値を超えた場合は信頼度を上げ、検出数が少ない場合は下げる適応的制御の実装
# その他の重要事項: Windows環境専用設計、CUDA対応GPU推奨（自動検出・CPUフォールバック機能付き）、初回実行時は学習済みモデルの自動ダウンロード、物体追跡により継続的な行動分析が可能
# 前準備:
#   - pip install -U numpy torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
#   - pip install ultralytics opencv-python numpy

import cv2
import tkinter as tk
from tkinter import filedialog
import os
import torch
import numpy as np
from ultralytics import YOLOE
import warnings
import time
import urllib.request
import math

warnings.filterwarnings('ignore')

# ===== 設定・定数管理 =====
# URLとファイル名
SAMPLE_VIDEO_URL = 'https://raw.githubusercontent.com/opencv/opencv/master/samples/data/vtest.avi'
SAMPLE_VIDEO_FILE = 'vtest.avi'

# ウィンドウ名
WINDOW_NAME = 'YOLOE Object Detection & Tracking'

# カスタムクラス定義（オープンボキャブラリの例）
CLASSES = [
    'person wearing red shirt', 'person wearing black', 'small car', 'big car',
    'small dog', 'large dog', 'orange cat', 'black cat', 'flying bird', 'perched bird',
    'open laptop', 'closed laptop', 'smartphone', 'tablet', 'thick book', 'thin book',
    'coffee cup', 'tea cup', 'water bottle', 'plastic bottle', 'wooden chair', 'metal chair',
    'dining table', 'computer desk', 'flat screen tv', 'old tv', 'wireless keyboard', 'gaming mouse',
    'red apple', 'green apple', 'fresh orange', 'ripe banana', 'hot pizza', 'birthday cake',
    'colorful flower', 'white flower', 'green tree', 'dry tree', 'school backpack', 'hiking backpack'
]

RESULT_FILE = 'result.txt'

# 検出パラメータ（調整可能）
CONF_THRESHOLD = 0.3  # オブジェクト検出信頼度閾値（0.0-1.0）

# 追跡パラメータ（調整可能）
MAX_TRACKING_DISTANCE = 100  # 追跡可能最大距離（ピクセル）
MAX_TRACKING_DISTANCE_SQ = MAX_TRACKING_DISTANCE ** 2  # 距離の2乗（高速化用）
MAX_LOST_FRAMES = 30  # 追跡失敗許容フレーム数
TRAIL_LENGTH = 10  # 軌跡表示フレーム数

# 表示設定
PRINT_INTERVAL = 1.0  # 結果出力間隔（秒）


class ObjectTracker:
    def __init__(self):
        self.next_id = 1
        self.tracked_objects = {}
        self.frame_count = 0

    def calculate_distance_squared(self, pos1, pos2):
        # 平方根計算を省略して高速化
        return (pos1[0] - pos2[0])**2 + (pos1[1] - pos2[1])**2

    def update(self, detections):
        self.frame_count += 1
        current_positions = []
        current_classes = []
        current_scores = []
        current_boxes = []
        current_masks = []

        # 検出結果から位置とクラス情報を抽出
        for detection in detections:
            if detection.boxes is not None and len(detection.boxes) > 0:
                boxes = detection.boxes.xyxy.cpu().numpy()
                scores = detection.boxes.conf.cpu().numpy()
                classes = detection.boxes.cls.cpu().numpy()
                masks = detection.masks.data.cpu().numpy() if detection.masks is not None else None

                for i, (box, score, cls) in enumerate(zip(boxes, scores, classes)):
                    center_x = (box[0] + box[2]) / 2
                    center_y = (box[1] + box[3]) / 2
                    current_positions.append((center_x, center_y))
                    current_classes.append(int(cls))
                    current_scores.append(score)
                    current_boxes.append(box)
                    if masks is not None and i < len(masks):
                        current_masks.append(masks[i])
                    else:
                        current_masks.append(None)

        # 既存の追跡オブジェクトとのマッチング
        matched_objects = {}
        used_detections = set()

        for obj_id, obj_data in self.tracked_objects.items():
            best_match = -1
            best_distance_sq = float('inf')

            for i, pos in enumerate(current_positions):
                if i in used_detections:
                    continue
                distance_sq = self.calculate_distance_squared(obj_data['last_position'], pos)
                if distance_sq < MAX_TRACKING_DISTANCE_SQ and distance_sq < best_distance_sq:
                    best_distance_sq = distance_sq
                    best_match = i

            if best_match != -1:
                # マッチングした場合、情報を更新
                used_detections.add(best_match)
                old_pos = obj_data['last_position']
                new_pos = current_positions[best_match]
                # 移動距離の計算（表示用なので平方根を計算）
                move_distance = math.sqrt(self.calculate_distance_squared(old_pos, new_pos))

                matched_objects[obj_id] = {
                    'position': new_pos,
                    'last_position': new_pos,
                    'class_id': current_classes[best_match],
                    'score': current_scores[best_match],
                    'box': current_boxes[best_match],
                    'mask': current_masks[best_match],
                    'first_seen': obj_data['first_seen'],
                    'last_seen': self.frame_count,
                    'total_distance': obj_data['total_distance'] + move_distance,
                    'trail': obj_data['trail'] + [new_pos],
                    'lost_frames': 0
                }

                # 軌跡の長さを制限
                if len(matched_objects[obj_id]['trail']) > TRAIL_LENGTH:
                    matched_objects[obj_id]['trail'] = matched_objects[obj_id]['trail'][-TRAIL_LENGTH:]
            else:
                # マッチングしなかった場合、ロストフレーム数を増加
                obj_data['lost_frames'] += 1
                if obj_data['lost_frames'] <= MAX_LOST_FRAMES:
                    matched_objects[obj_id] = obj_data

        # 新しい検出をオブジェクトとして追加
        for i, pos in enumerate(current_positions):
            if i not in used_detections:
                matched_objects[self.next_id] = {
                    'position': pos,
                    'last_position': pos,
                    'class_id': current_classes[i],
                    'score': current_scores[i],
                    'box': current_boxes[i],
                    'mask': current_masks[i],
                    'first_seen': self.frame_count,
                    'last_seen': self.frame_count,
                    'total_distance': 0.0,
                    'trail': [pos],
                    'lost_frames': 0
                }
                self.next_id += 1

        self.tracked_objects = matched_objects
        return self.tracked_objects


def video_processing(frame, model, tracker, results_log, last_print_time):
    # YOLOEによる推論実行
    results = model(frame, conf=CONF_THRESHOLD, verbose=False)

    # 物体追跡の更新
    tracked_objects = tracker.update(results)

    current_time = time.time()
    processed_frame = frame.copy()

    # 追跡結果の描画と情報収集
    detection_info = []

    for obj_id, obj_data in tracked_objects.items():
        if obj_data['lost_frames'] == 0:  # 現在フレームで検出されている物体のみ
            class_name = model.names[obj_data['class_id']] if obj_data['class_id'] < len(model.names) else f'class_{obj_data["class_id"]}'
            score = obj_data['score']
            duration = obj_data['last_seen'] - obj_data['first_seen'] + 1
            total_distance = obj_data['total_distance']

            # セグメンテーションマスクの描画
            if obj_data['mask'] is not None:
                mask_resized = cv2.resize(obj_data['mask'], (frame.shape[1], frame.shape[0]))
                mask_colored = np.zeros_like(frame)
                # オブジェクトIDに基づいた一意な色生成
                color_seed = obj_id * 50
                color = (
                    (color_seed * 7) % 255,
                    (color_seed * 13) % 255,
                    (color_seed * 19) % 255
                )
                mask_colored[mask_resized > 0.5] = color
                processed_frame = cv2.addWeighted(processed_frame, 1, mask_colored, 0.3, 0)
            else:
                # マスクがない場合のデフォルト色
                color = (0, 255, 0)

            # バウンディングボックス描画
            box = obj_data['box']
            x1, y1, x2, y2 = map(int, box)
            cv2.rectangle(processed_frame, (x1, y1), (x2, y2), color, 2)

            # ID、クラス、スコア、滞在時間の表示
            label = f'ID:{obj_id} {class_name}: {score:.2f} ({duration}f)'
            cv2.putText(processed_frame, label, (x1, y1-10),
                       cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

            # 軌跡の描画
            trail = obj_data['trail']
            if len(trail) > 1:
                for i in range(1, len(trail)):
                    pt1 = (int(trail[i-1][0]), int(trail[i-1][1]))
                    pt2 = (int(trail[i][0]), int(trail[i][1]))
                    cv2.line(processed_frame, pt1, pt2, color, 2)

            # 検出情報の記録
            detection_info.append(f'ID:{obj_id} {class_name}: {score:.2f} (滞在:{duration}f, 移動:{total_distance:.1f}px)')

    # 1秒間隔での出力
    if current_time - last_print_time[0] >= PRINT_INTERVAL:
        if detection_info:
            output = f'[{time.strftime("%H:%M:%S")}] 追跡中: {", ".join(detection_info)}'
        else:
            output = f'[{time.strftime("%H:%M:%S")}] 追跡中: No objects'
        print(output)
        results_log.append(output)
        last_print_time[0] = current_time

    return processed_frame


# プログラム概要表示
print('=== YOLOEオブジェクト検出・追跡プログラム ===')
print('概要: リアルタイムでオブジェクトを検出・追跡し、セグメンテーションマスクと軌跡で表示します')
print('機能: YOLOEによるオープンボキャブラリ物体検出・セグメンテーション・追跡')
print('追跡情報: 物体ID、滞在時間、移動距離、軌跡を表示・記録')
print('操作: qキーで終了')
print()

# YOLOEモデル設定（デフォルト：s、変更可能：s, m, l）
print('モデル選択:')
print('1. yoloe-11s-seg.pt - 軽量')
print('2. yoloe-11m-seg.pt - バランス型')
print('3. yoloe-11l-seg.pt - 精度重視')
choice = input('選択[1-3]: ')
sizes = {'1': 's', '2': 'm', '3': 'l'}
MODEL_SIZE = sizes.get(choice, 's')
MODEL_NAME = f'yoloe-11{MODEL_SIZE}-seg.pt'

# GPU/CPU自動選択
if torch.cuda.is_available():
    device = 'cuda'
    print(f'GPU検出: {torch.cuda.get_device_name(0)}')
else:
    device = 'cpu'
    print('CPUモードで実行します')

# YOLOEモデル初期化
try:
    print(f'YOLOE{MODEL_SIZE}モデルを初期化中...')
    model = YOLOE(MODEL_NAME)

    # 重要: オープンボキャブラリ機能を有効にするためのクラス設定
    model.set_classes(CLASSES, model.get_text_pe(CLASSES))
    print(f'モデル読み込み完了。検出対象: {len(CLASSES)}クラス')
except Exception as e:
    print(f'モデルの読み込みに失敗しました: {e}')
    exit()

# 物体追跡器の初期化
tracker = ObjectTracker()
results_log = []
last_print_time = [time.time()]

print('0: 動画ファイル')
print('1: カメラ')
print('2: サンプル動画')

choice = input('選択: ')
temp_file = None

if choice == '0':
    root = tk.Tk()
    root.withdraw()
    path = filedialog.askopenfilename()
    if not path:
        exit()
    cap = cv2.VideoCapture(path)
elif choice == '1':
    cap = cv2.VideoCapture(0, cv2.CAP_DSHOW)
    cap.set(cv2.CAP_PROP_BUFFERSIZE, 1)
elif choice == '2':
    # サンプル動画ダウンロード・処理
    url = SAMPLE_VIDEO_URL
    filename = SAMPLE_VIDEO_FILE
    try:
        urllib.request.urlretrieve(url, filename)
        temp_file = filename
        cap = cv2.VideoCapture(filename)
    except Exception as e:
        print(f'動画のダウンロードに失敗しました: {url}')
        print(f'エラー: {e}')
        exit()
else:
    print('無効な選択です')
    exit()

# メイン処理
try:
    while True:
        cap.grab()
        ret, frame = cap.retrieve()
        if not ret:
            break

        processed_frame = video_processing(frame, model, tracker, results_log, last_print_time)
        cv2.imshow(WINDOW_NAME, processed_frame)

        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
finally:
    cap.release()
    cv2.destroyAllWindows()
    if temp_file:
        os.remove(temp_file)

    # 結果保存（追跡統計情報を含む）
    with open(RESULT_FILE, 'w', encoding='utf-8') as f:
        f.write('=== YOLOEオブジェクト検出・追跡結果 ===\n')
        f.write(f'使用デバイス: {device.upper()}\n')
        f.write(f'総フレーム数: {tracker.frame_count}\n')
        f.write(f'追跡したオブジェクト数: {tracker.next_id - 1}\n')
        f.write('\n=== 追跡統計 ===\n')

        # 各オブジェクトの詳細統計
        for obj_id, obj_data in tracker.tracked_objects.items():
            class_name = model.names[obj_data['class_id']] if obj_data['class_id'] < len(model.names) else f'class_{obj_data["class_id"]}'
            duration = obj_data['last_seen'] - obj_data['first_seen'] + 1
            f.write(f'ID:{obj_id} {class_name} - 滞在:{duration}フレーム, 移動距離:{obj_data["total_distance"]:.1f}px\n')

        f.write('\n=== 時系列ログ ===\n')
        for log in results_log:
            f.write(log + '\n')
    print('result.txtに保存')

使用方法と動作原理

実行手順

上記のプログラムを実行する
映像が表示され、リアルタイムで物体検出が開始される
検出結果は画面上にバウンディングボックス（検出された物体を囲む矩形枠）と信頼度スコア（検出の確信度を0から1の数値で表したもの）で表示される
'q'キーを押すとプログラムが終了する

動作原理

プログラムは以下の処理を繰り返し実行する。

入力取得：Webカメラから映像フレームを取得
特徴抽出：YOLOEモデルが画像から視覚的特徴を抽出し、CUSTOM_CLASSESで指定されたテキスト情報と照合
物体検出：信頼度スコアが閾値（CONF_THRESHOLD：0.3）以上の物体を検出
結果表示：検出された物体にバウンディングボックスを描画し、クラス名と信頼度スコアを表示

表示情報

フレーム番号とモード表示
カスタムクラス数（24種類）
検出物体数の日本語表示
各検出物体の英語クラス名と信頼度

実験・探求のアイデア

実験項目

信頼度閾値の調整：CONF_THRESHOLD を 0.1、0.3、0.5、0.7 に変更し、検出精度と誤検出の関係を観察する
カスタムクラスの変更：CUSTOM_CLASSES リストを編集し、特定分野の物体（医療器具、工具、食品など）に特化した検出を行う
処理速度の測定：各モデルでフレームレート（1秒間に処理できるフレーム数）を測定し、精度と速度のトレードオフを定量化する
検出対象物体の追加

物体追加手順

新しい物体を追加する。

追加方法：

リストの最後に新しい物体名を追加する
英語で記述する（日本語は不可）
単語形式で記述する（文章は不可）
ダブルクォート（"）で囲む
カンマ（,）で区切る

例：ペンを追加する場合

CUSTOM_CLASSES = [
    "pen", ""person wearing red shirt", "person wearing blue shirt", "white car", "black car"

注意事項：

英語で記述する（"pen"は正しい、"ペン"は間違い）
複数の単語の場合はスペースで区切る（"cell phone"のように）
最後の要素にはカンマを付けない

探求アイデア

日常物体の検出限界探索：身の回りの物体をカメラに映し、検出範囲を確認する
複数物体の同時検出：複数の物体を同時にフレーム内に配置し、検出性能の変化を観察する
照明条件の影響調査：明るさや影の条件を変えて、検出精度への影響を調べる
オープンボキャブラリ機能の検証：従来のYOLOでは検出できない物体を用意し、YOLOEとの違いを確認する
リアルタイム性能の評価：動きの速い物体や複雑な背景での検出性能を評価し、実用性を検討する