Image Classification with EfficientNetV2 (Input: Video; ImageNet 1000 Classes) (Source Code, Explanation, and Usage Guide)

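[Overview] A real-time image classification system based on EfficientNetV2-S. It covers the 1000 ImageNet classes and classifies frames taken from a video file, a webcam, or a sample video. EfficientNetV2 was proposed together with the Progressive Learning training method. The system displays the top-5 classification results and saves the processing results automatically.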

[Related materials] [PDF], [PowerPoint]

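Tool Usage Guide

1. Use cases for this program

This software analyzes video frames and reports what each frame most likely shows. It can be used for analyzing surveillance camera footage, sorting image collections automatically, and demonstrating object recognition in teaching.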

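2. Main features

Real-time image classification: each video frame is analyzed as it arrives, and the most likely of the 1000 ImageNet categories is identified.
Top-5 display: results are listed in order of confidence, so several candidates can be checked at once.
Multiple input sources: a video file, a webcam, or a sample video.
Result saving: the top-1 result for each processed frame is saved to a text file automatically and can be reviewed later.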

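3. Basic usage

Starting the program and selecting the input:
Run the program and type 0 (video file), 1 (webcam), or 2 (sample video) on the keyboard.
Starting the analysis:
A video window opens and classification runs in real time. The results are drawn on the frame with Japanese labels; the class names themselves are the English names defined in IMAGENET_CLASSES.
Exiting:
Press the q key in the video window to end the program. The results are then saved to result.txt.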

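4. Convenient features

Automatic statistics: the number of processed frames and the number of times each class appears in the top-5 results are counted automatically.
GPU acceleration: if a CUDA-capable GPU is available to PyTorch, inference runs on the GPU automatically.
Timestamps: for camera input, each result is printed with the time at which it was obtained.
Confidence display: each result is shown with a numeric confidence (softmax probability), which helps in judging how reliable it is.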

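What Is Image Classification?

Image classification looks at an entire input image and decides which class (category) it belongs to. It does not locate objects within the image; it predicts one class label (or several) that best describes the image as a whole.

Main differences from related tasks:

Image classification: predicts only the class of the whole image, with no position information
Object detection: predicts the position (bounding box) and the class of each object together
Semantic segmentation: assigns a class label to every pixel of the image

Comparison with other image classification models: ResNet (deep networks with residual connections) is accurate but computationally costly; YOLO11-cls (the classification variant of the YOLO11 detector family) emphasizes speed; EfficientNetV2 aims to balance accuracy and efficiency through model scaling and Fused-MBConv blocks.

Application areas: judging image content, sorting images into categories, quality inspection, and similar tasks.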

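Pretrained Model Used

EfficientNetV2 pretrained model:

Training dataset: ImageNet (1000 classes, about 1.2 million training images, 50,000 validation images)
Classes covered: roughly 400 animal classes, with the remainder mostly man-made objects plus a smaller number of foods, plants, fungi, and natural scenes
Output format: class probability distribution (1000-dimensional vector)
Input resolution: taken from the model's timm data configuration (for efficientnetv2_rw_s.ra2_in1k, the timm model card lists 288×288 for training and 384×384 for testing)
Model sizes: variants range from Small (lightweight) to Extra Large (high accuracy); the paper [1] describes S, M, L, and XL
Accuracy: the paper [1] reports 83.9% ImageNet top-1 accuracy for EfficientNetV2-S (higher than the 70.0% reported for YOLO11n-cls)
Dataset limitation: the training images are skewed toward Western countries, so accuracy may drop for objects specific to Japan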
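
To illustrate the output format listed above, the following is a minimal sketch of how raw model scores (logits) become a probability distribution and how the k most likely classes are picked from it. It uses plain Python lists instead of PyTorch tensors, and the four labels and scores are made-up illustration values, not model output.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probabilities, labels, k):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical 4-class example (a real ImageNet model outputs 1000 scores)
labels = ['tabby', 'tiger cat', 'Egyptian cat', 'lynx']
logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax(logits)
for name, p in top_k(probs, labels, k=2):
    print(f'{name}: {p:.3f}')
```

The program itself performs the same two steps with F.softmax and torch.topk.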

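Python Development Environment and Libraries

This section describes the minimum preparation required. For machine learning and deep learning work, it is convenient to also install NVIDIA CUDA, Visual Studio, Cursor, and similar tools. These are explained in detail on a separate page, https://www.kkaneko.jp/cc/dev/aiassist.html, which can be consulted as needed.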

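Installing Python 3.12 (on Windows)

Install Python 3.12 by either of the following methods. If Python is already installed, this step is not needed.

Method 1: Installing with winget

Run the following in a Command Prompt with administrator privileges. To open one, press the Windows key or open the Start menu, type "cmd", right-click the "Command Prompt" entry that appears, and choose "Run as administrator".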

winget install --scope machine --id Python.Python.3.12 -e --silent --disable-interactivity --force --accept-source-agreements --accept-package-agreements --override "/quiet InstallAllUsers=1 PrependPath=1 Include_pip=1 Include_test=0 Include_launcher=1 InstallLauncherAllUsers=1"

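Specifying --scope machine installs Python system-wide (for all users). This option requires administrator privileges. After the installation finishes, open a new Command Prompt so that the updated PATH takes effect.

Method 2: Installing with the installer

Go to the official Python site (https://www.python.org/downloads/) and download the Windows installer from the "Download Python 3.x.x" button.
Run the downloaded installer.
On the first screen, be sure to check "Add python.exe to PATH" at the bottom, then choose "Customize installation". Without this check, the python command cannot be run from the Command Prompt.
Check "Install Python 3.xx for all users" and click "Install".

Verifying the installation

Run the following in a Command Prompt.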

python --version

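If a version number (for example, Python 3.12.x) is displayed, the installation succeeded. If a message says that 'python' is not recognized as an internal or external command, the installation did not complete correctly or python is not on PATH.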

Installing the AI Editor Windsurf (on Windows)

An AI editor is recommended for editing and running Python programs. This section explains how to install Windsurf. If Windsurf is already installed, this step is not needed.

winget install --scope machine --id Codeium.Windsurf -e --silent --disable-interactivity --force --accept-source-agreements --accept-package-agreements --custom "/SP- /SUPPRESSMSGBOXES /NORESTART /CLOSEAPPLICATIONS /DIR=""C:\Program Files\Windsurf"" /MERGETASKS=!runcode,addtopath,associatewithfiles,!desktopicon"
powershell -Command "$env:Path=[System.Environment]::GetEnvironmentVariable('Path','Machine')+';'+[System.Environment]::GetEnvironmentVariable('Path','User'); windsurf --install-extension MS-CEINTL.vscode-language-pack-ja --force; windsurf --install-extension ms-python.python --force; windsurf --install-extension Codeium.windsurfPyright --force"

[Related external pages]

Official Windsurf page: https://windsurf.com/

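Installing the Required Libraries

Start a Command Prompt with administrator privileges and run the following commands:

REM Install PyTorch (GPU build; CUDA_TAG selects the CUDA version of the wheels)
set "CUDA_TAG=cu126"
set "PYTHON_PATH=C:\Program Files\Python312"
"%PYTHON_PATH%\Scripts\pip" install -U torch torchvision torchaudio --index-url https://download.pytorch.org/whl/%CUDA_TAG%
REM Install the remaining libraries with the same pip
"%PYTHON_PATH%\Scripts\pip" install -U timm opencv-python pillow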
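
After running the installation commands above, a short check such as the following sketch can confirm that the packages are importable from the Python in use. The helper name find_missing is a hypothetical name chosen for this example; note that opencv-python is imported as cv2 and pillow as PIL.

```python
import importlib.util

# Import names of the packages installed above (opencv-python -> cv2, pillow -> PIL)
REQUIRED = ['torch', 'torchvision', 'timm', 'cv2', 'PIL']

def find_missing(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]

missing = find_missing(REQUIRED)
if missing:
    print('Missing packages:', ', '.join(missing))
else:
    print('All required packages are available.')
```

Whether PyTorch can actually use the GPU can then be checked with torch.cuda.is_available(), the same call the program uses to select its device.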

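Image Classification Program with EfficientNetV2 (ImageNet 1000 Classes)

Overview

This program uses an EfficientNetV2 model to classify images into the 1000 ImageNet classes in real time. It reads frames from a video file, a camera, or a sample video and displays the top-5 classification results with Japanese labels.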

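Key Technologies

EfficientNetV2

A convolutional neural network published by Mingxing Tan and Quoc V. Le in 2021 [1]. It combines training-aware architecture search, Fused-MBConv blocks, and Progressive Learning, which gradually increases the training image size while adjusting the strength of regularization to match. The paper reports training 5 to 11 times faster than ViT on the same computing resources.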
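
The idea of Progressive Learning described above can be sketched as a schedule that interpolates the image size and a regularization strength linearly across training stages. This is a minimal sketch, not the authors' training code: the function name, the number of stages, and the numeric ranges are illustrative assumptions, and only dropout is shown as a regularizer.

```python
def progressive_schedule(num_stages, size_range, dropout_range):
    """Linearly interpolate image size and dropout rate across training stages."""
    (s0, s1), (d0, d1) = size_range, dropout_range
    schedule = []
    for i in range(num_stages):
        # t runs from 0.0 at the first stage to 1.0 at the last stage
        t = i / (num_stages - 1) if num_stages > 1 else 1.0
        schedule.append({
            'stage': i + 1,
            'image_size': round(s0 + (s1 - s0) * t),
            'dropout': round(d0 + (d1 - d0) * t, 3),
        })
    return schedule

# Illustrative values: small images with weak regularization first, larger and stronger later
schedule = progressive_schedule(4, size_range=(128, 300), dropout_range=(0.1, 0.3))
for stage in schedule:
    print(stage)
```

This applies to training only. The program on this page runs inference with already trained weights, so no such schedule is involved at run time.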

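timm (PyTorch Image Models)

A computer vision library for PyTorch developed by Ross Wightman [2]. It provides more than 300 pretrained models together with optimizers and data transforms, and includes training scripts intended to reproduce ImageNet results.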

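Technical Features

Progressive Learning
A training method that enlarges the image size step by step while adjusting regularization parameters such as data augmentation and dropout to match.
timm standard data transforms
ImageNet-compatible resizing and normalization keep the input consistent with what the pretrained model saw during training.
Top-K classification output
The five classes with the highest softmax probabilities are selected and shown with their confidence values.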
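
The normalization step of the data transform mentioned above can be sketched per pixel as follows. The mean and standard deviation are the commonly used ImageNet channel statistics; that the selected model uses exactly these values is an assumption here, since the program obtains them from timm's data configuration at run time. The function name normalize_pixel is hypothetical.

```python
# ImageNet channel statistics (RGB); assumed to match the model's data configuration
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple((value / 255.0 - m) / s for value, m, s in zip(rgb, mean, std))

# A pixel close to the dataset mean maps to values near zero
print(normalize_pixel((124, 116, 104)))
```

In the program, timm.data.create_transform applies this to the whole image after resizing and cropping, so the same statistics are used at inference as in training.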

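Implementation Highlights

The following features are implemented:

Choice of three input sources (video file, camera, sample video)
Automatic GPU/CPU selection
Drawing of classification results with a Japanese font
Per-frame display of results and recording of statistics
Saving of the processing results to a text file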

References

[1] Tan, M., & Le, Q. V. (2021). EfficientNetV2: Smaller models and faster training. In Proceedings of the 38th International Conference on Machine Learning (pp. 10096-10106). PMLR. https://proceedings.mlr.press/v139/tan21a.html

[2] Wightman, R. (2019). PyTorch Image Models. GitHub repository. https://github.com/rwightman/pytorch-image-models

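Source Code

# Image classification program with EfficientNetV2 (ImageNet 1000 classes)
# Key technology: EfficientNetV2
# Source: Tan, M., & Le, Q. V. (2021). EfficientNetV2: Smaller models and faster training.
#       In International Conference on Machine Learning (pp. 10096-10106). PMLR.
# Key feature: Progressive Learning - gradually increases the training image size while
#           adjusting regularization to match, improving training speed and accuracy
# Pretrained model: efficientnetv2_rw_s.ra2_in1k (timm implementation, pretrained on ImageNet-1k, about 24M parameters)
#                URL: https://huggingface.co/timm/efficientnetv2_rw_s.ra2_in1k
# Usage restrictions for the technology and the pretrained model:
#   - timm library: Apache 2.0 license (commercial use permitted)
#   - EfficientNetV2 architecture: Apache 2.0 license (commercial use permitted)
#   - Pretrained weights: trained on ImageNet-1k, so the ImageNet terms apply.
#     ImageNet is released for non-commercial research only; seek legal advice before commercial use
#   - Users must check the usage restrictions themselves
# Design
#   - Related technologies:
#     * timm (PyTorch Image Models): provides the pretrained model
#     * OpenCV: video and camera input, real-time display
#     * PIL/Pillow: image preprocessing and Japanese font rendering
#     * tkinter: file selection dialog
#   - Input and output:
#     Input: video (chosen from the menu "0: video file, 1: camera, 2: sample video". For 0, a file is selected with tkinter. For 1, the camera is opened with OpenCV. For 2, https://raw.githubusercontent.com/opencv/opencv/master/samples/data/vtest.avi is used)
#     Output: real-time display in an OpenCV window, per-frame classification results via print(), result.txt saved when the program ends
#   - Processing steps:
#     1. Acquire and preprocess video input (RGB conversion, timm standard transform)
#     2. Run inference with the EfficientNetV2 model
#     3. Compute the top-5 classification results and display them with Japanese labels
#     4. Draw the results in real time and save them
#   - Preprocessing: conversion to ImageNet-compatible form with the timm standard transform (resize, normalization)
#   - Postprocessing: softmax, top-k selection, Japanese font rendering
#   - Additional processing: threaded capture that keeps only the latest camera frame, Japanese result display (PIL and OpenCV combined)
#   - Settings that may need adjustment: MODEL_NAME (pretrained model), FONT_SIZE_MAIN (display size)
# Future work: in-program comparison of model performance (accuracy and speed of several EfficientNetV2 models)
# Other notes: written for Windows; uses the DirectShow backend for camera input
# Preparation: pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
#         pip install timm opencv-python pillow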

import cv2
import numpy as np
import torch
import timm
import torch.nn.functional as F
import tkinter as tk
from tkinter import filedialog
from PIL import Image, ImageDraw, ImageFont
import urllib.request
import time
import sys
import io
from datetime import datetime
import threading

# Character encoding setting for Windows consoles
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8', line_buffering=True)

# Select GPU or CPU automatically
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'デバイス: {str(device)}')

# Optimization when running on a GPU
if device.type == 'cuda':
    torch.backends.cudnn.benchmark = True

# Constants
MODEL_NAME = 'efficientnetv2_rw_s.ra2_in1k'
FONT_PATH = 'C:/Windows/Fonts/meiryo.ttc'
FONT_SIZE_MAIN = 18
TOP_K = 5
WINDOW_NAME = "ImageNet Classification"

# ImageNet class name list (index = class ID)
IMAGENET_CLASSES = [
    'tench', 'goldfish', 'great white shark', 'tiger shark', 'hammerhead', 'electric ray', 'stingray', 'cock', 'hen', 'ostrich',
    'brambling', 'goldfinch', 'house finch', 'junco', 'indigo bunting', 'robin', 'bulbul', 'jay', 'magpie', 'chickadee',
    'water ouzel', 'kite', 'bald eagle', 'vulture', 'great grey owl', 'European fire salamander', 'common newt', 'eft', 'spotted salamander', 'axolotl',
    'bullfrog', 'tree frog', 'tailed frog', 'loggerhead', 'leatherback turtle', 'mud turtle', 'terrapin', 'box turtle', 'banded gecko', 'common iguana',
    'American chameleon', 'whiptail', 'agama', 'frilled lizard', 'alligator lizard', 'Gila monster', 'green lizard', 'African chameleon', 'Komodo dragon', 'African crocodile',
    'American alligator', 'triceratops', 'thunder snake', 'ringneck snake', 'hognose snake', 'green snake', 'king snake', 'garter snake', 'water snake', 'vine snake',
    'night snake', 'boa constrictor', 'rock python', 'Indian cobra', 'green mamba', 'sea snake', 'horned viper', 'diamondback', 'sidewinder', 'trilobite',
    'harvestman', 'scorpion', 'black and gold garden spider', 'barn spider', 'garden spider', 'black widow', 'tarantula', 'wolf spider', 'tick', 'centipede',
    'black grouse', 'ptarmigan', 'ruffed grouse', 'prairie chicken', 'peacock', 'quail', 'partridge', 'African grey', 'macaw', 'sulphur-crested cockatoo',
    'lorikeet', 'coucal', 'bee eater', 'hornbill', 'hummingbird', 'jacamar', 'toucan', 'drake', 'red-breasted merganser', 'goose',
    'black swan', 'tusker', 'echidna', 'platypus', 'wallaby', 'koala', 'wombat', 'jellyfish', 'sea anemone', 'brain coral',
    'flatworm', 'nematode', 'conch', 'snail', 'slug', 'sea slug', 'chiton', 'chambered nautilus', 'Dungeness crab', 'rock crab',
    'fiddler crab', 'king crab', 'American lobster', 'spiny lobster', 'crayfish', 'hermit crab', 'isopod', 'white stork', 'black stork', 'spoonbill',
    'flamingo', 'little blue heron', 'American egret', 'bittern', 'crane', 'limpkin', 'European gallinule', 'American coot', 'bustard', 'ruddy turnstone',
    'red-backed sandpiper', 'redshank', 'dowitcher', 'oystercatcher', 'pelican', 'king penguin', 'albatross', 'grey whale', 'killer whale', 'dugong',
    'sea lion', 'Chihuahua', 'Japanese spaniel', 'Maltese dog', 'Pekinese', 'Shih-Tzu', 'Blenheim spaniel', 'papillon', 'toy terrier', 'Rhodesian ridgeback',
    'Afghan hound', 'basset', 'beagle', 'bloodhound', 'bluetick', 'black-and-tan coonhound', 'Walker hound', 'English foxhound', 'redbone', 'borzoi',
    'Irish wolfhound', 'Italian greyhound', 'whippet', 'Ibizan hound', 'Norwegian elkhound', 'otterhound', 'Saluki', 'Scottish deerhound', 'Weimaraner', 'Staffordshire bullterrier',
    'American Staffordshire terrier', 'Bedlington terrier', 'Border terrier', 'Kerry blue terrier', 'Irish terrier', 'Norfolk terrier', 'Norwich terrier', 'Yorkshire terrier', 'wire-haired fox terrier', 'Lakeland terrier',
    'Sealyham terrier', 'Airedale', 'cairn', 'Australian terrier', 'Dandie Dinmont', 'Boston bull', 'miniature schnauzer', 'giant schnauzer', 'standard schnauzer', 'Scotch terrier',
    'Tibetan terrier', 'silky terrier', 'soft-coated wheaten terrier', 'West Highland white terrier', 'Lhasa', 'flat-coated retriever', 'curly-coated retriever', 'golden retriever', 'Labrador retriever', 'Chesapeake Bay retriever',
    'German short-haired pointer', 'vizsla', 'English setter', 'Irish setter', 'Gordon setter', 'Brittany spaniel', 'clumber', 'English springer', 'Welsh springer spaniel', 'cocker spaniel',
    'Sussex spaniel', 'Irish water spaniel', 'kuvasz', 'schipperke', 'groenendael', 'malinois', 'briard', 'kelpie', 'komondor', 'Old English sheepdog',
    'Shetland sheepdog', 'collie', 'Border collie', 'Bouvier des Flandres', 'Rottweiler', 'German shepherd', 'Doberman', 'miniature pinscher', 'Greater Swiss Mountain dog', 'Bernese mountain dog',
    'Appenzeller', 'EntleBucher', 'boxer', 'bull mastiff', 'Tibetan mastiff', 'French bulldog', 'Great Dane', 'Saint Bernard', 'Eskimo dog', 'malamute',
    'Siberian husky', 'dalmatian', 'affenpinscher', 'basenji', 'pug', 'Leonberg', 'Newfoundland', 'Great Pyrenees', 'Samoyed', 'Pomeranian',
    'chow', 'keeshond', 'Brabancon griffon', 'Pembroke', 'Cardigan', 'toy poodle', 'miniature poodle', 'standard poodle', 'Mexican hairless', 'timber wolf',
    'white wolf', 'red wolf', 'coyote', 'dingo', 'dhole', 'African hunting dog', 'hyena', 'red fox', 'kit fox', 'Arctic fox',
    'grey fox', 'tabby', 'tiger cat', 'Persian cat', 'Siamese cat', 'Egyptian cat', 'cougar', 'lynx', 'leopard', 'snow leopard',
    'jaguar', 'lion', 'tiger', 'cheetah', 'brown bear', 'American black bear', 'ice bear', 'sloth bear', 'mongoose', 'meerkat',
    'tiger beetle', 'ladybug', 'ground beetle', 'long-horned beetle', 'leaf beetle', 'dung beetle', 'rhinoceros beetle', 'weevil', 'fly', 'bee',
    'ant', 'grasshopper', 'cricket', 'walking stick', 'cockroach', 'mantis', 'cicada', 'leafhopper', 'lacewing', 'dragonfly',
    'damselfly', 'admiral', 'ringlet', 'monarch', 'cabbage butterfly', 'sulphur butterfly', 'lycaenid', 'starfish', 'sea urchin', 'sea cucumber',
    'wood rabbit', 'hare', 'Angora', 'hamster', 'porcupine', 'fox squirrel', 'marmot', 'beaver', 'guinea pig', 'sorrel',
    'zebra', 'hog', 'wild boar', 'warthog', 'hippopotamus', 'ox', 'water buffalo', 'bison', 'ram', 'bighorn',
    'ibex', 'hartebeest', 'impala', 'gazelle', 'Arabian camel', 'llama', 'weasel', 'mink', 'polecat', 'black-footed ferret',
    'otter', 'skunk', 'badger', 'armadillo', 'three-toed sloth', 'orangutan', 'gorilla', 'chimpanzee', 'gibbon', 'siamang',
    'guenon', 'patas', 'baboon', 'macaque', 'langur', 'colobus', 'proboscis monkey', 'marmoset', 'capuchin', 'howler monkey',
    'titi', 'spider monkey', 'squirrel monkey', 'Madagascar cat', 'indri', 'Indian elephant', 'African elephant', 'lesser panda', 'giant panda', 'barracouta',
    'eel', 'coho', 'rock beauty', 'anemone fish', 'sturgeon', 'gar', 'lionfish', 'puffer', 'abacus', 'abaya',
    'academic gown', 'accordion', 'acoustic guitar', 'aircraft carrier', 'airliner', 'airship', 'altar', 'ambulance', 'amphibian', 'analog clock',
    'apiary', 'apron', 'ashcan', 'assault rifle', 'backpack', 'bakery', 'balance beam', 'balloon', 'ballpoint', 'Band Aid',
    'banjo', 'bannister', 'barbell', 'barber chair', 'barbershop', 'barn', 'barometer', 'barrel', 'barrow', 'baseball',
    'basketball', 'bassinet', 'bassoon', 'bathing cap', 'bath towel', 'bathtub', 'beach wagon', 'beacon', 'beaker', 'bearskin',
    'beer bottle', 'beer glass', 'bell cote', 'bib', 'bicycle-built-for-two', 'bikini', 'binder', 'binoculars', 'birdhouse', 'boathouse',
    'bobsled', 'bolo tie', 'bonnet', 'bookcase', 'bookshop', 'bottlecap', 'bow', 'bow tie', 'brass', 'brassiere',
    'breakwater', 'breastplate', 'broom', 'bucket', 'buckle', 'bulletproof vest', 'bullet train', 'butcher shop', 'cab', 'caldron',
    'candle', 'cannon', 'canoe', 'can opener', 'cardigan', 'car mirror', 'carousel', 'carpenter\'s kit', 'carton', 'car wheel',
    'cash machine', 'cassette', 'cassette player', 'castle', 'catamaran', 'CD player', 'cello', 'cellular telephone', 'chain', 'chainlink fence',
    'chain mail', 'chain saw', 'chest', 'chiffonier', 'chime', 'china cabinet', 'Christmas stocking', 'church', 'cinema', 'cleaver',
    'cliff dwelling', 'cloak', 'clog', 'cocktail shaker', 'coffee mug', 'coffeepot', 'coil', 'combination lock', 'computer keyboard', 'confectionery',
    'container ship', 'convertible', 'corkscrew', 'cornet', 'cowboy boot', 'cowboy hat', 'cradle', 'crane', 'crash helmet', 'crate',
    'crib', 'Crock Pot', 'croquet ball', 'crutch', 'cuirass', 'dam', 'desk', 'desktop computer', 'dial telephone', 'diaper',
    'digital clock', 'digital watch', 'dining table', 'dishrag', 'dishwasher', 'disk brake', 'dock', 'dogsled', 'dome', 'doormat',
    'drilling platform', 'drum', 'drumstick', 'dumbbell', 'Dutch oven', 'electric fan', 'electric guitar', 'electric locomotive', 'entertainment center', 'envelope',
    'espresso maker', 'face powder', 'feather boa', 'file', 'fireboat', 'fire engine', 'fire screen', 'flagpole', 'flute', 'folding chair',
    'football helmet', 'forklift', 'fountain', 'fountain pen', 'four-poster', 'freight car', 'French horn', 'frying pan', 'fur coat', 'garbage truck',
    'gasmask', 'gas pump', 'goblet', 'go-kart', 'golf ball', 'golfcart', 'gondola', 'gong', 'gown', 'grand piano',
    'greenhouse', 'grille', 'grocery store', 'guillotine', 'hair slide', 'hair spray', 'half track', 'hammer', 'hamper', 'hand blower',
    'hand-held computer', 'handkerchief', 'hard disc', 'harmonica', 'harp', 'harvester', 'hatchet', 'holster', 'home theater', 'honeycomb',
    'hook', 'hoopskirt', 'horizontal bar', 'horse cart', 'hourglass', 'iPod', 'iron', 'jack-o\'-lantern', 'jean', 'jeep',
    'jersey', 'jigsaw puzzle', 'jinrikisha', 'joystick', 'kimono', 'knee pad', 'knot', 'lab coat', 'ladle', 'lampshade',
    'laptop', 'lawn mower', 'lens cap', 'letter opener', 'library', 'lifeboat', 'lighter', 'limousine', 'liner', 'lipstick',
    'Loafer', 'lotion', 'loudspeaker', 'loupe', 'lumbermill', 'magnetic compass', 'mailbag', 'mailbox', 'maillot', 'maillot (tank suit)',
    'manhole cover', 'maraca', 'marimba', 'mask', 'matchstick', 'maypole', 'maze', 'measuring cup', 'medicine chest', 'megalith',
    'microphone', 'microwave', 'military uniform', 'milk can', 'minibus', 'miniskirt', 'minivan', 'missile', 'mitten', 'mixing bowl',
    'mobile home', 'Model T', 'modem', 'monastery', 'monitor', 'moped', 'mortar', 'mortarboard', 'mosque', 'mosquito net',
    'motor scooter', 'mountain bike', 'mountain tent', 'mouse', 'mousetrap', 'moving van', 'muzzle', 'nail', 'neck brace', 'necklace',
    'nipple', 'notebook', 'obelisk', 'oboe', 'ocarina', 'odometer', 'oil filter', 'organ', 'oscilloscope', 'overskirt',
    'oxcart', 'oxygen mask', 'packet', 'paddle', 'paddlewheel', 'padlock', 'paintbrush', 'pajama', 'palace', 'panpipe',
    'paper towel', 'parachute', 'parallel bars', 'park bench', 'parking meter', 'passenger car', 'patio', 'pay-phone', 'pedestal', 'pencil box',
    'pencil sharpener', 'perfume', 'Petri dish', 'photocopier', 'pick', 'pickelhaube', 'picket fence', 'pickup', 'pier', 'piggy bank',
    'pill bottle', 'pillow', 'ping-pong ball', 'pinwheel', 'pirate', 'pitcher', 'plane', 'planetarium', 'plastic bag', 'plate rack',
    'plow', 'plunger', 'Polaroid camera', 'pole', 'police van', 'poncho', 'pool table', 'pop bottle', 'pot', 'potter\'s wheel',
    'power drill', 'prayer rug', 'printer', 'prison', 'projectile', 'projector', 'puck', 'punching bag', 'purse', 'quill',
    'quilt', 'racer', 'racket', 'radiator', 'radio', 'radio telescope', 'rain barrel', 'recreational vehicle', 'reel', 'reflex camera',
    'refrigerator', 'remote control', 'restaurant', 'revolver', 'rifle', 'rocking chair', 'rotisserie', 'rubber eraser', 'rugby ball', 'rule',
    'running shoe', 'safe', 'safety pin', 'saltshaker', 'sandal', 'sarong', 'sax', 'scabbard', 'scale', 'school bus',
    'schooner', 'scoreboard', 'screen', 'screw', 'screwdriver', 'seat belt', 'sewing machine', 'shield', 'shoe shop', 'shoji',
    'shopping basket', 'shopping cart', 'shovel', 'shower cap', 'shower curtain', 'ski', 'ski mask', 'sleeping bag', 'slide rule', 'sliding door',
    'slot', 'snorkel', 'snowmobile', 'snowplow', 'soap dispenser', 'soccer ball', 'sock', 'solar dish', 'sombrero', 'soup bowl',
    'space bar', 'space heater', 'space shuttle', 'spatula', 'speedboat', 'spider web', 'spindle', 'sports car', 'spotlight', 'stage',
    'steam locomotive', 'steel arch bridge', 'steel drum', 'stethoscope', 'stole', 'stone wall', 'stopwatch', 'stove', 'strainer', 'streetcar',
    'stretcher', 'studio couch', 'stupa', 'submarine', 'suit', 'sundial', 'sunglass', 'sunglasses', 'sunscreen', 'suspension bridge',
    'swab', 'sweatshirt', 'swimming trunks', 'swing', 'switch', 'syringe', 'table lamp', 'tank', 'tape player', 'teapot',
    'teddy', 'television', 'tennis ball', 'thatch', 'theater curtain', 'thimble', 'thresher', 'throne', 'tile roof', 'toaster',
    'tobacco shop', 'toilet seat', 'torch', 'totem pole', 'tow truck', 'toyshop', 'tractor', 'trailer truck', 'tray', 'trench coat',
    'tricycle', 'trimaran', 'tripod', 'triumphal arch', 'trolleybus', 'trombone', 'tub', 'turnstile', 'typewriter keyboard', 'umbrella',
    'unicycle', 'upright', 'vacuum', 'vase', 'vault', 'velvet', 'vending machine', 'vestment', 'viaduct', 'violin',
    'volleyball', 'waffle iron', 'wall clock', 'wallet', 'wardrobe', 'warplane', 'washbasin', 'washer', 'water bottle', 'water jug',
    'water tower', 'whiskey jug', 'whistle', 'wig', 'window screen', 'window shade', 'Windsor tie', 'wine bottle', 'wing', 'wok',
    'wooden spoon', 'wool', 'worm fence', 'wreck', 'yawl', 'yurt', 'web site', 'comic book', 'crossword puzzle', 'street sign',
    'traffic light', 'book jacket', 'menu', 'plate', 'guacamole', 'consomme', 'hot pot', 'trifle', 'ice cream', 'ice lolly',
    'French loaf', 'bagel', 'pretzel', 'cheeseburger', 'hotdog', 'mashed potato', 'head cabbage', 'broccoli', 'cauliflower', 'zucchini',
    'spaghetti squash', 'acorn squash', 'butternut squash', 'cucumber', 'artichoke', 'bell pepper', 'cardoon', 'mushroom', 'Granny Smith', 'strawberry',
    'orange', 'lemon', 'fig', 'pineapple', 'banana', 'jackfruit', 'custard apple', 'pomegranate', 'hay', 'carbonara',
    'chocolate sauce', 'dough', 'meat loaf', 'pizza', 'potpie', 'burrito', 'red wine', 'espresso', 'cup', 'eggnog',
    'alp', 'bubble', 'cliff', 'coral reef', 'geyser', 'lakeside', 'promontory', 'sandbar', 'seashore', 'valley',
    'volcano', 'ballplayer', 'groom', 'scuba diver', 'rapeseed', 'daisy', 'yellow lady\'s slipper', 'corn', 'acorn', 'hip',
    'buckeye', 'coral fungus', 'agaric', 'gyromitra', 'stinkhorn', 'earthstar', 'hen-of-the-woods', 'bolete', 'ear', 'toilet tissue'
]

# Japanese font setting (FONT_PATH must exist; Meiryo ships with Windows)
font_main = ImageFont.truetype(FONT_PATH, FONT_SIZE_MAIN)

# Global variables
frame_count = 0
results_log = []
class_counts = {}
model = None
transforms = None


class ThreadedVideoCapture:
    """Threaded VideoCapture (always returns the latest frame)"""
    def __init__(self, src, is_camera=False):
        if is_camera:
            self.cap = cv2.VideoCapture(src, cv2.CAP_DSHOW)
            fourcc = cv2.VideoWriter_fourcc('M', 'J', 'P', 'G')
            self.cap.set(cv2.CAP_PROP_FOURCC, fourcc)
            self.cap.set(cv2.CAP_PROP_FPS, 60)
        else:
            self.cap = cv2.VideoCapture(src)

        self.grabbed, self.frame = self.cap.read()
        self.stopped = False
        self.lock = threading.Lock()
        self.thread = threading.Thread(target=self.update, args=())
        self.thread.daemon = True
        self.thread.start()

    def update(self):
        """Keep grabbing frames in the background"""
        while not self.stopped:
            grabbed, frame = self.cap.read()
            with self.lock:
                self.grabbed = grabbed
                if grabbed:
                    self.frame = frame

    def read(self):
        """Return the latest frame (a copy, so the caller can modify it safely)"""
        with self.lock:
            return self.grabbed, self.frame.copy() if self.grabbed else None

    def isOpened(self):
        return self.cap.isOpened()

    def get(self, prop):
        return self.cap.get(prop)

    def release(self):
        self.stopped = True
        self.thread.join()
        self.cap.release()


def display_program_header():
    print('=' * 60)
    print('=== EfficientNetV2画像分類プログラム ===')
    print('=' * 60)
    print('概要: ImageNet 1000クラス分類をリアルタイムで実行')
    print('機能: EfficientNetV2による画像分類（ImageNet 1000クラス）')
    print('技術: timm標準データ変換、Progressive Learning')
    print('操作: qキーで終了')
    print('出力: 各フレームごとに処理結果を表示し、終了時にresult.txtへ保存')
    print()


def draw_texts_with_pillow(bgr_frame, texts):
    """Draw text with Pillow; texts: list of dict with keys {text, org, color, font_type}"""
    img_pil = Image.fromarray(cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2RGB))
    draw = ImageDraw.Draw(img_pil)

    for item in texts:
        text = item['text']
        x, y = item['org']
        color = item['color']
        draw.text((x, y), text, font=font_main, fill=color)

    return cv2.cvtColor(np.array(img_pil), cv2.COLOR_RGB2BGR)


def draw_classification_results(frame, classifications):
    """Draw the classification results on the frame"""
    texts_to_draw = []
    texts_to_draw.append({
        'text': f'画像分類結果 (上位{TOP_K}位):',
        'org': (10, 30),
        'color': (0, 255, 0),
        'font_type': 'main'
    })

    for i, classification in enumerate(classifications):
        result_text = f'{i+1}位: {classification["name"]} ({classification["conf"]:.3f})'
        texts_to_draw.append({
            'text': result_text,
            'org': (10, 60 + i * 25),
            'color': (255, 255, 255),
            'font_type': 'main'
        })

    frame = draw_texts_with_pillow(frame, texts_to_draw)
    return frame


def format_classification_output(classifications):
    """Format the top-1 classification result for text output"""
    if len(classifications) == 0:
        return '分類なし'
    else:
        result = classifications[0]['name'] + f' ({classifications[0]["conf"]:.3f})'
        return result


def classify_image(frame):
    """Shared classification routine (preprocessing, inference, top-k selection)"""
    global model, transforms

    # OpenCV frames are BGR; the timm transform expects an RGB PIL image
    pil_image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    input_tensor = transforms(pil_image).unsqueeze(0).to(device)

    with torch.no_grad():
        outputs = model(input_tensor)
        probabilities = F.softmax(outputs, dim=1)
        topk_prob, topk_indices = torch.topk(probabilities, TOP_K)
        topk_prob = topk_prob.cpu().numpy()[0]
        topk_indices = topk_indices.cpu().numpy()[0]

    curr_classifications = []
    for class_index, confidence in zip(topk_indices, topk_prob):
        if class_index < len(IMAGENET_CLASSES):
            class_name = IMAGENET_CLASSES[class_index]
            curr_classifications.append({
                'name': class_name,
                'conf': float(confidence),
                'class': int(class_index)
            })

    return curr_classifications


def process_video_frame(frame, timestamp_ms, is_camera):
    """Wrapper for video frames: classify, count all top-k classes, draw, and format"""
    classifications = classify_image(frame)

    global class_counts
    for classification in classifications:
        name = classification['name']
        if name not in class_counts:
            class_counts[name] = 0
        class_counts[name] += 1

    frame = draw_classification_results(frame, classifications)
    result = format_classification_output(classifications)

    return frame, result


def video_frame_processing(frame, timestamp_ms, is_camera):
    """Process one video frame (standard interface)"""
    global frame_count
    current_time = time.time()
    frame_count += 1

    processed_frame, result = process_video_frame(frame, timestamp_ms, is_camera)
    return processed_frame, result, current_time


# Show the program header
display_program_header()

print(f'モデル {MODEL_NAME} をロード中...')
model = timm.create_model(MODEL_NAME, pretrained=True)
model.to(device)
model.eval()

# timm standard data transform (resize, crop, and normalization taken from the model's pretrained config)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

print('モデルのロード完了')

# Input selection (any input other than '0' or '1' falls through to the sample video)
print("\n=== EfficientNetV2リアルタイム画像分類（ImageNet 1000クラス） ===")
print("0: 動画ファイル")
print("1: カメラ")
print("2: サンプル動画")

choice = input("選択: ")

is_camera = (choice == '1')

if choice == '0':
    root = tk.Tk()
    root.withdraw()
    path = filedialog.askopenfilename()
    if not path:
        raise SystemExit(1)
    cap = cv2.VideoCapture(path)
elif choice == '1':
    cap = ThreadedVideoCapture(0, is_camera=True)
else:
    SAMPLE_URL = 'https://raw.githubusercontent.com/opencv/opencv/master/samples/data/vtest.avi'
    SAMPLE_FILE = 'vtest.avi'
    print('サンプル動画をダウンロード中...')
    urllib.request.urlretrieve(SAMPLE_URL, SAMPLE_FILE)
    cap = cv2.VideoCapture(SAMPLE_FILE)

if not cap.isOpened():
    print('動画ファイル・カメラを開けませんでした')
    raise SystemExit(1)

# Get the frame rate and compute the timestamp increment (33 ms fallback when fps is unknown)
if is_camera:
    actual_fps = cap.get(cv2.CAP_PROP_FPS)
    print(f'カメラのfps: {actual_fps}')
    timestamp_increment = int(1000 / actual_fps) if actual_fps > 0 else 33
else:
    video_fps = cap.get(cv2.CAP_PROP_FPS)
    timestamp_increment = int(1000 / video_fps) if video_fps > 0 else 33

# Main processing
print('\n=== 動画処理開始 ===')
print('操作方法:')
print('  q キー: プログラム終了')

start_time = time.time()
last_info_time = start_time
info_interval = 10.0
timestamp_ms = 0
total_processing_time = 0.0

try:
    while True:
        ret, frame = cap.read()
        if not ret:
            break

        timestamp_ms += timestamp_increment

        processing_start = time.time()
        processed_frame, result, current_time = video_frame_processing(frame, timestamp_ms, is_camera)
        processing_time = time.time() - processing_start
        total_processing_time += processing_time
        cv2.imshow(WINDOW_NAME, processed_frame)

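        if result:
            if is_camera:
                timestamp = datetime.fromtimestamp(current_time).strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]
                log_line = f'{timestamp}, {result}'
            else:
                log_line = f'Frame {frame_count}: {result}'
            print(log_line)

            # Log the printed line so result.txt carries the timestamp or frame number it declares
            results_log.append(log_line)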

        if is_camera:
            elapsed = current_time - last_info_time
            if elapsed >= info_interval:
                total_elapsed = current_time - start_time
                actual_fps = frame_count / total_elapsed if total_elapsed > 0 else 0
                avg_processing_time = (total_processing_time / frame_count * 1000) if frame_count > 0 else 0
                print(f'[情報] 経過時間: {total_elapsed:.1f}秒, 処理フレーム数: {frame_count}, 実測fps: {actual_fps:.1f}, 平均処理時間: {avg_processing_time:.1f}ms')
                last_info_time = current_time

        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

finally:
    print('\n=== プログラム終了 ===')
    cap.release()
    cv2.destroyAllWindows()

    if results_log:
        with open('result.txt', 'w', encoding='utf-8') as f:
            f.write('=== EfficientNetV2画像分類結果 ===\n')
            f.write(f'処理フレーム数: {frame_count}\n')
            f.write(f'使用モデル: {MODEL_NAME}\n')
            f.write(f'使用デバイス: {str(device).upper()}\n')
            if device.type == 'cuda':
                f.write(f'GPU: {torch.cuda.get_device_name(0)}\n')
            f.write(f'前処理: timm標準データ変換（正規化、リサイズ）\n')
            f.write(f'上位表示数: {TOP_K}\n')
            f.write(f'\n分類されたクラス一覧:\n')
            for class_name, count in sorted(class_counts.items()):
                f.write(f'  {class_name}: {count}回\n')
            f.write('\n')
            if is_camera:
                f.write('形式: タイムスタンプ, クラス名 (信頼度)\n')
            else:
                f.write('形式: フレーム番号, クラス名 (信頼度)\n')
            f.write('\n')
            f.write('\n'.join(results_log))
        print(f'\n処理結果をresult.txtに保存しました')
        print(f'分類されたクラス数: {len(class_counts)}')

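Ideas for Experiments and Exploration

EfficientNetV2 model selection experiment

Changing MODEL_NAME near the top of the program makes it possible to compare different EfficientNetV2 models. Model names and the availability of pretrained weights depend on the installed timm version, so confirm them first with timm.list_models('*efficientnetv2*', pretrained=True):

efficientnetv2_rw_s: Small (fast, practical, about 24M parameters)
efficientnetv2_rw_m: Medium (balanced, about 54M parameters)
efficientnetv2_rw_l: Large (accuracy-oriented, about 120M parameters)
efficientnetv2_xl: Extra Large (highest accuracy, about 208M parameters)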
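
When comparing the models listed above, speed can be measured with a small timing helper like the following sketch. The helper name benchmark and the stand-in workload are assumptions made for this example; in an actual comparison the measured function would wrap a forward pass of each model.

```python
import time

def benchmark(fn, warmup=3, runs=20):
    """Return the mean wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):  # warm-up runs are excluded from the measurement
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs * 1000.0

# Stand-in workload; in practice fn would wrap a forward pass such as
# lambda: model(input_tensor) for each MODEL_NAME under comparison.
def dummy_inference():
    return sum(i * i for i in range(10_000))

print(f'mean time: {benchmark(dummy_inference):.3f} ms')
```

On a GPU, CUDA kernels run asynchronously, so the measured function should call torch.cuda.synchronize() after the forward pass; otherwise the timings will be too short.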

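Classification accuracy experiments

Evaluate how robust the pretrained model is to changes in the input:

Image quality: measure how brightness, contrast, and noise affect the classification results
Images with several objects: evaluate how well the main object is classified when several objects appear
Viewing angle: check how consistent the classification is when the same object is filmed from different angles
Background complexity: measure accuracy in scenes with cluttered backgrounds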
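
For the viewing-angle experiment above, one simple way to quantify consistency is the share of frames whose top-1 label matches the most frequent label. The following is a minimal sketch under that assumption; the function name and the sample labels are hypothetical.

```python
from collections import Counter

def top1_consistency(predictions):
    """Return the most frequent top-1 label and the fraction of frames that agree with it."""
    if not predictions:
        return None, 0.0
    label, count = Counter(predictions).most_common(1)[0]
    return label, count / len(predictions)

# Hypothetical top-1 labels for one object filmed from several angles
frames = ['coffee mug', 'coffee mug', 'cup', 'coffee mug', 'pitcher']
label, ratio = top1_consistency(frames)
print(f'{label}: {ratio:.0%} of frames agree')
```

A ratio close to 1.0 means the prediction is stable across angles. It does not by itself show that the stable label is correct, which requires a known ground-truth class.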

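Real-time application experiments

The following tasks use classes outside ImageNet, so they require fine-tuning the model on data from the target domain:

Medical image diagnosis: disease classification on X-ray and CT images
Quality control: classifying products as good or defective
Food classification: automatic identification of dish types
Animal and plant identification: classifying species of wild animals and plants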

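Performance evaluation experiments

Scaling experiment: evaluate classification performance at different input image sizes and model sizes (EfficientNetV2 variants are scaled in depth, width, and resolution together)

Confidence threshold experiment: measure how classification accuracy and the share of accepted frames change as the confidence threshold varies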
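
The confidence threshold experiment above can start from the saved results. The following sketch reads lines that end in a confidence value in parentheses, as written by the program, and reports the share of frames whose top-1 confidence reaches each threshold. The function name, the regular expression, and the sample lines are assumptions made for this example.

```python
import re

# Matches the trailing "(0.734)" confidence in lines such as "Frame 12: tabby (0.734)"
CONF_PATTERN = re.compile(r'\((\d+\.\d+)\)\s*$')

def coverage_by_threshold(lines, thresholds):
    """For each threshold, return the fraction of logged frames whose confidence reaches it."""
    matches = (CONF_PATTERN.search(line) for line in lines)
    confs = [float(m.group(1)) for m in matches if m]
    if not confs:
        return {t: 0.0 for t in thresholds}
    return {t: sum(c >= t for c in confs) / len(confs) for t in thresholds}

# Hypothetical log lines in the style written by the program
sample = ['Frame 1: tabby (0.734)', 'Frame 2: tiger cat (0.412)', 'Frame 3: tabby (0.905)']
coverage = coverage_by_threshold(sample, [0.3, 0.5, 0.9])
print(coverage)
```

This measures only how many frames pass each threshold. Measuring accuracy at each threshold additionally needs ground-truth labels for the frames, which result.txt does not contain.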

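Comparison experiments

Comparison with other methods: compare performance with image classification models other than EfficientNetV2 (YOLO11-cls, ResNet, ConvNeXt, and others)

Parameter efficiency: compare accuracy with other models that have a similar number of parameters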