3D Hand Landmark Detection with MediaPipe Hands

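[Overview] MediaPipe Hands estimates the 3D coordinates of 21 points on the hand from camera video. A machine learning model detects the hand in a single RGB image and outputs the position of each finger joint as 3D coordinates. Hand movement is visualized as 21 3D points, so posture information such as finger joint angles and palm orientation can be observed in real time. The experiments let you confirm the basics of computer vision and gesture recognition. This page covers the steps for running the program on Windows, the program code, and ideas for experiments.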

Python development environment and libraries
Program code
Usage
Ideas for experiments and exploration

2. Python development environment and libraries

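This section covers only the minimum preparation. For machine learning or deep learning work, it is convenient to also install NVIDIA CUDA, Visual Studio, Cursor, and so on. These are explained in detail on a separate page (https://www.kkaneko.jp/cc/dev/aiassist.html); refer to it as needed.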

Installing Python 3.12

Skip this step if Python 3.12 is already installed.

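Start a Command Prompt with administrator privileges (Windows key or Start menu > type cmd > right-click > "Run as administrator") and run the following. Administrator privileges are required because winget's --scope machine option installs the software system-wide.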

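REM Install Python system-wide
winget install --scope machine --id Python.Python.3.12 -e --silent
REM Add Python and its Scripts folder to the system PATH
set "PYTHON_PATH=C:\Program Files\Python312"
set "PYTHON_SCRIPTS_PATH=C:\Program Files\Python312\Scripts"
REM setx does not update PATH in the current window, so update it here too;
REM otherwise the second setx would overwrite the first one with the old PATH
echo "%PATH%" | find /i "%PYTHON_PATH%" >nul
if errorlevel 1 (
  setx PATH "%PATH%;%PYTHON_PATH%" /M >nul
  set "PATH=%PATH%;%PYTHON_PATH%"
)
echo "%PATH%" | find /i "%PYTHON_SCRIPTS_PATH%" >nul
if errorlevel 1 setx PATH "%PATH%;%PYTHON_SCRIPTS_PATH%" /M >nul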
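The `echo "%PATH%" | find /i "..."` check tests whether the folder appears anywhere in PATH as a case-insensitive substring, so `C:\Program Files\Python312` also "matches" when only `C:\Program Files\Python312\Scripts` is registered. The Python sketch below contrasts that substring test with an exact, per-entry comparison. The helper names `substring_match` and `path_has_entry` are illustrative only and are not part of any tool.

```python
def substring_match(path_value: str, directory: str) -> bool:
    """Mimic `echo "%PATH%" | find /i "<dir>"`: a case-insensitive substring test."""
    return directory.lower() in path_value.lower()


def path_has_entry(path_value: str, directory: str, sep: str = ";") -> bool:
    """True only if `directory` is one of the PATH entries (case-insensitive, exact)."""
    def norm(entry: str) -> str:
        # Ignore surrounding spaces/quotes and a trailing backslash
        return entry.strip().strip('"').rstrip("\\").lower()

    target = norm(directory)
    return any(norm(e) == target for e in path_value.split(sep) if e.strip())


if __name__ == "__main__":
    # Hypothetical PATH in which only the Scripts folder is registered
    path_value = r"C:\Windows\system32;C:\Program Files\Python312\Scripts"
    base = r"C:\Program Files\Python312"
    print("find /i style:", substring_match(path_value, base))
    print("exact entry:  ", path_has_entry(path_value, base))
```

On Windows, the real value can be checked with `path_has_entry(os.environ["PATH"], r"C:\Program Files\Python312", os.pathsep)`.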

[Related external pages]

Official Python website: https://www.python.org/

Installing the AI editor Windsurf

An AI editor is recommended for editing and running Python programs. This section explains how to install Windsurf.

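Start a Command Prompt with administrator privileges (Windows key or Start menu > type cmd > right-click > "Run as administrator") and run the following to install Windsurf system-wide. Administrator privileges are required because winget's --scope machine option installs the software system-wide.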

winget install --scope machine Codeium.Windsurf -e --silent

[Related external pages]

Official Windsurf website: https://windsurf.com/

Installing the required Python libraries

Start a Command Prompt with administrator privileges (Windows key or Start menu > type cmd > right-click > "Run as administrator") and run the following.

pip install mediapipe opencv-python numpy pillow
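Two of these pip package names differ from the module names used in `import` (`opencv-python` is imported as `cv2`, `pillow` as `PIL`). One way to confirm the installation is to look the modules up with `importlib.util.find_spec`, which returns `None` when a top-level module cannot be found. A minimal sketch; the `REQUIRED` table and the `missing_packages` helper are illustrative names.

```python
import importlib.util

# pip package name -> module name used in `import`
REQUIRED = {
    "mediapipe": "mediapipe",
    "opencv-python": "cv2",
    "numpy": "numpy",
    "pillow": "PIL",
}


def missing_packages(required: dict) -> list:
    """Return the pip names whose module cannot be found by this interpreter."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Not installed:", " ".join(missing))
    else:
        print("All required packages can be imported")
```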

3. Program code

Glossary

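Hand landmarks: feature points that represent the positions of the hand and finger joints as 3D coordinates (x, y, z)
MCP joint: the joint at the base of the finger (metacarpophalangeal joint)
PIP joint: the middle joint of the finger, second from the fingertip (proximal interphalangeal joint)
DIP joint: the joint nearest the fingertip (distal interphalangeal joint)
Palm normal vector: a direction vector perpendicular to the palm plane, used to represent the hand's orientation
Contact detection: judging whether a finger touches another finger or joint from the distance between them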
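The joint and palm-normal terms above reduce to a little vector arithmetic on the 3D landmarks. The standalone sketch below follows the same ideas as `calculate_angle` and `calculate_palm_normal` in the program, using illustrative function names (`joint_bend_angle`, `palm_normal`) and made-up points. A bend angle of 0 means the finger is straight at that joint; the sign of the normal depends on which hand it is and which way the palm faces.

```python
import numpy as np


def joint_bend_angle(a, b, c) -> float:
    """Bend angle in degrees at joint b between segments a->b and b->c (0 = straight)."""
    v1 = np.asarray(b, float) - np.asarray(a, float)
    v2 = np.asarray(c, float) - np.asarray(b, float)
    n1, n2 = np.linalg.norm(v1), np.linalg.norm(v2)
    if n1 < 1e-10 or n2 < 1e-10:
        return 0.0  # degenerate segment
    cos = np.clip(np.dot(v1, v2) / (n1 * n2), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)))


def palm_normal(wrist, index_mcp, pinky_mcp):
    """Unit vector perpendicular to the plane through the wrist, index MCP and pinky MCP."""
    w = np.asarray(wrist, float)
    n = np.cross(np.asarray(index_mcp, float) - w, np.asarray(pinky_mcp, float) - w)
    norm = np.linalg.norm(n)
    return n / norm if norm > 1e-10 else np.array([0.0, 0.0, 1.0])


if __name__ == "__main__":
    # Hypothetical MCP, PIP and DIP positions: finger bent by a right angle at the PIP joint
    print(joint_bend_angle((0, 0, 0), (0, -1, 0), (1, -1, 0)))
    # Hypothetical palm lying flat in the image plane (z = 0 for all three points)
    print(palm_normal((0, 0, 0), (1, 0, 0), (0, 1, 0)))
```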

Core technology

Core technology: MediaPipe Hands

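How it works: MediaPipe Hands uses machine learning models to detect the hand in a single RGB image and estimate 21 3D landmarks. The models form a two-stage pipeline: a hand (palm) detection stage, followed by a stage that estimates the 21 coordinates within the detected hand region. Depth is estimated from the 2D image using prior knowledge of hand shape learned from the training data. The model architecture and inference are optimized for real-time operation on mobile devices.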
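In video mode, MediaPipe does not have to run both stages on every frame: once the hands are found, it tracks them using the region derived from the previous frame's landmarks, and hand detection runs again on the next frame when the landmark confidence drops below min_tracking_confidence. The control flow can be sketched in plain Python as follows. This is a simplified single-hand sketch; `detect_palm` and `predict_landmarks` are hypothetical stand-ins for the two models, not MediaPipe APIs.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # hypothetical hand region (x, y, w, h)


def track_hand(frames: Sequence,
               detect_palm: Callable[[object], Optional[Box]],
               predict_landmarks: Callable[[object, Box], tuple],
               min_tracking_confidence: float = 0.5) -> Tuple[List, int]:
    """Simplified two-stage loop for a single hand.

    detect_palm(frame) returns a hand region or None.
    predict_landmarks(frame, box) returns (landmarks, confidence, next_box).
    Returns per-frame landmarks (None when no hand) and how often the detector ran.
    """
    box: Optional[Box] = None
    detector_calls = 0
    outputs: List = []
    for frame in frames:
        if box is None:
            # Stage 1: run the palm detector only when nothing is being tracked
            detector_calls += 1
            box = detect_palm(frame)
            if box is None:
                outputs.append(None)
                continue
        # Stage 2: estimate the landmarks inside the current region
        landmarks, confidence, next_box = predict_landmarks(frame, box)
        if confidence < min_tracking_confidence:
            box = None  # tracking lost: detect again on the next frame
            outputs.append(None)
        else:
            box = next_box  # region for the next frame comes from these landmarks
            outputs.append(landmarks)
    return outputs, detector_calls


if __name__ == "__main__":
    # Fake models: the detector always finds a hand; confidence dips at frame 2
    confidences = [0.9, 0.9, 0.1, 0.9, 0.9]
    result, calls = track_hand(
        range(5),
        detect_palm=lambda f: (0.0, 0.0, 1.0, 1.0),
        predict_landmarks=lambda f, b: (f"landmarks@{f}", confidences[f], b),
    )
    print(result, calls)
```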

3D coordinate system used in this program:

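x axis: positive toward the right of the image (normalized to 0 to 1 by the image width)
y axis: positive downward (normalized to 0 to 1 by the image height)
z axis: depth relative to the wrist (roughly -0.1 to 0.1; smaller, more negative values are closer to the camera; its magnitude uses roughly the same scale as x)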
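Converting a landmark to pixel coordinates only requires multiplying x and y by the image width and height, which is what `batch_draw_landmarks` in the program does with `int(landmark[0] * w)`. A small sketch; the function name `to_pixel` is illustrative, the default `depth_scaling=10.0` mirrors the program's `DEPTH_SCALING`, and the clamping to the image bounds is an added safety step that the program itself does not perform.

```python
def to_pixel(landmark, width: int, height: int, depth_scaling: float = 10.0):
    """Convert one normalized landmark (x, y, z) to pixel coordinates plus scaled depth.

    x and y are fractions of the image width/height; z is relative to the wrist
    (negative = closer to the camera).
    """
    x, y, z = landmark
    # int() truncates like the program; clamping keeps the point inside the image
    px = min(max(int(x * width), 0), width - 1)
    py = min(max(int(y * height), 0), height - 1)
    return px, py, z * depth_scaling


if __name__ == "__main__":
    # Hypothetical landmark at the horizontal center, a quarter of the way down
    print(to_pixel((0.5, 0.25, -0.02), 640, 480))
```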

Adjustable settings in this program:

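Static/dynamic mode: static_image_mode chooses between treating every input as an unrelated image (True) and as a continuous video stream (False, used in this program)
Confidence thresholds: min_detection_confidence is the minimum confidence (0.0 to 1.0) for a hand detection to be accepted; min_tracking_confidence is the corresponding threshold for tracking between frames
Model complexity: model_complexity selects the lightweight model (0) or the more accurate model (1)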
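These settings correspond to keyword arguments of `mp.solutions.hands.Hands` (the program also passes `max_num_hands`). The sketch below gathers them in a hypothetical helper, `hands_options`, that checks the documented ranges before the object is created. Note that `min_tracking_confidence` is ignored when `static_image_mode=True`, because detection then runs on every image.

```python
def hands_options(static_image_mode=False, max_num_hands=2, model_complexity=1,
                  min_detection_confidence=0.5, min_tracking_confidence=0.5) -> dict:
    """Validate the tunable settings and return them as keyword arguments for Hands()."""
    if model_complexity not in (0, 1):
        raise ValueError("model_complexity must be 0 (lite) or 1 (full)")
    if max_num_hands < 1:
        raise ValueError("max_num_hands must be at least 1")
    for name, value in (("min_detection_confidence", min_detection_confidence),
                        ("min_tracking_confidence", min_tracking_confidence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0.0, 1.0], got {value}")
    return {
        "static_image_mode": static_image_mode,
        "max_num_hands": max_num_hands,
        "model_complexity": model_complexity,
        "min_detection_confidence": min_detection_confidence,
        "min_tracking_confidence": min_tracking_confidence,
    }


if __name__ == "__main__":
    print(hands_options(model_complexity=0, min_detection_confidence=0.7))
```

For example, `mp.solutions.hands.Hands(**hands_options(model_complexity=0))` would create the lightweight model with the default thresholds.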

Reference

Zhang, F., Bazarevsky, V., Vakunov, A., Tkachenka, A., Sung, G., Chang, C. L., & Grundmann, M. (2020). MediaPipe Hands: On-device Real-time Hand Tracking. arXiv preprint arXiv:2006.10214.

Source code


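# Program name: MediaPipe 3D hand landmark detection and finger contact detection
# Key technique: MediaPipe Hands
# Source: F. Zhang et al., "MediaPipe Hands: On-device Real-time Hand Tracking," arXiv preprint arXiv:2006.10214, 2020.
# Key features: Real-time detection of 21 3D hand landmarks. A two-stage pipeline (palm detection model + hand landmark model) estimates the 21 joint positions of the hand as 3D coordinates (x, y, z) from a single RGB camera. The z coordinate gives depth relative to the wrist. Finger contact detection supports recognition of Japanese fingerspelling.
# Pretrained model: MediaPipe model bundle (palm detection model and hand landmark model). Trained on roughly 30K real-world images plus synthetic hand models. model_complexity=0 (lite) and model_complexity=1 (full) are available. The models ship with the MediaPipe library and are loaded automatically.
# Design:
#   - Related technologies: OpenCV (camera input, display), NumPy (vector math, angle calculation), Pillow (Japanese text rendering)
#   - Input/output: Input: video (the user chooses from the menu "0: video file, 1: camera, 2: sample video"; 0 opens a tkinter file dialog, 1 opens the camera with OpenCV, 2 uses https://raw.githubusercontent.com/opencv/opencv/master/samples/data/vtest.avi). Output: real-time display in an OpenCV window (detected 3D landmarks and related information), per-frame results printed with print(), and the printed results saved to result.txt when the program exits
#   - Processing steps: 1. grab a frame, 2. detect hands with MediaPipe Hands, 3. extract the 21 3D coordinates, 4. compute joint angles, hand orientation and the palm normal vector, 5. draw the results on screen
#   - Pre/post-processing: Pre: BGR to RGB conversion (required by MediaPipe). Post: temporal filtering (moving average over the last 3 frames to stabilize landmark positions)
#   - Additional processing: z is multiplied by DEPTH_SCALING (10) before 3D vector math so that depth differences carry more weight relative to x and y; joint angles quantify finger posture; the palm normal vector estimates hand orientation; finger contact detection supports Japanese fingerspelling recognition
#   - Settings to tune: HAND_CONFIDENCE (detection confidence threshold, default 0.5), TRACKING_CONFIDENCE (tracking confidence threshold, default 0.5), MAX_NUM_HANDS (maximum number of hands, default 2)
# Future work: tune HAND_CONFIDENCE and TRACKING_CONFIDENCE automatically by monitoring the detection success rate and adjusting the thresholds at regular intervals
# Notes: Windows only (uses the DirectShow camera backend and the Meiryo font meiryo.ttc for Japanese text)
# Setup: pip install mediapipe opencv-python numpy pillow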

import cv2
import numpy as np
import mediapipe as mp
import math
from PIL import Image, ImageDraw, ImageFont
import tkinter as tk
from tkinter import filedialog
import time
import urllib.request
import collections
from datetime import datetime

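# Constants
HAND_CONFIDENCE = 0.5           # Hand detection confidence threshold (MediaPipe default)
TRACKING_CONFIDENCE = 0.5       # Tracking confidence threshold (MediaPipe default)
MAX_NUM_HANDS = 2              # Maximum number of hands to detect

# Thresholds for fingerspelling recognition
BASE_CONTACT_THRESHOLD = 0.05  # Contact distance as a fraction of the hand scale
ANGLE_THRESHOLD = 120          # Reference joint angle (defined but not used below)
DEPTH_SCALING = 10             # Factor applied to z before 3D vector math

# Font sizes
FONT_LARGE = 30
FONT_MEDIUM = 20
FONT_SMALL = 16
FONT_TINY = 12

# Number of frames averaged by the temporal filter
HISTORY_SIZE = 3

# Tolerance for floating-point comparison
FLOAT_TOLERANCE = 1e-6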

# Colors per finger for visual feedback. OpenCV drawing functions use BGR order.
COLORS = {
    'thumb': (0, 0, 255),      # Thumb - red
    'index': (0, 255, 0),      # Index finger - green
    'middle': (255, 0, 0),     # Middle finger - blue
    'ring': (0, 255, 255),     # Ring finger - yellow
    'pinky': (255, 0, 255),    # Little finger - magenta
    'palm': (255, 255, 0),     # Palm - cyan
    'wrist': (128, 128, 128)   # Wrist - gray
}

# Window title
WINDOW_TITLE = "MediaPipe 3D Hand Landmark Detection"

# Hand landmark indices (MediaPipe's 21 points)
FINGER_LANDMARKS = {
    'WRIST': 0,
    'THUMB_CMC': 1, 'THUMB_MCP': 2, 'THUMB_IP': 3, 'THUMB_TIP': 4,
    'INDEX_FINGER_MCP': 5, 'INDEX_FINGER_PIP': 6, 'INDEX_FINGER_DIP': 7, 'INDEX_FINGER_TIP': 8,
    'MIDDLE_FINGER_MCP': 9, 'MIDDLE_FINGER_PIP': 10, 'MIDDLE_FINGER_DIP': 11, 'MIDDLE_FINGER_TIP': 12,
    'RING_FINGER_MCP': 13, 'RING_FINGER_PIP': 14, 'RING_FINGER_DIP': 15, 'RING_FINGER_TIP': 16,
    'PINKY_MCP': 17, 'PINKY_PIP': 18, 'PINKY_DIP': 19, 'PINKY_TIP': 20
}

# Landmark connections defined by MediaPipe
HAND_CONNECTIONS = mp.solutions.hands.HAND_CONNECTIONS

# Helper: return the finger group name for a landmark index
def finger_group(index):
    if index in (1, 2, 3, 4):
        return 'thumb'
    if index in (5, 6, 7, 8):
        return 'index'
    if index in (9, 10, 11, 12):
        return 'middle'
    if index in (13, 14, 15, 16):
        return 'ring'
    if index in (17, 18, 19, 20):
        return 'pinky'
    if index == 0:
        return 'wrist'
    return 'palm'

# Japanese font (Meiryo), used by Pillow to draw kana and other Japanese text
try:
    font_large = ImageFont.truetype('C:/Windows/Fonts/meiryo.ttc', FONT_LARGE)
    font_medium = ImageFont.truetype('C:/Windows/Fonts/meiryo.ttc', FONT_MEDIUM)
    font_small = ImageFont.truetype('C:/Windows/Fonts/meiryo.ttc', FONT_SMALL)
    font_tiny = ImageFont.truetype('C:/Windows/Fonts/meiryo.ttc', FONT_TINY)
except Exception as e:
    print(f'Failed to load the font: {e}')
    exit()

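# Startup description
print('MediaPipe 3D hand landmark detection')
print('=' * 50)
print('Overview: detects the 21 3D hand landmarks in real time with MediaPipe Hands')
print('Features: estimates 3D coordinates (x, y, z) from a single RGB camera')
print('          z is depth relative to the wrist')
print('          finger contact detection supports Japanese fingerspelling recognition')
print('Controls: press q to quit')
print('=' * 50)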

# Model selection
print('Select the model to use:')
print('0: MediaPipe Hands Lite (model_complexity=0)')
print('   - Speed: faster')
print('   - Accuracy: lower')
print('   - Recommended for: real-time processing')
print('')
print('1: MediaPipe Hands Full (model_complexity=1)')
print('   - Speed: slower')
print('   - Accuracy: higher')
print('   - Recommended for: accuracy (default)')
print('')

model_choice = input('Model (0 or 1): ')
if model_choice == '0':
    model_complexity = 0
    print('Selected: MediaPipe Hands Lite')
elif model_choice == '1':
    model_complexity = 1
    print('Selected: MediaPipe Hands Full')
else:
    model_complexity = 1
    print('Invalid choice; using the default (Full)')

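print('=' * 50)
print('Implemented features for Japanese fingerspelling:')
print('')
print('Thumb tip touching another fingertip (forming a ring)')
print('  お: ring formed by the thumb and index finger')
print('  き: ring formed by the thumb with the index and middle fingers (3 fingers)')
print('  ら: ring formed by the thumb with the index and middle fingers (hand turned sideways)')
print('')
print('Thumb touching the side of another finger')
print('  す: thumb touches the side of the index finger')
print('  せ: thumb touches near the first joint (DIP) of the index finger')
print('  ぬ: thumb presses the bases of the index and middle fingers')
print('')
print('Thumb touching the base of another finger')
print('  め: thumb touches the base of the little finger (MCP joint)')
print('  む: thumb touches the base of the index finger')
print('')
print('Crossed fingers')
print('  ね: index and middle fingers crossed')
print('  れ: thumb passes under the other four fingers')
print('=' * 50)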

# Global variables
frame_count = 0
results_log = []
info_displayed = False  # Whether the color legend and technical info have been printed

# Per-hand caches for derived values
_landmark_cache = {}  # key: 'Left' / 'Right' -> np.ndarray
_normal_cache = {}    # key: 'Left' / 'Right' -> np.ndarray
_scale_cache = {}     # key: 'Left' / 'Right' -> float

# Per-hand landmark history for temporal filtering
landmark_history = {}  # key: 'Left' / 'Right' -> deque(maxlen=HISTORY_SIZE)

def get_landmark_array_from_normalized(landmarks_3d, landmark_key):
    """正規化済み3D座標配列からランドマーク座標を取得（z座標スケーリング適用）"""
    landmark_idx = FINGER_LANDMARKS[landmark_key]
    landmark = landmarks_3d[landmark_idx].copy()
    # Apply the z scaling
    landmark[2] *= DEPTH_SCALING
    return landmark

def draw_japanese_text(img, text, position, font, color=(255, 255, 255)):
    """日本語テキストを画像に描画"""
    # 日本語テキスト描画の実体は draw_japanese_text_optimized に委譲
    return draw_japanese_text_optimized(img, [(text, position)], font, color)

def calculate_angle(v1, v2):
    """2つのベクトル間の角度を計算（度）（数値安定性向上版）"""
    # ベクトルの正規化による数値安定性の向上
    v1_norm = np.linalg.norm(v1)
    v2_norm = np.linalg.norm(v2)

    # 微小なベクトルの場合は0度を返す
    if v1_norm < 1e-10 or v2_norm < 1e-10:
        return 0

    v1_normalized = v1 / v1_norm
    v2_normalized = v2 / v2_norm

    dot_product = np.dot(v1_normalized, v2_normalized)
    cos_angle = np.clip(dot_product, -1.0, 1.0)
    return math.degrees(math.acos(cos_angle))

def calculate_distance(p1, p2):
    """2点間の距離を計算"""
    return np.linalg.norm(np.array(p1) - np.array(p2))

def smooth_landmarks(current_landmarks, hand_id):
    """ランドマークの時系列フィルタリング：過去3フレームの移動平均によりランドマーク位置を安定化（手ごと履歴）"""
    if hand_id not in landmark_history:
        landmark_history[hand_id] = collections.deque(maxlen=HISTORY_SIZE)
    dq = landmark_history[hand_id]
    dq.append(current_landmarks)
    if len(dq) < HISTORY_SIZE:
        return current_landmarks

    # Moving average over the frames in the history
    smoothed = np.zeros_like(current_landmarks)
    for landmarks in dq:
        smoothed += landmarks
    smoothed /= len(dq)
    return smoothed

def calculate_palm_normal(landmarks_normalized):
    """掌の法線ベクトルを計算：手の向き推定に使用（スケール整合版）"""
    wrist = get_landmark_array_from_normalized(landmarks_normalized, 'WRIST')
    index_mcp = get_landmark_array_from_normalized(landmarks_normalized, 'INDEX_FINGER_MCP')
    pinky_mcp = get_landmark_array_from_normalized(landmarks_normalized, 'PINKY_MCP')

    v1 = index_mcp - wrist
    v2 = pinky_mcp - wrist

    # Cross product of the two vectors lying in the palm
    normal = np.cross(v1, v2)

    # Normalize (fall back to a default if the vectors are degenerate)
    norm = np.linalg.norm(normal)
    if norm > 1e-10:
        normal = normal / norm
    else:
        normal = np.array([0.0, 0.0, 1.0])  # Default normal vector

    return normal

def calculate_hand_scale(landmarks_normalized):
    """手のスケールを計算：手首から中指先端までの距離（スケール整合版）"""
    wrist = get_landmark_array_from_normalized(landmarks_normalized, 'WRIST')
    middle_tip = get_landmark_array_from_normalized(landmarks_normalized, 'MIDDLE_FINGER_TIP')
    return calculate_distance(wrist, middle_tip)

def calculate_line_point_distance(point, line_start, line_end):
    """点と線分の最短距離を計算"""
    line_vec = line_end - line_start
    line_len_sq = np.dot(line_vec, line_vec)

    if line_len_sq < 1e-10:  # Degenerate segment: fall back to the start point
        return calculate_distance(point, line_start)

    t = max(0, min(1, np.dot(point - line_start, line_vec) / line_len_sq))
    projection = line_start + t * line_vec
    return calculate_distance(point, projection)

def draw_japanese_text_optimized(frame, texts_and_positions, font, color=(255, 255, 255)):
    """日本語テキスト描画：複数テキストを一度にPillowで処理"""
    if not texts_and_positions:
        return frame

    # BGR→RGB変換を1回のみ実行
    img_pil = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    draw = ImageDraw.Draw(img_pil)

    # 複数のテキストを一度に描画
    for text, position in texts_and_positions:
        draw.text(position, text, font=font, fill=color)

    # RGB→BGR変換を1回のみ実行
    return cv2.cvtColor(np.array(img_pil), cv2.COLOR_RGB2BGR)

def draw_landmark_with_label(frame, x, y, landmark_idx, color, z_value_scaled):
    """ランドマーク描画とラベル表示（スケール修正版）"""
    # 深度による円サイズの動的変更（スケール修正）
    base_radius = 5
    # z_value_scaledはスケーリング済みのz座標を使用
    depth_factor = max(0.5, min(2.0, 1.0 + z_value_scaled * 0.1))
    radius = max(2, int(base_radius * depth_factor))

    # ランドマーク円を描画
    cv2.circle(frame, (x, y), radius, color, -1)
    cv2.circle(frame, (x, y), radius + 1, (0, 0, 0), 1)

    # ランドマーク番号表示
    cv2.putText(frame, str(landmark_idx), (x + 3, y - 3),
                cv2.FONT_HERSHEY_SIMPLEX, 0.3, (255, 255, 255), 1)

    # 重要点の強調表示
    if landmark_idx == FINGER_LANDMARKS['THUMB_TIP']:
        cv2.putText(frame, 'THUMB', (x + 8, y + 8),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.4, (255, 255, 255), 1)
        cv2.circle(frame, (x, y), radius + 3, (255, 255, 255), 2)
    elif landmark_idx == FINGER_LANDMARKS['INDEX_FINGER_TIP']:
        cv2.putText(frame, 'INDEX', (x + 8, y + 8),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.4, (255, 255, 255), 1)
        cv2.circle(frame, (x, y), radius + 3, (255, 255, 255), 2)

def batch_draw_landmarks(frame, landmarks_normalized, w, h):
    """ランドマーク描画：バッチ処理（スケール修正版）"""
    landmark_positions = []

    for i, landmark in enumerate(landmarks_normalized):
        x = int(landmark[0] * w)
        y = int(landmark[1] * h)
        z_value_scaled = landmark[2] * DEPTH_SCALING  # スケーリング適用

        # 指ごとに色を決定
        color = COLORS[finger_group(i)]

        # ランドマーク描画
        draw_landmark_with_label(frame, x, y, i, color, z_value_scaled)
        landmark_positions.append((x, y))

    return frame, landmark_positions

def draw_connections_optimized(frame, landmark_positions):
    """接続線描画"""
    # 接続線を一括描画
    for connection in HAND_CONNECTIONS:
        start_idx, end_idx = connection[0], connection[1]
        if start_idx < len(landmark_positions) and end_idx < len(landmark_positions):
            start_point = landmark_positions[start_idx]
            end_point = landmark_positions[end_idx]
            cv2.line(frame, start_point, end_point, (255, 255, 255), 2)

    return frame

def detect_finger_contacts(landmarks_normalized, hand_scale):
    """指接触判定による日本語指文字の認識：適応的閾値を使用（スケール整合版）"""
    contacts = []

    # 手のスケールに基づく適応的閾値
    contact_threshold = hand_scale * BASE_CONTACT_THRESHOLD

    # 必要ランドマークを一括取得
    keys = [
        'THUMB_TIP',
        'INDEX_FINGER_TIP', 'MIDDLE_FINGER_TIP', 'RING_FINGER_TIP', 'PINKY_TIP',
        'INDEX_FINGER_PIP', 'INDEX_FINGER_DIP',
        'INDEX_FINGER_MCP', 'MIDDLE_FINGER_MCP', 'PINKY_MCP',
        'MIDDLE_FINGER_PIP'
    ]
    L = {k: get_landmark_array_from_normalized(landmarks_normalized, k) for k in keys}

    thumb_tip = L['THUMB_TIP']
    index_tip = L['INDEX_FINGER_TIP']
    middle_tip = L['MIDDLE_FINGER_TIP']
    ring_tip = L['RING_FINGER_TIP']
    pinky_tip = L['PINKY_TIP']

    # Precompute frequently used distances
    d_thumb_index = calculate_distance(thumb_tip, index_tip)
    d_thumb_middle = calculate_distance(thumb_tip, middle_tip)
    d_thumb_index_mcp = calculate_distance(thumb_tip, L['INDEX_FINGER_MCP'])
    d_thumb_middle_mcp = calculate_distance(thumb_tip, L['MIDDLE_FINGER_MCP'])
    d_thumb_pinky_mcp = calculate_distance(thumb_tip, L['PINKY_MCP'])

    # 'お': ring formed by the thumb and index finger
    if d_thumb_index < contact_threshold * 1.2:  # Threshold relaxed by 20%
        # Check the bend at the DIP joint (confirms the ring shape)
        index_pip = L['INDEX_FINGER_PIP']
        index_dip = L['INDEX_FINGER_DIP']
        v1 = index_dip - index_pip
        v2 = index_tip - index_dip
        angle = calculate_angle(v1, v2)
        # Bend angle above 90 degrees (relaxed so the ring is detected more easily)
        if angle > 90:
            contacts.append('お')

    # 'き' and 'ら' (checked independently of 'お')
    thumb_index_contact = d_thumb_index < contact_threshold
    thumb_middle_contact = d_thumb_middle < contact_threshold

    if thumb_index_contact and thumb_middle_contact:
        # Use the palm normal to determine the hand orientation
        normal = calculate_palm_normal(landmarks_normalized)

        # Palm facing the camera (large z component): 'き'
        if abs(normal[2]) > 0.7:
            contacts.append('き')
        # Hand turned sideways (large x component): 'ら'
        if abs(normal[0]) > 0.7:
            contacts.append('ら')

    # 'す': thumb touches the side of the index finger (PIP-DIP segment)
    index_pip = L['INDEX_FINGER_PIP']
    index_dip = L['INDEX_FINGER_DIP']
    side_distance = calculate_line_point_distance(thumb_tip, index_pip, index_dip)
    if side_distance < contact_threshold * 0.8:
        contacts.append('す')

    # 'せ': thumb touches near the first joint (DIP) of the index finger
    if calculate_distance(thumb_tip, L['INDEX_FINGER_DIP']) < contact_threshold:
        contacts.append('せ')

    # 'ぬ': thumb presses the bases of the index and middle fingers
    if (d_thumb_index_mcp < contact_threshold and
        d_thumb_middle_mcp < contact_threshold):
        contacts.append('ぬ')

    # 'め': thumb touches the base (MCP joint) of the little finger
    if d_thumb_pinky_mcp < contact_threshold:
        contacts.append('め')

    # 'む': thumb touches the base (MCP joint) of the index finger
    if d_thumb_index_mcp < contact_threshold:
        contacts.append('む')

    # 'ね': index and middle fingers crossed
    middle_pip = L['MIDDLE_FINGER_PIP']
    # Fingertips close together and PIP joints close together (crossing brings them closer)
    if (calculate_distance(index_tip, middle_tip) < contact_threshold * 1.5 and
        calculate_distance(index_pip, middle_pip) < contact_threshold * 1.2):
        contacts.append('ね')

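    # 'れ': the thumb passes under the other four fingers.
    # Note: MediaPipe's z uses roughly the same scale as x, and smaller (more negative)
    # values are closer to the camera. This implementation requires the thumb tip to be
    # in front of (closer to the camera than) the other four fingertips,
    # hence thumb_tip[2] < min(other_fingers_z).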
    other_fingers_z = [index_tip[2], middle_tip[2], ring_tip[2], pinky_tip[2]]
    if (thumb_tip[2] < min(other_fingers_z) and
        min(index_tip[0], pinky_tip[0]) < thumb_tip[0] < max(index_tip[0], pinky_tip[0])):
        contacts.append('れ')

    return contacts

def arrays_equal_with_tolerance(arr1, arr2, tolerance=FLOAT_TOLERANCE):
    """浮動小数点許容誤差を考慮した配列比較"""
    if arr1 is None or arr2 is None:
        return arr1 is arr2
    return np.allclose(arr1, arr2, atol=tolerance)

def process_hand(frame, hand_landmarks, hand_id):
    """手指ランドマークの処理と描画（スケール整合版・手ごとキャッシュ/履歴）"""
    h, w, _ = frame.shape

    # 正規化座標での3D配列作成（スケール整合性確保）
    landmarks_normalized = np.array([[landmark.x, landmark.y, landmark.z]
                                   for landmark in hand_landmarks.landmark])

    # 時系列フィルタリング：過去3フレームの移動平均によりランドマーク位置を安定化（手ごと）
    landmarks_normalized = smooth_landmarks(landmarks_normalized, hand_id)

    # 計算結果のキャッシュチェック（浮動小数点許容誤差対応）
    prev_cache = _landmark_cache.get(hand_id)
    landmarks_changed = not arrays_equal_with_tolerance(prev_cache, landmarks_normalized)

    if landmarks_changed:
        # 手のスケール計算（正規化座標使用）
        _scale_cache[hand_id] = calculate_hand_scale(landmarks_normalized)

        # 掌の法線ベクトル計算：手の向き推定（正規化座標使用）
        _normal_cache[hand_id] = calculate_palm_normal(landmarks_normalized)

        # キャッシュ更新
        _landmark_cache[hand_id] = landmarks_normalized.copy()

    hand_scale = _scale_cache[hand_id]
    normal = _normal_cache[hand_id]

    # 深度推定範囲計算（インライン化）
    z_scaled = landmarks_normalized[:, 2] * DEPTH_SCALING
    depth_min, depth_max = float(np.min(z_scaled)), float(np.max(z_scaled))

    # ランドマーク描画
    frame, landmark_positions = batch_draw_landmarks(frame, landmarks_normalized, w, h)

    # 接続線描画
    frame = draw_connections_optimized(frame, landmark_positions)

    # 指文字認識：日本語指文字の接触判定（適応的閾値使用）
    contacts = detect_finger_contacts(landmarks_normalized, hand_scale)

    # Text lines shown on screen and returned to the caller
    info_text = [
        'Detected: hand',
        'Landmarks: 21 points',
        f'Hand scale: {hand_scale:.3f}',
        f'Depth range: [{depth_min:.3f}, {depth_max:.3f}]',
        f'Palm normal: ({normal[0]:.2f}, {normal[1]:.2f}, {normal[2]:.2f})',
    ]
    if contacts:
        info_text.append(f'Fingerspelling: {", ".join(contacts)}')

    # Draw one line every 25 px, starting at y = 30
    info_texts_positions = [(text, (10, 30 + 25 * i)) for i, text in enumerate(info_text)]
    frame = draw_japanese_text_optimized(frame, info_texts_positions, font_small)

    return frame, info_text, hand_scale, (depth_min, depth_max)

def video_frame_processing(frame):
    """動画フレームの処理"""
    global frame_count, info_displayed
    current_time = time.time()
    frame_count += 1

    # BGR→RGB変換（MediaPipeの入力要件）
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # MediaPipe処理
    results = hands.process(rgb_frame)

    # フレーム数表示（OpenCV画面）
    cv2.putText(frame, f'Frame: {frame_count}', (10, frame.shape[0] - 10),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)

    # 色分け凡例と技術情報を最初に1回だけ表示
    if not info_displayed:
        print("\n" + "="*50)
        print("色分け凡例:")
        print(f"  親指: 赤 {COLORS['thumb']}")
        print(f"  人差し指: 緑 {COLORS['index']}")
        print(f"  中指: 青 {COLORS['middle']}")
        print(f"  薬指: 黄 {COLORS['ring']}")
        print(f"  小指: マゼンタ {COLORS['pinky']}")
        print(f"  手首: グレー {COLORS['wrist']}")
        print("")
        print("技術情報:")
        print("  特徴抽出: 3D手指姿勢 + 深度推定 + 指接触")
        print("  モデル: MediaPipe Hands")
        print("  z座標: 手首を基準とした相対深度")
        print("  視覚化: 深度による円サイズ変更 + ランドマーク番号表示")
        print("  バッチ描画 + キャッシュ機能 + メモリ効率向上")
        print("="*50)
        info_displayed = True

    # 処理結果
    processing_info = []
    hand_scale_info = []
    depth_info = []

    if results.multi_hand_landmarks:
        # handedness と landmarks を対応付けて処理（手ごとに識別）
        for hand_landmarks, handedness in zip(results.multi_hand_landmarks, results.multi_handedness):
            hand_id = handedness.classification[0].label  # 'Left' または 'Right'
            frame, info, hand_scale, depth_range = process_hand(frame, hand_landmarks, hand_id)
            processing_info.extend(info)
            hand_scale_info.append(hand_scale)
            depth_info.append(depth_range)
    else:
        # 「手が検出されていません」を直接OpenCVで描画
        cv2.putText(frame, 'No hands detected', (10, 50),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)
        processing_info.append('検出: なし')

    # 結果を文字列として整形
    result = ' | '.join(processing_info)

    return frame, result, current_time

# Initialize MediaPipe
mp_hands = mp.solutions.hands
hands = mp_hands.Hands(
    static_image_mode=False,
    max_num_hands=MAX_NUM_HANDS,
    model_complexity=model_complexity,
    min_detection_confidence=HAND_CONFIDENCE,
    min_tracking_confidence=TRACKING_CONFIDENCE
)

print('0: Video file')
print('1: Camera')
print('2: Sample video')

choice = input('Choice: ')

if choice == '0':
    root = tk.Tk()
    root.withdraw()
    path = filedialog.askopenfilename()
    if not path:
        exit()
    cap = cv2.VideoCapture(path)
elif choice == '1':
    cap = cv2.VideoCapture(0, cv2.CAP_DSHOW)
    if not cap.isOpened():
        cap = cv2.VideoCapture(0)
    cap.set(cv2.CAP_PROP_BUFFERSIZE, 1)
else:
    # Download and use the sample video (any input other than 0 or 1 ends up here)
    SAMPLE_URL = 'https://raw.githubusercontent.com/opencv/opencv/master/samples/data/vtest.avi'
    SAMPLE_FILE = 'vtest.avi'
    urllib.request.urlretrieve(SAMPLE_URL, SAMPLE_FILE)
    cap = cv2.VideoCapture(SAMPLE_FILE)

if not cap.isOpened():
    print('Could not open the video file or camera')
    exit()

# Main loop
print('\n=== Video processing started ===')
print('Controls:')
print('  q key: quit')
try:
    while True:
        ret, frame = cap.read()
        if not ret:
            break

        processed_frame, result, current_time = video_frame_processing(frame)
        cv2.imshow(WINDOW_TITLE, processed_frame)
        if choice == '1':  # Camera: prefix each result with a timestamp
            print(datetime.fromtimestamp(current_time).strftime("%Y-%m-%d %H:%M:%S.%f")[:-3], result)
        else:  # Video file: prefix each result with the frame number
            print(frame_count, result)
        results_log.append(result)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
finally:
    print('\n=== Program finished ===')
    cap.release()
    cv2.destroyAllWindows()
    hands.close()
    if results_log:
        with open('result.txt', 'w', encoding='utf-8') as f:
            f.write('=== Results ===\n')
            f.write(f'Frames processed: {frame_count}\n')
            f.write(f'Model: MediaPipe Hands {"Lite" if model_complexity == 0 else "Full"}\n')
            f.write('\n')
            f.write('\n'.join(results_log))
        print('\nSaved the results to result.txt')
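The temporal filter in `smooth_landmarks` keeps a per-hand `deque` of the last `HISTORY_SIZE` frames and averages them. The class below is a standalone sketch of the same idea; the name `LandmarkSmoother` is illustrative. It adds a `reset` method that the program does not have: because `landmark_history` is never cleared, frames from before a hand left the view are still averaged into the first two frames after it reappears, so clearing the history when a hand is lost is one possible design choice.

```python
import collections

import numpy as np


class LandmarkSmoother:
    """Per-hand moving average over the last `size` frames of (21, 3) landmark arrays."""

    def __init__(self, size: int = 3):
        self.size = size
        self.history = {}  # hand_id ('Left' / 'Right') -> deque of arrays

    def update(self, landmarks, hand_id: str):
        dq = self.history.setdefault(hand_id, collections.deque(maxlen=self.size))
        dq.append(np.asarray(landmarks, dtype=float))
        # Like smooth_landmarks(), return the raw frame until the window is full
        if len(dq) < self.size:
            return dq[-1]
        return np.mean(np.stack(list(dq)), axis=0)

    def reset(self, hand_id: str):
        """Drop the history, e.g. when a hand disappears, so stale frames are not averaged in."""
        self.history.pop(hand_id, None)


if __name__ == "__main__":
    smoother = LandmarkSmoother(size=3)
    for value in (0.0, 1.0, 2.0, 2.0):
        frame = np.full((21, 3), value)  # hypothetical landmarks, all coordinates equal
        print(smoother.update(frame, "Left")[0])
```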

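4. Usage

Run the program above. When prompted, choose the model (0: Lite, 1: Full) and then the input (0: video file, 1: camera, 2: sample video).
If you choose the camera, the webcam starts; when you hold your hand in front of it, the 21 3D landmarks are displayed.
Press the q key to quit. The per-frame results printed to the console are then saved to result.txt.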