PiDiNet Edge Detection (Source Code and Execution Results)

Python Development Environment and Libraries

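This section covers only the minimum required setup. For machine learning or deep learning work, it is convenient to also install NVIDIA CUDA, Visual Studio, Cursor, and similar tools. These are explained in detail on a separate page, https://www.kkaneko.jp/cc/dev/aiassist.html, which you can refer to as needed.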

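Installing Python 3.12

Skip this step if Python is already installed.

Open Command Prompt with administrator privileges (press the Windows key or open the Start menu > type cmd > right-click > "Run as administrator") and run the following. Administrator privileges are required because winget's --scope machine option installs software system-wide.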

REM Install Python system-wide
winget install --scope machine --id Python.Python.3.12 -e --silent --accept-source-agreements --accept-package-agreements
REM Add the Python directories to the system PATH.
REM setx does not update the current session, so collect both additions and write them once.
set "PYTHON_PATH=C:\Program Files\Python312"
set "PYTHON_SCRIPTS_PATH=C:\Program Files\Python312\Scripts"
set "NEW_PATH=%PATH%"
echo "%PATH%" | find /i "%PYTHON_PATH%" >nul
if errorlevel 1 set "NEW_PATH=%NEW_PATH%;%PYTHON_PATH%"
echo "%PATH%" | find /i "%PYTHON_SCRIPTS_PATH%" >nul
if errorlevel 1 set "NEW_PATH=%NEW_PATH%;%PYTHON_SCRIPTS_PATH%"
if not "%NEW_PATH%"=="%PATH%" setx PATH "%NEW_PATH%" /M >nul

[Related external pages]

Official Python page: https://www.python.org/

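Installing the AI Editor Windsurf

An AI editor is recommended for editing and running Python programs. This section explains how to install Windsurf.

Open Command Prompt with administrator privileges (press the Windows key or open the Start menu > type cmd > right-click > "Run as administrator") and run the following to install Windsurf system-wide. Administrator privileges are required because winget's --scope machine option installs software system-wide.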

winget install --scope machine --id Codeium.Windsurf -e --silent --accept-source-agreements --accept-package-agreements

[Related external pages]

Official Windsurf page: https://windsurf.com/

Installing the Required Libraries System-Wide

Open Command Prompt as administrator (press the Windows key or open the Start menu > type cmd > right-click > "Run as administrator") and run the following.


pip install -U torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
pip install opencv-python numpy pillow
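
After the installation, a short check such as the following can confirm that the libraries are visible to Python. This is a minimal sketch that uses only the standard library; the helper name `installed_versions` is hypothetical, and the distribution names are taken from the pip commands above.

```python
from importlib import metadata

# Distribution names as passed to pip in the commands above.
PACKAGES = ["torch", "torchvision", "torchaudio", "opencv-python", "numpy", "pillow"]


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
    try:
        import torch  # imported here so the check still runs if torch is missing
        print("CUDA available:", torch.cuda.is_available())
    except ImportError:
        print("CUDA available: unknown (torch is not installed)")
```

If `CUDA available` is reported as `False`, the program on this page still runs, because it falls back to the CPU.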

PiDiNet Edge Detection Program

Overview

This program detects edges in each frame of a video. The input can be a video file, a camera, or a sample video.

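Key Techniques

PiDiNet (Pixel Difference Network)
An edge detection model that integrates traditional edge detection operators into deep learning. It introduces a convolution operation called Pixel Difference Convolution, which convolves the differences between neighboring pixels rather than raw pixel intensities [1].
Multi-scale feature fusion
A technique that merges edge information extracted from feature maps at different resolutions, so that edges at many scales, from fine textures to large boundaries, are captured together [1].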
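
To make the pixel-difference idea concrete, here is a minimal sketch of the central variant of pixel difference convolution described in [1], written in plain Python for a single-channel image and one 3x3 kernel. Each kernel weight multiplies the difference between a neighbor and the center pixel instead of the neighbor's raw intensity. The function names, the kernel weights, and the toy images are illustrative assumptions and are not part of the program on this page.

```python
def vanilla_conv3x3(image, kernel, row, col):
    """Ordinary 3x3 convolution response at (row, col): sum of w * x."""
    return sum(
        kernel[i][j] * image[row + i - 1][col + j - 1]
        for i in range(3)
        for j in range(3)
    )


def central_pdc3x3(image, kernel, row, col):
    """Central pixel difference response at (row, col): sum of w * (x - x_center)."""
    center = image[row][col]
    return sum(
        kernel[i][j] * (image[row + i - 1][col + j - 1] - center)
        for i in range(3)
        for j in range(3)
    )


# Toy data: a flat region and a vertical step edge.
flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
step = [[1, 1, 9], [1, 1, 9], [1, 1, 9]]
kernel = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]  # arbitrary weights, sum = 16

print(central_pdc3x3(flat, kernel, 1, 1))  # 0: a constant region cancels out
print(central_pdc3x3(step, kernel, 1, 1))  # 32: only the step contributes

# The same value follows from an ordinary convolution:
# sum(w * (x - c)) = sum(w * x) - c * sum(w)
weight_sum = sum(sum(kernel_row) for kernel_row in kernel)
assert central_pdc3x3(step, kernel, 1, 1) == (
    vanilla_conv3x3(step, kernel, 1, 1) - step[1][1] * weight_sum
)
```

Because a constant region cancels out, the response depends only on local intensity changes, which is the property classical edge operators rely on. The identity checked in the last lines shows that the operation can be computed with standard convolution routines.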

References

[1] Su, Z., Liu, W., Yu, Z., Hu, D., Liao, Q., Tian, Q., Pietikainen, M., & Liu, L. (2021). Pixel Difference Networks for Efficient Edge Detection. Proceedings of IEEE International Conference on Computer Vision (ICCV). arXiv:2108.07009


# PiDiNet edge detection program
# Key technique: PiDiNet (Pixel Difference Network), a lightweight and efficient deep learning edge detection model that draws on the knowledge of traditional edge detectors
# Source: Su, Z., Liu, W., Yu, Z., Hu, D., Liao, Q., Tian, Q., Pietikainen, M., & Liu, L. (2021). Pixel Difference Networks for Efficient Edge Detection. Proceedings of IEEE International Conference on Computer Vision (ICCV). arXiv:2108.07009
# Key feature: edge map generation using Pixel Difference Convolution (PDC), combining the expressive power of deep learning with the efficiency of traditional methods. The paper reports 100 FPS with fewer than 1M parameters and an ODS F-score of 0.807
# Pretrained model: table5_pidinet.pth, trained on the BSDS500 dataset (ODS F-score 0.807). Detects edges even in complex scenes. Downloaded automatically from GitHub
# Setup:
# pip install -U torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
# pip install opencv-python numpy pillow

import cv2
import tkinter as tk
from tkinter import filedialog
import urllib.request
import os
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import time
from datetime import datetime
from PIL import Image, ImageDraw, ImageFont

# Select GPU or CPU automatically
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Device: {device}')
# Let cuDNN choose the fastest algorithms when running on the GPU
if device.type == 'cuda':
    torch.backends.cudnn.benchmark = True

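# Configuration parameters
ALPHA = 0.7  # Weight of the original frame when blending
BETA = 0.6   # Weight of the colored edge map when blending
COLORMAP = cv2.COLORMAP_JET
FONT_PATH = 'C:/Windows/Fonts/meiryo.ttc'
FONT_SIZE = 20
USE_ADAPTIVE_BLENDING = True  # Vary the blend weight per pixel by edge strength

# Local file name for the downloaded pretrained weights
MODEL_PATH = 'pidinet_model.pth'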

frame_count = 0
results_log = []
model = None

class PiDiBlock(nn.Module):
    """Basic PiDiNet block: depthwise 3x3 conv, 1x1 conv, and a residual shortcut."""
    def __init__(self, in_channels, out_channels, stride=1):
        super(PiDiBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                              stride=stride, padding=1, bias=False, groups=in_channels)
        self.conv2 = nn.Conv2d(in_channels, out_channels, kernel_size=1,
                              stride=1, padding=0, bias=False)
        self.shortcut = None
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Conv2d(in_channels, out_channels, kernel_size=1,
                                    stride=stride, padding=0, bias=True)

    def forward(self, x):
        residual = x
        out = self.conv1(x)
        out = self.conv2(out)
        if self.shortcut is not None:
            residual = self.shortcut(x)
        out += residual
        return out

class AttentionModule(nn.Module):
    """Attention module: scales the input by a single-channel sigmoid mask."""
    def __init__(self, in_channels=24):
        super(AttentionModule, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, 4, kernel_size=1, bias=True)
        self.conv2 = nn.Conv2d(4, 1, kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        att = self.conv1(x)
        att = self.conv2(att)
        att = torch.sigmoid(att)
        return x * att

class DilationModule(nn.Module):
    """Dilated convolution module: sums four 3x3 branches with dilation rates 1 to 4."""
    def __init__(self, in_channels):
        super(DilationModule, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, 24, kernel_size=1, bias=True)
        self.conv2_1 = nn.Conv2d(24, 24, kernel_size=3, padding=1, dilation=1, bias=False)
        self.conv2_2 = nn.Conv2d(24, 24, kernel_size=3, padding=2, dilation=2, bias=False)
        self.conv2_3 = nn.Conv2d(24, 24, kernel_size=3, padding=3, dilation=3, bias=False)
        self.conv2_4 = nn.Conv2d(24, 24, kernel_size=3, padding=4, dilation=4, bias=False)

    def forward(self, x):
        x = self.conv1(x)
        out1 = self.conv2_1(x)
        out2 = self.conv2_2(x)
        out3 = self.conv2_3(x)
        out4 = self.conv2_4(x)
        return out1 + out2 + out3 + out4

class ConvReduceModule(nn.Module):
    """Channel reduction module: 1x1 conv from 24 channels to a single edge map."""
    def __init__(self):
        super(ConvReduceModule, self).__init__()
        self.conv = nn.Conv2d(24, 1, kernel_size=1, bias=True)

    def forward(self, x):
        return self.conv(x)

class OfficialPiDiNet(nn.Module):
    """PiDiNet network whose layer names follow the official checkpoint."""
    def __init__(self):
        super(OfficialPiDiNet, self).__init__()
        self.module = nn.Module()
        self.module.init_block = nn.Conv2d(3, 60, kernel_size=3, padding=1, bias=False)

        self.module.block1_1 = PiDiBlock(60, 60)
        self.module.block1_2 = PiDiBlock(60, 60)
        self.module.block1_3 = PiDiBlock(60, 60)

        self.module.block2_1 = PiDiBlock(60, 120, stride=2)
        self.module.block2_2 = PiDiBlock(120, 120)
        self.module.block2_3 = PiDiBlock(120, 120)
        self.module.block2_4 = PiDiBlock(120, 120)

        self.module.block3_1 = PiDiBlock(120, 240, stride=2)
        self.module.block3_2 = PiDiBlock(240, 240)
        self.module.block3_3 = PiDiBlock(240, 240)
        self.module.block3_4 = PiDiBlock(240, 240)

        self.module.block4_1 = PiDiBlock(240, 240, stride=2)
        self.module.block4_2 = PiDiBlock(240, 240)
        self.module.block4_3 = PiDiBlock(240, 240)
        self.module.block4_4 = PiDiBlock(240, 240)

        self.module.dilations = nn.ModuleList([
            DilationModule(60), DilationModule(120),
            DilationModule(240), DilationModule(240)
        ])
        self.module.attentions = nn.ModuleList([AttentionModule(24) for _ in range(4)])
        self.module.conv_reduces = nn.ModuleList([ConvReduceModule() for _ in range(4)])
        self.module.classifier = nn.Conv2d(4, 1, kernel_size=1, bias=True)

    def forward(self, x):
        x = self.module.init_block(x)
        x1 = self.module.block1_1(x)
        x1 = self.module.block1_2(x1)
        x1 = self.module.block1_3(x1)
        x2 = self.module.block2_1(x1)
        x2 = self.module.block2_2(x2)
        x2 = self.module.block2_3(x2)
        x2 = self.module.block2_4(x2)
        x3 = self.module.block3_1(x2)
        x3 = self.module.block3_2(x3)
        x3 = self.module.block3_3(x3)
        x3 = self.module.block3_4(x3)
        x4 = self.module.block4_1(x3)
        x4 = self.module.block4_2(x4)
        x4 = self.module.block4_3(x4)
        x4 = self.module.block4_4(x4)

        features = [x1, x2, x3, x4]
        edge_outputs = []
        for i, (feature, dilation, attention, conv_reduce) in enumerate(
            zip(features, self.module.dilations, self.module.attentions, self.module.conv_reduces)):
            dilated = dilation(feature)
            attended = attention(dilated)
            edge = conv_reduce(attended)
            if i > 0:
                edge = F.interpolate(edge, size=edge_outputs[0].shape[2:],
                                   mode='bilinear', align_corners=False)
            edge_outputs.append(edge)
        fused = torch.cat(edge_outputs, dim=1)
        final_output = self.module.classifier(fused)
        return final_output, edge_outputs[0], edge_outputs[1], edge_outputs[2], edge_outputs[3]

def load_official_weights(model, weight_path='pidinet_model.pth'):
    """Load the official pretrained weights into the model."""
    try:
        try:
            checkpoint = torch.load(weight_path, map_location='cpu', weights_only=True)
        except TypeError:
            # Older PyTorch versions do not accept the weights_only argument
            checkpoint = torch.load(weight_path, map_location='cpu')
        official_state_dict = checkpoint['state_dict']
        model.load_state_dict(official_state_dict, strict=True)
        print("✓ Official weights loaded")
        return model
    except Exception as e:
        print(f"Failed to load the weights: {e}")
        raise SystemExit(1)

def download_model():
    """Download the pretrained model if it is not already present and return its path."""
    if not os.path.exists(MODEL_PATH):
        print('Downloading the pretrained PiDiNet model...')
        try:
            urllib.request.urlretrieve('https://github.com/hellozhuo/pidinet/raw/master/trained_models/table5_pidinet.pth', MODEL_PATH)
            print('Download complete')
        except Exception as e:
            print(f'Failed to download the model: {e}')
            raise SystemExit(1)
    return MODEL_PATH

def initialize_model():
    """Initialize the PiDiNet model."""
    print('Initializing the PiDiNet model...')
    model_path = download_model()
    model = OfficialPiDiNet()
    model = load_official_weights(model, model_path)
    model = model.to(device)
    model.eval()
    print('PiDiNet model initialization complete')
    return model

def preprocess_image(img):
    """Preprocess a BGR image into a normalized input tensor for the model."""
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img_resized = cv2.resize(img_rgb, (512, 512))
    img_tensor = torch.from_numpy(img_resized.transpose(2, 0, 1)).float() / 255.0
    mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
    std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
    img_tensor = (img_tensor - mean) / std
    img_tensor = img_tensor.unsqueeze(0)
    img_tensor = img_tensor.to(device)
    return img_tensor

def video_frame_processing(frame):
    global frame_count
    current_time = time.time()
    frame_count += 1

    # Run inference
    img_tensor = preprocess_image(frame)
    with torch.no_grad():
        outputs = model(img_tensor)
        edge_map = torch.sigmoid(outputs[0]).squeeze().cpu().numpy()

    edge_map = cv2.resize(edge_map, (frame.shape[1], frame.shape[0]))
    edge_colored = cv2.applyColorMap((edge_map * 255).astype(np.uint8), COLORMAP)

    if USE_ADAPTIVE_BLENDING:
        edge_strength = edge_map.copy()
        edge_strength_normalized = cv2.normalize(edge_strength, None, 0, 1, cv2.NORM_MINMAX)
        alpha_mask = ALPHA + (1.0 - ALPHA) * edge_strength_normalized
        alpha_mask = np.expand_dims(alpha_mask, axis=2)
        alpha_mask = np.repeat(alpha_mask, 3, axis=2)
        processed_frame = (frame * alpha_mask + edge_colored * BETA * (1 - alpha_mask)).astype(np.uint8)
    else:
        processed_frame = cv2.addWeighted(frame, ALPHA, edge_colored, BETA, 0)

    # Draw the overlay text with Pillow, using the FONT_PATH and FONT_SIZE settings defined at the top
    font = ImageFont.truetype(FONT_PATH, FONT_SIZE)
    img_pil = Image.fromarray(cv2.cvtColor(processed_frame, cv2.COLOR_BGR2RGB))
    draw = ImageDraw.Draw(img_pil)

    edge_strength_avg = np.mean(edge_map)
    result_info = f'Mean edge strength: {edge_strength_avg:.3f}'

    draw.text((10, 30), result_info, font=font, fill=(0, 255, 0))
    draw.text((10, 60), f'Frame: {frame_count}', font=font, fill=(0, 255, 0))
    draw.text((10, 90), 'Model: PiDiNet', font=font, fill=(0, 255, 0))
    processed_frame = cv2.cvtColor(np.array(img_pil), cv2.COLOR_RGB2BGR)

    return processed_frame, result_info, current_time

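# Startup banner
print('=' * 60)
print('PiDiNet Edge Detection Program')
print('=' * 60)
print('[Overview]')
print('  Generates an edge map from video using the PiDiNet model')
print('  The edge map is displayed with its gradation preserved')
print('  The result is shown as an overlay of the edge map on the original frame')
print('')
print('[How to use]')
print('  1. Select an input source (0/1/2)')
print('  2. Press the q key to quit during video processing')
print('')
print('[Notes]')
print('  - A GPU is used automatically when one is available')
print('  - The model is downloaded on the first run')
print('=' * 60)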

model = initialize_model()

print("0: Video file")
print("1: Camera")
print("2: Sample video")

choice = input("Select: ")

if choice == '0':
    root = tk.Tk()
    root.withdraw()
    path = filedialog.askopenfilename()
    if not path:
        exit()
    cap = cv2.VideoCapture(path)
elif choice == '1':
    cap = cv2.VideoCapture(0, cv2.CAP_DSHOW)
    if not cap.isOpened():
        cap = cv2.VideoCapture(0)
    cap.set(cv2.CAP_PROP_BUFFERSIZE, 1)
else:
    # Download the sample video (only if it is not already present) and open it
    SAMPLE_URL = 'https://raw.githubusercontent.com/opencv/opencv/master/samples/data/vtest.avi'
    SAMPLE_FILE = 'vtest.avi'
    if not os.path.exists(SAMPLE_FILE):
        urllib.request.urlretrieve(SAMPLE_URL, SAMPLE_FILE)
    cap = cv2.VideoCapture(SAMPLE_FILE)

if not cap.isOpened():
    print('Could not open the video file or camera')
    raise SystemExit(1)

# Video processing loop
print('\n=== Video processing started ===')
print('Controls:')
print('  q key: quit the program')
MAIN_FUNC_DESC = "PiDiNet edge detection"  # Window title
try:
    while True:
        ret, frame = cap.read()
        if not ret:
            break

        processed_frame, result, current_time = video_frame_processing(frame)
        cv2.imshow(MAIN_FUNC_DESC, processed_frame)
        if choice == '1':  # Camera: log with a timestamp
            print(datetime.fromtimestamp(current_time).strftime("%Y-%m-%d %H:%M:%S.%f")[:-3], result)
        else:  # Video file: log with the frame number
            print(frame_count, result)
        results_log.append(result)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
finally:
    print('\n=== Program finished ===')
    cap.release()
    cv2.destroyAllWindows()
    if results_log:
        with open('result.txt', 'w', encoding='utf-8') as f:
            f.write('=== Results ===\n')
            f.write(f'Frames processed: {frame_count}\n')
            f.write(f'Device: {str(device).upper()}\n')
            if device.type == 'cuda':
                f.write(f'GPU: {torch.cuda.get_device_name(0)}\n')
            f.write('\n')
            f.write('\n'.join(results_log))
        print('\nSaved the results to result.txt')
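
The adaptive blending branch in `video_frame_processing` is easier to follow on a tiny array. The sketch below repeats the same arithmetic with NumPy only, replacing the `cv2.normalize(..., cv2.NORM_MINMAX)` call with an explicit min-max formula. The function name `adaptive_blend`, the handling of a constant edge map, and the 1x2 test image are illustrative assumptions.

```python
import numpy as np

ALPHA = 0.7  # minimum weight of the original frame
BETA = 0.6   # scale applied to the colored edge map


def adaptive_blend(frame, edge_colored, edge_map, alpha=ALPHA, beta=BETA):
    """Blend a frame and a colored edge map with a per-pixel weight.

    frame, edge_colored: uint8 arrays of shape (H, W, 3)
    edge_map: float array of shape (H, W) with values in [0, 1]
    """
    # Min-max normalize the edge map to [0, 1]; a constant map becomes all zeros.
    span = float(edge_map.max() - edge_map.min())
    if span > 0:
        normalized = (edge_map - edge_map.min()) / span
    else:
        normalized = np.zeros_like(edge_map, dtype=np.float64)

    # Per-pixel weight of the original frame: alpha at the weakest edge, 1 at the strongest.
    alpha_mask = (alpha + (1.0 - alpha) * normalized)[..., np.newaxis]

    blended = frame * alpha_mask + edge_colored * beta * (1.0 - alpha_mask)
    return blended.astype(np.uint8)


# Two pixels: the left has the weakest edge response, the right the strongest.
frame = np.full((1, 2, 3), 100, dtype=np.uint8)
edge_colored = np.full((1, 2, 3), 200, dtype=np.uint8)
edge_map = np.array([[0.0, 1.0]])

print(adaptive_blend(frame, edge_colored, edge_map)[0, :, 0])
```

For the weakest-edge pixel the formula gives 100 * 0.7 + 200 * 0.6 * 0.3 = 106, and for the strongest-edge pixel it gives 100 * 1.0 + 200 * 0.6 * 0.0 = 100. With this weighting, the original frame is left unchanged where the edge response is strongest, and the colored overlay contributes most where the response is weakest.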