Using ChatGPT (Text Formatting, Text Summarization) (ChatGPT API, Python) (on Windows)

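[Summary] This page shows how to format and summarize text on Windows using ChatGPT and Python. A notable feature is that text too long to process in one request is split into parts, each part is processed, and the results are joined to produce the final output. How to run the programs is also covered. The formatting program `arrange.py` formats a specified text file using OpenAI's ChatGPT 3.5 turbo. The summarization program `summary.py` summarizes a specified text file using OpenAI's ChatGPT 3.5 turbo, aiming for 500 characters or fewer. Both programs take the input and output files as command-line arguments, and both require an OpenAI API key. The page also explains the Python environment needed to run the programs, how to save them, and how to specify file names. The Python programs also run on Ubuntu.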

[Contents]

Preparation
Text Formatting Program (ChatGPT API, Python)
Text Summarization Program (ChatGPT API, Python)

Preparation

Installing Python and the required Python library (on Windows)

Installing Python
Note: If Python (version 3.12 recommended) is already installed, skip this step.

Install using winget (the Windows Package Manager)
1. On Windows, open Command Prompt with administrator privileges (press the Windows key or open the Start menu > type cmd > right-click > "Run as administrator").
2. Check that winget (the Windows Package Manager) is available:
  winget --version
3. Install Python (the commands below install Python 3.12).
  reg add "HKLM\SYSTEM\CurrentControlSet\Control\FileSystem" /v LongPathsEnabled /t REG_DWORD /d 1 /f
  REM Install Python system-wide
  winget install --scope machine --id Python.Python.3.12 -e --silent
  REM Python paths
  set "INSTALL_PATH=C:\Program Files\Python312"
  echo "%PATH%" | find /i "%INSTALL_PATH%" >nul
  if errorlevel 1 setx PATH "%PATH%;%INSTALL_PATH%" /M >nul
  echo "%PATH%" | find /i "%INSTALL_PATH%\Scripts" >nul
  if errorlevel 1 setx PATH "%PATH%;%INSTALL_PATH%\Scripts" /M >nul

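Installing the required Python library
1. On Windows, open Command Prompt with administrator privileges (press the Windows key or open the Start menu > type cmd > right-click > "Run as administrator").
2. Run the following command to install the required library.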
  pip install -U openai
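
The call style of the openai library differs between its 0.x and 1.x releases, so it can help to confirm which version was actually installed. The following is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name introduced here for illustration.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing.

    `installed_version` is a hypothetical helper, not part of any library.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("openai")
    if version is None:
        print("openai is not installed")
    else:
        print("openai", version)
```

Running this after the pip command prints the installed version, or a message if the library is missing.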

[Related external pages]

Official Python site: https://www.python.org/

[Related pages on this site]

Detailed Python guide (separate page)

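Text Formatting Program (ChatGPT API, Python)

This program gives ChatGPT 3.5 turbo a prompt that asks it to clean up Japanese text, and saves the final result to a file. It is intended for tidying text whose formatting was damaged when it was extracted from HTML files, PDFs, PowerPoint slides, and similar sources. When the content of the specified file is long, it is split into parts and each part is processed by ChatGPT 3.5 turbo. An OpenAI API key is required to use this program.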
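
The splitting step can be illustrated without any API call. The following is a minimal sketch of line-based chunking under stated assumptions: `split_into_chunks` and `limit` are hypothetical names, and the logic is a simplified illustration of the idea rather than the exact code of the program on this page.

```python
def split_into_chunks(text, limit=1500):
    """Group lines into chunks of at most `limit` characters.

    Hypothetical helper for illustration. A single line longer than
    `limit` is cut into fixed-size pieces first.
    """
    chunks = []
    current = []      # lines collected for the chunk being built
    current_len = 0   # length of "\n".join(current)
    for line in text.split("\n"):
        # Cut an over-long line into pieces no longer than `limit`.
        pieces = [line[i:i + limit] for i in range(0, len(line), limit)] or [""]
        for piece in pieces:
            added = len(piece) + (1 if current else 0)  # +1 for the joining newline
            if current and current_len + added > limit:
                # The chunk is full: store it and start a new one.
                chunks.append("\n".join(current))
                current, current_len = [], 0
                added = len(piece)
            current.append(piece)
            current_len += added
    if current:
        chunks.append("\n".join(current))
    return chunks


if __name__ == "__main__":
    sample = "\n".join(f"line {i}" for i in range(10))
    for chunk in split_into_chunks(sample, limit=20):
        print(repr(chunk))
```

Each chunk would then be sent to the API separately and the responses joined in order.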

Prepare an OpenAI API key
[Related external pages]
- OpenAI API keys page
  https://platform.openai.com/api-keys
- Check pricing conditions and usage history here.
  https://platform.openai.com/settings/organization/limits
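
Passing the key with `--api_key` leaves it visible in the command history. As an alternative sketch (an assumption, not something the programs on this page do), the key can be taken from an environment variable. `OPENAI_API_KEY` is the variable name the openai library reads by default; `resolve_api_key` is a hypothetical helper name.

```python
import os


def resolve_api_key(cli_value=None, env_name="OPENAI_API_KEY"):
    """Return the API key from the command-line value, else from the environment.

    Hypothetical helper for illustration; exits with a message if no key is found.
    """
    key = cli_value or os.environ.get(env_name)
    if not key:
        raise SystemExit(f"No API key given: pass --api_key or set {env_name}.")
    return key
```

With this approach, `--api_key` could be made optional and the key set once per session, for example with `set OPENAI_API_KEY=...` in Command Prompt.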
On Windows, open Command Prompt
Start the editor
cd /d c:%HOMEPATH%
notepad arrange.py

Save the following program in the editor

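'''
Gives ChatGPT 3.5 turbo a prompt asking it to clean up the given Japanese text,
formats the text, and saves the final result to a file.
When the content of the specified file is long, it is split into parts
and each part is processed by ChatGPT 3.5 turbo.
An API key is required to use this program.
[Usage]
python arrange.py --input input.txt --output output.txt --api_key your_api_key
'''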

import argparse
import openai
import sys
import textwrap
import time


MAX_CHUNK_LENGTH = 1500
DEBUG_PRINT = False


def get_arguments():
    parser = argparse.ArgumentParser(description='ChatGPT Text Refinement.')
    parser.add_argument('--input', type=str, required=True,
                        help='Input file path')
    parser.add_argument('--output', type=str, required=True,
                        help='Output file path')
    parser.add_argument('--api_key', type=str, required=True,
                        help='OpenAI API Key')
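    # Accept 1/true/yes as True; plain type=bool would treat any non-empty string as True
    parser.add_argument('--remove_url_and_source_code',
                        type=lambda s: s.lower() in ('1', 'true', 'yes'),
                        default=False,
                        help='Remove URL and source code from text')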
    parser.add_argument('--model', type=str, default="gpt-3.5-turbo",
                        help='GPT model')

    args = parser.parse_args()
    return args


def send_messages(model, content, chunk):
    messages = [
        {"role": "system", "content": content},
        {"role": "user", "content": chunk}
    ]

    if DEBUG_PRINT:
        print("----------------------------------------------------")
        print(messages)

    try:
        # Send the request to the API (openai library 1.x call style)
        response = openai.chat.completions.create(
            model=model,
            messages=messages,
        )
    except openai.OpenAIError as e:
        print(f"Failed to send request to OpenAI API: {str(e)}")
        sys.exit(1)

    return response.choices[0].message.content


def handle_chunk(model, content, current_chunk):
    results = []
    # If the text is too long, split it at that position
    chunks = textwrap.wrap(
        current_chunk, width=MAX_CHUNK_LENGTH, break_long_words=True)
    for chunk in chunks:
        # Send the request to the API
        response = send_messages(model, content, chunk)
        if DEBUG_PRINT:
            print("----------------------------------------------------")
            print("response,", response)
        # Append the response to the list
        results.append(response)
        # API rate limits: aim to stay under 40k tokens and 200 requests per minute
        time.sleep(20)
    return results


def request(model, content, text):
    sentences = text.split("\n")

    results = []
    current_chunk = ""

    for sentence in sentences:
        # Keep adding lines until the chunk reaches a certain length
        if len(current_chunk) + len(sentence) < (MAX_CHUNK_LENGTH - 100):
            current_chunk += sentence + "\n"
        else:
            if len(current_chunk) > 0:
                results += handle_chunk(model, content, current_chunk)
            # Reset the current chunk and start it with the next line
            current_chunk = sentence + "\n"

    # Process the final chunk
    if current_chunk:
        results += handle_chunk(model, content, current_chunk)

    if DEBUG_PRINT:
        print("----------------------------------------------------")
        print("results,", results)
    # Join the pieces to build the final result
    final_result = "\n".join(results)
    return final_result


def main():
    args = get_arguments()

    # Set the OpenAI API key
    openai.api_key = args.api_key

    # File names come from the command-line arguments
    filename = args.input
    output_filename = args.output

    # Get the value of remove_url_and_source_code from the command-line arguments
    remove_url_and_source_code = args.remove_url_and_source_code

    # Get the GPT model name from the command-line arguments
    model = args.model

    try:
        with open(filename, 'r', encoding='utf-8') as file:
            text = file.read()
    except FileNotFoundError:
        print(f"The file {filename} was not found.")
        sys.exit(1)

    if remove_url_and_source_code:
        prompt = "Please examine the provided Japanese text, and enhance its format by correcting any grammar, spelling, syntax, and punctuation errors, while maintaining the original meaning. If you identify any clear redundancies or disorganized sections, restructure them to improve readability, while ensuring the original intent remains intact. Please exclude any source code and URL in the output. Do not translate or predict subsequent sentences. The output should be in Japanese."
    else:
        prompt = "Please examine the provided Japanese text, and enhance its format by correcting any grammar, spelling, syntax, and punctuation errors, while maintaining the original meaning. If you identify any clear redundancies or disorganized sections, restructure them to improve readability, while ensuring the original intent remains intact. Do not translate or predict subsequent sentences. The output should be in Japanese."

    final_result = request(
        model,
        prompt, text)

    try:
        # Write the result to the file
        with open(output_filename, 'w', encoding='utf-8') as output_file:
            output_file.write(final_result)
        print("\033[32mResult saved. Input file:", filename, "Output file:", output_filename, "\033[0m")
    except IOError as e:
        print("\033[31mFailed to write to the file. Error:", str(e), "\033[0m")
        sys.exit(1)


if __name__ == "__main__":
    main()

Running the Python program
- On Windows, use python (the Python launcher is py)
- On Ubuntu, use python3
[Related pages on this site] Python summary (separate page)
Because the program was saved under a file name such as arrange.py, run it with a command such as "python arrange.py".
Replace input.txt with the name of the file to process.
Replace output.txt with the name of the file in which to save the result.
Replace your_api_key with your OpenAI API key.
To remove source code and URLs during processing, add "--remove_url_and_source_code 1".
The default model is "gpt-3.5-turbo". To use a different model, specify it as "--model <model name>".
python arrange.py --input input.txt --output output.txt --api_key your_api_key
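
The `--remove_url_and_source_code 1` form passes the string "1" to the program. As a sketch of how such a value can be converted to a boolean, the example below uses a hypothetical converter named `str_to_bool`. Python's built-in `bool()` treats every non-empty string as true, including "0", so an explicit converter is the safer choice when the flag takes a value.

```python
import argparse


def str_to_bool(value):
    """Interpret common true/false spellings; hypothetical helper for illustration."""
    lowered = value.strip().lower()
    if lowered in ("1", "true", "yes", "on"):
        return True
    if lowered in ("0", "false", "no", "off"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean value, got {value!r}")


def build_parser():
    """Build a parser with only the flag under discussion."""
    parser = argparse.ArgumentParser(description="Flag parsing sketch")
    parser.add_argument("--remove_url_and_source_code",
                        type=str_to_bool, default=False)
    return parser


if __name__ == "__main__":
    parser = build_parser()
    print(parser.parse_args(["--remove_url_and_source_code", "1"]))
    print(parser.parse_args(["--remove_url_and_source_code", "0"]))
    # Built-in bool() on a non-empty string, shown for comparison
    print(bool("0"))
```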
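Checking the result
- Beginning of the text file before processing
  (remainder omitted)
- Beginning of the resulting text file
  In the result, the text is reformatted so that it reads as well-formed prose.
  (remainder omitted)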

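Text Summarization Program (ChatGPT API, Python)

This program reads the specified text file, gives ChatGPT 3.5 turbo a prompt to summarize it, and saves the final result to a file. When the content of the specified file is long, it is split into parts, each part is processed by ChatGPT 3.5 turbo, and the results are joined. If the resulting summary is still long, summarization is repeated.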
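
The repeat-until-short behavior can be sketched independently of the API. In the sketch below, `summarize_once` is a hypothetical stand-in for one round of ChatGPT summarization (it simply keeps the first half of the text), and `max_rounds` together with the no-progress check are safety assumptions of this sketch. It covers only the repetition step, not the initial summarization or the final polishing.

```python
def summarize_once(text):
    """Hypothetical stand-in for one API summarization round: keep the first half."""
    return text[: max(1, len(text) // 2)]


def summarize_until_short(text, limit=500, max_rounds=10, summarize=summarize_once):
    """Re-summarize until the result fits within `limit` characters.

    Stops early if a round fails to shorten the text or `max_rounds` is reached.
    """
    result = text
    for _ in range(max_rounds):
        if len(result) <= limit:
            break
        shorter = summarize(result)
        if len(shorter) >= len(result):
            break  # no progress; avoid looping forever
        result = shorter
    return result


if __name__ == "__main__":
    print(len(summarize_until_short("a" * 4000)))
```

In a real run, `summarize` would be replaced by a function that sends the text to the API and returns the response.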

Prepare an OpenAI API key in advance
OpenAI API keys page
https://platform.openai.com/api-keys
Check pricing conditions and usage history here.
https://platform.openai.com/settings/organization/limits
On Windows, open Command Prompt
Start the editor
cd /d c:%HOMEPATH%
notepad summary.py

Save the following program in the editor

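'''
Reads the specified text file, gives ChatGPT 3.5 turbo a prompt to summarize it,
and saves the final result to a file.
When the content of the specified file is long, it is split into parts,
each part is processed by ChatGPT 3.5 turbo, and the results are joined.
If the resulting summary is still long, summarization is repeated.
[Usage]
python summary.py --input input.txt --output output.txt --api_key your_api_key
'''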

import argparse
import openai
import sys
import textwrap
import time


MAX_CHUNK_LENGTH = 1500
DEBUG_PRINT = False
CHARACTERS = 500

def get_arguments():
    parser = argparse.ArgumentParser(description='ChatGPT Text Refinement.')
    parser.add_argument('--input', type=str, required=True,
                        help='Input file path')
    parser.add_argument('--output', type=str, required=True,
                        help='Output file path')
    parser.add_argument('--api_key', type=str, required=True,
                        help='OpenAI API Key')
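    # Accept 1/true/yes as True; plain type=bool would treat any non-empty string as True
    parser.add_argument('--remove_url_and_source_code',
                        type=lambda s: s.lower() in ('1', 'true', 'yes'),
                        default=False,
                        help='Remove URL and source code from text')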
    parser.add_argument('--model', type=str, default="gpt-3.5-turbo",
                        help='GPT model')

    args = parser.parse_args()
    return args


def send_messages(model, content, chunk):
    messages = [
        {"role": "system", "content": content},
        {"role": "user", "content": chunk}
    ]

    if DEBUG_PRINT:
        print("----------------------------------------------------")
        print(messages)

    try:
        # Send the request to the API (openai library 1.x call style)
        response = openai.chat.completions.create(
            model=model,
            messages=messages,
        )
    except openai.OpenAIError as e:
        print(f"Failed to send request to OpenAI API: {str(e)}")
        sys.exit(1)

    return response.choices[0].message.content


def handle_chunk(model, content, current_chunk):
    results = []
    # If the text is too long, split it at that position
    chunks = textwrap.wrap(
        current_chunk, width=MAX_CHUNK_LENGTH, break_long_words=True)
    for chunk in chunks:
        # Send the request to the API
        response = send_messages(model, content, chunk)
        if DEBUG_PRINT:
            print("----------------------------------------------------")
            print("response,", response)
        # Append the response to the list
        results.append(response)
        # API rate limits: aim to stay under 40k tokens and 200 requests per minute
        time.sleep(20)
    return results


def request(model, content, text):
    sentences = text.split("\n")

    results = []
    current_chunk = ""

    for sentence in sentences:
        # Keep adding lines until the chunk reaches a certain length
        if len(current_chunk) + len(sentence) < (MAX_CHUNK_LENGTH - 100):
            current_chunk += sentence + "\n"
        else:
            if len(current_chunk) > 0:
                results += handle_chunk(model, content, current_chunk)
            # Reset the current chunk and start it with the next line
            current_chunk = sentence + "\n"

    # Process the final chunk
    if current_chunk:
        results += handle_chunk(model, content, current_chunk)

    if DEBUG_PRINT:
        print("----------------------------------------------------")
        print("results,", results)
    # Join the pieces to build the final result
    final_result = "\n".join(results)
    return final_result


def main():
    args = get_arguments()

    # Set the OpenAI API key
    openai.api_key = args.api_key

    # File names come from the command-line arguments
    filename = args.input
    output_filename = args.output

    # Get the value of remove_url_and_source_code from the command-line arguments
    remove_url_and_source_code = args.remove_url_and_source_code

    # Get the GPT model name from the command-line arguments
    model = args.model

    try:
        with open(filename, 'r', encoding='utf-8') as file:
            text = file.read()
    except FileNotFoundError:
        print(f"The file {filename} was not found.")
        sys.exit(1)

    prompt1 = "Please provide a summary of the supplied Japanese text, excluding any source code and URL. Do not translate or predict any future sentences. The output should be a single paragraph and it should be in Japanese."
    prompt2 = "Please provide a summary of the supplied Japanese text. Do not translate or predict any future sentences. The output should be a single paragraph and it should be in Japanese."

    if remove_url_and_source_code:
        prompt = prompt1
    else:
        prompt = prompt2

    final_result = request(
        model,
        prompt, text)

    if DEBUG_PRINT:
        print("----------------------------------------------------")
        print("final_result,", final_result)

    while len(final_result) > CHARACTERS:
        # Still too long, so summarize again.
        text = final_result
        final_result = request(
            model,
            prompt, text)

        if DEBUG_PRINT:
            print("----------------------------------------------------")
            print("final_result,", final_result)

        if len(final_result) >= len(text):
            # No further shortening; stop to avoid an endless loop.
            break

    # Ask ChatGPT to polish the whole text
    text = final_result
    final_result = request(
        model,
        "As a professional proofreader, your task is to refine and correct the provided Japanese text while preserving its original meaning. The refined output should not translate or predict any future sentences, and it should be consolidated into a single paragraph. Importantly, the output must remain in Japanese and not exceed " + str(CHARACTERS) + " characters.", text)

    if DEBUG_PRINT:
        print("----------------------------------------------------")
        print("final_result,", final_result)


    try:
        # Write the result to the file (line breaks removed)
        with open(output_filename, 'w', encoding='utf-8') as output_file:
            output_file.write(final_result.replace('\r', '').replace('\n', ''))
        print("\033[32mResult saved. Input file:", filename, "Output file:", output_filename, "\033[0m")
    except IOError as e:
        print("\033[31mFailed to write to the file. Error:", str(e), "\033[0m")
        sys.exit(1)


if __name__ == "__main__":
    main()

Running the Python program
- On Windows, use python (the Python launcher is py)
- On Ubuntu, use python3
[Related pages on this site] Python summary (separate page)
Because the program was saved under a file name such as summary.py, run it with a command such as "python summary.py".
Replace output.txt with the name of the file to process.
Replace summary.txt with the name of the file in which to save the result.
Replace your_api_key with your OpenAI API key.
To remove source code and URLs during processing, add "--remove_url_and_source_code 1".
The default model is "gpt-3.5-turbo". To use a different model, specify it as "--model <model name>".
python summary.py --input output.txt --output summary.txt --api_key your_api_key
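Checking the result
- Beginning of the text file before processing
  (remainder omitted)
- The resulting text file
  The result is a summary. Adjust the target length by changing "CHARACTERS = 500" in the program.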
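
To check whether a result stays within the character limit, the characters in the output file can be counted. A minimal sketch, assuming the hypothetical helper name `count_characters`; Python's `len()` on a string counts characters rather than bytes, so each Japanese character counts as one.

```python
def count_characters(path, encoding="utf-8"):
    """Return the number of characters (not bytes) in a text file.

    Hypothetical helper for illustration.
    """
    with open(path, "r", encoding=encoding) as f:
        return len(f.read())
```

For example, `count_characters("summary.txt")` returns the length of the summary written by the command above, which can be compared against the value of `CHARACTERS`.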