Iris Dataset (Using Python)

This page introduces the Iris dataset. Users are responsible for checking the dataset's terms of use themselves.

[Contents]

Link to Google Colab
Preparation
Loading and Inspecting the Iris Dataset (Using scikit-learn)
Loading and Inspecting the Iris Dataset (Using TensorFlow Datasets)

Number of rows: 150 (50 rows for each of the three species)
Attributes: sepal length, sepal width, petal length, petal width (all in cm), species
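The row count and class balance stated above can be checked against scikit-learn's bundled copy of the data. This is a minimal sketch assuming scikit-learn is installed; the helper name `summarize_iris` is hypothetical. Note that scikit-learn names the four measurement columns with a "(cm)" suffix and stores the species as integer labels 0 to 2 rather than as a fifth column.

```python
from collections import Counter

from sklearn.datasets import load_iris


def summarize_iris():
    """Return (rows, feature count, per-species row counts, feature names).

    summarize_iris is a hypothetical helper written for illustration.
    """
    iris = load_iris()
    rows, n_features = iris.data.shape
    # iris.target holds integer labels; map each one to its species name
    counts = Counter(str(iris.target_names[t]) for t in iris.target)
    return rows, n_features, dict(counts), list(iris.feature_names)


rows, n_features, counts, feature_names = summarize_iris()
print(rows, n_features)
print(counts)
print(feature_names)
```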

[Reference] R. A. Fisher, "The use of multiple measurements in taxonomic problems," Annals of Eugenics, vol. 7, part II, pp. 179-188, 1936.

[Related pages on this site]

Slides explaining the Iris dataset: iris.pdf [PDF], [PowerPoint]
Python programs that use the Iris dataset: described on a separate page.

[Related external pages]

The iris dataset in TensorFlow Datasets: https://www.tensorflow.org/datasets/catalog/iris

1. Link to Google Colab

Google Colaboratory page:

Clicking the following link opens a Google Colaboratory notebook. After signing in with a Google account, you can edit and rerun the code in the notebook. Your edits do not affect other users, and you can save the edited notebook to your own Google Drive.

https://colab.research.google.com/drive/10u8owk1y9l-OocyenRZuDb0sKxxicVwK?usp=sharing

2. Preparation

Installing Python 3.12 (on Windows)

Install Python 3.12 using either of the following methods. If Python is already installed, this step is unnecessary.

Method 1: Installing with winget

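Run the following commands in a Command Prompt with administrator privileges. To open one, press the Windows key (or open the Start menu), type "cmd", right-click "Command Prompt" in the results, and select "Run as administrator".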

winget install --id Python.Python.3.12 -e --scope machine --silent --accept-source-agreements --accept-package-agreements --override "/quiet InstallAllUsers=1 PrependPath=1 Include_test=0 Include_pip=1 Include_launcher=1 InstallLauncherAllUsers=1 TargetDir=\"C:\Program Files\Python312\""
powershell -Command "$p='C:\Program Files\Python312'; $s=\"$p\Scripts\"; $m=[Environment]::GetEnvironmentVariable('Path','Machine'); if($m -notlike \"*$s*\") { [Environment]::SetEnvironmentVariable('Path', \"$p;$s;$m\", 'Machine') }"

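Specifying --scope machine installs Python system-wide (for all users); this option requires administrator privileges. The installer option PrependPath=1 and the powershell command add the Python and Scripts directories to the system PATH. Restart the Command Prompt after installation so that the updated PATH takes effect.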

Method 2: Installing with the installer

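Go to the official Python website (https://www.python.org/downloads/) and download the Windows installer using the "Download Python 3.x.x" button.
Run the downloaded installer.
On the initial screen, be sure to check "Add python.exe to PATH" at the bottom, then select "Customize installation". If you forget this checkbox, you cannot run the python command from the Command Prompt.
Proceed to the "Advanced Options" screen, check "Install Python 3.xx for all users", and click "Install".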

Verifying the installation

Run the following in the Command Prompt.

python --version

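If a version number (e.g., Python 3.12.x) is displayed, the installation succeeded. If the message "'python' is not recognized as an internal or external command, operable program or batch file." appears, the installation did not complete correctly or the PATH was not set.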
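The same check can also be done from inside Python, which helps when several Python versions are installed and you want to confirm which interpreter actually runs your scripts. This is a minimal sketch using only the standard library; `check_python` and its `min_version` parameter are hypothetical names.

```python
import sys


def check_python(min_version=(3, 12)):
    """Return (ok, version_string) for the running interpreter.

    ok is True when the interpreter is at least min_version.
    check_python is a hypothetical helper written for illustration.
    """
    ok = sys.version_info[:len(min_version)] >= tuple(min_version)
    version = "{}.{}.{}".format(*sys.version_info[:3])
    return ok, version


ok, version = check_python()
print("Python", version, "OK" if ok else "is older than 3.12")
print("Executable:", sys.executable)  # path of the interpreter in use
```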

Installing the AI Editor Windsurf (on Windows)

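Using an AI editor is recommended for editing and running Python programs. This section explains how to install Windsurf. If Windsurf is already installed, this step is unnecessary. Because the command uses --scope machine, run it in a Command Prompt with administrator privileges.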

winget install --scope machine --id Codeium.Windsurf -e --silent --disable-interactivity --force --accept-source-agreements --accept-package-agreements --custom "/SP- /SUPPRESSMSGBOXES /NORESTART /CLOSEAPPLICATIONS /DIR=""C:\Program Files\Windsurf"" /MERGETASKS=!runcode,addtopath,associatewithfiles,!desktopicon"
powershell -Command "$env:Path=[System.Environment]::GetEnvironmentVariable('Path','Machine')+';'+[System.Environment]::GetEnvironmentVariable('Path','User'); windsurf --install-extension MS-CEINTL.vscode-language-pack-ja --force; windsurf --install-extension ms-python.python --force; windsurf --install-extension Codeium.windsurfPyright --force"

[Related external pages]

Official Windsurf website: https://windsurf.com/

Installing TensorFlow, TensorFlow Datasets, and scikit-learn

On Windows

On Windows, open a Command Prompt with administrator privileges and run the following command. Running pip from an administrator Command Prompt installs the packages into the system-wide location.

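python -m pip install -U tensorflow tensorflow_datasets scikit-learn scikit-learn-intelex

(The former tensorflow-gpu package is deprecated and has been removed; install the tensorflow package instead.)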

Details on installing TensorFlow on Windows are described on a separate page.

(That page also explains how to install Build Tools for Visual Studio 2022, the NVIDIA driver, the NVIDIA CUDA Toolkit, and NVIDIA cuDNN.)

On Ubuntu

On Ubuntu, run the following commands.

# Update the package lists
sudo apt update
sudo apt -y install python3-sklearn
sudo pip3 install -U tensorflow tensorflow_datasets

Details on installing TensorFlow on Ubuntu are described on a separate page.

(That page also explains how to install the NVIDIA driver, the NVIDIA CUDA Toolkit, and NVIDIA cuDNN.)
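After either installation, you can confirm from Python which of these packages are installed and which versions were picked up. This is a minimal sketch using the standard-library `importlib.metadata`; `installed_versions` is a hypothetical helper, and the names passed to it are pip distribution names (as used on the pip command line), not import names such as `sklearn`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(dist_names):
    """Map each pip distribution name to its installed version, or None.

    installed_versions is a hypothetical helper written for illustration.
    """
    result = {}
    for name in dist_names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None  # not installed for this interpreter
    return result


for name, ver in installed_versions(
    ["tensorflow", "tensorflow-datasets", "scikit-learn"]
).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```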

3. Loading and Inspecting the Iris Dataset (Using scikit-learn)

Loading into a DataFrame with scikit-learn

The following program loads the data into the pandas DataFrame df.

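import pandas as pd
from sklearn.datasets import load_iris

# load_iris() returns a Bunch: .data is a (150, 4) array of measurements,
# and .feature_names holds the four column names
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
print(df)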
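The DataFrame above holds only the four measurements. As a sketch that is not part of the original program, the species can be added as a column built from `iris.target` and `iris.target_names`, which turns per-species summaries into a single `groupby`; the column name `species` is our own choice.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
# Map integer labels 0, 1, 2 to the names in iris.target_names
df["species"] = pd.Categorical.from_codes(iris.target, iris.target_names)

print(df.shape)
# Mean of each measurement, per species
print(df.groupby("species", observed=True).mean())
```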

Loading into arrays with scikit-learn

The following Python program loads the data into the arrays X and y.

from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data    # features: (150, 4) NumPy array
y = iris.target  # labels: (150,) array of 0, 1, 2 (setosa, versicolor, virginica)
print(X)
print(y)
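X and y are already in the form most scikit-learn estimators expect. As an illustration of a typical next step, here is a sketch assuming scikit-learn's `train_test_split`; the 80/20 ratio and `random_state=0` are arbitrary choices. Passing `stratify=y` keeps the three species equally represented in both parts.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X = iris.data
y = iris.target

# Hold out 20% for testing; stratify=y preserves the equal class proportions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
print(X_train.shape, X_test.shape)
print(np.bincount(y_test))  # number of test examples per class
```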

4. Loading and Inspecting the Iris Dataset (Using TensorFlow Datasets)

Loading the Iris dataset

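import tensorflow_datasets as tfds
# Downloads and prepares the dataset on first use; as_supervised=True yields (features, label) tuples
iris, iris_info = tfds.load('iris', with_info=True, shuffle_files=True, as_supervised=True)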

Displaying dataset information

print(iris_info)
print(iris_info.features["label"].num_classes)
print(iris_info.features["label"].names)
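`num_classes` and `names` describe how the integer labels map to species. For comparison, scikit-learn's copy of the data carries the same kind of mapping in `target_names`. This sketch uses scikit-learn rather than TensorFlow Datasets, so the exact name strings may differ from what `iris_info.features["label"].names` prints; `label_to_name` is our own variable name.

```python
from sklearn.datasets import load_iris

iris = load_iris()
num_classes = len(iris.target_names)
label_names = [str(name) for name in iris.target_names]
# Integer label i corresponds to label_names[i]
label_to_name = dict(enumerate(label_names))

print(num_classes)
print(label_names)
print(label_to_name)
```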

Building an input pipeline with a tf.data.Dataset object

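If the dataset was loaded with as_supervised=False, each element is a dictionary rather than a (features, label) tuple, so use features, label = data['features'], data['label'] instead.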

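import tensorflow as tf
ds_train = iris['train']  # the TFDS Iris dataset has a single 'train' split (150 examples)
# Cache, shuffle, group into batches of 128, and prefetch in the background
it = ds_train.cache().shuffle(1000).batch(128).prefetch(tf.data.AUTOTUNE)
for data in it.take(1):  # take only the first batch
    features, label = data[0], data[1]  # as_supervised=True gives (features, label) tuples
    print(features)
    print(label)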
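Because the Iris dataset has only 150 examples, `batch(128)` produces two batches of 128 and 22 examples (the last batch is kept since `drop_remainder` defaults to False), and `it.take(1)` returns only the first. The following framework-free sketch uses NumPy on scikit-learn's copy of the data to make these sizes concrete. It illustrates the idea only and is not how `tf.data` is implemented; since the `shuffle(1000)` buffer is larger than the dataset, it is modeled here as a full random permutation.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
features, labels = iris.data, iris.target

rng = np.random.default_rng(0)
order = rng.permutation(len(features))  # full shuffle (buffer 1000 > 150 examples)

batch_size = 128
batches = [
    (features[order[i:i + batch_size]], labels[order[i:i + batch_size]])
    for i in range(0, len(order), batch_size)
]
print([len(b[1]) for b in batches])  # sizes of the batches

first_features, first_labels = batches[0]  # plays the role of it.take(1)
print(first_features.shape, first_labels.shape)
```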

Converting the TensorFlow Datasets Iris dataset to a DataFrame
```
train = tfds.as_dataframe(iris['train'], iris_info)
print(train)
```
Number of rows and attributes in the DataFrame
The number of rows is len(<DataFrame>), and the number of attributes is len(<DataFrame>.columns).
```
print(len(train))
print(len(train.columns))
```

Converting the first 10 rows of the dataset to a DataFrame

train = tfds.as_dataframe(iris['train'].take(10), iris_info)
print(train)

Number of rows and attributes in the DataFrame
The number of rows is len(<DataFrame>), and the number of attributes is len(<DataFrame>.columns).
```
print(len(train))
print(len(train.columns))
```