PCA plots of the CIFAR-10, CIFAR-100, MNIST, and Fashion MNIST datasets (using Python, matplotlib, and seaborn)

This page runs principal component analysis (PCA) on the datasets bundled with Keras and plots the resulting first and second principal component scores.

[Contents]

Preparation
Preparing the datasets and plotting their PCA

[Related external pages]

Web page on the datasets bundled with Keras: https://keras.io/ja/datasets/

Google Colaboratory page:

Clicking the link below opens a Google Colaboratory notebook. After logging in with a Google account, you can edit and re-run the code in the notebook. Your edits do not affect other users, and you can save the edited notebook to your own Google Drive.

https://colab.research.google.com/drive/1Blm3l62DN_4dqUoltwhq-sdtsfr7ZaiU?usp=sharing

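1. Preparation

Setting up Python (on Windows and Ubuntu)

Windows: installing Python 3.10, related packages, and a Python development environment (without using winget): described on a separate page »
Ubuntu: the system Python can be used. Installing the Python 3 development files, pip, and setuptools: described on a separate page »

[Related pages on this site]

Python summary: summarized on a separate page »
How to use Google Colaboratory: described on a separate page »

[Related external pages] Official Python page: https://www.python.org/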

Installing TensorFlow and tensorflow_datasets

On Windows:
To run pip on Windows, open a Command Prompt as administrator and run pip from it.
python -m pip uninstall -y tensorflow tensorflow-cpu tensorflow-gpu tensorflow-intel tensorflow-text tensorflow-estimator tf-models-official tf_slim tensorflow_datasets tensorflow-hub keras keras-tuner keras-visualizer
python -m pip install -U tensorflow tensorflow_datasets
Installation details for Windows (including the NVIDIA driver, NVIDIA CUDA Toolkit, NVIDIA cuDNN, and TensorFlow-related software): described on a separate page »

On Ubuntu:

Run the following commands.

sudo pip3 uninstall -y tensorflow tensorflow-cpu tensorflow-gpu tensorflow-intel tensorflow-text tensorflow-estimator tf-models-official tf_slim tensorflow_datasets tensorflow-hub keras keras-tuner keras-visualizer
sudo pip3 uninstall -y six wheel astunparse tensorflow-estimator numpy keras-preprocessing absl-py wrapt gast flatbuffers grpcio opt-einsum protobuf termcolor typing-extensions google-pasta h5py tensorboard-plugin-wit markdown werkzeug requests-oauthlib rsa cachetools google-auth google-auth-oauthlib tensorboard tensorflow
sudo apt -y install python3-six python3-wheel python3-numpy python3-grpcio python3-protobuf python3-termcolor python3-typing-extensions python3-h5py python3-markdown python3-werkzeug python3-requests-oauthlib python3-rsa python3-cachetools python3-google-auth
sudo pip3 install -U tensorflow tensorflow_datasets

Installation details for Ubuntu (including the NVIDIA driver, NVIDIA CUDA Toolkit, NVIDIA cuDNN, and TensorFlow-related software): described on a separate page »

Installing the Python packages numpy, pandas, seaborn, matplotlib, and scikit-learn

On Windows
Run a Command Prompt as administrator and execute the following command.
python -m pip install -U pip setuptools numpy pandas matplotlib seaborn scikit-learn scikit-learn-intelex

On Ubuntu

Run the following commands in a terminal.

# Update the package list information
sudo apt update
sudo apt -y install python3-numpy python3-pandas python3-seaborn python3-matplotlib python3-sklearn
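
After the installation steps above, it can be useful to confirm which packages are actually visible to the Python interpreter. The following is a minimal sketch using only the standard library; the helper name `installed_versions` and the package list are assumptions chosen to mirror the install commands on this page.

```python
from importlib import metadata

# Distribution names as published on PyPI; this list mirrors the install commands above.
PACKAGES = ["tensorflow", "tensorflow-datasets", "numpy", "pandas",
            "matplotlib", "seaborn", "scikit-learn"]

def installed_versions(names):
    """Return {name: version string, or None if the package is not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

A package reported as "not installed" suggests that the corresponding install command should be re-run for the interpreter in use.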

2. Preparing the datasets and plotting their PCA

Web page on the datasets bundled with Keras: https://keras.io/ja/datasets/

Preparation for the PCA plots

import pandas as pd
import seaborn as sns
sns.set()
import numpy as np
import sklearn.decomposition
%matplotlib inline
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')   # Suppress Matplotlib warnings

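# Principal component analysis: return the first n principal component scores of A
def prin(A, n):
    pca = sklearn.decomposition.PCA(n_components=n)
    return pca.fit_transform(A)

# Principal component analysis with two components
def prin2(A):
    return prin(A, 2)

# Plot the first two columns of M, colored by the labels in b
def scatter_plot(M, b, alpha):
    a12 = pd.DataFrame( M[:,0:2], columns=['a1', 'a2'] )
    a12['target'] = b
    # One color per class label, assuming labels are the integers 0 .. max(b)
    sns.scatterplot(x='a1', y='a2', hue='target', data=a12, palette=sns.color_palette("hls", int(np.max(b)) + 1), legend="full", alpha=alpha)

# PCA plot: project A onto two components and plot them, colored by b
def pcaplot(A, b, alpha):
    scatter_plot(prin2(A), b, alpha)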
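
To make clear what the "principal component scores" returned by `prin2` are, the following sketch computes them directly with NumPy on synthetic data. It is an illustration under stated assumptions, not the scikit-learn implementation: the helper name `pca_scores` is hypothetical, and the sign of each component may differ from what `sklearn.decomposition.PCA` returns.

```python
import numpy as np

def pca_scores(A, n):
    """Project the rows of A onto the first n principal axes (hypothetical helper)."""
    A = np.asarray(A, dtype=np.float64)
    centered = A - A.mean(axis=0)          # PCA operates on mean-centered columns
    # Rows of Vt are the principal axes, ordered by decreasing singular value
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:n].T             # one row per sample, n score columns

rng = np.random.default_rng(0)
# Synthetic data: 200 samples, 5 features, with most variance along the first feature
A = rng.normal(size=(200, 5)) * np.array([10.0, 3.0, 1.0, 1.0, 1.0])
scores = pca_scores(A, 2)
print(scores.shape)                        # (200, 2)
print(scores.var(axis=0))                  # the 1st score has the larger variance
```

The two columns of `scores` play the same role as the `a1` and `a2` columns that `scatter_plot` draws.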

CIFAR-10 dataset

Loading the CIFAR-10 dataset

from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
import numpy as np
import tensorflow_datasets as tfds

%matplotlib inline
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')   # Suppress Matplotlib warnings

cifar10, cifar10_metadata = tfds.load('cifar10', with_info = True, shuffle_files=True, as_supervised=True, batch_size = -1)
x_train, y_train, x_test, y_test = cifar10['train'][0], cifar10['train'][1], cifar10['test'][0], cifar10['test'][1]
print(cifar10_metadata)
# Convert x_train, x_test, y_train, y_test to numpy ndarrays and rescale the pixel values from the range 0-255 to the range 0-1
print(type(x_train), x_train.shape, np.max(x_train), np.min(x_train))
print(type(x_test), x_test.shape, np.max(x_test), np.min(x_test))
print(type(y_train), y_train.shape, np.max(y_train), np.min(y_train))
print(type(y_test), y_test.shape, np.max(y_test), np.min(y_test))
# Convert to numpy
x_train = x_train.numpy().astype("float32") / 255.0
x_test = x_test.numpy().astype("float32") / 255.0
y_train = y_train.numpy()
y_test = y_test.numpy()
print(type(x_train), x_train.shape, np.max(x_train), np.min(x_train))
print(type(x_test), x_test.shape, np.max(x_test), np.min(x_test))
print(type(y_train), y_train.shape, np.max(y_train), np.min(y_train))
print(type(y_test), y_test.shape, np.max(y_test), np.min(y_test))

PCA plot of the CIFAR-10 dataset

x_train = x_train.reshape(x_train.shape[0], -1) # Flatten each image into a 1-D vector
x_test = x_test.reshape(x_test.shape[0], -1) # Flatten each image into a 1-D vector
print(x_train.shape)
print(x_test.shape)
pcaplot(np.concatenate( (x_train, x_test) ), np.concatenate( (y_train, y_test) ), 0.1)
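
The rescaling and flattening steps used above can be checked on a small synthetic array without downloading any dataset. This is a minimal sketch; the random `images` array is only a stand-in with the same shape convention and dtype as a batch of CIFAR-10 images.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a batch of 4 RGB images of 32x32 pixels, stored as uint8
images = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

scaled = images.astype("float32") / 255.0        # pixel values now lie in [0, 1]
flat = scaled.reshape(scaled.shape[0], -1)       # one row of 32*32*3 = 3072 values per image

print(flat.shape)                                # (4, 3072)
print(flat.dtype, flat.min(), flat.max())
```

Each row of `flat` is one image, which is the layout that `pcaplot` expects for its first argument.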

CIFAR-100 dataset

Loading the CIFAR-100 dataset

from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
import numpy as np
import tensorflow_datasets as tfds

%matplotlib inline
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')   # Suppress Matplotlib warnings

cifar100, cifar100_metadata = tfds.load('cifar100', with_info = True, shuffle_files=True, as_supervised=True, batch_size = -1)
x_train, y_train, x_test, y_test = cifar100['train'][0], cifar100['train'][1], cifar100['test'][0], cifar100['test'][1]
print(cifar100_metadata)
# Convert x_train, x_test, y_train, y_test to numpy ndarrays and rescale the pixel values from the range 0-255 to the range 0-1
print(type(x_train), x_train.shape, np.max(x_train), np.min(x_train))
print(type(x_test), x_test.shape, np.max(x_test), np.min(x_test))
print(type(y_train), y_train.shape, np.max(y_train), np.min(y_train))
print(type(y_test), y_test.shape, np.max(y_test), np.min(y_test))
# Convert to numpy
x_train = x_train.numpy().astype("float32") / 255.0
x_test = x_test.numpy().astype("float32") / 255.0
y_train = y_train.numpy()
y_test = y_test.numpy()
print(type(x_train), x_train.shape, np.max(x_train), np.min(x_train))
print(type(x_test), x_test.shape, np.max(x_test), np.min(x_test))
print(type(y_train), y_train.shape, np.max(y_train), np.min(y_train))
print(type(y_test), y_test.shape, np.max(y_test), np.min(y_test))

PCA plot of the CIFAR-100 dataset

x_train = x_train.reshape(x_train.shape[0], -1) # Flatten each image into a 1-D vector
x_test = x_test.reshape(x_test.shape[0], -1) # Flatten each image into a 1-D vector
print(x_train.shape)
print(x_test.shape)
pcaplot(np.concatenate( (x_train, x_test) ), np.concatenate( (y_train, y_test) ), 0.1)
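
With 100 classes, a single scatter plot can be hard to read. One possible approach, shown here as a sketch on synthetic data, is to keep only the samples of a few chosen classes before plotting. The helper name `select_classes` is hypothetical and is not part of the original code.

```python
import numpy as np

def select_classes(X, y, classes):
    """Keep only the rows whose label is in `classes` (hypothetical helper)."""
    mask = np.isin(y, classes)
    return X[mask], y[mask]

# Tiny synthetic example: 6 samples, 2 features, labels taken from 0 .. 99
X = np.arange(12, dtype=np.float32).reshape(6, 2)
y = np.array([0, 57, 3, 99, 3, 0])
X_sub, y_sub = select_classes(X, y, [0, 3])
print(y_sub)          # [0 3 3 0]
print(X_sub.shape)    # (4, 2)
```

Note that `scatter_plot` sizes its palette from `np.max(b) + 1`, so the palette argument may need adjusting when only a few classes remain.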

MNIST dataset

Loading the MNIST dataset

The following Python program loads the MNIST dataset.

from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
import numpy as np
import tensorflow_datasets as tfds

%matplotlib inline
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')   # Suppress Matplotlib warnings

mnist, mnist_metadata = tfds.load('mnist', with_info = True, shuffle_files=True, as_supervised=True, batch_size = -1)
x_train, y_train, x_test, y_test = mnist['train'][0], mnist['train'][1], mnist['test'][0], mnist['test'][1]
print(mnist_metadata)
# Convert x_train, x_test, y_train, y_test to numpy ndarrays and rescale the pixel values from the range 0-255 to the range 0-1
print(type(x_train), x_train.shape, np.max(x_train), np.min(x_train))
print(type(x_test), x_test.shape, np.max(x_test), np.min(x_test))
print(type(y_train), y_train.shape, np.max(y_train), np.min(y_train))
print(type(y_test), y_test.shape, np.max(y_test), np.min(y_test))
# Convert to numpy
x_train = x_train.numpy().astype("float32") / 255.0
x_test = x_test.numpy().astype("float32") / 255.0
y_train = y_train.numpy()
y_test = y_test.numpy()
print(type(x_train), x_train.shape, np.max(x_train), np.min(x_train))
print(type(x_test), x_test.shape, np.max(x_test), np.min(x_test))
print(type(y_train), y_train.shape, np.max(y_train), np.min(y_train))
print(type(y_test), y_test.shape, np.max(y_test), np.min(y_test))

PCA plot of the MNIST dataset

x_train = x_train.reshape(x_train.shape[0], -1) # Flatten each image into a 1-D vector
x_test = x_test.reshape(x_test.shape[0], -1) # Flatten each image into a 1-D vector
print(x_train.shape)
print(x_test.shape)
pcaplot(np.concatenate( (x_train, x_test) ), np.concatenate( (y_train, y_test) ), 0.1)
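
A two-component plot shows only part of the variation in the data, so it can help to check how much of the total variance the first two components capture. The sketch below computes this ratio with NumPy on synthetic data; the helper name `explained_variance_ratio` is hypothetical. A fitted `sklearn.decomposition.PCA` object exposes the same quantity as its `explained_variance_ratio_` attribute, but the `prin` function above returns only the scores and discards that object.

```python
import numpy as np

def explained_variance_ratio(A, n):
    """Fraction of total variance captured by each of the first n components."""
    A = np.asarray(A, dtype=np.float64)
    centered = A - A.mean(axis=0)
    S = np.linalg.svd(centered, compute_uv=False)   # singular values, in decreasing order
    variances = S ** 2                              # proportional to the variance along each axis
    return variances[:n] / variances.sum()

rng = np.random.default_rng(0)
# Synthetic data: 300 samples, 4 features with unequal spread
A = rng.normal(size=(300, 4)) * np.array([5.0, 2.0, 1.0, 1.0])
ratio = explained_variance_ratio(A, 2)
print(ratio, ratio.sum())   # per-component ratios and their total (at most 1)
```

A low total suggests that the two-dimensional plot leaves out much of the structure in the data.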

Fashion MNIST dataset

Loading the Fashion MNIST dataset

from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
import numpy as np
import tensorflow_datasets as tfds

%matplotlib inline
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')   # Suppress Matplotlib warnings

fashion_mnist, fashion_mnist_metadata = tfds.load('fashion_mnist', with_info = True, shuffle_files=True, as_supervised=True, batch_size = -1)
x_train, y_train, x_test, y_test = fashion_mnist['train'][0], fashion_mnist['train'][1], fashion_mnist['test'][0], fashion_mnist['test'][1]
print(fashion_mnist_metadata)
# Convert x_train, x_test, y_train, y_test to numpy ndarrays and rescale the pixel values from the range 0-255 to the range 0-1
print(type(x_train), x_train.shape, np.max(x_train), np.min(x_train))
print(type(x_test), x_test.shape, np.max(x_test), np.min(x_test))
print(type(y_train), y_train.shape, np.max(y_train), np.min(y_train))
print(type(y_test), y_test.shape, np.max(y_test), np.min(y_test))
# Convert to numpy
x_train = x_train.numpy().astype("float32") / 255.0
x_test = x_test.numpy().astype("float32") / 255.0
y_train = y_train.numpy()
y_test = y_test.numpy()
print(type(x_train), x_train.shape, np.max(x_train), np.min(x_train))
print(type(x_test), x_test.shape, np.max(x_test), np.min(x_test))
print(type(y_train), y_train.shape, np.max(y_train), np.min(y_train))
print(type(y_test), y_test.shape, np.max(y_test), np.min(y_test))

PCA plot of the Fashion MNIST dataset

x_train = x_train.reshape(x_train.shape[0], -1) # Flatten each image into a 1-D vector
x_test = x_test.reshape(x_test.shape[0], -1) # Flatten each image into a 1-D vector
print(x_train.shape)
print(x_test.shape)
pcaplot(np.concatenate( (x_train, x_test) ), np.concatenate( (y_train, y_test) ), 0.1)