Octave で書かれた K-means プログラムの例
Octave で書かれた K-means プログラムの例を示す.K-means はクラスタリングのアルゴリズム. なぜ Octave で書いているかというと,学習用,またいろいろな距離関数 (metric function) を試したいという気持ちがあり, C 言語などにこだわらず(速いことにこだわらず),Octave で書かれたK-means プログラムにも価値があると考えたからです.
* 試しに動作させましたが,Octave でも,むちゃくちゃは遅くありません. 私の Linux マシンで 512×512 のカラー画像をクラスタ数 10 でクラスタリングしたところ 20秒弱.
* とはいえ,性能をとても重視する(ソースコードには興味ない)場合には,「k-means クラスタリングの例 (Open-CV を使用) 」の Web ページに,OpenCV を利用した K-means の Octave プログラムを載せていますので,そちらを見てください.
前準備 (Pre-requisite)
Octave のインストール (Install Octave)
The following code is the K-Means algorithm implementationwrittein in Octave
# # The following code is the K-Means algorithm implementationwrittein in Octave # Octave で書かれた K-means アルゴリズムの例 # # The following code is the K-Means algorithm implementationwrittein in Octave # Octave で書かれた K-means アルゴリズムの例 function [width, height] = mat_size(X) height = size(X)(1); width = size(X)(2); endfunction function Metrics = calc_metrics(Data, p) # metrics between all data points in 'Data' and 'p'. Diff = (Data - repmat( p, size(Data)(1), 1 ))'; # Euclidian distance Metrics = sum(Diff .* Diff); clear Diff endfunction function num = find_empty_cluster(Data, assignment, n_clusters) for i = 1:n_clusters # select data points in the i-th cluster, Points = Data( find( assignment(:,2) == i ), : ); if ( size(Points)(1) == 0 ) num = i; clear Points; return; endif clear Points; endfor num = 0; endfunction function [NewCentroidPoints, assignment] = kmeans_step(Data, CentroidPoints, n_clusters) # get the number of data points (= height) and the dimesnion (= width) [width, height] = mat_size(Data); # 'D' is a matrix to store the distance between data points and centroids # データ点と centroids 間の距離を行列 D に格納する D = zeros(height,n_clusters); for i = 1:n_clusters D(:,i) = calc_metrics( Data, CentroidPoints(i,:) ); endfor # the shortes distance from each data point to any centroids # 各データポイントから centroids までの最短距離 shortest = min(D')'; # get the centroid number for each data point # 各データポイントについて,最も近い centroid の番号を得る [m,n] = find( D == repmat(shortest,1,n_clusters) ); assignment = zeros(height,2); assignment(:,1) = 1:height; # data point number assignment(m,2) = n; # cluster number done = 0; while ( done == 0 ) # try to find an empty cluster empty_cluster_num = find_empty_cluster(Data, assignment, n_clusters); if ( empty_cluster_num > 0 ) # assign the farest data point to the empty cluster data_point_num = find( shortest == max(shortest) ); assignment(data_point_num, empty_cluster_num); # new, the farest data point becomes a centroid shorttest(data_point_num) = 0; else done = 1; endif endwhile clear shortest; clear D; # new centoid of each cluster # 各クラスタの centroid を求める for i = 1:n_clusters # select data points in the i-th cluster, Points = Data( find( assignment(:,2) == i ), : ); # compute the centroid of i-th cluster if ( size(Points)(1) == 1 ) NewCentroidPoints(i,:) = Points; else NewCentroidPoints(i,:) = mean(Points); endif clear Points; endfor endfunction #' # Read Color Image Data [rgb, immap, alpha] = imread("lena_std.jpg"); mono = rgb2gray( rgb ); colormap(gray(256)); # Initialize 3-dimensional data points # Set color image data into 'Data' A = double(rgb)/255; Data = zeros( size(A)(1) * size(A)(2), size(A)(3) ); Data(:,1) = reshape( A(:,:,1), size(A)(1) * size(A)(2), 1 ); Data(:,2) = reshape( A(:,:,2), size(A)(1) * size(A)(2), 1 ); Data(:,3) = reshape( A(:,:,3), size(A)(1) * size(A)(2), 1 ); n_clusters = 10; # get the number of data points (= height) and the dimesnion (= width) [width, height] = mat_size(Data); # clusters not yet constructed. # select different 'n_clusters' data points number randomly to create initial 'centoroids' # クラスタはまだできていない # 'centroids' の初期値を得るために,'n_clusters' 個の異なるデータ点番号をランダムに選ぶ. CentroidPoints = Data( randperm(height)(1:n_clusters), : ); [NewCentroidPoints, assignment] = kmeans_step(Data, CentroidPoints, n_clusters); a = 1; while( a > 0 ) CentroidPoints = NewCentroidPoints; clear NewCentroidPoints; old_assignment = assignment(:,2); clear assignment; [NewCentroidPoints, assignment] = kmeans_step(Data, CentroidPoints, n_clusters); # assignment changed ? # 'assignment' に変更があったか a = max( abs( old_assignment - assignment(:,2) ) ); endwhile V = NewCentroidPoints( assignment(:,2),: ); NEWIMAGE = zeros( size(A)(1), size(A)(2), size(A)(3) ); NEWIMAGE(:,:,1) = reshape( V(:,1), size(A)(1), size(A)(2) ); NEWIMAGE(:,:,2) = reshape( V(:,2), size(A)(1), size(A)(2) ); NEWIMAGE(:,:,3) = reshape( V(:,3), size(A)(1), size(A)(2) ); clear V; imwrite(NEWIMAGE, "new.png"); disp "done"
Input Data Example
Output Data Example