Hierarchical clustering (a set of numeric vectors)

Sample data

■ Sample data, 2 dimensions

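require(data.table)
require(ggplot2)

# Group 1: 100 points spread along the x axis
x1 <- rnorm(100)
y1 <- rnorm(100) * 0.2

# Group 2: 200 points, linearly transformed and shifted
a2 <- rnorm(200)
b2 <- rnorm(200) * 0.4
x2 <- a2 * 2 + b2 + 3
y2 <- - a2 + 2 * b2 + 4

# Combine both groups and draw a scatter plot
D <- data.table( x=c(x1,x2), y=c(y1,y2) )
p <- ggplot(D, aes(x = x, y = y))
p + geom_point(size = 2)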

[image]
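
Because rnorm() draws new random numbers on every run, the plot and every later clustering result change from run to run. The following is a minimal sketch of a reproducible variant using base R only. The seed value 1, the name D_labeled, and the label column are assumptions added for illustration and are not part of the original data.

```r
# Assumption: the seed value 1 is arbitrary; any fixed integer works.
set.seed(1)

x1 <- rnorm(100)
y1 <- rnorm(100) * 0.2

a2 <- rnorm(200)
b2 <- rnorm(200) * 0.4
x2 <- a2 * 2 + b2 + 3
y2 <- -a2 + 2 * b2 + 4

# 'label' is a hypothetical helper column: 1 for the first group, 2 for the second.
D_labeled <- data.frame(x = c(x1, x2), y = c(y1, y2),
                        label = rep(c(1L, 2L), times = c(100, 200)))

# Number of points generated for each group
table(D_labeled$label)
```

Keeping the generating group in a separate column makes it possible to compare a clustering result with the true grouping later, without changing the two columns that are clustered.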

■ Sample data, 6 dimensions

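require(data.table)
require(ggplot2)
require(mvtnorm)

# 200 points around the origin (variance 5 on the first axis) and
# 15 points around (0, 20, 20, 20, 20, 20).
# Note: the name T masks R's built-in shorthand T for TRUE.
T <- data.table( rbind(rmvnorm(200, rep(0, 6), diag(c(5, rep(1,5)))),
     rmvnorm( 15, c(0, rep(20, 5)), diag(rep(1, 6)))) )
# Scatter-plot matrix of all six dimensions
plot(T)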

[image]
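
Both covariance matrices in the 6-dimensional sample are diagonal, so each coordinate is an independent normal variable. Under that observation, roughly the same kind of data can be sketched with base R alone, without mvtnorm. The names g1, g2, and T_base and the seed are assumptions for illustration; the random numbers will not match those produced by rmvnorm().

```r
set.seed(1)  # assumption: arbitrary fixed seed

# Standard deviations per coordinate: variance 5 on the first axis, 1 elsewhere
sds <- sqrt(c(5, rep(1, 5)))

# Large group: 200 points centered at the origin, each column scaled by its sd
g1 <- sweep(matrix(rnorm(200 * 6), ncol = 6), 2, sds, FUN = "*")

# Small group: 15 unit-variance points shifted to (0, 20, 20, 20, 20, 20)
g2 <- sweep(matrix(rnorm(15 * 6), ncol = 6), 2, c(0, rep(20, 5)), FUN = "+")

T_base <- as.data.frame(rbind(g1, g2))
dim(T_base)
```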

Hierarchical clustering

Clustering using Mclust() in the mclust package

  1. Preparation
    install.packages("mclust")
    

  2. Plot the clustering result

    ■ Test data

    require(mclust)
    C = Mclust(D)
    summary(C)
    plot(C)
    

    [image]

    require(mclust)
    C2 = Mclust(T)
    summary(C2)
    plot(C2)
    

    [image]

    ■ iris

    require(mclust)
    C3 = Mclust(iris[,c(1:4)])
    summary(C3)
    plot(C3)
    

    [image]

    [image]
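
In an interactive session, plot() on an Mclust object typically offers a menu of plot types. As a hedged sketch, a single plot type can be requested directly through the what argument of the plot method; the iris fit below repeats the one used in this step.

```r
library(mclust)

C3 <- Mclust(iris[, 1:4])

# Draw one plot type at a time instead of going through the menu.
plot(C3, what = "BIC")             # BIC for each model and number of components
plot(C3, what = "classification")  # points marked by assigned cluster
plot(C3, what = "uncertainty")     # larger symbols mark more uncertain points
```

Requesting the plot type explicitly is convenient in scripts, where no menu selection is possible.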

  3. Get the clustering result as an integer vector

    ■ Test data

    C$classification
    C$uncertainty
    

    [image]

    C2$classification
    C2$uncertainty
    

    [image]

    ■ iris

    C3$classification
    C3$uncertainty
    

    [image]
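
The two vectors are easier to read when summarized. The sketch below, again on iris, counts the cluster sizes, lists the rows with the least certain assignment, and checks that the uncertainty equals one minus the largest posterior probability in the z matrix of the fit. The variable names cls and unc are assumptions for illustration.

```r
library(mclust)

C3 <- Mclust(iris[, 1:4])

cls <- C3$classification   # cluster label for each row
unc <- C3$uncertainty      # uncertainty of that label for each row

table(cls)                              # cluster sizes
head(order(unc, decreasing = TRUE), 5)  # rows with the least certain assignment

# The uncertainty can be recomputed from the posterior probability matrix z.
all.equal(unname(unc), unname(1 - apply(C3$z, 1, max)))
```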

  4. Display attributes of the original data and the clusters

    ■ Test data

    C$d
    C$n
    C$G
    

    [image]

    C2$d
    C2$n
    C2$G
    

    [image]

    ■ iris

    C3$d
    C3$n
    C3$G
    

    [image]

  5. Display the attributes of each cluster of the Mclust VVV model (ellipsoidal, varying volume, shape, and orientation)
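
No code is given for this step. One possible sketch, assuming the fit is restricted to the VVV model through the modelNames argument of Mclust(), reads the per-cluster attributes from the parameters element of the result. The name C3v is an assumption for illustration.

```r
library(mclust)

# Assumption: restrict the model search to VVV only.
C3v <- Mclust(iris[, 1:4], modelNames = "VVV")

C3v$modelName                   # selected model name
C3v$G                           # selected number of clusters

C3v$parameters$pro              # mixing proportion of each cluster
C3v$parameters$mean             # mean vector of each cluster (one column per cluster)
C3v$parameters$variance$sigma   # covariance matrix of each cluster (d x d x G array)
```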

Hierarchical clustering and plotting with clPairs

■ Test data

hcTree <- hc(modelName = "VVV", data = D)
cl <- hclass(hcTree, c(2,3,4,5))

## Plot the 2-cluster and 5-cluster classifications
par(pty = "s", mfrow = c(1,1))
clPairs(D, cl=cl[,"2"])
clPairs(D, cl=cl[,"5"])

[image]

[image]

hcTree <- hc(modelName = "VVV", data = T)
cl <- hclass(hcTree, c(2,3,4,5))

## Plot the 2-cluster and 5-cluster classifications
par(pty = "s", mfrow = c(1,1))
clPairs(T, cl=cl[,"2"])
clPairs(T, cl=cl[,"5"])

[image]

[image]

■ iris

hcTree <- hc(modelName = "VVV", data = iris[,c(1:4)])
cl <- hclass(hcTree,c(2,3,4,5))

## Plot the 2-cluster and 5-cluster classifications
par(pty = "s", mfrow = c(1,1))
clPairs(iris[,c(1:4)],cl=cl[,"2"])
clPairs(iris[,c(1:4)],cl=cl[,"5"])

[image]

[image]
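
In the code above, hc() builds the model-based agglomerative tree and hclass() cuts it into the requested numbers of clusters. The result is a matrix with one row per observation and one column per cluster count, which is why the columns are selected by names such as "2" and "5". A short sketch on iris that inspects this structure:

```r
library(mclust)

hcTree <- hc(modelName = "VVV", data = iris[, 1:4])
cl <- hclass(hcTree, c(2, 3, 4, 5))

dim(cl)           # one row per observation, one column per requested cluster count
colnames(cl)      # column names are the cluster counts as strings
table(cl[, "3"])  # cluster sizes when the tree is cut into 3 groups

# Compare the 3-group cut with the known species labels.
table(iris$Species, cl[, "3"])
```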

Hierarchical clustering and plotting with coordProj

■ Test data

hcTree <- hc(modelName = "VVV", data = D)
cl <- hclass(hcTree, c(2,3,4,5))

par(mfrow = c(1,2))
dimens <- c(1,2)
coordProj(D, dimens = dimens, classification=cl[,"2"])
coordProj(D, dimens = dimens, classification=cl[,"5"])

[image]

hcTree <- hc(modelName = "VVV", data = T)
cl <- hclass(hcTree, c(2,3,4,5))

par(mfrow = c(1,2))
dimens <- c(1,2)
coordProj(T, dimens = dimens, classification=cl[,"2"])
coordProj(T, dimens = dimens, classification=cl[,"5"])

[image]

■ iris

hcTree <- hc(modelName = "VVV", data = iris[,c(1:4)])
cl <- hclass(hcTree, c(2,3,4,5))

par(mfrow = c(1,2))
dimens <- c(1,2)
coordProj(iris[,c(1:4)], dimens = dimens, classification=cl[,"2"])
coordProj(iris[,c(1:4)], dimens = dimens, classification=cl[,"5"])

[image]
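
The dimens argument selects which two coordinates of the data are drawn, so the same classification can be viewed from different coordinate pairs. The sketch below shows the 3-cluster cut of iris in columns 1 and 2 and then in columns 3 and 4. The choice of three clusters, the coordinate pairs, and the name X are assumptions for illustration.

```r
library(mclust)

X <- iris[, 1:4]
hcTree <- hc(modelName = "VVV", data = X)
cl <- hclass(hcTree, c(2, 3))

par(mfrow = c(1, 2))
# Same classification, two different coordinate pairs
coordProj(X, dimens = c(1, 2), classification = cl[, "3"])
coordProj(X, dimens = c(3, 4), classification = cl[, "3"])
```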

Density plot

■ Test data

require(mclust)
dens = densityMclust(D)
plot(dens)
plot(dens, D)

[image]

require(mclust)
dens = densityMclust(T)
plot(dens)
plot(dens, T)

[image]

[image]

■ iris

require(mclust)
dens = densityMclust(iris[,1:4])
plot(dens)
plot(dens, iris[,1:4])

[image]

[image]
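
Besides plotting, the fitted density can be evaluated numerically. A minimal sketch, assuming the predict method for densityMclust objects with its default output (the density value at each new point); the names X and d are assumptions for illustration.

```r
library(mclust)

X <- iris[, 1:4]
dens <- densityMclust(X)

# Estimated density at the first five observations
d <- predict(dens, newdata = X[1:5, ])
d

# Selected model and number of mixture components
summary(dens)
```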

Density plot, colored

■ Test data

require(mclust)
C = Mclust(D)
dens = densityMclust(D)
plot(dens)
plot(dens, D, col = "grey",
     points.col = mclust.options()$classPlotColors[C$classification],
     pch = C$classification)

[image]

require(mclust)
C2 = Mclust(T)
dens = densityMclust(T)
plot(dens, T, col = "grey",
     points.col = mclust.options()$classPlotColors[C2$classification],
     pch = C2$classification)

[image]

■ iris

require(mclust)
C3 = Mclust(iris[,c(1:4)])
dens = densityMclust(iris[,c(1:4)])
plot(dens, iris[,c(1:4)], col = "grey",
     points.col = mclust.options()$classPlotColors[C3$classification],
     pch = C3$classification)

[image]

Prediction

require(mclust)
C3 = Mclust(iris[,c(1:4)])
# Cross-tabulate the known species against the assigned clusters
ct <- table(iris$Species, C3$classification)
ct

[image]
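
The cross-table can be condensed into single numbers. The sketch below fixes the number of clusters to 3 so that it matches the three iris species, then computes the adjusted Rand index and the classification error with the mclust helpers adjustedRandIndex() and classError(). Fixing G = 3 and the names C3g, ct3, ari, and err are assumptions for illustration; the fit above lets Mclust() choose the number of clusters itself.

```r
library(mclust)

# Assumption: G = 3 is fixed by hand to match the three iris species.
C3g <- Mclust(iris[, 1:4], G = 3)

ct3 <- table(iris$Species, C3g$classification)
ct3

# Agreement between clusters and species (1 means identical partitions)
ari <- adjustedRandIndex(iris$Species, C3g$classification)
ari

# Misclassification rate under the best matching of clusters to species
err <- classError(C3g$classification, iris$Species)$errorRate
err
```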