Non-hierarchical clustering

Non-hierarchical (non-nested) clustering procedures are also called K-means clustering methods. Non-hierarchical procedures do not build trees. Instead, objects are assigned to clusters once the number of clusters to form has been specified.

Contents

  • 1 Functions in R software
  • 2 K-means grouping
  • 3 PAM algorithm
  • 4 CLARA algorithm
  • 5 FANNY algorithm
  • 6 Requirements
  • 7 Sources

Functions in R software

In the R language (R Development Core Team 2016), a series of non-hierarchical algorithms is implemented. The stats package provides the kmeans() function for the K-means clustering method, and the cluster package provides the functions pam(), clara() and fanny() for the PAM, CLARA and FANNY methods respectively. Examples of the four methods in R code:

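# Load the cluster package (needed later for pam(), clara() and fanny())
library(cluster)
# Load the data set
data(iris)
head(iris)
# Remove column five (the species label) and scale the data
iris.scaled <- scale(iris[, -5])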
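
As a quick check of what scale() does in the step above, the following sketch confirms that each column of the scaled matrix ends up with mean 0 and standard deviation 1. The names col.means and col.sds are illustrative and not part of the original example.

```r
# Standardise the four numeric iris variables (column 5 is the species factor)
data(iris)
iris.scaled <- scale(iris[, -5])

# After scaling, each column should have mean ~0 and standard deviation 1
col.means <- colMeans(iris.scaled)
col.sds   <- apply(iris.scaled, 2, sd)
print(round(col.means, 10))
print(col.sds)

# scale() keeps the original column means and standard deviations as attributes
attr(iris.scaled, "scaled:center")
attr(iris.scaled, "scaled:scale")
```

Scaling matters here because the clustering methods that follow work on distances, so a variable measured on a larger scale would otherwise dominate the result.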

K-means grouping

R's kmeans() function accepts four values for its algorithm argument ("Lloyd" and "Forgy" are alternative names for the same algorithm).
set.seed(123)
km.HW <- kmeans(iris.scaled, 3, algorithm = "Hartigan-Wong", nstart = 25)
km.L <- kmeans(iris.scaled, 3, algorithm = "Lloyd", nstart = 25)
km.F <- kmeans(iris.scaled, 3, algorithm = "Forgy", nstart = 25)
km.MQ <- kmeans(iris.scaled, 3, algorithm = "MacQueen", nstart = 25)
km.HW$cluster # Cluster membership of each individual
km.HW$centers # Cluster centers (mean of each variable over the individuals in the cluster)
km.HW$withinss # Within-cluster sum of squares, one value per cluster
km.HW$size # Number of individuals assigned to each cluster
km.HW$totss # Total sum of squares
km.HW$tot.withinss # Total within-cluster sum of squares (summed over the 3 clusters)
km.HW$betweenss # Between-cluster sum of squares (totss - tot.withinss)
plot(iris.scaled, col = km.HW$cluster) # Plot the clusters on the first two variables
points(km.HW$centers, col = 1:3, pch = 8, cex = 2) # Mark the three cluster centers
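
The relationships between the sums of squares listed above can be checked directly. This is a minimal sketch that refits the default (Hartigan-Wong) model under the illustrative name km, then compares the clusters with the known species labels, which the clustering itself never sees.

```r
data(iris)
iris.scaled <- scale(iris[, -5])

set.seed(123)
km <- kmeans(iris.scaled, centers = 3, nstart = 25)  # Hartigan-Wong is the default

# betweenss equals totss - tot.withinss
all.equal(km$betweenss, km$totss - km$tot.withinss)

# tot.withinss is the sum of the per-cluster withinss values
all.equal(km$tot.withinss, sum(km$withinss))

# Cross-tabulate the clusters against the species labels
table(cluster = km$cluster, species = iris$Species)
```

The cross-tabulation is only a sanity check: cluster numbers are arbitrary, so what matters is how cleanly each species falls into a single row.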

 

PAM algorithm

PAM (Partitioning Around Medoids) uses k-medoids to identify clusters. It works well on small data sets but is slow on large ones. A medoid can be defined as the object in a cluster whose average dissimilarity to all the objects in that cluster is minimal; that is, it can be regarded as the most central point of the cluster.

# PAM clustering
pam.I <- pam(iris.scaled, 3)
summary(pam.I)
plot(pam.I)
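
To make the medoid definition concrete, the sketch below finds the medoid of one PAM cluster by hand, as the member with the smallest average dissimilarity to the other members, and prints it next to the index reported by pam(). The names pam.fit, members and manual.medoid are illustrative, and Euclidean distance is assumed because it is the default metric of pam().

```r
library(cluster)
data(iris)
iris.scaled <- scale(iris[, -5])

pam.fit <- pam(iris.scaled, 3)

# Row indices of the observations assigned to the first cluster
members <- which(pam.fit$clustering == 1)

# Pairwise Euclidean dissimilarities among those members
d <- as.matrix(dist(iris.scaled[members, ]))

# The medoid is the member with the smallest average dissimilarity
manual.medoid <- members[which.min(rowMeans(d))]

manual.medoid      # row index found by hand
pam.fit$id.med[1]  # row index reported by pam() for cluster 1
```

The two indices should normally agree; they can differ only when several observations tie for the minimum, for example exact duplicates.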

 

CLARA algorithm

CLARA (Clustering Large Applications) draws multiple samples from the data and applies PAM to each sample.

# CLARA clustering
clara.I <- clara(iris.scaled, 2)
clara.I
clara.I$clusinfo
plot(clara.I, ask = TRUE)
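
The sampling described above is controlled by the samples and sampsize arguments of clara(). The following sketch sets both explicitly; the values 20 and 50 are arbitrary choices for illustration, and rngR = TRUE is used so that set.seed() governs which sub-samples are drawn.

```r
library(cluster)
data(iris)
iris.scaled <- scale(iris[, -5])

set.seed(123)
# 20 sub-samples of 50 observations each; PAM is run on every sub-sample
clara.fit <- clara(iris.scaled, k = 2, samples = 20, sampsize = 50, rngR = TRUE)

clara.fit$medoids            # medoids taken from the best sub-sample
clara.fit$sample             # observations that made up that sub-sample
table(clara.fit$clustering)  # every observation in the full data set is assigned
```

Because PAM only ever runs on sampsize observations at a time, the cost stays manageable on data sets far larger than iris.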

 

FANNY algorithm

FANNY (fuzzy analysis clustering) takes a value k giving the number of groups to form, with 0 < k < n/2, where n is the number of observations. It is a fuzzy cluster analysis in which each individual has a degree of membership in every group.

# FANNY clustering
fanny.I <- fanny(iris.scaled, 3)
summary(fanny.I)
plot(fanny.I, ask = TRUE)
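
The degrees of membership can be inspected through the membership component of the fitted object. This sketch, using the illustrative names fanny.fit and crisp, checks the constraint on k, shows that the memberships of each observation sum to 1, and derives a crisp assignment by taking the highest membership.

```r
library(cluster)
data(iris)
iris.scaled <- scale(iris[, -5])

k <- 3
n <- nrow(iris.scaled)
stopifnot(k > 0, k < n / 2)  # the constraint 0 < k < n/2

fanny.fit <- fanny(iris.scaled, k)

# One row per observation, one column per cluster
head(round(fanny.fit$membership, 3))

# The memberships of each observation sum to 1
range(rowSums(fanny.fit$membership))

# Crisp assignment: the cluster with the highest membership for each observation
crisp <- apply(fanny.fit$membership, 1, which.max)
table(crisp, fanny.fit$clustering)
```

Observations whose largest membership is well below 1 sit between clusters, which is information a hard partition such as K-means or PAM does not report.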

Requirements

Running these examples requires R, a programming language and environment for statistical analysis and graphics, installed on Windows® or any other platform that R supports.

 

by Abdullah Sam