Ferebee Tunno <ferebee.tunno <at> mathstat.astate.edu> writes:

> Hi everyone -
>
> I know that R is capable of clustering using the k-means algorithm, but can
> R do k-means++ clustering as well?

k-means++ is a seeding routine that chooses the initial center points before
classical k-means is called. The following lines of code will do that, where
X is a matrix of data points, as required by kmeans(), and k is the number of
centers:

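    library(pracma)   # for distmat(); any equivalent distance function works

    kmpp <- function(X, k) {
        n <- nrow(X)
        C <- numeric(k)                # row indices of the chosen centers
        C[1] <- sample(1:n, 1)         # first center: uniform at random
        for (i in seq_len(k - 1) + 1) {
            # distances from every point to the centers chosen so far
            dm <- distmat(X, X[C[1:(i - 1)], , drop = FALSE])
            # k-means++ weights: squared distance to the nearest chosen center
            # (zero for the centers themselves, so they are not drawn again)
            pr <- apply(dm, 1, min)^2
            C[i] <- sample(1:n, 1, prob = pr)
        }
        kmeans(X, X[C, , drop = FALSE])
    }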

Here distmat(a, b) should return the matrix of distances between the rows of
two matrices a and b. There may be several implementations in R; one is
distmat() in package pracma.
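
If pracma is not at hand, a distance function with the same contract can be
sketched in base R. The name distmat_sketch is made up for this illustration
and the code is not pracma's implementation; it assumes Euclidean distance and
numeric inputs with the same number of columns.

```r
# Hypothetical helper (not pracma's code): Euclidean distances between the
# rows of a (n x p) and the rows of b (m x p), returned as an n x m matrix.
distmat_sketch <- function(a, b) {
  # treat a plain vector as a single point (one row)
  a <- if (is.matrix(a)) a else matrix(a, nrow = 1)
  b <- if (is.matrix(b)) b else matrix(b, nrow = 1)
  stopifnot(ncol(a) == ncol(b))
  # |x - y|^2 = |x|^2 + |y|^2 - 2 x.y, computed for all pairs at once
  d2 <- outer(rowSums(a^2), rowSums(b^2), "+") - 2 * tcrossprod(a, b)
  sqrt(pmax(d2, 0))  # clamp tiny negative values caused by rounding
}

a <- matrix(c(0, 0, 3, 4), nrow = 2, byrow = TRUE)  # points (0,0) and (3,4)
b <- matrix(c(0, 0), nrow = 1)                      # the origin
distmat_sketch(a, b)
```

Entry [i, j] of the result is the distance from row i of a to row j of b, which
is the layout the row-wise minimum in the seeding loop relies on.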

Please note that, AFAIK, it is not clear whether k-means++ seeding is really
better than, e.g., kmeans with several random restarts.
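
For reference, the restart alternative is available directly through the
nstart argument of kmeans(). The sketch below uses made-up data (three
Gaussian blobs) and an arbitrary choice of 25 restarts; it only shows how to
obtain the objective value one would compare, and makes no claim about which
approach wins.

```r
set.seed(1)
# Illustrative data only: three Gaussian blobs in the plane, 50 points each
X <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 4), ncol = 2),
           matrix(rnorm(100, mean = 8), ncol = 2))

# nstart = 25 runs kmeans from 25 random initial configurations
# and keeps the solution with the smallest objective value
fit <- kmeans(X, centers = 3, nstart = 25)

# Total within-cluster sum of squares: the quantity both seeding
# strategies try to make small, so it is the natural one to compare
fit$tot.withinss
```

The same tot.withinss component is returned by any kmeans() call, so the value
from a k-means++ seeded run can be set side by side with this one.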

Hans Werner

> Thanks,
>
> --
> Dr. Ferebee Tunno
> Assistant Professor
> Department of Mathematics and Statistics
> Arkansas State University
> P.O. Box 70
> State University, AR. 72467
> (870) 329-7710
