k-means++

k-means++

Ferebee Tunno
Hi everyone -

I know that R is capable of clustering using the k-means algorithm, but can
R do k-means++ clustering as well?

Thanks,

--
Dr. Ferebee Tunno
Assistant Professor
Department of Mathematics and Statistics
Arkansas State University
P.O. Box 70
State University, AR. 72467
[hidden email]
(870) 329-7710


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: k-means++

Hans W Borchers
Ferebee Tunno <ferebee.tunno <at> mathstat.astate.edu> writes:

> Hi everyone -
>
> I know that R is capable of clustering using the k-means algorithm, but can
> R do k-means++ clustering as well?

k-means++ is a seeding routine that picks the initial center points before
classical k-means is run. The following lines of code will do that, where X is
a matrix of data points, as required by kmeans(), and k is the number of centers:

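    kmpp <- function(X, k) {
        n <- nrow(X)
        C <- integer(k)              # row indices of the chosen centers
        C[1] <- sample.int(n, 1)     # first center: uniform at random

        for (i in seq_len(k - 1) + 1) {          # empty loop when k == 1
            # distances from every point to the centers chosen so far
            dm <- distmat(X, X[C[1:(i - 1)], , drop = FALSE])
            # k-means++ samples proportionally to the squared distance to
            # the nearest center; already chosen points get weight 0
            pr <- apply(dm, 1, min)^2; pr[C] <- 0
            C[i] <- sample.int(n, 1, prob = pr)
        }

        # run classical k-means from the selected starting centers
        kmeans(X, X[C, , drop = FALSE])
    }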

Here distmat(a, b) should return the matrix of distances between the rows of
two matrices a and b. There may be several implementations in R; one is
distmat() in package pracma.
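
If installing pracma is not an option, a distance function of this kind can be
sketched in base R. The name rowdist() below is made up for illustration and is
not part of any package; it assumes Euclidean distance and that a and b have the
same number of columns.

```r
# Hypothetical base-R stand-in for a row-to-row distance function.
# Returns a matrix with nrow(a) rows and nrow(b) columns of Euclidean distances.
rowdist <- function(a, b) {
    a <- as.matrix(a); b <- as.matrix(b)
    stopifnot(ncol(a) == ncol(b))
    # ||a_i - b_j||^2 = ||a_i||^2 + ||b_j||^2 - 2 * <a_i, b_j>
    d2 <- outer(rowSums(a^2), rowSums(b^2), "+") - 2 * tcrossprod(a, b)
    sqrt(pmax(d2, 0))   # clamp tiny negative values caused by rounding
}

a <- rbind(c(0, 0), c(3, 4))
b <- rbind(c(0, 0), c(6, 8), c(3, 0))
rowdist(a, b)           # 2 x 3 matrix: one row per row of a
```

Entry [i, j] of the result is the distance between row i of a and row j of b,
which is the shape the apply(dm, 1, min) step relies on.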

Please note that, AFAIK, it is not clear whether the k-means++ approach is
really better than, e.g., kmeans with several restarts.
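
For reference, the restart baseline is available directly through the nstart
argument of kmeans(). The sketch below only shows how to run it and read off
the objective; the toy data, the seed and the value 25 are arbitrary choices
made for illustration, and it says nothing by itself about which approach wins.

```r
set.seed(1)   # arbitrary seed, for reproducibility only

# Toy data (made up for this example): three Gaussian blobs in 2-D
X <- rbind(matrix(rnorm(100, mean = 0),  ncol = 2),
           matrix(rnorm(100, mean = 5),  ncol = 2),
           matrix(rnorm(100, mean = 10), ncol = 2))

single  <- kmeans(X, centers = 3, nstart = 1)    # one random start
restart <- kmeans(X, centers = 3, nstart = 25)   # best of 25 random starts

# Total within-cluster sum of squares; lower is better
c(single = single$tot.withinss, restart = restart$tot.withinss)
```

The same tot.withinss component is returned by any kmeans() call, so it can
serve as the yardstick when comparing different ways of choosing the centers.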

Hans Werner

