K-means results understanding!!!

Dzu
Dear members,

I am having trouble understanding the kmeans results in R. I am applying the k-means algorithm to my large data file, and it produces the clusters below.

Q1) Does anybody know how to find out which data points were assigned to which cluster (I have fixed the number of clusters at 5)?
Command:
(kmeans.results <- kmeans(mydata, centers = 5, iter.max = 1000, nstart = 10000))

Q2) When I call kmeans.results I get the following output:


K-means clustering with 5 clusters of sizes 17, 1, 6, 4, 32

Cluster means:
  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]     [,11]        [,12]
1    0    0    0    0    0    0    0    0    0     0 0.0000000 0.0008235294
2    0    0    0    0    0    0    0    0    0     0 0.0000000 0.0000000000
3    0    0    0    0    0    0    0    0    0     0 0.0000000 0.0000000000
4    0    0    0    0    0    0    0    0    0     0 0.0000000 0.0040000000
5    0    0    0    0    0    0    0    0    0     0 0.0003125 0.0003750000
         [,13]       [,14]       [,15]       [,16]       [,17]      [,18]
1 0.0008235294 0.001176471 0.005176471 0.012471295 0.041181652 0.10663935
2 0.0000000000 0.000000000 0.000000000 0.000000000 0.169491525 0.61016949
3 0.0000000000 0.000000000 0.000000000 0.002333333 0.006666667 0.07695015
4 0.0030000000 0.001500000 0.001000000 0.017500000 0.029000000 0.06150000
5 0.0015625000 0.003437500 0.010687500 0.046375000 0.100062500 0.14306250
       [,19]     [,20]     [,21]     [,22]      [,23]      [,24]       [,25]
1 0.12946535 1.0017347 0.3360283 0.2455259 0.08565672 0.02553212 0.006000000
2 0.94915254 0.1694915 0.1016949 0.0000000 0.00000000 0.00000000 0.000000000
3 0.09376439 1.3857837 0.2659812 0.1015707 0.03804953 0.02023362 0.007666667
4 0.17100000 0.6665000 0.7860000 0.1860000 0.04650000 0.01450000 0.012000000
5 0.18100000 0.5200625 0.4156875 0.3461250 0.16925000 0.04918750 0.011500000
         [,26]       [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35]
1 0.0005882353 0.001176471     0     0     0     0     0     0     0     0
2 0.0000000000 0.000000000     0     0     0     0     0     0     0     0
3 0.0010000000 0.000000000     0     0     0     0     0     0     0     0
4 0.0000000000 0.000000000     0     0     0     0     0     0     0     0
5 0.0013125000 0.000000000     0     0     0     0     0     0     0     0
  [,36] [,37] [,38] [,39] [,40]
1     0     0     0     0     0
2     0     0     0     0     0
3     0     0     0     0     0
4     0     0     0     0     0
5     0     0     0     0     0

Clustering vector:
 [1] 1 5 5 3 1 5 5 5 5 1 4 1 5 5 5 5 4 5 2 3 5 5 1 5 5 5 5 1 3 1 4 5 5 1 5 5 5 1
[39] 3 1 5 5 3 1 1 1 1 5 5 1 4 1 3 5 5 5 5 5 5 1

Within cluster sum of squares by cluster:
[1] 0.6702803 0.0000000 0.2453294 0.1860180 1.3535263
 (between_SS / total_SS =  76.8 %)

Available components:

[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
[6] "betweenss"    "size"        
>
Q3) I would like to understand which raw data are in which cluster. Does somebody know how to access the table of raw data that are in the same cluster?

Thanks for your help
DZU
Ms.Dizem Uerek

Re: K-means results understanding!!!

David Carlson
You should read the help page

?kmeans

Especially the section labeled "Value" which tells you what kmeans
returns. You will see that the cluster membership is returned as a
vector of integers called "cluster." If you don't know how to access
that from kmeans.results, you haven't read any of the basic
tutorials on R.
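
As a minimal sketch of what that access looks like: the matrix `toy` below is a made-up stand-in for mydata (which is not shown in the thread), and the names `fit`, `membership`, `rows_by_cluster` and `cluster1_rows` are only illustrative. It assumes observations are in rows.

```r
# Hypothetical stand-in for mydata: 60 observations (rows), 4 variables (columns).
set.seed(1)
toy <- matrix(rnorm(60 * 4), nrow = 60, ncol = 4)

fit <- kmeans(toy, centers = 5, iter.max = 100, nstart = 25)

# One integer label per row of the input, in the same order as the rows.
membership <- fit$cluster
length(membership)            # same as nrow(toy)

# Number of observations per cluster; these counts match fit$size.
table(membership)

# Row indices belonging to each cluster, as a list with one element per cluster.
rows_by_cluster <- split(seq_len(nrow(toy)), membership)

# The raw rows assigned to cluster 1.
# drop = FALSE keeps the result a matrix even if only one row matches.
cluster1_rows <- toy[membership == 1, , drop = FALSE]
```

The same indexing, `mydata[kmeans.results$cluster == k, , drop = FALSE]`, should give the raw rows of cluster k, provided the labels come from the same fitted object.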

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


-----Original Message-----
From: [hidden email]
[mailto:[hidden email]] On Behalf Of Dzu
Sent: Monday, June 24, 2013 4:25 AM
To: [hidden email]
Subject: [R] K-means results understanding!!!

--
View this message in context: http://r.789695.n4.nabble.com/K-means-results-understanding-tp4670171.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: K-means results understanding!!!

Dzu
Hi,
Thanks for the reply, but I have already read the help page. I am new to R and did not understand the description of the output of the kmeans function. That is why I wanted to ask the experts in the group.

My point is that I do not understand which data are grouped together in a specific cluster.

I tried the following:

(kmeans.results <- kmeans(mydata, centers = 4, iter.max = 1000, nstart = 10000))
# kmeans.results$cluster == 1 is a logical vector; cl1 marks membership in cluster 1

cl1 <- data.frame(as.numeric(kmeans.results$cluster == 1))
nbcl1 <- sum(cl1, na.rm = TRUE)
# the number of 1 values in cl1 is, for example, 22
# this means there are 22 vectors which are similar

but when I call:
mydata[kmeans.results$cluster==1,]
I only get 1 vector, not the 22 vectors that are in cluster 1.

I thought cluster 1 would contain many vectors that kmeans found to be similar. But the output is only one vector!
Ms.Dizem Uerek
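
Two things may be worth checking here, offered as a guess since mydata is not shown. First, the cluster numbers that kmeans assigns are arbitrary and depend on the random starting centers, so if kmeans is run again between counting and subsetting, "cluster 1" may no longer be the same group. Second, when exactly one row matches, R returns a plain vector unless drop = FALSE is given. A minimal sketch, where `toy` is a made-up stand-in for mydata with observations in rows and all other names are illustrative:

```r
set.seed(42)
# Hypothetical data: 60 observations (rows), 4 variables (columns).
toy <- matrix(runif(60 * 4), nrow = 60, ncol = 4)

fit <- kmeans(toy, centers = 4, iter.max = 100, nstart = 25)

# The label vector has one entry per row, so rows must be the observations.
stopifnot(length(fit$cluster) == nrow(toy))

# Count and subset from the SAME fitted object.
n_in_cl1 <- sum(fit$cluster == 1)
cl1_rows <- toy[fit$cluster == 1, , drop = FALSE]

# The subset has as many rows as the count, and as many as fit$size reports.
nrow(cl1_rows) == n_in_cl1
n_in_cl1 == fit$size[1]

# A second run may number the clusters differently,
# so labels from two different runs should not be mixed.
fit2 <- kmeans(toy, centers = 4, iter.max = 100, nstart = 25)
table(first_run = fit$cluster, second_run = fit2$cluster)
```

If `table(kmeans.results$cluster)` and `kmeans.results$size` on the real data disagree with the number of rows returned, comparing `dim(mydata)` with `length(kmeans.results$cluster)` would be a reasonable next check.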