Help with K-Means output

Bill Poling
Good afternoon. I hope I have provided enough info to get my question answered.

I am running Windows 10, R 3.5.1, and RStudio version 1.1.456.

When running a K-Means clustering routine, is it possible to get the actual data from each cluster into a DF?

I have reviewed a number of tutorials and, unless I missed it somewhere, none of them show how to do this, so I would like to know if it is possible.

https://www.datacamp.com/community/tutorials/k-means-clustering-r
https://www.guru99.com/r-k-means-clustering.html
https://datascienceplus.com/k-means-clustering-in-r/
https://datascienceplus.com/finding-optimal-number-of-clusters/
http://enhancedatascience.com/2017/10/24/machine-learning-explained-kmeans/
http://enhancedatascience.com/2017/04/30/r-basics-k-means-r/

For example:

I ran the code below and got K-means clustering with 10 clusters of sizes 1511, 1610, 702, 926, 996, 1076, 580, 2429, 728, 3797.
Can the 1511 values of SavingsReversed and ProviderID, the 1610 values of SavingsReversed and ProviderID, etc. be written out into separate DFs?

Thank you for your help.

WHP

str(rr0)
Classes 'data.table' and 'data.frame': 14355 obs. of  2 variables:
 $ SavingsReversed: num  0 0 61 128 160 ...
 $ ProviderID     : num  113676 113676 116494 116641 116641 ...
 - attr(*, ".internal.selfref")=<externalptr>

head(rr0, n=35)
    SavingsReversed ProviderID
 1:            0.00     113676
 2:            0.00     113676
 3:           61.00     116494
 4:          128.25     116641
 5:          159.60     116641
 6:          372.66     119316
 7:           18.79     121319
 8:           15.64     121319
 9:            0.00     121319
10:           18.79     121319
11:           23.00     121319
12:           18.79     121319
13:            0.00     121319
14:           25.86     121319
15:           14.00     121319
16:          113.00     121545
17:           50.00     121545
18:         1155.32     121545
19:          113.00     121545
20:          197.20     121545
21:            0.00     121780
22:           36.00     122536
23:         1171.32     125198
24:         1171.32     125198
25:           43.00     125303
26:            0.00     125881
27:           69.64     128435
28:          420.18     128435
29:          175.18     128435
30:           71.54     128435
31:           99.85     128435
32:            0.00     128435
33:           42.75     128435
34:          175.18     128435
35:          846.45     128435

set.seed(213)
rr0a <- kmeans(rr0, 10)
View(rr0a)
summary(rr0a)
# Length Class  Mode
# cluster      14355  -none- numeric
# centers         20  -none- numeric
# totss            1  -none- numeric
# withinss        10  -none- numeric
# tot.withinss     1  -none- numeric
# betweenss        1  -none- numeric
# size            10  -none- numeric
# iter             1  -none- numeric
# ifault           1  -none- numeric

x1 <- as.data.frame(rr0a$centers)
x1[order(x1$SavingsReversed), ]  # order rows by SavingsReversed; sort() does not reorder data frame rows
#SavingsReversed ProviderID
# 2         75.19665  2773789.2
# 3         99.31959  4147091.6
# 5        101.21070  3558532.7
# 4        103.41147  3893274.4
# 1        105.38310  2241031.2
# 8        114.61562  3240701.5
# 10       121.14184  4718727.6
# 9        153.70536  4470878.9
# 6        156.84426  5560636.6
# 7        185.09745   173732.9
print(rr0a)
# K-means clustering with 10 clusters of sizes 1511, 1610, 702, 926, 996, 1076, 580, 2429, 728, 3797
#
# Cluster means:
#   SavingsReversed ProviderID
# 1        105.38310  2241031.2
# 2         75.19665  2773789.2
# 3         99.31959  4147091.6
# 4        103.41147  3893274.4
# 5        101.21070  3558532.7
# 6        156.84426  5560636.6
# 7        185.09745   173732.9
# 8        114.61562  3240701.5
# 9        153.70536  4470878.9
# 10       121.14184  4718727.6
#Within cluster sum of squares by cluster:
# [1] 74529288379846 25846368411171  4692898666512  6277704963344  8428785199973 90824041558798  1468798013919 12143462193009  5483877005233
# [10] 51547955737867
# (between_SS / total_SS =  98.7 %)
#
# Available components:
#
#   [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss" "betweenss"    "size"         "iter"         "ifault"

Confidentiality Notice This message is sent from Zelis. ...{{dropped:13}}

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Help with K-Means output

Bert Gunter-2
Please see ?kmeans and note the "cluster" component of the returned value
that would appear to provide the info you seek.
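
A minimal sketch of what that component holds, assuming the built-in USArrests data as a stand-in for rr0 (the names dat, km, label_counts and cluster1 are illustrative and not from the original post). The "cluster" component is a vector with one label per input row, so it can be tabulated or used directly as a row index:

```r
# Stand-in data (assumption): the built-in USArrests data set.
dat <- USArrests

set.seed(213)
km <- kmeans(dat, centers = 3)

# One cluster label per input row.
stopifnot(length(km$cluster) == nrow(dat))

# Tabulating the labels reproduces the sizes that print(km) reports.
label_counts <- as.vector(table(km$cluster))
stopifnot(all(label_counts == km$size))

# The rows assigned to cluster 1, pulled out with a logical index.
cluster1 <- dat[km$cluster == 1, ]
nrow(cluster1)
```

The same indexing pattern should apply to rr0 and rr0a$cluster, although that has not been run against the original data here.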

-- Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


Re: Help with K-Means output

David Carlson
You should also read the manual page for ?split and learn how to work with lists:

# Split the data according to cluster membership
# to create a list of data frames
rr0.clus <- split(rr0, rr0a$cluster)

# The data frame for cluster 1:
rr0.clus[[1]]
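
As a hedged illustration of working with the resulting list, here is a sketch on the built-in USArrests data, used as a stand-in for rr0 (an assumption; the names dat, km, by_cluster, rows_per_cluster and means_by_cluster are illustrative). The list is named by cluster label, and functions such as vapply() can be applied to every cluster at once:

```r
# Stand-in data (assumption): the built-in USArrests data set.
dat <- USArrests

set.seed(213)
km <- kmeans(dat, centers = 3)

# A named list holding one data frame per cluster label.
by_cluster <- split(dat, km$cluster)
names(by_cluster)

# Row counts per list element match the sizes reported by kmeans().
rows_per_cluster <- vapply(by_cluster, nrow, integer(1))
stopifnot(all(rows_per_cluster == km$size))

# Column means computed per cluster agree with km$centers
# (up to floating-point error).
means_by_cluster <- t(vapply(by_cluster, colMeans, numeric(ncol(dat))))
all.equal(unname(means_by_cluster), unname(km$centers))
```

Keeping the clusters in one list, rather than creating ten separately named data frames, makes it straightforward to loop over them or to pick one out by label, e.g. by_cluster[["2"]].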

--------------------------------------------------------
David L. Carlson
Department of Anthropology
Texas A&M University

Re: Help with K-Means output

Bill Poling
In reply to this post by Bert Gunter-2
Thank you, Bert. I see, so I think this is the process?

set.seed(213)
rr0a1 <- kmeans(rr0, 10)

summary(rr0a1) #Just the cluster
#Length Class  Mode
#cluster      14355  -none- numeric

head(rr0a1$cluster, n=35)
# [1] 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7

Xcluster <- as.data.frame(rr0a1$cluster)

head(Xcluster, n=5)
#rr0a1$cluster
# 1             7
# 2             7
# 3             7
# 4             7
# 5             7

tail(Xcluster, n=5)
#rr0a1$cluster
# 14351             6
# 14352             6
# 14353             6
# 14354             6
# 14355             6

And I can just join this DF with my original DF used for the KMean, correct?
The vertical order is the same?
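
One way to check the row-order question is sketched below on the built-in USArrests data, chosen as a stand-in for rr0 because it has meaningful row names (an assumption; the names dat, km, labelled and cluster2 are illustrative). When the input has row names, kmeans() carries them over to the cluster vector in input order, so the labels can be attached as a new column by position:

```r
# Stand-in data (assumption): USArrests, whose row names are state names.
dat <- USArrests

set.seed(213)
km <- kmeans(dat, centers = 3)

# The cluster vector is named by the input row names, in input order.
stopifnot(identical(names(km$cluster), rownames(dat)))

# Attach the labels by position; no key column is needed.
labelled <- cbind(dat, cluster = km$cluster)
head(labelled)

# Any single cluster can then be filtered out as its own data frame.
cluster2 <- labelled[labelled$cluster == 2, ]
nrow(cluster2)
```

This positional approach only holds as long as the data have not been reordered or filtered between the kmeans() call and the cbind() call.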

WHP


Re: Help with K-Means output

Bert Gunter-2
See David Carlson's reply -- and his advice for learning about how to use
lists.

"And I can just join this DF with my original DF used for the KMean,
correct?"

Define "join". See, e.g.,
http://desktop.arcgis.com/en/arcmap/10.3/manage-data/tables/essentials-of-joining-tables.htm
See also ?merge
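
To make the distinction concrete, here is a small sketch of a key-based join with merge(), using the built-in USArrests data as a stand-in (an assumption; the state key column and the names obs, labels, labels_shuffled and joined are illustrative). Unlike binding columns by position, merge() matches rows on a shared key, so the two tables need not be in the same order:

```r
# Stand-in data (assumption): USArrests, whose row names are state names.
dat <- USArrests

set.seed(213)
km <- kmeans(dat, centers = 4)

# Put the shared key (state name) into an explicit column on both sides.
obs <- data.frame(state = rownames(dat), dat,
                  row.names = NULL, stringsAsFactors = FALSE)
labels <- data.frame(state = names(km$cluster),
                     cluster = unname(km$cluster),
                     stringsAsFactors = FALSE)

# Shuffle one table to show that merge() matches on the key, not on position.
labels_shuffled <- labels[sample(nrow(labels)), ]
joined <- merge(obs, labels_shuffled, by = "state")

# Each state ends up with the label kmeans() assigned to it.
stopifnot(all(joined$cluster == km$cluster[joined$state]))
```

Note that merge() needs such a key column to exist; the two-column rr0 in the original post has no unique row identifier, so a positional bind (or adding an explicit row id first) may be the more natural fit there.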

I consider it to be your job to learn how to work with R's data structures.
There are numerous web tutorials to help you do so. Others may disagree and
reply to such queries.

Cheers,
Bert

Re: Help with K-Means output

Bill Poling
In reply to this post by David Carlson
Thank you, David. I will try that as well.

WHP

From: David L Carlson <[hidden email]>
Sent: Saturday, December 8, 2018 11:12 AM
To: Bert Gunter <[hidden email]>; Bill Poling <[hidden email]>
Cc: R-help <[hidden email]>
Subject: RE: [R] Help with K-Means output

You should also read the manual page for ?split and learn how to work with lists:

# Split the data according to cluster membership
# to create a list of data frames
rr0.clus <- split(rr0, rr0a$cluster)

# The data frame for cluster 1:
rr0.clus[[1]]
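
A minimal, self-contained sketch of the same split() approach. The poster's rr0 is not available, so the built-in iris measurements stand in for it; the names dat, km and clusters are illustrative and do not come from the thread.

```r
# Illustrative data only: the built-in iris measurements stand in for rr0.
dat <- iris[, 1:4]

set.seed(213)
km <- kmeans(dat, centers = 3)

# km$cluster holds one cluster label per input row, in input row order,
# so it can be used directly as the grouping factor for split().
clusters <- split(dat, km$cluster)

length(clusters)          # one data frame per cluster
sapply(clusters, nrow)    # row counts per cluster; these match km$size
head(clusters[["2"]])     # first rows assigned to cluster 2, selected by name
```

The list elements are named after the cluster labels, so clusters[["2"]] and clusters[[2]] select the same data frame here.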

--------------------------------------------------------
David L. Carlson
Department of Anthropology
Texas A&M University

-----Original Message-----
From: R-help [mailto:[hidden email]] On Behalf Of Bert Gunter
Sent: Saturday, December 8, 2018 9:46 AM
To: mailto:[hidden email]
Cc: R-help <mailto:[hidden email]>
Subject: Re: [R] Help with K-Means output

Please see ?kmeans and note the "cluster" component of the returned value
that would appear to provide the info you seek.

-- Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Sat, Dec 8, 2018 at 7:03 AM Bill Poling <mailto:[hidden email]> wrote:

> Good afternoon. I hope I have provided enough info to get my question
> answered.

Re: Help with K-Means output

Bill Poling
In reply to this post by David Carlson
Terrific, David, that's got it. Thanks again!


Re: Help with K-Means output

Bill Poling
In reply to this post by Bert Gunter-2
Thank you Bert.


From: Bert Gunter <[hidden email]>
Sent: Saturday, December 8, 2018 12:19 PM
To: Bill Poling <[hidden email]>
Cc: R-help <[hidden email]>
Subject: Re: [R] Help with K-Means output

See David Carlson's reply -- and his advice for learning about how to use lists.

"And I can just join this DF with my original DF used for the KMean, correct?"

Define "join". See, e.g., http://desktop.arcgis.com/en/arcmap/10.3/manage-data/tables/essentials-of-joining-tables.htm
See also ?merge
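
If "join" is meant in the key-based sense that ?merge implements, both tables need a shared key column. A hedged sketch, assuming a hypothetical row_id key that is not in the original data, with the built-in iris measurements again standing in for rr0:

```r
# Illustrative data only; row_id is a hypothetical key added for the join.
dat <- iris[, 1:4]
dat$row_id <- seq_len(nrow(dat))

set.seed(213)
# Cluster on the four measurement columns only, not on the key.
km <- kmeans(dat[, 1:4], centers = 3)

# A lookup table: one cluster label per key value.
labels <- data.frame(row_id = dat$row_id, cluster = km$cluster)

# Key-based join: rows are matched on row_id, not on their position.
joined <- merge(dat, labels, by = "row_id")
head(joined)
```

Because merge() matches on the key, the result stays correct even if one of the tables has been re-sorted in the meantime.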

I consider it to be your job to learn how to work with R's data structures. There are numerous web tutorials to help you do so. Others may disagree and reply to such queries.

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Sat, Dec 8, 2018 at 8:43 AM Bill Poling <mailto:[hidden email]> wrote:
Thank you Bert, I see, so I think this is the process?

set.seed(213)
rr0a1 <- kmeans(rr0, 10)

summary(rr0a1) #Just the cluster
#Length Class  Mode
#cluster      14355  -none- numeric

head(rr0a1$cluster, n=35)
# [1] 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7

Xcluster <- as.data.frame(rr0a1$cluster)

head(Xcluster, n=5)
#rr0a1$cluster
# 1             7
# 2             7
# 3             7
# 4             7
# 5             7

tail(Xcluster, n=5)
#rr0a1$cluster
# 14351             6
# 14352             6
# 14353             6
# 14354             6
# 14355             6

And I can just join this DF with my original DF used for the KMean, correct?
The vertical order is the same?

WHP
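
On the row-order question: per ?kmeans, the cluster component is a vector with one integer label per input row, so it lines up with the rows of the data that was clustered. A small sketch under that reading, using the built-in iris measurements as a stand-in for rr0 (dat, km and labelled are illustrative names):

```r
# Illustrative data only: iris measurements stand in for rr0.
dat <- iris[, 1:4]

set.seed(213)
km <- kmeans(dat, centers = 3)

# One label per input row, returned in the same order as the rows of dat.
stopifnot(length(km$cluster) == nrow(dat))

# Add the labels as a named column instead of going through
# as.data.frame(km$cluster), whose single column would be named "km$cluster".
labelled <- cbind(dat, cluster = km$cluster)

head(labelled)
table(labelled$cluster)   # counts per cluster; same numbers as km$size
```

This only holds while the data keeps the row order it had when kmeans() was called; after any re-sorting, a key-based merge is the safer route.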

