Clustering using Non Negative Matrix Factorization

Bongi
I have written code using the NMF package in R. In my input matrix X, row 3 is a duplicate of row 4. How can I explain the corresponding rows in my output?

Source Code
library(NMF)
metrics_equinox <- read.csv("C:/Users/test/Documents/R/R-3.4.0/equinox.csv",
header = TRUE,sep = ",")
equinox_df <- data.frame(metrics_equinox)
head(equinox_df)
# X ~ W %*% H
# X is an n x p matrix
# W = n x r  Module feature matrix
# H = r x p  Change metrics feature matrix
R <- equinox_df
set.seed(3000)
res <- nmf(R, 4,"lee") # lee & seung method
V.hat <- fitted(res)
print(V.hat) # estimated target matrix
w <- basis(res) #  W  Module feature matrix
dim(w) # n x r (n = 4, r = 4)
print(w)

h <- coef(res) # H  Change metrics feature matrix
dim(h) # r x p (r = 4, p = 5)
print(h)
# Bugs change metrics via clustering based on vectors in H
change_bugs <- data.frame(t(h))
features <- cbind(change_bugs$X1, change_bugs$X2)
plot(features)
title("Change Metrics Feature Plot")

***Output****

Matrix X
  bugs VU FU RU AU
1    0  2  0  0  1
2    0 18  4  1  7
3    1 14  1  0  4
4    1 14  1  0  4
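
The CSV file itself is not part of the post, so the following is only a minimal sketch that rebuilds X by hand from the values printed above and confirms the duplicate with base R's `duplicated()`. The variable names `X` and `dup` are illustrative and do not appear in the original code.

```r
# Rebuild the input matrix from the values shown under "Matrix X".
# Assumption: the CSV contains exactly these four rows and five columns.
X <- data.frame(
  bugs = c(0, 0, 1, 1),
  VU   = c(2, 18, 14, 14),
  FU   = c(0, 4, 1, 1),
  RU   = c(0, 1, 0, 0),
  AU   = c(1, 7, 4, 4)
)

# duplicated() marks a row TRUE when an identical row occurs earlier,
# so row 4 is flagged as a repeat of row 3.
dup <- duplicated(X)
print(dup)
```

This only confirms the duplication in the input; it says nothing yet about how the factorization treats those rows.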


> # X ~ W %*% H
> # X is an n x p matrix
> # W = n x r  Module feature matrix
> # H = r x p  Change metrics feature matrix
> R <- equinox_df
> set.seed(3000)
> res <- nmf(R, 4,"lee") # lee & seung method
> V.hat <- fitted(res)
> print(V.hat) # estimated target matrix
           bugs        VU         FU           RU       AU
[1,] 0.01476705  1.999181 0.04525734 3.123617e-07 0.999307
[2,] 0.13310783 17.999619 4.00166629 9.759197e-01 7.002471
[3,] 0.99402515 14.000128 0.99765368 1.086607e-01 3.998064
[4,] 0.98804793 14.000461 0.99690141 1.091562e-01 3.998530
>  
> w <- basis(res) #  W  Module feature matrix
> dim(w) # n x r (n = 4, r = 4)
[1] 4 4
> print(w)
          [,1]        [,2]         [,3]       [,4]
[1,] 0.1370419 0.006933057 2.612780e-07 0.02204343
[2,] 0.3094723 0.062493462 8.175333e-01 0.45813795
[3,] 0.2763321 0.466689877 9.102564e-02 0.25956430
[4,] 0.2771537 0.463883604 9.144076e-02 0.26025432
>  
> h    <- coef(res) # H  Change metrics feature matrix
> dim(h) # r x p (r = 4, p = 5)
[1] 4 5
> print(h)
             bugs        VU         FU           RU        AU
[1,] 1.689811e-09 10.102915 0.27422101 2.759498e-09 6.9896825
[2,] 2.129948e+00  9.481615 1.04823111 6.151097e-09 2.8818078
[3,] 3.447862e-09  3.513075 4.70047692 1.193737e+00 5.1539532
[4,] 1.734431e-09 24.901783 0.01854968 1.985025e-09 0.9729284
>  
> # Bugs change metrics via clustering based on vectors in H
> change_bugs <- data.frame(t(h))
> features <- cbind(change_bugs$X1, change_bugs$X2)
> plot(features)
> title("Change Metrics Feature Plot")
>

The rows of W and H look different to me, even though the initial matrix X contains duplicate rows (rows 3 and 4).
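
One way to make the comparison concrete is to measure how far apart the rows of W actually are. The sketch below is in base R and uses only the W and H values printed in the output above. The choice of Euclidean distance and cosine similarity is an assumption made here for illustration, and the names `row_dist`, `cos34` and `v_hat` are hypothetical.

```r
# Values copied from the print(w) and print(h) output in the post.
w <- matrix(c(
  0.1370419, 0.006933057, 2.612780e-07, 0.02204343,
  0.3094723, 0.062493462, 8.175333e-01, 0.45813795,
  0.2763321, 0.466689877, 9.102564e-02, 0.25956430,
  0.2771537, 0.463883604, 9.144076e-02, 0.26025432
), nrow = 4, byrow = TRUE)

h <- matrix(c(
  1.689811e-09, 10.102915, 0.27422101, 2.759498e-09, 6.9896825,
  2.129948e+00,  9.481615, 1.04823111, 6.151097e-09, 2.8818078,
  3.447862e-09,  3.513075, 4.70047692, 1.193737e+00, 5.1539532,
  1.734431e-09, 24.901783, 0.01854968, 1.985025e-09, 0.9729284
), nrow = 4, byrow = TRUE,
dimnames = list(NULL, c("bugs", "VU", "FU", "RU", "AU")))

# Pairwise Euclidean distances between the rows of W (one row per module).
row_dist <- as.matrix(dist(w))
print(round(row_dist, 4))

# Cosine similarity between rows 3 and 4 of W.
cos34 <- sum(w[3, ] * w[4, ]) / sqrt(sum(w[3, ]^2) * sum(w[4, ]^2))
print(cos34)

# Reconstruct the fitted matrix by hand; this should agree with V.hat
# up to the rounding of the printed values.
v_hat <- w %*% h
print(round(v_hat, 3))
```

Read this way, rows 3 and 4 of W differ only around the third decimal place and are closer to each other than any other pair of rows, so the duplicate does show up in W. The rows of H are not expected to mirror rows of X at all, since H has one row per factor rather than one per module. The remaining small gap between rows 3 and 4 is plausibly due to the iterative fit starting from a random initialization, but that is an interpretation rather than something the output proves.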