Clustering question \ dist(datmat)

kumar zaman
Hello everybody. I am trying to cluster circular data (data points that are angles), so I cannot use the "dist" function in "mclust" to generate my distance matrix. Instead I am using the distance Dij = 0.5*(1 - cos(theta_i - theta_j)). The problem is that "hclust" will not accept this distance matrix. I tried putting it in a data frame, but again I get the error message "Error in if (n < 2) stop("must have n >= 2 objects to cluster") : argument is of length zero". The distance matrix that "dist" produces is lower triangular, while mine is a square matrix, which I think should not matter. My question is: how do I make "hclust" process my distance matrix, and what am I doing wrong? I am sure the problem is with the distance matrix format. Any suggestions are highly appreciated. The code below shows what I have done.
   
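library(CircStats)  # assumption: rvm() and circ.plot() come from this package

# Five groups of five von Mises draws with different mean directions
clust1 <- as.vector(rvm(5, 5, 15))
clust2 <- as.vector(rvm(5, 10, 15))
clust3 <- as.vector(rvm(5, 15, 15))
clust4 <- as.vector(rvm(5, 20, 15))
clust5 <- as.vector(rvm(5, 25, 15))
data1 <- rbind(clust1, clust2, clust3, clust4, clust5)
datmat <- matrix(data1, nrow = 25, ncol = 1, byrow = TRUE)
circ.plot(datmat)

# Pairwise distances Dij = 0.5*(1 - cos(theta_i - theta_j))
df <- array(dim = c(25, 25))
for (i in 1:25) {
  for (j in 1:25) {
    df[i, j] <- 0.5 * (1 - cos(datmat[i] - datmat[j]))
  }
}
hcA <- hclust(df, method = "average")  # this call gives the error quoted above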
  ****************************************************
  Ahmed
  Florida

               
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: Clustering question \ dist(datmat)

Gabor Grothendieck
A distance matrix must be of class "dist".  Try

hclust(as.dist(df))
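
A minimal sketch of this fix on a small made-up set of angles, assuming radians and the poster's distance formula. The names theta, D, hc and groups are illustrative only and not from the thread.

```r
# Two hypothetical groups of angles: one near 0.2 rad, one near 3.1 rad
theta <- c(0.1, 0.2, 0.3, 3.0, 3.1, 3.2)

# Square matrix of circular distances, as in the original post
D <- 0.5 * (1 - cos(outer(theta, theta, "-")))

# hclust() expects an object of class "dist", so convert first
hc <- hclust(as.dist(D), method = "average")

# Cut the tree into two clusters and inspect the memberships
groups <- cutree(hc, k = 2)
groups
```

Any of the other linkage methods accepted by hclust() can be passed through the method argument in the same way.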


On 3/26/06, kumar zaman <[hidden email]> wrote:
> [...]


Re: Clustering question \ dist(datmat)

kumar zaman
Dear Gabor and all,

I know this will work, but I already have a distance matrix calculated with my distance measure Dij = 0.5*(1 - cos(theta_i - theta_j)). If I do hclust(as.dist(df)), then I am taking distances a second time on a matrix "df" that is already supposed to be a distance matrix. I hope I am clear.

PS: I just found out I can use "kmeans(df, 3, iter.max=100)"; it will take df as calculated by Dij. I still need to use the methods in hclust such as "single", "average", "ward", "median", "mcquitty", etc.

Thank you anyway.
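
A hedged aside on the kmeans() remark: kmeans() takes a numeric data matrix and treats each row as an observation and each column as a coordinate. Passing a distance matrix therefore clusters the rows of that matrix as points, rather than using its entries as pairwise distances. The sketch below uses made-up angles (theta, D and km are illustrative names) and only shows that the fitted centers have one coordinate per column of the matrix.

```r
# Hypothetical angles in radians
theta <- c(0.1, 0.2, 0.3, 3.0, 3.1, 3.2)
D <- 0.5 * (1 - cos(outer(theta, theta, "-")))

# Starting centers are taken from rows 1 and 4 so the run is deterministic
km <- kmeans(D, centers = D[c(1, 4), ])

# Each center is a point in 6-dimensional space, one coordinate per column of D
dim(km$centers)
```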


Ahmed Albatineh,PhD
Assistant Professor of Statistics
Nova Southeastern University
Fort Lauderdale, FL 33314
U.S.A
               

Re: Clustering question \ dist(datmat)

Liaw, Andy
In reply to this post by kumar zaman
as.dist() does _not_ recompute the distances if given a matrix.  It simply
takes the lower triangular portion of the given distance matrix and attaches
some attributes recording the original dimension.  I don't think you need to
object to that.

Andy
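
This can be checked directly. The following is a small sketch on made-up angles (theta, D and d are illustrative names), contrasting as.dist(), which keeps the supplied values, with dist(), which would compute new Euclidean distances between the rows.

```r
# Hypothetical angles in radians and their circular distance matrix
theta <- c(0.1, 0.4, 1.5, 3.0)
D <- 0.5 * (1 - cos(outer(theta, theta, "-")))

d <- as.dist(D)

# The stored values are exactly the lower triangle of D
all(as.vector(d) == D[lower.tri(D)])

# Converting back to a full matrix recovers D
all.equal(as.matrix(d), D, check.attributes = FALSE)

# dist(D), by contrast, would treat the rows of D as points and recompute
isTRUE(all.equal(as.vector(dist(D)), as.vector(d)))
```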


Re: Clustering question \ dist(datmat)

kumar zaman
Liaw and Gabor: Thank you a bunch, you are both right. I just double-checked with my data; it just puts things in the form hclust likes. Thank you again.
   
Ahmed
Florida

Re: Clustering question \ dist(datmat)

Sean Davis
In reply to this post by kumar zaman



On 3/27/06 12:19 AM, "kumar zaman" <[hidden email]> wrote:

> Dear Gabor and all ;
>    
>   I know this will work; but i already have a distance matrix calculated using
> my distance measure Dij = 0.5 * ( 1 - cos(theta_i - theta_j)), if i do
> hclust(as.dist(df)) then i am taking distance another time for a matrix " df "
> which is supposed to be a distance matrix, i hope i am clear ;
>    
>   ps: I just found out i can use " kmeans(df, 3, iter.max=100)" it will take
> df as calculated by Dij. I still need to use methods in hclust like " single,
> average, ward, median, mcquitty, ...etc"
>    
>   Thank u anyway.

Kumar,

If I understand your point, you are misunderstanding what as.dist() does.
It does not compute a distance matrix.  Instead, it simply makes a matrix
into a "dist" object, which is NOT just a matrix.  However, the distances in
a matrix converted to a "dist" object are not altered.  Therefore, you are
not "taking distance another time"; instead, you are simply converting the
distance matrix into a form that hclust can understand.
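
To make the difference between a matrix and a "dist" object concrete, here is a tiny sketch with a hand-written 3 x 3 matrix. The values and the names D and d are purely illustrative.

```r
# A small symmetric matrix with a zero diagonal
D <- matrix(c(0, 1, 2,
              1, 0, 3,
              2, 3, 0), nrow = 3, byrow = TRUE)

d <- as.dist(D)

class(d)         # "dist"
is.matrix(d)     # FALSE: it is a vector plus attributes
length(d)        # n * (n - 1) / 2 = 3 stored values
attr(d, "Size")  # number of objects, here 3
as.vector(d)     # 1 2 3: the lower triangle, column by column
```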

Hope that helps clarify.

Sean