Why is it not possible to cut a tree returned by Agnes or Diana by height?

Why is it not possible to cut a tree returned by Agnes or Diana by height?

Leszek Nowina
    > library(cluster)
    > asdf = data.frame(x=c(1,2,3), y=c(4,5,6), z=c(7,8,9))
    > cutree(agnes(asdf), h=100)
    Error in cutree(agnes(asdf), h = 100) :
      the 'height' component of 'tree' is not sorted (increasingly)
    > cutree(diana(asdf), h=100)
    Error in cutree(diana(asdf), h = 100) :
      the 'height' component of 'tree' is not sorted (increasingly)

I'm not sure if I understand why this is the case.

This is what I want: Cluster stuff by the //distances//, **not** by
how many clusters I want to have.

If two things are further from each other than X, they should go to
different clusters. Otherwise, the same cluster.

Is what I'm asking for unreasonable? I imagine that if I were to
implement Agnes or Diana manually, it would go like this: stop joining
clusters once the smallest distance between any pair of clusters is
larger than X (Agnes), or stop dividing clusters once the largest
cluster has a diameter of X (Diana). But since both methods always
join/divide to the very end, I thought using cutree with a height
parameter would give me what I need. It won't.

Am I missing something?
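
For reference, the stopping rule described above can be sketched with base R's hclust() instead of agnes() or diana(). This is only an illustration under assumptions: the six points and the cutoff of 6 are made up, and complete linkage is chosen because its merge height is the largest pairwise distance inside the merged cluster, so cutting at a height bounds every cluster's diameter.

```r
# Made-up one-dimensional data; any numeric data would do.
x <- c(1, 2, 4, 8, 13, 19)
d <- dist(x)

# Build the full tree, then cut it at a distance threshold instead of
# asking for a fixed number of clusters.
cutoff <- 6
groups <- cutree(hclust(d, method = "complete"), h = cutoff)

# Diameter (largest pairwise distance) of each resulting cluster.
dm <- as.matrix(d)
diameters <- sapply(split(seq_along(x), groups),
                    function(i) max(dm[i, i]))

print(groups)
print(diameters)   # no diameter exceeds the cutoff
```

With other linkage methods the cut height has a different meaning (for single linkage it is the smallest gap between clusters), so the diameter bound shown here is specific to complete linkage.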

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Why is it not possible to cut a tree returned by Agnes or Diana by height?

Bert Gunter-2
Inline.

Bert Gunter


On Sun, Apr 14, 2019 at 4:12 PM Leszek Nowina <[hidden email]> wrote:

> This is what I want: Cluster stuff by the //distances//, **not** by
> how many clusters I want to have.
>
> If two things are further from each other than X, they should go to
> different clusters. Otherwise, the same cluster.
>
> Is it unreasonable what I'm asking for?

Yes.

X and Y are at a distance 2. Y and Z are at a distance 2. X and Z are at a
distance 4. Your idea cannot be consistently applied if 3 is the cutoff for
clustering: X and Z would have to go in different clusters but both be in
the same cluster as Y.
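
The three-point example can be checked with a short sketch. It assumes the points sit on a line at 0, 2 and 4 (one layout that gives the stated distances) and uses base R's hclust() with two linkage methods; neither result satisfies the pairwise rule for all three points.

```r
# Three points on a line: d(X,Y) = 2, d(Y,Z) = 2, d(X,Z) = 4.
pts <- matrix(c(0, 2, 4), ncol = 1,
              dimnames = list(c("X", "Y", "Z"), NULL))
d <- dist(pts)

# Single linkage chains X-Y-Z together: X and Z share a cluster
# although they are farther apart than the cutoff of 3.
single <- cutree(hclust(d, method = "single"), h = 3)

# Complete linkage separates X and Z, but Y ends up with only one of
# them although it is within the cutoff of both.
complete <- cutree(hclust(d, method = "complete"), h = 3)

print(single)
print(complete)
```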

Maybe you need to spend some time with the literature before trying to cook
up your own notions.

Cheers,
Bert




Re: Why is it not possible to cut a tree returned by Agnes or Diana by height?

R help mailing list-2
In reply to this post by Leszek Nowina
I think cutree() only works on things inheriting from class 'hclust' and
agnes, et al do not produce such things.  There are as.hclust methods for
the output of agnes so you might try
    cutree(as.hclust(agnes(...)), h = h)
instead of
    cutree(agnes(...), h = h)
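
A minimal sketch of this conversion, using the data frame from the original post and assuming the cluster package is installed. The comment about how agnes stores its heights reflects my reading of the package and should be treated as an assumption rather than documented behaviour.

```r
library(cluster)  # agnes(), diana() and their as.hclust() methods

asdf <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6), z = c(7, 8, 9))
ag <- agnes(asdf)

# The fitted object is not an "hclust" object.
print(class(ag))

# Assumption: ag$height follows the plotting order of the observations
# rather than the merge order, so it need not be increasing.
print(ag$height)

# After conversion the heights are sorted and cutree() accepts h.
hc <- as.hclust(ag)
print(hc$height)

# Name the argument: the second positional argument of cutree() is k.
groups <- cutree(hc, h = 2)
print(groups)
```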

Bill Dunlap
TIBCO Software
wdunlap tibco.com



Re: Why is it not possible to cut a tree returned by Agnes or Diana by height?

Leszek Nowina
In reply to this post by Bert Gunter-2
Either way, it would seem to me that cutree(tree, h=height) could be
easily implemented as cutree(tree, k=sum(tree$height>height)+1) - why
isn't it?

Or is this not really the same, despite how it looks to me?
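
One way to test this on a tree whose heights are already increasing is sketched below. It uses base R's hclust() with average linkage on made-up data, so it says nothing about agnes or diana objects directly; the cutoff of 3 is chosen so that it does not coincide with any merge height.

```r
# Made-up one-dimensional data.
x <- c(1, 2, 4, 7, 11, 16)
tree <- hclust(dist(x), method = "average")

h <- 3

# Cut by height directly.
by_height <- cutree(tree, h = h)

# Cut by the number of clusters derived from the heights:
# every merge above h that is undone adds one cluster.
k <- sum(tree$height > h) + 1
by_count <- cutree(tree, k = k)

print(k)
print(all(by_height == by_count))
```

For a tree with sorted heights the two calls agree, which is consistent with the proposal; the sortedness check only matters when the heights are not increasing.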


Re: Why is it not possible to cut a tree returned by Agnes or Diana by height?

David Carlson
You can certainly use your computation to define the number of clusters.

Some clustering methods (e.g. centroid, median) use the distance between the centers of the clusters as the criterion for combining clusters, but these centers move as clusters are combined, so the distances between clusters can decrease as the clustering process continues. When this happens, the same height can correspond to more than one number-of-clusters solution. These cases are referred to as dendrogram inversions. Only clustering methods that produce ultrametric trees are guaranteed to have a unique number of clusters at every height, so the cutree() function uses the order of the heights to determine whether the tree is ultrametric.

The function was also written before the cluster package, so the documentation for that package should address this, since it indicates that cutree() will work on an agnes object, but I haven't been able to find it so far. As Bill mentioned, the simplest solution is to use as.hclust():

> asdf.ag <- agnes(asdf)
> cutree(asdf.ag, h=2)
Error in cutree(asdf.ag, h = 2) :
  the 'height' component of 'tree' is not sorted (increasingly)
> cutree(as.hclust(asdf.ag), h=2)
[1] 1 2 2

> asdf.di <- diana(asdf)
> cutree(asdf.di, h=2)
Error in cutree(asdf.di, h = 2) :
  the 'height' component of 'tree' is not sorted (increasingly)
> cutree(as.hclust(asdf.di), h=2)
[1] 1 2 2
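
The dendrogram inversions mentioned above can be reproduced with base R alone. The sketch below is an illustration on made-up coordinates: three points forming a nearly equilateral triangle, clustered with the centroid method on squared Euclidean distances, so that the second merge happens at a lower height than the first.

```r
# A and B are 2 apart; C is slightly farther from each of them.
pts <- rbind(A = c(0, 0), B = c(2, 0), C = c(1, 1.8))

# Centroid linkage is usually applied to squared distances.
tree <- hclust(dist(pts)^2, method = "centroid")

# The second merge height is below the first: an inversion.
print(tree$height)
inverted <- is.unsorted(tree$height)

# Cutting by height is refused for such a tree; capture the error
# message instead of stopping.
msg <- tryCatch(cutree(tree, h = 3.5),
                error = function(e) conditionMessage(e))
print(msg)

# Cutting by the number of clusters still works.
by_count <- cutree(tree, k = 2)
print(by_count)
```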

----------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77843-4352

