which element is duplicated?

which element is duplicated?

Duncan Murdoch-2
The duplicated() function gives TRUE if an item in a vector (or row in a
matrix, etc.) is a duplicate of an earlier item.  But what I would like
to know is which item does it duplicate?

For example,

v <- c("a", "b", "b", "a")
duplicated(v)

returns

[1] FALSE FALSE  TRUE  TRUE

What I want is a fast way to calculate

  [1] NA NA 2 1

or (equally useful to me)

  [1] 1 2 2 1

The result should have the property that if result[i] == j, then v[i] ==
v[j], at least for i != j.

Does this already exist somewhere, or is it easy to write?
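
The property stated above can be written down as a small check. This is
only a sketch: the helper name satisfies_property is hypothetical, and NA
entries are assumed to mean "no earlier duplicate", so they are skipped.

```r
v <- c("a", "b", "b", "a")

# Hypothetical helper: TRUE if every non-NA result[i] == j has v[i] == v[j].
satisfies_property <- function(v, result) {
  ok <- !is.na(result)
  all(v[ok] == v[result[ok]])
}

print(satisfies_property(v, c(NA, NA, 2, 1)))  # first requested form
print(satisfies_property(v, c(1, 2, 2, 1)))    # second requested form
```

Both requested forms pass the check for this v.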

Duncan Murdoch

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: which element is duplicated?

Michael Sumner-2
What about as.integer(factor(v, levels = unique(v)))?

I recall very clearly when I realized the power of this feature of
factor(), but I've not seen it discussed much.
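
A minimal sketch of what the levels argument changes. By default factor()
sorts its levels, so the integer codes follow sorted order; with
levels = unique(v) they follow order of first appearance. The vector v2 is
an illustrative example, not taken from the thread. Note that the codes
number the distinct values; they are not positions in v2.

```r
v2 <- c("b", "b", "a")

codes_sorted <- as.integer(factor(v2))                       # levels sorted: "a", "b"
codes_first  <- as.integer(factor(v2, levels = unique(v2)))  # levels in order of first appearance

print(codes_sorted)  # 2 2 1
print(codes_first)   # 1 1 2

# The first-appearance codes agree with match() against the unique values.
stopifnot(identical(codes_first, match(v2, unique(v2))))
```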

Cheers, Mike.

On Tue, 13 Nov 2018 at 12:08 Duncan Murdoch <[hidden email]>
wrote:

> The duplicated() function gives TRUE if an item in a vector (or row in a
> matrix, etc.) is a duplicate of an earlier item.  But what I would like
> to know is which item does it duplicate?
--
Dr. Michael Sumner
Software and Database Engineer
Australian Antarctic Division
203 Channel Highway
Kingston Tasmania 7050 Australia


Re: which element is duplicated?

Bert Gunter-2
In reply to this post by Duncan Murdoch-2
> match(v, unique(v))
[1] 1 2 2 1
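
One caveat, sketched under the assumption that positions in v itself are
wanted: match(v, unique(v)) indexes into unique(v), so it agrees with
positions in v only when the first occurrences of the distinct values
occupy the first positions of v, as they do in the example. The vector v2
below is illustrative and not from the thread.

```r
v2 <- c("b", "b", "a")

grp   <- match(v2, unique(v2))   # 1 1 2: which distinct value each element is
first <- which(!duplicated(v2))  # 1 3: where each distinct value first occurs
idx   <- first[grp]              # 1 1 3: position in v2 of the first occurrence

stopifnot(all(v2 == v2[idx]))    # the requested property: v[i] == v[result[i]]
print(idx)
```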

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Mon, Nov 12, 2018 at 5:08 PM Duncan Murdoch <[hidden email]>
wrote:

> The duplicated() function gives TRUE if an item in a vector (or row in a
> matrix, etc.) is a duplicate of an earlier item.  But what I would like
> to know is which item does it duplicate?

Re: which element is duplicated?

Hervé Pagès-2
In reply to this post by Duncan Murdoch-2
Hi,

On 11/12/18 17:08, Duncan Murdoch wrote:

> The duplicated() function gives TRUE if an item in a vector (or row in
> a matrix, etc.) is a duplicate of an earlier item.  But what I would
> like to know is which item does it duplicate?
>
> Does this already exist somewhere, or is it easy to write?

I generally use match() for that:

> v <- c("a", "b", "b", "a")
> match(v, v)
[1] 1 2 2 1
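
The other form asked for in the question, with NA for first occurrences,
can be derived from the same call. A minimal sketch; the variable names
idx and idx_na are illustrative:

```r
v <- c("a", "b", "b", "a")

idx <- match(v, v)            # 1 2 2 1: position of the first occurrence of each value
idx_na <- idx
idx_na[!duplicated(v)] <- NA  # first occurrences point at themselves; blank them out

print(idx_na)                 # NA NA 2 1
```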

H.

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: [hidden email]
Phone:  (206) 667-5791
Fax:    (206) 667-1319


Re: which element is duplicated?

Bert Gunter-2
In reply to this post by Bert Gunter-2
It is not clear to me what you want for the general case. Perhaps:

> v <- letters[c(2,2,1,2,1,1)]
> wh <- tapply(seq_along(v),factor(v), '[',1)
> w <- wh[match(v,v[wh])]
> w
b b a b a a
1 1 3 1 3 3
> ## and if you want NA's for the first occurrences of unique values
> ## of course:
> w[wh] <- NA
> w
 b  b  a  b  a  a
NA  1 NA  1  3  3

I'd like to see a cleverer solution that vectorizes and avoids the
tapply(), though.
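
For comparison, a sketch (not part of the original message) checking that
the match(v, v) idiom suggested elsewhere in this thread gives the same
indices as the tapply() construction above, without the names:

```r
v <- letters[c(2, 2, 1, 2, 1, 1)]

# The tapply() construction from the message above.
wh <- tapply(seq_along(v), factor(v), "[", 1)
w <- wh[match(v, v[wh])]

# Fully vectorized alternative: index of the first occurrence of each value.
idx <- match(v, v)

stopifnot(all(as.vector(w) == idx))
print(idx)  # 1 1 3 1 3 3
```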

Cheers,
Bert




On Mon, Nov 12, 2018 at 8:33 PM Bert Gunter <[hidden email]> wrote:

> > match(v, unique(v))
> [1] 1 2 2 1

Re: which element is duplicated?

Bert Gunter-2
"I'd like to see a cleverer solution that vectorizes..."

and Herve provided it.


Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Mon, Nov 12, 2018 at 9:43 PM Bert Gunter <[hidden email]> wrote:

> It is not clear to me what you want for the general case. Perhaps:

Re: which element is duplicated?

PIKAL Petr
In reply to this post by Bert Gunter-2
Hi

A similar result (with different numerical values) could be achieved by making v a factor.

> v <- letters[c(2,2,1,2,1,1)]
> vf<-factor(v)
> as.numeric(vf)
[1] 2 2 1 2 1 1

Cheers
Petr


Re: which element is duplicated?

Martin Maechler
>>>>> PIKAL Petr
>>>>>     on Tue, 13 Nov 2018 08:42:22 +0000 writes:

    > Hi
    > similar result (with different numerical values) could
    > be achieved by making v a factor.

> > v <- letters[c(2,2,1,2,1,1)]
> > vf<-factor(v)
> > as.numeric(vf)
> [1] 2 2 1 2 1 1
>
> Cheers
> Petr

Yes, as was already remarked by Michael Sumner.

But really the power is in match(): it is called *twice* by factor().
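
A rough sketch of that relationship for a plain character vector. It
reproduces the default factor codes with one explicit match() call and is
not meant as a description of factor()'s internals; lev and codes are
illustrative names.

```r
v <- letters[c(2, 2, 1, 2, 1, 1)]

lev <- sort(unique(v))   # "a" "b": the levels factor() picks by default here
codes <- match(v, lev)   # 2 2 1 2 1 1

stopifnot(identical(lev, levels(factor(v))))
stopifnot(identical(codes, as.integer(factor(v))))
print(codes)
```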

Martin


Re: which element is duplicated?

Duncan Murdoch-2
In reply to this post by Hervé Pagès-2
On 13/11/2018 12:35 AM, Pages, Herve wrote:

> Hi,
>
> I generally use match() for that:
>
>   > v <- c("a", "b", "b", "a")
>
>   > match(v, v)
>
> [1] 1 2 2 1

Yes, this is perfect.  Thanks to you (and the private answer I received
that suggested the same).

Duncan Murdoch


Re: which element is duplicated?

R help mailing list-2
You also asked about doing this for the rows of a matrix.  unique() gives
the unique rows but match operates on a per element, not per row,
basis.  You can use split, which operates on rows of a matrix, to help.

> m <- cbind( A=c(i=5,ii=5,iii=5,iv=4,v=4,vi=4), B=c(2,3,2,2,2,2) )
> unique(m)
   A B
i  5 2
ii 5 3
iv 4 2
> match(m, unique(m)) # bad
 [1] 1 1 1 3 3 3 4 5 4 4 4 4
> asRows <- function(x) split(x, seq_len(NROW(x))) # convert to list of rows
> match(asRows(m), unique(asRows(m)))
[1] 1 2 1 3 3 3
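
The result above indexes into the unique rows. If the row number of the
first identical row in m itself is wanted (the property in the original
question), one possible variation is to match the list of rows against
itself. This is a sketch; rows and firstRow are illustrative names.

```r
m <- cbind(A = c(i = 5, ii = 5, iii = 5, iv = 4, v = 4, vi = 4),
           B = c(2, 3, 2, 2, 2, 2))

asRows <- function(x) split(x, seq_len(NROW(x)))  # list of rows, as defined above

rows <- asRows(m)
firstRow <- match(rows, rows)  # row number of the first identical row
print(firstRow)                # 1 2 1 4 4 4

# Every row equals the row it points to.
stopifnot(all(m[firstRow, ] == m))
```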


For data.frames unique works on rows but match works on columns, and
converting to a list of rows does not quite work, because unique looks at
the row names.  A modification of asRows works around that:

> d <- data.frame(m)
> unique(d)
   A B
i  5 2
ii 5 3
iv 4 2
> match(d, unique(d))
[1] NA NA
> asRows <- function(x) lapply(split(x, seq_len(NROW(x))), as.list)
> match(asRows(d), unique(asRows(d)))
[1] 1 2 1 3 3 3
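
A different technique that is sometimes used for data frames, shown here
only as a sketch and not as the method in this message: build one string
key per row with paste() and match the keys against themselves. The
separator "\r" is an arbitrary choice assumed not to occur in the data,
and numbers are compared through their character representation, so
values that differ only beyond that precision could be treated as equal.

```r
d <- data.frame(A = c(5, 5, 5, 4, 4, 4), B = c(2, 3, 2, 2, 2, 2))

# One key per row: the columns pasted together element-wise.
key <- do.call(paste, c(unname(as.list(d)), sep = "\r"))

firstRow <- match(key, key)  # row number of the first identical row
print(firstRow)              # 1 2 1 4 4 4
```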


Is this the sort of issue that Hadley's vectors package is addressing?

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Nov 13, 2018 at 2:15 AM, Duncan Murdoch <[hidden email]>
wrote:

> Yes, this is perfect.  Thanks to you (and the private answer I received
> that suggested the same).
>
> Duncan Murdoch

Re: which element is duplicated?

Duncan Murdoch-2
On 13/11/2018 12:58 PM, William Dunlap wrote:

> You also asked about doing this for the rows of a matrix.  unique() gives
> the unique rows but match operates on a per element, not per row,
> basis.  You can use split, which operates on rows of a matrix, to help.

Thanks!  That's very nice.

>
> Is this the sort of issue that Hadley's vectors package is addressing?
I don't know; hopefully someone else will respond...

Duncan Murdoch
