# which element is duplicated?

## which element is duplicated?

The duplicated() function gives TRUE if an item in a vector (or row in a matrix, etc.) is a duplicate of an earlier item. But what I would like to know is which item it duplicates.

For example,

```r
v <- c("a", "b", "b", "a")
duplicated(v)
```

returns

```
[1] FALSE FALSE  TRUE  TRUE
```

What I want is a fast way to calculate

```
[1] NA NA 2 1
```

or (equally useful to me)

```
[1] 1 2 2 1
```

The result should have the property that if `result[i] == j`, then `v[i] == v[j]`, at least for `i != j`.

Does this already exist somewhere, or is it easy to write?

Duncan Murdoch
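
To pin down the semantics being asked for, here is a minimal, deliberately naive R sketch. The function name `first_index_naive` is hypothetical, and the sketch assumes `v` contains no `NA` values. For each element it returns the index of the first element equal to it, so the stated property holds by construction. It is O(n^2) and only meant as a reference to check faster answers against.

```r
# Hypothetical naive reference (assumes no NA in v): for each i, the index
# of the first element equal to v[i]. O(n^2); for checking semantics only.
first_index_naive <- function(v) {
  res <- integer(length(v))
  for (i in seq_along(v)) {
    res[i] <- which(v == v[i])[1]
  }
  res
}

v <- c("a", "b", "b", "a")
res <- first_index_naive(v)
res                 # 1 2 2 1
all(v[res] == v)    # TRUE: result[i] == j implies v[i] == v[j]
```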

## Re: which element is duplicated?

what about

```r
as.integer(factor(v, levels = unique(v)))
```

I recall very clearly when I realized the power of this feature of factor(), but I've not seen it discussed much.

Cheers, Mike.

On Tue, 13 Nov 2018 at 12:08 Duncan Murdoch <[hidden email]> wrote:
> Does this already exist somewhere, or is it easy to write?

-- 
Dr. Michael Sumner
Software and Database Engineer
Australian Antarctic Division
203 Channel Highway
Kingston Tasmania 7050 Australia
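
A small check, assuming a character vector without `NA` values: with `levels = unique(v)`, the factor codes are the same integers that `match(v, unique(v))` returns. They are positions in `unique(v)`, not positions in `v`.

```r
v <- c("a", "b", "b", "a")
# Factor codes with levels in order of first appearance
codes <- as.integer(factor(v, levels = unique(v)))
codes                                  # 1 2 2 1
identical(codes, match(v, unique(v)))  # TRUE
```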

## Re: which element is duplicated?

In reply to this post by Duncan Murdoch-2

```r
> match(v, unique(v))
[1] 1 2 2 1
```

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Mon, Nov 12, 2018 at 5:08 PM Duncan Murdoch <[hidden email]> wrote:
> Does this already exist somewhere, or is it easy to write?
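
On the four-element example, `match(v, unique(v))` and the positions Duncan asked for happen to coincide. A sketch using the longer vector that appears later in the thread shows the difference: `match(v, unique(v))` returns positions in `unique(v)`, which need not satisfy `v[result[i]] == v[i]`, while `match(v, v)` returns positions in `v` itself and does.

```r
v <- letters[c(2, 2, 1, 2, 1, 1)]  # "b" "b" "a" "b" "a" "a"

# Positions in unique(v): codes, not indices into v
codes <- match(v, unique(v))       # 1 1 2 1 2 2
all(v[codes] == v)                 # FALSE: codes[3] is 2, but v[2] is "b" and v[3] is "a"

# Positions in v itself: index of the first occurrence
pos <- match(v, v)                 # 1 1 3 1 3 3
all(v[pos] == v)                   # TRUE
```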

## Re: which element is duplicated?

In reply to this post by Duncan Murdoch-2

Hi,

On 11/12/18 17:08, Duncan Murdoch wrote:
> Does this already exist somewhere, or is it easy to write?

I generally use match() for that:

```r
> v <- c("a", "b", "b", "a")
> match(v, v)
[1] 1 2 2 1
```

H.

-- 
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: [hidden email]
Phone: (206) 667-5791
Fax: (206) 667-1319
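
For the other requested form (`NA` for first occurrences), one possible sketch combines `match(v, v)` with `duplicated()`. Every first occurrence matches itself, and those are exactly the entries where `duplicated(v)` is `FALSE`, so blanking them gives the `NA` version.

```r
v <- c("a", "b", "b", "a")
res <- match(v, v)         # index of the first occurrence of each value
res[!duplicated(v)] <- NA  # first occurrences point to themselves; blank them
res                        # NA NA 2 1
```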

## Re: which element is duplicated?

In reply to this post by Bert Gunter-2

It is not clear to me what you want for the general case. Perhaps:

```r
> v <- letters[c(2,2,1,2,1,1)]
> wh <- tapply(seq_along(v), factor(v), '[', 1)
> w <- wh[match(v, v[wh])]
> w
b b a b a a
1 1 3 1 3 3
> ## and if you want NA's for the first occurrences of unique values
> ## of course:
> w[wh] <- NA
> w
 b  b  a  b  a  a
NA  1 NA  1  3  3
```

I'd like to see a cleverer solution that vectorizes and avoids the tapply(), though.

Cheers,
Bert

On Mon, Nov 12, 2018 at 8:33 PM Bert Gunter <[hidden email]> wrote:
> > match(v, unique(v))
> [1] 1 2 2 1
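
A sketch of a vectorized replacement for the `tapply()` step, assuming the same `v`: `match(v, v)` gives the same values as `w` (without the names). The first-occurrence positions held in `wh` are also available as `which(!duplicated(v))`, in order of appearance rather than in factor-level order.

```r
v <- letters[c(2, 2, 1, 2, 1, 1)]

# Bert's tapply() version: first position of each level, then look it up
wh <- tapply(seq_along(v), factor(v), '[', 1)
w  <- wh[match(v, v[wh])]

# Vectorized alternative: match v against itself
w2 <- match(v, v)                                      # 1 1 3 1 3 3
all(as.vector(w) == w2)                                # TRUE
# Same first-occurrence positions, sorted by position
identical(sort(as.vector(wh)), which(!duplicated(v)))  # TRUE
```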

## Re: which element is duplicated?

"I'd like to see a cleverer solution that vectorizes..." and Herve provided it.

Bert Gunter

On Mon, Nov 12, 2018 at 9:43 PM Bert Gunter <[hidden email]> wrote:
> I'd like to see a cleverer solution that vectorizes and avoids the tapply(), though.

## Re: which element is duplicated?

>>>>> PIKAL Petr
>>>>>     on Tue, 13 Nov 2018 08:42:22 +0000 writes:

> Hi,
> a similar result (with different numerical values) could be achieved by making v a factor.
>
> ```r
> > v <- letters[c(2,2,1,2,1,1)]
> > vf <- factor(v)
> > as.numeric(vf)
> [1] 2 2 1 2 1 1
> ```
>
> Cheers
> Petr

Yes, as was already remarked by Michael Sumner. But really the power is in match(): it is called *twice* by factor().

Martin
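
For comparison with `match()`, a sketch assuming a plain character vector: with the default (sorted) levels, the factor codes in Petr's example coincide with `match(v, sort(unique(v)))`. This only illustrates the relationship on this input; it is not a copy of `factor()`'s source.

```r
v <- letters[c(2, 2, 1, 2, 1, 1)]
# Default levels are the sorted unique values, here "a" and "b"
codes <- as.integer(factor(v))               # 2 2 1 2 1 1
identical(codes, match(v, sort(unique(v))))  # TRUE
```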

## Re: which element is duplicated?

In reply to this post by Hervé Pagès-2

On 13/11/2018 12:35 AM, Pages, Herve wrote:
> I generally use match() for that:
>
> ```r
> > v <- c("a", "b", "b", "a")
> > match(v, v)
> [1] 1 2 2 1
> ```

Yes, this is perfect. Thanks to you (and the private answer I received that suggested the same).

Duncan Murdoch
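
The original question also mentions rows of a matrix, which `match()` does not handle directly. One possible sketch collapses each row into a single string key and matches the keys against themselves. The helper name `first_row_match` and its `sep` argument are hypothetical. The sketch assumes `sep` never appears in the data and that `as.character()` formatting distinguishes every pair of values that should count as different (very close floating-point values may not be distinguished).

```r
# Hypothetical helper: for each row of m, the index of the first identical row.
first_row_match <- function(m, sep = "\r") {
  # One string key per row; assumes `sep` never occurs in the data.
  key <- apply(m, 1, paste, collapse = sep)
  match(key, key)
}

m <- rbind(c(1, 2), c(3, 4), c(1, 2), c(3, 4), c(5, 6))
first_row_match(m)                                        # 1 2 1 2 5
# Consistency check: a row is a first occurrence exactly when it is not duplicated
all((first_row_match(m) == seq_len(nrow(m))) == !duplicated(m))  # TRUE
```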