Simple question on finding duplicates

Jeff-3

   I'm trying to find duplicate values in a column of a data frame. For
   example, data frame (a) below has two 3's. I would like to mark each
   row's value as either not a duplicate of the one before it (0) or as a
   duplicate (1), as in data frame (b). In SPSS, I would simply compare
   each value to its "lagged" value, but I can't figure out how to do
   this in R.
   Can someone point me in the right direction?
   Thanks
   a <- data.frame( col1 = c(1,2,3,3,4))
   b <- data.frame( col1 = c(1,2,3,3,4), duplicate = c(0,0,0,1,0))
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: Simple question on finding duplicates

David Carlson
duplicate <- ifelse(c(0, a$col1[-length(a$col1)])==c(a$col1), 1, 0)

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352


Re: Simple question on finding duplicates

David Carlson
In reply to this post by Jeff-3
Minor correction:

duplicate <- ifelse(c(0, a$col1[-length(a$col1)])==a$col1, 1, 0)

-------
David
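
One caveat with the one-liner above: it pads the lagged vector with 0, so a column whose first value happens to be 0 would have its first row marked as a duplicate. A minimal sketch of the same "compare to the lagged value" idea that pads with NA instead is shown below. The helper name lag1 and the intermediate same_as_previous are made up for illustration and are not part of the original reply.

```r
a <- data.frame(col1 = c(1, 2, 3, 3, 4))

# Hypothetical helper: shift a vector down by one position, padding with NA
lag1 <- function(x) c(NA, head(x, -1))

# Compare each value with its lagged value; the first row has no predecessor,
# so its comparison is NA and is mapped to 0 below
same_as_previous <- a$col1 == lag1(a$col1)
a$duplicate <- ifelse(is.na(same_as_previous), 0, as.numeric(same_as_previous))
a
```

Note that this sketch also maps comparisons involving missing values in the data to 0, which may or may not be what you want.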


Re: Simple question on finding duplicates

arun kirshna
In reply to this post by Jeff-3
Hi,
Try this:


a <- data.frame(col1 = c(1,2,3,3,4))
a <- within(a, duplicate <- c(0, ifelse(diff(col1) == 0, 1, 0)))
a
  col1 duplicate
1    1         0
2    2         0
3    3         0
4    3         1
5    4         0
A.K.
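
The diff() approach above relies on subtraction, so it applies to numeric columns only. For a character column, the same adjacent comparison can be written with plain indexing. This is only a sketch: the data frame d and its column grp are made-up example data, not something from the thread.

```r
# Hypothetical example data: a character column, where diff() would fail
d <- data.frame(grp = c("x", "y", "y", "z", "y"), stringsAsFactors = FALSE)

n <- nrow(d)
# Compare elements 2..n with elements 1..(n-1);
# the first row has no predecessor, so it is never a duplicate
d$duplicate <- c(0, as.numeric(d$grp[-1] == d$grp[-n]))
d
```

The last "y" is not flagged here because the value directly before it is "z".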



Re: Simple question on finding duplicates

Bert Gunter
In reply to this post by David Carlson
ummm...
?duplicates

-- Bert



--

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

Re: Simple question on finding duplicates

Bert Gunter
Sorry...
?duplicated

-- Bert
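
Worth keeping in mind when reading ?duplicated: duplicated() marks a value as a duplicate if it has occurred anywhere earlier in the vector, not only in the row directly before. For the data in the question the two give the same answer, but they can differ in general. A small sketch, using a made-up vector x with a non-adjacent repeat:

```r
# Hypothetical example vector: the final 3 repeats an earlier value
# but does not directly follow it
x <- c(1, 2, 3, 3, 4, 3)

# duplicated() flags every value that has appeared anywhere earlier
any_earlier <- as.numeric(duplicated(x))
any_earlier
# 0 0 0 1 0 1

# Comparing with the previous element flags only consecutive repeats
previous_only <- c(0, as.numeric(diff(x) == 0))
previous_only
# 0 0 0 1 0 0
```

Which of the two is wanted depends on whether "duplicate" means "same as the row before" (as the question says) or "seen before at all".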

Re: Simple question on finding duplicates

Peter Ehlers
In reply to this post by Jeff-3

duplicate <- c(0, diff(a[,"col1"]) == 0)

Peter Ehlers
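
For anyone wondering why no ifelse() is needed in the line above: c() coerces the logical result of the comparison to numeric when it is combined with the leading 0. A short sketch that attaches the result to the data frame and checks it against the target (b) from the question; the name matches_target is made up for illustration.

```r
a <- data.frame(col1 = c(1, 2, 3, 3, 4))
b <- data.frame(col1 = c(1, 2, 3, 3, 4), duplicate = c(0, 0, 0, 1, 0))

# c() coerces the logical comparison result to numeric 0/1
a$duplicate <- c(0, diff(a[, "col1"]) == 0)

# Check the result against the target data frame from the original question
matches_target <- isTRUE(all.equal(a, b))
matches_target
```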
