|
I'm trying to find duplicate values in a column of a data frame. For example, data frame (a) below has two 3's. I would like to mark the value in each row as either not a duplicate of the one before it (0) or a duplicate (1), as in data frame (b). In SPSS, I would simply compare each value to its "lagged" value, but I can't figure out how to do this in R. Can someone point me in the right direction? Thanks

a <- data.frame( col1 = c(1,2,3,3,4))
b <- data.frame( col1 = c(1,2,3,3,4), duplicate = c(0,0,0,1,0))

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code. |
|
duplicate <- ifelse(c(0, a$col[-length(a$col)])==c(a$col), 1, 0)
----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

> -----Original Message-----
> From: [hidden email] [mailto:r-help-bounces@r-project.org] On Behalf Of Jeff
> Sent: Wednesday, July 25, 2012 3:06 PM
> To: [hidden email]
> Subject: [R] Simple question on finding duplicates |
|
In reply to this post by Jeff-3
Minor correction:
duplicate <- ifelse(c(0, a$col[-length(a$col)])==a$col, 1, 0)

-------
David

> -----Original Message-----
> From: David L Carlson [mailto:[hidden email]]
> Sent: Wednesday, July 25, 2012 3:23 PM
> To: 'Jeff'; '[hidden email]'
> Subject: RE: [R] Simple question on finding duplicates
>
> duplicate <- ifelse(c(0, a$col[-length(a$col)])==c(a$col), 1, 0) |
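The 0 that pads the lagged vector is only a placeholder: if the column itself began with 0, the first row would be flagged as a duplicate. A minimal sketch of the same lagged comparison with NA as the pad, assuming the column is named col1 as in the question (lag1 is a hypothetical helper defined here, not a base R function):

```r
a <- data.frame(col1 = c(1, 2, 3, 3, 4))

# lag1 is a hypothetical helper: shift a vector down by one, padding with NA
lag1 <- function(x) c(NA, x[-length(x)])

# Compare each value with the one before it. The comparison against the NA pad
# yields NA, which `%in% TRUE` turns into FALSE, so row 1 is never flagged.
a$duplicate <- as.integer((a$col1 == lag1(a$col1)) %in% TRUE)
a$duplicate
```

Using the full name a$col1 also avoids relying on partial matching of a$col.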
|
In reply to this post by Jeff-3
Hi,

Try this:

a <- data.frame( col1 = c(1,2,3,3,4))
a<-within(a, duplicate<-c(0,ifelse(diff(a$col1)==0,1,0)))
a
  col1 duplicate
1    1         0
2    2         0
3    3         0
4    3         1
5    4         0

A.K. |
|
In reply to this post by David Carlson
ummm...
?duplicates

-- Bert

On Wed, Jul 25, 2012 at 1:22 PM, David L Carlson <[hidden email]> wrote:
> duplicate <- ifelse(c(0, a$col[-length(a$col)])==c(a$col), 1, 0)

--
Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm |
|
Sorry...
?duplicated

-- Bert

On Wed, Jul 25, 2012 at 1:28 PM, Bert Gunter <[hidden email]> wrote:
> ummm...
> ?duplicates |
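Note that duplicated() flags a value that has appeared anywhere earlier in the vector, not only in the row immediately before, so it agrees with the lagged comparison only when equal values sit in consecutive rows. A small sketch of the difference, using a made-up vector x that is not from the thread:

```r
# x is a made-up example in which 3 recurs, but not always in consecutive rows
x <- c(1, 3, 2, 3, 3)

# duplicated(): TRUE for any value already seen earlier in the vector
any_earlier <- as.integer(duplicated(x))

# Lagged comparison: 1 only when the value equals the one directly before it
prev_row <- c(0L, as.integer(x[-1] == x[-length(x)]))

data.frame(x, any_earlier, prev_row)
```

Row 4 is flagged by duplicated() but not by the lagged comparison, because the value before it is 2.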
|
In reply to this post by Jeff-3
duplicate <- c(0, diff(a[,"col1"]) == 0)

Peter Ehlers |
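diff() relies on subtraction, so it fails on a character column. One option in that case is to compare the vector with itself shifted by one position. A minimal sketch, assuming a hypothetical data frame d with a character column named id (neither appears in the thread):

```r
# d and its column id are hypothetical, for illustration only
d <- data.frame(id = c("a", "b", "b", "c", "c"), stringsAsFactors = FALSE)

n <- nrow(d)
# Drop the first element on one side and the last on the other, then compare.
# The leading 0 marks the first row, which has no predecessor.
d$duplicate <- c(0, d$id[-1] == d$id[-n])
d
```

As in the one-liner above, combining 0 with a logical vector coerces the result to numeric 0/1.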
