Unique?

Guenther, Cameron

Hello,
I have a sample data set that looks like this:

YEAR MONTH DAY CONTINUE SPL       TIMEFISH TIMEUNIT AREA COUNTY DEPTH DEPUNIT GEAR    TRIPID      CONVUNIT
1992 1     26  1        SP0073928 8        H        7    25     4     NA      1000000 02163399054 161
1992 1     26  1        SP0073928 8        H        7    25     4     NA      1000000 02163399054 8
1992 1     26  2        SP0004228 8        H        7    25     4     NA      1000000 02163399054 161
1992 1     26  2        SP0004228 8        H        7    25     4     NA      1000000 02163399054 8
1992 1     25  NA       SP0052652 8        H        7    25     4     NA      1000000 02163399057 85
1992 1     26  NA       SP0037940 8        H        7    25     4     NA      1000000 02163399058 70
1992 1     27  NA       SP0072357 8        H        7    25     4     NA      1000000 02163399059 15
1992 1     27  NA       SP0072357 8        H        7    25     4     NA      1000000 02163399059 20
1992 1     27  NA       SP0026324 8        H        7    25     4     NA      1000000 02163399060 8
1992 1     28  1        SP0072357 8        H        7    25     4     NA      1000000 02163399062 200

How can I use unique to extract one row per TRIPID, i.e. rows that are
unique on TRIPID alone rather than on every variable?  I then want to
condense the data by summing CONVUNIT for each unique value of TRIPID.
I posted a similar question last week and received a workable answer
that does this without using unique.  The solution below worked just
fine on this sample data set, but the full data set has 446,000 rows
and my computer and R simply cannot handle the following code on data
this large.

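# For each TRIPID, keep the first row and overwrite its CONVUNIT
# with the sum of CONVUNIT over that trip
conds <- by(Step4, Step4$TRIPID, function(x)
  replace(x[1, ], "CONVUNIT", sum(x$CONVUNIT)))
# Stack the one-row data frames back into a single data frame
Step5 <- do.call(rbind, conds)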

Thank you,

Cameron Guenther, Ph.D.
Associate Research Scientist
FWC/FWRI, Marine Fisheries Research
100 8th Avenue S.E.
St. Petersburg, FL 33701
(727)896-8626 Ext. 4305
[hidden email]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: Unique?

Robert Citek

On May 10, 2006, at 4:02 PM, Guenther, Cameron wrote:
> How can I use unique to extract the rows that have repeated tripid's
> only, not a unique value for each variable but only for TRIPID.  I  
> then
> want to condense the unique values by summing the CONVUNIT for each
> unique value of TRIPID.

Thanks, Cameron, for this question.  This type of manipulation would
be relatively simple to do in an RDBMS (e.g. MySQL, PostgreSQL,
Oracle), but I'm curious to see how one would do the same in R.  So,
if folks send you solutions off-list, please do post them back to the
list.

Regards,
- Robert
http://www.cwelug.org/downloads
Help others get OpenSource software.  Distribute FLOSS
for Windows, Linux, *BSD, and MacOS X with BitTorrent


Re: Unique?

Francisco J. Zagmutt
In reply to this post by Guenther, Cameron
If you only care about the sum of CONVUNIT for each TRIPID, then you can use
tapply, e.g.:

step4<-data.frame(TRIPID=rep(c(111,222,333),3),CONVUNIT=rpois(9,40))
result<-tapply(step4$CONVUNIT,INDEX=step4$TRIPID,FUN=sum)
result
111 222 333
115 107 123

Is this what you wanted to do?  I can't think of anything faster than tapply
for your problem.

I hope this helps

Francisco
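
For a table of this size, base R's rowsum() is another way to get the per-trip totals, and it may be worth timing against tapply. A minimal sketch, assuming TRIPID is read as character (so the leading zero in IDs such as 02163399054 survives) and using a few rows from the sample data; `step4` here is an illustrative stand-in, not the full data set:

```r
# Illustrative stand-in for the real data: TRIPID kept as character
step4 <- data.frame(
  TRIPID   = c("02163399054", "02163399054", "02163399057",
               "02163399059", "02163399059"),
  CONVUNIT = c(161, 8, 85, 15, 20),
  stringsAsFactors = FALSE
)

# rowsum() adds up CONVUNIT within each TRIPID group;
# the result is a one-column matrix with the TRIPID values as row names
totals <- rowsum(step4$CONVUNIT, group = step4$TRIPID)

# The same totals via tapply(), for comparison
totals_tapply <- tapply(step4$CONVUNIT, INDEX = step4$TRIPID, FUN = sum)

print(totals)
```

Both calls return one total per TRIPID; which one is quicker on 446,000 rows is something to measure rather than assume.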




>From: "Guenther, Cameron" <[hidden email]>
>To: <[hidden email]>
>Subject: [R] Unique?
>Date: Wed, 10 May 2006 17:02:33 -0400

Re: Unique?

Dave Armstrong
In reply to this post by Robert Citek
Dear Cameron,

This is not with unique, but it gets the job done.  Just create a new
variable that is the three variables concatenated together.  Then, you can
just sum by this variable, like the following:

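# Toy data: three grouping columns built by recycling the 26 letters
mymat <- matrix(letters, ncol=3, nrow=260)
mymat <- as.data.frame(mymat)
mymat$dat <- rnorm(260)
# Paste the grouping columns into a single id, then sum dat within each id
mymat$id <- paste(mymat[,1], mymat[,2], mymat[,3])
aggregate(mymat$dat, list(mymat$id), sum)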

HTH,
Dave.
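
Since the grouping in the original question is a single column, aggregate() can also be called on TRIPID directly, without building a pasted key. A minimal sketch using the formula interface on a few rows taken from the sample data; the data frame name `trips` is made up for illustration:

```r
# Illustrative subset of the sample data, TRIPID kept as character
trips <- data.frame(
  TRIPID   = c("02163399054", "02163399054", "02163399058",
               "02163399059", "02163399059"),
  SPL      = c("SP0073928", "SP0073928", "SP0037940",
               "SP0072357", "SP0072357"),
  CONVUNIT = c(161, 8, 70, 15, 20),
  stringsAsFactors = FALSE
)

# One row per TRIPID with the summed CONVUNIT
by_trip <- aggregate(CONVUNIT ~ TRIPID, data = trips, FUN = sum)

# Several grouping columns can be listed on the right-hand side,
# which plays the role of the pasted id
by_trip_spl <- aggregate(CONVUNIT ~ TRIPID + SPL, data = trips, FUN = sum)

print(by_trip)
```

One thing to watch: with the formula interface, rows that have NA in any variable named in the formula are dropped by default, so columns such as CONTINUE or DEPUNIT would need care if used for grouping.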


--
Dave Armstrong
University of Maryland
Dept of Government and Politics
3140 Tydings Hall
College Park, MD 20742
Office: 2103L Cole Field House
Phone: 301-405-9735
e-mail: [hidden email]
web: www.davearmstrong-ps.com

Facts are meaningless.  You can use facts to prove anything that's even
remotely true. - Homer Simpson

To this day, philosophers suffer from Plato's disease: the assumption that
reality fundamentally consists of
abstract essences best described by words or geometry. (In truth, reality is
largely a probabilistic affair best
described by statistics) - Steve Sailer "The Unexpected Uselessness of
Philosophy"


Re: Unique?

Francisco J. Zagmutt
In reply to this post by Guenther, Cameron
Hi Cameron

You need to be more specific when you ask a question so you can get a better
answer.  Anyhow, when you say that you want to retain all the other
variables, do you mean that you want to add a new column to the data set
that contains the calculated sum?  If that is the case, you can use a
construction like:

set.seed(1)
step4<-data.frame(TRIPID=rep(c(111,222,333),3),CONVUNIT=rpois(9,40))
result<-tapply(step4$CONVUNIT,INDEX=step4$TRIPID,FUN=sum)
step4[,"SUM"]=result[match(step4[,"TRIPID"],names(result))]
step4
  TRIPID CONVUNIT SUM
1    111       36 122
2    222       48 121
3    333       48 129
4    111       42 122
5    222       30 121
6    333       43 129
7    111       44 122
8    222       43 121
9    333       38 129


Cheers

Francisco
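
If the goal is instead one row per TRIPID with all the other columns retained (as the by()/rbind code in the original question does), the totals can be combined with duplicated(). A minimal sketch, assuming it is acceptable to keep the first row seen for each trip; the data frame below is an illustrative subset of the sample data with only a few of its columns:

```r
# Illustrative subset of the sample data, TRIPID kept as character
step4 <- data.frame(
  TRIPID   = c("02163399054", "02163399054", "02163399057",
               "02163399059", "02163399059"),
  SPL      = c("SP0073928", "SP0073928", "SP0052652",
               "SP0072357", "SP0072357"),
  DAY      = c(26, 26, 25, 27, 27),
  CONVUNIT = c(161, 8, 85, 15, 20),
  stringsAsFactors = FALSE
)

# Per-trip totals, named by TRIPID
totals <- tapply(step4$CONVUNIT, INDEX = step4$TRIPID, FUN = sum)

# Keep only the first row seen for each TRIPID (all columns retained)
step5 <- step4[!duplicated(step4$TRIPID), ]

# Overwrite CONVUNIT with the total for that trip
step5$CONVUNIT <- as.vector(totals[match(step5$TRIPID, names(totals))])

print(step5)
```

This avoids building one small data frame per trip, which is likely where the by()/rbind version struggles on large data. Note that in the sample, TRIPID 02163399054 appears with two different SPL values, so keeping the first row silently discards the other one.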

>From: "Guenther, Cameron" <[hidden email]>
>To: "Francisco J. Zagmutt" <[hidden email]>
>Subject: RE: [R] Unique?
>Date: Thu, 11 May 2006 12:08:31 -0400
>
>It is close but not quite what I want.  I need to retain all of the
>other variables as well.
>-----Original Message-----
>From: Francisco J. Zagmutt [mailto:[hidden email]]
>Sent: Wednesday, May 10, 2006 6:06 PM
>To: Guenther, Cameron; [hidden email]
>Subject: RE: [R] Unique?