Find last row (observation) for each combination of variables

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Find last row (observation) for each combination of variables

Leif Kirschenbaum-4
Let's say I have a data.frame like
A B C TS other columns
1 1 1 12345
1 1 1 56789
1 2 1 23456
1 2 2 23457
2 4 7 23458
2 4 7 34567
2 4 7 45678

and I want the last row for each unique combination of A/B/C, where by "last" I mean greatest TS.
A B C TS other columns
1 1 1 56789
1 2 1 23456
1 2 2 23457
2 4 7 45678

I did this simply in SAS:
 proc sort data=DF;
   by A B C descending TS
 run;
 proc sort data=DF NODUPKEY;
   by A B C;
 run;

I tried using "aggregate" to find the maximum TS for each combination of A/B/C, but it's slow.
I also tried "by" but it's also slow.
My current (faster) solution is:

 DF$abc<-paste(DF$A,DF$B,DF$C,sep="")
 abclist<-unique(DF$ABC)
 numtest<-length(abclist)
 maxTS<-rep(0,numtest)
 for(i in 1:numtest){
  maxTS[i]<-max(DF$TS[DF$abc==abclist[i]],na.rm=TRUE)
 }
 maxTSdf<-data.frame(device=I(abc),maxTS=maxTS )
 DF<-merge(DF,maxTSdf,by="abc",all.x=TRUE)
 DF<-Df[DF$TS==DF$maxTS,,drop=TRUE]
 DF$maxTS<-NULL

This seems a bit lengthy for such a simple task.

Any simpler suggestions?

-Leif K.

Leif Kirschenbaum
Senior Yield Engineer
Reflectivity, Inc.
(408) 737-8100 x307
[hidden email]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Reply | Threaded
Open this post in threaded view
|

Re: Find last row (observation) for each combination of variables

Bert Gunter
Leif:

Rather than trying to mimic what you might do in SAS take advantage of R's
ability to use arbitrary data structures, e.g. lists. So, one approach is:

(your.df is the data frame)

your.list<-split(your.df,your.ts[,1:3],drop=TRUE)
t(sapply(your.list,function(x)x[which.max(x$TS),]))

Cheers,
Bert

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
 
"The business of the statistician is to catalyze the scientific learning
process."  - George E. P. Box
 
 

> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On Behalf Of Leif
> Kirschenbaum
> Sent: Tuesday, January 10, 2006 1:04 PM
> To: [hidden email]
> Subject: [R] Find last row (observation) for each combination
> of variables
>
> Let's say I have a data.frame like
> A B C TS other columns
> 1 1 1 12345
> 1 1 1 56789
> 1 2 1 23456
> 1 2 2 23457
> 2 4 7 23458
> 2 4 7 34567
> 2 4 7 45678
>
> and I want the last row for each unique combination of A/B/C,
> where by "last" I mean greatest TS.
> A B C TS other columns
> 1 1 1 56789
> 1 2 1 23456
> 1 2 2 23457
> 2 4 7 45678
>
> I did this simply in SAS:
>  proc sort data=DF;
>    by A B C descending TS
>  run;
>  proc sort data=DF NODUPKEY;
>    by A B C;
>  run;
>
> I tried using "aggregate" to find the maximum TS for each
> combination of A/B/C, but it's slow.
> I also tried "by" but it's also slow.
> My current (faster) solution is:
>
>  DF$abc<-paste(DF$A,DF$B,DF$C,sep="")
>  abclist<-unique(DF$ABC)
>  numtest<-length(abclist)
>  maxTS<-rep(0,numtest)
>  for(i in 1:numtest){
>   maxTS[i]<-max(DF$TS[DF$abc==abclist[i]],na.rm=TRUE)
>  }
>  maxTSdf<-data.frame(device=I(abc),maxTS=maxTS )
>  DF<-merge(DF,maxTSdf,by="abc",all.x=TRUE)
>  DF<-Df[DF$TS==DF$maxTS,,drop=TRUE]
>  DF$maxTS<-NULL
>
> This seems a bit lengthy for such a simple task.
>
> Any simpler suggestions?
>
> -Leif K.
>
> Leif Kirschenbaum
> Senior Yield Engineer
> Reflectivity, Inc.
> (408) 737-8100 x307
> [hidden email]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html