Select rows based on matching conditions and logical operators

kborgmann
Hi,
I have a dataset from which I would like to select rows by group: for each set of rows that match on certain variables, return the row with the maximum value of another variable, or a single row if that maximum is tied.  My dataset looks like this:
PGID PTID Year Visit  Count
6755 53121 2009 1 0
6755 53121 2009 2 0
6755 53121 2009 3 0
6755 53122 2008 1 0
6755 53122 2008 2 0
6755 53122 2008 3 1
6755 53122 2009 1 0
6755 53122 2009 2 1
6755 53122 2009 3 2

For each combination of PTID and Year, I would like to return the row with the maximum Count, or just one row if the counts are all the same, so that I get this output:
PGID PTID Year Visit  Count
6755 53121 2009 1 0
6755 53122 2008 3 1
6755 53122 2009 3 2

I tried the following code and the output is almost correct, but rows with duplicate (tied) counts were all included:
df2<-with(df, sapply(split(df, list(PTID, Year)),
function(x) if (nrow(x)) x[which(x$Count==max(x$Count)),]))
df2<-do.call(rbind,df2)
rownames(df2)<-1:nrow(df2)

Any suggestions?
Thanks much for your responses!

Re: Select rows based on matching conditions and logical operators

Rui Barradas
Hello,

Apart from the output order, this does it.
(I have renamed 'df' to 'df1', because 'df' is already an R function, the density
of the F distribution.)


df1 <- read.table(text="
PGID PTID Year Visit  Count
6755 53121 2009 1 0
6755 53121 2009 2 0
6755 53121 2009 3 0
6755 53122 2008 1 0
6755 53122 2008 2 0
6755 53122 2008 3 1
6755 53122 2009 1 0
6755 53122 2009 2 1
6755 53122 2009 3 2", header=TRUE)


df2 <- with(df1, sapply(split(df1, list(PTID, Year)),
     function(x) if (nrow(x)) x[which.max(x$Count), ]))
df2 <- do.call(rbind, df2)
rownames(df2) <- 1:nrow(df2)
df2

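Use which.max(), not which(): which.max() returns only the index of the first maximum, so a group with tied counts contributes a single row.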
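
A minimal sketch of the difference, using made-up count vectors rather than the data frame from this thread (the names tied and untied are illustrative only): which() with an equality test returns every position that ties for the maximum, whereas which.max() returns only the first.

```r
# Hypothetical counts for one PTID/Year group in which every visit ties
tied <- c(0, 0, 0)

all_max   <- which(tied == max(tied))  # every position equal to the maximum
first_max <- which.max(tied)           # position of the first maximum only

print(all_max)    # 1 2 3
print(first_max)  # 1

# With a unique maximum the two approaches agree
untied <- c(0, 1, 2)
print(which(untied == max(untied)))  # 3
print(which.max(untied))             # 3
```

This is why the original attempt returned all three rows for PTID 53121 in 2009, where every Count is 0.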

Hope this helps,

Rui Barradas

Re: Select rows based on matching conditions and logical operators

kborgmann
Thanks! which.max did the trick

Re: Select rows based on matching conditions and logical operators

arun kirshna
In reply to this post by kborgmann
Hi,

Try this:

dat1<-read.table(text="
PGID    PTID    Year    Visit  Count
6755    53121    2009    1    0
6755    53121    2009    2    0
6755    53121    2009    3    0
6755    53122    2008    1    0
6755    53122    2008    2    0
6755    53122    2008    3    1
6755    53122    2009    1    0
6755    53122    2009    2    1
6755    53122    2009    3    2
",sep="",header=TRUE)


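# Split on the PTID/Year combination (not on Count) and keep the first
# row holding the maximum Count in each group
dat2<-lapply(split(dat1,list(dat1$PTID,dat1$Year),drop=TRUE),function(x) x[which.max(x$Count),])
 do.call(rbind,dat2)
           PGID  PTID Year Visit Count
53122.2008 6755 53122 2008     3     1
53121.2009 6755 53121 2009     1     0
53122.2009 6755 53122 2009     3     2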

A.K.




Re: Select rows based on matching conditions and logical operators

William Dunlap
In reply to this post by Rui Barradas
Rui,
  Your solution works, but for large data.frames it can be faster to compute
the indices of the desired rows of the input data.frame and then use one
subscripting call to select the rows, instead of splitting the input data.frame
into a list of data.frames, extracting the desired row from each component,
and then calling rbind to put the rows together again.  E.g., compare your
approach, which I've put into the function f1,
  f1 <- function (dataFrame)  {
      retval <- with(dataFrame, sapply(split(dataFrame, list(PTID,
          Year)), function(x) if (nrow(x))
          x[which.max(x$Count), ]))
      retval <- do.call(rbind, retval)
      rownames(retval) <- 1:nrow(retval)
      retval
  }
with one that computes a logical subscripting vector (by splitting just the
Count vector, not the whole data.frame):
  f2 <- function (dataFrame)  {
      keep <- as.logical(ave(dataFrame$Count, droplevels(interaction(dataFrame$PTID,
          dataFrame$Year)), FUN = function(x) if (length(x)) seq_along(x) ==
          which.max(x)))
      dataFrame[keep, ]
  }

They both compute the same thing, except that the rows
are in a different order (f2 keeps the order of the original data.frame)
and f2 leaves the original row label with each row.
> f1(df1)
  PGID  PTID Year Visit Count
1 6755 53122 2008     3     1
2 6755 53121 2009     1     0
3 6755 53122 2009     3     2
> f2(df1)
  PGID  PTID Year Visit Count
1 6755 53121 2009     1     0
6 6755 53122 2008     3     1
9 6755 53122 2009     3     2
When there are a lot of output rows, f2 can be quite a bit faster.

(I put the call to droplevels(interaction(...)) into the call to ave because ave
can waste a lot of time calling FUN for nonexistent interaction levels.)
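
To see how the logical subscript in f2 comes about, here is a small sketch on bare vectors. The names count, group, flag and keep are illustrative only: count mirrors the Count column above, and group is a stand-in for the PTID/Year combination.

```r
# Count column from the example data, with a stand-in grouping key
count <- c(0, 0, 0, 0, 0, 1, 0, 1, 2)
group <- c("a", "a", "a", "b", "b", "b", "c", "c", "c")

# Within each group, mark the position of the first maximum.
# ave() writes the per-group result back into a vector the same
# length and type as count, so the marks come back as 0/1 numbers.
flag <- ave(count, group, FUN = function(x) seq_along(x) == which.max(x))

# Convert the 0/1 marks into a logical vector usable for row subscripting
keep <- as.logical(flag)

print(flag)         # 1 0 0 0 0 1 0 0 1
print(which(keep))  # 1 6 9
```

The positions 1, 6 and 9 are the row labels that f2(df1) reports above, which is why a single dataFrame[keep, ] call is enough.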

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com



Re: Select rows based on matching conditions and logical operators

Rui Barradas
Hello,

You're right, thanks.
In my solution, I tried to stay as close to the original poster's code as
possible. A glance at it made me realize that a single change would do the
job, so I left it at that and did not worry about performance.
I particularly liked the interaction/droplevels trick.

Rui Barradas


Re: Select rows based on matching conditions and logical operators

Bert Gunter
Wouldn't

> interaction(..., drop=TRUE)

be the same, but terser in this situation?

Also I tend to use paste() for this, i.e. instead of

> interaction(v1,v2, drop=TRUE)

simply

> paste(v1,v2)

Again, this seems shorter and simpler -- but are there good reasons to
prefer the use of interaction()?
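
As a rough check of the two suggestions, the sketch below builds the grouping key both ways on the thread's example data and compares the resulting row selection. The helper name pick_first_max and the variables keep_interaction and keep_paste are made up for this illustration.

```r
df1 <- read.table(text = "
PGID PTID Year Visit Count
6755 53121 2009 1 0
6755 53121 2009 2 0
6755 53121 2009 3 0
6755 53122 2008 1 0
6755 53122 2008 2 0
6755 53122 2008 3 1
6755 53122 2009 1 0
6755 53122 2009 2 1
6755 53122 2009 3 2", header = TRUE)

# Marks the first maximum within one group
pick_first_max <- function(x) seq_along(x) == which.max(x)

# Key built with interaction(), dropping unused level combinations
keep_interaction <- as.logical(ave(df1$Count,
    interaction(df1$PTID, df1$Year, drop = TRUE), FUN = pick_first_max))

# Key built with paste()
keep_paste <- as.logical(ave(df1$Count,
    paste(df1$PTID, df1$Year), FUN = pick_first_max))

print(identical(keep_interaction, keep_paste))  # TRUE
print(df1[keep_paste, ])

# Caveat: pasted keys can collide when a value contains the separator
print(paste("a b", "c") == paste("a", "b c"))   # TRUE
```

For numeric identifiers such as PTID and Year the pasted key is unambiguous. With free-text values it may be safer to pass paste() a sep that cannot occur in the data.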

Cheers,
Bert




--

Bert Gunter
Genentech Nonclinical Biostatistics


Re: Select rows based on matching conditions and logical operators

William Dunlap
Any of those would work.  I wish ave() dropped the unused levels itself;
I don't think there is any reason it shouldn't.  The following needs to
call FUN only three times, but ave() calls it nine times:
   > z <- ave(LETTERS[1:3], 1:3, 1:3, FUN=function(x)print(x))
   [1] "A"
   character(0)
   character(0)
   character(0)
   [1] "B"
   character(0)
   character(0)
   character(0)
   [1] "C"
   > z
   [1] "A" "B" "C"

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: Bert Gunter [mailto:[hidden email]]
> Sent: Wednesday, July 25, 2012 3:04 PM
> To: Rui Barradas
> Cc: William Dunlap; r-help
> Subject: Re: [R] Select rows based on matching conditions and logical operators
>
> Wouldn't
>
> > interaction(..., drop=TRUE)
>
> be the same, but terser in this situation?
>
> Also I tend to use paste() for this, i.e. instead of
>
> > interaction(v1,v2, drop=TRUE)
>
> simply
>
> > paste(v1,v2)
>
> Again, this seems shorter and simpler -- but are there good reasons to
> prefer the use of interaction()?
>
> Cheers,
> Bert
>
> On Wed, Jul 25, 2012 at 2:51 PM, Rui Barradas <[hidden email]> wrote:
> > Hello,
> >
> > You're right, thanks.
> > In my solution, I had tried to keep to the op as much as possible. A glance
> > at it made me realize that one change only would do the job, and that was
> > it, no performance worries.
> > I particularly liked the interaction/droplevels trick.
> >
> > Rui Barradas
> >
> > Em 25-07-2012 22:13, William Dunlap escreveu:
> >>
> >> Rui,
> >>    Your solution works, but it can be faster for large data.frames if you
> >> compute
> >> the indices of the desired rows of the input data.frame and then using one
> >> subscripting call to select the rows  instead of splitting the input
> >> data.frame
> >> into a list of data.frames, extracting the desired row from each
> >> component,
> >> and then calling rbind to put the rows together again.  E.g., compare your
> >> approach, which I've put into the function f1
> >>    f1 <- function (dataFrame)  {
> >>        retval <- with(dataFrame, sapply(split(dataFrame, list(PTID,
> >>            Year)), function(x) if (nrow(x))
> >>            x[which.max(x$Count), ]))
> >>        retval <- do.call(rbind, retval)
> >>        rownames(retval) <- 1:nrow(retval)
> >>        retval
> >>    }
> >> with one that computes a logical subscripting vector (by splitting just
> >> the
> >> Counts vector, not the whole data.frame)
> >>    f2 <- function (dataFrame)  {
> >>        keep <- as.logical(ave(dataFrame$Count,
> >> droplevels(interaction(dataFrame$PTID,
> >>            dataFrame$Year)), FUN = function(x) if (length(x)) seq_along(x)
> >> ==
> >>            which.max(x)))
> >>        dataFrame[keep, ]
> >>    }
> >>
> >> The both compute the same thing, aside from the fact that the rows
> >> are in a different order (f2 keeps the order of the original data.frame)
> >> and f2 leaves the original row label with the row.
> >>>
> >>> f1(df1)
> >>
> >>    PGID  PTID Year Visit Count
> >> 1 6755 53122 2008     3     1
> >> 2 6755 53121 2009     1     0
> >> 3 6755 53122 2009     3     2
> >>>
> >>> f2(df1)
> >>
> >>    PGID  PTID Year Visit Count
> >> 1 6755 53121 2009     1     0
> >> 6 6755 53122 2008     3     1
> >> 9 6755 53122 2009     3     2
> >> When there are a lot of output rows the f2 can be quite a bit faster.
> >>
> >> (I put the call to droplevels(interaction(...)) into the call to ave
> >> because ave
> >> can waste a lot of time calling FUN for nonexistent interaction levels.)
> >>
> >> Bill Dunlap
> >> Spotfire, TIBCO Software
> >> wdunlap tibco.com
> >>
> >>
> >>> -----Original Message-----
> >>> From: [hidden email] [mailto:[hidden email]]
> >>> On
> >>> Behalf Of Rui Barradas
> >>> Sent: Wednesday, July 25, 2012 10:24 AM
> >>> To: kborgmann
> >>> Cc: r-help
> >>> Subject: Re: [R] Select rows based on matching conditions and logical
> >>> operators
> >>>
> >>> Hello,
> >>>
> >>> Apart from the output order this does it.
> >>> (I have changed 'df' to 'df1', 'df' is an R function, the F distribution
> >>> density.)
> >>>
> >>>
> >>> df1 <- read.table(text="
> >>> PGID PTID Year Visit  Count
> >>> 6755 53121 2009 1 0
> >>> 6755 53121 2009 2 0
> >>> 6755 53121 2009 3 0
> >>> 6755 53122 2008 1 0
> >>> 6755 53122 2008 2 0
> >>> 6755 53122 2008 3 1
> >>> 6755 53122 2009 1 0
> >>> 6755 53122 2009 2 1
> >>> 6755 53122 2009 3 2", header=TRUE)
> >>>
> >>>
> >>> df2 <- with(df1, sapply(split(df1, list(PTID, Year)),
> >>>       function(x) if (nrow(x)) x[which.max(x$Count), ]))
> >>> df2 <- do.call(rbind, df2)
> >>> rownames(df2) <- 1:nrow(df2)
> >>> df2
> >>>
> >>> which.max(9, not which().
> >>>
> >>> Hope this helps,
> >>>
> >>> Rui Barradas
> >>> Em 25-07-2012 18:10, kborgmann escreveu:
> >>>>
> >>>> Hi,
> >>>> I have a dataset in which I would like to select rows based on matching
> >>>> conditions and return the maximum value of a variable else return one
> >>>> row if
> >>>> duplicate counts exist.  My dataset looks like this:
> >>>> PGID    PTID    Year     Visit  Count
> >>>> 6755    53121   2009    1       0
> >>>> 6755    53121   2009    2       0
> >>>> 6755    53121   2009    3       0
> >>>> 6755    53122   2008    1       0
> >>>> 6755    53122   2008    2       0
> >>>> 6755    53122   2008    3       1
> >>>> 6755    53122   2009    1       0
> >>>> 6755    53122   2009    2       1
> >>>> 6755    53122   2009    3       2
> >>>>
> >>>> I would like to select rows if PTID and Year match and return the
> >>>> maximum
> >>>> count else return one row if counts are the same, such that I get this
> >>>> output
> >>>> PGID    PTID    Year     Visit  Count
> >>>> 6755    53121   2009    1       0
> >>>> 6755    53122   2008    3       1
> >>>> 6755    53122   2009    3       2
> >>>>
> >>>> I tried the following code and the output is almost correct but
> >>>> duplicate
> >>>> values were included
> >>>> df2<-with(df, sapply(split(df, list(PTID, Year)),
> >>>> function(x) if (nrow(x)) x[which(x$Count==max(x$Count)),]))
> >>>> df<-do.call(rbind,df)
> >>>> rownames(df)<-1:nrow(df)
> >>>>
> >>>> Any suggestions?
> >>>> Thanks much for your responses!
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-
> biostatistics/pdb-ncb-home.htm

Re: Select rows based on matching conditions and logical operators

William Dunlap
Another way to drop the unneeded interaction levels is to supply
drop=TRUE to ave():
   > z <- ave(LETTERS[1:3], 1:3, 1:3, FUN=function(x)print(x), drop=TRUE)
   [1] "A"
   [1] "B"
   [1] "C"

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
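
As a rough illustration of how the drop=TRUE suggestion could be applied to the data from the original question, here is a minimal sketch. It assumes the example data frame is named df1 (as elsewhere in this thread) and relies on ave() passing its '...' arguments on to interaction(), as the post above describes. The names keep and res are made up for this example.

```r
# Example data from the original question
df1 <- read.table(text = "
PGID PTID Year Visit Count
6755 53121 2009 1 0
6755 53121 2009 2 0
6755 53121 2009 3 0
6755 53122 2008 1 0
6755 53122 2008 2 0
6755 53122 2008 3 1
6755 53122 2009 1 0
6755 53122 2009 2 1
6755 53122 2009 3 2", header = TRUE)

# Within each PTID/Year group, flag the first row that holds the maximum Count.
# drop = TRUE is handed to interaction() through ave()'s '...', so FUN is
# called only for PTID/Year combinations that occur in the data.
# ave() stores the logical result back into the integer Count vector (as 0/1),
# hence the as.logical() conversion.
keep <- as.logical(ave(df1$Count, df1$PTID, df1$Year,
                       FUN = function(x) seq_along(x) == which.max(x),
                       drop = TRUE))

# One subscripting call; original row order and row labels are kept
res <- df1[keep, ]
res
```

Because which.max() returns only the first maximum, a group whose counts are all tied (such as PTID 53121 in 2009) contributes exactly one row.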


