Grouping Question

R help mailing list-2
Colleagues,

Here is my dataset.

Serial Measurement Meas_test Serial_test
1 17 fail fail
1 16 pass fail
2 12 pass pass
2 8 pass pass
2 10 pass pass
3 19 fail fail
3 13 pass pass

If a measurement is less than or equal to 16, then Meas_test is pass. Else
Meas_test is fail
This is easy to code.

Serial_test is a pass, when all of the Meas_test are pass for a given
serial. Else Serial_test is a fail.
I'm at a loss to figure out how to do this in R.

Some guidance would be appreciated.

All the best,

Thomas Subia

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Grouping Question

Ivan Krylov
On Sat, 21 Mar 2020 20:01:30 -0700
Thomas Subia via R-help <[hidden email]> wrote:

> Serial_test is a pass, when all of the Meas_test are pass for a given
> serial. Else Serial_test is a fail.

Use by/tapply in base R or dplyr::group_by if you prefer tidyverse
packages.

--
Best regards,
Ivan
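
One way the base R route suggested above might look is sketched here. It assumes a data frame called d holding only the Serial and Measurement columns from the original post; the names d, ok_by_serial and row_ok are made up for illustration. tapply() computes one TRUE/FALSE per serial, which is then looked up again for each row. Note that under the rule as stated, both rows of serial 3 come out as fail, because one of its measurements is 19.

```r
# Illustrative data: the first two columns of the original post
d <- data.frame(Serial = c(1, 1, 2, 2, 2, 3, 3),
                Measurement = c(17, 16, 12, 8, 10, 19, 13))

# Row-level test: pass when the measurement is at most 16
d$Meas_test <- ifelse(d$Measurement <= 16, "pass", "fail")

# One logical per serial: TRUE only if every measurement passes
ok_by_serial <- tapply(d$Measurement <= 16, d$Serial, all)

# Look the per-serial result up for each row by name;
# as.vector() drops the names and dim left over from tapply()
row_ok <- as.vector(ok_by_serial[as.character(d$Serial)])
d$Serial_test <- ifelse(row_ok, "pass", "fail")

print(d)
```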


Re: Grouping Question

Chris Evans
Here's a very "step by step" example with dplyr, as I'm trying to teach myself the Tidyverse way of doing things.

library(dplyr)

# Serial        Measurement        Meas_test        Serial_test
# 1        17                fail                fail
# 1        16                pass                fail
# 2        12                pass                pass
# 2        8                pass                pass
# 2        10                pass                pass
# 3        19                fail                fail
# 3        13                pass                pass

dat <- as.data.frame(list(Serial = c(1,1,2,2,2,3,3),
                          Measurement = c(17, 16, 12, 8, 10, 19, 13),
                          Meas_test = c("fail", "pass", "pass", "pass", "pass", "fail", "pass")))

dat %>%
  group_by(Serial) %>%
  summarise(Serial_test = sum(Meas_test == "fail")) %>%
  mutate(Serial_test = if_else(Serial_test > 0, 1, 0),
         Serial_test = factor(Serial_test,
                              levels = 0:1,
                              labels = c("pass", "fail"))) -> groupedDat

dat %>%
  left_join(groupedDat, by = "Serial") # append "-> dat" to store the result back in dat

Gives:

  Serial Measurement Meas_test Serial_test
1      1          17      fail        fail
2      1          16      pass        fail
3      2          12      pass        pass
4      2           8      pass        pass
5      2          10      pass        pass
6      3          19      fail        fail
7      3          13      pass        fail
           
It would be easier for us if you used dput() to share your data, but thanks for the minimal example!
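
For readers unfamiliar with dput(): it writes an R object as plain R code that can be pasted into a message and evaluated on the other side. A minimal sketch follows, using a made-up data frame d with the first two columns of the posted data, and a temporary file standing in for the email body.

```r
d <- data.frame(Serial = c(1, 1, 2, 2, 2, 3, 3),
                Measurement = c(17, 16, 12, 8, 10, 19, 13))

# Print a text representation that can be pasted into a message
dput(d)

# Round trip: write the representation to a file and read it back
tf <- tempfile(fileext = ".R")
dput(d, file = tf)
d_copy <- dget(tf)

# The recipient ends up with the same columns and values
print(all.equal(d, d_copy))
```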

Chris

--
Chris Evans <[hidden email]> Visiting Professor, University of Sheffield <[hidden email]>
I do some consultation work for the University of Roehampton <[hidden email]> and other places
but <[hidden email]> remains my main Email address.  I have a work web site at:
   https://www.psyctc.org/psyctc/
and a site I manage for CORE and CORE system trust at:
   http://www.coresystemtrust.org.uk/
I have "semigrated" to France, see:
   https://www.psyctc.org/pelerinage2016/semigrating-to-france/ 
That page will also take you to my blog which started with earlier joys in France and Spain!

If you want to book to talk, I am trying to keep that to Thursdays and my diary is at:
   https://www.psyctc.org/pelerinage2016/ceworkdiary/
Beware: French time, generally an hour ahead of UK.


Re: [FORGED] Grouping Question

Rolf Turner
In reply to this post by R help mailing list-2

In future, please present your data using dput(); makes life much easier
for those trying to help you.  Your data are really the first two
columns of what you presented --- the last two columns are your desired
output.

Let "X" be these first two columns.  Define

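foo <- function(X) {
    # Row-level test: pass when the measurement is at most 16
    a <- with(X, Measurement <= 16)
    a <- ifelse(a, "pass", "fail")
    # One logical per serial: TRUE only if all its measurements pass
    b <- with(X, tapply(Measurement, Serial, function(x) all(x <= 16)))
    # Expand the per-serial result back to one value per row
    i <- match(X$Serial, names(b))
    b <- ifelse(b[i], "pass", "fail")
    data.frame(Meas_test = a, Serial_test = b)
}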

foo(X) gives:

  Meas_test Serial_test
1      fail        fail
2      pass        fail
3      pass        pass
4      pass        pass
5      pass        pass
6      fail        fail
7      pass        fail

If you want input and output combined, as in the way that you presented
your data, use cbind(X, foo(X)).
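
As an aside, the tapply() and match() steps in foo() could be collapsed with ave(), which applies a function within groups and returns a result aligned with the input rows. This is a sketch of an alternative, not what was posted above; X is constructed here as the first two columns of the data, and meas_ok, serial_ok and out are illustrative names.

```r
X <- data.frame(Serial = c(1, 1, 2, 2, 2, 3, 3),
                Measurement = c(17, 16, 12, 8, 10, 19, 13))

# Row-level logical: TRUE when the measurement is at most 16
meas_ok <- X$Measurement <= 16

# ave() evaluates all() within each serial and recycles the single
# result over that serial's rows, keeping the original row order
serial_ok <- ave(meas_ok, X$Serial, FUN = all)

out <- cbind(X,
             Meas_test = ifelse(meas_ok, "pass", "fail"),
             Serial_test = ifelse(serial_ok, "pass", "fail"))
print(out)
```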

cheers,

Rolf Turner

--
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276


Re: [FORGED] Grouping Question

Deepayan Sarkar
Another possible approach is to use split -> lapply -> rbind, which I
often find to be conceptually simpler:

d <- data.frame(Serial = c(1, 1, 2, 2, 2, 3, 3),
                Measurement = c(17, 16, 12, 8, 10, 19, 13))

dlist <- split(d, d$Serial)
dlist <- lapply(dlist, within,
{
    Serial_test <- if (all(Measurement <= 16)) "pass" else "fail"
    Meas_test <- ifelse(Measurement <= 16, "pass", "fail")
})
do.call(rbind, dlist)

-Deepayan


Re: [FORGED] Grouping Question

Rolf Turner

On 22/03/20 8:44 pm, Deepayan Sarkar wrote:

> Another possible approach is to use split -> lapply -> rbind, which I
> often find to be conceptually simpler:
>
> d <- data.frame(Serial = c(1, 1, 2, 2, 2, 3, 3),
>                  Measurement = c(17, 16, 12, 8, 10, 19, 13))
>
> dlist <- split(d, d$Serial)
> dlist <- lapply(dlist, within,
> {
>      Serial_test <- if (all(Measurement <= 16)) "pass" else "fail"
>      Meas_test <- ifelse(Measurement <= 16, "pass", "fail")
> })
> do.call(rbind, dlist)

Yes!!! Much sexier than my clumsy hack-and-grind approach!

cheers,

Rolf

--
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276


Re: [FORGED] Grouping Question

Peter Dalgaard-2
In reply to this post by Deepayan Sarkar
Or even split -> lapply -> unsplit, in cases where you want the results put back in the original order.

(Doesn't matter here, but it would, had it been, say, ordered 1,2,3,1,2,2,3).
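
A sketch of the unsplit variant, using the interleaved ordering mentioned above. The pairing of measurements to serials is kept the same as in the posted data, only the row order is changed; the names d, pieces and res are illustrative.

```r
# Same serial/measurement pairs as the posted data, rows interleaved
d <- data.frame(Serial = c(1, 2, 3, 1, 2, 2, 3),
                Measurement = c(17, 12, 19, 16, 8, 10, 13))

# Split by serial and add both test columns within each piece
pieces <- split(d, d$Serial)
pieces <- lapply(pieces, within, {
    Serial_test <- if (all(Measurement <= 16)) "pass" else "fail"
    Meas_test <- ifelse(Measurement <= 16, "pass", "fail")
})

# unsplit() reverses split(): rows return to their original positions,
# whereas do.call(rbind, pieces) would leave them grouped by serial
res <- unsplit(pieces, d$Serial)
print(res)
```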

-pd

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email]  Priv: [hidden email]
