Help with sub-setting


Help with sub-setting

Burgess, Jamie
Dear all,

I hope this message finds you well. I am currently trying to subset my data by two variables; so far, I have tried two different ways to stratify participants into groups. I would like to use the 'summary' and 'table' functions to characterise the data of participants based on the presence of two variables, and to summarise this subset against a third variable.
I have used this method:

dgb001<-subset(data,data$variable==1 & data,data$variable)


However, I get the following error: "Error: cannot allocate vector of size 16.0 Gb". Is there another method I can try?


Kind regards,


Jamie Burgess

PhD Student Endocrinology and Diabetes

University of Liverpool

Aintree University Hospital &

The Walton Centre

Institute of Ageing & Chronic Disease

0151 529 5936




______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [External] Help with sub-setting

Richard M. Heiberger
I think the syntax you are looking for is

datasubset <- data[data$A == 1 & data$B == 1, ]

This gives the subset of your original data for variable A with value
1 and variable B with value 1.
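
As a small illustration of that indexing, here is a sketch on a made-up data frame. The name toy and its values are invented for the example and are not taken from Jamie's data.

```r
# Hypothetical toy data; A and B stand in for the two grouping variables.
toy <- data.frame(A = c(1, 1, 2, 1),
                  B = c(1, 2, 1, 1),
                  Z = c(10.5, 11.2, 9.8, 12.1))

# The condition is an ordinary logical vector with one element per row.
keep <- toy$A == 1 & toy$B == 1

# Rows where keep is TRUE are retained; the empty slot after the comma
# means "all columns".
toy_sub <- toy[keep, ]
toy_sub
```

Building the condition as a separate vector first makes it easy to check with table(keep) before subsetting a large data set.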



Re: [External] Help with sub-setting

Bert Gunter-2
Yes. In particular:

data$variable==1 & data

makes no sense (data is a data frame). A typo perhaps? Or as Richard
indicated, consult references/tutorials to learn proper syntax for
(vectorized) predicates.
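
One possible reason the expression is expensive as well as wrong: when a logical vector is combined with a whole data frame using &, base R recycles the vector across every column and returns a logical matrix with one cell per data cell. The sketch below shows this on a made-up, all-numeric data frame. Whether this is what triggered the 16.0 Gb allocation in the original call is only a guess, and the row and column counts used for the size estimate are invented.

```r
# Hypothetical all-numeric toy data frame.
toy <- data.frame(variable = c(1, 0, 1),
                  other    = c(0, 2, 3),
                  third    = c(5, 0, 7))

# A logical vector combined with a whole data frame is recycled over all
# columns, so the result is a logical matrix, not a per-row condition.
m <- (toy$variable == 1) & toy
is.matrix(m)
dim(m)   # same dimensions as toy

# Rough memory needed for such a matrix on a large table. The counts are
# assumptions for illustration only; R stores logicals in 4 bytes each.
n_rows <- 5e5
n_cols <- 8000
n_rows * n_cols * 4 / 2^30   # size in GiB
```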

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )



Re: Help with sub-setting

Rui Barradas
In reply to this post by Burgess, Jamie
Hello,

Inline.

At 13:26 on 25/05/20, Burgess, Jamie wrote:
> Dear all,
>
> I hope this message finds you well. I am currently trying to subset my data by two variables, so far, I have tried two different ways to stratify participants into groups.

I don't understand what you mean by this. Do you want to split the data
set into sub-dataframes by 2 variables? If so, try

df_groups <- split(data, list(data$Var1, data$Var2), drop = TRUE)


This produces a list of sub-df's.
To get the group with Var1 == 1 and Var2 == 1

grp_name <- paste(1, 1, sep = '.')
df_groups[[grp_name]]
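
To see what names split() gives the groups, here is a sketch on a made-up data frame (toy and its values are invented for the example). Each name is the Var1 value and the Var2 value joined by a dot, and drop = TRUE leaves out combinations that never occur.

```r
# Hypothetical toy data; note that the combination Var1 = 2, Var2 = 2
# does not occur.
toy <- data.frame(Var1 = c(1, 1, 2, 2, 1),
                  Var2 = c(1, 2, 1, 1, 1),
                  Z    = c(3.1, 4.7, 2.2, 5.0, 3.9))

groups <- split(toy, list(toy$Var1, toy$Var2), drop = TRUE)

# One list element per combination that is present in the data.
names(groups)

# The sub-data-frame with Var1 == 1 and Var2 == 1.
groups[["1.1"]]
```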


But if you only want the sub-df with Var1 == 1 and Var2 == 1, any of the
following will do it.

data[data$Var1 == 1 & data$Var2 == 1, ]

subset(data, Var1 == 1 & Var2 == 1)
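
The two forms agree when the variables have no missing values. When they do, they differ: bracket indexing returns an all-NA row for each row where the condition is NA, while subset() drops those rows. A sketch on a made-up data frame (toy is invented for the example):

```r
# Hypothetical toy data with missing values in both grouping variables.
toy <- data.frame(Var1 = c(1, NA, 1, 2),
                  Var2 = c(1, 1, NA, 1),
                  Z    = c(0.4, 1.3, 2.6, 3.8))

# Bracket indexing: an NA in the condition yields a row filled with NA.
by_index <- toy[toy$Var1 == 1 & toy$Var2 == 1, ]

# subset() treats an NA condition as FALSE and drops the row.
by_subset <- subset(toy, Var1 == 1 & Var2 == 1)

# Wrapping the condition in which() makes bracket indexing drop them too.
by_which <- toy[which(toy$Var1 == 1 & toy$Var2 == 1), ]

nrow(by_index)
nrow(by_subset)
nrow(by_which)
```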


Hope this helps,

Rui Barradas



Re: Help with sub-setting

Jim Lemon-4
In reply to this post by Burgess, Jamie
Hi Jamie,
You seem to want some descriptive statistic applied to subsets of
your data frame "data" (maybe a more imaginative name would help).
I'll guess that your data frame contains variables X, Y and Z among
others. Further, I'll guess that you want the summaries of variable Z
subset by Y and X.

data<-data.frame(X=sample(1:2,100,TRUE),Y=sample(1:2,100,TRUE),
 Z=rnorm(100))
by(data,data[,c("X","Y")],summary)

Jim


Re: Help with sub-setting

Jim Lemon-4
oops, that should have been:

by(data$Z,data[,c("X","Y")],summary)
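
If a compact table with one statistic per group is easier to read than the printed by() output, base R's tapply() and aggregate() accept the same grouping. A sketch on simulated data of the same shape as in the earlier example; the seed is arbitrary and the name toy is made up.

```r
set.seed(1)  # arbitrary seed so the simulated data are repeatable
toy <- data.frame(X = sample(1:2, 100, TRUE),
                  Y = sample(1:2, 100, TRUE),
                  Z = rnorm(100))

# Mean of Z in each X-by-Y cell, returned as a matrix with X in the rows
# and Y in the columns.
cell_means <- tapply(toy$Z, toy[, c("X", "Y")], mean)
cell_means

# The same grouping as a data frame with one row per occupied cell.
cell_table <- aggregate(Z ~ X + Y, data = toy, FUN = mean)
cell_table
```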

Jim


Re: [External] Help with sub-setting

Burgess, Jamie
In reply to this post by Bert Gunter-2
Dear all,


Apologies for the late reply - I have just got back from my shift. I am unfortunately a little sleep deprived hehe


Hi Bert,

Thank-you for your reply


Yes, apologies - the syntax was lost in translation whilst changing the names of the groups, imported data-set file name and variables.


data<-data.frame(X=sample(1:2,100,TRUE),Y=sample(1:2,100,TRUE),
 Z=rnorm(100))

by(data$Z,data[,c("X","Y")],summary)


In your example, if one of my variables recorded integer data and the other continuous data, does "1:2" specify columns, and "100" the number of entries I would like to select?
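
For reference, in base R sample(1:2, 100, TRUE) draws 100 values from the set {1, 2} with replacement, and rnorm(100) draws 100 standard normal values. Those lines only build made-up data to demonstrate by(); they do not select columns or rows. A short sketch (the seed is arbitrary, and mydata in the final comment is a hypothetical name for a real data set):

```r
set.seed(42)  # arbitrary seed so the draws are repeatable

# 100 random draws from the values 1 and 2, with replacement.
x <- sample(1:2, 100, TRUE)
length(x)
table(x)

# 100 random draws from a standard normal distribution.
z <- rnorm(100)
length(z)

# With real data the simulation step is skipped and the existing columns
# are passed directly, for example:
# by(mydata$Z, mydata[, c("X", "Y")], summary)
```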


Dear Richard,


Thank-you for your reply


I have had previous success sub-grouping by one variable using the following:


group1<-subset(dataset1,dataset1$A==1)


I have subsequently been summarising the data using:


table(group1$variable) or summary(group1$variable)


Using your suggestion I have managed to sub-group using the following:


GroupAB<-subset(data,data$A==1 & is.na (data$B)==FALSE)




I will also try your suggestion, datasubset <- data[data$A == 1 & data$B == 1, ]; it is much appreciated. Does my entry do the same thing as yours?
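
The two conditions are not equivalent: is.na(data$B)==FALSE keeps every row where B has any recorded value, whereas data$B == 1 keeps only rows where B equals 1. A sketch on a made-up data frame (toy and its values are invented; which() is added here so that a missing B does not produce an NA row):

```r
# Hypothetical toy data; B takes several values and is missing once.
toy <- data.frame(A = c(1, 1, 1, 2, 1),
                  B = c(1, 2, NA, 1, 3))

# A is 1 and B is present, whatever its value.
group_ab <- subset(toy, toy$A == 1 & is.na(toy$B) == FALSE)

# The same condition written with the negation operator.
group_ab2 <- subset(toy, A == 1 & !is.na(B))

# A is 1 and B is exactly 1.
datasubset <- toy[which(toy$A == 1 & toy$B == 1), ]

nrow(group_ab)
nrow(datasubset)
```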


I thought the problem was to do with the size of my data-set (4.9GB) and the presence of ~500,000 entries. However, as another command worked, I am unsure what the problem was.

I am only actually interested in around one third of these which is the reason I wish to sub-group by the two variables I have selected.

I was wondering why this new script worked.



Kind regards,


Jamie


