subsamples and regressions for 100 times


subsamples and regressions for 100 times

Angela Smith


Hi R users,
I'm new to R, so my problem is probably pretty simple, but I'm stuck.

My data consist of two variables, co2 and temp, and one treatment
(L_group). The sample size differs among the treatments, so I wanted
to make the sample sizes equal among the three groups (A, B and C) of
the treatment.

For this, I used a subsampling technique. With subsampling, the data
drawn for the three groups differ each time.

So I want to run the regression (co2~temp) on 100 subsamples for each
treatment group (i.e. subsample 100 times).

This means I will have 100 regression equations. Later, I want to
compare the regression slopes among the three groups. Is there a simple
way to write a loop so that I can compare them?

Thanks in advance!



Angela

================
Here is the example:

dat<-structure(list(co2 = c(0.15, 0.148, 0.125, 0.145, 0.138, 0.23,
0.26, 0.35, 0.41, 0.45, 0.39, 0.42, 0.4, 0.43, 0.26, 0.3, 0.34,
0.141, 0.145, 0.153, 0.151, 0.128, 0.23, 0.26), temp = c(0.0119,
0.0122, 0.0089, 0.0115, 0.0101, 0.055, 0.097, 0.22, 0.339, 0.397,
0.257, 0.434, 0.318, 0.395, 0.087, 0.13, 0.154, 0.0107, 0.0112,
0.0119, 0.012, 0.0092, 0.055, 0.089), L_group = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor")), .Names = c("co2",
"temp", "L_group"), class = "data.frame", row.names = c(NA, -24L
))

head(dat)
library(sampling)

# strata.sampling -----
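strata.sampling <- function(data, group, size, method = NULL) {
  # Relies on library(sampling) having been loaded above
  if (is.null(method)) method <- "srswor"
  # strata() expects the data sorted by the stratification variable
  temp <- data[order(data[[group]]), ]
  n_per_group <- table(temp[[group]])
  if (length(size) == 1) {
    if (size < 1) {
      # size is a proportion: take that fraction of each group
      size <- round(n_per_group * size)
    } else {
      # size is a count: take the same number of rows from each group
      size <- rep(size, times = length(n_per_group))
    }
  }
  strat <- strata(temp, stratanames = group,
                  size = size, method = method)
  getdata(temp, strat)
}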

#--------------------------------------------------
sub_dat <- strata.sampling(dat, 'L_group', 4)
Lmodel_subdata1 <- lm(co2~temp, data=sub_dat)
Lmodel_subdata1$coef

sub_dat2 <- strata.sampling(dat, 'L_group', 4)
Lmodel_subdata2 <- lm(co2~temp, data=sub_dat2)
Lmodel_subdata2$coef

# and so on ..... (100 times)

# Table <- rbind(Lmodel_subdata1$coef, Lmodel_subdata2$coef, ....)


     
______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: subsamples and regressions for 100 times

Michael Dewey
Comment inline

On 17/02/2015 12:40, Angela Smith wrote:

> Hi R users,
> I'm new to R, so my problem is probably pretty simple, but I'm stuck.
>
> My data consist of two variables, co2 and temp, and one treatment
> (L_group). The sample size differs among the treatments, so I wanted
> to make the sample sizes equal among the three groups (A, B and C) of
> the treatment.
>

Not sure whether that is necessary for regression but you did not tell
us why you want to do that.
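
As a rough illustration of why equal group sizes may not be needed: the slopes can be compared directly with one model fitted to all 24 rows, using a temp-by-group interaction. This is only a sketch of one common approach, not something the original poster asked for; the data are retyped from the example in the question, and the object names (fit, fit_common) are made up for illustration.

```r
# Data retyped from the example in the original post
dat <- data.frame(
  co2 = c(0.15, 0.148, 0.125, 0.145, 0.138, 0.23, 0.26, 0.35,
          0.41, 0.45, 0.39, 0.42, 0.4, 0.43,
          0.26, 0.3, 0.34, 0.141, 0.145, 0.153, 0.151, 0.128, 0.23, 0.26),
  temp = c(0.0119, 0.0122, 0.0089, 0.0115, 0.0101, 0.055, 0.097, 0.22,
           0.339, 0.397, 0.257, 0.434, 0.318, 0.395,
           0.087, 0.13, 0.154, 0.0107, 0.0112, 0.0119, 0.012, 0.0092,
           0.055, 0.089),
  L_group = factor(rep(c("A", "B", "C"), times = c(8, 6, 10)))
)

# One model with a separate intercept and slope for each group
fit <- lm(co2 ~ temp * L_group, data = dat)

# The temp:L_groupB and temp:L_groupC rows estimate how the slopes
# of B and C differ from the slope of the reference group A
summary(fit)$coefficients

# F-test of separate slopes against a single common slope
fit_common <- lm(co2 ~ temp + L_group, data = dat)
anova(fit_common, fit)
```

Unequal group sizes are handled by the model itself here, so no rows have to be discarded.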

> For this, I used a subsampling technique. With subsampling, the data
> drawn for the three groups differ each time.
>
> So I want to run the regression (co2~temp) on 100 subsamples for each
> treatment group (i.e. subsample 100 times).
>

The usual way to do this is to store the subsamples in a list, then
write a function that fits the model and use lapply to apply it to each
subsample and store the models. You then have another list to which you
can apply the extractor function of your choice.
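
A minimal sketch of that list-plus-lapply pattern, in base R only. The helper names draw_subsample() and fit_by_group(), the seed, and the subsample size of 4 (taken from the strata.sampling call in the question) are all assumptions made for illustration; the data are retyped from the original post.

```r
# Data retyped from the example in the original post
dat <- data.frame(
  co2 = c(0.15, 0.148, 0.125, 0.145, 0.138, 0.23, 0.26, 0.35,
          0.41, 0.45, 0.39, 0.42, 0.4, 0.43,
          0.26, 0.3, 0.34, 0.141, 0.145, 0.153, 0.151, 0.128, 0.23, 0.26),
  temp = c(0.0119, 0.0122, 0.0089, 0.0115, 0.0101, 0.055, 0.097, 0.22,
           0.339, 0.397, 0.257, 0.434, 0.318, 0.395,
           0.087, 0.13, 0.154, 0.0107, 0.0112, 0.0119, 0.012, 0.0092,
           0.055, 0.089),
  L_group = factor(rep(c("A", "B", "C"), times = c(8, 6, 10)))
)

set.seed(1)  # arbitrary seed, only so the sketch is repeatable

# Hypothetical helper: draw n rows without replacement from each group
draw_subsample <- function(data, n) {
  rows_by_group <- split(seq_len(nrow(data)), data$L_group)
  rows <- unlist(lapply(rows_by_group, sample, size = n))
  data[rows, ]
}

# Step 1: store 100 subsamples in a list
subsamples <- replicate(100, draw_subsample(dat, 4), simplify = FALSE)

# Step 2: a function that fits one model per group, applied with lapply
fit_by_group <- function(d) {
  lapply(split(d, d$L_group), function(g) lm(co2 ~ temp, data = g))
}
models <- lapply(subsamples, fit_by_group)

# Step 3: apply an extractor (here the temp slope) to the list of models
slopes <- t(sapply(models, function(m) {
  sapply(m, function(fit) coef(fit)[["temp"]])
}))
head(slopes)  # one row per subsample, one column per group
```

Keeping the fitted models in a list means other extractors (residuals, summary statistics and so on) can be applied later without redrawing the subsamples.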



--
Michael
http://www.dewey.myzen.co.uk


Re: subsamples and regressions for 100 times

David Carlson
Expanding a bit on Michael's answer, you don't need the sampling package for this, just the sample.int() function to draw a random set of integers that you will use to extract rows from each of your groups. Then write a function that returns what you want (the regression slopes from each group) and use that function with the replicate() function. Your problem is a good way to illustrate the lapply(), sapply(), replicate() family of functions in R:

# Split the data into a list of data frames
datlist <- split(dat, dat$L_group)
# Write a function to draw the sample and perform the regression on each group
slopes <- function(lst) {
        # Get the minimum sample size
        minsize <- min(sapply(lst, nrow))
        # Draw sample (row numbers) of size minsize from each group
        samlist <- lapply(sapply(lst, nrow), sample.int, size=minsize)
        # Extract sample from each group
        samples <- lapply(names(lst), function(x) lst[[x]][samlist[[x]],])
        # Run the regressions for each group and extract the slopes
        results <- sapply(samples, function(x) coef(lm(co2~temp, x))[2])
        # Use the group names to label the slopes
        names(results) <- names(lst)
        return(results)
}
# You can get a single set of results with
(results <- slopes(datlist))
#         A         B         C
# 1.0128392 0.2658041 1.3423786

# To get 100 runs
many <- t(replicate(100, slopes(datlist)))
head(many)
#              A         B        C
# [1,] 1.4326103 0.2658041 1.357475
# [2,] 1.4754324 0.2658041 1.309208
# [3,] 0.9838589 0.2658041 1.408987
# [4,] 0.9993144 0.2658041 1.354297
# [5,] 1.0134187 0.2658041 1.397112
# [6,] 1.4922856 0.2658041 1.312531
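
Note that the B column never changes in the output above: the minimum group size is 6, which is the size of group B, so every B row is drawn on each run. The following is a sketch of one way to get variation in all three groups and then summarise the slopes. It assumes a fixed subsample size of 4 (the value used in the strata.sampling call in the question); slopes_n(), many4 and the seed are hypothetical names and choices, and the data are retyped from the original post.

```r
# Data retyped from the example in the original post
dat <- data.frame(
  co2 = c(0.15, 0.148, 0.125, 0.145, 0.138, 0.23, 0.26, 0.35,
          0.41, 0.45, 0.39, 0.42, 0.4, 0.43,
          0.26, 0.3, 0.34, 0.141, 0.145, 0.153, 0.151, 0.128, 0.23, 0.26),
  temp = c(0.0119, 0.0122, 0.0089, 0.0115, 0.0101, 0.055, 0.097, 0.22,
           0.339, 0.397, 0.257, 0.434, 0.318, 0.395,
           0.087, 0.13, 0.154, 0.0107, 0.0112, 0.0119, 0.012, 0.0092,
           0.055, 0.089),
  L_group = factor(rep(c("A", "B", "C"), times = c(8, 6, 10)))
)
datlist <- split(dat, dat$L_group)

# Hypothetical variant of slopes() with an explicit subsample size n
slopes_n <- function(lst, n) {
  sapply(lst, function(g) {
    rows <- sample.int(nrow(g), size = n)  # n row numbers, no replacement
    coef(lm(co2 ~ temp, data = g[rows, ]))[["temp"]]
  })
}

set.seed(42)  # arbitrary seed, only so the sketch is repeatable
many4 <- t(replicate(100, slopes_n(datlist, 4)))

# Per-group summary of the 100 slopes: mean, sd and a 95% percentile range
summ <- apply(many4, 2, function(x) {
  c(mean = mean(x), sd = sd(x), quantile(x, c(0.025, 0.975)))
})
summ
```

With only 4 rows per fit the individual slopes are noisy, so the spread in each column is probably more informative than any single run.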

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


Re: subsamples and regressions for 100 times

Angela Smith
Dear David and Michael,
Thank you so much for the code. It helped me understand how to make a loop and perform the analysis. I am really obliged for your help.
cheers,
AS