Quantcast

Automating R for Hypothesis Testing

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Automating R for Hypothesis Testing

meredith
R Users-
  I have been trying to automate a manual code that I have developed for calling in a .csv file, isolating certain rows and columns that correspond to specified months:
something to the effect
i=name.csv
N=length(i$month)
iphos1=0
iphos2=0
isphos3=0
for i=1,N
 if month=1
    iphos1=iphos+1
    iphos1(iphos1)=i

an so on to call out the months into there own arrays (unless there is a way I can wrap it into the next automation)

Next: I would like to run a simple linear regression combining each of the months 1 by 1:
for instance I want to run a regression on a combined model from months 1 and 2 and a dummy model for 1 and 2, compare them using a Chi-sq distribution, if Chi-sq is less than the Critical value, we accept and go on to test another set of models with both 1 and 2. If it rejects, then we proceed to months 2 and 3.  If we move on to the second set on months 1 and 2, and the critical value is accepted, I want to print an accept or reject and move on to months 2 and 3, until finally comparing months 12-1 at the end.
Is there a way to loop or automate this in R?

Thanks
Meredith
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: Automating R for Hypothesis Testing

Rui Barradas
Hello,

I'm not at all sure if I understand your problem. Does this describe it?


test first model for months 1 and 2
if test statistic less than critical value{
        test second model for months 1 and 2
        print results of the first and second tests? just one of them?
}
move on to months 2 and 3
etc, until months 12 and 1


Please post example data using dput(dataset).
Just copy it's output and paste it in your post.

And example code, what you're already doing.
(Possibly simplified)

Rui Barradas

meredith wrote
R Users-
  I have been trying to automate a manual code that I have developed for calling in a .csv file, isolating certain rows and columns that correspond to specified months:
something to the effect
i=name.csv
N=length(i$month)
iphos1=0
iphos2=0
isphos3=0
for i=1,N
 if month=1
    iphos1=iphos+1
    iphos1(iphos1)=i

an so on to call out the months into there own arrays (unless there is a way I can wrap it into the next automation)

Next: I would like to run a simple linear regression combining each of the months 1 by 1:
for instance I want to run a regression on a combined model from months 1 and 2 and a dummy model for 1 and 2, compare them using a Chi-sq distribution, if Chi-sq is less than the Critical value, we accept and go on to test another set of models with both 1 and 2. If it rejects, then we proceed to months 2 and 3.  If we move on to the second set on months 1 and 2, and the critical value is accepted, I want to print an accept or reject and move on to months 2 and 3, until finally comparing months 12-1 at the end.
Is there a way to loop or automate this in R?

Thanks
Meredith
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: Automating R for Hypothesis Testing

meredith
dput: milwaukeephos.csv

# Feb-march
> modelH_febmarch<-lm(llfeb_march~lffeb_march)
>modelHa_febmarch<-lm(llfeb_march~X1feb_mar+lffeb_march)
> anova(modelHa_febmarch)
> coefficients(modelH_febmarch)
(Intercept) lffeb_march
  -2.429890    1.172821
> coefficients(modelHa_febmarch)
(Intercept)   X1feb_mar lffeb_march
 -2.8957776  -0.5272793   1.3016303
> bres_fm<-matrix(c(-2.429890,0,1.172821),nrow=3)
> bunres_fm<-matrix(c(-2.8957776,-0.5272793,1.3016303),nrow=3)
>bfm<-t(bunres_fm-bres_fm)
> fmvect<-seq(1,1,length=34)
> X1a_febmar<-seq(0,0,length=9) # dummy variable step 1
> X1b_febmar<-seq(1,1,length=25) # dummy variable step 2
> X1feb_mar<-c(X1a_febmar,X1b_febmar) #dummy variable creation
# Test Stat Equation for Chisq
> fmxx<-cbind(fmvect,X1feb_mar,lffeb_march)
> tfmx<-t(fmxx)
> xcom_fm<-(tfmx %*% fmxx)
> xinv_fm<-ginv(xcom_fm)
> var_fm<-xinv_fm*0.307
> chi_fm<-bfm %*% var_fm %*% (bunres_fm-bres_fm)
> chi_fm # chisq value for recording
if less than CV move onto to slope modification
> modelH2_febmarch<-lm(llfeb_march~X3feb_march)
> modelH2a_febmarch<-lm(llfeb_march~X3feb_march+X4feb_march)
> anova(modelH2a_febmarch)
> coefficients(modelH2_febmarch) # get coefficients to make beta vectors for test
(Intercept) X3feb_march
   5.342130    1.172821
> coefficients(modelH2a_febmarch)
(Intercept) X3feb_march X4feb_march
  5.2936263   1.0353752   0.2407557
# Test Stat
> bsres_fm<-matrix(c(5.342130,1.172821,0),nrow=3)
> bsunres_fm<-matrix(c(5.2936263,1.0353752,0.2407557),nrow=3)
> bsfm<-t(bsunres_fm-bsres_fm)
> #X matrix
> fmxs<-cbind(fmvect,X3feb_march,X4feb_march)
> tfmxs<-t(fmxs)
> xcoms_fm<-(tfmxs %*% fmxs)
> xinvs_fm<-ginv(xcoms_fm)
> var_fms<-xinvs_fm*0.341
> chi_fms<-bsfm %*% var_fms %*% (bsunres_fm-bsres_fm)
> chi_fms
# Record Chisq value

Does this help?
Here lffeb_march is the combination of Feb and March log flows
and llfeb_march is the combination of Feb and March log loads
X3: lffeb_march-mean(feb_march)
X4: X1*X3

Thanks
Rui Barradas wrote
Hello,

I'm not at all sure if I understand your problem. Does this describe it?


test first model for months 1 and 2
if test statistic less than critical value{
        test second model for months 1 and 2
        print results of the first and second tests? just one of them?
}
move on to months 2 and 3
etc, until months 12 and 1


Please post example data using dput(dataset).
Just copy it's output and paste it in your post.

And example code, what you're already doing.
(Possibly simplified)

Rui Barradas

meredith wrote
R Users-
  I have been trying to automate a manual code that I have developed for calling in a .csv file, isolating certain rows and columns that correspond to specified months:
something to the effect
i=name.csv
N=length(i$month)
iphos1=0
iphos2=0
isphos3=0
for i=1,N
 if month=1
    iphos1=iphos+1
    iphos1(iphos1)=i

an so on to call out the months into there own arrays (unless there is a way I can wrap it into the next automation)

Next: I would like to run a simple linear regression combining each of the months 1 by 1:
for instance I want to run a regression on a combined model from months 1 and 2 and a dummy model for 1 and 2, compare them using a Chi-sq distribution, if Chi-sq is less than the Critical value, we accept and go on to test another set of models with both 1 and 2. If it rejects, then we proceed to months 2 and 3.  If we move on to the second set on months 1 and 2, and the critical value is accepted, I want to print an accept or reject and move on to months 2 and 3, until finally comparing months 12-1 at the end.
Is there a way to loop or automate this in R?

Thanks
Meredith
Rui Barradas wrote
Hello,

I'm not at all sure if I understand your problem. Does this describe it?


test first model for months 1 and 2
if test statistic less than critical value{
        test second model for months 1 and 2
        print results of the first and second tests? just one of them?
}
move on to months 2 and 3
etc, until months 12 and 1


Please post example data using dput(dataset).
Just copy it's output and paste it in your post.

And example code, what you're already doing.
(Possibly simplified)

Rui Barradas

meredith wrote
R Users-
  I have been trying to automate a manual code that I have developed for calling in a .csv file, isolating certain rows and columns that correspond to specified months:
something to the effect
i=name.csv
N=length(i$month)
iphos1=0
iphos2=0
isphos3=0
for i=1,N
 if month=1
    iphos1=iphos+1
    iphos1(iphos1)=i

an so on to call out the months into there own arrays (unless there is a way I can wrap it into the next automation)

Next: I would like to run a simple linear regression combining each of the months 1 by 1:
for instance I want to run a regression on a combined model from months 1 and 2 and a dummy model for 1 and 2, compare them using a Chi-sq distribution, if Chi-sq is less than the Critical value, we accept and go on to test another set of models with both 1 and 2. If it rejects, then we proceed to months 2 and 3.  If we move on to the second set on months 1 and 2, and the critical value is accepted, I want to print an accept or reject and move on to months 2 and 3, until finally comparing months 12-1 at the end.
Is there a way to loop or automate this in R?

Thanks
Meredith
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: Automating R for Hypothesis Testing

Rui Barradas
Hello,

Yes, it does help. Now we can see your data and what you're doing.
What follows is a suggestion on what you could do, not full solution.
(You forgot to say what X1 is, but I don't think it's important to understand the suggestion.)
(If I'm wrong, say something.)


milwaukeephos <- read.csv("milwaukeephos.csv", header=TRUE, stringsAsFactors=FALSE)
# list of data.frames, one per month
ls1 <- split(milwaukeephos, milwaukeephos$month)

#--------- if you want to keep the models, not needed if you don't.
#          (yoy probably don't)
modelH <- vector("list", 12)
modelHa <- vector("list", 12)
modelH2 <- vector("list", 12)
modelH2a <- vector("list", 12)
#--------- values to record, these are needed, create them beforehand.
chi_fm <- numeric(12)
chi_fms <- numeric(12)
#
seq_months <- c(1:12, 1) # wrap months around.
for(i in 1:12){
        month_this <- seq_months[i]
        month_next <- seq_months[i + 1]

        lload <- c(ls1[[month_this]]$load_kg, ls1[[month_next]]$load_kg)
        lflow <- c(ls1[[month_this]]$flow, ls1[[month_next]]$flow)
        modelH[[i]] <- lm(lload ~ lflow)
        # If you don't want to keep the models, use modelH only
        # ( without [[i]] )
        # and do the same with X1

        # rest of your code for first test goes here
        chi_fm[i] <- bfm %*% var_fm %*% (bunres_fm - bres_fm)

        # and the same for the second test
        chi_fms[i] <- ...etc...
}


Hope this helps,

Rui Barradas

meredith wrote
dput: milwaukeephos.csv

# Feb-march
> modelH_febmarch<-lm(llfeb_march~lffeb_march)
>modelHa_febmarch<-lm(llfeb_march~X1feb_mar+lffeb_march)
> anova(modelHa_febmarch)
> coefficients(modelH_febmarch)
(Intercept) lffeb_march
  -2.429890    1.172821
> coefficients(modelHa_febmarch)
(Intercept)   X1feb_mar lffeb_march
 -2.8957776  -0.5272793   1.3016303
> bres_fm<-matrix(c(-2.429890,0,1.172821),nrow=3)
> bunres_fm<-matrix(c(-2.8957776,-0.5272793,1.3016303),nrow=3)
>bfm<-t(bunres_fm-bres_fm)
> fmvect<-seq(1,1,length=34)
> X1a_febmar<-seq(0,0,length=9) # dummy variable step 1
> X1b_febmar<-seq(1,1,length=25) # dummy variable step 2
> X1feb_mar<-c(X1a_febmar,X1b_febmar) #dummy variable creation
# Test Stat Equation for Chisq
> fmxx<-cbind(fmvect,X1feb_mar,lffeb_march)
> tfmx<-t(fmxx)
> xcom_fm<-(tfmx %*% fmxx)
> xinv_fm<-ginv(xcom_fm)
> var_fm<-xinv_fm*0.307
> chi_fm<-bfm %*% var_fm %*% (bunres_fm-bres_fm)
> chi_fm # chisq value for recording
if less than CV move onto to slope modification
> modelH2_febmarch<-lm(llfeb_march~X3feb_march)
> modelH2a_febmarch<-lm(llfeb_march~X3feb_march+X4feb_march)
> anova(modelH2a_febmarch)
> coefficients(modelH2_febmarch) # get coefficients to make beta vectors for test
(Intercept) X3feb_march
   5.342130    1.172821
> coefficients(modelH2a_febmarch)
(Intercept) X3feb_march X4feb_march
  5.2936263   1.0353752   0.2407557
# Test Stat
> bsres_fm<-matrix(c(5.342130,1.172821,0),nrow=3)
> bsunres_fm<-matrix(c(5.2936263,1.0353752,0.2407557),nrow=3)
> bsfm<-t(bsunres_fm-bsres_fm)
> #X matrix
> fmxs<-cbind(fmvect,X3feb_march,X4feb_march)
> tfmxs<-t(fmxs)
> xcoms_fm<-(tfmxs %*% fmxs)
> xinvs_fm<-ginv(xcoms_fm)
> var_fms<-xinvs_fm*0.341
> chi_fms<-bsfm %*% var_fms %*% (bsunres_fm-bsres_fm)
> chi_fms
# Record Chisq value

Does this help?
Here lffeb_march is the combination of Feb and March log flows
and llfeb_march is the combination of Feb and March log loads
X3: lffeb_march-mean(feb_march)
X4: X1*X3

Thanks
Rui Barradas wrote
Hello,

I'm not at all sure if I understand your problem. Does this describe it?


test first model for months 1 and 2
if test statistic less than critical value{
        test second model for months 1 and 2
        print results of the first and second tests? just one of them?
}
move on to months 2 and 3
etc, until months 12 and 1


Please post example data using dput(dataset).
Just copy it's output and paste it in your post.

And example code, what you're already doing.
(Possibly simplified)

Rui Barradas

meredith wrote
R Users-
  I have been trying to automate a manual code that I have developed for calling in a .csv file, isolating certain rows and columns that correspond to specified months:
something to the effect
i=name.csv
N=length(i$month)
iphos1=0
iphos2=0
isphos3=0
for i=1,N
 if month=1
    iphos1=iphos+1
    iphos1(iphos1)=i

an so on to call out the months into there own arrays (unless there is a way I can wrap it into the next automation)

Next: I would like to run a simple linear regression combining each of the months 1 by 1:
for instance I want to run a regression on a combined model from months 1 and 2 and a dummy model for 1 and 2, compare them using a Chi-sq distribution, if Chi-sq is less than the Critical value, we accept and go on to test another set of models with both 1 and 2. If it rejects, then we proceed to months 2 and 3.  If we move on to the second set on months 1 and 2, and the critical value is accepted, I want to print an accept or reject and move on to months 2 and 3, until finally comparing months 12-1 at the end.
Is there a way to loop or automate this in R?

Thanks
Meredith
Rui Barradas wrote
Hello,

I'm not at all sure if I understand your problem. Does this describe it?


test first model for months 1 and 2
if test statistic less than critical value{
        test second model for months 1 and 2
        print results of the first and second tests? just one of them?
}
move on to months 2 and 3
etc, until months 12 and 1


Please post example data using dput(dataset).
Just copy it's output and paste it in your post.

And example code, what you're already doing.
(Possibly simplified)

Rui Barradas

meredith wrote
R Users-
  I have been trying to automate a manual code that I have developed for calling in a .csv file, isolating certain rows and columns that correspond to specified months:
something to the effect
i=name.csv
N=length(i$month)
iphos1=0
iphos2=0
isphos3=0
for i=1,N
 if month=1
    iphos1=iphos+1
    iphos1(iphos1)=i

an so on to call out the months into there own arrays (unless there is a way I can wrap it into the next automation)

Next: I would like to run a simple linear regression combining each of the months 1 by 1:
for instance I want to run a regression on a combined model from months 1 and 2 and a dummy model for 1 and 2, compare them using a Chi-sq distribution, if Chi-sq is less than the Critical value, we accept and go on to test another set of models with both 1 and 2. If it rejects, then we proceed to months 2 and 3.  If we move on to the second set on months 1 and 2, and the critical value is accepted, I want to print an accept or reject and move on to months 2 and 3, until finally comparing months 12-1 at the end.
Is there a way to loop or automate this in R?

Thanks
Meredith
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: Automating R for Hypothesis Testing

meredith
Rui-
  Thanks this definitely helps, just one quick question. How would you code the values of chi-fm and chi-fms to change based on the degrees of freedom of each model H(i)?

Meredith

Rui Barradas wrote
Hello,

Yes, it does help. Now we can see your data and what you're doing.
What follows is a suggestion on what you could do, not full solution.
(You forgot to say what X1 is, but I don't think it's important to understand the suggestion.)
(If I'm wrong, say something.)


milwaukeephos <- read.csv("milwaukeephos.csv", header=TRUE, stringsAsFactors=FALSE)
# list of data.frames, one per month
ls1 <- split(milwaukeephos, milwaukeephos$month)

#--------- if you want to keep the models, not needed if you don't.
#          (yoy probably don't)
modelH <- vector("list", 12)
modelHa <- vector("list", 12)
modelH2 <- vector("list", 12)
modelH2a <- vector("list", 12)
#--------- values to record, these are needed, create them beforehand.
chi_fm <- numeric(12)
chi_fms <- numeric(12)
#
seq_months <- c(1:12, 1) # wrap months around.
for(i in 1:12){
        month_this <- seq_months[i]
        month_next <- seq_months[i + 1]

        lload <- c(ls1[[month_this]]$load_kg, ls1[[month_next]]$load_kg)
        lflow <- c(ls1[[month_this]]$flow, ls1[[month_next]]$flow)
        modelH[[i]] <- lm(lload ~ lflow)
        # If you don't want to keep the models, use modelH only
        # ( without [[i]] )
        # and do the same with X1

        # rest of your code for first test goes here
        chi_fm[i] <- bfm %*% var_fm %*% (bunres_fm - bres_fm)

        # and the same for the second test
        chi_fms[i] <- ...etc...
}


Hope this helps,

Rui Barradas

meredith wrote
dput: milwaukeephos.csv

# Feb-march
> modelH_febmarch<-lm(llfeb_march~lffeb_march)
>modelHa_febmarch<-lm(llfeb_march~X1feb_mar+lffeb_march)
> anova(modelHa_febmarch)
> coefficients(modelH_febmarch)
(Intercept) lffeb_march
  -2.429890    1.172821
> coefficients(modelHa_febmarch)
(Intercept)   X1feb_mar lffeb_march
 -2.8957776  -0.5272793   1.3016303
> bres_fm<-matrix(c(-2.429890,0,1.172821),nrow=3)
> bunres_fm<-matrix(c(-2.8957776,-0.5272793,1.3016303),nrow=3)
>bfm<-t(bunres_fm-bres_fm)
> fmvect<-seq(1,1,length=34)
> X1a_febmar<-seq(0,0,length=9) # dummy variable step 1
> X1b_febmar<-seq(1,1,length=25) # dummy variable step 2
> X1feb_mar<-c(X1a_febmar,X1b_febmar) #dummy variable creation
# Test Stat Equation for Chisq
> fmxx<-cbind(fmvect,X1feb_mar,lffeb_march)
> tfmx<-t(fmxx)
> xcom_fm<-(tfmx %*% fmxx)
> xinv_fm<-ginv(xcom_fm)
> var_fm<-xinv_fm*0.307
> chi_fm<-bfm %*% var_fm %*% (bunres_fm-bres_fm)
> chi_fm # chisq value for recording
if less than CV move onto to slope modification
> modelH2_febmarch<-lm(llfeb_march~X3feb_march)
> modelH2a_febmarch<-lm(llfeb_march~X3feb_march+X4feb_march)
> anova(modelH2a_febmarch)
> coefficients(modelH2_febmarch) # get coefficients to make beta vectors for test
(Intercept) X3feb_march
   5.342130    1.172821
> coefficients(modelH2a_febmarch)
(Intercept) X3feb_march X4feb_march
  5.2936263   1.0353752   0.2407557
# Test Stat
> bsres_fm<-matrix(c(5.342130,1.172821,0),nrow=3)
> bsunres_fm<-matrix(c(5.2936263,1.0353752,0.2407557),nrow=3)
> bsfm<-t(bsunres_fm-bsres_fm)
> #X matrix
> fmxs<-cbind(fmvect,X3feb_march,X4feb_march)
> tfmxs<-t(fmxs)
> xcoms_fm<-(tfmxs %*% fmxs)
> xinvs_fm<-ginv(xcoms_fm)
> var_fms<-xinvs_fm*0.341
> chi_fms<-bsfm %*% var_fms %*% (bsunres_fm-bsres_fm)
> chi_fms
# Record Chisq value

Does this help?
Here lffeb_march is the combination of Feb and March log flows
and llfeb_march is the combination of Feb and March log loads
X3: lffeb_march-mean(feb_march)
X4: X1*X3

Thanks
Rui Barradas wrote
Hello,

I'm not at all sure if I understand your problem. Does this describe it?


test first model for months 1 and 2
if test statistic less than critical value{
        test second model for months 1 and 2
        print results of the first and second tests? just one of them?
}
move on to months 2 and 3
etc, until months 12 and 1


Please post example data using dput(dataset).
Just copy it's output and paste it in your post.

And example code, what you're already doing.
(Possibly simplified)

Rui Barradas

meredith wrote
R Users-
  I have been trying to automate a manual code that I have developed for calling in a .csv file, isolating certain rows and columns that correspond to specified months:
something to the effect
i=name.csv
N=length(i$month)
iphos1=0
iphos2=0
isphos3=0
for i=1,N
 if month=1
    iphos1=iphos+1
    iphos1(iphos1)=i

an so on to call out the months into there own arrays (unless there is a way I can wrap it into the next automation)

Next: I would like to run a simple linear regression combining each of the months 1 by 1:
for instance I want to run a regression on a combined model from months 1 and 2 and a dummy model for 1 and 2, compare them using a Chi-sq distribution, if Chi-sq is less than the Critical value, we accept and go on to test another set of models with both 1 and 2. If it rejects, then we proceed to months 2 and 3.  If we move on to the second set on months 1 and 2, and the critical value is accepted, I want to print an accept or reject and move on to months 2 and 3, until finally comparing months 12-1 at the end.
Is there a way to loop or automate this in R?

Thanks
Meredith
Rui Barradas wrote
Hello,

I'm not at all sure if I understand your problem. Does this describe it?


test first model for months 1 and 2
if test statistic less than critical value{
        test second model for months 1 and 2
        print results of the first and second tests? just one of them?
}
move on to months 2 and 3
etc, until months 12 and 1


Please post example data using dput(dataset).
Just copy it's output and paste it in your post.

And example code, what you're already doing.
(Possibly simplified)

Rui Barradas

meredith wrote
R Users-
  I have been trying to automate a manual code that I have developed for calling in a .csv file, isolating certain rows and columns that correspond to specified months:
something to the effect
i=name.csv
N=length(i$month)
iphos1=0
iphos2=0
isphos3=0
for i=1,N
 if month=1
    iphos1=iphos+1
    iphos1(iphos1)=i

an so on to call out the months into there own arrays (unless there is a way I can wrap it into the next automation)

Next: I would like to run a simple linear regression combining each of the months 1 by 1:
for instance I want to run a regression on a combined model from months 1 and 2 and a dummy model for 1 and 2, compare them using a Chi-sq distribution, if Chi-sq is less than the Critical value, we accept and go on to test another set of models with both 1 and 2. If it rejects, then we proceed to months 2 and 3.  If we move on to the second set on months 1 and 2, and the critical value is accepted, I want to print an accept or reject and move on to months 2 and 3, until finally comparing months 12-1 at the end.
Is there a way to loop or automate this in R?

Thanks
Meredith
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: Automating R for Hypothesis Testing

Rui Barradas
Hello,

I'm glad it helped.

As for your second question, I don't know, but I'm not very comfortable with the way you're doing things.
Why subtract the coefficients of model 1 from model 2?
And why the dummy? Why set model 1 to zero?

Isn't it better to use anova's F? After all, it's designed for it, for the linear model...
And if you really want/need the dummy, wouldn't a nested anova do it? (F statistic, once again.)

anova(model1, model2)

is simple and statistically speaking seems to me much better. (I specially don't like the subtraction bit.)

Rui Barradas
meredith wrote
Rui-
  Thanks this definitely helps, just one quick question. How would you code the values of chi-fm and chi-fms to change based on the degrees of freedom of each model H(i)?

Meredith

Rui Barradas wrote
Hello,

Yes, it does help. Now we can see your data and what you're doing.
What follows is a suggestion on what you could do, not full solution.
(You forgot to say what X1 is, but I don't think it's important to understand the suggestion.)
(If I'm wrong, say something.)


milwaukeephos <- read.csv("milwaukeephos.csv", header=TRUE, stringsAsFactors=FALSE)
# list of data.frames, one per month
ls1 <- split(milwaukeephos, milwaukeephos$month)

#--------- if you want to keep the models, not needed if you don't.
#          (yoy probably don't)
modelH <- vector("list", 12)
modelHa <- vector("list", 12)
modelH2 <- vector("list", 12)
modelH2a <- vector("list", 12)
#--------- values to record, these are needed, create them beforehand.
chi_fm <- numeric(12)
chi_fms <- numeric(12)
#
seq_months <- c(1:12, 1) # wrap months around.
for(i in 1:12){
        month_this <- seq_months[i]
        month_next <- seq_months[i + 1]

        lload <- c(ls1[[month_this]]$load_kg, ls1[[month_next]]$load_kg)
        lflow <- c(ls1[[month_this]]$flow, ls1[[month_next]]$flow)
        modelH[[i]] <- lm(lload ~ lflow)
        # If you don't want to keep the models, use modelH only
        # ( without [[i]] )
        # and do the same with X1

        # rest of your code for first test goes here
        chi_fm[i] <- bfm %*% var_fm %*% (bunres_fm - bres_fm)

        # and the same for the second test
        chi_fms[i] <- ...etc...
}


Hope this helps,

Rui Barradas

meredith wrote
dput: milwaukeephos.csv

# Feb-march
> modelH_febmarch<-lm(llfeb_march~lffeb_march)
>modelHa_febmarch<-lm(llfeb_march~X1feb_mar+lffeb_march)
> anova(modelHa_febmarch)
> coefficients(modelH_febmarch)
(Intercept) lffeb_march
  -2.429890    1.172821
> coefficients(modelHa_febmarch)
(Intercept)   X1feb_mar lffeb_march
 -2.8957776  -0.5272793   1.3016303
> bres_fm<-matrix(c(-2.429890,0,1.172821),nrow=3)
> bunres_fm<-matrix(c(-2.8957776,-0.5272793,1.3016303),nrow=3)
>bfm<-t(bunres_fm-bres_fm)
> fmvect<-seq(1,1,length=34)
> X1a_febmar<-seq(0,0,length=9) # dummy variable step 1
> X1b_febmar<-seq(1,1,length=25) # dummy variable step 2
> X1feb_mar<-c(X1a_febmar,X1b_febmar) #dummy variable creation
# Test Stat Equation for Chisq
> fmxx<-cbind(fmvect,X1feb_mar,lffeb_march)
> tfmx<-t(fmxx)
> xcom_fm<-(tfmx %*% fmxx)
> xinv_fm<-ginv(xcom_fm)
> var_fm<-xinv_fm*0.307
> chi_fm<-bfm %*% var_fm %*% (bunres_fm-bres_fm)
> chi_fm # chisq value for recording
if less than CV move onto to slope modification
> modelH2_febmarch<-lm(llfeb_march~X3feb_march)
> modelH2a_febmarch<-lm(llfeb_march~X3feb_march+X4feb_march)
> anova(modelH2a_febmarch)
> coefficients(modelH2_febmarch) # get coefficients to make beta vectors for test
(Intercept) X3feb_march
   5.342130    1.172821
> coefficients(modelH2a_febmarch)
(Intercept) X3feb_march X4feb_march
  5.2936263   1.0353752   0.2407557
# Test Stat
> bsres_fm<-matrix(c(5.342130,1.172821,0),nrow=3)
> bsunres_fm<-matrix(c(5.2936263,1.0353752,0.2407557),nrow=3)
> bsfm<-t(bsunres_fm-bsres_fm)
> #X matrix
> fmxs<-cbind(fmvect,X3feb_march,X4feb_march)
> tfmxs<-t(fmxs)
> xcoms_fm<-(tfmxs %*% fmxs)
> xinvs_fm<-ginv(xcoms_fm)
> var_fms<-xinvs_fm*0.341
> chi_fms<-bsfm %*% var_fms %*% (bsunres_fm-bsres_fm)
> chi_fms
# Record Chisq value

Does this help?
Here lffeb_march is the combination of Feb and March log flows
and llfeb_march is the combination of Feb and March log loads
X3: lffeb_march-mean(feb_march)
X4: X1*X3

Thanks
Rui Barradas wrote
Hello,

I'm not at all sure if I understand your problem. Does this describe it?


test first model for months 1 and 2
if test statistic less than critical value{
        test second model for months 1 and 2
        print results of the first and second tests? just one of them?
}
move on to months 2 and 3
etc, until months 12 and 1


Please post example data using dput(dataset).
Just copy it's output and paste it in your post.

And example code, what you're already doing.
(Possibly simplified)

Rui Barradas

meredith wrote
R Users-
  I have been trying to automate a manual code that I have developed for calling in a .csv file, isolating certain rows and columns that correspond to specified months:
something to the effect
i=name.csv
N=length(i$month)
iphos1=0
iphos2=0
isphos3=0
for i=1,N
 if month=1
    iphos1=iphos+1
    iphos1(iphos1)=i

an so on to call out the months into there own arrays (unless there is a way I can wrap it into the next automation)

Next: I would like to run a simple linear regression combining each of the months 1 by 1:
for instance I want to run a regression on a combined model from months 1 and 2 and a dummy model for 1 and 2, compare them using a Chi-sq distribution, if Chi-sq is less than the Critical value, we accept and go on to test another set of models with both 1 and 2. If it rejects, then we proceed to months 2 and 3.  If we move on to the second set on months 1 and 2, and the critical value is accepted, I want to print an accept or reject and move on to months 2 and 3, until finally comparing months 12-1 at the end.
Is there a way to loop or automate this in R?

Thanks
Meredith
Rui Barradas wrote
Hello,

I'm not at all sure if I understand your problem. Does this describe it?


test first model for months 1 and 2
if test statistic less than critical value{
        test second model for months 1 and 2
        print results of the first and second tests? just one of them?
}
move on to months 2 and 3
etc, until months 12 and 1


Please post example data using dput(dataset).
Just copy it's output and paste it in your post.

And example code, what you're already doing.
(Possibly simplified)

Rui Barradas

meredith wrote
R Users-
  I have been trying to automate a manual code that I have developed for calling in a .csv file, isolating certain rows and columns that correspond to specified months:
something to the effect
i=name.csv
N=length(i$month)
iphos1=0
iphos2=0
isphos3=0
for i=1,N
 if month=1
    iphos1=iphos+1
    iphos1(iphos1)=i

an so on to call out the months into there own arrays (unless there is a way I can wrap it into the next automation)

Next: I would like to run a simple linear regression combining each of the months 1 by 1:
for instance I want to run a regression on a combined model from months 1 and 2 and a dummy model for 1 and 2, compare them using a Chi-sq distribution, if Chi-sq is less than the Critical value, we accept and go on to test another set of models with both 1 and 2. If it rejects, then we proceed to months 2 and 3.  If we move on to the second set on months 1 and 2, and the critical value is accepted, I want to print an accept or reject and move on to months 2 and 3, until finally comparing months 12-1 at the end.
Is there a way to loop or automate this in R?

Thanks
Meredith
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: Automating R for Hypothesis Testing

meredith
Rui-
  Just a quick question. I understand your comment on using ANOVA, but doesn't this only test for similarities of the mean. We are trying to see if a the same model can fit for two or three months, therefore have the similar slope and intercept. The ANOVA would only do one part of this correct, with the F-Test?

Thanks!
Meredith
Rui Barradas wrote
Hello,

I'm glad it helped.

As for your second question, I don't know, but I'm not very comfortable with the way you're doing things.
Why subtract the coefficients of model 1 from model 2?
And why the dummy? Why set model 1 to zero?

Isn't it better to use anova's F? After all, it's designed for it, for the linear model...
And if you really want/need the dummy, wouldn't a nested anova do it? (F statistic, once again.)

anova(model1, model2)

is simple and statistically speaking seems to me much better. (I specially don't like the subtraction bit.)

Rui Barradas
meredith wrote
Rui-
  Thanks this definitely helps, just one quick question. How would you code the values of chi-fm and chi-fms to change based on the degrees of freedom of each model H(i)?

Meredith

Rui Barradas wrote
Hello,

Yes, it does help. Now we can see your data and what you're doing.
What follows is a suggestion on what you could do, not full solution.
(You forgot to say what X1 is, but I don't think it's important to understand the suggestion.)
(If I'm wrong, say something.)


milwaukeephos <- read.csv("milwaukeephos.csv", header=TRUE, stringsAsFactors=FALSE)
# list of data.frames, one per month
ls1 <- split(milwaukeephos, milwaukeephos$month)

#--------- if you want to keep the models, not needed if you don't.
#          (yoy probably don't)
modelH <- vector("list", 12)
modelHa <- vector("list", 12)
modelH2 <- vector("list", 12)
modelH2a <- vector("list", 12)
#--------- values to record, these are needed, create them beforehand.
chi_fm <- numeric(12)
chi_fms <- numeric(12)
#
seq_months <- c(1:12, 1) # wrap months around.
for(i in 1:12){
        month_this <- seq_months[i]
        month_next <- seq_months[i + 1]

        lload <- c(ls1[[month_this]]$load_kg, ls1[[month_next]]$load_kg)
        lflow <- c(ls1[[month_this]]$flow, ls1[[month_next]]$flow)
        modelH[[i]] <- lm(lload ~ lflow)
        # If you don't want to keep the models, use modelH only
        # ( without [[i]] )
        # and do the same with X1

        # rest of your code for first test goes here
        chi_fm[i] <- bfm %*% var_fm %*% (bunres_fm - bres_fm)

        # and the same for the second test
        chi_fms[i] <- ...etc...
}


Hope this helps,

Rui Barradas

meredith wrote
dput: milwaukeephos.csv

# Feb-march
> modelH_febmarch<-lm(llfeb_march~lffeb_march)
>modelHa_febmarch<-lm(llfeb_march~X1feb_mar+lffeb_march)
> anova(modelHa_febmarch)
> coefficients(modelH_febmarch)
(Intercept) lffeb_march
  -2.429890    1.172821
> coefficients(modelHa_febmarch)
(Intercept)   X1feb_mar lffeb_march
 -2.8957776  -0.5272793   1.3016303
> bres_fm<-matrix(c(-2.429890,0,1.172821),nrow=3)
> bunres_fm<-matrix(c(-2.8957776,-0.5272793,1.3016303),nrow=3)
>bfm<-t(bunres_fm-bres_fm)
> fmvect<-seq(1,1,length=34)
> X1a_febmar<-seq(0,0,length=9) # dummy variable step 1
> X1b_febmar<-seq(1,1,length=25) # dummy variable step 2
> X1feb_mar<-c(X1a_febmar,X1b_febmar) #dummy variable creation
# Test Stat Equation for Chisq
> fmxx<-cbind(fmvect,X1feb_mar,lffeb_march)
> tfmx<-t(fmxx)
> xcom_fm<-(tfmx %*% fmxx)
> xinv_fm<-ginv(xcom_fm)
> var_fm<-xinv_fm*0.307
> chi_fm<-bfm %*% var_fm %*% (bunres_fm-bres_fm)
> chi_fm # chisq value for recording
if less than CV move onto to slope modification
> modelH2_febmarch<-lm(llfeb_march~X3feb_march)
> modelH2a_febmarch<-lm(llfeb_march~X3feb_march+X4feb_march)
> anova(modelH2a_febmarch)
> coefficients(modelH2_febmarch) # get coefficients to make beta vectors for test
(Intercept) X3feb_march
   5.342130    1.172821
> coefficients(modelH2a_febmarch)
(Intercept) X3feb_march X4feb_march
  5.2936263   1.0353752   0.2407557
# Test Stat
> bsres_fm<-matrix(c(5.342130,1.172821,0),nrow=3)
> bsunres_fm<-matrix(c(5.2936263,1.0353752,0.2407557),nrow=3)
> bsfm<-t(bsunres_fm-bsres_fm)
> #X matrix
> fmxs<-cbind(fmvect,X3feb_march,X4feb_march)
> tfmxs<-t(fmxs)
> xcoms_fm<-(tfmxs %*% fmxs)
> xinvs_fm<-ginv(xcoms_fm)
> var_fms<-xinvs_fm*0.341
> chi_fms<-bsfm %*% var_fms %*% (bsunres_fm-bsres_fm)
> chi_fms
# Record Chisq value

Does this help?
Here lffeb_march is the combination of Feb and March log flows
and llfeb_march is the combination of Feb and March log loads
X3: lffeb_march-mean(feb_march)
X4: X1*X3

Thanks
Rui Barradas wrote
Hello,

I'm not at all sure if I understand your problem. Does this describe it?


test first model for months 1 and 2
if test statistic less than critical value{
        test second model for months 1 and 2
        print results of the first and second tests? just one of them?
}
move on to months 2 and 3
etc, until months 12 and 1


Please post example data using dput(dataset).
Just copy it's output and paste it in your post.

And example code, what you're already doing.
(Possibly simplified)

Rui Barradas

meredith wrote
R Users-
  I have been trying to automate a manual code that I have developed for calling in a .csv file, isolating certain rows and columns that correspond to specified months:
something to the effect
i=name.csv
N=length(i$month)
iphos1=0
iphos2=0
isphos3=0
for i=1,N
 if month=1
    iphos1=iphos+1
    iphos1(iphos1)=i

an so on to call out the months into there own arrays (unless there is a way I can wrap it into the next automation)

Next: I would like to run a simple linear regression combining each of the months 1 by 1:
for instance I want to run a regression on a combined model from months 1 and 2 and a dummy model for 1 and 2, compare them using a Chi-sq distribution, if Chi-sq is less than the Critical value, we accept and go on to test another set of models with both 1 and 2. If it rejects, then we proceed to months 2 and 3.  If we move on to the second set on months 1 and 2, and the critical value is accepted, I want to print an accept or reject and move on to months 2 and 3, until finally comparing months 12-1 at the end.
Is there a way to loop or automate this in R?

Thanks
Meredith
Rui Barradas wrote
Hello,

I'm not at all sure if I understand your problem. Does this describe it?


test first model for months 1 and 2
if test statistic less than critical value{
        test second model for months 1 and 2
        print results of the first and second tests? just one of them?
}
move on to months 2 and 3
etc, until months 12 and 1


Please post example data using dput(dataset).
Just copy it's output and paste it in your post.

And example code, what you're already doing.
(Possibly simplified)

Rui Barradas

meredith wrote
R Users-
  I have been trying to automate a manual code that I have developed for calling in a .csv file, isolating certain rows and columns that correspond to specified months:
something to the effect
i=name.csv
N=length(i$month)
iphos1=0
iphos2=0
isphos3=0
for i=1,N
 if month=1
    iphos1=iphos+1
    iphos1(iphos1)=i

an so on to call out the months into there own arrays (unless there is a way I can wrap it into the next automation)

Next: I would like to run a simple linear regression combining each of the months 1 by 1:
for instance I want to run a regression on a combined model from months 1 and 2 and a dummy model for 1 and 2, compare them using a Chi-sq distribution, if Chi-sq is less than the Critical value, we accept and go on to test another set of models with both 1 and 2. If it rejects, then we proceed to months 2 and 3.  If we move on to the second set on months 1 and 2, and the critical value is accepted, I want to print an accept or reject and move on to months 2 and 3, until finally comparing months 12-1 at the end.
Is there a way to loop or automate this in R?

Thanks
Meredith
Loading...