|
R Users-
I have been trying to automate a manual code that I have developed for calling in a .csv file, isolating certain rows and columns that correspond to specified months: something to the effect i=name.csv N=length(i$month) iphos1=0 iphos2=0 isphos3=0 for i=1,N if month=1 iphos1=iphos+1 iphos1(iphos1)=i an so on to call out the months into there own arrays (unless there is a way I can wrap it into the next automation) Next: I would like to run a simple linear regression combining each of the months 1 by 1: for instance I want to run a regression on a combined model from months 1 and 2 and a dummy model for 1 and 2, compare them using a Chi-sq distribution, if Chi-sq is less than the Critical value, we accept and go on to test another set of models with both 1 and 2. If it rejects, then we proceed to months 2 and 3. If we move on to the second set on months 1 and 2, and the critical value is accepted, I want to print an accept or reject and move on to months 2 and 3, until finally comparing months 12-1 at the end. Is there a way to loop or automate this in R? Thanks Meredith |
|
Hello,
I'm not at all sure if I understand your problem. Does this describe it? test first model for months 1 and 2 if test statistic less than critical value{ test second model for months 1 and 2 print results of the first and second tests? just one of them? } move on to months 2 and 3 etc, until months 12 and 1 Please post example data using dput(dataset). Just copy it's output and paste it in your post. And example code, what you're already doing. (Possibly simplified) Rui Barradas
|
|
dput: milwaukeephos.csv
# Feb-march > modelH_febmarch<-lm(llfeb_march~lffeb_march) >modelHa_febmarch<-lm(llfeb_march~X1feb_mar+lffeb_march) > anova(modelHa_febmarch) > coefficients(modelH_febmarch) (Intercept) lffeb_march -2.429890 1.172821 > coefficients(modelHa_febmarch) (Intercept) X1feb_mar lffeb_march -2.8957776 -0.5272793 1.3016303 > bres_fm<-matrix(c(-2.429890,0,1.172821),nrow=3) > bunres_fm<-matrix(c(-2.8957776,-0.5272793,1.3016303),nrow=3) >bfm<-t(bunres_fm-bres_fm) > fmvect<-seq(1,1,length=34) > X1a_febmar<-seq(0,0,length=9) # dummy variable step 1 > X1b_febmar<-seq(1,1,length=25) # dummy variable step 2 > X1feb_mar<-c(X1a_febmar,X1b_febmar) #dummy variable creation # Test Stat Equation for Chisq > fmxx<-cbind(fmvect,X1feb_mar,lffeb_march) > tfmx<-t(fmxx) > xcom_fm<-(tfmx %*% fmxx) > xinv_fm<-ginv(xcom_fm) > var_fm<-xinv_fm*0.307 > chi_fm<-bfm %*% var_fm %*% (bunres_fm-bres_fm) > chi_fm # chisq value for recording if less than CV move onto to slope modification > modelH2_febmarch<-lm(llfeb_march~X3feb_march) > modelH2a_febmarch<-lm(llfeb_march~X3feb_march+X4feb_march) > anova(modelH2a_febmarch) > coefficients(modelH2_febmarch) # get coefficients to make beta vectors for test (Intercept) X3feb_march 5.342130 1.172821 > coefficients(modelH2a_febmarch) (Intercept) X3feb_march X4feb_march 5.2936263 1.0353752 0.2407557 # Test Stat > bsres_fm<-matrix(c(5.342130,1.172821,0),nrow=3) > bsunres_fm<-matrix(c(5.2936263,1.0353752,0.2407557),nrow=3) > bsfm<-t(bsunres_fm-bsres_fm) > #X matrix > fmxs<-cbind(fmvect,X3feb_march,X4feb_march) > tfmxs<-t(fmxs) > xcoms_fm<-(tfmxs %*% fmxs) > xinvs_fm<-ginv(xcoms_fm) > var_fms<-xinvs_fm*0.341 > chi_fms<-bsfm %*% var_fms %*% (bsunres_fm-bsres_fm) > chi_fms # Record Chisq value Does this help? Here lffeb_march is the combination of Feb and March log flows and llfeb_march is the combination of Feb and March log loads X3: lffeb_march-mean(feb_march) X4: X1*X3 Thanks
|
|
Hello,
Yes, it does help. Now we can see your data and what you're doing. What follows is a suggestion on what you could do, not full solution. (You forgot to say what X1 is, but I don't think it's important to understand the suggestion.) (If I'm wrong, say something.) milwaukeephos <- read.csv("milwaukeephos.csv", header=TRUE, stringsAsFactors=FALSE) # list of data.frames, one per month ls1 <- split(milwaukeephos, milwaukeephos$month) #--------- if you want to keep the models, not needed if you don't. # (yoy probably don't) modelH <- vector("list", 12) modelHa <- vector("list", 12) modelH2 <- vector("list", 12) modelH2a <- vector("list", 12) #--------- values to record, these are needed, create them beforehand. chi_fm <- numeric(12) chi_fms <- numeric(12) # seq_months <- c(1:12, 1) # wrap months around. for(i in 1:12){ month_this <- seq_months[i] month_next <- seq_months[i + 1] lload <- c(ls1[[month_this]]$load_kg, ls1[[month_next]]$load_kg) lflow <- c(ls1[[month_this]]$flow, ls1[[month_next]]$flow) modelH[[i]] <- lm(lload ~ lflow) # If you don't want to keep the models, use modelH only # ( without [[i]] ) # and do the same with X1 # rest of your code for first test goes here chi_fm[i] <- bfm %*% var_fm %*% (bunres_fm - bres_fm) # and the same for the second test chi_fms[i] <- ...etc... } Hope this helps, Rui Barradas
|
|
Rui-
Thanks this definitely helps, just one quick question. How would you code the values of chi-fm and chi-fms to change based on the degrees of freedom of each model H(i)? Meredith
|
|
Hello,
I'm glad it helped. As for your second question, I don't know, but I'm not very comfortable with the way you're doing things. Why subtract the coefficients of model 1 from model 2? And why the dummy? Why set model 1 to zero? Isn't it better to use anova's F? After all, it's designed for it, for the linear model... And if you really want/need the dummy, wouldn't a nested anova do it? (F statistic, once again.) anova(model1, model2) is simple and statistically speaking seems to me much better. (I specially don't like the subtraction bit.) Rui Barradas
|
|
Rui-
Just a quick question. I understand your comment on using ANOVA, but doesn't this only test for similarities of the mean. We are trying to see if a the same model can fit for two or three months, therefore have the similar slope and intercept. The ANOVA would only do one part of this correct, with the F-Test? Thanks! Meredith
|
| Powered by Nabble | Edit this page |
