On 04-05-2012 11:00, jeff6868 <[hidden email]> wrote:
> Date: Thu, 3 May 2012 06:45:59 -0700 (PDT)
> From: jeff6868 <[hidden email]>
> To: [hidden email]
> Subject: [R] add an automatized linear regression in a function
>
> Dear R users,
>
> For the moment, I have a script and a function which calculate
> correlation matrices between all my data files. The function then
> picks the best correlation for each file and uses it to fill the
> missing data in the analysed file: the data from the best correlated
> file is put automatically into the gaps of the first file (my files
> contain missing values, NAs). If the best correlated file has no data
> at that position, it takes the data from the second best correlated
> file.
> The problem is that, for the moment, it takes the raw data from the
> best correlated file.
>
> So I need to adapt this raw data to the file that is going to be
> filled. As a consequence, I'd like to automate the calculation of a
> linear regression between the two files (after the selection of the
> best or the second best correlated data file).
> Instead of taking the raw data from the best correlated file to fill
> the first one, it should take the estimated data from the regression
> (in order to have more precise filled data).
> The idea is to do an lm() between these two files, extract the
> coefficients of the regression line, calculate the estimated data for
> the whole file (NAs included), and finally fill the gaps with this
> estimated data. I hope you've understood my problem.
>
> Here's the function:
>
> process.all <- function(df.list, mat){
>     f <- function(station)
>         na.fill(df.list[[ station ]], df.list[[ max.cor[station] ]])
>
>     g <- function(station){
>         x <- df.list[[station]]
>         if(any(is.na(x$data))){
>             mat[row(mat) == col(mat)] <- -Inf
>             nas <- which(is.na(x$data))
>             ord <- order(mat[station, ], decreasing = TRUE)[-c(1, ncol(mat))]
>             for(i in nas){
>                 for(y in ord){
>                     if(!is.na(df.list[[y]]$data[i])){
>                         x$data[i] <- df.list[[y]]$data[i]
>                         break
>                     }
>                 }
>             }
>         }
>         x
>     }
>
>     n <- length(df.list)
>     nms <- names(df.list)
>     max.cor <- sapply(seq.int(n), get.max.cor, corhiver2008capt1)
>     df.list <- lapply(seq.int(n), f)
>     df.list <- lapply(seq.int(n), g)
>     names(df.list) <- nms
>     df.list
> }
>
> I succeeded for a small data.frame I've created, but I don't know how
> to do it in this particular case.
> Thanks a lot for your help!
>

It could be:

    na.fill <- function(x, y){
        i <- is.na(x$data)            # positions of the gaps in x
        xx <- y$data                  # predictor: the best correlated file
        new <- data.frame(xx=xx)
        # regress x on y, then fill only the gaps with the fitted values
        x$data[i] <- predict(lm(x$data~xx, na.action=na.exclude), new)[i]
        x
    }

and in process.all, change function g() to:

    g <- function(station){
        x <- df.list[[station]]
        if(any(is.na(x$data))){
            mat[row(mat) == col(mat)] <- -Inf
            nas <- which(is.na(x$data))
            ord <- order(mat[station, ], decreasing = TRUE)[-c(1, ncol(mat))]
            for(y in ord){
                # use the first candidate that has data at every remaining gap
                if(all(!is.na(df.list[[y]]$data[nas]))){
                    xx <- df.list[[y]]$data
                    new <- data.frame(xx=xx)
                    x$data[nas] <- predict(lm(x$data~xx, na.action=na.exclude), new)[nas]
                    break
                }
            }
        }
        x
    }

Hope this helps,

Rui Barradas
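
For anyone who wants to try the regression-based filling on its own before plugging it into process.all, here is a minimal sketch on made-up data. The two toy data frames and the name fill.by.regression are hypothetical and only for illustration; the only thing taken from the thread is the idea of fitting lm() between the two files and using predict() to fill the gaps. Unlike the code above, it puts both series into one data frame before fitting, which is just one possible way to write it.

```r
# Hypothetical toy data: 'ref' plays the best correlated file,
# 'target' the file with gaps. The relation is exactly linear here
# so the result is easy to check by eye.
ref    <- data.frame(data = c(1, 2, 3, 4, 5, 6, 7, 8))
target <- data.frame(data = 2 * ref$data + 1)
target$data[c(3, 6)] <- NA

# Fill the NAs of x$data with predictions from a regression of x on y.
# (fill.by.regression is an illustrative name, not from the thread.)
fill.by.regression <- function(x, y){
    gaps   <- is.na(x$data)
    fit.df <- data.frame(target = x$data, ref = y$data)
    # lm() drops the incomplete rows when fitting (default na.action)
    fit <- lm(target ~ ref, data = fit.df)
    # predict() over all rows, then keep only the values at the gaps
    x$data[gaps] <- predict(fit, newdata = fit.df)[gaps]
    x
}

filled <- fill.by.regression(target, ref)
print(filled$data)
```

With this toy data the two gaps (positions 3 and 6) should come back as approximately 7 and 13, since the fitted line is data = 2 * ref + 1. If the reference series is itself NA at a gap, predict() returns NA there and the gap stays unfilled, which is the case the loop over ord in g() is meant to handle.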
