|
Dear users,
I'm a fairly new R user from France, and I have a problem building a correlation matrix. I have temperature data for each weather station in my study area and for each year (for example, one data file for weather station no. 1 for the year 2009, one for station no. 2 for the year 2010, and so on). In total I have 70 weather stations with one data file per year since 2005. Each station has 4 temperature sensors. Every data file has exactly the same structure: date and hour, sensor1, sensor2, sensor3, sensor4. Here's an example:

    time              sensor1  sensor2  sensor3  sensor4
    01/01/2008 00:00    -0.25    -2.43    -3.25    -2.37
    01/01/2008 00:15    -0.18    -2.37    -3.18    -2.25
    01/01/2008 00:30    -0.25    -2.5     -3.37    -2.56
    01/01/2008 00:45    -0.25    -2.37    -3.31    -2.37

I need a correlation matrix between the same sensor at the different stations (one correlation matrix for sensor 1 across all 70 stations, another one for sensor 2, ...). For each year and each station I have to find the best correlation. For example, which of the 70 weather stations is best correlated with station 1 for sensor 1? And with station 2? And so on for each sensor and each station.

Example: sensor 1 for the year 2009

                 Station 1   Station 2   Station 3   [...]
    Station 1    1           0.910       0.748
    Station 2    0.910       1           0.6
    Station 3    0.748       0.6         1
    [...]

And the same for the years 2005, 2006, 2007, 2008, 2009, 2010 and 2011, for each of the 4 sensors.

Do you have any idea how I can do this in R? Should I first merge all the sensors into one file, or can I work with the data in separate files (as I have at the moment)?

Thank you very much for all your answers!
|
Hello,
You don't need to merge all the files, but you do need some preprocessing. If you put all the data for one year in a 3-d array (observations x sensors x stations), you can then simply use 'cor'. I've made up some fake data, in files named "station1_2009.dat", etc. (only 6 stations), each of them with the same number of observations. With 70 stations per year you'll need an automated way to access the files; something like the function below would solve part of that problem. What follows assumes that the number of observations is the same in all files.

    # This function gives file names with the pattern above
    filenames <- function(y, n=70){
        tmp <- paste("station", seq_len(n), sep="")
        tmp <- paste(tmp, y, sep="_")
        paste(tmp, "dat", sep=".")
    }

    Sensors <- paste("sensor", 1:4, sep="")
    Stations <- paste("station", 1:6, sep="")
    nsensors <- length(Sensors)
    nstations <- length(Stations)

    year <- 2009
    fnames <- filenames(year, nstations)

    # If nobs is the same in all files, any one will do.
    nobs <- nrow(read.table(fnames[1], header=TRUE))

    # 3-d array: observations x sensors x stations
    yr2009 <- array(NA, dim=c(nobs, nsensors, nstations))
    for(i in seq_len(nstations)){
        tmp <- read.table(fnames[i], header=TRUE)
        yr2009[ , , i] <- as.matrix(tmp[, Sensors])
    }
    dimnames(yr2009) <- list(seq.int(nobs), Sensors, Stations)

    # correlations for sensor 1
    cor(yr2009[ , 1, ])

    # a list of correlations for the 4 sensors
    cor2009 <- lapply(Sensors, function(s) cor(yr2009[ , s, ]))
    names(cor2009) <- Sensors
    cor2009$sensor1

Don't pay much attention to the file-handling part; what matters is creating and filling the array.

Hope this helps,
Rui Barradas
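The original question also asked which station is best correlated with each station. The following is only a minimal sketch of one way to read that off a correlation matrix; `best_match` is a hypothetical helper (not part of base R or of the script above), and the data are made up so that the example runs on its own.

```r
# Fake data for one sensor: 100 observations at 4 hypothetical stations.
# station1 and station2 share a strong common signal, station3 a weak one,
# station4 is independent noise.
set.seed(1)
base <- rnorm(100)
temps <- cbind(station1 = base + rnorm(100, sd = 0.1),
               station2 = base + rnorm(100, sd = 0.1),
               station3 = base + rnorm(100, sd = 2),
               station4 = rnorm(100))
cm <- cor(temps)

# best_match() is a hypothetical helper: for each row of a correlation
# matrix it returns the other station with the largest correlation.
# It assumes every row has at least one non-missing off-diagonal value.
best_match <- function(cm) {
  diag(cm) <- NA                   # ignore each station's correlation with itself
  idx <- apply(cm, 1, which.max)   # column index of the largest value in each row
  data.frame(station = rownames(cm),
             best    = colnames(cm)[idx],
             r       = cm[cbind(seq_len(nrow(cm)), idx)],
             row.names = NULL)
}

best <- best_match(cm)
print(best)
```

With a list of matrices such as `cor2009`, something like `lapply(cor2009, best_match)` should give one such table per sensor. Note that this picks the largest signed correlation; if strong negative correlations should count too, apply it to `abs(cm)` instead.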
|
Hello Rui,
Thanks a lot for your answer. You hoped that your script would help me? My answer: it is WON-DER-FUL! It works very well! At first I had some difficulty adapting it to my data, but I succeeded afterwards once I ran a test with 2 stations. It's not perfect yet (I still have to modify my data a bit because the time column isn't recognized, and I have some problems automating things based on the names of the data files from each station), but the main problem (the correlation matrix) seems to be solved thanks to you! Thanks a lot again!
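On the time column: one possible cause (an assumption, since the actual files are not shown) is that 'read.table' splits on any whitespace by default, so the space between date and hour is taken as a column separator. A minimal sketch, assuming semicolon-separated files and day/month/year dates, using the sample values from the first message:

```r
# Write a tiny example file; the ";" separator is an assumption.
tmpfile <- tempfile(fileext = ".csv")
writeLines(c("time;sensor1;sensor2;sensor3;sensor4",
             "01/01/2008 00:00;-0.25;-2.43;-3.25;-2.37",
             "01/01/2008 00:15;-0.18;-2.37;-3.18;-2.25",
             "01/01/2008 00:30;-0.25;-2.5;-3.37;-2.56",
             "01/01/2008 00:45;-0.25;-2.37;-3.31;-2.37"),
           tmpfile)

# An explicit separator keeps "01/01/2008 00:00" together as one field.
dat <- read.table(tmpfile, header = TRUE, sep = ";", stringsAsFactors = FALSE)

# Convert the text to date-times; the day/month/year order and the UTC
# time zone are assumptions about the data.
dat$time <- as.POSIXct(dat$time, format = "%d/%m/%Y %H:%M", tz = "UTC")
str(dat)
```

The time column is not needed for 'cor' itself, but having it as a proper date-time makes it possible to check that all stations cover the same time steps before the array is filled.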
|
Yesterday I improved your script a bit (mostly the automation based on station numbers). Here's the final version. Thanks again!

    # All files for 2008; the station code is the first 5 characters of the name
    filenames <- list.files(pattern="_2008_reconstruit\\.csv$")
    Sensors <- paste("capteur_", 1:4, sep="")
    Stations <- substr(filenames, 1, 5)
    nsensors <- length(Sensors)
    nstations <- length(Stations)

    # Same separator as in the loop below
    nobs <- nrow(read.table(filenames[1], header=TRUE, sep=";"))

    yr2008 <- array(NA, dim=c(nobs, nsensors, nstations))
    for(i in seq_len(nstations)){
        tmp <- read.table(filenames[i], header=TRUE, sep=";")
        yr2008[ , , i] <- as.matrix(tmp[, Sensors])
    }
    dimnames(yr2008) <- list(seq.int(nobs), Sensors, Stations)

    # Rows with a missing value at any station are dropped
    cor2008 <- lapply(Sensors, function(s) cor(yr2008[ , s, ], use="complete.obs"))
    names(cor2008) <- Sensors

    cor2008$capteur_1
    cor2008$capteur_2
    cor2008$capteur_3
    cor2008$capteur_4
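Since the same matrices are wanted for every year from 2005 to 2011, the script could be wrapped in a function and applied to each year. This is only a sketch under the thread's assumptions (file names ending in "_<year>_reconstruit.csv", semicolon-separated, same number of rows in every file of a year); `cor_for_year` and the fake station codes are hypothetical.

```r
# cor_for_year() is a hypothetical wrapper around the steps of the script.
cor_for_year <- function(year, dir = ".",
                         sensors = paste("capteur_", 1:4, sep = "")) {
  pat <- paste("_", year, "_reconstruit\\.csv$", sep = "")
  files <- list.files(dir, pattern = pat)
  stations <- substr(files, 1, 5)          # station code = first 5 characters
  tabs <- lapply(file.path(dir, files), read.table, header = TRUE, sep = ";")
  nobs <- nrow(tabs[[1]])                  # assumes equal nobs in all files
  arr <- array(NA_real_, dim = c(nobs, length(sensors), length(stations)),
               dimnames = list(NULL, sensors, stations))
  for (i in seq_along(tabs)) arr[, , i] <- as.matrix(tabs[[i]][, sensors])
  out <- lapply(sensors, function(s) cor(arr[, s, ], use = "complete.obs"))
  names(out) <- sensors
  out
}

# Demo on fake files in a temporary directory (3 stations, 2 years).
demo_dir <- tempfile("stations")
dir.create(demo_dir)
set.seed(42)
for (yr in 2008:2009) {
  for (st in c("st001", "st002", "st003")) {
    fake <- data.frame(time = seq_len(20),
                       capteur_1 = rnorm(20), capteur_2 = rnorm(20),
                       capteur_3 = rnorm(20), capteur_4 = rnorm(20))
    write.table(fake,
                file.path(demo_dir, paste(st, "_", yr, "_reconstruit.csv", sep = "")),
                sep = ";", row.names = FALSE, quote = FALSE)
  }
}

years <- 2008:2009
all_cor <- lapply(years, cor_for_year, dir = demo_dir)
names(all_cor) <- years
all_cor[["2009"]]$capteur_1
```

With the real data, `years <- 2005:2011` and `dir` pointing at the data folder would give one list of four matrices per year. If the number of rows differs between files of the same year, the array assignment will fail, so the files would need to be aligned on the time column first.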
