
how to ignore NA with "NA" or "NULL"


jeff6868
Hello dear R-users,

I have a problem in my code: I need to ignore NA values without removing them.
I'm working on a list of files. The aim is to fill the gaps in one file from another, chosen by the highest correlation (I compute the correlation coefficient between all my files and use the file that most resembles the one I want to fill).
When there are only small gaps of NAs, my function works well.
The problem arises when some files contain only NAs. In that case no correlation coefficient can be calculated (for an all-NA file, my previous function returns NA as the correlation coefficient), so the file can neither be filled nor used in any calculation.

Nevertheless, I need to keep these all-NA files in my list (and keep their dimensions). Otherwise I get dimension problems, and my function needs to be automatic for every file.

So my question is: how can I ignore (or simply do nothing with) the all-NA files whose correlation coefficients are NA?
The function for filling files (where the problem occurs) is:

na.fill <- function(x, y){        
        i <- is.na(x[1:8700,1])
        xx <- y[1:8700,1]            
        new <- data.frame(xx=xx)      
        x[1:8700,1][i] <- predict(lm(x[1:8700,1]~xx, na.action=na.exclude), new)[i]
        x        
    }

My error message is: Error in model.frame.default(formula = x[1:8700, 1] ~ xx, na.action = na.exclude,  :  : invalid type (NULL) for variable 'xx'

I tried adding this to the function:
ifelse( all(is.null(xx))==TRUE,return(NA),xx)  or
ifelse( all(is.null(xx))==TRUE,return(NULL),xx)

but it still doesn't work.
How can I write that in my function? With NA, NULL or in another way?
Thank you very much for your answers

Re: how to ignore NA with "NA" or "NULL"

Jeff Newmiller
I find that avoiding using the return() function at all makes my code easier to follow. In your case it is simply incorrect, though, since ifelse is a vector function and return is a control flow function.

Your code is not reproducible and your description isn't clear about how you are handling the return result from this function, so I can't be sure what you are really asking, but I suspect you just want flow control, so use (untested):

na.fill <- function(x, y){        
  i <- is.na(x[1:8700,1])
  xx <- y[1:8700,1]            
  new <- data.frame(xx=xx)
  if ( !all(is.na(xx)) ) {
   x[1:8700,1][i] <- predict(lm(x[1:8700,1]~xx, na.action=na.exclude),new)[i]
  }
  x
}
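A small illustration of the distinction Jeff draws, using made-up vectors (xx_ok, xx_bad and describe() are illustrative names, not from the thread): ifelse() works element by element and returns a vector, while `if` chooses one branch of code from a single TRUE/FALSE value, so the all-NA test has to be collapsed with all() first.

```r
xx_ok  <- c(1.2, NA, 3.4)   # some observed values, one gap
xx_bad <- c(NA, NA, NA)     # only NAs, like an empty station

# ifelse() is vectorized: it returns one result per element of the test
ifelse(is.na(xx_ok), 0, xx_ok)   # returns c(1.2, 0, 3.4)

# `if` needs a single TRUE/FALSE, so the test is collapsed with all()
describe <- function(v) {
  if (all(is.na(v))) "nothing to fit" else "ok to fit"
}
describe(xx_ok)    # "ok to fit"
describe(xx_bad)   # "nothing to fit"
```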
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<[hidden email]>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: how to ignore NA with "NA" or "NULL"

Rui Barradas
In reply to this post by jeff6868
Hello,

'ifelse' is vectorized; what you want is the plain 'if'.

if(all(is.na(xx))) return(NA)
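One subtlety worth checking, sketched with toy values: when xx is NULL rather than a vector of NAs (the case behind the "invalid type (NULL)" error), is.na(xx) returns a zero-length logical together with a warning, and all() of a zero-length vector is TRUE. The guard therefore also fires for NULL, but only by accident. The helper nothing_to_use() below is a hypothetical name that makes both cases explicit.

```r
# is.na(NULL) gives logical(0) (plus a warning); all(logical(0)) is TRUE
suppressWarnings(all(is.na(NULL)))   # TRUE

# Explicit guard: treat NULL input and all-NA input the same way
nothing_to_use <- function(v) is.null(v) || all(is.na(v))

nothing_to_use(NULL)        # TRUE
nothing_to_use(c(NA, NA))   # TRUE
nothing_to_use(c(NA, 2))    # FALSE
```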

Hope this helps,

Rui Barradas

In reply to jeff6868's message of 04-06-2012 09:56.


Re: how to ignore NA with "NA" or "NULL"

jeff6868
In reply to this post by Jeff Newmiller
Thanks for answering, Jeff.
Yes, sorry, my problem is not easy to explain. I'll try to give you a reproducible example (even if it won't be exactly like my files), and I'll try to explain my function and what I want to do more precisely.

For the example, imagine that df1, df2 and df3 are my files:
df1 <- data.frame(x1=c(rnorm(1:5),NA,NA,rnorm(8:10)))
df2 <- data.frame(x2=rnorm(1:10))
df3 <- data.frame(x3=c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA))
df <- list(df1,df2,df3)

I want to fill every NA gap in my files. If I have only df1 and df2 in my list, it works. If I add df3 (a file with only NAs), R doesn't know what to do.

In my function:

na.fill <- function(x, y){
        i <- is.na(x[1:10,1])
        xx <- y[1:10,1]
        new <- data.frame(xx=xx)        
        x[1:10,1][i] <- predict(lm(x[1:10,1]~xx, na.action=na.exclude), new)[i]
        x
    }

x is the file I want to fill, so i marks all the NA gaps in it.
xx is the series used to fill x (in practice, the file best correlated with x among all my files).
I then fit a linear regression between my two files, x and xx, and use the values predicted from xx to fill the gaps in x.
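The regression-fill step described above can be seen on a toy pair of series. This is a minimal sketch with made-up numbers (the column name v is arbitrary): fitting with na.action = na.exclude drops the gap rows for the fit, predict() on the full predictor returns one value per row, and only the positions flagged in i are overwritten.

```r
x <- data.frame(v = c(2.1, NA, 6.2, 8.1, NA, 12.0))  # series with gaps
y <- data.frame(v = c(1, 2, 3, 4, 5, 6))             # complete neighbour

i   <- is.na(x[, 1])    # which rows need filling
xx  <- y[, 1]           # predictor taken from the neighbour
fit <- lm(x[, 1] ~ xx, na.action = na.exclude)  # rows with an NA response are left out of the fit

# Predict from the neighbour at every row, then overwrite only the gaps
x[i, 1] <- predict(fit, data.frame(xx = xx))[i]
x
```

Observed values are left exactly as they were; only rows 2 and 5 receive fitted values.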

Before I had files containing only NAs, it worked well. Since I introduced files with no data (only NAs), I have this problem.
Depending on what I tried, I got different errors and warnings:
- Error in model.frame.default(formula = x[1:8700,1] ~xx, na.action = na.exclude,  :  : invalid type (NULL) for variable 'xx'
- 0 (non-NA) cases
- is.na() applied to non-(list or vector) of type 'NULL'

What I'm looking for is a change in na.fill that avoids these problems by leaving the all-NA files out of the calculation (maybe something like na.pass) while keeping them in the list. Ideally they would come out unchanged (for example, if the ST1 file contains only 30 NAs on input, the ST1 file should contain the same 30 NAs on output), and the calculation should still run with these files in the list even if the code does nothing with them.
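A minimal sketch of that behaviour, assuming the same one-column layout as the example above: the function hands back x untouched when there is no partner series or fewer than two rows where both series are observed (too few to fit a line). The name na_fill_safe, the rows argument and the two-observation threshold are illustrative choices, not part of the original code.

```r
na_fill_safe <- function(x, y, rows = seq_len(nrow(x))) {
  # No partner series, or too few rows where both series are observed:
  # hand back x exactly as it came in
  if (is.null(y)) return(x)
  both <- !is.na(x[rows, 1]) & !is.na(y[rows, 1])
  if (sum(both) < 2) return(x)

  i   <- is.na(x[rows, 1])   # gaps to fill
  xx  <- y[rows, 1]          # predictor from the partner series
  fit <- lm(x[rows, 1] ~ xx, na.action = na.exclude)
  # gaps where xx is also NA get an NA prediction and simply stay NA
  x[rows, 1][i] <- predict(fit, data.frame(xx = xx))[i]
  x
}

df1 <- data.frame(x1 = c(0.5, 1.1, NA, 2.0, NA, 3.2))
df2 <- data.frame(x2 = c(1, 2, 3, 4, 5, 6))
df3 <- data.frame(x3 = rep(NA_real_, 6))   # an all-NA station

filled <- lapply(list(df1, df2, df3), na_fill_safe, y = df2)
sapply(filled, nrow)   # every element keeps its 6 rows
```

Because the early exits return x rather than NA or y, lapply() yields a list of data frames of unchanged dimensions, with the all-NA one passed through as-is.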

I hope that's clearer. Thanks again for your help.

Re: how to ignore NA with "NA" or "NULL"

jeff6868
In reply to this post by Rui Barradas
Hello Rui,

Sorry, I read your post after I had already answered Jeff.

It does seem better than ifelse, thanks. But I still get some errors:
- Error in x[1:8700, 1] : incorrect number of dimensions
- In is.na(xx) : is.na() applied to non-(list or vector) of type 'NULL'

It seems that the length of my data has been changed because of these NAs.
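The "incorrect number of dimensions" error is consistent with the early return(NA): once na.fill() returns NA instead of a data frame, the next step that indexes the result with two subscripts (x[1:8700,1] in a later function) is indexing a plain logical vector. A minimal reproduction, with 10 rows standing in for 8700:

```r
x <- NA   # what na.fill() hands back after return(NA)

# Indexing it like a data frame fails, because a plain vector has no dimensions
msg <- tryCatch(x[1:10, 1], error = function(e) conditionMessage(e))
msg   # "incorrect number of dimensions"
```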

Re: how to ignore NA with "NA" or "NULL"

Rui Barradas
In reply to this post by Rui Barradas
Hello again,

The complete function would be

na.fill <- function(x, y){
     # do this immediately, may save copying
     if(all(is.na(y[1:8700,1]))) return(NA)
     i <- is.na(x[1:8700,1])
     xx <- y[1:8700,1]
     new <- data.frame(xx=xx)
     x[1:8700,1][i] <- predict(lm(x[1:8700,1]~xx, na.action=na.exclude), new)[i]
     x
}

Rui Barradas

In reply to Rui Barradas's message of 04-06-2012 16:05.


Re: how to ignore NA with "NA" or "NULL"

jeff6868
Thanks again, but the errors are still there. Could they be coming from the following function? (I combine the two functions, but I thought the problem came from the first one.) Here it is:

process.all <- function(df.list, mat){
       
        f <- function(station)
             na.fill(df.list[[ station ]], df.list[[ max.cor[station] ]])
                         
        g <- function(station){
        x <- df.list[[station]]
        if(any(is.na(x[1:8700,1]))){
            mat[row(mat) == col(mat)] <- -Inf
            nas <- which(is.na(x[1:8700,1]))
            ord <- order(mat[station, ], decreasing = TRUE)[-c(1, ncol(mat))]
            for(y in ord){                          
                if(all(!is.na(df.list[[y]][1:8700,1][nas]))){
                    xx <- df.list[[y]][1:8700,1]
                    new <- data.frame(xx=xx)
                    x[1:8700,1][nas] <- predict(lm(x[1:8700,1]~xx, na.action=na.exclude), new)[nas]
                    break
                }
            }
        }
        x
    }            
       
        n <- length(df.list)
        nms <- names(df.list)
        max.cor <- sapply(seq.int(n), get.max.cor, corhiver2008capt1)
        df.list <- lapply(seq.int(n), f)
        df.list <- lapply(seq.int(n), g)
        names(df.list) <- nms
        df.list
    }

    refill <- process.all(lst, corhiver2008capt1)
    refill <- as.data.frame(refill)

The error occurs when refill is created. It runs process.all, which also uses na.fill. Do you see any error or missing code that could cause this NA problem when I introduce all-NA files?
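One way y can end up NULL, offered as a guess since get.max.cor() is not shown in this message: if max.cor[station] is NA for a station with no usable correlations, then df.list[[ max.cor[station] ]] silently returns NULL for a list, and NULL[1:8700, 1] is NULL as well, which matches the "invalid type (NULL) for variable 'xx'" error. A toy check (the list and max.cor values are made up):

```r
lst <- list(ST001 = data.frame(x1 = 1:3),
            ST002 = data.frame(x1 = c(NA, NA, NA)))
max.cor <- c(2L, NA)          # hypothetical: no usable partner for station 2

y <- lst[[ max.cor[2] ]]      # [[ with an NA index on a list gives NULL
is.null(y)                    # TRUE
is.null(y[1:3, 1])            # TRUE: subsetting NULL gives NULL again
```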

Re: how to ignore NA with "NA" or "NULL"

Rui Barradas
Hello,

I believe the error is in function 'g'. If I'm right, follow these steps:

1. Just before the first if, include
flag <- TRUE
2. Just before for(y in ord), include
flag <- FALSE
3. Just before break, include
flag <- TRUE
4. Change the return value from simply x to
if(flag) x else NA


The code loops through the stations, ordered by correlation, until it finds a df.list element with no NAs at the positions to be filled. Nothing guarantees that such an element exists. The changes above check for that by setting a flag.
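The flag idea can be sketched in isolation. first_complete() is a hypothetical helper, not part of the thread's code: it walks through candidate series in order, sets found to TRUE only when one of them has no NA at the positions that need filling, and otherwise reports that nothing was found instead of assuming the loop succeeded.

```r
first_complete <- function(nas, candidates) {
  found <- FALSE
  for (k in seq_along(candidates)) {
    if (all(!is.na(candidates[[k]][nas]))) {
      found <- TRUE
      break                      # k now points at the first usable candidate
    }
  }
  if (found) k else NA_integer_
}

nas   <- c(2, 4)                       # positions to fill
cands <- list(c(1, NA, 3, 4),          # missing at position 2
              c(5, 6, 7, 8))           # observed at 2 and 4
first_complete(nas, cands)             # 2
first_complete(nas, cands[1])          # NA: no candidate qualifies
```

Returning the unchanged x in the not-found branch, rather than NA, would keep every list element a data frame; whether that is the right behaviour depends on what the later steps expect.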

Rui Barradas

In reply to jeff6868's message of 05-06-2012 10:54.


Re: how to ignore NA with "NA" or "NULL"

jeff6868
Hello,

I added your flags to my code, but there are still errors.
I also tried a few things:

- In na.fill, I changed
if(all(!is.na(y[1:8700,1])))  return(NA)  to if(all(!is.finite(y[1:8700,1])))  return(y)
so that this file would be left unchanged.

That removed my dimension problem. I no longer get errors in
 refill <- process.all(lst, corhiver2008capt1), only some warning messages, viewable with warnings().

Then I noticed that in refill (the object my code should fill), the files containing only NAs have been turned into NULL. So these elements have 0 rows instead of staying unchanged (35000 rows).
So when I convert refill to a data.frame, it fails because of a new dimension problem caused by these NULL elements.

But I don't understand where in my code these files are turned into NULL. I'm working on a list:
lst <- lapply(list.files(pattern="\\_2008_nettoye.csv$"), read.table,sep=";", header=TRUE, stringsAsFactors=FALSE)

As it is a list, dim(lst) is NULL, but all my data is correctly inside, and it worked before I introduced the NA files.

Could you tell me how to get my all-NA files in the output unchanged, as they were at the beginning, rather than as NULL?
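A likely explanation, assuming y is NULL for these stations (as happens when no partner station is found): is.finite(NULL) is a zero-length logical, all() of its negation is TRUE, so the modified guard runs return(y) and hands back NULL. Returning x, the data frame being filled, would leave it unchanged instead. A toy comparison; early_exit() and its return_x switch are hypothetical, and 3 rows stand in for 8700:

```r
x <- data.frame(x1 = rep(NA_real_, 3))   # an all-NA station
y <- NULL                                # no partner station found

# The modified guard fires even though y is NULL
guard_fires <- all(!is.finite(y[1:3, 1]))   # TRUE

early_exit <- function(x, y, return_x = TRUE) {
  if (all(!is.finite(y[1:3, 1]))) {
    # return(y) is what the modified na.fill() does
    if (return_x) return(x) else return(y)
  }
  x   # the regression fill would go here
}

is.null(early_exit(x, y, return_x = FALSE))   # TRUE: the station becomes NULL
identical(early_exit(x, y), x)                 # TRUE: the station is kept as-is
```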

Thanks again.


Re: how to ignore NA with "NA" or "NULL"

Jeff Newmiller
Please read the posting guide mentioned at the bottom of every message.

You might also benefit from reading http://stackoverflow.com/questions/5963269/how-to-make-a-great-reproducible-example. We would certainly benefit from not having to guess what problems you are really encountering.

Also, it seems that you refer to in-memory data as "files"... this is imprecise and confusing. Learn to use the str() function to know what kinds of objects you are referring to... in this case I believe you are referring to data frames.
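For instance, str() on a small list shows at a glance which elements are data frames and which have become NULL. The list and station names below are made up for illustration:

```r
refill_demo <- list(ST001 = data.frame(x1 = c(0.3, 1.2)),
                    ST002 = NULL,                      # list() keeps NULL elements
                    ST003 = data.frame(x1 = c(2.5, 0.7)))
str(refill_demo)                  # one line per element, NULL shown as such
sapply(refill_demo, is.null)      # ST001 FALSE, ST002 TRUE, ST003 FALSE
```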

Re: how to ignore NA with "NA" or "NULL"

jeff6868
OK Jeff, but it will be a big one. I'm working on a list of files, and my problem depends on several functions used earlier, so it's very hard for me to reduce it to a small reproduction of my error. But here is a reproducible example, with the error at the last line of the code (just copy and paste it).
You'll notice that the data.frame containing only NAs is set to NULL in refill, whereas I want it unchanged in the output (the same as the input).
The aim of the function is to fill the NAs in my data.frames. It won't fill anything in this example because there are only big NA gaps, which are my problem for the moment. But maybe now you can see where the problem is (how to get, for an all-NA data.frame, the same data.frame in the output as in the input instead of NULL).
For the example, we only test x1.
I hope my problem is clearer now :)
Thanks Jeff, Rui or everyone else!

# my data for example
DF1 <- data.frame(x1=rnorm(1:20),x2=c(31:50))
write.table(DF1,"ST001_2008.csv",sep=";")
DF2 <- data.frame(x1=c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,rnorm(1:10)),x2=c(1:20))
write.table(DF2,"ST002_2008.csv",sep=";")
DF3 <- data.frame(x1=rnorm(81:100),x2=NA)
write.table(DF3,"ST003_2008.csv",sep=";")
DF4 <- data.frame(x1=c(21:40),x2=rnorm(1:20))
write.table(DF4,"ST004_2008.csv",sep=";")

    #list my data
    filenames <- list.files(pattern="\\_2008.csv$")

    Sensors <- paste("x", 1:2,sep="")

    Stations <-substr(filenames,1,5)

    nsensors <- length(Sensors)
    nstations <- length(Stations)

    nobs <- nrow(read.table(filenames[1], header=TRUE))

    yr2008 <- array(NA, dim=c(nobs, nsensors, nstations))

    for(i in seq_len(nstations)){
    tmp <- read.table(filenames[i], header=TRUE, sep=";")
    yr2008[ , , i] <- as.matrix(tmp[, Sensors])
    }

    dimnames(yr2008) <- list(seq.int(nobs), Sensors, Stations)

    yr2008capt1hiver<-yr2008[1:10,1,]
    yr2008capt1hiver <- as.data.frame(yr2008capt1hiver)

    #correlation between my data for x1 (for the example)
    corhiver2008capt1 <- cor(yr2008capt1hiver,use="pairwise.complete.obs")

    capt1hiver <- c(1:length(yr2008capt1hiver))

    for(i in 1:length(capt1hiver))
    {
    if(sum(!is.na(yr2008capt1hiver[,capt1hiver[i]]))<(length(yr2008capt1hiver[[capt1hiver[i]]])/2))
    {
         corhiver2008capt1[i,]=NA
         corhiver2008capt1[,i]=NA
      }
    }


    lst <- lapply(list.files(pattern="\\_2008.csv$"), read.table,sep=";", header=TRUE, stringsAsFactors=FALSE)
    names(lst) <- Stations

    # search for the highest correlation for each data.frame
    get.max.cor <- function(station, mat){
     mat[row(mat) == col(mat)] <- -Inf
     m <- max(mat[station, ],na.rm=TRUE)
     if (is.finite(m)) {return(which( mat[station, ] == m ))}
     else {return(NA)}
    }

    # fill the data.frame with the data.frame which has the highest correlation coefficient
    na.fill <- function(x, y){
     if(all(!is.finite(y[1:10,1])))  return(y)
     i <- is.na(x[1:10,1])
     xx <- y[1:10,1]
     new <- data.frame(xx=xx)
     x[1:10,1][i] <- predict(lm(x[1:10,1]~xx, na.action=na.exclude),new)[i]
     x
    }

    process.all <- function(df.list, mat){

        f <- function(station)
             na.fill(df.list[[ station ]], df.list[[ max.cor[station] ]])

        g <- function(station){
        x <- df.list[[station]]
        if(any(!is.finite(x[1:10,1]))){
            mat[row(mat) == col(mat)] <- -Inf
            nas <- which(is.na(x[1:10,1]))
            ord <- order(mat[station, ], decreasing = TRUE)[-c(1, ncol(mat))]
            for(y in ord){
                if(all(!is.na(df.list[[y]][1:10,1][nas]))){
                    xx <- df.list[[y]][1:10,1]
                    new <- data.frame(xx=xx)
                    x[1:10,1][nas] <- predict(lm(x[1:10,1]~xx, na.action=na.exclude), new)[nas]
                    break
                }
            }
        }
        x
    }

        n <- length(df.list)
        nms <- names(df.list)
        max.cor <- sapply(seq.int(n), get.max.cor, corhiver2008capt1)
        df.list <- lapply(seq.int(n), f)
        df.list <- lapply(seq.int(n), g)
        names(df.list) <- nms
        df.list
    }

    refill <- process.all(lst, corhiver2008capt1)
    refill <- as.data.frame(refill)   ########## HERE IS THE PROBLEM ######
    head(refill)
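Tracing this example, as a hedged reading of the code rather than a confirmed diagnosis: the first 10 x1 values of ST002 are all NA, so the cleaning loop sets its whole row and column of corhiver2008capt1 to NA; get.max.cor() then finds no finite maximum and returns NA for that station, df.list[[NA]] gives NULL, and the modified na.fill() returns that NULL y. The first link of the chain can be checked on a made-up 3 x 3 matrix, with get.max.cor() lightly condensed from the example:

```r
get.max.cor <- function(station, mat){   # condensed copy of the function above
  mat[row(mat) == col(mat)] <- -Inf
  m <- max(mat[station, ], na.rm = TRUE)
  if (is.finite(m)) which(mat[station, ] == m) else NA
}

# Station 2 has had its row and column set to NA, as the cleaning loop does
mat <- matrix(c(1,   NA, 0.8,
                NA,  NA, NA,
                0.8, NA, 1), nrow = 3, byrow = TRUE)

max.cor <- sapply(1:3, get.max.cor, mat)
max.cor   # 3 NA 1: station 2 has no partner
```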

Re: how to ignore NA with "NA" or "NULL"

Jeff Newmiller
It's still not clear what solution you would consider a success. On the one hand you said you needed the NULLs, but you also want one big data frame.

Does

refill <- refill[ -which( sapply( refill, is.null ), arr.ind=TRUE ) ]
refill <- as.data.frame( refill )

do what you want? If you need to keep the nulls, perhaps don't overwrite the refill list?
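One caveat with the -which() idiom, shown on toy lists: when no element is NULL, which() returns integer(0), and indexing with -integer(0) selects nothing, so the whole list would be dropped. A logical index avoids that edge case; drop_nulls() is a hypothetical helper name.

```r
no_nulls   <- list(a = data.frame(v = 1:2), b = data.frame(v = 3:4))
with_nulls <- list(a = data.frame(v = 1:2), b = NULL, c = data.frame(v = 5:6))

# -integer(0) selects nothing: every element is dropped
length(no_nulls[ -which(sapply(no_nulls, is.null)) ])   # 0

# A logical index keeps exactly the non-NULL elements in both cases
drop_nulls <- function(l) l[ !vapply(l, is.null, logical(1)) ]
names(drop_nulls(no_nulls))     # "a" "b"
names(drop_nulls(with_nulls))   # "a" "c"
```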
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<[hidden email]>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.

jeff6868 <[hidden email]> wrote:

>Ok Jeff, but then it'll be a big one. I'm working on a list of files
>and my
>problem depends on different functions used previously. So it's very
>hard
>for me to summarize to reproduct my error. But here is the
>reproductible
>example with the error at the last line of the code (just copy and
>paste
>it).
>You'll notice that the data.frame with only NAs is set to NULL in
>"refill",
>and I just want to have it unchanged in output (so the same as input).
>The aim of the function is to fill the NAs of my data.frames. It'll not
>work
>in this example because there're only big NA gaps which are my problem
>for
>the moment. But maybe now you can have an idea where the problem is
>(change
>NULL for "only NA DF" in output to the same DF as in input).
>For the example, we are just testing for "x1".
>Hope you have understood my problem now :)
>Thanks Jeff, Rui or everyone else!
>
># my data for example
>DF1 <- data.frame(x1=rnorm(1:20),x2=c(31:50))
>write.table(DF1,"ST001_2008.csv",sep=";")
>DF2 <-
>data.frame(x1=c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,rnorm(1:10)),x2=c(1:20))
>write.table(DF2,"ST002_2008.csv",sep=";")
>DF3 <- data.frame(x1=rnorm(81:100),x2=NA)
>write.table(DF3,"ST003_2008.csv",sep=";")
>DF4 <- data.frame(x1=c(21:40),x2=rnorm(1:20))
>write.table(DF4,"ST004_2008.csv",sep=";")
>
>    #list my data
>    filenames <- list.files(pattern="\\_2008.csv$")
>
>    Sensors <- paste("x", 1:2,sep="")
>
>    Stations <-substr(filenames,1,5)
>
>    nsensors <- length(Sensors)
>    nstations <- length(Stations)
>
>    nobs <- nrow(read.table(filenames[1], header=TRUE))
>
>    yr2008 <- array(NA, dim=c(nobs, nsensors, nstations))
>
>    for(i in seq_len(nstations)){
>    tmp <- read.table(filenames[i], header=TRUE, sep=";")
>    yr2008[ , , i] <- as.matrix(tmp[, Sensors])
>    }
>
>    dimnames(yr2008) <- list(seq.int(nobs), Sensors, Stations)
>
>    yr2008capt1hiver<-yr2008[1:10,1,]
>    yr2008capt1hiver <- as.data.frame(yr2008capt1hiver)
>
>    #correlation between my data for x1 (for the example)
> corhiver2008capt1 <- cor(yr2008capt1hiver,use="pairwise.complete.obs")
>
>    capt1hiver <- c(1:length(yr2008capt1hiver))
>
>    for(i in 1:length(capt1hiver))
>    {
>  
>if(sum(!is.na(yr2008capt1hiver[,capt1hiver[i]]))<(length(yr2008capt1hiver[[capt1hiver[i]]])/2))
>    {
>         corhiver2008capt1[i,]=NA
>         corhiver2008capt1[,i]=NA
>      }
>    }
>
>
>    lst <- lapply(list.files(pattern="\\_2008.csv$"), read.table, sep=";",
>                  header=TRUE, stringsAsFactors=FALSE)
>    names(lst) <- Stations
>
>    # searching the highest correlation for each data.Frame
>    get.max.cor <- function(station, mat){
>     mat[row(mat) == col(mat)] <- -Inf
>     m <- max(mat[station, ],na.rm=TRUE)
>     if (is.finite(m)) {return(which( mat[station, ] == m ))}
>     else {return(NA)}
>    }
>
>    # fill the data.frame with the data.frame which has the highest
>    # correlation coefficient
>    na.fill <- function(x, y){
>        if(all(!is.finite(y[1:10,1])))  return(y)
>        i <- is.na(x[1:10,1])
>        xx <- y[1:10,1]
>        new <- data.frame(xx=xx)
>        x[1:10,1][i] <- predict(lm(x[1:10,1]~xx, na.action=na.exclude),new)[i]
>        x
>    }
>
>    process.all <- function(df.list, mat){
>
>        f <- function(station)
>            na.fill(df.list[[ station ]], df.list[[ max.cor[station] ]])
>
>        g <- function(station){
>            x <- df.list[[station]]
>            if(any(!is.finite(x[1:10,1]))){
>                mat[row(mat) == col(mat)] <- -Inf
>                nas <- which(is.na(x[1:10,1]))
>                ord <- order(mat[station, ], decreasing = TRUE)[-c(1, ncol(mat))]
>                for(y in ord){
>                    if(all(!is.na(df.list[[y]][1:10,1][nas]))){
>                        xx <- df.list[[y]][1:10,1]
>                        new <- data.frame(xx=xx)
>                        x[1:10,1][nas] <- predict(lm(x[1:10,1]~xx,
>                                              na.action=na.exclude), new)[nas]
>                        break
>                    }
>                }
>            }
>            x
>        }
>
>        n <- length(df.list)
>        nms <- names(df.list)
>        max.cor <- sapply(seq.int(n), get.max.cor, corhiver2008capt1)
>        df.list <- lapply(seq.int(n), f)
>        df.list <- lapply(seq.int(n), g)
>        names(df.list) <- nms
>        df.list
>    }
>
>    refill <- process.all(lst, corhiver2008capt1)
>    refill <- as.data.frame(refill)
>
>########## HERE IS THE PROBLEM ######
>    head(refill)
>
>--
>View this message in context:
>http://r.789695.n4.nabble.com/how-to-ignore-NA-with-NA-or-NULL-tp4632287p4632527.html
>Sent from the R help mailing list archive at Nabble.com.
>
>______________________________________________
>[hidden email] mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
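Tracing the quoted code by hand on a toy input suggests why an all-NA station comes back as NULL. The sketch below reuses get.max.cor() from the quoted code on a made-up three-station correlation matrix and follows the value that f() would hand to na.fill() as y. The names cmat, df.list, best, y.col and first.guard are illustrative only, not from the thread.

```r
# Toy correlation matrix: station 3 is all NA, so all its coefficients are NA.
cmat <- matrix(c(1,   0.9, NA,
                 0.9, 1,   NA,
                 NA,  NA,  NA), nrow = 3, byrow = TRUE)

# Same logic as get.max.cor() in the quoted code
get.max.cor <- function(station, mat){
  mat[row(mat) == col(mat)] <- -Inf
  m <- max(mat[station, ], na.rm = TRUE)
  if (is.finite(m)) which(mat[station, ] == m) else NA
}

df.list <- list(data.frame(x1 = 1:10),
                data.frame(x2 = c(1:5, NA, NA, 8:10)),
                data.frame(x3 = rep(NA, 10)))

best <- get.max.cor(3, cmat)           # only the -Inf diagonal is non-missing, so NA
y <- df.list[[best]]                   # [[ with an NA index on a list gives NULL
y.col <- y[1:10, 1]                    # subsetting NULL gives NULL again
first.guard <- all(!is.finite(y.col))  # all(logical(0)) is TRUE
# So na.fill(df.list[[3]], y) takes its first branch, return(y), i.e. NULL.
```

If this trace is right, the NULL comes from the first line of na.fill() returning y; g() then sees no values to test in a NULL and passes it through unchanged.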


Re: how to ignore NA with "NA" or "NULL"

jeff6868
Thanks again for your help, Jeff.
Sorry if I'm not being very clear. It's hard to explain in programming terms, and even harder to explain in English, as I'm French.
But I'll try again.

Well, your suggestion removes the error, but it's not the result I'm expecting. You've removed the NULL data.frames, but I need to keep them; or rather, not keep them as NULL, but turn them into something non-NULL.

I'll try to show you, with a very small made-up example, what I want the results to be.
Imagine these are my 3 input data frames (10 rows each):
ST1 <- data.frame(x1=c(1:10))
ST2 <- data.frame(x2=c(1:5,NA,NA,8:10))
ST3 <- data.frame(x3=c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA))

The aim of my code is to fill all the NAs in my data.frames, according to the correlation coefficients between them. For example, if there are NAs in ST1, they must be filled with data from the file best correlated with ST1 (either ST2 or ST3 in this example).

As ST3 has no data, no correlation coefficient can be computed for it. So the NAs in ST3 cannot be filled, and ST3 cannot be used to fill another file either; in that sense ST3 is useless. Nevertheless, I want ST3 to stay unchanged throughout my code.
At the moment, my code gives this for "refill" (with the NAs in my data.frames filled):

ST1 <- data.frame(x1=c(1:10))
ST2 <- data.frame(x2=c(1:5,6,7,8:10))
ST3 <- NULL

But what I actually want in "refill" is this:

ST1 <- data.frame(x1=c(1:10))
ST2 <- data.frame(x2=c(1:5,6,7,8:10))
ST3 <- data.frame(x3=c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA))

So I don't want the data.frames that contain only NAs to be NULL in "refill"; I want them to be identical to the input. I need this so that the data.frames have the same dimensions in the input and the output.
If they are NULL (as they are at the moment, though I don't understand why and want to change it), that data.frame has 0 rows instead of 10 rows like the other data.frames.

So I think something is wrong in my code, in the function "process.all" or "na.fill", or maybe in "lst".
We don't seem to be far from the solution, but I still haven't found it.
For information, in the functions "process.all" and "na.fill", x is the data.frame I want to fill, and y is the file that will be used to fill x (i.e. the file best correlated with x).

I really hope I've been clear enough this time.
Thank you!
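A minimal sketch of one way to get that result, assuming the first column is the one being filled: make na.fill() hand back x, not y, whenever it has nothing to work with. In the posted na.fill(), the guard `if(all(!is.finite(y[1:10,1]))) return(y)` returns y, and y appears to be NULL when get.max.cor() found no finite coefficient. The `rows` argument is a hypothetical addition replacing the hard-coded 1:10 / 1:8700, and the "at least two complete pairs" check is an assumption added so that lm() is never called on an all-NA x.

```r
# Sketch of a na.fill() that leaves x unchanged when it cannot be filled.
# 'rows' is a hypothetical parameter standing in for the hard-coded row range.
na.fill <- function(x, y, rows = seq_len(nrow(x))) {
  # No usable donor: y is NULL (no best-correlated station) or has no finite values
  if (is.null(y) || !any(is.finite(y[rows, 1]))) return(x)
  i  <- is.na(x[rows, 1])                 # positions to fill
  xx <- y[rows, 1]                        # donor values
  ok <- !is.na(x[rows, 1]) & !is.na(xx)   # complete pairs available to lm()
  # Nothing to fill, or fewer than 2 complete pairs to fit a line: keep x
  if (!any(i) || sum(ok) < 2) return(x)
  fit <- lm(x[rows, 1] ~ xx, na.action = na.exclude)
  x[rows, 1][i] <- predict(fit, newdata = data.frame(xx = xx))[i]
  x
}

# The toy data.frames from the message above
ST1 <- data.frame(x1 = 1:10)
ST2 <- data.frame(x2 = c(1:5, NA, NA, 8:10))
ST3 <- data.frame(x3 = rep(NA, 10))

na.fill(ST2, ST1)   # rows 6 and 7 filled from a regression on ST1
na.fill(ST3, NULL)  # ST3 returned unchanged, still 10 rows
```

Returning x unchanged (rather than NULL or NA) is what keeps every element of the result a 10-row data.frame.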


Re: how to ignore NA with "NA" or "NULL"

Rui Barradas
Hello,

Why don't you test an all(is.na(x)) condition? If it is TRUE, return(NA) instead of NULL.

Rui Barradas
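An all(is.na(x)) test can also sit in the caller rather than in na.fill(). A sketch, assuming the toy ST1/ST2/ST3 data and made-up max.cor values (NA where get.max.cor() found no donor): the f() helper returns the station's own data.frame when there is no donor or the station is entirely NA. Returning that unchanged x, rather than NA, keeps the data.frame structure; NA on its own is a length-one logical vector, not a 10-row data.frame.

```r
# Toy stations from the example above
df.list <- list(ST1 = data.frame(x1 = 1:10),
                ST2 = data.frame(x2 = c(1:5, NA, NA, 8:10)),
                ST3 = data.frame(x3 = rep(NA, 10)))
max.cor <- c(2, 1, NA)   # assumed get.max.cor() results; NA for ST3

# na.fill() as in the quoted code, minus its first guard line
na.fill <- function(x, y){
  i <- is.na(x[1:10, 1])
  xx <- y[1:10, 1]
  new <- data.frame(xx = xx)
  x[1:10, 1][i] <- predict(lm(x[1:10, 1] ~ xx, na.action = na.exclude), new)[i]
  x
}

# Guard at the call site: no donor, or nothing but NA -> return x unchanged
f <- function(station){
  x <- df.list[[station]]
  if (is.na(max.cor[station]) || all(is.na(x[1:10, 1]))) return(x)
  na.fill(x, df.list[[max.cor[station]]])
}

refill <- lapply(seq_along(df.list), f)
names(refill) <- names(df.list)
sapply(refill, nrow)   # every station keeps its 10 rows
```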

On 06-06-2012 16:42, jeff6868 wrote:
