Unexpected values obtained when reading in data using ncdf and ncdf4

Unexpected values obtained when reading in data using ncdf and ncdf4

Louise Mair-2
Dear R Users,

I am encountering a problem when reading nc files into R using the ncdf and ncdf4 packages. The nc files are too large to attach as an example (I could send one privately via an online drive to anyone interested in helping), but the code is basic:

library(ncdf4)

for(i in seq_len(nrow(thesenames))){
   data <- nc_open(paste0(INDIR, thesenames[i, "wholename"]), write=FALSE)
   d.vars <- names(data$var)
   # Dimension sizes of the last variable in the file (the climate variable)
   d.size <- data$var[[length(d.vars)]]$size

   # Obtaining longitude and latitude values
   d.lon <- as.vector(ncvar_get(data, varid="lon", start=c(1,1), count=c(d.size[1],d.size[2])))
   d.lat <- as.vector(ncvar_get(data, varid="lat", start=c(1,1), count=c(d.size[1],d.size[2])))

   # Obtaining climate data values, one column per time step
   df.clim <- data.frame(rn=seq_along(d.lon))
   for(y in seq_len(d.size[3])){
     df.clim[,1+y] <- as.vector(ncvar_get(data, varid=d.vars[length(d.vars)], start=c(1,1,y), count=c(d.size[1],d.size[2],1)))
     names(df.clim)[1+y] <- paste0("y", y)
   }
   tosummarise[,,i] <- as.matrix(df.clim[,-1])
   nc_close(data)   # close the file before opening the next one
}

The data are temperature or precipitation, across space and time.

For most of the >250 files I have, there are no problems, but for around 8 of these files I get strange values. The data should be within a relatively narrow range, yet I get values such as -8.246508e+07 or 7.659506e+11. The particularly strange part is that these kinds of values occur at regularly spaced intervals across the data, usually within a single time step.
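One way to pin down a pattern like this is to flag every value outside a plausible range and look at where the flagged cells fall. The following is only a minimal sketch in base R and is not part of the code above: the helper name find_outliers, the array clim and the bounds of -100 and 100 are all hypothetical, and a synthetic array stands in for the values returned by ncvar_get.

```r
# Hypothetical helper: array indices of all values outside [lower, upper].
find_outliers <- function(x, lower, upper) {
  bad <- which(x < lower | x > upper, arr.ind = TRUE)
  colnames(bad) <- c("x", "y", "t")[seq_len(ncol(bad))]
  bad
}

# Synthetic stand-in for an x-by-y-by-time array of temperatures.
set.seed(1)
clim <- array(runif(10 * 8 * 3, min = -30, max = 40), dim = c(10, 8, 3))

# Inject implausible values into every 16th cell of time step 2.
slice <- clim[, , 2]
slice[seq(1, 80, by = 16)] <- 7.659506e+11
clim[, , 2] <- slice

bad <- find_outliers(clim, lower = -100, upper = 100)  # bounds are assumed
table(bad[, "t"])   # which time steps contain flagged values

# Linear position of each flagged cell within its slice, and the spacing
# between consecutive flagged cells.
lin <- (bad[, "y"] - 1) * dim(clim)[1] + bad[, "x"]
diff(sort(lin))
```

A constant spacing confined to one time step would suggest damage to a contiguous part of the file rather than genuine extreme values, although that is an interpretation and not something the sketch can prove.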

I have the same problem (including the exact same strange values) when using ArcMap, yet the data provider assures me that the data look normal when using CDO (climate data operators) to view them, and that there are no missing values.

I realise this is very difficult to diagnose without the nc files themselves, so my questions are: (1) Has anyone encountered something like this before? (2) Is there something I am failing to specify in the code when reading in? (3) Is anyone interested in digging into this and willing to play around with the nc files if I make them available privately?

Thanks very much in advance!
Louise

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Unexpected values obtained when reading in data using ncdf and ncdf4

David W. Pierce
On Fri, Apr 22, 2016 at 1:32 AM, Louise Mair <[hidden email]> wrote:

> Dear R Users,
>
> I am encountering a problem when reading nc files into R using the ncdf
> and ncdf4 libraries. The nc files are too large to attach an example (but
> if someone is interested in helping out I could send a file privately via
> an online drive), but the code is basic:
>
[...]

Hi Louise,

I'm the author of the ncdf and ncdf4 libraries. What are the details --
what operating system are you running on, what version of R and the netcdf
library are you using?

If you make the files available to me I can take a look.
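The details asked for above can mostly be collected from within R. This is a minimal sketch using base R only; the list name env is hypothetical, and the ncdf4 entry is reported as NA when that package is not installed.

```r
# Collect operating system, R version and ncdf4 package version.
env <- list(
  os        = paste(Sys.info()[["sysname"]], Sys.info()[["release"]]),
  r_version = R.version.string,
  # packageVersion() fails if the package is absent, so guard the call.
  ncdf4     = if (requireNamespace("ncdf4", quietly = TRUE))
                as.character(utils::packageVersion("ncdf4")) else NA_character_
)
str(env)
```

The version of the underlying netCDF C library is not covered by this sketch; where the nc-config utility is installed, running nc-config --version in a shell typically reports it.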

Regards,

--Dave Pierce

--
David W. Pierce
Division of Climate, Atmospheric Science, and Physical Oceanography
Scripps Institution of Oceanography, La Jolla, California, USA
(858) 534-8276 (voice)  /  (858) 534-8561 (fax)    [hidden email]


Re: Unexpected values obtained when reading in data using ncdf and ncdf4

Roy Mendelssohn - NOAA Federal
Hi Louise:

If Dave can’t figure it out, I can take a look as well.  A couple of things I would suggest:

1. Don’t use the name “data” for the object returned by nc_open. It is not a reserved word, but it is also the name of a built-in R function, data(), and reusing it can cause confusion.

2. You compute the start and count values passed to ncvar_get, so print them out before making the calls to make certain they are valid.
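The second check can be sketched as a small helper that prints the values and tests them against the variable's dimension sizes. This is an illustration only: check_hyperslab is a hypothetical name, not an ncdf4 function, and the sizes in the example calls are made up.

```r
# Hypothetical helper: print a start/count pair and report whether it fits
# inside a variable's dimension sizes.
check_hyperslab <- function(start, count, size) {
  stopifnot(length(start) == length(size), length(count) == length(size))
  cat("size :", size, "\nstart:", start, "\ncount:", count, "\n")
  # ncvar_get() uses 1-based starts; a count of -1 means "read everything
  # from start to the end of that dimension".
  n <- ifelse(count == -1, size - start + 1, count)
  all(start >= 1, n >= 1, start + n - 1 <= size)
}

# A request for time step 3 of 12 fits; time step 13 of 12 does not.
ok  <- check_hyperslab(start = c(1, 1, 3),  count = c(360, 180, 1), size = c(360, 180, 12))
bad <- check_hyperslab(start = c(1, 1, 13), count = c(360, 180, 1), size = c(360, 180, 12))
```

In the loop from the original post, such a helper could be called as check_hyperslab(c(1,1,y), c(d.size[1],d.size[2],1), d.size) just before each ncvar_get call.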

HTH,

-Roy

> On Apr 22, 2016, at 8:08 AM, David W. Pierce <[hidden email]> wrote:
>
> [...]

**********************
"The contents of this message do not reflect any position of the U.S. Government or NOAA."
**********************
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
***Note new address and phone***
110 Shaffer Road
Santa Cruz, CA 95060
Phone: (831)-420-3666
Fax: (831) 420-3980
e-mail: [hidden email] www: http://www.pfeg.noaa.gov/

"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected"
"the arc of the moral universe is long, but it bends toward justice" -MLK Jr.


Re: Unexpected values obtained when reading in data using ncdf and ncdf4

Louise Mair-2
Hi Dave, Roy and R-users,

Many thanks for your suggestions. In later correspondence Dave suggested that I ask the data provider to run an MD5 checksum on the problem files and compare the results against MD5 checksums of my copies of the files. Having done this, I found that the checksums didn't match (which they should if the files were identical), which indicates that some corruption must have occurred during the file transfer.
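For anyone wanting to repeat the comparison from within R, a minimal sketch using tools::md5sum from base R follows. The helper name md5_matches is hypothetical, and small temporary files stand in for the nc files; in practice the expected checksum would be the string supplied by the data provider.

```r
# Hypothetical helper: TRUE when a local file's MD5 checksum equals the
# checksum reported by the data provider.
md5_matches <- function(path, expected) {
  local <- unname(tools::md5sum(path))
  identical(tolower(local), tolower(trimws(expected)))
}

# Temporary files standing in for the provider's file and a damaged copy.
original <- tempfile(fileext = ".nc")
copy     <- tempfile(fileext = ".nc")
writeBin(as.raw(1:100), original)
writeBin(as.raw(c(1:99, 255)), copy)   # one byte differs

provider_md5 <- unname(tools::md5sum(original))
same_file    <- md5_matches(original, provider_md5)
damaged_file <- md5_matches(copy, provider_md5)
```

A mismatch shows only that the two files differ, not where or why, which is consistent with the source of the corruption remaining unknown here.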

Unfortunately we haven't discovered the source of the problem, but it was very helpful to learn how to compare files and identify the problem, so thanks very much for your help!

Best wishes,
Louise




-----Original Message-----
From: Roy Mendelssohn - NOAA Federal [mailto:[hidden email]]
Sent: 22 April 2016 17:31
To: Louise Mair
Cc: [hidden email]; David W. Pierce
Subject: Re: [R] Unexpected values obtained when reading in data using ncdf and ncdf4

[...]