Read_fwf in package readr, double vs. numeric

Read_fwf in package readr, double vs. numeric

Doran, Harold
Suppose I have the following data in a fixed-width (fwf) file, 'foo.txt'. My question for the group is how to correctly read the value "1e-20" in this pseudo-data using the read_fwf() function in the readr package.

11e-201043
1712201043
1912201055
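To make the examples reproducible, the three records can be written to a file and the field boundaries checked with base R. This is a sketch: it writes to a temporary directory rather than the working directory, and the names `records` and `x2` are illustrative only.

```r
# The three sample records from foo.txt
records <- c("11e-201043",
             "1712201043",
             "1912201055")

# Write them to a temporary file (use "foo.txt" to match the thread)
myFile <- file.path(tempdir(), "foo.txt")
writeLines(records, myFile)

# Field 2 occupies positions 2-6, as in fwf_positions(c(1,2,7), c(1,6,10))
x2 <- substr(readLines(myFile), 2, 6)
print(x2)              # "1e-20" "71220" "91220"
print(as.numeric(x2))  # base R reads "1e-20" as scientific notation
```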

First, suppose I read it this way, where "D" marks the double-precision column:

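library(readr)
myFile <- "foo.txt"
# Start and end positions of the three fields
pos <- fwf_positions(c(1,2,7), c(1,6,10))
type <- c('N','D','N')
types <- paste0(type, collapse = '')
# Lower-case the codes for readr: n = col_number(), c = col_character(), d = col_double()
types <- chartr('NCD', 'ncd', types)

read_fwf(file = myFile, col_positions = pos, col_types = types)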

# A tibble: 3 x 3
     X1       X2    X3
  <dbl>    <dbl> <dbl>
1     1 1.00e-20  1043
2     1 7.12e+ 4  1043
3     1 9.12e+ 4  1055
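The `chartr()` step matters more than it may appear. In readr's compact `col_types` string, lower-case `d` means `col_double()` and `n` means `col_number()`, but upper-case `D` means `col_date()`, and `N` is not a readr code at all. The lookup table below is a hand-written summary of those documented letters; it is an illustrative character vector, not a readr object.

```r
# Hand-written summary of readr's single-letter col_types codes
readr_codes <- c(c = "col_character()", i = "col_integer()",
                 n = "col_number()",    d = "col_double()",
                 l = "col_logical()",   f = "col_factor()",
                 D = "col_date()",      T = "col_datetime()",
                 t = "col_time()")

layout     <- c("N", "D", "N")                # codes from the legacy layout file
layout_str <- paste0(layout, collapse = "")   # "NDN"
types      <- chartr("NCD", "ncd", layout_str) # "ndn"

# Look up each letter; "N" is not a readr code, so it comes back as NA
print(readr_codes[strsplit(layout_str, "")[[1]]])  # NA, col_date(), NA
print(readr_codes[strsplit(types, "")[[1]]])       # col_number(), col_double(), col_number()
```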

This seems to work and captures the value correctly. However, if I instead tell the function that *all* of my columns are numeric (just substitute this one line for the corresponding line above):

type <- c('N','N','N')

# A tibble: 3 x 3
     X1    X2    X3
  <dbl> <dbl> <dbl>
1     1     1  1043
2     1 71220  1043
3     1 91220  1055
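Why does `n` turn "1e-20" into 1? `col_number()` is backed by `parse_number()`, which is designed to pull a number out of text that may contain other characters. A plausible explanation, consistent with the versions reported later in this thread, is that older releases of `parse_number()` did not recognise the exponent and stopped after the leading "1". As far as I recall, readr's NEWS file records scientific-notation support in `parse_number()` from version 1.2.0; check NEWS to confirm. The sketch below does not use readr: it only mimics the two behaviours with base R, and `leading_digits()` is a hypothetical stand-in, not readr's implementation.

```r
x <- c("1e-20", "71220", "91220")

# Hypothetical parser that keeps only the first run of digits,
# so the exponent "e-20" is silently dropped
leading_digits <- function(s) {
  as.numeric(regmatches(s, regexpr("[0-9]+", s)))
}

print(leading_digits(x))  # 1 71220 91220, the pattern seen with code "n" above
print(as.numeric(x))      # 1e-20 71220 91220, full conversion
```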

The value is not read correctly: "1e-20" comes back as 1. Here is the practical issue. I have a legacy program that writes out the layout of the fwf file (start and end positions) and also indicates the column types. The layout file we receive always uses the numeric type (N) for every numeric column, including the column holding values such as 1e-20.

This layout file will not change, so I need to solve the problem within my read-in program. One option is to change any "N" to "D" in my R code, and that seems to work, but I'm not sure whether that is the "right" way to solve this issue.
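If upgrading is not possible, the workaround described above can be kept in one place by translating the layout codes in a small function. A minimal sketch, assuming every `N` column should be parsed as a double; `layout_to_col_types()` is a hypothetical helper name.

```r
# Hypothetical helper: translate legacy layout codes into a readr
# compact col_types string. Assumption: any "N" column may contain
# scientific notation, so "N" maps to "d" (col_double()) rather than
# "n" (col_number()); "D" also maps to "d" and "C" maps to "c".
layout_to_col_types <- function(type) {
  chartr("NCD", "dcd", paste0(type, collapse = ""))
}

layout_to_col_types(c("N", "D", "N"))  # "ddd"
layout_to_col_types(c("N", "C", "N"))  # "dcd"

# Usage with readr (not run here):
# read_fwf(file = myFile, col_positions = pos,
#          col_types = layout_to_col_types(type))
```

The trade-off is that `col_double()` is stricter than `col_number()`: `parse_number()` drops grouping marks and surrounding text such as currency symbols, whereas `col_double()` reports a value like "1,000" as a parsing problem (NA). If the legacy N columns never contain such characters, mapping N to d should be safe.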

Thanks
Harold

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Read_fwf in package readr, double vs. numeric

Sarah Goslee
Hi,

I can't reproduce your problem: with readr 1.1.1 on linux, it works as
expected. Letting read_fwf guess the types also works fine. (See
below.)

If you aren't running the current version of readr, update and retry.
If you are, then we probably need more info, at least sessionInfo().

Sarah



library(readr)
myFile <- "foo.txt"
pos <- fwf_positions(c(1,2,7), c(1,6,10))


type <- c('N','D','N')
types <- paste0(type, collapse = '')
types <- chartr('NCD', 'ncd', types)
read_fwf(file = myFile, col_positions = pos, col_types = types)

# A tibble: 3 x 3
     X1       X2    X3
  <dbl>    <dbl> <dbl>
1     1 1.00e-20  1043
2     1 7.12e+ 4  1043
3     1 9.12e+ 4  1055


type <- c('N','N','N')
types <- paste0(type, collapse = '')
types <- chartr('NCD', 'ncd', types)
read_fwf(file = myFile, col_positions = pos, col_types = types)

# A tibble: 3 x 3
     X1       X2    X3
  <dbl>    <dbl> <dbl>
1     1 1.00e-20  1043
2     1 7.12e+ 4  1043
3     1 9.12e+ 4  1055



> read_fwf(file = myFile, col_positions = pos, col_types = NULL)
Parsed with column specification:
cols(
  X1 = col_double(),
  X2 = col_double(),
  X3 = col_double()
)
# A tibble: 3 x 3
     X1       X2    X3
  <dbl>    <dbl> <dbl>
1     1 1.00e-20  1043
2     1 7.12e+ 4  1043
3     1 9.12e+ 4  1055
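With the guessed specification every column comes back as a double. For comparison, and as a possible fallback where upgrading readr is not an option, base R's `utils::read.fwf()` can read the same file. This swaps in a different reader: it takes field widths rather than start/end positions, and the conversion below assumes the fields are contiguous, as they are here.

```r
# Convert the fwf_positions() start/end values into read.fwf() widths
start <- c(1, 2, 7)
end   <- c(1, 6, 10)
stopifnot(all(start[-1] == end[-length(end)] + 1))  # no gaps between fields
widths <- end - start + 1                            # 1 5 4

# Recreate the sample file in a temporary location
myFile <- file.path(tempdir(), "foo.txt")
writeLines(c("11e-201043", "1712201043", "1912201055"), myFile)

# colClasses is passed through to read.table(); base R's numeric
# conversion accepts scientific notation such as "1e-20"
df <- utils::read.fwf(myFile, widths = widths, colClasses = "numeric")
print(df)
```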




> sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora 28 (Workstation Edition)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] readr_1.3.1    colorout_1.2-0

loaded via a namespace (and not attached):
 [1] compiler_3.5.3   assertthat_0.2.0 R6_2.4.0         cli_1.0.1
 [5] hms_0.4.2        tools_3.5.3      pillar_1.3.1     tibble_2.0.1
 [9] Rcpp_1.0.0       crayon_1.3.4     utf8_1.1.4       fansi_0.4.0
[13] pkgconfig_2.0.2  rlang_0.3.1


On Wed, Apr 24, 2019 at 10:56 AM Doran, Harold <[hidden email]> wrote:




--
Sarah Goslee (she/her)
http://www.numberwright.com


Re: Read_fwf in package readr, double vs. numeric

Doran, Harold
Thank you, Sarah. Updating to a newer version does indeed solve the problem. For completeness, the first sessionInfo() below is from the setup in which it works properly (readr 1.3.1), and the second is from the setup in which I observe the problem I described (readr 1.1.1).

> sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252  
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base    

other attached packages:
[1] readr_1.3.1

loaded via a namespace (and not attached):
 [1] compiler_3.5.3   assertthat_0.2.1 R6_2.4.0         cli_1.1.0        hms_0.4.2      
 [6] tools_3.5.3      pillar_1.3.1     tibble_2.1.1     Rcpp_1.0.1       crayon_1.3.4    
[11] utf8_1.1.4       fansi_0.4.0      pkgconfig_2.0.2  rlang_0.3.4    

> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252  
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base    

other attached packages:
[1] readr_1.1.1

loaded via a namespace (and not attached):
 [1] compiler_3.4.2   assertthat_0.2.0 R6_2.2.2         cli_1.0.0        hms_0.3          tools_3.4.2    
 [7] pillar_1.3.0     tibble_1.4.2     Rcpp_1.0.0       crayon_1.3.4     utf8_1.1.4       fansi_0.2.3    
[13] rlang_0.3.0.1    
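Since the difference came down to the installed readr version (1.3.1 works, 1.1.1 does not, per the two sessionInfo() outputs above), a script that depends on this behaviour could check the version up front. A minimal sketch; `readr_recent_enough()` is a hypothetical helper, and 1.3.1 is simply the version confirmed working in this thread, not necessarily the earliest fixed release.

```r
# Hypothetical helper: TRUE if the given readr version is at least min_version.
# The default argument is only evaluated when no version is supplied.
readr_recent_enough <- function(version = utils::packageVersion("readr"),
                                min_version = "1.3.1") {
  # as.package_version() accepts either a string or a package_version
  as.package_version(version) >= min_version
}

readr_recent_enough("1.1.1")  # FALSE
readr_recent_enough("1.3.1")  # TRUE

# In a real script (requires readr to be installed):
# if (!readr_recent_enough()) stop("readr >= 1.3.1 is needed to read 1e-20 correctly")
```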

-----Original Message-----
From: Sarah Goslee <[hidden email]>
Sent: Wednesday, April 24, 2019 11:12 AM
To: Doran, Harold <[hidden email]>
Cc: [hidden email]
Subject: Re: [R] Read_fwf in package readr, double vs. numeric


Re: Read_fwf in package readr, double vs. numeric

Sarah Goslee
And just for thoroughness, I meant that it works in readr 1.3.1, as my
sessionInfo (but not what I typed myself) said. Sorry for the typo,
but I'm glad it solved your problem nonetheless.

Sarah

On Wed, Apr 24, 2019 at 11:38 AM Doran, Harold <[hidden email]> wrote:



--
Sarah Goslee (she/her)
http://www.sarahgoslee.com
