corruption of data with serialize(ascii=TRUE)

Roger D. Peng
I noticed the following peculiarity with `serialize()' when `ascii = TRUE' is
used.  In today's (svn r37299) R-devel, I get

 > set.seed(10)
 > x <- rnorm(10)
 >
 > a <- serialize(x, con = NULL, ascii = TRUE)
 > b <- unserialize(a)
 >
 > identical(x, b)  ## FALSE
[1] FALSE
 > x - b
  [1] -3.469447e-18  2.775558e-17 -4.440892e-16  0.000000e+00  5.551115e-17
  [6] -5.551115e-17 -4.440892e-16  0.000000e+00  2.220446e-16 -5.551115e-17


I expected `x' and `b' to be identical, which is what I get when `ascii = FALSE':

 > a <- serialize(x, con = NULL, ascii = FALSE)
 > b <- unserialize(a)
 >
 > identical(x, b)  ## TRUE
[1] TRUE


The same phenomenon occurs with `.saveRDS(ascii = TRUE)',

 > .saveRDS(x, file = "asdf", ascii = TRUE)
 > d <- .readRDS("asdf")
 >
 > identical(x, d)  ## FALSE
[1] FALSE
 >

Has anyone noticed this before?  I didn't see anything in the docs for
`serialize()' that would indicate this behavior should be expected.

I'm on Linux Fedora Core 4.

-roger
--
Roger D. Peng  |  http://www.biostat.jhsph.edu/~rpeng/

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: corruption of data with serialize(ascii=TRUE)

Brian Ripley
This is known: it happens with save() too, and did in earlier save formats.
Nothing particularly clever is done (the format is "%.16g\n"), and
similarly as.character/parse are not inverses of each other.
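
A minimal sketch of why a "%.16g" text form cannot always be inverted,
assuming IEEE 754 doubles. The value 0.1 + 0.2 is an illustrative choice
for this sketch and does not come from the thread; sprintf() stands in for
the formatting step, it is not the serialization code itself.

```r
# Illustrative value chosen for this sketch (not from the thread).
y <- 0.1 + 0.2

# Write with 16 significant digits, the precision of the "%.16g" format
# quoted above, then parse the text back into a double.
s16 <- sprintf("%.16g", y)
y16 <- as.numeric(s16)

print(s16)                # "0.3": the trailing digits of y are rounded away
print(identical(y, y16))  # FALSE: the parsed value is the double nearest 0.3
print(y - y16)            # the discrepancy is on the order of 1e-17
```

The text "0.3" is a perfectly good 16-digit description of y, but it is also
the description of a neighbouring double, so reading it back can land on
that neighbour instead.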

Perhaps more relevant is

> b/x -1
  [1]  0.000000e+00 -1.110223e-16  2.220446e-16  0.000000e+00  0.000000e+00
  [6]  2.220446e-16  4.440892e-16  0.000000e+00  2.220446e-16  0.000000e+00

so the error (on my system) is about what you would expect from
floating-point computations.

There is a comment in serialize.c

     /* 16: full precision; 17 gives 999, 000 &c */

which suggests that the format is optimized for size, not for maximum
possible accuracy.
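
A small sketch of the trade-off that comment seems to describe, again using
sprintf() as a stand-in for the writer. The three values are hypothetical
examples picked for this illustration.

```r
# Hypothetical sample values for illustration.
vals <- c(0.1, 0.7, 1/3)

s16 <- sprintf("%.16g", vals)  # 16 significant digits
s17 <- sprintf("%.17g", vals)  # 17 significant digits

# Side-by-side view of the two text forms and the extra characters
# that the 17-digit form costs.
print(data.frame(s16 = s16, s17 = s17,
                 extra_chars = nchar(s17) - nchar(s16)))
# 0.1 prints as "0.1" with 16 digits but "0.10000000000000001" with 17;
# 0.7 prints as "0.7" with 16 digits but "0.69999999999999996" with 17.
```

With 17 digits, short decimal values pick up the "000...1" and "999...6"
tails, so the output grows; with 16 digits the text stays compact at the
cost of occasionally not identifying the double uniquely.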

Really all you have said is `floating point operations are subject to
rounding error'.
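
If the goal is only to check that the ASCII round trip is correct up to
rounding error, a tolerance-based comparison is the usual tool. A sketch
reusing the example from the original report (note that the second argument
of serialize() is named `connection'):

```r
set.seed(10)
x <- rnorm(10)

# Round trip through the ASCII representation.
b <- unserialize(serialize(x, connection = NULL, ascii = TRUE))

# Exact comparison; this is the check that returned FALSE in the report.
print(identical(x, b))

# Comparison up to a small relative tolerance
# (all.equal's default tolerance is about 1.5e-8).
print(isTRUE(all.equal(x, b)))
```

Whether identical() fails here depends on the values and the R version in
use; all.equal() should pass as long as the differences stay at the level
shown above.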


On Wed, 8 Feb 2006, Roger D. Peng wrote:

> [...]

--
Brian D. Ripley,                  [hidden email]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Re: corruption of data with serialize(ascii=TRUE)

Roger D. Peng
Okay, I just wasn't sure of the source of the changes.  In retrospect, character
and other vectors did serialize/unserialize to the original objects.

-roger
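
A short sketch of the observation that non-double vectors survive the ASCII
round trip unchanged. The helper name `roundtrip' and the sample vectors are
made up for this illustration.

```r
# Hypothetical helper: serialize to ASCII in memory and read it back.
roundtrip <- function(obj) {
  unserialize(serialize(obj, connection = NULL, ascii = TRUE))
}

# Sample character, integer and logical vectors, including NAs.
objs <- list(
  chr = c("a", "b", NA),
  int = c(1L, NA, -5L),
  lgl = c(TRUE, FALSE, NA)
)

# Check each vector for exact equality after the round trip.
ok <- vapply(objs, function(o) identical(o, roundtrip(o)), logical(1))
print(ok)
```

These types are written as exact text (strings, decimal integers), so no
rounding step is involved, unlike for doubles.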
