Another issue with Sys.timezone


Another issue with Sys.timezone

Stephen Berman
(I reported the test failure mentioned below to R-help but was advised
that this list is the right one to address the issue; in the meantime I
investigated the matter somewhat more closely, including searching
recent R-devel postings, since I haven't been following this list.)

Last May there were two reports here of problems with Sys.timezone, one
where the zoneinfo directory is in a nonstandard location
(https://stat.ethz.ch/pipermail/r-devel/2017-May/074267.html) and the
other where the system lacks the file /etc/localtime
(https://stat.ethz.ch/pipermail/r-devel/2017-May/074275.html).  My
system exhibits a third case: it lacks /etc/timezone and does not set TZ
systemwide, but it does have /etc/localtime, which is a copy of, rather
than a symlink to, a file under zoneinfo.  On this system Sys.timezone()
returns NA and the Sys.timezone test in reg-tests-1d fails.  However, on
my system I can get the (abbreviated) timezone in R by using as.POSIXlt,
e.g. as.POSIXlt(Sys.time())$zone.  If Sys.timezone took advantage of
this, e.g. as below, it would be useful on such systems as mine and the
regression test would pass.

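my.Sys.timezone <-
    function (location = TRUE)
{
    ## An explicitly set TZ always wins; without it, location = FALSE gives NA.
    tz <- Sys.getenv("TZ", names = FALSE)
    if (!location || nzchar(tz))
        return(Sys.getenv("TZ", unset = NA_character_))
    ## If /etc/localtime is a symlink into zoneinfo, the link target
    ## (minus the directory prefix) is the Olson name.
    lt <- normalizePath("/etc/localtime")
    if (grepl(pat <- "^/usr/share/zoneinfo/", lt) ||
        grepl(pat <- "^/usr/share/zoneinfo.default/", lt))
        sub(pat, "", lt)
    else if (lt == "/etc/localtime")
        ## /etc/localtime is a regular file (a copy, not a symlink).
        if (!file.exists("/etc/timezone"))
            ## New: fall back to the abbreviated zone, e.g. "CEST".
            return(as.POSIXlt(Sys.time())$zone)
        else if (dir.exists("/usr/share/zoneinfo") && {
            ## /etc/timezone must be a small regular file ...
            info <- file.info(normalizePath("/etc/timezone"), extra_cols = FALSE)
            (!info$isdir && info$size <= 200L)
        } && {
            ## ... containing only printable ASCII and whitespace ...
            tz1 <- tryCatch(readBin("/etc/timezone", "raw", 200L),
                            error = function(e) raw(0L))
            length(tz1) > 0L && all(tz1 %in% as.raw(c(9:10, 13L, 32:126)))
        } && {
            ## ... that names a zoneinfo file of the same size as /etc/localtime.
            tz2 <- gsub("^[[:space:]]+|[[:space:]]+$", "", rawToChar(tz1))
            tzp <- file.path("/usr/share/zoneinfo", tz2)
            file.exists(tzp) && !dir.exists(tzp) &&
                identical(file.size(normalizePath(tzp)), file.size(lt))
        })
            tz2
        else NA_character_
}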

One problem with this is that the zone component of as.POSIXlt only
holds the abbreviated timezone, not the Olson name.  I don't know how to
get the Olson name using only R functions, but maybe it would be good
enough to return the abbreviated timezone where possible, e.g. as above.
(On my system I can get the Olson name of the timezone in R with a shell
pipeline, e.g.: system("find /usr/share/zoneinfo/ -type f | xargs md5sum
| grep $(md5sum /etc/localtime | cut -d ' ' -f 1) | head -n 1 | cut -d
'/' -f 5,6"), but the last part of this is tailored to my configuration
and the whole thing is not OS-neutral, so it isn't suitable for
Sys.timezone.)
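
To illustrate why the abbreviation cannot stand in for the Olson name, here is a minimal sketch. It assumes the two zones named are available in the time zone database that R uses; the variable names are only for illustration. The same instant yields the same abbreviation in two different locations:

```r
## One fixed instant, viewed in two distinct Olson zones.
x <- as.POSIXct("2017-10-15 12:00:00", tz = "UTC")

zone_berlin <- as.POSIXlt(x, tz = "Europe/Berlin")$zone
zone_zurich <- as.POSIXlt(x, tz = "Europe/Zurich")$zone

## Both abbreviations agree, so the abbreviation alone does not
## determine the location.
identical(zone_berlin, zone_zurich)
```

So the mapping from location to abbreviation is many-to-one, and going back from "CEST" to a location is not possible in general.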

Steve Berman

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: Another issue with Sys.timezone

Martin Maechler
>>>>> Stephen Berman <[hidden email]>
>>>>>     on Sun, 15 Oct 2017 01:53:12 +0200 writes:

    > One problem with this is that the zone component of as.POSIXlt only
    > holds the abbreviated timezone, not the Olson name.  

Yes, indeed.  So this should really only be given for
Sys.timezone(location = FALSE);  for the default  location = TRUE  it
should still give NA (i.e. NA_character_)  in your setup.

Interestingly, the Windows version of Sys.timezone(location =
FALSE) uses something like your proposal,  and I tend to think that
-- again only for location=FALSE -- this should be used on
non-Windows as well, at least instead of returning  NA  then.

Also for me on 3 different Linuxen (Fedora 24, F. 26, and ubuntu
14.04 LTS), I get

  > Sys.timezone()
  [1] "Europe/Zurich"
  > Sys.timezone(FALSE)
  [1] NA
  >

whereas on Windows I get Europe/Berlin for the first (why on
earth? I'm really in Zurich) and "CEST" ("Central European Summer Time")
for the 2nd one instead of NA ... simply by using a smarter version
of your proposal.   The Windows source is
in R's sources at  src/library/base/R/windows/system.R :

Sys.timezone <- function(location = TRUE)
{
    tz <- Sys.getenv("TZ", names = FALSE)
    if(nzchar(tz)) return(tz)
    if(location) return(.Internal(tzone_name()))
    z <- as.POSIXlt(Sys.time())
    zz <- attr(z, "tzone")
    if(length(zz) == 3L) zz[2L + z$isdst] else zz[1L]
}

From what I read, the last three lines also work in your setup
where it seems zz would be of length 1, right ?

I'd really propose to use these 3 lines in the non-Windows
version of Sys.timezone .. at the end *instead* of NA_character_
(or a slightly safer version which gives  NA_character_ if zz is
of length 0, e.g. if there is no "tzone" attribute).
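
Such a "slightly safer version" might look roughly like the following sketch. The helper name tz_abbrev() is hypothetical (it is not part of R), and the extra guard against a negative or missing isdst is an assumption beyond the three lines above:

```r
## Hypothetical helper: abbreviated time zone of a POSIXlt object,
## or NA_character_ when no "tzone" attribute is present.
tz_abbrev <- function(z = as.POSIXlt(Sys.time())) {
    zz <- attr(z, "tzone")
    if (length(zz) == 0L) return(NA_character_)
    isdst <- z$isdst[1L]
    ## With 3 elements, zz[2] is the standard and zz[3] the DST abbreviation.
    if (length(zz) == 3L && !is.na(isdst) && isdst >= 0L)
        zz[2L + isdst]
    else
        zz[1L]
}

summer <- as.POSIXlt("2017-07-01 12:00:00", tz = "Europe/Zurich")
winter <- as.POSIXlt("2017-01-15 12:00:00", tz = "Europe/Zurich")
no_tz  <- summer
attr(no_tz, "tzone") <- NULL    # simulate a missing "tzone" attribute

tz_abbrev(summer)
tz_abbrev(winter)
tz_abbrev(no_tz)
```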

    > I don't know how to
    > get the Olson name using only R functions, but maybe it would be good
    > enough to return the abbreviated timezone where possible, e.g. as above.
    > (On my system I can get the Olson name of the timezone in R with a shell
    > pipeline, e.g.: system("find /usr/share/zoneinfo/ -type f | xargs md5sum
    > | grep $(md5sum /etc/localtime | cut -d ' ' -f 1) | head -n 1 | cut -d
    > '/' -f 5,6"), but the last part of this is tailored to my configuration
    > and the whole thing is not OS-neutral, so it isn't suitable for
    > Sys.timezone.)

    > Steve Berman

Definitely not.  I still recommend you think of a more portable
solution for the   `location = TRUE` (default) case in Sys.timezone().
Returning the non-location form (e.g. "CEST") when something like
"Europe/Zurich" is expected is really not a good idea,
and you are lucky that the regression test passes "accidentally" ...

Martin

--
Martin <[hidden email]>   http://stat.ethz.ch/~maechler
Seminar für Statistik, ETH Zürich
and R Core Team


Re: Another issue with Sys.timezone

Martin Maechler
>>>>> Martin Maechler <[hidden email]>
>>>>>     on Mon, 16 Oct 2017 19:13:31 +0200 writes:

>>>>> Stephen Berman <[hidden email]>
>>>>>     on Sun, 15 Oct 2017 01:53:12 +0200 writes:

>     > I don't know how to
>     > get the Olson name using only R functions, but maybe it would be good
>     > enough to return the abbreviated timezone where possible, e.g. as above.
>     > (On my system I can get the Olson name of the timezone in R with a shell
>     > pipeline, e.g.: system("find /usr/share/zoneinfo/ -type f | xargs md5sum
>     > | grep $(md5sum /etc/localtime | cut -d ' ' -f 1) | head -n 1 | cut -d
>     > '/' -f 5,6"), but the last part of this is tailored to my configuration
>     > and the whole thing is not OS-neutral, so it isn't suitable for
>     > Sys.timezone.)
>
>     > Steve Berman
>
> Definitely not.  I still recommend you think of a more portable
> solution for the   `location = TRUE` (default) case in Sys.timezone().
> Returning the non-location form (e.g "CEST") when something like
> "Europe/Zurich" is expected is really not a good idea,
> and you are lucky that the regression test passes "accidentally" ...
>
> Martin

In the meantime, I have committed a common version (Windows and
non-Windows) of  Sys.timezone()  to the R development sources
(aka "R-devel").

That now uses  as.POSIXlt(Sys.time())  very similarly to the
above "Windows only" case,  but __only__ for  'location=FALSE'
which is not the default.

The most current development source is always available (via
'svn' or alternatively for browsing via your web browser) from

   https://svn.r-project.org/R/trunk/src/library/base/R/datetime.R

As you say yourself, the above workaround using
system("... xargs md5sum ...")  is really too platform specific,  but
I'd guess there should be a less error prone way to get the long
timezone name on your system ...
If that remains "contained" (i.e. small) and works with files
and R's file tools -- e.g. the file.*() ones [but not system()] --
I'd consider a patch to the above source file, sent by you either
to the R-devel mailing list or, after having gotten an account
there by asking, as a bug report with patch attachment at
https://bugs.r-project.org/ .

Best,
Martin



Re: Another issue with Sys.timezone

Stephen Berman
On Wed, 18 Oct 2017 18:09:41 +0200 Martin Maechler <[hidden email]> wrote:

>>>>>> Martin Maechler <[hidden email]>
>>>>>>     on Mon, 16 Oct 2017 19:13:31 +0200 writes:

(I also included a reply to part of this response of yours below.)

>>>>>> Stephen Berman <[hidden email]>
>>>>>>     on Sun, 15 Oct 2017 01:53:12 +0200 writes:
>
>> Sys.timezone <- function(location = TRUE)
>> {
>>     tz <- Sys.getenv("TZ", names = FALSE)
>>     if(nzchar(tz)) return(tz)
>>     if(location) return(.Internal(tzone_name()))
>>     z <- as.POSIXlt(Sys.time())
>>     zz <- attr(z, "tzone")
>>     if(length(zz) == 3L) zz[2L + z$isdst] else zz[1L]
>> }
>>
>> From what I read, the last three lines also work in your setup
>> where it seems zz would be of length 1, right ?

Those lines do indeed work here, but zz has three elements:

> attributes(as.POSIXlt(Sys.time()))$tzone
[1] ""     "CET"  "CEST"

> In the mean time, I have committed a common version (Windows and
> non-Windows) of  Sys.timezone()  to the R development sources
> (aka "R-devel").
>
> That now uses  as.POSIXlt(Sys.time())  very similarly to the
> above "Windows only" case,  but __only__ for  'location=FALSE'
> which is not the default.

Thanks, I think that's definitely better than returning NA when
`location' is false...

> The most current development source is always available (via
> 'svn' or alternatively for browsing via your web browser) from
>
>    https://svn.r-project.org/R/trunk/src/library/base/R/datetime.R

...however, I tried the test that failed for me during `make check' now
with this new definition of Sys.timezone() by pasting the definition (as
new.Sys.timezone()) and the two lines of the test code into the R console,
and this is what happened:

   > new.Sys.timezone()
   > new.Sys.timezone(FALSE)
   [1] "CEST"
   > (S.t <- new.Sys.timezone())
   NULL
   > if(is.na(S.t) || !nzchar(S.t)) stop("could not get timezone")
   Error in if (is.na(S.t) || !nzchar(S.t)) stop("could not get timezone") :
     missing value where TRUE/FALSE needed
   In addition: Warning message:
   In is.na(S.t) : is.na() applied to non-(list or vector) of type 'NULL'

This is because `location' is true but all the if-clauses in the body
following `if(location)' are false, so it returns NULL.  If you add the
line `else NA_character_' below the line `tz2', then NA is returned and
the test fails as before instead of as above.
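
The effect of the missing else-clause can be reproduced in isolation. The following is only a sketch with hypothetical function names, not the code in datetime.R: an `if' whose condition is false and which has no `else' evaluates to NULL, which is then also the function's value.

```r
## Hypothetical stand-ins for the two variants of the final branch.
without_else <- function(found) {
    if (found) "Europe/Zurich"          # no else: NULL when found is FALSE
}
with_else <- function(found) {
    if (found) "Europe/Zurich" else NA_character_
}

is.null(without_else(FALSE))   # the NULL that broke the test's if() condition
is.na(with_else(FALSE))        # NA, so the test reaches its stop() as before
```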

> As you say yourself, the above system("... xargs md5sum ...")
> using workaround is really too platform specific  but I'd guess
> there should be a less error prone way to get the long timezone
> name on your system ...

If I understand the zic(8) man page, the files in /usr/share/zoneinfo
should contain this information, but I don't know how to extract it,
since these are compiled files.  And since on my system /etc/localtime
is a copy of one of these compiled files, I don't know of any other way
to recover the location name without comparing it to those files.

> If that remains "contained" (i.e. small) and works with files
> and R's files tools -- e.g. file.*() ones [but not system()],
> I'd consider a patch to the above source file
> (sent by you to the R-devel mailing list --- or after having
>  gotten an account there by asking, via bug report & patch
>  attachment at https://bugs.r-project.org/ )

If comparing file size sufficed, that would be easy to do in R;
unfortunately, it is not sufficient, since some files designating
different time zones in /usr/share/zoneinfo do have the same size.  So
the only alternative I can think of is to compare bytes, e.g. with
md5sum or with cmp.  Is there some way to do this in R without using
system()?
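
For reference, a byte-level comparison of this kind can be written with R's own file tools, since tools::md5sum() computes checksums without calling system().  What follows is only a sketch: the function name find_olson_name() and its defaults are hypothetical, and because zoneinfo usually contains several identical files (aliases, and copies under posix/ or right/), the first match need not be the canonical name.

```r
## Hypothetical helper: find a file under 'zoneinfo' whose content is
## identical to 'localtime' and return its path relative to 'zoneinfo'.
find_olson_name <- function(localtime = "/etc/localtime",
                            zoneinfo  = "/usr/share/zoneinfo") {
    if (!file.exists(localtime) || !dir.exists(zoneinfo))
        return(NA_character_)
    candidates <- list.files(zoneinfo, recursive = TRUE)
    ## Cheap pre-filter: only files of the same size can be identical.
    same_size <- file.size(file.path(zoneinfo, candidates)) ==
        file.size(localtime)
    candidates <- candidates[!is.na(same_size) & same_size]
    if (!length(candidates)) return(NA_character_)
    ## Decide among the remaining files by comparing MD5 checksums.
    target <- unname(tools::md5sum(localtime))
    sums   <- unname(tools::md5sum(file.path(zoneinfo, candidates)))
    hits   <- candidates[!is.na(sums) & sums == target]
    if (length(hits)) hits[1L] else NA_character_
}

## Usage on a system laid out as described above:
## find_olson_name()
```

Restricting the hits to names in OlsonNames() would be one way to skip the posix/ and right/ copies.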

Steve Berman


Re: Another issue with Sys.timezone

Martin Maechler
>>>>> Stephen Berman <[hidden email]>
>>>>>     on Thu, 19 Oct 2017 17:12:50 +0200 writes:

    > On Wed, 18 Oct 2017 18:09:41 +0200 Martin Maechler <[hidden email]> wrote:
    >>>>>>> Martin Maechler <[hidden email]>
    >>>>>>> on Mon, 16 Oct 2017 19:13:31 +0200 writes:

    > (I also included a reply to part of this response of yours below.)

    >>>>>>> Stephen Berman <[hidden email]>
    >>>>>>> on Sun, 15 Oct 2017 01:53:12 +0200 writes:
    >>> Sys.timezone <- function(location = TRUE)
    >>> {
    >>> tz <- Sys.getenv("TZ", names = FALSE)
    >>> if(nzchar(tz)) return(tz)
    >>> if(location) return(.Internal(tzone_name()))
    >>> z <- as.POSIXlt(Sys.time())
    >>> zz <- attr(z, "tzone")
    >>> if(length(zz) == 3L) zz[2L + z$isdst] else zz[1L]
    >>> }
    >>>
    >>> From what I read, the last three lines also work in your setup
    >>> where it seems zz would be of length 1, right ?

    > Those line do indeed work here, but zz has three elements:

    >> attributes(as.POSIXlt(Sys.time()))$tzone
    > [1] ""     "CET"  "CEST"

{ "but" ??   yes, three elements is what I see too, but for that
  reason there's the  if(length(zz) == 3L) ... }


    > Thanks, I think that's definitely better than returning NA when
    > `location' is false...

    >> The most current development source is always available (via
    >> 'svn' or alternatively for browsing via your web browser) from
    >>
    >> https://svn.r-project.org/R/trunk/src/library/base/R/datetime.R

    > ...however, I tried the test that failed for me during `make check' now
    > with this new definition of Sys.timezone() by pasting the definition (as
    > new.Sys.timezone()) and the two lines of the test code into the R console,
    > and this is what happened:

    >> new.Sys.timezone()
    >> new.Sys.timezone(FALSE)
    > [1] "CEST"
    >> (S.t <- new.Sys.timezone())
    > NULL
    >> if(is.na(S.t) || !nzchar(S.t)) stop("could not get timezone")
    > Error in if (is.na(S.t) || !nzchar(S.t)) stop("could not get timezone") :
    > missing value where TRUE/FALSE needed
    > In addition: Warning message:
    > In is.na(S.t) : is.na() applied to non-(list or vector) of type 'NULL'

    > This is because `location' is true but all the if-clauses in the body
    > following `if(location)' are false, so it returns NULL.  If you add the
    > line `else NA_character_' below the line `tz2', then NA is returned and
    > the test fails as before instead of as above.

Thank you for the perfect diagnosis.  Embarrassingly, I had
accidentally dropped this else-clause.
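
A minimal sketch of the effect diagnosed above. The two helper functions are hypothetical stand-ins, not the code in datetime.R; they only mimic an if-chain with and without a final else branch.

```r
# Hypothetical helpers (not from datetime.R): same control flow,
# with and without a closing else.
lookup_without_else <- function(candidate) {
    if (nzchar(candidate)) candidate
}
lookup_with_else <- function(candidate) {
    if (nzchar(candidate)) candidate
    else NA_character_
}

res1 <- lookup_without_else("")   # no branch is taken, so the value is NULL
res2 <- lookup_with_else("")      # falls through to NA_character_

is.null(res1)
is.na(res2)
```

With the else branch in place, a check such as is.na(S.t) receives a length-one character value rather than NULL, so the test can fail with its intended message.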

    >> As you say yourself, the above workaround using system("... xargs
    >> md5sum ...") is really too platform-specific, but I'd guess
    >> there should be a less error prone way to get the long timezone
    >> name on your system ...

    > If I understand the zic(8) man page, the files in /usr/share/zoneinfo
    > should contain this information, but I don't know how to extract it,
    > since these are compiled files.  And since on my system /etc/localtime
    > is a copy of one of these compiled files, I don't know of any other way
    > to recover the location name without comparing it to those files.

    >> If that remains "contained" (i.e. small) and works with files
    >> and R's files tools -- e.g. file.*() ones [but not system()],
    >> I'd consider a patch to the above source file
    >> (sent by you to the R-devel mailing list --- or after having
    >> gotten an account there by asking, via bug report & patch
    >> attachment at https://bugs.r-project.org/ )

    > If comparing file size sufficed, that would be easy to do in R;
    > unfortunately, it is not sufficient, since some files designating
    > different time zones in /usr/share/zoneinfo do have the same size.  So
    > the only alternative I can think of is to compare bytes, e.g. with
    > md5sum or with cmp.  Is there some way to do this in R without using
    > system()?

Can't you use
      tz1 <- readBin("/etc/localtime", "raw", 200L)
plus later
      tz2 <- gsub(.......,  rawToChar(tz1))

on your  /etc/localtime file
almost identically as the current code does for "/etc/timezone" ?
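
A minimal sketch of comparing files byte for byte in R without system(), using only readBin(), file.size() and, as an alternative, tools::md5sum(). The temporary files and the helper name same_bytes() are assumptions made for illustration; they are not part of Sys.timezone().

```r
# Three throwaway files: f1 and f2 have the same size but different
# contents, f3 is byte-identical to f1.
f1 <- tempfile(); f2 <- tempfile(); f3 <- tempfile()
writeBin(as.raw(c(1, 2, 3, 4)), f1)
writeBin(as.raw(c(1, 2, 3, 5)), f2)
writeBin(as.raw(c(1, 2, 3, 4)), f3)

# Hypothetical helper: read both files completely and compare the bytes.
same_bytes <- function(a, b) {
    identical(readBin(a, "raw", n = file.size(a)),
              readBin(b, "raw", n = file.size(b)))
}

size_equal <- file.size(f1) == file.size(f2)   # sizes agree ...
diff_equal <- same_bytes(f1, f2)               # ... but the contents do not
copy_equal <- same_bytes(f1, f3)

# Checksums are an alternative; tools::md5sum() ships with R.
md5_equal <- unname(tools::md5sum(f1)) == unname(tools::md5sum(f3))
```

This illustrates why equal file size alone cannot identify a zoneinfo file, while a comparison of the raw bytes or of a checksum can.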

Martin

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: Another issue with Sys.timezone

Stephen Berman
On Fri, 20 Oct 2017 09:15:42 +0200 Martin Maechler <[hidden email]> wrote:

>>>>>> Stephen Berman <[hidden email]>
>>>>>>     on Thu, 19 Oct 2017 17:12:50 +0200 writes:
>
>     > On Wed, 18 Oct 2017 18:09:41 +0200 Martin Maechler
>     > <[hidden email]> wrote:
>     >>>>>>> Martin Maechler <[hidden email]>
>     >>>>>>> on Mon, 16 Oct 2017 19:13:31 +0200 writes:
>
[...]

>     >>> whereas on Windows I get Europe/Berlin for the first (why on
>     >>> earth - I'm really in Zurich) and get "CEST" ("Central European Summer
>     >>> Time")
>     >>> for the 2nd one instead of NA ... simply using a smarter version
>     >>> of your proposal.   The windows source is
>     >>> in R's source at  src/library/base/R/windows/system.R :
>     >>>
>     >>> Sys.timezone <- function(location = TRUE)
>     >>> {
>     >>> tz <- Sys.getenv("TZ", names = FALSE)
>     >>> if(nzchar(tz)) return(tz)
>     >>> if(location) return(.Internal(tzone_name()))
>     >>> z <- as.POSIXlt(Sys.time())
>     >>> zz <- attr(z, "tzone")
>     >>> if(length(zz) == 3L) zz[2L + z$isdst] else zz[1L]
>     >>> }
>     >>>
>     >>> From what I read, the last three lines also work in your setup
>     >>> where it seems zz would be of length 1, right ?
>
>     > Those lines do indeed work here, but zz has three elements:
>
>     >> attributes(as.POSIXlt(Sys.time()))$tzone
>     > [1] ""     "CET"  "CEST"
>
> { "but" ??   yes, three elements is what I see too, but for that
>   reason there's the  if(length(zz) == 3L) ... }
The "but" was in response to "it seems zz would be of length 1", but
perhaps I misunderstood you.
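
A small sketch of how the last three lines of the quoted function pick the abbreviation. The wrapper name abbrev_tz() and the fixed dates are assumptions for illustration, and the example presumes the "Europe/Berlin" zone is available in the time zone database R uses.

```r
# Hypothetical wrapper around the selection logic quoted above.
abbrev_tz <- function(z) {
    zz <- attr(z, "tzone")
    # With three elements, zz[2] is the standard-time and zz[3] the
    # daylight-saving abbreviation; z$isdst (0 or 1) selects between them.
    if (length(zz) == 3L) zz[2L + z$isdst] else zz[1L]
}

summer <- as.POSIXlt("2017-07-01 12:00:00", tz = "Europe/Berlin")
winter <- as.POSIXlt("2017-01-01 12:00:00", tz = "Europe/Berlin")

abbrev_tz(summer)
abbrev_tz(winter)
```

So a three-element tzone attribute is the case the length check is meant to handle, and the one-element case falls back to zz[1L].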

[...]

>     >> As you say yourself, the above workaround using system("... xargs
>     >> md5sum ...") is really too platform-specific, but I'd guess
>     >> there should be a less error prone way to get the long timezone
>     >> name on your system ...
>
>     > If I understand the zic(8) man page, the files in /usr/share/zoneinfo
>     > should contain this information, but I don't know how to extract it,
>     > since these are compiled files.  And since on my system /etc/localtime
>     > is a copy of one of these compiled files, I don't know of any other way
>     > to recover the location name without comparing it to those files.
>
>     >> If that remains "contained" (i.e. small) and works with files
>     >> and R's files tools -- e.g. file.*() ones [but not system()],
>     >> I'd consider a patch to the above source file
>     >> (sent by you to the R-devel mailing list --- or after having
>     >> gotten an account there by asking, via bug report & patch
>     >> attachment at https://bugs.r-project.org/ )
>
>     > If comparing file size sufficed, that would be easy to do in R;
>     > unfortunately, it is not sufficient, since some files designating
>     > different time zones in /usr/share/zoneinfo do have the same size.  So
>     > the only alternative I can think of is to compare bytes, e.g. with
>     > md5sum or with cmp.  Is there some way to do this in R without using
>     > system()?
>
> Can't you use
>       tz1 <- readBin("/etc/localtime", "raw", 200L)
> plus later
>       tz2 <- gsub(.......,  rawToChar(tz1))
>
> on your  /etc/localtime file
> almost identically as the current code does for "/etc/timezone" ?
Oh, thanks.  I've looked at this code over and over again in the last
few days and somehow still didn't see its usefulness (maybe because I
haven't had occasion to deal with binary data in R till now).  Anyway,
just substituting "/etc/localtime" for "/etc/timezone" doesn't work,
since my /etc/localtime file seems not to hold the timezone location
name in a form recoverable with rawToChar() (all I see are the
abbreviated timezones CEST, CEMT and CET-1CEST); but I can use the raw
bytes to make the comparison with files in /usr/share/zoneinfo.  With
the attached patch, I get both the timezone location name (with
location=TRUE) and the abbreviated timezone (with location=FALSE).  One
thing I wonder about: is looking at just the first 200 bytes guaranteed
to be sufficient, or would it be better to use n=file.size() to examine
the whole file?
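
A minimal sketch of the byte comparison described above, not the attached patch itself. The function name match_zoneinfo() and its defaults are assumptions for illustration. It reads the whole file via file.size() rather than a fixed 200 bytes, which sidesteps the question of whether 200 bytes suffice, and it returns every matching name because several zoneinfo files can carry identical contents.

```r
# Hypothetical helper: names (relative to 'zoneinfo') of all files whose
# bytes equal those of 'localtime'; character(0) if none match.
match_zoneinfo <- function(localtime = "/etc/localtime",
                           zoneinfo = "/usr/share/zoneinfo") {
    if (!file.exists(localtime) || !dir.exists(zoneinfo))
        return(character(0))
    size <- file.size(localtime)
    target <- readBin(localtime, "raw", n = size)
    candidates <- list.files(zoneinfo, recursive = TRUE)
    # Cheap pre-filter: only files of the same size can have the same bytes.
    sizes <- file.size(file.path(zoneinfo, candidates))
    candidates <- candidates[!is.na(sizes) & sizes == size]
    # Compare the raw bytes of each remaining candidate with the target.
    hit <- vapply(candidates, function(nm)
        identical(readBin(file.path(zoneinfo, nm), "raw", n = size), target),
        logical(1))
    unname(candidates[hit])
}
```

On a real system the result may include aliases or entries under subdirectories such as posix/, so a caller would still have to choose one name from the matches.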

Steve Berman



Attachment: datetime.R.diff (3K)