Undesired result

Val-17
Hi all,

I am reading a data file that contains dates in several different formats. I
wanted to standardize them to a single format using the anytime package, but
got undesired results, as shown below: the last row came back with the year
2093 instead of 1993.


library(anytime)

DFX <- read.table(text = "name ddate
  A  19-10-02
  D  11/19/2006
  F  9/9/2011
  G1  12/29/2010
  AA   10/18/93 ", header = TRUE)

getFormats()               # list the formats anytime already knows
addFormats(c("%d-%m-%y"))  # register additional candidate formats
addFormats(c("%m-%d-%y"))
addFormats(c("%Y/%d/%m"))
addFormats(c("%m/%d/%y"))

DFX$anew <- anydate(DFX$ddate)

Output
 name      ddate       anew
1    A   19-10-02 2002-10-19
2    D 11/19/2006 2020-11-19
3    F   9/9/2011 2011-09-09
4   G1 12/29/2010 2020-12-29
5   AA   10/18/93 2093-10-18

The problem is in the last row: it should be 1993-10-18, not 2093-10-18.

How do I correct this?
Thank you.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Undesired result

John Kane-3
You have four addFormats() calls. Maybe add one more?

On Wed, 17 Feb 2021 at 10:00, Val <[hidden email]> wrote:

> HI All,
>
> I am reading a data file which has different date formats. I wanted to
> standardize to one format and used  a library anytime but got
> undesired results as shown below. It gave me year 2093 instead of 1993
>
> How do I correct this?
> Thank you.


--
John Kane
Kingston ON Canada


Re: Undesired result

Duncan Murdoch-2
On 17/02/2021 9:50 a.m., Val wrote:

> HI All,
>
> I am reading a data file which has different date formats. I wanted to
> standardize to one format and used  a library anytime but got
> undesired results as shown below. It gave me year 2093 instead of 1993
>
> How do I correct this?

This looks a little tricky.  The basic idea is that the %y format has to
guess at the century, but the guess depends on things specific to your
system.  So what would be nice is to say "two-digit years should be
assumed to fall between 1922 and 2021", but there's no way to do that
directly.
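
As a small illustration of that century guess, here is how base R's own
parser treats %y. This is only a sketch: it assumes the rule documented in
?strptime for current R versions (two-digit years 00 to 68 are read as 20xx,
69 to 99 as 19xx). The parser that anydate() uses by default is a different
one and need not follow the same rule.

```r
# Base R only; no packages needed.
x <- c("10/18/93", "10/18/68", "10/18/69")

# %y has to pick a century for each two-digit year.
as.Date(x, format = "%m/%d/%y")
#> [1] "1993-10-18" "2068-10-18" "1969-10-18"
```

The pivot between 68 and 69 is fixed, which is why a window such as
1922 to 2021 has to be imposed after parsing.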

What you could do is recognize when you have a two-digit year, and then
force the result into the range you want.  Here's a function that does
that, but it has hardly been tested, so be careful if you use it.  (One
thing:  I recommend the 'useR = TRUE' option to anydate(); it worked
better in my tests than the default.)

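adjustCentury <- function(inputString,
                          outputDate = anydate(inputString, useR = TRUE),
                          start = "1922-01-01") {

  # Dates written with a two-digit year are forced into the 100-year
  # window [start, start + 100 years).
  start <- as.Date(start)

  # Inputs with no run of four digits are taken to have a two-digit year.
  twodigityear <- !grepl("[[:digit:]]{4}", inputString)

  # Too early: add 100 years until the date reaches the window.
  while (length(bad <- which(twodigityear & outputDate < start))) {
    for (i in bad) {
      longdate <- as.POSIXlt(outputDate[i])
      longdate$year <- longdate$year + 100
      outputDate[i] <- as.Date(longdate)
    }
  }

  # Exclusive upper bound of the window: start plus 100 years.
  longdate <- as.POSIXlt(start)
  longdate$year <- longdate$year + 100
  finish <- as.Date(longdate)

  # Too late: subtract 100 years until the date falls below the bound.
  while (length(bad <- which(twodigityear & outputDate >= finish))) {
    for (i in bad) {
      longdate <- as.POSIXlt(outputDate[i])
      longdate$year <- longdate$year - 100
      outputDate[i] <- as.Date(longdate)
    }
  }
  outputDate
}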

library(anytime)
DFX<-read.table(text="name ddate
   A  19-10-02
   D  11/19/2006
   F  9/9/2011
   G1  12/29/2010
   AA   10/18/93
   BB   10/18/1893
   CC   10/18/2093",header=TRUE)

addFormats(c("%d-%m-%y"))
addFormats(c("%m-%d-%y"))
addFormats(c("%Y/%d/%m"))
addFormats(c("%m/%d/%y"))

DFX$anew <- adjustCentury(DFX$ddate, start = "1921-01-01")
DFX
#>   name      ddate       anew
#> 1    A   19-10-02 2019-10-02
#> 2    D 11/19/2006 2006-11-19
#> 3    F   9/9/2011 2011-09-09
#> 4   G1 12/29/2010 2010-12-29
#> 5   AA   10/18/93 1993-10-18
#> 6   BB 10/18/1893 1893-10-18
#> 7   CC 10/18/2093 2093-10-18


Re: Undesired result

Duncan Murdoch-2
Just a quick note:  you can simplify my function and speed it up quite a
bit if speed is an issue.  I had forgotten that the POSIXlt type could
act like a vector; using that you don't need those inner for loops, and
with a little calculation you can also do without the outer while loops.
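
The simplification described above might look something like the following.
This is only a sketch, not the code from the earlier message: the name
adjustCentury2 and the modular-arithmetic step are assumptions introduced
for illustration, and it has had no more testing than the original. It
relies on the year field of a POSIXlt object being an ordinary vector of
years since 1900.

```r
# Hypothetical vectorized variant of adjustCentury() (illustrative name).
# Dates whose input string has no four-digit year are moved into the
# 100-year window [start, start + 100 years) without explicit loops.
adjustCentury2 <- function(inputString,
                           outputDate = anytime::anydate(inputString, useR = TRUE),
                           start = "1922-01-01") {
  start <- as.Date(start)
  twodigityear <- !grepl("[[:digit:]]{4}", inputString)

  lt <- as.POSIXlt(outputDate)
  yr <- lt$year                  # years since 1900, one per date
  s  <- as.POSIXlt(start)$year   # year of the window's lower bound

  # Modular arithmetic puts each two-digit year into s, ..., s + 99.
  yr[twodigityear] <- s + (yr[twodigityear] - s) %% 100
  lt$year <- yr

  # A date in year s can still precede 'start' itself; move those on a century.
  early <- which(twodigityear & as.Date(lt) < start)
  yr[early] <- yr[early] + 100
  lt$year <- yr

  as.Date(lt)
}

# Example with already-parsed dates, so anytime is not needed here.
adjustCentury2(c("10/18/93", "10/18/2093"),
               outputDate = as.Date(c("2093-10-18", "2093-10-18")),
               start = "1921-01-01")
#> [1] "1993-10-18" "2093-10-18"
```

Passing outputDate explicitly, as in the example, skips the default call to
anydate(); leaving it out falls back to anydate(inputString, useR = TRUE) as
in the original function.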

Duncan Murdoch

On 17/02/2021 12:50 p.m., Duncan Murdoch wrote:

> This looks a little tricky.  The basic idea is that the %y format has to
> guess at the century, but the guess depends on things specific to your
> system.  So what would be nice is to say "two digit dates should be
> assumed to fall between 1922 and 2021", but there's no way to do that
> directly.


Re: Undesired result

Val-17
Very helpful and thank you so much!


On Wed, Feb 17, 2021 at 12:50 PM Duncan Murdoch
<[hidden email]> wrote:

> This looks a little tricky.  The basic idea is that the %y format has to
> guess at the century, but the guess depends on things specific to your
> system.  So what would be nice is to say "two digit dates should be
> assumed to fall between 1922 and 2021", but there's no way to do that
> directly.
