Bug found in cut.Date, solution found

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Bug found in cut.Date, solution found

Christopher Carbone
Greetings,

I found a bug in the cut.Date function of base R. My session info is
at the bottom.

The bug is in the "months" section of the code. Consider the following:

# Create vector with 365 days of year and cut into 12 months
dates <- seq(as.Date("2021-1-1"),as.Date("2021-12-31"),by="day")
months <- cut(dates,"months",labels=1:12)
table(months)
# months
# 1  2  3  4  5  6  7  8  9 10 11 12
# 31 28 31 30 31 30 31 31 30 31 30 31
# cut.Date works perfectly

# Extend dates vector by adding following January and cut into 13 months
dates <- seq(as.Date("2021-1-1"),as.Date("2022-1-31"),by="day")
months <- cut(dates,"months",labels=1:13)
# Error in cut.default(unclass(x), unclass(breaks), labels = labels,
right = right,  :
#                        lengths of 'breaks' and 'labels' differ
months <- cut(dates,"months",labels=1:14)
table(months)
# months
# 1  2  3  4  5  6  7  8  9 10 11 12 13 14
# 31 28 31 30 31 30 31 31 30 31 30 31 31  0
# Bug: cut.Date adds a superfluous bin
# This is because February has less than 31 days
# Thus, the code adjusts the end into March instead of February

Inspecting the source code, I see the problem resides in lines 31
through 41, the section dealing with breaks="months":

    if (valid == 3L) {
      start$mday <- 1L
      start$isdst <- -1L
      end <- as.POSIXlt(max(x, na.rm = TRUE))
      step <- if (length(by2) == 2L)
        as.integer(by2[1L])
      else 1L
38  end <- as.POSIXlt(end + (31 * step * 86400))
      end$mday <- 1L
      end$isdst <- -1L
      breaks <- as.Date(seq(start, end, breaks))

When the end is adjusted in line 38, there are instances where an
unnecessary extra bin is created.
This is accounted for in the breaks="quarters" case by testing at the
end and removing the final break if necessary.
The same method employed here fixes the bug:

    if (valid == 3L) {
      start$mday <- 1L
      start$isdst <- -1L
     maxx <- max(x, na.rm = TRUE)   # Added line
     end <- as.POSIXlt(maxx)             # Modified line
      step <- if (length(by2) == 2L)
        as.integer(by2[1L])
      else 1L
      end <- as.POSIXlt(end + (31 * step * 86400))
      end$mday <- 1L
      end$isdst <- -1L
      breaks <- as.Date(seq(start, end, breaks))
     lb <- length(breaks)                   # Added line
     if (maxx < breaks[lb - 1])           # Added line
       breaks <- breaks[-lb]               # Added line

I modified cut.Date with the above code changes and stored it in a
function called cut_Date():

dates <- seq(as.Date("2021-1-1"),as.Date("2022-1-31"),by="day")
months <- cut_Date(dates,"months",labels=1:13)
table(months)
# months
# 1  2  3  4  5  6  7  8  9 10 11 12 13
# 31 28 31 30 31 30 31 31 30 31 30 31 31

Thanks for your time and consideration!

> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.0.3 tools_4.0.3    tinytex_0.28   xfun_0.19

Many Blessings,
Christopher Carbone

"Follow your bliss and doors will open where you didn't know they were
going to be..."
-Joseph Campbell

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: Bug found in cut.Date, solution found

Duncan Murdoch-2
If you don't hear something from someone in R Core, you should submit
this as a bug report on bugs.r-project.org.  (You might not have an
account there; if you don't, someone from R Core will have to give you
one.  Feel free to remind them if nothing happens in a few days!)

Duncan Murdoch

On 05/02/2021 4:01 p.m., Christopher Carbone wrote:

> Greetings,
>
> I found a bug in the cut.Date function of base R. My session info is
> at the bottom.
>
> The bug is in the "months" section of the code. Consider the following:
>
> # Create vector with 365 days of year and cut into 12 months
> dates <- seq(as.Date("2021-1-1"),as.Date("2021-12-31"),by="day")
> months <- cut(dates,"months",labels=1:12)
> table(months)
> # months
> # 1  2  3  4  5  6  7  8  9 10 11 12
> # 31 28 31 30 31 30 31 31 30 31 30 31
> # cut.Date works perfectly
>
> # Extend dates vector by adding following January and cut into 13 months
> dates <- seq(as.Date("2021-1-1"),as.Date("2022-1-31"),by="day")
> months <- cut(dates,"months",labels=1:13)
> # Error in cut.default(unclass(x), unclass(breaks), labels = labels,
> right = right,  :
> #                        lengths of 'breaks' and 'labels' differ
> months <- cut(dates,"months",labels=1:14)
> table(months)
> # months
> # 1  2  3  4  5  6  7  8  9 10 11 12 13 14
> # 31 28 31 30 31 30 31 31 30 31 30 31 31  0
> # Bug: cut.Date adds a superfluous bin
> # This is because February has less than 31 days
> # Thus, the code adjusts the end into March instead of February
>
> Inspecting the source code, I see the problem resides in lines 31
> through 41, the section dealing with breaks="months":
>
>      if (valid == 3L) {
>        start$mday <- 1L
>        start$isdst <- -1L
>        end <- as.POSIXlt(max(x, na.rm = TRUE))
>        step <- if (length(by2) == 2L)
>          as.integer(by2[1L])
>        else 1L
> 38  end <- as.POSIXlt(end + (31 * step * 86400))
>        end$mday <- 1L
>        end$isdst <- -1L
>        breaks <- as.Date(seq(start, end, breaks))
>
> When the end is adjusted in line 38, there are instances where an
> unnecessary extra bin is created.
> This is accounted for in the breaks="quarters" case by testing at the
> end and removing the final break if necessary.
> The same method employed here fixes the bug:
>
>      if (valid == 3L) {
>        start$mday <- 1L
>        start$isdst <- -1L
>       maxx <- max(x, na.rm = TRUE)   # Added line
>       end <- as.POSIXlt(maxx)             # Modified line
>        step <- if (length(by2) == 2L)
>          as.integer(by2[1L])
>        else 1L
>        end <- as.POSIXlt(end + (31 * step * 86400))
>        end$mday <- 1L
>        end$isdst <- -1L
>        breaks <- as.Date(seq(start, end, breaks))
>       lb <- length(breaks)                   # Added line
>       if (maxx < breaks[lb - 1])           # Added line
>         breaks <- breaks[-lb]               # Added line
>
> I modified cut.Date with the above code changes and stored it in a
> function called cut_Date():
>
> dates <- seq(as.Date("2021-1-1"),as.Date("2022-1-31"),by="day")
> months <- cut_Date(dates,"months",labels=1:13)
> table(months)
> # months
> # 1  2  3  4  5  6  7  8  9 10 11 12 13
> # 31 28 31 30 31 30 31 31 30 31 30 31 31
>
> Thanks for your time and consideration!
>
>> sessionInfo()
> R version 4.0.3 (2020-10-10)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows 10 x64 (build 19041)
>
> Matrix products: default
>
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> loaded via a namespace (and not attached):
> [1] compiler_4.0.3 tools_4.0.3    tinytex_0.28   xfun_0.19
>
> Many Blessings,
> Christopher Carbone
>
> "Follow your bliss and doors will open where you didn't know they were
> going to be..."
> -Joseph Campbell
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel