cut{base}: is it a bug?

cut{base}: is it a bug?

JCFaria
Dear members,

Is the behaviour below a bug in the cut {base} function?

dat <- c(
 0.6, 0.6, 0.6, 0.7, 0.7, 0.7, 0.7, 0.7, #(8)
 0.8, 0.8, 0.8, 0.9, 0.9, 0.9, 0.9,        #(7)
 1.0, 1.0, 1.0, 1.0, 1.1, 1.1, 1.1,        #(7)
 1.2, 1.2, 1.2, 1.2, 1.3, 1.3, 1.3,        #(7)
 1.4, 1.4, 1.4, 1.5, 1.5, 1.5,               #(6)
 1.6, 1.6, 1.7, 1.7, 1.7, 1.7,               #(6)
 1.8, 1.8, 1.8, 1.9, 1.9,                      #(5)
 2.0, 2.0, 2.0, 2.0, 2.0, 2.1                #(6)
 )

# build the classes with cut()
(f <- cut(dat,
          breaks= seq(from=.6, to=2.2, by=.2),
          include.lowest=TRUE,
          dig.lab=10L,
          right=FALSE))

# the table is easier to read as a matrix
as.matrix(tb <- table(f))

# Checking
print(length(dat[dat >= 0.6 & dat < 0.8])) == tb[1]
print(length(dat[dat >= 0.8 & dat < 1.0])) == tb[2]
print(length(dat[dat >= 1.0 & dat < 1.2])) == tb[3]  # !?
print(length(dat[dat >= 1.2 & dat < 1.4])) == tb[4]  # !?
print(length(dat[dat >= 1.4 & dat < 1.6])) == tb[5]
print(length(dat[dat >= 1.6 & dat < 1.8])) == tb[6]  # !?
print(length(dat[dat >= 1.8 & dat < 2.0])) == tb[7]  # !?
print(length(dat[dat >= 2.0 & dat < 2.2])) == tb[8]

Best,
///\\\///\\\///\\\///\\\///\\\///\\\///\\\///\\\
Jose Claudio Faria
UESC/DCET/Brasil
joseclaudio.faria at gmail.com
Telefones:
55(73)3680.5545 - UESC
55(73)99966.9100 - VIVO
55(73)98817.6159 - OI
///\\\///\\\///\\\///\\\///\\\///\\\///\\\///\\\

If you have software to deal with statistics, you have arms;
if you have good software, you have arms and legs;
if you have software like R, you have arms, legs and wings...
the height of your flight depends only on you!

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: cut{base}: is it a bug?

David Carlson
You've been bitten by FAQ 7.31: Why doesn't R think these numbers are equal?
https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f

Your boundaries and your data values are not what you think they are. This is a limitation of binary floating-point arithmetic, not of R.

> print(seq(from=.6, to=2.2, by=.2), digits=17)
[1] 0.59999999999999998 0.80000000000000004 1.00000000000000000 1.20000000000000018
[5] 1.39999999999999991 1.60000000000000009 1.80000000000000027 2.00000000000000000
[9] 2.20000000000000018

> print(dat, digits=17)
 [1] 0.59999999999999998 0.59999999999999998 0.59999999999999998 0.69999999999999996
 [5] 0.69999999999999996 0.69999999999999996 0.69999999999999996 0.69999999999999996
 [9] 0.80000000000000004 0.80000000000000004 0.80000000000000004 0.90000000000000002
[13] 0.90000000000000002 0.90000000000000002 0.90000000000000002 1.00000000000000000
[17] 1.00000000000000000 1.00000000000000000 1.00000000000000000 1.10000000000000009
[21] 1.10000000000000009 1.10000000000000009 1.19999999999999996 1.19999999999999996
[25] 1.19999999999999996 1.19999999999999996 1.30000000000000004 1.30000000000000004
[29] 1.30000000000000004 1.39999999999999991 1.39999999999999991 1.39999999999999991
[33] 1.50000000000000000 1.50000000000000000 1.50000000000000000 1.60000000000000009
[37] 1.60000000000000009 1.69999999999999996 1.69999999999999996 1.69999999999999996
[41] 1.69999999999999996 1.80000000000000004 1.80000000000000004 1.80000000000000004
[45] 1.89999999999999991 1.89999999999999991 2.00000000000000000 2.00000000000000000
[49] 2.00000000000000000 2.00000000000000000 2.00000000000000000 2.10000000000000009
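
To see which class limits are affected, one can compare each computed break with the decimal literal it is meant to represent. This is a minimal sketch; the names `brk`, `lit` and `cmp` are illustrative and not part of the original post.

```r
# Breaks as computed by seq() versus the decimal literals typed in the data.
brk <- seq(from = 0.6, to = 2.2, by = 0.2)
lit <- c(0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2)

# For each limit: is the break identical to the literal, and does the
# literal sort strictly below the break (so it lands in the class below)?
cmp <- data.frame(literal = lit,
                  identical_to_break = brk == lit,
                  literal_below_break = lit < brk)
print(cmp)
```

Going by the 17-digit printout above, the rows for 1.2 and 1.8 should be the ones flagged: those literals are slightly smaller than the corresponding breaks, so with right=FALSE the data values are counted in the class below.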

The simplest solution is to subtract a bit. This also means you don't need the include.lowest= or right= arguments:

> f <- cut(dat,
+           breaks= seq(from=.6-.01, to=2.2-.01, by=.2),
+           dig.lab=10L)
> as.matrix(tb <- table(f))
            [,1]
(0.59,0.79]    8
(0.79,0.99]    7
(0.99,1.19]    7
(1.19,1.39]    7
(1.39,1.59]    6
(1.59,1.79]    6
(1.79,1.99]    5
(1.99,2.19]    6

----------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77843-4352



Re: cut{base}: is it a bug?

Jeff Newmiller
"Subtracting a bit" only fixes the problem for the test data... it introduces a bias in any continuous data you happen to throw at it. However, if you have data with known rounding applied (e.g. published tabular data) then the subtracting trick can be useful. In general you should not expect floating point fractions to behave like exact values in your analysis.


--
Sent from my phone. Please excuse my brevity.
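
The bias described in this reply is easy to provoke with values that are not rounded to one decimal. A small sketch, in which the two measurements in `x` are made up for illustration and `shifted` simply repeats the breaks from the workaround above:

```r
# Two hypothetical continuous measurements that belong, by the intended
# limits 0.6, 0.8, 1.0, 1.2, ..., to the 1st class (0.6 to 0.8) and the
# 3rd class (1.0 to 1.2).
x <- c(0.795, 1.195)

# The shifted breaks from the "subtract a bit" workaround.
shifted <- seq(from = 0.6 - 0.01, to = 2.2 - 0.01, by = 0.2)

cls <- cut(x, breaks = shifted, dig.lab = 10L)
as.integer(cls)  # class index assigned under the shifted breaks
```

Both values lie just above a shifted break (0.79 and 1.19), so they are assigned to the 2nd and 4th classes rather than the 1st and 3rd.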


Re: cut{base}: is it a bug?

David Carlson
Yes, I should have included that point. The cut() function "encourages" exact comparison of values by including the right= argument without a warning that this may create unexpected results. With truly continuous data, values falling exactly on the boundary would be rare.

Most data arrive from instruments that measure to limited precision. Introductory statistics texts deal with this by distinguishing between "true" and "stated" class limits, or, like Lyman Ott, by recommending a starting point for the first interval such that "no measurement falls on a point of division between two intervals."
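
For data recorded to one decimal place, the "true" limits idea can be sketched by placing the breaks halfway between possible measurements. This is only an illustration: the half-step offset and the names `precision`, `vals`, `times` and `brk` are assumptions, and `dat` is rebuilt with rep() so the block runs on its own.

```r
# Rebuild the example data: 16 distinct values with their frequencies.
vals  <- c(0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3,
           1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1)
times <- c(3, 5, 3, 4, 4, 3, 4, 3, 3, 3, 2, 4, 3, 2, 5, 1)
dat   <- rep(vals, times = times)

# The data are recorded to 0.1, so shift the limits by half that step:
# no recorded value can then coincide with a break.
precision <- 0.1
brk <- seq(from = 0.6 - precision / 2, to = 2.2 - precision / 2, by = 0.2)

tb <- table(cut(dat, breaks = brk, dig.lab = 10L))
as.matrix(tb)
```

Because every recorded value sits 0.05 away from the nearest break, representation error on the order of 1e-16 cannot move it across a class limit.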

----------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77843-4352


Re: cut{base}: is it a bug?

JCFaria
Dear all,

Thank you for your contributions!

However, this function is important in fdth, a general-purpose package
for frequency distribution tables
(https://cran.r-project.org/web/packages/fdth/index.html).

When I do not know the user's data in advance, what is the best way to
avoid deviations as pronounced as those in the example?

The data used in the example were sent to me by a teacher trying to
reproduce a table from a book in class.

Best,

///\\\///\\\///\\\///\\\///\\\///\\\///\\\///\\\
Jose Claudio Faria
UESC/DCET/Brasil
joseclaudio.faria at gmail.com
Telefones:
55(73)3680.5545 - UESC
55(73)99966.9100 - VIVO
55(73)98817.6159 - OI
///\\\///\\\///\\\///\\\///\\\///\\\///\\\///\\\

If you have software to deal with statistics, you have arms;
if you have good software, you have arms and legs;
if you have software like R, you have arms, legs and wings...
the height of your flight depends only on you!


Re: cut{base}: is it a bug?

David Winsemius

If you want to provide tools that protect unsuspecting users from common, well-understood numerical traps, why not process your data inputs with round(obj, 8) or something similar to your liking?

> round( 3*.1, 8) == 3*.1
[1] FALSE
> 0.3 == 3*.1
[1] FALSE
> round( 3*.1, 8) == 0.3
[1] TRUE
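
Applied to the original example, the same idea would be to round both the data and the breaks before calling cut(). A hedged sketch: the choice of 8 digits follows the suggestion above, the names `digits`, `vals`, `times` and `brk` are illustrative, and `dat` is rebuilt with rep() only to keep the block self-contained.

```r
# Rebuild the example data: 16 distinct values with their frequencies.
vals  <- c(0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3,
           1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1)
times <- c(3, 5, 3, 4, 4, 3, 4, 3, 3, 3, 2, 4, 3, 2, 5, 1)
dat   <- rep(vals, times = times)

# Round data and breaks to the same number of digits so that values meant
# to be equal (e.g. 1.2 and the 4th break) compare as equal.
digits <- 8
brk <- round(seq(from = 0.6, to = 2.2, by = 0.2), digits)

f <- cut(round(dat, digits), breaks = brk,
         include.lowest = TRUE, right = FALSE, dig.lab = 10L)
tb <- table(f)
as.matrix(tb)
```

Rounding alters the inputs, so the number of digits would need to be chosen with the precision of the measurements in mind.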


--
David.

>
> Best,
>
> ///\\\///\\\///\\\///\\\///\\\///\\\///\\\///\\\
> Jose Claudio Faria
> UESC/DCET/Brasil
> joseclaudio.faria at gmail.com
> Telefones:
> 55(73)3680.5545 - UESC
> 55(73)99966.9100 - VIVO
> 55(73)98817.6159 - OI
> ///\\\///\\\///\\\///\\\///\\\///\\\///\\\///\\\
>
> If you have software to deal with statistics, you have arms;
> if you have good software, you have arms and legs;
> if you have software like R, you have arms, legs and wings...
> the height of your flight depends only on you!
>
> 2018-09-24 14:42 GMT-03:00 David L Carlson <[hidden email]>:
>
>> Yes, I should have included that point. The cut() function "encourages"
>> exact comparison of values by including the right= argument without a
>> warning that this may create unexpected results. With truly continuous
>> data, values falling exactly on the boundary would be rare.
>>
>> Most data arrives from instruments that measure to limited precision.
>> Introductory statistics texts deal with this by distinguishing between
>> "true" and "stated" class limits. Or, like Lyman Ott, recommend choosing
>> the starting point interval such that "no measurement falls on a point of
>> division between two intervals."
>>
>> ----------------------------------------
>> David L Carlson
>> Department of Anthropology
>> Texas A&M University
>> College Station, TX 77843-4352
>>
>> -----Original Message-----
>> From: Jeff Newmiller <[hidden email]>
>> Sent: Monday, September 24, 2018 10:41 AM
>> To: [hidden email]; David L Carlson <[hidden email]>; Jose
>> Claudio Faria <[hidden email]>; [hidden email]
>> Subject: Re: [R] cut{base}: is it a bug?
>>
>> "Subtracting a bit" only fixes the problem for the test data... it
>> introduces a bias in any continuous data you happen to throw at it.
>> However, if you have data with known rounding applied (e.g. published
>> tabular data) then the subtracting trick can be useful. In general you
>> should not expect floating point fractions to behave like exact values in
>> your analysis.
>>
>> On September 24, 2018 8:14:09 AM PDT, David L Carlson <[hidden email]>
>> wrote:
>>> You've been bitten by FAQ 7.31: Why doesn't R think these numbers are
>>> equal?
>>> https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_
>> 0027t-R-think-these-numbers-are-equal_003f
>>>
>>> Your boundaries and your data values are not what you think they are.
>>> This is a limitation of digital computing not R.
>>>
>>>> print(seq(from=.6, to=2.2, by=.2), digits=17)
>>> [1] 0.59999999999999998 0.80000000000000004 1.00000000000000000
>>> 1.20000000000000018
>>> [5] 1.39999999999999991 1.60000000000000009 1.80000000000000027
>>> 2.00000000000000000
>>> [9] 2.20000000000000018
>>>
>>>> print(dat, digits=17)
>>>  [1] 0.59999999999999998 0.59999999999999998 0.59999999999999998 0.69999999999999996
>>>  [5] 0.69999999999999996 0.69999999999999996 0.69999999999999996 0.69999999999999996
>>>  [9] 0.80000000000000004 0.80000000000000004 0.80000000000000004 0.90000000000000002
>>> [13] 0.90000000000000002 0.90000000000000002 0.90000000000000002 1.00000000000000000
>>> [17] 1.00000000000000000 1.00000000000000000 1.00000000000000000 1.10000000000000009
>>> [21] 1.10000000000000009 1.10000000000000009 1.19999999999999996 1.19999999999999996
>>> [25] 1.19999999999999996 1.19999999999999996 1.30000000000000004 1.30000000000000004
>>> [29] 1.30000000000000004 1.39999999999999991 1.39999999999999991 1.39999999999999991
>>> [33] 1.50000000000000000 1.50000000000000000 1.50000000000000000 1.60000000000000009
>>> [37] 1.60000000000000009 1.69999999999999996 1.69999999999999996 1.69999999999999996
>>> [41] 1.69999999999999996 1.80000000000000004 1.80000000000000004 1.80000000000000004
>>> [45] 1.89999999999999991 1.89999999999999991 2.00000000000000000 2.00000000000000000
>>> [49] 2.00000000000000000 2.00000000000000000 2.00000000000000000 2.10000000000000009
>>>
>>> The simplest solution is to subtract a bit. This also means you don't
>>> need the include.lowest= or right= arguments:
>>>
>>>> f <- cut(dat,
>>> +           breaks= seq(from=.6-.01, to=2.2-.01, by=.2),
>>> +           dig.lab=10L)
>>>> as.matrix(tb <- table(f))
>>>             [,1]
>>> (0.59,0.79]    8
>>> (0.79,0.99]    7
>>> (0.99,1.19]    7
>>> (1.19,1.39]    7
>>> (1.39,1.59]    6
>>> (1.59,1.79]    6
>>> (1.79,1.99]    5
>>> (1.99,2.19]    6
>>>
>>> ----------------------------------------
>>> David L Carlson
>>> Department of Anthropology
>>> Texas A&M University
>>> College Station, TX 77843-4352
>>>
>>
>> --
>> Sent from my phone. Please excuse my brevity.
>>

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'   -Gehm's Corollary to Clarke's Third Law

Re: cut{base}: is it a bug?

David Carlson
In reply to this post by JCFaria
You need to know the number of decimal places reported for the data. I don't know of any straightforward way to compute that from the data.
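
One rough heuristic for guessing it, offered only as a sketch: look for the smallest number of decimals at which rounding leaves every value unchanged. The names guess_decimals, max_decimals and tol are hypothetical, introduced here for illustration and not part of base R. The guess will be too small whenever the sample happens to contain no value that uses the last reported digit.

```r
# Hypothetical helper: smallest d such that round(x, d) reproduces x
# (within a small tolerance, because the stored values are not exact).
guess_decimals <- function(x, max_decimals = 10L,
                           tol = sqrt(.Machine$double.eps)) {
  x <- x[is.finite(x)]
  for (d in 0:max_decimals) {
    if (all(abs(x - round(x, d)) < tol)) return(d)
  }
  NA_integer_  # no match up to max_decimals: treat the data as continuous
}

guess_decimals(c(0.6, 1.2, 2.1))   # 1
guess_decimals(c(0.6, 1.25, 2))    # 2
```

A result of NA would suggest data that were not rounded to a fixed number of decimals, in which case shifting the breaks is not appropriate.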

Given the number of decimals, you can compute "true" limit boundaries. This would be a way to compute the upper and lower boundaries and the number of intervals from the data:

> decimals <- 1
> tlimit <- (10^-decimals)/2
> bks <- pretty(c(dat, max(dat)+tlimit), nclass.Sturges(dat))
> f <- cut(dat, breaks= bks-tlimit, right=FALSE, dig.lab=10L)

You would also need to decide if you want your factor levels to reflect the true boundaries or the stated boundaries:

> levels(f)
[1] "[0.55,0.75)" "[0.75,0.95)" "[0.95,1.15)" "[1.15,1.35)" "[1.35,1.55)"
[6] "[1.55,1.75)" "[1.75,1.95)" "[1.95,2.15)"

Vs.

> lvls <- levels(cut(dat, breaks= bks, right=FALSE, dig.lab=10L))
> levels(f) <- lvls
> levels(f)
[1] "[0.6,0.8)" "[0.8,1)"   "[1,1.2)"   "[1.2,1.4)" "[1.4,1.6)" "[1.6,1.8)"
[7] "[1.8,2)"   "[2,2.2)"  
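
The steps above could be wrapped into one function along these lines. This is a sketch under the assumption that the data were rounded to a known number of decimals; cut_rounded, n_classes and half_unit are hypothetical names, and the body only repeats the pretty() and cut() calls shown above.

```r
# The example data from the original post (52 values, one decimal place).
dat <- rep(c(0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3,
             1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1),
           times = c(3, 5, 3, 4, 4, 3, 4, 3, 3, 3, 2, 4, 3, 2, 5, 1))

# Hypothetical wrapper: bin data rounded to `decimals` places using
# "true" class limits, i.e. stated limits shifted down by half a unit.
cut_rounded <- function(x, decimals, n_classes = nclass.Sturges(x)) {
  half_unit <- 10^(-decimals) / 2
  stated <- pretty(c(x, max(x) + half_unit), n_classes)
  cut(x, breaks = stated - half_unit, right = FALSE, dig.lab = 10L)
}

tb <- table(cut_rounded(dat, decimals = 1))
as.matrix(tb)   # counts 8, 7, 7, 7, 6, 6, 5, 6, as in the book table
```

Because every break now sits half a unit away from any possible data value, the result no longer depends on how 1.2 or 1.8 happen to be stored.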

----------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77843-4352

From: Jose Claudio Faria <[hidden email]>
Sent: Monday, September 24, 2018 2:01 PM
To: David L Carlson <[hidden email]>
Cc: Jeff Newmiller <[hidden email]>; [hidden email]
Subject: Re: [R] cut{base}: is it a bug?

Dear all,

Thank you for your contributions!

However, this function is important in a general-purpose package for frequency distribution tables: fdth (https://cran.r-project.org/web/packages/fdth/index.html).

In this case, when I do not know in advance what the user's data will be, what is the best option to avoid deviations as pronounced as those in the example?

The data used in the example were sent to me by a teacher trying to reproduce a table from a book in class.

Best,



2018-09-24 14:42 GMT-03:00 David L Carlson <[hidden email]>:
Yes, I should have included that point. The cut() function "encourages" exact comparison of values by including the right= argument without a warning that this may create unexpected results. With truly continuous data, values falling exactly on the boundary would be rare.
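
A small illustration of how much right= matters once values sit exactly on a break. The data here are made-up integers (an assumption for the example, chosen so that floating point plays no role):

```r
x   <- c(1, 2, 2, 3)   # two of the four values fall exactly on the break at 2
brk <- 1:3

# Intervals closed on the left: [1,2) and [2,3]
left  <- table(cut(x, breaks = brk, right = FALSE, include.lowest = TRUE))
# Intervals closed on the right: [1,2] and (2,3]
right <- table(cut(x, breaks = brk, right = TRUE,  include.lowest = TRUE))

left    # counts 1 and 3
right   # counts 3 and 1
```

The same four values give opposite tables, which is why boundary values need a deliberate decision rather than a default.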

Most data arrive from instruments that measure to limited precision. Introductory statistics texts deal with this by distinguishing between "true" and "stated" class limits. Others, like Lyman Ott, recommend choosing the starting point of the first interval such that "no measurement falls on a point of division between two intervals."
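
That recommendation can be checked before binning. The following is a sketch only: on_boundary and tol are hypothetical names, and the tolerance of 1e-8 is an arbitrary assumption that should be small relative to the measurement unit.

```r
# Hypothetical check: which values lie on (or within tol of) a break?
on_boundary <- function(x, breaks, tol = 1e-8) {
  hits <- outer(x, breaks, function(a, b) abs(a - b) < tol)
  x[rowSums(hits) > 0]
}

x   <- c(0.6, 0.7, 0.8, 1.2)
brk <- seq(from = 0.6, to = 2.2, by = 0.2)

on_boundary(x, brk)          # 0.6, 0.8 and 1.2 sit on stated limits
on_boundary(x, brk - 0.05)   # numeric(0): true limits avoid every value
```

An empty result means the counts cannot depend on right=, include.lowest= or on how the breaks are rounded in memory.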

----------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77843-4352

-----Original Message-----
From: Jeff Newmiller <[hidden email]>
Sent: Monday, September 24, 2018 10:41 AM
To: [hidden email]; David L Carlson <[hidden email]>; Jose Claudio Faria <[hidden email]>; [hidden email]
Subject: Re: [R] cut{base}: is it a bug?

"Subtracting a bit" only fixes the problem for the test data... it introduces a bias in any continuous data you happen to throw at it. However, if you have data with known rounding applied (e.g. published tabular data) then the subtracting trick can be useful. In general you should not expect floating point fractions to behave like exact values in your analysis.
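
The last point can be seen directly with the breaks from the original post. This is a minimal sketch using only base R; the values it relies on are the 17-digit ones printed elsewhere in this thread.

```r
brk <- seq(from = 0.6, to = 2.2, by = 0.2)

# The fourth break prints as 1.2 but is stored slightly above the literal 1.2.
print(brk[4], digits = 17)
print(1.2, digits = 17)
brk[4] == 1.2                    # FALSE
isTRUE(all.equal(brk[4], 1.2))   # TRUE: equal within a tolerance

# With right = FALSE the bins are [a, b), so 1.2 < brk[4] puts the value
# in the bin that ends at 1.2 rather than the one that starts there.
as.character(cut(1.2, breaks = brk, right = FALSE))   # "[1,1.2)"
```

The same happens at 1.8, which explains the four counts flagged with "!?" in the original post: cut() is comparing the stored doubles faithfully, and the stored doubles are not the decimal fractions they print as.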
