Re: [R] jitter-bug? problematic behaviour of the jitter function


Rui Barradas
Hello,

R 4.0.2 on Ubuntu 20.04; sessionInfo() output is below.
This came up on r-help. I'm replying to the OP and also posting to
r-devel, since I believe it is more appropriate there.

I can confirm this. The first and last calls below are the OP's original
examples; the two in between show that the problem also appears with a
smaller outlier.


set.seed(2020)

jitter(c(1,2,10^4))  # desired behaviour
#[1]     1.058761     1.957690 10000.047401

jitter(c(0,1,10^4))  # bad behaviour
#[1]   -92.43546 -1454.61126  8269.53754

jitter(c(-1,0,10^4))  # bad behaviour
#[1] -1484.3895  -427.5283  8010.3308

jitter(c(1,2,10^5))  # bad behaviour
#[1]   4809.238  10578.561 109753.430
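
A possible explanation, offered only as a sketch: the help page says the smallest difference is taken between unique values "apart from fuzz". Assuming that fuzz removal amounts to rounding x to 3 - floor(log10(z)) digits, where z is the range of x, then once z reaches 10^4 the values 1 and 2 both round to 0 and the smallest remaining gap is the distance to the outlier. The rounding rule and the helper guess_default_amount() are assumptions made for illustration; they are not taken from the help page.

```r
# Sketch of how the default amount might be derived. The rounding rule is an
# assumption, and guess_default_amount() is a hypothetical helper, not part
# of R. It expects at least two distinct finite values in x.
guess_default_amount <- function(x, factor = 1) {
  z <- diff(range(x))                     # overall spread of the data
  digits <- 3 - floor(log10(z))           # assumed rounding precision ("fuzz")
  xx <- unique(sort(round(x, digits)))    # values that survive the rounding
  d <- min(diff(xx))                      # smallest gap between them
  factor * d / 5
}

guess_default_amount(c(1, 2, 10^4))   # digits =  0: 1 and 2 stay distinct
guess_default_amount(c(1, 2, 10^5))   # digits = -1: 1 and 2 both round to 0
```

Under this assumption the first call gives 0.2 and the second 20000, which is consistent with the size of the noise in the outputs above.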


To the OP: I am cc-ing this to [hidden email].
Questions like this are about R itself and should be posted there.
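
In the meantime, when jitter is used to break ties, one way around the problem is to pass amount explicitly instead of relying on the default. The sketch below computes the smallest gap from the raw values, with no fuzz handling at all, so values that differ only by floating-point error would produce a tiny amount. jitter_min_gap() is a hypothetical helper name, not an existing function.

```r
# Hypothetical helper: jitter with an amount computed directly from the data,
# bypassing the default amount selection.
jitter_min_gap <- function(x, factor = 1) {
  gaps <- diff(sort(unique(x)))
  # Fewer than two distinct values: nothing to measure, use the default.
  if (length(gaps) == 0L) return(jitter(x, factor = factor))
  # When amount is given and positive, jitter() adds runif(n, -amount, amount).
  jitter(x, amount = factor * min(gaps) / 5)
}

set.seed(2020)
x <- c(1, 2, 10^5)
max(abs(jitter_min_gap(x) - x))   # never exceeds 0.2 for this x
```

Because the noise is bounded by a fifth of the smallest gap, the jittered values keep the ordering of the distinct inputs.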


sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=pt_PT.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=pt_PT.UTF-8        LC_COLLATE=pt_PT.UTF-8
 [5] LC_MONETARY=pt_PT.UTF-8    LC_MESSAGES=pt_PT.UTF-8
 [7] LC_PAPER=pt_PT.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=pt_PT.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.0.2


Hope this helps,

Rui Barradas

At 11:32 on 23/09/20, Martin Keller-Ressel wrote:

> Dear all,
>
> I have noticed some strange behaviour in the "jitter" function in R.
> On the help page for jitter it is stated that
>
> "The result, say r, is r <- x + runif(n, -a, a) where n <- length(x) and a is the amount argument (if specified)."
>
> and
>
> "If amount is NULL (default), we set a <- factor * d/5 where d is the smallest difference between adjacent unique (apart from fuzz) x values."
>
> This works fine as long as there is no (very) large outlier
>
>> jitter(c(1,2,10^4))  # desired behaviour
> [1]    1.083243    1.851571 9999.942716
>
> But for very large outliers the added noise suddenly 'jumps' to a much larger scale:
>
>> jitter(c(1,2,10^5)) # bad behaviour
> [1] -19535.649   9578.702 115693.854
> # Noise should be of order (2-1)/5  = 0.2 but is of much larger order.
>
> This probably does not matter much when jitter is used for plotting, but it can cause problems when jitter is used to break ties.
>
> best regards,
> Martin
>
> --------------------------------
> Martin Keller-Ressel
> Professor für Stochastische Analysis und Finanzmathematik
> Technische Universität Dresden
> Institut für Mathematische Stochastik
> Willersbau B 316, Zellescher Weg 12-14
> 01062 Dresden
> --------------------------------
>
