Relevel confusing with numeric value

Emil
Something that bit me:
The function relevel takes a factor and a reference level to be promoted to the first position.
If “ref” is a character string, the level with that name is promoted; if it is numeric, the “ref”-th level is promoted.
This turns out to be very confusing for a factor whose levels are numbers (e.g. when reading in a CSV with some dirty numeric columns and stringsAsFactors = TRUE).
For example:

set.seed(1)
test <- data.frame(n=sample(c(1:100, letters[1:10]), size=90), stringsAsFactors=TRUE)
test$n <- relevel(test$n, 50)
print(levels(test$n))

gives “62” as the first level.
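A smaller, deterministic sketch of the two behaviours. The factor f is made up for illustration; its levels are given explicitly in the order a string sort (as used by stringsAsFactors) would produce, so the result does not depend on the random seed.

```r
# Levels in string-sort order: "1" < "10" < "2"
f <- factor(c("1", "2", "10"), levels = c("1", "10", "2"))

# Character ref: promotes the level *named* "2"
levels(relevel(f, ref = "2"))  # "2"  "1"  "10"

# Numeric ref: promotes the *second* level, which is "10"
levels(relevel(f, ref = 2))    # "10" "1"  "2"
```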

Could we make something like this an error, or at least issue a warning?
This is all the more confusing because some other functions do coerce automatically: factor(…, levels=1:100) and levels(test$n) <- 1:100 both work fine.
Perhaps the most confusing case: relevel(factor(1:10, levels = -10:20), 15) gives “4” as the first level.
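The -10:20 case, next to the obvious workaround of passing ref as a character string. This is a sketch; ref_value is a made-up variable standing in for a number taken from the data.

```r
# factor() coerces the numeric levels -10:20 to the strings "-10", ..., "20"
f <- factor(1:10, levels = -10:20)

# Numeric ref: the 15th level, which is "4"
levels(relevel(f, ref = 15))[1]    # "4"

# Character ref: the level named "15"
levels(relevel(f, ref = "15"))[1]  # "15"

# Converting explicitly removes the ambiguity when ref comes from numeric data
ref_value <- 15
levels(relevel(f, ref = as.character(ref_value)))[1]  # "15"
```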

For now I’ve thought of two possible implementations, either of which could be inserted in stats::relevel.factor() just before the is.character(ref) check:

if(is.numeric(ref) && ref %in% lev)
    warning('Provided numeric reference, note that this will promote level number ', ref, ', not the level with value "', ref, '"!')

or

if(is.numeric(ref) && any(!is.na(suppressWarnings(as.numeric(lev)))))
    warning('Provided numeric reference, note that this will promote level number ', ref, ', not the level with value "', ref, '"!')
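Without changing stats, the second check can be tried out as a user-level wrapper. This is a hypothetical sketch, not an existing R function: relevel_checked() is an illustrative name, and it only warns before delegating to relevel() unchanged.

```r
# Hypothetical wrapper (not part of R): warn when a numeric 'ref' is used on a
# factor whose levels themselves look like numbers, then call relevel() as usual.
relevel_checked <- function(x, ref, ...) {
  lev <- levels(x)
  # TRUE for each level that parses as a number, e.g. "15" or "-3"
  looks_numeric <- !is.na(suppressWarnings(as.numeric(lev)))
  if (is.numeric(ref) && any(looks_numeric)) {
    warning("numeric 'ref' promotes level number ", ref,
            " (\"", lev[ref], "\"), not the level named \"", ref, "\"")
  }
  relevel(x, ref = ref, ...)
}
```

For example, relevel_checked(factor(1:10, levels = -10:20), 15) still returns a factor with “4” first but now warns, while a character ref such as "15" passes through silently.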


Best regards,
Emil Bode

Data-analyst

+31 6 43 83 89 33
[hidden email]

DANS: Netherlands Institute for Permanent Access to Digital Research Resources
Anna van Saksenlaan 51 | 2593 HW Den Haag | +31 70 349 44 50 | [hidden email] | dans.knaw.nl
DANS is an institute of the Dutch Academy KNAW (http://knaw.nl/nl) and funding organisation NWO (http://www.nwo.nl/).

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: Relevel confusing with numeric value

Peter Dalgaard
In a word, no. It is behaving as documented and adding a warning would just confuse others who have been using the feature as intended.

This belongs in the same bin as "as.integer(f) vs. as.integer(as.character(f))" and "x[f] vs. x[as.character(f)]".
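Those two pitfalls side by side, as a small sketch on made-up data (f and x are illustrative only). Both conversion and indexing use a factor's internal integer codes unless it is converted with as.character() first.

```r
# Labels that look like numbers; the levels sort as "10", "30"
f <- factor(c("30", "10"))

as.integer(f)                 # 2 1   : the internal codes
as.integer(as.character(f))   # 30 10 : the labels themselves

x <- c("10" = "ten", "20" = "twenty", "30" = "thirty")
x[f]                # indexes by code, elements 2 and 1: "twenty" "ten"
x[as.character(f)]  # indexes by name: "thirty" "ten"
```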

-pd


> On 2 Oct 2018, at 17:18, Emil Bode <[hidden email]> wrote:

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email]  Priv: [hidden email]