mean

mean

Lipatz Jean-Luc
Hello,

Is there a reason for the following behaviour?
> mean(c("1","2","3"))
[1] NA
Warning message:
In mean.default(c("1", "2", "3")) :
  argument is not numeric or logical: returning NA

But:
> var(c("1","2","3"))
[1] 1

And also:
> median(c("1","2","3"))
[1] "2"

But:
> quantile(c("1","2","3"),p=.5)
Error in (1 - h) * qs[i] :
  non-numeric argument to binary operator

This looks like a lack of symmetry.
Best regards.


Jean-Luc LIPATZ
Insee - Direction générale
Head of coordination for R development and the implementation of alternatives to SAS

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: mean

R devel mailing list

> On Jan 9, 2020, at 7:40 AM, Lipatz Jean-Luc <[hidden email]> wrote:
>
> Hello,
>
> Is there a reason for the following behaviour?
>> mean(c("1","2","3"))
> [1] NA
> Warning message:
> In mean.default(c("1", "2", "3")) :
>  argument is not numeric or logical: returning NA
>
> But:
>> var(c("1","2","3"))
> [1] 1
>
> And also:
>> median(c("1","2","3"))
> [1] "2"
>
> But:
>> quantile(c("1","2","3"),p=.5)
> Error in (1 - h) * qs[i] :
>  non-numeric argument to binary operator
>
> This looks like a lack of symmetry.
> Best regards.
>
>
> Jean-Luc LIPATZ
> Insee - Direction générale
> Head of coordination for R development and the implementation of alternatives to SAS


Hi,

It appears that the checks for whether the input vector is numeric differ across these functions, whether by design or through inconsistent implementations, perhaps by different authors over time.

A further inconsistency is for median(), where:

> median(c("1", "2", "3", "4"))
[1] NA
Warning message:
In mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]) :
  argument is not numeric or logical: returning NA

This happens because there are 4 elements rather than 3: when the input vector has an even number of elements, the code passes the two middle values to mean():

    if (n%%2L == 1L)
        sort(x, partial = half)[half]
    else mean(sort(x, partial = half + 0L:1L)[half + 0L:1L])


Similarly:

> median(factor(c("1", "2", "3")))
Error in median.default(factor(c("1", "2", "3"))) : need numeric data

because the input vector is a factor, rather than character, and the initial check has:

    if (is.factor(x) || is.data.frame(x))
        stop("need numeric data")
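
The two fragments above can be put together in a simplified stand-in to show how the result depends on the length and type of the input. This is only a sketch: `median_sketch` is a hypothetical name, it uses a full sort instead of the partial sort in the quoted code, and it assumes the input contains no missing values.

```r
# Hypothetical simplified stand-in for the logic quoted above;
# this is NOT the actual median.default source.
# Assumes x contains no NA values.
median_sketch <- function(x) {
  # Factors and data frames are rejected up front.
  if (is.factor(x) || is.data.frame(x))
    stop("need numeric data")
  n <- length(x)
  if (n == 0L) return(x[NA_integer_])
  half <- (n + 1L) %/% 2L
  sorted <- sort(x)  # full sort for clarity; the quoted code sorts partially
  if (n %% 2L == 1L) {
    # Odd length: the middle element is returned as is, whatever its type.
    sorted[half]
  } else {
    # Even length: the two middle elements go through mean(),
    # which only accepts numeric or logical input.
    mean(sorted[half + 0L:1L])
  }
}

median_sketch(c("1", "2", "3"))                         # odd length, character
suppressWarnings(median_sketch(c("1", "2", "3", "4")))  # even length, character
median_sketch(c(1, 2, 3, 4))                            # even length, numeric
```

With character input, only the odd-length call can return one of the original values; the even-length call depends on what mean() does with strings.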


Regards,

Marc Schwartz


Re: mean

R devel mailing list
Jean-Luc,

Please keep the communications on the list, for the benefit of others, now and in the future, via the list archive. I am adding r-devel back here.

I can't speak to the rationale in some of these cases. As I noted, it is likely due to differing authors over time, and there may have been relevant use cases at the time the code was written that led to the various checks. Presumably, those additional checks were never carried over to the other functions, which would have enforced some level of consistency.

We will need to wait for someone from R Core to comment.

Regards,

Marc

> On Jan 9, 2020, at 8:34 AM, Lipatz Jean-Luc <[hidden email]> wrote:
>
> OK, inconsistencies.
>
> The last test you wrote is a bit strange. I agree that it is useful to warn about a computation that makes no sense in the case of factors. But why test for data frames? If you go that way with arbitrary structures, you can also try:
>
>> median(list(1,2),list(3,4),list(4,5))
> Error in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]) :
>  argument is not interpretable as logical
> In addition: Warning message:
> In if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]) :
>  the condition has length > 1 and only the first element will be used
>
> which gives a message that, despite its length, doesn't really explain the reason for the error.
>
> Why not a test on the argument, like this?
>  if (!is.numeric(x))
>          stop("need numeric data")
>

Re: mean

Peter Dalgaard-2
I think median() behaves as designed: as long as the argument can be ordered, the "middle observation" makes sense, except when the middle falls between two categories and you can't define an average of the two candidates for the median.

The "sick man" would seem to be var(). Notice that it is also inconsistent with cov():

> cov(c("1","2","3","4"),c("1","2","3","4") )
Error in cov(c("1", "2", "3", "4"), c("1", "2", "3", "4")) :
  is.numeric(x) || is.logical(x) is not TRUE
> var(c("1","2","3","4"),c("1","2","3","4") )
[1] 1.666667
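
The value 1.666667 is the variance of the numbers 1 to 4, which suggests that var() converts the strings to numbers where cov() refuses them. A small sketch that makes the conversion explicit, so that both functions see numeric input (the variable names are illustrative only):

```r
x_chr <- c("1", "2", "3", "4")
x_num <- as.numeric(x_chr)   # explicit conversion instead of relying on coercion

var(x_num, x_num)
cov(x_num, x_num)

# Both calls now receive numeric input and return the same value.
stopifnot(isTRUE(all.equal(var(x_num, x_num), cov(x_num, x_num))))
```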

-pd


> On 9 Jan 2020, at 14:49 , Marc Schwartz via R-devel <[hidden email]> wrote:
>
> Jean-Luc,
>
> Please keep the communications on the list, for the benefit of others, now and in the future, via the list archive. I am adding r-devel back here.
>
> I can't speak to the rationale in some of these cases. As I noted, it may be (is likely) due to differing authors over time, and there may have been relevant use cases at the time that the code was written, resulting in the various checks. Presumably, the additional checks were not incorporated into the other functions to enforce a level of consistency.
>
> We will need to wait for someone from R Core to comment.
>
> Regards,
>
> Marc

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email]  Priv: [hidden email]


Re: mean

R devel mailing list
Peter,

Thanks for the reply.

If that were the case, then should not the following be allowed to work with ordered factors?

> median(factor(c("1", "2", "3"), ordered = TRUE))
Error in median.default(factor(c("1", "2", "3"), ordered = TRUE)) :
  need numeric data

At least on the surface, if you can lexically order a character vector:

> median(c("red", "blue", "green"))
[1] "green"

you can also order a factor, or ordered factor, and if the number of elements is odd, return a median value.
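
A rough sketch of what that could look like for an ordered factor. The helper name `median_ordered_sketch` is hypothetical and not part of base R, and the choice to stop when the two middle values differ is an assumption made for illustration:

```r
# Hypothetical helper, not part of base R: median of an ordered factor.
median_ordered_sketch <- function(x) {
  stopifnot(is.ordered(x))
  x <- x[!is.na(x)]
  n <- length(x)
  if (n == 0L) stop("no non-missing values")
  sorted <- sort(x)              # sorts by level order, not alphabetically
  half <- (n + 1L) %/% 2L
  # Odd length: the middle observation is well defined.
  if (n %% 2L == 1L) return(sorted[half])
  mids <- sorted[half + 0L:1L]
  # Even length: only defined when both middle values are the same level.
  if (mids[1L] == mids[2L]) return(mids[1L])
  stop("median falls between two levels")
}

sizes <- factor(c("large", "small", "medium"),
                levels = c("small", "medium", "large"), ordered = TRUE)
median_ordered_sketch(sizes)
```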

Regards,

Marc


> On Jan 9, 2020, at 10:46 AM, peter dalgaard <[hidden email]> wrote:
>
> I think median() behaves as designed: as long as the argument can be ordered, the "middle observation" makes sense, except when the middle falls between two categories and you can't define an average of the two candidates for the median.
>
> The "sick man" would seem to be var(). Notice that it is also inconsistent with cov():
>
>> cov(c("1","2","3","4"),c("1","2","3","4") )
> Error in cov(c("1", "2", "3", "4"), c("1", "2", "3", "4")) :
>  is.numeric(x) || is.logical(x) is not TRUE
>> var(c("1","2","3","4"),c("1","2","3","4") )
> [1] 1.666667
>
> -pd

Re: mean

S Ellison-2
In reply to this post by Lipatz Jean-Luc
Note that in

> > quantile(c("1","2","3"),p=.5)
> Error in (1 - h) * qs[i] :
>  non-numeric argument to binary operator
the default quantile type (7) does not work for non-numerics.

Quantile types 1 and 3 work as expected:

> quantile(c("1","2","3"),p=.5, type=1)
50%
"2"
> quantile(c("1","2","3"),p=.5, type=3)
50%
"2"
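
One way to see why some types can work on character data: a rule that only picks an order statistic never does arithmetic on the values, while an interpolating rule has to multiply and add them, as the `(1 - h) * qs[i]` in the error message shows. A rough sketch in the spirit of type 1, using a hypothetical helper (`quantile_pick_sketch`) rather than the real quantile() internals:

```r
# Hypothetical helper, not the base R implementation: pick the order
# statistic at position ceiling(n * p), with no arithmetic on the values.
# Assumes x contains no NA values and p is a single probability.
quantile_pick_sketch <- function(x, p) {
  stopifnot(length(p) == 1L, p >= 0, p <= 1)
  sorted <- sort(x)
  n <- length(sorted)
  idx <- max(1L, ceiling(n * p))
  sorted[idx]
}

quantile_pick_sketch(c("1", "2", "3"), 0.5)    # character input
quantile_pick_sketch(c(10, 20, 30, 40), 0.5)   # numeric input

# An interpolating rule needs something like (1 - h) * lower + h * upper,
# which cannot be evaluated on character values.
```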


Steve E


