Re: Does anyone.... worth a warning?!? No warning at all

Re: Does anyone.... worth a warning?!? No warning at all

Tom Willems-2
Dear Mathew,

mean is a generic function:

mean(x, ...)

in which x is a data object, such as a data frame, a list, or a numeric vector.

So in your example it only reads the first argument and then reports it.

Try x = c(1,1,2)
mean(x)

kind regards,
Tom



______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Does anyone.... worth a warning?!? No warning at all

Rolf Turner

On 20/08/2007, at 9:54 PM, Tom Willems wrote:

> dear Mathew
>
> mean is a Generic function
>
> mean(x...)
>
> in wich x is a data object, like a  data frame a list a numeric  
> vector...
>
> so in your example it only reads the first character and then  
> reports it.
>
> try x = c(1,1,2)
> mean(x)

I think you've completely missed the point.  I'm sure Mathew now
understands the syntax of the mean function.  His point was that it
would be very easy for someone to use this function incorrectly ---
and he indicated very clearly *why*, by giving an example using max().

If mean() could be made safer to use by incorporating a warning,
without unduly adding to overheads, then it would seem sensible to
incorporate such a warning.  Or to change the mean() function so that
mean(1,2,3) returns ``2'' --- just as max(1,2,3) returns ``3'' --- as
Mathew *initially* (and quite reasonably) expected it to do.
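
To make the difference concrete, here is a minimal sketch of the behaviour under discussion. The wrapper name checked_mean is made up for illustration and is not part of R; it only shows one way such a warning could look, under the assumption that unnamed extra arguments are almost always a mistake.

```r
# mean() averages only its first argument; further positional arguments
# are matched to other parameters of mean.default() (trim, na.rm).
mean(1, 2, 3)       # 1: only x = 1 is averaged
mean(c(1, 2, 3))    # 2: the whole vector is averaged
max(1, 2, 3)        # 3: max() pools all of its arguments

# Hypothetical wrapper (not part of R): warn when extra unnamed
# arguments are supplied, then defer to mean() unchanged.
checked_mean <- function(x, ...) {
  extras <- list(...)
  nms <- names(extras)
  unnamed <- if (is.null(nms)) rep(TRUE, length(extras)) else nms == ""
  if (any(unnamed)) {
    warning("extra unnamed arguments are not averaged; did you mean mean(c(...))?")
  }
  mean(x, ...)
}
```

Named arguments such as na.rm = TRUE pass through silently, so ordinary calls are unaffected.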

                                cheers,

                                        Rolf Turner


Re: Does anyone.... worth a warning?!? No warning at all

Ted.Harding
On 20-Aug-07 19:55:44, Rolf Turner wrote:

> On 20/08/2007, at 9:54 PM, Tom Willems wrote:
>> dear Mathew
>>
>> mean is a Generic function
>>
>> mean(x...)
>>
>> in wich x is a data object, like a  data frame a list
>> a numeric vector...
>>
>> so in your example it only reads the first character
>> and then reports it.
>>
>> try x = c(1,1,2)
>> mean(x)
>
> I think you've completely missed the point. I'm sure Mathew
> now understands the syntax of the mean function. His point
> was that it would be very easy for someone to use this
> function incorrectly --- and he indicated very clearly *why*,
> by giving an example using max().
>
> If mean() could be made safer to use by incorporating a warning,  
> without unduly adding to overheads, then it would seem sensible
> to incorporate such a warning.  Or to change the mean()
> function so that mean(1,2,3) returns ``2'' --- just as
> max(1,2,3) returns ``3'' --- as Mathew *initially* (and quite
> reasonably) expected it to do.
>
> cheers,
> Rolf Turner

I think Rolf makes a very important point. There are a lot of
idiosyncrasies in R, which in time we get used to; but learning
about them is something of a "sociological" exercise, just as
one learns that when one's friend A says "X Y Z" it may not mean
the same as when one's friend B says it.

Another example is in the use of %*% for matrix multiplication
when one or both of the factors is a vector. If you came to R
from matlab/octave, where every vector is already either a row
vector or a column vector, you knew where you stood. But in R
the semantics of the syntax depend on the context in a more
complicated way. In R, x<-c(-1,1) is called a "vector", but it
does not have dimensions:

x<-c(-1,1)
dim(x)
NULL

So its relationship to matrix multiplication is ambiguous.

For example:

M<-matrix(c(1,2,3,4),nrow=2); M
     [,1] [,2]
[1,]    1    3
[2,]    2    4

x%*%M
     [,1] [,2]
[1,]    1    1

and x is now coerced into a "row vector", which (for that
immediate purpose) does have dimensions (just as a row vector
would have in matlab/octave).

Similarly,

M%*%x
     [,1]
[1,]    2
[2,]    2

coerces it into a column vector. But now (asks the beginner who
has not yet got round to looking up ?"%*%") what happens with x%*%x?

Will we get column vector times row vector (a 2x2 matrix) or
row times column (a scalar)? In fact we get the latter:

x%*%x
     [,1]
[1,]    2

All this is in accordance with ?"%*%":

Description:
     Multiplies two matrices, if they are conformable. If one argument
     is a vector, it will be coerced to a either a row or column matrix
     to make the two arguments conformable. If both are vectors it will
     return the inner product.
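
The three cases in that Description can be checked by looking at the dimensions of each result. This is only a small sketch using the same x and M as above:

```r
x <- c(-1, 1)
M <- matrix(c(1, 2, 3, 4), nrow = 2)

dim(x)          # NULL: a plain vector carries no dimensions
dim(x %*% M)    # 1 2: x was treated as a row
dim(M %*% x)    # 2 1: x was treated as a column
dim(x %*% x)    # 1 1: inner product, returned as a 1x1 matrix
```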


But now suppose y<-c(1,2,3), with x<-c(-1,1) as before.

x%*%y
Error in x %*% y : non-conformable arguments

because it is trying to make the inner product of vectors of unequal
length. Whereas someone who had got as far as the second sentence of
the Description, and did not take the third sentence as strictly
literally as intended, might expect that x would be coerced into
a column, and y into a row, so that they were conformable for
multiplication, giving a 2x3 matrix result (perhaps on the grounds
that "it will return the inner product" means that it will do this
if they are conformable, otherwise doing the coercions described
in the second sentence).

That misunderstanding could be avoided if the last sentence read:

"If both are vectors it will return the inner product provided
both are the same length; otherwise it is an error and nothing
is returned."
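
For anyone who does want the 2x3 result, it has to be asked for explicitly. A short sketch, with x and y as above; outer() and an explicit column-times-row product are two standard ways to get it:

```r
x <- c(-1, 1)
y <- c(1, 2, 3)

# Two plain vectors of unequal length: %*% signals an error.
res <- try(x %*% y, silent = TRUE)
inherits(res, "try-error")      # TRUE

# Asking for the 2x3 outer product explicitly:
outer(x, y)
matrix(x) %*% t(y)              # column (2x1) times row (1x3)
```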

Or perhaps x or y should not be called "vector" -- in linear
algebra people are used to "vector" being another name for
a 1-dimensional matrix, being either "row" or "column".
The R entity is not that sort of thing at all. The closest
that "R Language Definition" comes to defining it is:

"Vectors can be thought of as contiguous cells containing
homogeneous data. Cells are accessed through indexing
operations such as x[5]."

x<-matrix(x) will, of course, turn x into a paid-up column vector
(as you might guess from ?matrix, if you copy the "byrow=FALSE"
from the "as.matrix" explanation to the "matrix" explanation;
though in fact that is irrelevant, since e.g. "byrow=TRUE" has
no effect in matrix() -- so in fact there is no specification
in ?matrix as to whether to expect a row or column result).
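
The shapes are easy to confirm at the console. A small sketch of what matrix() and t() do to a plain vector:

```r
x <- c(-1, 1)

dim(matrix(x))                # 2 1: a column, by default
dim(matrix(x, byrow = TRUE))  # 2 1: byrow makes no difference here
dim(matrix(x, nrow = 1))      # 1 2: a row has to be requested
dim(t(x))                     # 1 2: t() treats the vector as a column
```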

Just a few thoughts. As I say we all get used to this stuff in
the end, but it can be bewildering (and a trap) for beginners.

Best wishes to all,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <[hidden email]>
Fax-to-email: +44 (0)870 094 0861
Date: 20-Aug-07                                       Time: 22:11:43
------------------------------ XFMail ------------------------------


Re: Does anyone.... worth a warning?!? No warning at all

François Pinard
[Ted Harding]

> [...] a very important point.  [...] There are a lot of idiosyncracies
> in R, which in time we get used to; but learning about them is
> something of a "sociological" exercise, just as one learns that when
> one's friend A says "X Y Z" is may not mean the same as when one's
> friend B says it.  [...] Another example is in the use of %*% for
> matrix multiplication when one or both of the factors is a vector.  
> [...] Just a few thoughts.  As I say we all get used to this stuff in
> the end, but it can be bewildering (and a trap) for beginners.

Using R is a bit akin to smoking.  Beginnings are difficult, one may get
headaches, and even gag on the first experiences.  But in the long run,
it becomes pleasurable, and even addictive.  Yet, deep down, for those
willing to be honest, there is something not fully healthy in it.

While I appreciate many of the virtues of R, as a language, it has a few
flaws.  Besides, as a library, and despite many commendable symmetries
and beauties, it sometimes suffers from irregularities in its various
specifications and offerings -- likely for historical reasons -- maybe
lack of coordination while aging, or maybe needs of S compatibility.

These irregularities are sometimes documented clearly, yet in many
cases, exegesis is required.  Moreover, around documentation, there is
a question of attitude.  While some R maintainers are refreshingly
open-minded, others are strongly reluctant to reconsider anything which
has been written, as if the mere fact of documenting a detail was fixing
it in the universe and eternity; they would then argue to death against
the slightest changes.  In a word, because they are almost impossible to
repair in practice, R idiosyncrasies are likely to stay.

Accepting them (idiosyncrasies, irregularities) is part of the game.  
Correcting them a tiny bit at a time (like, for example, the "mean"
behaviour at the origin of this thread) might overall take forever and
shake myriads of electrons within tons of discussions.  I'm not sure it
is a worthwhile undertaking.  For one, I prefer learning to be productive
with R as it stands, even knowing it could have been a bit better.

--
François Pinard   http://pinard.progiciels-bpi.ca


Re: Does anyone.... worth a warning?!? No warning at all

Mike Meredith
In reply to this post by Tom Willems-2
It's always seemed to me that 'mean' behaved as expected, and 'max' et al were peculiar. If you passed 2 or more vectors to a function would you really expect it to concatenate them before doing its proper job? I'd rather expect it to behave like 'pmax' and compare them element by element.

Maybe 'max' should generate a warning.
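
The two behaviours side by side, as a minimal sketch with two made-up vectors a and b:

```r
a <- c(1, 5)
b <- c(3, 2)

max(a, b)    # 5: all arguments are pooled, then one maximum is taken
pmax(a, b)   # 3 5: element-by-element ("parallel") maximum
```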

Cheers,  Mike





Vectors in R (WAS Re: Does anyone.... worth a warning?!? No warning at all)

Stephen Tucker
In reply to this post by Ted.Harding

Dear Ted (and community),

You raise a very interesting point - namely, what should and should
not be called a "vector" in R (it's neither a class nor a mode,
formally). I don't know which version of the R Language Definition you
were quoting from, but mine (Version 2.5.1 DRAFT), says:

"Vectors can be thought of as contiguous cells containing data."

(doesn't say "homogeneous" in the version that I have). In that sense
it's more analogous to 'lists' in Python, Scheme, etc. (with the
additional benefit that the "names" attribute for R vectors allows you
to use them also as 'dictionaries' or 'hash tables'), and less like
the 1-D array used in mathematics. (Incidentally, the "array" class in
Python is like the "matrix" and "array" classes in R, which do require
specification of row or column).

In any case, the quote above is more consistent with my understanding
of the basic data objects in R, as "atomic vectors" and "lists" are
both "contiguous cells containing data", only that they differ in the
value of their "mode" attributes. I think it can be a bit confusing
when they are introduced separately (e.g., in the R Language
Definition document with headings, "Vectors" and "Lists" in section
2.1) - though I think its origin lies in the pedagogy of the
language. For instance, introductory documents often show off R as a
calculator and draw the analogy between the "vector" notation used in
mathematics and the application of "+"() [as an operator rather than a
function] on a pair of numeric vectors in R. This is probably due to
the background of the audience these documents are intended to address
(Python/Scheme, perhaps more computer science; R/S, more statistics or
mathematics perhaps). I think this is a bit unfortunate as students
can get stuck with the idea that there are (atomic) vectors, and then
another thing called a "list" - and then later they are told that a
list is a vector as well, and have to reconcile this new bit of
information - while conceptually the two are similar except that a
certain set of functions (e.g., the arithmetic operators and string
functions) cannot be applied to vectors of mode "list", but many other
functions (e.g., extraction, subsetting, replacement) can be applied
in the same way.
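
A small sketch of that similarity and of where it stops. The objects v and l are made up for illustration:

```r
v <- c(a = 1, b = 2)           # atomic vector
l <- list(a = 1, b = "two")    # list, i.e. a vector of mode "list"

is.vector(v)   # TRUE
is.vector(l)   # TRUE
mode(v)        # "numeric"
mode(l)        # "list"

# Extraction by name works the same way on both.
v["b"]
l["b"]
l[["b"]]

# Arithmetic is where they part ways: it is an error for a list.
res <- try(l + 1, silent = TRUE)
inherits(res, "try-error")     # TRUE
```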

This article was very elucidating:

Statistical programming with R, Part 3: Reusable and object-oriented
programming
<http://www.ibm.com/developerworks/linux/library/l-r3.html>

In it, David Mertz says:

'The main thing to keep in mind about R data is that "everything is a
vector." Even objects that look superficially distinct from vectors --
matrices, arrays, data.frames, etc. -- are really just vectors with
extra (mutable) attributes that tell [generic functions in] R to treat
them in special ways.'

So matrices, arrays, lists, data frames, (and even factors) are all
vectors (used henceforth in the sense of "contiguous cells" as are
lists in Python/Scheme), with additional attributes attached. When
these attributes are removed, print() shows them to us as 1-D objects
(a sequence of values; not necessarily a 1-D row or column matrix).
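
Stripping the attributes by hand shows what is underneath. A minimal sketch, with made-up objects m and f:

```r
m <- matrix(1:6, nrow = 2)
flat <- m
dim(flat) <- NULL            # drop the "dim" attribute
identical(flat, 1:6)         # TRUE: just the underlying integers

f <- factor(c("b", "a", "b"))
codes <- f
attributes(codes) <- NULL    # drop "levels" and "class"
codes                        # 2 1 2: the integer codes

df <- data.frame(a = 1:2, b = c("x", "y"))
is.list(unclass(df))         # TRUE: a list of columns
```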

One defining attribute besides "mode" and "length" is the "class"
attribute, which determines the dispatch method for a generic
function. For instance, the "["() and "[<-"() functions allow N-D
subscripting notation for the "matrix", "array", and "data.frame"
classes, but these objects are also still vectors ("contiguous cells")
and can therefore be subscripted as stated: "cells are accessed through
indexing operations such as x[5]."
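
Both styles of subscripting can be tried on the same object. A short sketch with a made-up matrix m and data frame df:

```r
m <- matrix(1:6, nrow = 2)
m[1, 3]    # 5: N-D subscripting via the dim attribute
m[5]       # 5: the same cell, reached by plain vector indexing

df <- data.frame(a = 1:5, c = 11:15)
length(df) # 2: as a vector, a data frame has one cell per column
df[[2]]    # 11 12 13 14 15: the second cell itself
df[2]      # a one-column data frame: a sub-vector keeps the class
```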

This is important in that it allows one to use many functions not
immediately thought of as applicable to a data frame (which is a list,
which is a vector, etc.
<http://tolstoy.newcastle.edu.au/R/help/00b/2390.html>); for me that
would be functions like append(), replace(), etc. For example:

> df <- data.frame(a=1:5,c=11:15,d=16:20)
> append(df,list(b=6:10),1)
$a
[1] 1 2 3 4 5

$b
[1]  6  7  8  9 10

$c
[1] 11 12 13 14 15

$d
[1] 16 17 18 19 20

> replace(df,c(FALSE,TRUE,FALSE),list(b=21:25))
  a  c  d
1 1 21 16
2 2 22 17
3 3 23 18
4 4 24 19
5 5 25 20

append() returns a "list" because c() is invoked internally, and this
removes all extra attributes except names (including "class",
"row.names", etc.). So, retaining the intrinsic mode "list", the
append function returns a class "list" object by default ['If the
object does not have a class attribute, it has an implicit class,
"matrix", "array" or the result of mode(x)', says ?class] when applied
to a data frame.

On the other hand, replace() still returns a data frame because only
"[<-.data.frame"() is invoked so the returned object retains the class
of "data.frame".
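
The classes of the two results can be checked directly, and the list from append() can be turned back into a data frame. A sketch reusing the df from the example; the names appended, replaced and restored are made up here:

```r
df <- data.frame(a = 1:5, c = 11:15, d = 16:20)

appended <- append(df, list(b = 6:10), 1)
class(appended)    # "list": c() dropped the data frame attributes

replaced <- replace(df, c(FALSE, TRUE, FALSE), list(b = 21:25))
class(replaced)    # "data.frame": only the replacement method ran

# The columns are all still there, so the class can be restored.
restored <- as.data.frame(appended)
names(restored)    # "a" "b" "c" "d"
```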

Even factors, which fail the is.vector() test, are actually vectors
(IMHO). The R Language definition says,

"Factors are currently implemented using an integer array [which is a
vector] to specify the actual levels and a second array of names [in
the "levels" attribute] that are mapped to the integers."

As an example, the following behavior is also predictable in that if
we know how each function modifies the attributes, we can predict what
class of object is returned:

> f <- factor(letters[1:5])
> append(f,factor(letters[6:10]),2)
 [1] 1 2 1 2 3 4 5 3 4 5
> replace(f,c(FALSE,FALSE,TRUE,TRUE,FALSE),NA)
[1] a    b    <NA> <NA> e  
Levels: a b c d e

And for these cases also:
> append(factor(f,levels=letters[1:10]),
+        factor(letters[6:10],levels=letters[1:10]),2)
 [1]  1  2  6  7  8  9 10  3  4  5
> replace(factor(f,levels=letters[1:10]),c(FALSE,FALSE,TRUE,TRUE,FALSE),"g")
[1] a b g g e
Levels: a b c d e f g h i j
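
The integer results from append() above depend on how c() treats factors, and I believe newer R releases (4.1.0 onward, as far as I know) give c() a factor method, so the transcript may not reproduce exactly there. A sketch that sidesteps the question by going through character vectors; the names g and combined are made up for illustration:

```r
f <- factor(letters[1:5])
g <- factor(letters[6:10])

# Combine the labels rather than the integer codes, then rebuild
# the factor with an explicit, shared set of levels.
combined <- factor(
  append(as.character(f), as.character(g), 2),
  levels = letters[1:10]
)
as.character(combined)   # "a" "b" "f" "g" "h" "i" "j" "c" "d" "e"
```

This gives a factor regardless of what c() does with the attributes.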

I understand S4 classes were introduced in S partly because in S3 the
"class" assignment doesn't necessarily raise an error if it isn't
consistent with the rest of the attributes, but then may yield
surprising results (or an error) when you pass that object to
functions that require access to those attributes.
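
A minimal sketch of that contrast, as I understand it. The S4 class name Point1D is made up for illustration:

```r
# S3: the class attribute is just a label; nothing checks that the
# object underneath has the structure the label promises.
obj <- structure(1:3, class = "data.frame")   # accepted without complaint
inherits(obj, "data.frame")   # TRUE
is.list(obj)                  # FALSE: not what data frame methods expect

# S4: slots are declared with types, and new() checks them.
setClass("Point1D", representation(x = "numeric"))
ok  <- new("Point1D", x = 1)
bad <- try(new("Point1D", x = "a"), silent = TRUE)
inherits(bad, "try-error")    # TRUE: the inconsistent object is rejected
```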

I suppose the reason I'm throwing this out there is that for a while I
wasn't sure (1) which functions could be invoked on which object
classes and (2) the class of object returned from each function (which
depends on the class of its argument) without reading the
documentation several times over; this also made explaining the
behavior of functions to colleagues and students learning R very tough
(clearly, my own shortcoming). But seeing everything as vectors
(again, in the sense of "contiguous cells") with mutable attributes,
made everything more transparent - that if a specific method does
not exist for, say a "data.frame" object, you can still call a
function on it if you treat the data frame as a heterogeneous vector
consisting of identical-length atomic vectors, and the structure of
the output is less unpredictable to me if I can figure out which
attributes are potentially modified in the returned object.

I wonder if anyone has additional thoughts on this.

Stephen

P.S. I agree that R/S does have its own peculiarities, but I think having
them is not unique to R at all! But then I suppose the question turns to
addressing  severity rather than the presence/absence of them...






       