Aggregate behaviour inconsistent (?) when FUN=table

Alain Guillet-2
Dear R users,

When I use aggregate with table as FUN, I get what I would call strange
behaviour when numeric vectors are involved and one of their values
("levels") does not occur within every level of the "by" variable:

---------------------------

 > df <- data.frame(A=c(1,1,1,1,0,0,0,0),B=c(1,0,1,0,0,0,1,0),C=c(1,0,1,0,0,1,1,1))
 > aggregate(df[1:2],list(df$C),table,simplify = TRUE,drop=TRUE)
   Group.1 A.0 A.1    B
1       0   1   2    3
2       1   3   2 2, 3

 > table(df$C,df$B)

     0 1
   0 3 0
   1 2 3

---------------

As you can see, a comma appears in column B of the aggregate output,
whereas a direct call to table gives the same result as if B had been
defined as a factor (I suppose this comes from the fact that "non-factor
arguments a are coerced via factor", according to the Details section of
the table help). I find this completely normal when I remember that
aggregate first splits the data into subsets and then computes the table
on each subset. But then I don't understand why it works differently with
character vectors. Indeed, if I use character vectors, I get the same
result as with factors:

------------------------

 > df <- data.frame(A=factor(c("1","1","1","1","0","0","0","0")),B=factor(c("1","0","1","0","0","0","1","0")),C=factor(c("1","0","1","0","0","1","1","1")))
 > aggregate(df[1:2],list(df$C),table,simplify = TRUE,drop=TRUE)
   Group.1 A.0 A.1 B.0 B.1
1       0   1   2   3   0
2       1   3   2   2   3

 > df <- data.frame(A=factor(c(1,1,1,1,0,0,0,0)),B=factor(c(1,0,1,0,0,0,1,0)),C=factor(c(1,0,1,0,0,1,1,1)))
 > aggregate(df[1:2],list(df$C),table,simplify = TRUE,drop=TRUE)
   Group.1 A.0 A.1 B.0 B.1
1       0   1   2   3   0
2       1   3   2   2   3

---------------------

Would it be possible to document this behaviour more precisely in the
aggregate help, since the result does not quite match what the table help
leads one to expect? Or would it be possible to get the same results
regardless of the vector type? This post was rejected on the R-devel
mailing list, so I am asking my question here as suggested.


Best regards,
Alain Guillet

--
Alain Guillet
Statistician and Computer Scientist

SMCS - IMMAQ - Université catholique de Louvain
http://www.uclouvain.be/smcs

Bureau c.316
Voie du Roman Pays, 20 (bte L1.04.01)
B-1348 Louvain-la-Neuve
Belgium

Tel: +32 10 47 30 50

Accès: http://www.uclouvain.be/323631.html

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Aggregate behaviour inconsistent (?) when FUN=table

Jeff Newmiller
The normal input to a factory that builds cars is car parts. Feeding whole trucks into such a factory is likely to yield odd-looking results.

Both aggregate and table do similar kinds of things, but yield differently constructed outputs. The output of the table function is not well-suited to be used as the aggregated value to be compiled into a data frame by the aggregate function, so having aggregate call the table function will yield surprises.

I am having some difficulty deciphering what it is you are trying to accomplish with all this, so I will guess that you are trying to reproduce the information output from

table( df$C, df$B )

so

aggregate( df$A, df[ , c( "C", "B" ) ], length )

but if that isn't what you want then perhaps you can clarify what result you want to see and we can help you get there.
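
For what it is worth, here is a minimal sketch comparing the two calls on the numeric df from the original post. The names agg and tab_df are only illustrative. One thing to watch for: aggregate() with its default drop = TRUE leaves out the empty C = 0, B = 1 combination, whereas converting the table() result with as.data.frame() keeps it as a zero count.

```r
# Numeric data frame from the original post
df <- data.frame(A = c(1, 1, 1, 1, 0, 0, 0, 0),
                 B = c(1, 0, 1, 0, 0, 0, 1, 0),
                 C = c(1, 0, 1, 0, 0, 1, 1, 1))

# Count rows per observed (C, B) combination; the count lands in column "x".
# Combinations that never occur are dropped (drop = TRUE is the default).
agg <- aggregate(df$A, df[, c("C", "B")], length)

# Long-format view of the full cross-tabulation, zero cells included;
# the count lands in column "Freq".
tab_df <- as.data.frame(table(C = df$C, B = df$B))

print(agg)
print(tab_df)
```

If the zero cells matter to you, the second form is probably the closer match to what table( df$C, df$B ) prints.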
--
Sent from my phone. Please excuse my brevity.

Re: Aggregate behaviour inconsistent (?) when FUN=table

R help mailing list-2
In reply to this post by Alain Guillet-2
Don't use aggregate's simplify=TRUE when FUN() produces return
values of various dimensions.  In your case, the shape of table(subset)'s
return value depends on the number of levels in the factor 'subset'.
If you make B a factor before splitting it by C, each split will have the
same number of levels (2).  If you split it and then let table convert
each split to a factor, one split will have 1 level and the other 2.  To see
the details of the output, use str() instead of print().
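
A rough sketch of the above, using the numeric df from the original post. The names per_group, per_group_f, df_f, res and res_f are only illustrative, and converting the columns with lapply(..., factor) is just one way of fixing the levels before the split.

```r
df <- data.frame(A = c(1, 1, 1, 1, 0, 0, 0, 0),
                 B = c(1, 0, 1, 0, 0, 0, 1, 0),
                 C = c(1, 0, 1, 0, 0, 1, 1, 1))

# Split first, then tabulate: each piece only sees its own values,
# so the tables can have different lengths.
per_group <- lapply(split(df$B, df$C), table)
print(lengths(per_group))

# Make B a factor before splitting: every piece keeps both levels.
per_group_f <- lapply(split(factor(df$B), df$C), table)
print(lengths(per_group_f))

# str() shows how aggregate stored the ragged result for B.
res <- aggregate(df[1:2], list(df$C), table)
str(res)

# Converting the columns to factors up front gives tables of equal
# length in every group, which aggregate can simplify.
df_f <- df
df_f[] <- lapply(df_f, factor)
res_f <- aggregate(df_f[1:2], list(df_f$C), table)
str(res_f)
```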


Bill Dunlap
TIBCO Software
wdunlap tibco.com

Re: Aggregate behaviour inconsistent (?) when FUN=table

Alain Guillet-2
Thank you for your response. Note that with R 3.4.3, I get the same
result with simplify=TRUE or simplify=FALSE.

My problem was that the behaviour differed depending on whether I defined
my columns as character or as numeric, but a few minutes ago I discovered
that there is also a stringsAsFactors option in the data.frame function.
So yes, it was a stupid question and I apologize for it.
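
In case it helps someone else, here is a small sketch of what stringsAsFactors changes. The names b, g, d_chr, d_fac, cells_chr and cells_fac are only illustrative. Note that the default of stringsAsFactors was TRUE before R 4.0.0 and is FALSE from R 4.0.0 onwards, so it is set explicitly in both calls.

```r
b <- c("1", "0", "1", "0", "0", "0", "1", "0")
g <- c("1", "0", "1", "0", "0", "1", "1", "1")

# Character columns kept as character
d_chr <- data.frame(B = b, C = g, stringsAsFactors = FALSE)
# Character columns converted to factors on creation
d_fac <- data.frame(B = b, C = g, stringsAsFactors = TRUE)

print(class(d_chr$B))
print(class(d_fac$B))

# Number of cells in table() for each group of C:
# character pieces are tabulated on their own values only,
# factor pieces carry the full set of levels into every group.
cells_chr <- lengths(lapply(split(d_chr$B, d_chr$C), table))
cells_fac <- lengths(lapply(split(d_fac$B, d_fac$C), table))
print(cells_chr)
print(cells_fac)
```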


On 06/02/2018 18:07, William Dunlap wrote:

> Don't use aggregate's simplify=TRUE when FUN() produces return
> values of various dimensions.  In your case, the shape of table(subset)'s
> return value depends on the number of levels in the factor 'subset'.
> If you make B a factor before splitting it by C, each split will have the
> same number of levels (2).  If you split it and then let table convert
> each split to a factor, one split will have 1 level and the other 2.  To see
> the details of the output, use str() instead of print().
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com <http://tibco.com>