

Hi,
I'm trying to create a simple function that takes a dataframe as its only argument. I've been using gmodels::CrossTable, but it requires a lot of arguments, e.g.:
#this runs fine
CrossTable(data$col1, data$col2, prop.chisq = FALSE, prop.c = FALSE, prop.t = FALSE, format = "SPSS")
Moreover, I wanted to make it compatible with piping, so I decided to create the following function:
ctab <- function(data) {
  CrossTable(data[,1], data[,2], prop.chisq = FALSE, prop.c = FALSE, prop.t = FALSE, format = "SPSS")
}
When I try to use this function, however, I get the following error:
#this results in 'Error: Must use a vector in `[`, not an object of class matrix.'
data %>% select(col1, col2) %>% ctab()
I tried searching online but couldn't find much about that error (except in specific, unrelated cases). Moreover, when I created a very simple dataset, there was no problem:
#this runs fine
data.frame(C1 = c('x','y','x','y'), C2 = c('a','a','b','b')) %>% ctab()
Is this a problem with my function or the data? If it's the data, why does directly calling CrossTable work?
Thanks!
Best,
Zach
On 20/09/2019 11:30 a.m., Zachary Lim wrote:
Presumably data %>% select(col1, col2) isn't giving you a dataframe.
However, you haven't given us a reproducible example, so I can't tell
you what it's doing. But that's where you should look.
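One way to check this is to inspect the class of what the pipeline returns, and of what `[` gives back for a single column. This is only a sketch: the tibble named `data` below is a hypothetical stand-in for the original poster's data, which was not posted.

```r
library(dplyr)

# Hypothetical stand-in for the poster's data (assumed here to be a tibble)
data <- tibble::tibble(col1 = c("x", "y", "x", "y"),
                       col2 = c("a", "a", "b", "b"))

picked <- data %>% select(col1, col2)

# What kind of object comes out of the pipeline?
class(picked)

# What does single-column indexing with `[` return for it?
class(picked[, 1])
```

If the second call reports a data frame class rather than a plain vector class, the `data[,1]` inside the function is not handing CrossTable a vector.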
Duncan Murdoch
Hello,
Something like this?
ctab <- function(data) {
  gmodels::CrossTable(as.matrix(data), prop.chisq = FALSE, prop.c = FALSE,
                      prop.t = FALSE, format = "SPSS")
}
mtcars %>% select(cyl, gear) %>% ctab()
Hope this helps,
Rui Barradas
The dplyr::select function returns a special variety of data.frame called a tibble. The tibble has certain features designed to make it behave consistently when indexing is used. Specifically, the `[` operator always returns a tibble regardless of how many columns are indicated by the column index. This is unlike the conventional data frame which returns a vector when exactly one column is indicated by the column index, or a data.frame if more than one is indicated.
A syntax that consistently yields a column vector with both tibbles and data.frames is
dta[[ 1 ]]
so
ctab <- function(data) {
  CrossTable(data[[1]], data[[2]], prop.chisq = FALSE, prop.c = FALSE,
             prop.t = FALSE, format = "SPSS")
}
should work.
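A small comparison may make the difference between `[` and `[[` concrete. It is a sketch with made-up data (the names `df` and `tbl` are only for illustration) and assumes the tibble package is installed.

```r
library(tibble)

# The same two columns as a base data.frame and as a tibble
df  <- data.frame(C1 = c("x", "y", "x", "y"),
                  C2 = c("a", "a", "b", "b"),
                  stringsAsFactors = FALSE)
tbl <- as_tibble(df)

# `[` with one column: the data.frame drops to a vector, the tibble does not
is.data.frame(df[, 1])    # FALSE
is.data.frame(tbl[, 1])   # TRUE

# `[[` extracts the column itself in both cases
identical(df[[1]], tbl[[1]])   # TRUE
```

Because `[[` behaves the same way for both classes, a function written with `data[[1]]` and `data[[2]]` does not need to know which kind of object it was given.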
On 21/09/2019 7:38 a.m., Jeff Newmiller wrote:
> The dplyr::select function returns a special variety of data.frame called a tibble.
I don't think that's always true. The docs say it returns "An object of
the same class as .data.", and that's what I'm seeing:
> str(data.frame(a=c(1,1,2,2), b=1:4) %>% subset(a == 1))
'data.frame': 2 obs. of 2 variables:
$ a: num 1 1
$ b: int 1 2
But I believe there are other dplyr functions that take dataframes as
input and return tibbles, I just don't know which ones.
Duncan Murdoch
______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Your use of subset instead of select does not help, but a corrected example does indeed confirm your point.
library(dplyr)
str(data.frame(a=c(1,1,2,2), b=1:4) %>% select(b,a))
## 'data.frame': 4 obs. of 2 variables:
## $ b: int 1 2 3 4
## $ a: num 1 1 2 2
However the `[` issue is still worth addressing. If that does not fix the problem then a dput(head(troublesomedata)) from Zachary will be needed to figure out what actually is going on.
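For anyone unfamiliar with dput, here is a rough sketch of what is being asked for. It uses the built-in mtcars data as a hypothetical stand-in, since the actual data were not posted; the object names are only for illustration.

```r
# Hypothetical stand-in for the troublesome data
troublesome <- head(mtcars[, c("cyl", "gear")], 3)

# dput() prints R code that recreates the object, including its class
# attribute, so it can be pasted straight into an email
dput(troublesome)

# Round trip: evaluating the printed text rebuilds a data frame
txt     <- capture.output(dput(troublesome))
rebuilt <- eval(parse(text = txt))
class(rebuilt)
```

Because the printed structure carries the class attribute, readers of the list can see whether the object is a plain data.frame or something else.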
On 21/09/2019 9:05 a.m., Jeff Newmiller wrote:
> Your use of subset instead of select does not help,
Whoops, sorry. Thanks for doing the real check.
Duncan
