Why can't I access this type?

Yves S. Garret
Hi, I'm just learning my way around R.  I have a data set of states and would
like to get all of the ones where it's cold.  But when I do the
following, I get this error:

> all.states <- as.data.frame(state.x77)
> cold.states <- all.states[all.states$Frost > 150, c("Name", "Frost")]
Error in `[.data.frame`(all.states, all.states$Frost > 150, c("Name",  :
  undefined columns selected

I don't get it.  When I look at all.states, this is what I see:

> str(all.states)
'data.frame':   50 obs. of  8 variables:
 $ Population: num  3615 365 2212 2110 21198 ...
 $ Income    : num  3624 6315 4530 3378 5114 ...
 $ Illiteracy: num  2.1 1.5 1.8 1.9 1.1 0.7 1.1 0.9 1.3 2 ...
 $ Life Exp  : num  69 69.3 70.5 70.7 71.7 ...
 $ Murder    : num  15.1 11.3 7.8 10.1 10.3 6.8 3.1 6.2 10.7 13.9 ...
 $ HS Grad   : num  41.3 66.7 58.1 39.9 62.6 63.9 56 54.6 52.6 40.6 ...
 $ Frost     : num  20 152 15 65 20 166 139 103 11 60 ...
 $ Area      : num  50708 566432 113417 51945 156361 ...

What am I messing up?

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Why can't I access this type?

Bert Gunter
Your data frame contains no column named "Name" .

Maybe what you want is

rownames(all.states)[all.states$Frost > 150]

However, what you clearly need to do is stop posting until you have
done your homework by spending some time with one of the many good R
tutorials that are out there (possibly Intro to R, which ships with R,
though it's getting a bit dated now). This appears to be a very basic
question. If you are going through a tutorial and got stuck here, then
note that row names are an attribute of the data frame, not one of
its columns. See ?rownames and the links therein for more info.
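
To make the distinction between row names and columns concrete, here is a minimal sketch. It assumes only the built-in state.x77 matrix from the datasets package; the object name cold is just for illustration.

```r
# state.x77 ships with base R (package 'datasets') as a numeric matrix.
all.states <- as.data.frame(state.x77)

# The state names are row names, not a column, so "Name" cannot be selected.
"Name" %in% names(all.states)   # FALSE
head(rownames(all.states), 3)

# Pick rows by a logical condition and read the names off the row names.
cold <- rownames(all.states)[all.states$Frost > 150]
print(cold)
```

Because the names are an attribute rather than a column, any approach that selects columns by name needs them copied into a real column first.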

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
Clifford Stoll





Re: Why can't I access this type?

John Kane
In reply to this post by Yves S. Garret
Well, first off, you have no variable called "Name".  You have lost the state names as they are rownames in the matrix state.x77 and not a variable.

Try this. It's ugly and I have no idea why I had to do a cbind() but it seems to work. Personally I find subset easier to read than the indexing approach.

state  <-  rownames(state.x77)
all.states <- as.data.frame(state.x77)
all.states  <-  cbind(state, all.states) ### ?????

coldstates  <-   subset(all.states, all.states$Frost > 50,
                        select = c("state","Frost") )
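
On the cbind() question: as.data.frame() keeps the matrix row names as row names, so some extra step is needed to turn them into a column, and cbind() is one way to do it. The sketch below is a compact variant of the code above, not a replacement for it. It assumes the built-in state.x77 matrix and uses the threshold of 150 from the original question. The name coldstates2 is only there for comparison.

```r
# Rebuild the data frame with the state names as a real column.
all.states <- cbind(state = rownames(state.x77), as.data.frame(state.x77))

# subset() evaluates its condition inside the data frame, so Frost can be
# written without the all.states$ prefix.
coldstates <- subset(all.states, Frost > 150, select = c("state", "Frost"))

# The equivalent indexing form, for comparison.
coldstates2 <- all.states[all.states$Frost > 150, c("state", "Frost")]

print(coldstates)
```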


John Kane
Kingston ON Canada



Re: Why can't I access this type?

p_connolly
On Sun, 22-Mar-2015 at 08:06AM -0800, John Kane wrote:

|> Well, first off, you have no variable called "Name".  You have lost
|> the state names as they are rownames in the matrix state.x77 and
|> not a variable.
|>
|> Try this. It's ugly and I have no idea why I had to do a cbind()
|> but it seems to work. Personally I find subset easier to read than
|> the indexing approach.

|> state  <-  rownames(state.x77)
|> all.states <- as.data.frame(state.x77)
|> all.states  <-  cbind(state, all.states) ### ?????

You don't have to use cbind()

all.states  <- within(as.data.frame(state.x77), state <- rownames(state.x77))

but I think cbind is simpler to read.

|>
|> coldstates  <-   subset(all.states, all.states$Frost > 50,
|>                         select = c("state","Frost") )

Tidier, even more so than subset():

library(dplyr)
coldstates <- all.states %>% filter(Frost > 150) %>% select(state, Frost)

Or, easier to see what's happening:

coldstates <- all.states %>%
  filter(Frost > 150) %>%
  select(state, Frost)


--
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.  
   ___    Patrick Connolly  
 {~._.~}                   Great minds discuss ideas    
 _( Y )_           Average minds discuss events
(:_~*~_:)                  Small minds discuss people  
 (_)-(_)                        ..... Eleanor Roosevelt
         
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.


Re: Why can't I access this type?

Henric Winell
On 2015-03-25 09:40, Patrick Connolly wrote:

> |> coldstates  <-   subset(all.states, all.states$Frost > 50,
> |>                         select = c("state","Frost") )

I find the indexing approach

coldstates <- all.states[all.states$Frost > 150, c("state","Frost")]

to be the most direct and obvious solution.

> Tidier, even more so than subset():
>
> require(dplyr)
> coldstates <- all.states %>% filter(Frost > 150) %>% select(state, Frost)
>
> Or, easier to see what's happening:
>
> coldstates <- all.states %>%
>    filter(Frost > 150) %>%
>    select(state, Frost)

Well...  Opinions may perhaps differ, but apart from '%>%' being
butt-ugly it's also fairly slow:

 > library("microbenchmark")
 > microbenchmark(
+     subset(all.states, all.states$Frost > 150, select = c("state","Frost")),
+     all.states[all.states$Frost > 150, c("state","Frost")],
+     all.states %>% filter(Frost > 150) %>% select(state, Frost),
+     times = 1000L
+ )
Unit: microseconds
                                                                     expr       min       lq      mean    median        uq      max neval cld
 subset(all.states, all.states$Frost > 150, select = c("state", "Frost"))   139.112  148.673  163.3960  159.1760  170.7895 1763.200  1000  b
                  all.states[all.states$Frost > 150, c("state", "Frost")]   104.039  111.973  127.2138  120.4395  128.6640 1381.809  1000 a
              all.states %>% filter(Frost > 150) %>% select(state, Frost)  1010.076 1033.519 1133.1469 1107.8480 1175.1800 2932.206  1000   c

Of course, this doesn't matter for interactive one-off use.  But lately
I've seen examples of the '%>%' operator creeping into functions in
packages.  That said, it would be nice to see a fast pipe operator as part
of base R.


Henric Winell




Using and abusing %>% (was Re: Why can't I access this type?)

p_connolly
On Wed, 25-Mar-2015 at 03:14PM +0100, Henric Winell wrote:

...

|> Well...  Opinions may perhaps differ, but apart from '%>%' being
|> butt-ugly it's also fairly slow:

Beauty, it is said, is in the eye of the beholder.  I'm impressed by
the way using %>% reduces or eliminates complicated nested brackets.
In this tiny example it's not obvious, but it becomes very clear when
the objective is to sort the data frame by three or four columns, do
various kinds of aggregation, and then return a largish number of
consecutive columns, omitting the rest.  It's very easy to see what's
going on without the need for intermediate objects.

|>
|>  .....

|> Unit: microseconds
|>                                                                      expr       min       lq      mean    median        uq      max neval cld
|>  subset(all.states, all.states$Frost > 150, select = c("state", "Frost"))   139.112  148.673  163.3960  159.1760  170.7895 1763.200  1000  b
|>                   all.states[all.states$Frost > 150, c("state", "Frost")]   104.039  111.973  127.2138  120.4395  128.6640 1381.809  1000 a
|>               all.states %>% filter(Frost > 150) %>% select(state, Frost)  1010.076 1033.519 1133.1469 1107.8480 1175.1800 2932.206  1000   c

It's no surprise that instructing a computer in something closer to
human language is an order of magnitude slower.  I'm sure you'd get
something even quicker using machine code.  I spend 3 or 4 orders of
magnitude more time writing code than running it.  It's much more
important to me to be able to read and modify than it is to have it
run at optimum speed.

|>
|> Of course, this doesn't matter for interactive one-off use.  But
|> lately I've seen examples of the '%>%' operator creeping into
|> functions in packages.

That could indicate that %>% is seductively easy to use.  It's
probably true that there are places where it should be done the hard
way.


|>  However, it would be nice to see a fast pipe operator as part of
|> base R.


Re: Why can't I access this type?

Yves S. Garret
In reply to this post by Henric Winell






I agree with you on the indexing approach.  But even after using within, I still get the same error.

Re: Why can't I access this type?

p_connolly
On Thu, 26-Mar-2015 at 04:58PM -0400, [hidden email] wrote:

[...]
|>  I agree with you on the indexing approach.  But even after using
|> within, I still get the same error.

You leave us to guess just what you tried, but if you did this:

> all.states  <- within(as.data.frame(state.x77), state <- rownames(state.x77))

and then again did this:

> cold.states <- all.states[all.states$Frost > 150, c("Name", "Frost")]

of course it will give the same error, because you haven't
addressed the problem you were told about:

On Sun, 22-Mar-2015 at 08:06AM -0800, John Kane wrote:

|> Well, first off, you have no variable called "Name".  You have lost
|> the state names as they are rownames in the matrix state.x77 and
|> not a variable.

If you did this:

> all.states  <- within(as.data.frame(state.x77), Name <- rownames(state.x77))

instead of

> all.states  <- within(as.data.frame(state.x77), state <- rownames(state.x77))

then this would work:

> cold.states <- all.states[all.states$Frost > 150, c("Name", "Frost")]

Adjust the above wherever my guess at what you tried is wrong.


HTH


Re: Why can't I access this type?

Henric Winell
On 2015-03-27 09:19, Patrick Connolly wrote:

> [...]
>
> On Sun, 22-Mar-2015 at 08:06AM -0800, John Kane wrote:
>
> |> Well, first off, you have no variable called "Name".  You have lost
> |> the state names as they are rownames in the matrix state.x77 and
> |> not a variable.
>
> If you did this:
>
>> all.states  <- within(as.data.frame(state.x77), Name <- rownames(state.x77))
> instead of
>> all.states  <- within(as.data.frame(state.x77), state <- rownames(state.x77))

Alternatively, since 'data.frame()' coerces internally, one could do

all.states <- data.frame(state.x77, Name = rownames(state.x77))
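
One difference between the variants discussed so far is worth seeing side by side. This is a sketch assuming the built-in state.x77 matrix; the object names via.cbind, via.within, via.df and via.df2 are made up for the comparison. By default data.frame() runs column names through make.names(), whereas as.data.frame() leaves them alone.

```r
# Three ways to add the state names as a column.
via.cbind  <- cbind(Name = rownames(state.x77), as.data.frame(state.x77))
via.within <- within(as.data.frame(state.x77), Name <- rownames(state.x77))
via.df     <- data.frame(state.x77, Name = rownames(state.x77))

# as.data.frame() keeps the original column names, spaces included ...
"Life Exp" %in% names(via.within)   # TRUE
# ... while data.frame() converts them to syntactic names by default.
"Life.Exp" %in% names(via.df)       # TRUE

# check.names = FALSE keeps the names unchanged.
via.df2 <- data.frame(state.x77, Name = rownames(state.x77),
                      check.names = FALSE)

# Whichever variant is used, the original indexing call now works.
cold.states <- via.df[via.df$Frost > 150, c("Name", "Frost")]
print(cold.states)
```

This does not matter for the Frost column, but code that refers to "Life Exp" or "HS Grad" by name will see different names depending on which constructor was used.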


Henric Winell




Re: Using and abusing %>% (was Re: Why can't I access this type?)

Henric Winell
In reply to this post by p_connolly
On 2015-03-26 07:48, Patrick Connolly wrote:

> On Wed, 25-Mar-2015 at 03:14PM +0100, Henric Winell wrote:
>
> ...
>
> |> Well...  Opinions may perhaps differ, but apart from '%>%' being
> |> butt-ugly it's also fairly slow:
>
> Beauty, it is said, is in the eye of the beholder.  I'm impressed by
> the way using %>% reduces or eliminates complicated nested brackets.

I didn't dispute whether '%>%' may be useful -- I just pointed out that
it is slow.  However, it is only part of the problem: 'filter()' and
'select()', although aesthetically pleasing, also seem to be slow:

 > all.states <- data.frame(state.x77, Name = rownames(state.x77))
 >
 > f1 <- function()
+     all.states[all.states$Frost > 150, c("Name", "Frost")]
 >
 > f2 <- function()
+     subset(all.states, Frost > 150, select = c("Name", "Frost"))
 >
 > f3 <- function() {
+     filt <- subset(all.states, Frost > 150)
+     subset(filt, select = c("Name", "Frost"))
+ }
 >
 > f4 <- function()
+     all.states %>% subset(Frost > 150) %>%
+         subset(select = c("Name", "Frost"))
 >
 > f5 <- function()
+     select(filter(all.states, Frost > 150), Name, Frost)
 >
 > f6 <- function()
+     all.states %>% filter(Frost > 150) %>% select(Name, Frost)
 >
 > mb <- microbenchmark(
+     f1(), f2(), f3(), f4(), f5(), f6(),
+     times = 1000L
+ )
 > print(mb, signif = 3L)
Unit: microseconds
  expr min   lq      mean median   uq  max neval   cld
  f1() 115  124  134.8812    129  134 1500  1000 a
  f2() 128  141  147.4694    145  151 1520  1000 a
  f3() 303  328  344.3175    338  348 1740  1000  b
  f4() 458  494  518.0830    510  523 1890  1000   c
  f5() 806  848  887.7270    875  894 3510  1000    d
  f6() 971 1010 1056.5659   1040 1060 3110  1000     e

So, using '%>%', but leaving 'filter()' and 'select()' out of the
equation, as in 'f4()' is only half as bad as the "full" 'dplyr' idiom
in 'f6()'.  In this case, since we're talking microseconds, the speed-up
is negligible but that *is* beside the point.
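
Before reading too much into timings like these, it is worth confirming that the candidates return the same result. The following is a rough sketch using base R only, so it covers just f1 to f3. The helper time.it is hypothetical, not part of any package, and system.time() is far coarser than microbenchmark, so the numbers are only indicative and will vary by machine and R version.

```r
all.states <- data.frame(state.x77, Name = rownames(state.x77))

f1 <- function() all.states[all.states$Frost > 150, c("Name", "Frost")]
f2 <- function() subset(all.states, Frost > 150, select = c("Name", "Frost"))
f3 <- function() {
  filt <- subset(all.states, Frost > 150)
  subset(filt, select = c("Name", "Frost"))
}

# Confirm the candidates agree before timing them.
same <- isTRUE(all.equal(f1(), f2())) && isTRUE(all.equal(f1(), f3()))

# Hypothetical helper: elapsed seconds for n repeated calls of f.
time.it <- function(f, n = 1000L) {
  system.time(for (i in seq_len(n)) f())[["elapsed"]]
}

timings <- c(f1 = time.it(f1), f2 = time.it(f2), f3 = time.it(f3))
print(same)
print(timings)
```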

> In this tiny example it's not obvious but it's very clear if the
> objective is to sort the dataframe by three or four columns and
> various lots of aggregation then returning a largish number of
> consecutive columns, omitting the rest.  It's very easy to see what's
> going on without the need for intermediate objects.

Why are you opposed to using intermediate objects?  In this case, as can
be seen from 'f3()', it will also have the benefit of being faster than
either '%>%' or the "full" 'dplyr' idiom.
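
As an illustration of the intermediate-object style on a slightly longer task, here is a sketch in base R. It assumes the built-in state.x77 matrix; the sort order and the object names cold, cold.sorted, result and nested are chosen only for the example.

```r
all.states <- data.frame(state.x77, Name = rownames(state.x77))

# Step 1: keep the cold states.
cold <- all.states[all.states$Frost > 150, ]

# Step 2: sort by Frost (descending), breaking ties by Name.
cold.sorted <- cold[order(-cold$Frost, cold$Name), ]

# Step 3: keep only the columns of interest.
result <- cold.sorted[, c("Name", "Frost")]

# The same computation as a single nested expression, for contrast.
nested <- all.states[all.states$Frost > 150, ][
  order(-all.states$Frost[all.states$Frost > 150],
        all.states$Name[all.states$Frost > 150]),
  c("Name", "Frost")]

print(result)
```

Each intermediate object can be inspected on its own, which is the readability argument being made here; whether that outweighs the extra names is a matter of taste.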

> |> [...]
>
> It's no surprise that instructing a computer in something closer to
> human language is an order of magnitude slower.

Certainly not true, at least for compiled languages.  In any case,
judging from off-list correspondence, it definitely came as a surprise
to some R users...

Given that '%>%' is so heavily marketed through 'dplyr', where the
latter is said to provide "blazing fast performance for in-memory data
by writing key pieces in C++" and "a fast, consistent tool for working
with data frame like objects, both in memory and out of memory", I don't
think it's far-fetched to expect that it should be more performant than
base R.

> I'm sure you'd get something even quicker using machine code.

Don't be ridiculous.  We're mainly discussing

all.states[all.states$Frost > 150, c("state", "Frost")]

vs.

all.states %>% filter(Frost > 150) %>% select(state, Frost)

i.e., pure R code.

> I spend 3 or 4 orders of magnitude more time writing code than running it.

You and me both.  But that doesn't mean speed is of no or little importance.

> It's much more important to me to be able to read and modify than
> it is to have it run at optimum speed.

Good for you.  But surely, if this is your goal, nothing beats
intermediate objects.  And like I said, it may still be faster than the
'dplyr' idiom.

> |> Of course, this doesn't matter for interactive one-off use.  But
> |> lately I've seen examples of the '%>%' operator creeping into
> |> functions in packages.
>
> That could indicate that %>% is seductively easy to use.  It's
> probably true that there are places where it should be done the hard
> way.

We all know how easy it is to write ugly and sluggish code in R.  But
'foo[i,j]' is neither ugly nor sluggish and certainly not "the hard way."

> |>  However, it would be nice to see a fast pipe operator as part of
> |> base R.

Heck, it doesn't even have to be fast as long as it's a bit more elegant
than '%>%'.


Henric Winell





Re: Using and abusing %>% (was Re: Why can't I access this type?)

hadley wickham
> I didn't dispute whether '%>%' may be useful -- I just pointed out that it
> is slow.  However, it is only part of the problem: 'filter()' and
> 'select()', although aesthetically pleasing, also seem to be slow:
>
>> all.states <- data.frame(state.x77, Name = rownames(state.x77))
>>
>> f1 <- function()
> +     all.states[all.states$Frost > 150, c("Name", "Frost")]
>>
>> f2 <- function()
> +     subset(all.states, Frost > 150, select = c("Name", "Frost"))
>>
>> f3 <- function() {
> +     filt <- subset(all.states, Frost > 150)
> +     subset(filt, select = c("Name", "Frost"))
> + }
>>
>> f4 <- function()
> +     all.states %>% subset(Frost > 150) %>%
> +         subset(select = c("Name", "Frost"))
>>
>> f5 <- function()
> +     select(filter(all.states, Frost > 150), Name, Frost)
>>
>> f6 <- function()
> +     all.states %>% filter(Frost > 150) %>% select(Name, Frost)
>>
>> mb <- microbenchmark(
> +     f1(), f2(), f3(), f4(), f5(), f6(),
> +     times = 1000L
> + )
>> print(mb, signif = 3L)
> Unit: microseconds
>  expr min   lq      mean median   uq  max neval   cld
>  f1() 115  124  134.8812    129  134 1500  1000 a
>  f2() 128  141  147.4694    145  151 1520  1000 a
>  f3() 303  328  344.3175    338  348 1740  1000  b
>  f4() 458  494  518.0830    510  523 1890  1000   c
>  f5() 806  848  887.7270    875  894 3510  1000    d
>  f6() 971 1010 1056.5659   1040 1060 3110  1000     e
>
> So, using '%>%', but leaving 'filter()' and 'select()' out of the equation,
> as in 'f4()' is only half as bad as the "full" 'dplyr' idiom in 'f6()'.  In
> this case, since we're talking microseconds, the speed-up is negligible but
> that *is* beside the point.

When benchmarking it's important to consider both the relative and
absolute difference and to think about how the cost scales as the data
grows - the cost of using %>% is fixed, and 500 µs doesn't seem
like a huge performance penalty to me.
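
The scaling point can be checked with a rough sketch like the one below. It is not from the original post. It assumes the 'magrittr' package is installed; 'make_df()', 'time_per_call()', and the row counts are hypothetical names and values chosen for illustration. If the cost of '%>%' is a fixed per-call overhead, the absolute gap between the two timings should stay roughly constant while the relative gap shrinks as the data grows.

```r
# Illustrative sketch only: time direct vs. piped subsetting as rows increase.
library(magrittr)  # provides %>%; assumed to be installed

# Hypothetical helper: a data frame with n rows and a Frost-like column.
make_df <- function(n) {
  data.frame(Name  = paste0("s", seq_len(n)),
             Frost = runif(n, min = 0, max = 200),
             stringsAsFactors = FALSE)
}

# Hypothetical helper: average elapsed seconds per call over `reps` calls.
time_per_call <- function(f, reps = 100L) {
  system.time(for (i in seq_len(reps)) f())[["elapsed"]] / reps
}

for (n in c(50L, 5000L, 200000L)) {
  df <- make_df(n)
  direct <- function() subset(df, Frost > 150, select = c("Name", "Frost"))
  piped  <- function() df %>% subset(Frost > 150) %>%
    subset(select = c("Name", "Frost"))
  # Both routes should return the same rows and columns.
  stopifnot(isTRUE(all.equal(direct(), piped())))
  cat(sprintf("n = %6d  direct = %.6f s  piped = %.6f s\n",
              n, time_per_call(direct), time_per_call(piped)))
}
```

'system.time()' has coarse resolution, so each timing is averaged over a loop; 'microbenchmark', as used earlier in the thread, would give finer-grained numbers.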

Hadley

--
http://had.co.nz/


Re: Using and abusing %>% (was Re: Why can't I access this type?)

p_connolly
In reply to this post by Henric Winell
On Fri, 27-Mar-2015 at 03:27PM +0100, Henric Winell wrote:

|> On 2015-03-26 07:48, Patrick Connolly wrote:
|>
|> >On Wed, 25-Mar-2015 at 03:14PM +0100, Henric Winell wrote:
|> >
|> >...
|> >
|> >|> Well...  Opinions may perhaps differ, but apart from '%>%' being
|> >|> butt-ugly it's also fairly slow:
|> >
|> >Beauty, it is said, is in the eye of the beholder.  I'm impressed by
|> >the way using %>% reduces or eliminates complicated nested brackets.
|>
|> I didn't dispute whether '%>%' may be useful -- I just pointed out

Likewise I didn't dispute that it might not be as fast as other ways,
but I was disputing the claim that it was ugly.

|> that it is slow.  However, it is only part of the problem:
|> 'filter()' and 'select()', although aesthetically pleasing, also
|> seem to be slow:

So not 'butt ugly' like '%>%'?

|>
....

|> > mb <- microbenchmark(
|> +     f1(), f2(), f3(), f4(), f5(), f6(),
|> +     times = 1000L
|> + )
|> > print(mb, signif = 3L)
|> Unit: microseconds
|>  expr min   lq      mean median   uq  max neval   cld
|>  f1() 115  124  134.8812    129  134 1500  1000 a
|>  f2() 128  141  147.4694    145  151 1520  1000 a
|>  f3() 303  328  344.3175    338  348 1740  1000  b
|>  f4() 458  494  518.0830    510  523 1890  1000   c
|>  f5() 806  848  887.7270    875  894 3510  1000    d
|>  f6() 971 1010 1056.5659   1040 1060 3110  1000     e
|>
|> So, using '%>%', but leaving 'filter()' and 'select()' out of the
|> equation, as in 'f4()' is only half as bad as the "full" 'dplyr'
|> idiom in 'f6()'.  In this case, since we're talking microseconds,
|> the speed-up is negligible but that *is* beside the point.

Agreed that the more 'dplyr' is used the slower it gets, but I don't agree
that it's an issue except in packages that should be optimized.  The
lack of speed won't stop me using it any more than I'll stop using
dataframes because matrices are much faster than them.  The OP's
example can be done using matrix syntax:

state.x77[state.x77[, "Frost"] > 150, "Frost", drop = FALSE]


which is more than an order of magnitude faster than subscripting a
dataframe.  See No. 4 here:

  microbenchmark(## 1. using subset()
        subset(all.states, all.states$Frost > 150, select = c("state","Frost")),
        ## 2. standard R indexing
        all.states[all.states$Frost > 150, c("state","Frost")],
        ## 3. leave out redundant 'state' column
        all.states[all.states$Frost > 150, "Frost", drop = FALSE],
        ## 4. avoid using 'slow' dataframes altogether
        state.x77[state.x77[, "Frost"] > 150, "Frost", drop = FALSE],
        ## 5. easy, slow way without square brackets or quote marks
        all.states %>% filter(Frost > 150) %>% select(state, Frost),
        times = 1000L
        )

Unit: microseconds
                                                                      expr
  subset(all.states, all.states$Frost > 150, select = c("state", "Frost"))
                   all.states[all.states$Frost > 150, c("state", "Frost")]
                 all.states[all.states$Frost > 150, "Frost", drop = FALSE]
              state.x77[state.x77[, "Frost"] > 150, "Frost", drop = FALSE]
               all.states %>% filter(Frost > 150) %>% select(state, Frost)
      min        lq       mean    median        uq      max neval  cld
  223.960  229.9290  236.16557  232.4060  241.4165  291.083  1000   c
  177.187  182.6075  203.04666  185.1475  194.4815 7259.760  1000   c
  125.281  130.4835  135.83826  132.6985  141.7375  210.576  1000  b  
    6.442   10.3860   10.61733   11.0405   11.4855   25.077  1000 a  
 1416.592 1437.7015 1562.91898 1447.5695 1473.4440 9394.071  1000    d
>
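
As a quick sanity check on the comparison above, the following base-R sketch (not part of the original post) confirms that the data frame route (No. 3) and the matrix route (No. 4) select the same states and the same Frost values, and differ only in the class of the result. How 'all.states' was built with a 'state' column is not shown in this message, so the construction below is an assumption.

```r
# Assumption: 'all.states' is state.x77 as a data frame plus a 'state' column.
all.states <- data.frame(state.x77, state = rownames(state.x77))

# No. 3: data frame subscripting, keeping the result as a data frame.
df_res <- all.states[all.states$Frost > 150, "Frost", drop = FALSE]

# No. 4: matrix subscripting, keeping the result as a one-column matrix.
mat_res <- state.x77[state.x77[, "Frost"] > 150, "Frost", drop = FALSE]

# Same states selected, same Frost values.
stopifnot(identical(rownames(df_res), rownames(mat_res)))
stopifnot(isTRUE(all.equal(df_res$Frost, unname(mat_res[, "Frost"]))))

# Only the container differs.
class(df_res)
class(mat_res)
```

The state names survive as row names in both cases, which is why the 'state' column can be dropped in Nos. 3 and 4 without losing information.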

[...]

|>
|> >In this tiny example it's not obvious but it's very clear if the
|> >objective is to sort the dataframe by three or four columns and
|> >various lots of aggregation then returning a largish number of
|> >consecutive columns, omitting the rest.  It's very easy to see what's
|> >going on without the need for intermediate objects.
|>
|> Why are you opposed to using intermediate objects?  In this case,

I'm not opposed to intermediate objects nor to dogs.  It's just easier
to keep things tidy without either.


|> as can be seen from 'f3()', it will also have the benefit of being
|> faster than either '%>%' or the "full" 'dplyr' idiom.
|>
|> >|> [...]
|> >
|> >It's no surprise that instructing a computer in something closer to
|> >human language is an order of magnitude slower.
|>
|> Certainly not true, at least for compiled languages.  In any case,
|> judging from off-list correspondence, it definitely came as a
|> surprise to some R users...
|>
|> Given that '%>%' is so heavily marketed through 'dplyr', where the
|> latter is said to provide "blazing fast performance for in-memory
|> data by writing key pieces in C++" and "a fast, consistent tool for
|> working with data frame like objects, both in memory and out of
|> memory", I don't think it's far-fetched to expect that it should be
|> more performant than base R.
|>

I've never come across 'marketing' of free software.  Evidently that's
a looser use of the word.

...


|> >I spend 3 or 4 orders of magnitude more time writing code than running it.
|>
|> You and me both.  But that doesn't mean speed is of no or little importance.

I never claimed it was.  Tardiness hasn't yet become an issue for me.
When it does, I'll revert to the old ways.

|>
|> >It's much more important to me to be able to read and modify than
|> >it is to have it run at optimum speed.
|>
|> Good for you.  But surely, if this is your goal, nothing beats
|> intermediate objects.  

Nothing except chaining, that is.  I went 16 years without it and now
find it amazing how useful it is.  As they say: You're never too old
to learn.


|>  And like I said, it may still be faster than the 'dplyr' idiom.
|>
|> >|> Of course, this doesn't matter for interactive one-off use.  But
|> >|> lately I've seen examples of the '%>%' operator creeping into
|> >|> functions in packages.
|> >
|> >That could indicate that %>% is seductively easy to use.  It's
|> >probably true that there are places where it should be done the hard
|> >way.
|>
|> We all know how easy it is to write ugly and sluggish code in R.
|> But 'foo[i,j]' is neither ugly nor sluggish and certainly not "the
|> hard way."

I meant to put a ':-)' in there. Such adjectives as 'easy' and 'hard' are
relative.  There's little difference in difficulty at each step, but
integrating them and revising later are considerably easier using the
so-called "'dplyr' idiom" -- especially if each link in the chain is
on a separate line.
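
To make the comparison concrete, here is a small sketch (not from the original exchange) of the same task written both ways: with intermediate objects in base R, and as a chain with one link per line. It assumes 'dplyr' is installed; the extra sorting step and the object names 'cold_base' and 'cold_chain' are illustrative additions.

```r
library(dplyr)  # assumed installed; supplies %>%, filter(), arrange(), select()

all.states <- data.frame(state.x77, Name = rownames(state.x77))

# Intermediate objects, base R: filter rows, then sort and pick columns.
cold_base <- all.states[all.states$Frost > 150, ]
cold_base <- cold_base[order(cold_base$Frost, decreasing = TRUE),
                       c("Name", "Frost")]

# Chained, one step per line: each link can be edited or removed on its own.
cold_chain <- all.states %>%
  filter(Frost > 150) %>%
  arrange(desc(Frost)) %>%
  select(Name, Frost)

# Both approaches give the same Frost values in the same order.
stopifnot(identical(cold_base$Frost, cold_chain$Frost))
```

Which version reads better is the matter of taste being argued here; the sketch only shows that the two are interchangeable in result.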

|>
|> >|>  However, it would be nice to see a fast pipe operator as part of
|> >|> base R.
|>
|> Heck, it doesn't even have to be fast as long as it's a bit more
|> elegant than '%>%'.

IMHO, %>% fits in nicely with %/%, %%, and %in%.  Elegance, like
beauty, is in the eye of the beholder.


--
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.  
   ___    Patrick Connolly  
 {~._.~}                   Great minds discuss ideas    
 _( Y )_           Average minds discuss events
(:_~*~_:)                  Small minds discuss people  
 (_)-(_)                        ..... Eleanor Roosevelt
         
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
