Unlisting a nested dataset

Unlisting a nested dataset

Nathan Parsons
I’m attempting to do some content analysis on a few million tweets, but I can’t seem to get them cleaned correctly.

I’m trying to replicate the process outlined here: https://stackoverflow.com/questions/46734501/opposite-of-unnest-tokens

My code:

tweets %>%
  unnest_tokens(word, text, token = 'tweets') %>%
  filter(!word %in% stop_words$word) %>%
  nest(word) %>%
  mutate(text = map(data, unlist),
         text = map_chr(text, paste, collapse = " ")) -> tweets

Unfortunately, I keep getting:

Error in mutate_impl(.data, dots) :
  Evaluation error: cannot coerce type 'closure' to vector of type 'character'.

What am I doing wrong?

Here’s what the dataset looks like:

> glimpse(tweets)
Observations: 389,253
Variables: 12
$ status_id "x1047841705729306624", "x1046966595610927105", "x104709...
$ created_at "2018-10-04T13:31:45Z", "2018-10-02T03:34:22Z", "2018-10...
$ text "Technique is everything with olympic lifts ! @ Body By ...
$ lat 43.68359, 40.28412, 37.77066, 40.43139, 31.16889, 33.937...
$ lng -70.32841, -83.07859, -122.43598, -79.98069, -100.07689,...
$ county_name "Cumberland County", "Delaware County", "San Francisco C...
$ fips 23005, 39041, 6075, 42003, 48095, 6037, 6037, 55073, 482...
$ state_name "Maine", "Ohio", "California", "Pennsylvania", "Texas", ...
$ state_abb "ME", "OH", "CA", "PA", "TX", "CA", "CA", "WI", "TX", "A...
$ urban_level "Medium Metro", "Large Fringe Metro", "Large Central Met...
$ urban_code 3, 2, 1, 1, 6, 1, 1, 4, 1, 3, 2, 2, 1, 3, 6, 1, 1, 2, 3,...
$ population 277308, 184029, 830781, 1160433, 4160, 9509611, 9509611,...
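
For readers following along, one possible way to rebuild a cleaned text column without the nest()/unlist() step is to group the tokens by tweet and paste them back together. The sketch below is an illustration under stated assumptions, not a diagnosis of the error above: it uses a two-row stand-in for tweets (tweets_small, with made-up text), the default word tokenizer rather than token = 'tweets' (which may not be available in every tidytext version), and it keeps only status_id, so any other columns would need to be added to group_by() or joined back afterwards.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for the real `tweets` data (only two of the 12 columns).
tweets_small <- tibble(
  status_id = c("x1", "x2"),
  text = c("Technique is everything with olympic lifts !",
           "The weather in Portland is lovely")
)

cleaned <- tweets_small %>%
  # One row per word; the default tokenizer lower-cases and strips punctuation.
  unnest_tokens(word, text) %>%
  # Drop stop words using tidytext's built-in stop_words table.
  anti_join(stop_words, by = "word") %>%
  # Collapse the remaining words back into one string per tweet.
  group_by(status_id) %>%
  summarise(text = paste(word, collapse = " "))

cleaned
```

Note that a tweet made up entirely of stop words would have no rows left after the anti_join() and so would not appear in the result.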

--

Nate Parsons
Pronouns: He, Him, His
Graduate Teaching Assistant
Department of Sociology
Portland State University
Portland, Oregon

503-725-9025
503-725-3957 FAX

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Unlisting a nested dataset

Ista Zahn
Hi Nate,

You've made it pretty difficult to answer your question. Please see
https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
and follow some of the suggestions you find there to make it easier on
those who want to help you.

Best,
Ista
On Mon, Oct 15, 2018 at 10:56 PM Nathan Parsons
<[hidden email]> wrote:

>
> I’m attempting to do some content analysis on a few million tweets, but I can’t seem to get them cleaned correctly.

Re: Unlisting a nested dataset

Nathan Parsons
Ista - I provided data, code, and the error being returned, as per reproducible R protocol. I did not include packages, however: unnest_tokens is from the tidytext package, map/map_chr are from purrr, and everything else is from the tidyverse (dplyr, tidyr, etc.).

Not sure what else I can provide to make this more clear.

On Oct 16, 2018, 12:35 PM -0700, Ista Zahn <[hidden email]>, wrote:

> Hi Nate,
>
> You've made it pretty difficult to answer your question. Please see
> https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
> and follow some of the suggestions you find there to make it easier on
> those who want to help you.
>
> Best,
> Ista

Re: Unlisting a nested dataset

Jeff Newmiller
> >Not sure what else I can provide to make this more clear.

You can address this deficiency of awareness by using the reprex package to generate the example+errors you are concerned about. Combined with the use of dput described in the link Ista provided (or use of example data from your declared packages), you will know before you hit the send button that we should be able to reproduce your error even without knowing anything about your field of interest or having your directory structure/files.
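
As a rough illustration of the dput() advice, the sketch below builds a tiny made-up data frame (tweets_small and its contents are hypothetical, not the poster's data), prints it with dput(), and checks that the printed text recreates the same object. With the real data, something like dput(head(tweets[, c("status_id", "text")], 5)) would presumably serve the same purpose.

```r
# Hypothetical stand-in for the real `tweets` data; base R only.
tweets_small <- data.frame(
  status_id = c("x1", "x2"),
  text = c("Technique is everything with olympic lifts !",
           "The weather in Portland is lovely"),
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates the object; this output is what
# gets pasted into the plain-text email ahead of the failing code.
dput(tweets_small)

# Round trip: evaluating the printed code should give back the same data.
dput_text <- capture.output(dput(tweets_small))
rebuilt <- eval(parse(text = dput_text))
all.equal(rebuilt, tweets_small)
```

The failing pipeline can then be placed after the dput() output in one script and rendered with reprex::reprex(), which runs the code in a fresh R session and includes any errors in the rendered output.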

The last impediment will be your sending email in HTML format... Gmail DOES have a plain text option, and this makes a difference because with HTML what you see is almost never what we see.

On October 16, 2018 12:50:20 PM PDT, Nathan Parsons <[hidden email]> wrote:

>Ista - I provided data, code, and the error being returned as per
>reproducible r protocol. I did not include packages, however.
>unnest_tokens is from the TidyText package, map/map_chr are from purrr,
>and everything else is from tidyverse(dplyr/tidyr/etc.)
>
>Not sure what else I can provide to make this more clear.
