Data Structure to Unnest_tokens in tidytext package


sara_41
Hi--I'm fairly new to R and trying to do a text mining project on a novel
using the tidytext package. The novel is saved as a plain text document and
I can import it into RStudio just fine. For reference I'm trying to do
something similar to section 1.3 of this tidy text tutorial
<https://www.tidytextmining.com/tidytext.html>, except I'm working with one
novel instead of many. So I import the novel and then run:

tidy_novel <- quicksandr %>%
  unnest_tokens(word, text)

I get the following error:

Error in check_input(x) :
  Input must be a character vector of any length or a list of character
  vectors, each of which has a length of 1.

typeof(novel) returns "list" and str(novel) returns

Classes ‘spec_tbl_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 955 obs. of  1 variable:
 $ FOR E. S. I.: chr  "FOR E. S. I." "My old man died in a fine big house. My ma died in a shack. I wonder where I'm gonna die, Being neither white nor black?'" "LANGSTON HUGHES" "ONE" ...
 - attr(*, "problems")=Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 8 obs. of  5 variables:
  ..$ row     : int  530 726 733 836 853 886 889 942
  ..$ col     : chr  NA NA NA NA ...
  ..$ expected: chr  "1 columns" "1 columns" "1 columns" "1 columns" ...
  ..$ actual  : chr  "2 columns" "2 columns" "2 columns" "2 columns" ...
  ..$ file    : chr  "'quicksandr.txt'" "'quicksandr.txt'" "'quicksandr.txt'" "'quicksandr.txt'" ...
 - attr(*, "spec")=
  .. cols(
  ..   `FOR E. S. I.` = col_character()
  .. )

I'm just importing the text file and then trying to run the unnest_tokens
function, so maybe I'm missing a step in between? I seem to need my text
file in a different format, so would appreciate answers on how to do that.
Thanks, and let me know if I need to provide more info!
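
The str() output above shows a data frame whose only column is named `FOR E. S. I.`, which suggests the importer took the first line of the file as a header, while unnest_tokens(word, text) looks for an input column named text. The base-R sketch below reproduces that mismatch on a small hypothetical stand-in for the imported object (the three values are copied from the str() output; the real object has 955 rows) and shows one possible repair. It is only an illustration of the column-name check, not a full fix for the import.

```r
# Hypothetical stand-in for the imported object, using the same single
# column name that str() reports.
novel <- data.frame(
  `FOR E. S. I.` = c("FOR E. S. I.", "LANGSTON HUGHES", "ONE"),
  check.names = FALSE,        # keep the column name exactly as given
  stringsAsFactors = FALSE
)

# Inspect the column names: there is no column called "text".
names(novel)
has_text <- "text" %in% names(novel)

# One possible repair (an assumption, not the only option): rename the
# lone column so that unnest_tokens(word, text) can find it.
names(novel) <- "text"
is_chr <- is.character(novel$text)
```

Renaming leaves the header line and the 8 rows flagged under "problems" as they were imported, so re-reading the file line by line may be the cleaner route.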


______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: Data Structure to Unnest_tokens in tidytext package

Eric Berger
Hi Sarah,
I looked at the documentation that you linked to. It contains the step

text_df <- tibble(line = 1:4, text = text)

before it does the step

text_df %>%
  unnest_tokens(word, text)

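So you may be missing that step: unnest_tokens(word, text) expects a data
frame with a column named text, and your imported data has a single column
named `FOR E. S. I.` instead.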
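
For a single novel stored as plain text, a minimal sketch of that intermediate step might look like the following. It assumes the dplyr, tibble and tidytext packages are installed; the temporary file and the names path, novel_lines and novel_df are hypothetical stand-ins, so substitute the path to your own file. Reading with readLines() is one option among several: it returns one string per line, so no line is used as a header and commas are not treated as column separators.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Hypothetical stand-in for the novel: a small temporary plain-text file.
path <- tempfile(fileext = ".txt")
writeLines(c("FOR E. S. I.",
             "My old man died in a fine big house.",
             "My ma died in a shack."), path)

# One character string per line of the file.
novel_lines <- readLines(path)

# Build a tibble whose text column is literally named `text`,
# mirroring the tibble(line = ..., text = ...) step in the tutorial.
novel_df <- tibble(line = seq_along(novel_lines), text = novel_lines)

# One row per word; the input column `text` is replaced by `word`.
tidy_novel <- novel_df %>%
  unnest_tokens(word, text)

head(tidy_novel)
```

The line column is kept in the result, so each word can still be traced back to the line it came from.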

Best,
Eric

On Tue, Dec 10, 2019 at 9:05 PM Sarah Payne <[hidden email]> wrote:

