piping in only specific parts of a certain column

Drake Gossi
Hello!

Question. I'm dealing with a large Excel sheet that I'm trying to tidy
and then visualize, and I'm wondering how I might specify the data I'm
visualizing.

Here's the data frame I'm working with:

> str(unclean_data)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 1909 obs. of  9 variables:
 $ unique identifier: num  1 1 1 1 1 1 1 1 1 1 ...
 $ question         : num  1 2 2 2 2 2 2 3 3 3 ...
 $ grid text        : chr  "******* and his family have lived and
worked in ******* for 6 years." "******* contributes to public safety
while also organizing community events. He said he hosts Trunk or
Treat, en"| __truncated__ "******* did not know the origin or history
of ******* PD, but he said it is integral to the safety of the area."
"The ******* PD ensures safety, he said, while also familiarizing
themselves with the town’s people. He said ev"| __truncated__ ...
>

The most important column is the $grid text one, and I know how to extract that:

> text_df_APPLIED <- tibble(line = 1:1909, text = unclean_data$`grid text`)

But my question is, what if I only wanted to extract the entries in the
$grid text column whose rows have the number 3 in the $question column?
So, instead of tidying (and later visualizing) the whole $grid text
column, I want to tidy only a smaller portion of it: the part that is
indexed to the number 3 in the $question column.

Is there a way to do that in this line of code:

> text_df_APPLIED <- tibble(line = 1:1909, text = unclean_data$`grid text`)

Or do I have to FIRST shorten the $`grid text` column (shorten it to
only that which is indexed to 3 in the $question column) BEFORE I even
begin to tidy it?

I'm working with these libraries right now, if it helps:

library(tidytext)
library(dplyr)
library(stringr)

D

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: piping in only specific parts of a certain column

Jim Lemon-4
Hi Drake,
This is a guess on my part, but what about:

q3only <- unclean_data[unclean_data$question == 3, ]

then perform your operations on q3only
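
As a rough sketch of how that could look end to end, here is the same idea on a small made-up stand-in for unclean_data. The toy rows and the name text_df_q3 are hypothetical and not taken from the real sheet; the point is only that the line numbers come from the subset's own length rather than a hard-coded 1:1909.

```r
# Toy stand-in for unclean_data (hypothetical rows; the real sheet has 1909)
unclean_data <- data.frame(
  question = c(1, 2, 3, 3),
  `grid text` = c("answer a", "answer b", "answer c", "answer d"),
  check.names = FALSE,        # keep the space in the column name
  stringsAsFactors = FALSE
)

# Subset the rows first ...
q3only <- unclean_data[unclean_data$question == 3, ]

# ... then build the line/text table from the subset, numbering the
# lines by the subset's own length instead of hard-coding 1:1909
text_df_q3 <- data.frame(
  line = seq_len(nrow(q3only)),
  text = q3only$`grid text`,
  stringsAsFactors = FALSE
)
```

So the subsetting does happen before the tidying step, but it only costs one extra line.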

Jim

On Thu, Jul 2, 2020 at 8:35 PM Drake Gossi <[hidden email]> wrote:

> [...]

Re: piping in only specific parts of a certain column

Rui Barradas
In reply to this post by Drake Gossi
Hello,


Maybe the following is what you are looking for.


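# dplyr: keep the question-3 rows, renumber them, keep two columns
unclean_data %>%
  filter(question == 3) %>%          # rows where question is 3
  mutate(line = row_number()) %>%    # line numbers within the subset
  select(line, `grid text`)          # only the columns needed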
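
If the goal is the same line/text layout as the original tibble() call, one possible variation is to rename the column inside select(). This is only a sketch on made-up data: the toy rows and the name text_df_q3 are assumptions, not part of the real data set.

```r
library(dplyr)

# Toy stand-in for unclean_data (hypothetical rows)
unclean_data <- tibble(
  question = c(1, 2, 3, 3),
  `grid text` = c("answer a", "answer b", "answer c", "answer d")
)

text_df_q3 <- unclean_data %>%
  filter(question == 3) %>%            # keep only the question-3 rows
  mutate(line = row_number()) %>%      # number the remaining rows 1, 2, ...
  select(line, text = `grid text`)     # select() can rename while selecting
```

The result has a plain `text` column, so it should be usable with tidytext functions such as unnest_tokens(word, text) in the usual way.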


Hope this helps,

Rui Barradas
 

At 23:47 on 01/07/2020, Drake Gossi wrote:

> [...]


Re: piping in only specific parts of a certain column

Drake Gossi
Thank you very much, Jim and Rui.

The line that ended up working for me was this:

> ed_exp3 <- unclean_data[which(unclean_data$question == 3), "grid text"]

However, as I read and study Jim's and Rui's code, I see how those
would work too. Thank you all again!
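
For reference, base R bracket indexing takes the rows before the comma and the columns after it. Here is a minimal sketch on made-up data (the toy rows are hypothetical), using which() so that an NA in the question column is skipped, and drop = FALSE so that a plain data.frame stays a data frame rather than collapsing to a vector:

```r
# Toy stand-in for unclean_data (hypothetical rows, one with a missing question)
unclean_data <- data.frame(
  question = c(1, 3, NA, 3),
  `grid text` = c("answer a", "answer b", "answer c", "answer d"),
  check.names = FALSE,        # keep the space in the column name
  stringsAsFactors = FALSE
)

# Row positions where question is 3; which() ignores the NA comparison
rows <- which(unclean_data$question == 3)

# Rows before the comma, column name after it
ed_exp3 <- unclean_data[rows, "grid text", drop = FALSE]
```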

On Thu, Jul 2, 2020 at 5:07 AM Rui Barradas <[hidden email]> wrote:

> [...]



--
Drake Gossi
Phd Student
University of Texas at Austin
