count number of stop words in R

count number of stop words in R

R help mailing list-2
Hi all,

Is there a way in R to count the number of English stop words in a string using the tm package?

str="Mhm . Alright . There's um a young boy that's getting a cookie jar . And it he's uh in bad shape because uh the thing is falling over . And in the picture the mother is washing dishes and doesn't see it . And so is the the water is overflowing in the sink . And the dishes might get falled over if you don't fell fall over there there if you don't get it . And it there it's a picture of a kitchen window . And the curtains are very uh distinct . But the water is still flowing .

255 Levels: A boy's on the uh falling off the stool picking up cookies . The girl's reaching up for it . The girl the lady is is drying dishes . The water is uh running over uh from the sink into the floor . The window's opened . Dishes on the on the counter . She's outside ."

Thanks for any help!
Elahe

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: count number of stop words in R

Bert Gunter-2
You can use regular expressions.

?regex and/or the stringr package are good places to start.  Of
course, you have to define "stop words."


Cheers,
Bert
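
A minimal sketch of the regular-expression route suggested above, using base R only. The stop-word vector is a hypothetical, deliberately tiny list used for illustration, and count_stop_words is a made-up helper name; neither comes from this thread.

```r
# Hypothetical, deliberately tiny stop-word list; a real list would be much longer.
stop_words <- c("a", "and", "in", "is", "it", "the")

count_stop_words <- function(text, words) {
  # \\b anchors each alternative at word boundaries, so "the" does not
  # match inside "mother" and "in" does not match inside "washing".
  pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")
  hits <- gregexpr(pattern, text, ignore.case = TRUE, perl = TRUE)[[1]]
  # gregexpr() returns -1 when nothing matches, so count only real positions.
  sum(hits > 0)
}

count_stop_words("And in the picture the mother is washing dishes .", stop_words)
```

Note that \\b treats an apostrophe as a word boundary, so with this pattern "it's" counts as a hit for "it". Whether that is desirable depends on how stop words are defined.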


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Mon, Jun 12, 2017 at 5:40 AM, Elahe chalabi via R-help
<[hidden email]> wrote:

> Hi all,
>
> Is there a way in R to count the number of English stop words in a string using the tm package?
>
> str="Mhm . Alright . There's um a young boy that's getting a cookie jar . And it he's uh in bad shape because uh the thing is falling over . And in the picture the mother is washing dishes and doesn't see it . And so is the the water is overflowing in the sink . And the dishes might get falled over if you don't fell fall over there there if you don't get it . And it there it's a picture of a kitchen window . And the curtains are very uh distinct . But the water is still flowing .
>
> 255 Levels: A boy's on the uh falling off the stool picking up cookies . The girl's reaching up for it . The girl the lady is is drying dishes . The water is uh running over uh from the sink into the floor . The window's opened . Dishes on the on the counter . She's outside ."
>
> Thanks for any help!
> Elahe

Re: count number of stop words in R

Patrick Casimir
You can define stop words as below.

data <- tm_map(data, removeWords, stopwords("english"))



Patrick Casimir, PhD
Health Analytics, Data Science, Big Data Expert & Independent Consultant
C: 954.614.1178

________________________________
From: R-help <[hidden email]> on behalf of Bert Gunter <[hidden email]>
Sent: Monday, June 12, 2017 10:12:33 AM
To: Elahe chalabi
Cc: R-help Mailing List
Subject: Re: [R] count number of stop words in R

You can use regular expressions.

?regex and/or the stringr package are good places to start.  Of
course, you have to define "stop words."



Re: count number of stop words in R

R help mailing list-2
Thanks for your reply. I know the command
data <- tm_map(data, removeWords, stopwords("english"))
removes English stop words, but I don't know how to count the stop words in my string:


str="Mhm . Alright . There's um a young boy that's getting a cookie jar . And it he's uh in bad shape because uh the thing is falling over . And in the picture the mother is washing dishes and doesn't see it . And so is the the water is overflowing in the sink . And the dishes might get falled over if you don't fell fall over there there if you don't get it . And it there it's a picture of a kitchen window . And the curtains are very uh distinct . But the water is still flowing .





On Monday, June 12, 2017 7:24 AM, Patrick Casimir <[hidden email]> wrote:



You can define stop words as below.
data <- tm_map(data, removeWords, stopwords("english"))



Re: count number of stop words in R

Patrick Casimir
Define your string as whatever object you want:

data <- "Mhm . Alright . There's um a young boy that's getting a cookie jar . And it he's uh in bad shape because uh the thing is falling over . And in the picture the mother is washing dishes and doesn't see it . And so is the the water is overflowing in the sink . And the dishes might get falled over if you don't fell fall over there there if you don't get it . And it there it's a picture of a kitchen window . And the curtains are very uh distinct . But the water is still flowing."




________________________________
From: Elahe chalabi <[hidden email]>
Sent: Monday, June 12, 2017 11:23:42 AM
To: Patrick Casimir; Bert Gunter
Cc: R-help Mailing List
Subject: Re: [R] count number of stop words in R

Thanks for your reply. I know the command
data <- tm_map(data, removeWords, stopwords("english"))
removes English stop words, but I don't know how to count the stop words in my string:


str="Mhm . Alright . There's um a young boy that's getting a cookie jar . And it he's uh in bad shape because uh the thing is falling over . And in the picture the mother is washing dishes and doesn't see it . And so is the the water is overflowing in the sink . And the dishes might get falled over if you don't fell fall over there there if you don't get it . And it there it's a picture of a kitchen window . And the curtains are very uh distinct . But the water is still flowing .






Re: count number of stop words in R

R help mailing list-2
Defining data as you mentioned in your response causes the following error:

Error in UseMethod("tm_map", x) :
no applicable method for 'tm_map' applied to an object of class "character"

I can solve this error by using Corpus(VectorSource(my string)) and then using your command, but I cannot see the number of stop words in my string!
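
One possible way to get a count rather than a cleaned corpus, sketched under the assumption that tm is installed: stopwords("english") returns an ordinary character vector, so the string can be split into tokens and compared against it without building a corpus at all. The helper name count_tm_stop_words and the whitespace tokenisation are illustrative choices, not something proposed in this thread.

```r
library(tm)  # assumed to be installed; used here only for stopwords()

count_tm_stop_words <- function(text) {
  # Lower-case first, because the entries in stopwords("english") are lower case.
  tokens <- strsplit(tolower(text), "[[:space:]]+")[[1]]
  # Drop tokens with no letters or digits, such as the stand-alone "." separators.
  tokens <- tokens[grepl("[[:alnum:]]", tokens)]
  # Each token that appears in tm's English list counts once.
  sum(tokens %in% stopwords("english"))
}

count_tm_stop_words("And in the picture the mother is washing dishes .")
```

Tokens with punctuation attached (for example "sink.") would not match the list; in the transcript above the full stops are already separated by spaces, so that case does not arise there.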


On Monday, June 12, 2017 8:36 AM, Patrick Casimir <[hidden email]> wrote:



Define your string as whatever object you want:
data <- "Mhm . Alright . There's um a young boy that's getting a cookie jar . And it he's uh in bad shape because uh the thing is falling over . And in the picture the mother is washing dishes and doesn't see it . And so is the the water is overflowing in the sink . And the dishes might get falled over if you don't fell fall over there there if you don't get it . And it there it's a picture of a kitchen window . And the curtains are very uh distinct . But the water is still flowing."



Re: count number of stop words in R

Patrick Casimir
You can use

summary (my string)



________________________________
From: Elahe chalabi <[hidden email]>
Sent: Monday, June 12, 2017 11:42:43 AM
To: Patrick Casimir; Bert Gunter
Cc: R-help Mailing List
Subject: Re: [R] count number of stop words in R

Defining data as you mentioned in your response causes the following error:

Error in UseMethod("tm_map", x) :
no applicable method for 'tm_map' applied to an object of class "character"

I can solve this error by using Corpus(VectorSource(my string)) and then using your command, but I cannot see the number of stop words in my string!



Re: count number of stop words in R

Patrick Casimir
In reply to this post by R help mailing list-2
Or use the qdap package to perform any quantitative analysis of your string.

https://cran.r-project.org/web/packages/qdap/qdap.pdf







Re: count number of stop words in R

Florian Schwendinger
In reply to this post by Bert Gunter-2
If you just want to count the stop words, you could do something like:

library(slam)
library(tm)

your_string <- "Mhm . Alright . There's um a young boy that's getting a cookie jar . And it he's uh in bad shape because uh the thing is falling over . And in the picture the mother is washing dishes and doesn't see it . And so is the the water is overflowing in the sink . And the dishes might get falled over if you don't fell fall over there there if you don't get it . And it there it's a picture of a kitchen window . And the curtains are very uh distinct . But the water is still flowing ."
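# Wrap the string in a one-document corpus.
corp <- Corpus(VectorSource(your_string))

# The English stop-word list shipped with tm (printed here for inspection).
stopwords("en")

# Keep every term: no stop-word removal and no minimum word length.
cntrl <- list(tolower=TRUE, stopwords = NULL,
              removePunctuation = FALSE, removeNumbers = TRUE,
              stemming = FALSE, wordLengths = c(0, Inf))

dtm <- DocumentTermMatrix(corp, cntrl)

# Number of occurrences of each stop word that appears in the string.
col_sums(dtm[, which(colnames(dtm) %in% stopwords("en"))])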
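
The col_sums() call above yields one count per stop word; summing those counts gives a single total. A package-free sketch of the same idea, with table() standing in for the document-term matrix. The stop-word vector here is a hypothetical short list, not tm's stopwords("en"), and the variable names are illustrative.

```r
# Hypothetical short stop-word list, used only to keep the example self-contained.
stop_words <- c("a", "and", "in", "is", "it", "the")

text <- "And in the picture the mother is washing dishes ."

# Lower-case and split on whitespace, mirroring tolower = TRUE in the control list.
tokens <- strsplit(tolower(text), "[[:space:]]+")[[1]]

# Per-word counts: the analogue of the col_sums() result.
per_word <- table(tokens[tokens %in% stop_words])
print(per_word)

# Overall number of stop-word occurrences in the string.
total <- sum(per_word)
print(total)
```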



Best,
Florian
 

Gesendet: Montag, 12. Juni 2017 um 16:12 Uhr
Von: "Bert Gunter" <[hidden email]>
An: "Elahe chalabi" <[hidden email]>
Cc: "R-help Mailing List" <[hidden email]>
Betreff: Re: [R] count number of stop words in R
You can use regular expressions.

?regex and/or the stringr package are good places to start. Of
course, you have to define "stop words."


Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Mon, Jun 12, 2017 at 5:40 AM, Elahe chalabi via R-help
<[hidden email]> wrote:

> Hi all,
>
> Is there a way in R to count the number of stop words (English) of a string using tm package?
>
> str="Mhm . Alright . There's um a young boy that's getting a cookie jar . And it he's uh in bad shape because uh the thing is falling over . And in the picture the mother is washing dishes and doesn't see it . And so is the the water is overflowing in the sink . And the dishes might get falled over if you don't fell fall over there there if you don't get it . And it there it's a picture of a kitchen window . And the curtains are very uh distinct . But the water is still flowing .
>
> 255 Levels: A boy's on the uh falling off the stool picking up cookies . The girl's reaching up for it . The girl the lady is is drying dishes . The water is uh running over uh from the sink into the floor . The window's opened . Dishes on the on the counter . She's outside ."
>
> Thanks for any help!
> Elahe
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: count number of stop words in R

Bert Gunter-2
In reply to this post by R help mailing list-2
I am unfamiliar with the tm package, but using basic regex tools, is
this what you want:

test <- "Mhm . Alright . There's um a young boy that's getting a
cookie jar . And it he's uh in bad shape because uh the thing is
falling over . And in the picture the mother is washing dishes and
doesn't see it . And so is the the water is overflowing in the sink .
And the dishes might get falled over if you don't fell fall over there
there if you don't get it . And it there it's a picture of a kitchen
window . And the curtains are very uh distinct . But the water is
still flowing ."

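out <- strsplit(test, "\\s+") ## creates a list whose only component is a
                              ## vector of the words; "\\s+" also splits on
                              ## the line breaks inside the string

stopw <- c("a","the") ## or whatever they are

## Anchor the pattern with ^ and $ so that only whole words match; an
## unanchored "a|the" would also count every word that merely contains an "a".
sum(grepl(paste0("^(", paste(stopw, collapse="|"), ")$"), out[[1]],
          ignore.case=TRUE))

## If you want to count ".", a regex special character, as well, add:
sum(grepl(".",out[[1]],fixed=TRUE))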
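
Exact matching with %in% is an alternative that sidesteps regular expressions altogether, because a token either equals a stop word or it does not. A minimal sketch in base R; the stop word list stopw here is a hypothetical, deliberately short one, and the test sentence is shortened for illustration:

```r
# Hypothetical, deliberately short stop word list, for illustration only
stopw <- c("a", "the", "is", "and")

test <- "And the water is overflowing in the sink . And a boy is falling ."

# split on runs of whitespace and lower-case, so that "And" matches "and"
tokens <- tolower(strsplit(test, "\\s+")[[1]])

# exact, whole-token matching
n_exact <- sum(tokens %in% stopw)

# unanchored regex for comparison: it also counts words that merely
# contain a stop word as a substring, such as "water" and "falling"
n_regex <- sum(grepl(paste(stopw, collapse = "|"), tokens))

print(c(exact = n_exact, regex = n_regex))
```

If the tm package is available, stopwords("en") could be used in place of the short stopw vector.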


If this is all nonsense, just ignore -- and sorry I couldn't help.

-- Bert




Cheers,
Bert
Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Mon, Jun 12, 2017 at 8:23 AM, Elahe chalabi <[hidden email]> wrote:

> Thanks for your reply. I know that the command
> data <- tm_map(data, removeWords, stopwords("english"))
> removes English stop words, but I don't know how to count the stop words in my string:
>
>
> str="Mhm . Alright . There's um a young boy that's getting a cookie jar . And it he's uh in bad shape because uh the thing is falling over . And in the picture the mother is washing dishes and doesn't see it . And so is the the water is overflowing in the sink . And the dishes might get falled over if you don't fell fall over there there if you don't get it . And it there it's a picture of a kitchen window . And the curtains are very uh distinct . But the water is still flowing .
>
>
>
>
>
> On Monday, June 12, 2017 7:24 AM, Patrick Casimir <[hidden email]> wrote:
>
>
>
> You can define stop words as below.
> data <- tm_map(data, removeWords, stopwords("english"))
>
>
> Patrick Casimir, PhD
> Health Analytics, Data Science, Big Data Expert & Independent Consultant
> C: 954.614.1178
>
> ________________________________
>
> From: R-help <[hidden email]> on behalf of Bert Gunter <[hidden email]>
> Sent: Monday, June 12, 2017 10:12:33 AM
> To: Elahe chalabi
> Cc: R-help Mailing List
> Subject: Re: [R] count number of stop words in R
>
> You can use regular expressions.
>
> ?regex and/or the stringr package are good places to start.  Of
> course, you have to define "stop words."
>
>
> Cheers,
> Bert
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Mon, Jun 12, 2017 at 5:40 AM, Elahe chalabi via R-help
> <[hidden email]> wrote:
>> Hi all,
>>
>> Is there a way in R to count the number of stop words (English) of a string using tm package?
>>
>> str="Mhm . Alright . There's um a young boy that's getting a cookie jar . And it he's uh in bad shape because uh the thing is falling over . And in the picture the mother is washing dishes and doesn't see it . And so is the the water is overflowing in the sink . And the dishes might get falled over if you don't fell fall over there there if you don't get it . And it there it's a picture of a kitchen window . And the curtains are very uh distinct . But the water is still flowing .
>>
>> 255 Levels: A boy's on the uh falling off the stool picking up cookies . The girl's reaching up for it . The girl the lady is is drying dishes . The water is uh running over uh from the sink into the floor . The window's opened . Dishes on the on the counter . She's outside ."
>>
>> Thanks for any help!
>> Elahe
>>
>> ______________________________________________
>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.