NEED HELP : Association in single DTM


NEED HELP : Association in single DTM

Rahul singh
I have free text data in a single text document. I create a corpus, and
then a document term matrix out of it. I can create a word cloud too.

But when I try word association on the same matrix using findAssocs(), it always
returns numeric(0).

Example: findAssocs(dtm, "king", 0.1)

I read on Stack Overflow that this is because I have a single document.

What is the workaround?


______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: NEED HELP : Association in single DTM

Boris Steipe
If you consider the definition of a DTM, and that findAssocs() computes associations between words as correlations across documents(!), you will realize that you can't get what you want from a single document. Indeed, what kind of an "association" would you even be looking for?
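
To make the point concrete, here is a minimal base-R sketch. It does not call tm at all; the toy speech, the helper name count_terms, and the choice of sentences as pseudo-documents are assumptions made for illustration only. It mimics the idea of correlating one term's counts with the other terms' counts across rows: with a single row the correlation is undefined, while splitting the text into sentences gives the correlation something to vary over.

```r
# Toy speech: a single document with three sentences (made-up data).
speech <- "The king spoke to the people. The people cheered the king. The queen stayed silent."

# Hypothetical helper: one row per text unit, one column per term, cells are counts.
count_terms <- function(units) {
  tokens <- lapply(units, function(u) {
    w <- strsplit(tolower(gsub("[^[:alpha:] ]", "", u)), " +")[[1]]
    w[nzchar(w)]
  })
  vocab <- sort(unique(unlist(tokens)))
  m <- do.call(rbind, lapply(tokens, function(w) {
    as.vector(table(factor(w, levels = vocab)))
  }))
  colnames(m) <- vocab
  m
}

# Whole speech as one document: a single row, so the correlation is undefined (NA).
one_doc <- count_terms(speech)
single_doc_cor <- cor(one_doc[, "king"], one_doc[, "people"])

# Each sentence as its own pseudo-document: three rows, so correlations exist.
sentences <- strsplit(speech, "[.!?]")[[1]]
by_sentence <- count_terms(sentences)
assoc <- cor(by_sentence[, "king"], by_sentence)[1, ]
assoc <- sort(assoc[names(assoc) != "king"], decreasing = TRUE)
print(assoc)
```

Whether sentences, paragraphs, or some other unit is the right pseudo-document depends entirely on what "association" is supposed to mean for the text at hand.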

B.



> On Nov 15, 2017, at 12:40 AM, Rahul singh <[hidden email]> wrote:
>
> I have free text data in a single text document. I create a corpus, and
> then a document term matrix out of it. I can create a word cloud too.
>
> But when I try word association on the same matrix using findAssocs(), it always
> returns numeric(0).
>
> Example: findAssocs(dtm, "king", 0.1)
>
> I read on Stack Overflow that this is because I have a single document.
>
> What is the workaround?

Re: NEED HELP : Association in single DTM

Rahul singh
Hi Boris,

In that case, if I have a lot of free text data (let us assume part of an
election speech) in one single text document, and I want to find the
association of the top 3 most frequently occurring words with the other
words in the speech, what method should I adopt?

On Wed, Nov 15, 2017 at 7:08 PM, Boris Steipe <[hidden email]>
wrote:

> If you consider the definition of a DTM, and that findAssocs() computes
> associations between words as correlations across documents(!), you will
> realize that you can't get what you want from a single document. Indeed, what
> kind of an "association" would you even be looking for?
>
> B.

Re: NEED HELP : Association in single DTM

Bert Gunter-2
In general, statistical methodology queries, which seem to be your
concern, are off-topic here. This list is about R programming. Consider
stats.stackexchange.com for statistical queries.

However, the CRAN task view on natural language processing might be useful,
so you may wish to check it:

https://cran.r-project.org/web/views/NaturalLanguageProcessing.html

Cheers,
Bert




Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Wed, Nov 15, 2017 at 6:17 PM, Rahul singh <[hidden email]> wrote:

> Hi Boris,
>
> In that case, if I have a lot of free text data (let us assume part of an
> election speech) in one single text document, and I want to find the
> association of the top 3 most frequently occurring words with the other
> words in the speech, what method should I adopt?

Re: NEED HELP : Association in single DTM

Boris Steipe
No - the CRAN task view is not going to help you at all, since you need to think more about the question that you are trying to ask before you can start worrying about which packages to pursue it with.

In your case this hinges on what you mean by "association". In the same phrase? In the same sentence? Adjacent? Or separated by k words? For what k?
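
As one illustration of how the answer changes the procedure, here is a base-R sketch of the "within k words" reading: it counts which words occur at most k positions away from a target word. The function name window_cooccur, the toy text, and k = 2 are assumptions for illustration, not a recommendation of any particular definition.

```r
# Hypothetical helper: tabulate words found within k positions of `target`.
window_cooccur <- function(text, target, k = 2) {
  # Lower-case, replace anything that is not a letter with a space, then tokenize.
  words <- strsplit(tolower(gsub("[^[:alpha:] ]", " ", text)), " +")[[1]]
  words <- words[nzchar(words)]
  hits <- which(words == target)
  # Positions within k words of each hit, excluding the hit itself.
  # Note: windows ignore sentence boundaries, and overlapping windows
  # count a position once per occurrence of the target.
  neighbours <- unlist(lapply(hits, function(i) {
    idx <- seq(max(1, i - k), min(length(words), i + k))
    idx[idx != i]
  }))
  sort(table(words[neighbours]), decreasing = TRUE)
}

text <- "The king spoke to the people. The people cheered the king."
counts <- window_cooccur(text, "king", k = 2)
print(counts)
```

A same-sentence definition would instead split the text on sentence punctuation first and count within each piece; removing stop words such as "the" beforehand is a separate choice that this sketch deliberately leaves out.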

Once you are clear on that, we can probably show you ways to translate your procedure into R code. But - as Bert mentioned - we are not well positioned to define the procedure for you.

Boris




> On Nov 15, 2017, at 10:35 PM, Bert Gunter <[hidden email]> wrote:
>
> In general, statistical methodology queries, which seem to be your concern, are off-topic here. This list is about R programming. Consider stats.stackexchange.com for statistical queries.
>
> However, the CRAN task view on natural language processing might be useful, so you may wish to check it:
>
> https://cran.r-project.org/web/views/NaturalLanguageProcessing.html
>
> Cheers,
> Bert