CREATE DICTIONARY WITH TM PACKAGE

CREATE DICTIONARY WITH TM PACKAGE

Patrick Casimir
Dear Members & Experts,


Since the Dictionary() function is no longer available in the tm package, how do I use other functions to do the same as below? I want to capture a list of specific terms from a corpus. For example, my corpus has 102 files, and I want to see a list with the occurrences of prostatic, adenocarcinoma, and grade in all 102 files. When I use the function Dictionary(), I get the error: Error: could not find function "Dictionary"


> d <- Dictionary(c("prostatic", "adenocarcinoma", "grade"))
> inspect(DocumentTermMatrix(docs, list(dictionary = d)))


But if I use the code below with inspect(), the output only shows the terms for 10 files instead of 102. I need a way to get my dictionary to capture and return those terms, or whatever other terms I select, for all 102 files. I know I am close, but inspect() is not the right function.


> myTerms <- c("prostatic", "adenocarcinoma", "grade")
> inspect(DocumentTermMatrix(docs, list(dictionary = myTerms)))

 <<DocumentTermMatrix (documents: 102, terms: 3)>>
 Non-/sparse entries: 292/14
 Sparsity           : 5%
 Maximal term length: 14
 Weighting          : term frequency (tf)
 Sample             :
                Terms
 Docs            adenocarcinoma grade prostatic
   Patient14.txt             11     6         3
   Patient15.txt              7    12         2
   Patient16.txt             13    16         4
   Patient19.txt              5    13         2
   Patient24.txt             11    12         4
   Patient25.txt              8     9         4
   Patient41.txt              8    10         4
   Patient46.txt              8    10         3
   Patient8.txt               9    12         2
   Patient9.txt               8    23         2


Thanks


Patrick Casimir, PhD

Health Analytics, Data Science, Big Data Expert & Independent Consultant
C: 954.614.1178


        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: PROBLEM USING DICTIONARY WITH TM PACKAGE

Jeff Newmiller
Considering the deafening silence after three repeats, one explanation could be that you are asking the wrong group of people. It is also possible that your failure to follow the Posting Guide with regard to using plain text email and a reproducible example [1][2] means that readers who are not experts do not feel inclined to follow along with you and help you think of solutions. Keep in mind that supporting contributed packages like tm is technically not on topic here, though people often do feel the urge to help solve problems with them anyway.

With regard to asking the wrong group of people, I would suggest asking the maintainer of the tm package what they recommend. See the help for the maintainer() function or read the CRAN web page for that package.

[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

[2] http://adv-r.had.co.nz/Reproducibility.html
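
The maintainer lookup mentioned above can be sketched as follows. This is only an illustration: it queries the base package "stats" so that it runs on any R installation; for the case in this thread one would presumably pass "tm" instead, assuming that package is installed.

```r
# maintainer() is provided by the utils package that ships with R.
# "stats" is used here only because it is always installed; replace it
# with "tm" (assumption: tm is installed) to look up the tm maintainer.
m <- utils::maintainer("stats")
print(m)
```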
--
Sent from my phone. Please excuse my brevity.

On May 19, 2017 7:12:45 AM PDT, Patrick Casimir <[hidden email]> wrote:

>Dear Members & Experts,
>
>
>Since the Dictionary () function is no longer available with the tm
>package. How do I use other functions to do the same as below? I want
>to capture a list of specific terms from a corpus. By example, if my
>corpus has 102 files. I want to see a list with occurrences of
>prostatic, adenocarcinoma, grade in all 102 files. When I use the
>function Dictionary (), I got the error: Error: could not find function
>"Dictionary"

Re: PROBLEM USING DICTIONARY WITH TM PACKAGE

Patrick Casimir
Thanks Jeff. I will try elsewhere.


Patrick Casimir, PhD
Health Analytics, Data Science, Big Data Expert & Independent Consultant
C: 954.614.1178

________________________________
From: Jeff Newmiller <[hidden email]>
Sent: Friday, May 19, 2017 11:04:22 AM
To: [hidden email]; Patrick Casimir; [hidden email]
Subject: Re: [R] PROBLEM USING DICTIONARY WITH TM PACKAGE


Re: PROBLEM USING DICTIONARY WITH TM PACKAGE

Patrick Casimir
In reply to this post by Jeff Newmiller
Jeff,


Here is the solution:


myTerms <- c("prostatic", "adenocarcinoma", "grade")
dtm <- DocumentTermMatrix(docs, list(dictionary = myTerms))
inspect(dtm)    ## prints only a sample of 10 docs from the DTM
as.matrix(dtm)  ## returns the counts for all docs in the DTM
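
For anyone who wants to try this without the patient files, here is a minimal self-contained sketch of the same approach. The three short texts are made up purely for illustration and stand in for the 102 documents; the object names texts, dtm and counts are likewise only illustrative. It assumes the tm package is installed.

```r
library(tm)

# Made-up stand-ins for the real patient files (illustration only)
texts <- c(
  "prostatic adenocarcinoma grade grade",
  "adenocarcinoma of the prostate",
  "no relevant terms here"
)
docs <- VCorpus(VectorSource(texts))

# Restrict the document-term matrix to the terms of interest
myTerms <- c("prostatic", "adenocarcinoma", "grade")
dtm <- DocumentTermMatrix(docs, control = list(dictionary = myTerms))

# as.matrix() gives one row per document and one column per term,
# for every document rather than the sample that inspect() prints
counts <- as.matrix(dtm)
print(counts)

# Total occurrences of each term across the whole corpus
print(colSums(counts))
```

Note that as.matrix() builds a dense matrix, which is fine for three terms and about a hundred documents but may use a lot of memory for a large vocabulary.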



Patrick Casimir, PhD
Health Analytics, Data Science, Big Data Expert & Independent Consultant
C: 954.614.1178
