Sorting text docs based on document meta values in tm()

Lamke
Hi all,

I wonder if there's any way to reorder a text collection by its document meta values. For instance, suppose I have 5 documents with the following meta data:

MetaID  Sex  Age
0       M    38
0       M    46
0       F    24
0       F    49
0       F    33

Can I reorder the text documents by ascending age? Thank you very much!

Re: Sorting text docs based on document meta values in tm()

Shad Thomas
Hi Kelvin,

I'm new to R and tm myself, but here is a way to sort your corpus. There may be a more efficient approach, but this will get the job done.

Basically, there are three steps (in pseudocode):
1.  Extract the meta data for Age into a list
2.  Sort the list by Age
3.  Create a new corpus by copying entries from the old corpus in age order

Your actual code would look something like this:
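# Extract the Age meta value from every document (assumes each one has an "Age" tag)
agelist <- lapply(mycorpus, meta, tag = "Age")
# Convert to numeric so ages sort by value, not alphabetically ("9" would sort after "10")
agelistorder <- order(as.numeric(unlist(agelist)))
# A corpus is indexed like a list, so there is no trailing comma
mysortedcorpus <- mycorpus[agelistorder]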
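
For anyone who wants to try this end to end, here is a minimal sketch under some assumptions: it uses a recent version of tm (VCorpus, VectorSource, and meta with a tag argument), the document texts are made up for illustration, and only the ages come from the question. It compares the ages as numbers, since character values would sort alphabetically, and indexes the corpus like a list. Treat it as one possible approach, not the definitive one.

```r
library(tm)

# Hypothetical document texts; only the ages are taken from the question.
texts <- c("first document", "second document", "third document",
           "fourth document", "fifth document")
ages  <- c(38, 46, 24, 49, 33)

mycorpus <- VCorpus(VectorSource(texts))

# Attach Age to each document as document-level meta data.
for (i in seq_along(texts)) {
  meta(mycorpus[[i]], tag = "Age") <- ages[i]
}

# Step 1: extract the Age value of every document.
agelist <- lapply(mycorpus, meta, tag = "Age")

# Step 2: compute the ascending order, comparing ages as numbers.
agelistorder <- order(as.numeric(unlist(agelist)))

# Step 3: build the sorted corpus by list-style indexing.
mysortedcorpus <- mycorpus[agelistorder]

# Inspect the ages of the reordered documents.
unlist(lapply(mysortedcorpus, meta, tag = "Age"))
```

If some documents have no Age tag, unlist() will silently drop them and the order will no longer line up with the corpus, so check for missing values first.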

IHTH,
Shad Thomas
www.glassboxresearch.com