Create vectors based on text files

-I have about 1000 text files.
-Each text file has a numerical rating from 1-10.
-If the rating is under 5, the text belongs to category A
-If the rating is 7 or above, the text belongs to category B

-I'm supposed to create a class with two vectors.
-One vector for category A texts and one vector for category B texts
-The length of each vector equals the number of unique words (the vocabulary) across all 1000 text files
-I need to populate the vectors with the number of times each word appears in category A texts and in category B texts, respectively
-Each position in the two vectors should correspond to the same unique word, e.g. vec_a[24] corresponds to the same word as vec_b[24]
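
One possible reading of the requirements above, as a minimal Python sketch rather than a definitive solution. It assumes the texts and ratings have already been loaded into `(text, rating)` pairs, since the post does not say how the rating is stored in each file. The names `WordCountVectors`, `tokenize`, and `category_for` are hypothetical, and the simple lowercase regex tokenizer is an assumption too. Ratings of 5 and 6 are not assigned a category in the description, so the sketch leaves them out of both count vectors while still including their words in the shared vocabulary.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words (assumed tokenization rule)."""
    return re.findall(r"[a-z']+", text.lower())


def category_for(rating):
    """Return 'A' for ratings under 5, 'B' for 7 or above, otherwise None."""
    if rating < 5:
        return "A"
    if rating >= 7:
        return "B"
    return None  # ratings 5 and 6 have no category in the description


class WordCountVectors:
    """Holds a shared vocabulary and one count vector per category."""

    def __init__(self, documents):
        # documents: iterable of (text, rating) pairs (assumed input format)
        counts = {"A": Counter(), "B": Counter()}
        vocabulary = set()
        for text, rating in documents:
            words = tokenize(text)
            vocabulary.update(words)
            category = category_for(rating)
            if category is not None:
                counts[category].update(words)

        # Sorting fixes one position per word, shared by both vectors.
        self.vocabulary = sorted(vocabulary)
        # A Counter returns 0 for words it has never seen.
        self.vec_a = [counts["A"][word] for word in self.vocabulary]
        self.vec_b = [counts["B"][word] for word in self.vocabulary]


# Tiny made-up example in place of the 1000 files.
documents = [
    ("good film, good cast", 8),
    ("bad film", 2),
    ("okay film", 5),
]
vectors = WordCountVectors(documents)
for word, a, b in zip(vectors.vocabulary, vectors.vec_a, vectors.vec_b):
    print(word, a, b)
```

Because both vectors are built from the same sorted vocabulary list, `vec_a[i]` and `vec_b[i]` always refer to `vocabulary[i]`, which is the alignment the last requirement asks for.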

Later on, I'm supposed to plot two histograms from these vectors. I think the goal is to see whether certain words appear more often in category A than in category B.

Question: How do I go about creating these vectors? I'm really confused about what I need to do.