-I have about 1000 text files.
-Each text file has a numerical rating from 1-10.
-If the rating is under 5, the text belongs to category A.
-If the rating is 7 or above, the text belongs to category B.
-I'm supposed to create a class with two vectors.
-One vector for category A texts and one vector for category B texts
-The size of each vector is equal to the number of unique words (the vocabulary) across all 1000 text files.
-I need to populate the vectors with the number of times each word appears in category A and in category B.
-Each position in the two vectors should correspond to the same unique word, e.g. vec_a[i] counts the same word as vec_b[i].
Later on, I'm supposed to plot two histograms from these vectors. I think the goal is to see whether certain words appear more often in category A than in category B.
Question: How do I go about creating these vectors? I'm really confused about what I need to do.
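
For reference, here is a minimal sketch of the layout described above. It assumes Python, since the post does not name a language. The class name `CategoryWordCounts`, the `tokenize` helper, and the lowercase letters-only tokenization are all hypothetical choices for illustration. Reading the files and extracting each rating is left out, so the class takes ready-made (text, rating) pairs.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: a "word" is a lowercase run of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


class CategoryWordCounts:
    """Two aligned count vectors over one shared vocabulary (hypothetical name)."""

    def __init__(self, rated_texts):
        # rated_texts: iterable of (text, rating) pairs.
        counts_a, counts_b = Counter(), Counter()
        vocab = set()
        for text, rating in rated_texts:
            words = tokenize(text)
            vocab.update(words)  # vocabulary spans all texts
            if rating < 5:
                counts_a.update(words)   # category A
            elif rating >= 7:
                counts_b.update(words)   # category B
            # Ratings 5 and 6 fall in neither category as described.

        # Sorting fixes one index per word, so vec_a[i] and vec_b[i]
        # both refer to vocab[i].
        self.vocab = sorted(vocab)
        self.vec_a = [counts_a[w] for w in self.vocab]
        self.vec_b = [counts_b[w] for w in self.vocab]


sample = [("bad bad movie", 2), ("good movie", 8), ("okay film", 5)]
wc = CategoryWordCounts(sample)
print(wc.vocab)  # ['bad', 'film', 'good', 'movie', 'okay']
print(wc.vec_a)  # [2, 0, 0, 1, 0]
print(wc.vec_b)  # [0, 0, 1, 1, 0]
```

The key idea is to build the vocabulary first and give every word a fixed index, then fill both vectors using that same index order. A `Counter` returns 0 for words it has never seen, which covers words that occur in only one category.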