How to use the predict function with text categories


amine016
Hello,

I have been having trouble using the predict function to predict the category (i.e. the class, such as sport, health, or politics) of new texts.

First, I import a term-document matrix (TDM) of my corpus, then split my data into training and test sets for my model using the kNN algorithm, and that works fine.

Now I need to import new texts whose category is unknown and predict it, but I don't know how to do that, or how to use the predict function for it.
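As far as I know, `class::knn()` does not return a fitted model that `predict()` can work with: it classifies the rows of its `test` argument immediately and returns a factor of predicted labels. So the new, unlabeled documents go straight into the `test` argument. Here is a minimal sketch on a made-up toy term-count table (the data and `k = 3` are illustrative assumptions, not taken from the attached files):

```r
library(class)

# Hypothetical toy term counts: one row per document, one column per term
train_x <- data.frame(
  goal   = c(3, 2, 0, 0, 0, 1),
  match  = c(2, 3, 0, 1, 0, 0),
  doctor = c(0, 0, 4, 3, 0, 0),
  vote   = c(0, 0, 0, 0, 3, 4)
)
train_y <- factor(c("sport", "sport", "health", "health", "politics", "politics"))

# New documents of unknown category: same columns, same order as train_x
new_x <- data.frame(
  goal   = c(4, 0),
  match  = c(1, 0),
  doctor = c(0, 0),
  vote   = c(0, 5)
)

# knn() "trains" and predicts in a single call; there is no model object,
# so predict() is never needed
new_pred <- knn(train = train_x, test = new_x, cl = train_y, k = 3)
new_pred
```

In the script, the equivalent call would be `knn(modeldata[train, ], newdata, cl[train], k = 70)`, where `newdata` is a placeholder for the unknown documents with the same columns as `modeldata`.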

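    # Text mining (corpus / term-document matrices)
    library(tm)
    # KNN model
    library(class)
    # Stemming words
    library(SnowballC)
    # CrossTable
    library(gmodels)

    # Read the labeled CSV: one row per document, one column per term,
    # plus a "Category" column
    PathFile <- read.csv(file.choose(), sep = ";", header = TRUE)
    # Read the CSV of new documents whose category is unknown
    # (the separator must match how that file was actually written)
    PathFilenameUnk <- read.csv(file.choose(), sep = ",", header = TRUE)
    # Structure of the CSV file
    str(PathFile)

    # Split rows at random: 70% for training, the remaining 30% for testing
    set.seed(1)  # makes the random split reproducible
    train <- sample(nrow(PathFile), ceiling(nrow(PathFile) * 0.70))
    test <- (1:nrow(PathFile))[-train]

    ## Show training rows
    train
    ## Show test rows
    test

    # Isolate the class labels
    cl <- PathFile[, "Category"]

    # Create model data without the "Category" column
    modeldata <- PathFile[, !colnames(PathFile) %in% "Category"]

    # knn(training set, test set, training labels, k) returns predicted labels
    knn.pred <- knn(modeldata[train, ], modeldata[test, ], cl[train], k = 70)
    knn.pred
    # Confusion matrix
    conf.mat <- table("Predictions" = knn.pred, Actual = cl[test])
    conf.mat

    CrossTable(x = cl[test], y = knn.pred, prop.chisq = FALSE)

    # Predict the unknown documents: knn() returns a factor, not a model,
    # so predict(knn.pred, ...) fails. Pass the new documents as the test
    # set instead. Their columns must be the training terms in the same
    # order; terms missing from the new file are added with a count of 0.
    missing.terms <- setdiff(colnames(modeldata), colnames(PathFilenameUnk))
    if (length(missing.terms) > 0) PathFilenameUnk[missing.terms] <- 0
    newdata <- PathFilenameUnk[, colnames(modeldata)]
    unk.pred <- knn(modeldata[train, ], newdata, cl[train], k = 70)
    unk.pred

    # Accuracy (in percent) on the test set
    (accuracy <- sum(diag(conf.mat)) / length(test) * 100)

    # Create data frame with test data and predicted category
    df.pred <- cbind(knn.pred, modeldata[test, ])
    write.table(df.pred, file = "output.csv", sep = ";")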
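`knn()` compares documents column by column, so the matrix of new documents must contain exactly the training terms, in the same order. A TDM built separately from the new texts will usually have a different vocabulary. A small base-R helper for that, offered as a sketch: the name `align_terms` is hypothetical, and terms that appear only in the new texts are dropped because the training data has no column for them.

```r
# Hypothetical helper: reshape a new term-count data frame so its columns
# match the training terms exactly (same names, same order).
align_terms <- function(new_data, train_terms) {
  # Terms seen in training but absent from the new texts get a count of 0
  missing_terms <- setdiff(train_terms, colnames(new_data))
  for (term in missing_terms) {
    new_data[[term]] <- 0
  }
  # Keep only the training terms, in training order; this drops terms that
  # occur only in the new texts and any non-term column such as "Category"
  new_data[, train_terms, drop = FALSE]
}

# Usage with made-up data
train_terms <- c("goal", "doctor", "vote")
new_data <- data.frame(vote = c(2, 0), election = c(1, 0), goal = c(0, 3))
aligned <- align_terms(new_data, train_terms)
aligned
```

With the script's objects this would be `align_terms(PathFilenameUnk, colnames(modeldata))`, and the result can be passed as the test set to `knn()`.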

Here are my CSV files:

Predict_TDM_2018_05_09_225025.csv

TDM_2018_05_09_225323.csv

I know I have to apply the same steps to the unknown texts and import them as a DTM, but I am doing something wrong!
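"The same steps" means the new texts must be cleaned exactly like the training corpus and counted against the training vocabulary only, otherwise the columns will not line up. A minimal base-R sketch of that idea (the helper name `count_terms`, the cleaning rules, and the sample texts are assumptions for illustration; stemming is left out, so if the training TDM was stemmed with SnowballC, the new texts must be stemmed too):

```r
# Hypothetical helper: apply one fixed cleaning to every text, then count
# only the training vocabulary, giving one row per text.
count_terms <- function(texts, vocab) {
  clean <- tolower(texts)
  # Replace punctuation and digits with spaces
  clean <- gsub("[[:punct:][:digit:]]+", " ", clean)
  tokens <- strsplit(trimws(clean), "[[:space:]]+")
  # Count each vocabulary term per text; words outside vocab are ignored
  counts <- t(vapply(tokens, function(tok) {
    as.numeric(table(factor(tok, levels = vocab)))
  }, numeric(length(vocab))))
  colnames(counts) <- vocab
  as.data.frame(counts)
}

vocab <- c("goal", "match", "doctor", "vote")
new_texts <- c("A late goal won the match, then another goal!",
               "Voters vote today; the vote is close.")
new_x <- count_terms(new_texts, vocab)
new_x
```

If the matrix is built with tm instead, `DocumentTermMatrix()` accepts a `dictionary` entry in its `control` list that restricts counting to a given set of terms; the other preprocessing settings still have to be the same for both corpora.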

Thanks in advance.

Note:
TDM_2018_05_09_225323.csv is the original file I use with this script.

Predict_TDM_2018_05_09_225025.csv is the file whose categories I need to predict. How do I use it with the predict function?

Re: How to use the predict function with text categories

amine016
Any help, please?