excel - Naive Bayes Ticket Classification in Python


I have a CSV export from our ticketing system with 2 columns: short description and class.

Both are entered by the agent when logging the ticket, e.g.

  • data backup not working,backup
  • email change in groups,notes
  • backup directory not found,backup
  • email > global - lotus notes,notes

I have been asked to write a naive Bayes program in Python that reads the short description in the CSV file and decides how it should be classified.

I have 329 tickets that have already been classified into 6 different classes.

The following is the count for each:

  • class1 60
  • class2 77
  • class3 65
  • class4 16
  • class5 18
  • class6 93
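As a quick check on how uneven these classes are, the counts above can be turned into the prior probabilities a naive Bayes classifier would start from. This is only a minimal sketch; the dictionary below simply copies the numbers from the list.

```python
# Ticket counts per class, copied from the list above.
counts = {
    "class1": 60,
    "class2": 77,
    "class3": 65,
    "class4": 16,
    "class5": 18,
    "class6": 93,
}

total = sum(counts.values())  # 329 tickets in total

# Prior P(class) = tickets in that class / all tickets.
priors = {label: n / total for label, n in counts.items()}

for label, p in sorted(priors.items()):
    print(f"{label}: {p:.3f}")
```

With only 16 and 18 examples, class4 and class5 start with much smaller priors than class6, so they are the ones most likely to need more labelled tickets.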

I am thinking I have to create 6 different dictionaries (one for each class) containing the words used in the short descriptions, excluding the usual !"£$%^&*()<>,./?:;@'#~][{}

Then, when I run the program, it will tokenize the short description using NLTK and compare it against the dictionaries, and whichever one has the highest number of matches will determine the class.
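For comparison with plain match counting, here is a minimal sketch of how per-class word counts could feed a naive Bayes score instead. It assumes add-one (Laplace) smoothing and a simple `\w+` regex tokenizer in place of NLTK; the function names `tokenize`, `train` and `classify` are made up for illustration, and the four training rows are just the sample tickets listed earlier.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and keep only runs of word characters."""
    return re.findall(r"\w+", text.lower())


def train(rows):
    """Build one word-count dictionary per class, plus ticket counts per class."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for description, label in rows:
        class_counts[label] += 1
        word_counts[label].update(tokenize(description))
    return word_counts, class_counts


def classify(text, word_counts, class_counts):
    """Return the class with the highest log prior + summed log word likelihoods."""
    vocab = {word for counts in word_counts.values() for word in counts}
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            # Add-one smoothing so an unseen word does not zero out the class.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Sample tickets from the question: (short description, class).
rows = [
    ("data backup not working", "backup"),
    ("email change in groups", "notes"),
    ("backup directory not found", "backup"),
    ("email > global - lotus notes", "notes"),
]

word_counts, class_counts = train(rows)
print(classify("backup failed", word_counts, class_counts))  # backup
```

The difference from counting matches is that each word is weighted by how often it appears in a class relative to that class's size, and the class prior is included, so a large class does not win simply because it has seen more words.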

Am I going about this the right way? How many tickets should I be using as a sample?

The following is what I have at the moment. It runs through a CSV file named after the class and outputs a file with the punctuation removed, the words in lower case, and each word in a separate cell. This data will be used as the dictionary. I'm not sure if I'm going about the whole thing the right way, though.

    import csv
    from nltk.tokenize import RegexpTokenizer

    # read the csv for one class
    readfile = open('backup.csv', 'r')
    csv_readfile = csv.reader(readfile)

    # newline='' stops the csv module writing blank rows on Windows
    resultfile = open('result.csv', 'w', newline='')
    wr = csv.writer(resultfile)

    # removes punctuation by keeping only runs of word characters
    tokenizer = RegexpTokenizer(r'\w+')

    # for every row in the file, tokenize and convert to lowercase,
    # then write the tokenized words to the .csv file
    for row in csv_readfile:
        wr.writerow(tokenizer.tokenize(row[0].lower()))

    readfile.close()
    resultfile.close()

Edit: I have started using the following, which takes in data from a 2-column CSV file:

    from textblob.classifiers import NaiveBayesClassifier
    from textblob import TextBlob

    with open('train.csv', 'r') as fp:
        cl = NaiveBayesClassifier(fp, format="csv")

    print(cl.classify("backup"))  # "backup"
    print(cl.classify("lotus notes."))  # "lotus" etc..

I'm pretty sure I need a better sample size of training and test data, and then to feed in a CSV of short descriptions and update it with the class that has been calculated.
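A minimal sketch of what that could look like: hold back part of the labelled tickets as a test set, measure accuracy on it, then write each new description out next to its predicted class. The helper names `split_rows`, `accuracy` and `write_predictions` are made up for illustration, and `predict` is only a keyword stand-in so the sketch runs on its own; in practice it would be replaced by the trained classifier's `classify` method.

```python
import csv
import io
import random


def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle the labelled rows and split them into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predict, test_rows):
    """Fraction of held-out rows where the prediction matches the agent's class."""
    correct = sum(1 for description, label in test_rows if predict(description) == label)
    return correct / len(test_rows)


def write_predictions(descriptions, predict, out_file):
    """Write one CSV row per ticket: short description, predicted class."""
    writer = csv.writer(out_file)
    for description in descriptions:
        writer.writerow([description, predict(description)])


def predict(description):
    """Hypothetical stand-in for a trained classifier, e.g. cl.classify."""
    return "backup" if "backup" in description.lower() else "notes"


# Sample tickets from the question: (short description, class).
rows = [
    ("data backup not working", "backup"),
    ("email change in groups", "notes"),
    ("backup directory not found", "backup"),
    ("email > global - lotus notes", "notes"),
]

train_rows, test_rows = split_rows(rows, test_fraction=0.25)
print(len(train_rows), len(test_rows))
print(accuracy(predict, test_rows))

# An in-memory file keeps the sketch self-contained; a real run would use
# open('classified.csv', 'w', newline='') instead (file name is an assumption).
out = io.StringIO()
write_predictions(["backup job failed", "lotus notes password"], predict, out)
print(out.getvalue())
```

Accuracy on tickets the classifier has never seen is the number to watch when deciding whether 329 tickets are enough, particularly for the two smallest classes.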

From a functionality point of view it seems to work, unless I've made any glaring mistakes?

