excel - Naive Bayes Ticket Classification in Python
I have a CSV export from our ticketing system with 2 columns: short description and class.
Both are entered by the agent when logging the ticket, e.g.:
- data backup not working,backup
- email change in groups,notes
- backup directory not found,backup
- email > global - lotus notes,notes
I have been asked to write a Naive Bayes program in Python that reads the short description in the CSV file and decides how the ticket should be classified.
I have 329 tickets that have been classified into 6 different classes.
The following is the count for each:
- class1 60
- class2 77
- class3 65
- class4 16
- class5 18
- class6 93
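As a quick check on these counts: a Naive Bayes classifier normally starts from the prior probability of each class, which is simply that class's share of the labelled tickets. A minimal sketch, with variable names chosen only for illustration:

```python
# Class counts copied from the list above.
counts = {
    "class1": 60, "class2": 77, "class3": 65,
    "class4": 16, "class5": 18, "class6": 93,
}

total = sum(counts.values())  # number of labelled tickets

# Prior probability of each class: its share of the labelled tickets.
priors = {label: n / total for label, n in counts.items()}

# Print the classes from most to least common.
for label, p in sorted(priors.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label}: {p:.3f}")
```

class4 and class5 each make up only about 5% of the tickets, so those two classes have the fewest examples to learn from.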
I am thinking I will have to create 6 different dictionaries (one for each class) containing the words used in the short descriptions, excluding the usual punctuation: !"£$%^&*()<>,./?:;@'#~][{}
Then, when I run the program, it will tokenize the short description using NLTK and compare it against the dictionaries, and whichever one has the highest number of matches will determine the class.
Am I going about this the right way? How many tickets should I be using as a sample?
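For comparison with the per-class dictionary idea, here is a minimal sketch of how a multinomial Naive Bayes classifier could use the same word counts. Instead of counting raw matches, it combines a class prior with smoothed per-word probabilities. The function names (`tokenize`, `train`, `classify`) and the choice of add-one smoothing are assumptions made for illustration, and the four example tickets stand in for the real 329.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and keep only runs of word characters."""
    return re.findall(r"\w+", text.lower())


def train(rows):
    """Build the model from (short_description, class) pairs."""
    class_counts = Counter()            # number of tickets per class
    word_counts = defaultdict(Counter)  # one word-count dictionary per class
    vocab = set()
    for text, label in rows:
        class_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def classify(text, model):
    """Return the class with the highest log-probability for this text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# The four example tickets from the question, used as toy training data.
rows = [
    ("data backup not working", "backup"),
    ("email change in groups", "notes"),
    ("backup directory not found", "backup"),
    ("email > global - lotus notes", "notes"),
]
model = train(rows)
print(classify("backup failed", model))
print(classify("lotus notes email", model))
```

Compared with picking the dictionary with the most matches, this weights each word by how often it occurs in a class and accounts for how common the class is overall.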
The following is what I have at the moment. It runs through a CSV file named after the class and outputs a file with the punctuation removed and the words in lower case, each in a separate cell. This data will be used as the dictionary. I'm not sure if I'm going about the whole thing the right way, though.
    import csv
    from nltk.tokenize import RegexpTokenizer

    # Read the CSV file for one class
    readfile = open('backup.csv', 'r')
    csv_reader = csv.reader(readfile)
    resultfile = open('result.csv', 'w', newline='')
    wr = csv.writer(resultfile)

    # Removes punctuation by keeping only runs of word characters
    tokenizer = RegexpTokenizer(r'\w+')

    # For every row in the file, tokenize and convert to lowercase,
    # then write the tokenized words to the result CSV file.
    for row in csv_reader:
        wr.writerow(tokenizer.tokenize(row[0].lower()))

    readfile.close()
    resultfile.close()
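To see what this tokenizing step produces for the sample tickets, here is a small standard-library sketch. It assumes that `re.findall` with the same `\w+` pattern behaves like `RegexpTokenizer(r'\w+')` for this input; the `tokenize` helper is a name introduced here for illustration.

```python
import re


def tokenize(text):
    # Keep runs of letters, digits and underscores; drop everything else.
    return re.findall(r"\w+", text.lower())


print(tokenize("Data backup not working"))
print(tokenize("email > global - lotus notes"))
```

Punctuation such as `>` and `-` disappears, and each remaining word becomes one cell in the output row.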
Edit: I have started using the following, which takes in the data from a 2-column CSV file:
    from textblob.classifiers import NaiveBayesClassifier
    from textblob import TextBlob

    # Train on the 2-column CSV (short description, class)
    with open('train.csv', 'r') as fp:
        cl = NaiveBayesClassifier(fp, format="csv")

    print(cl.classify("backup"))        # "backup"
    print(cl.classify("lotus notes."))  # "lotus" etc..
I'm pretty sure I need a better sample size of training and test data, and then need to feed in a CSV of short descriptions and update it once the class has been calculated.
From a functionality point of view it seems to work, unless I've made any glaring mistakes?
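One way to approach the training/test question is to hold back part of the labelled tickets, measure accuracy on that held-out part, and only then classify a file of new descriptions. The sketch below uses only the standard library. `split_rows`, `accuracy`, `classify_csv` and the stand-in `keyword_classifier` are hypothetical names for illustration; in practice the classifier passed in would be something like `cl.classify` from the trained model.

```python
import csv
import io
import random


def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle labelled rows and split them into (train, test) lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


def accuracy(classify, test_rows):
    """Fraction of held-out rows where the prediction matches the agent's class."""
    correct = sum(1 for text, label in test_rows if classify(text) == label)
    return correct / len(test_rows)


def classify_csv(classify, in_file, out_file):
    """Read one short description per row; write (description, predicted class)."""
    writer = csv.writer(out_file)
    for row in csv.reader(in_file):
        if row:  # skip blank lines
            writer.writerow([row[0], classify(row[0])])


# Hypothetical stand-in classifier so the sketch runs on its own.
def keyword_classifier(text):
    return "backup" if "backup" in text.lower() else "notes"


labelled = [
    ("data backup not working", "backup"),
    ("email change in groups", "notes"),
    ("backup directory not found", "backup"),
    ("email > global - lotus notes", "notes"),
]
train_rows, test_rows = split_rows(labelled, test_fraction=0.25)
print(len(train_rows), len(test_rows))
print(accuracy(keyword_classifier, test_rows))

# In-memory files stand in for the real input and output CSVs.
new_tickets = io.StringIO("backup tape full\nlotus notes slow\n")
out = io.StringIO()
classify_csv(keyword_classifier, new_tickets, out)
print(out.getvalue())
```

With only 329 tickets, and 16 to 18 in the smallest classes, a single split gives a noisy accuracy figure, so repeating it with several different seeds may give a steadier estimate.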