excel - Naive Bayes Ticket Classification in Python


I have a CSV export from our ticketing system with 2 columns:

short description, class.

Both are filled in by the agent when logging the ticket, e.g.:

  • data backup not working,backup
  • email change in groups,notes
  • backup directory not found,backup
  • email > global - lotus notes,notes
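Since the export is a plain two-column CSV, Python's `csv` module can read it straight into `(description, class)` pairs. A minimal sketch, assuming the file has no header row and that any description containing a comma is quoted by the export; `load_tickets` is a hypothetical name:

```python
import csv
import io

def load_tickets(f):
    """Read (short description, class) pairs from a two-column CSV file object."""
    pairs = []
    for row in csv.reader(f):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        description, label = row
        pairs.append((description.strip(), label.strip()))
    return pairs

# the example rows from the post, as an in-memory file
sample = io.StringIO(
    "data backup not working,backup\n"
    "email change in groups,notes\n"
    "backup directory not found,backup\n"
    "email > global - lotus notes,notes\n"
)
tickets = load_tickets(sample)
print(tickets[3])  # ('email > global - lotus notes', 'notes')
```

With the real export, open the file with `open(path, newline='')` and pass the file object to `load_tickets`.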

I have been asked to write a Naive Bayes program in Python that reads the short description in the CSV file and decides how it should be classified.

I have 329 tickets that have been classified into 6 different classes.

The count for each class is as follows:

  • class1 60
  • class2 77
  • class3 65
  • class4 16
  • class5 18
  • class6 93
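In Naive Bayes these counts become the class priors, P(class) = tickets in that class / total tickets. A small sketch using the counts above (class names as given in the post):

```python
counts = {
    "class1": 60, "class2": 77, "class3": 65,
    "class4": 16, "class5": 18, "class6": 93,
}
total = sum(counts.values())  # 329
# prior P(class) = share of the labelled tickets that belong to that class
priors = {label: n / total for label, n in counts.items()}
for label, p in priors.items():
    print(f"{label}: {p:.3f}")
```

class6 gets a prior of about 0.283, so always predicting class6 would already be right about 28% of the time, while class4 and class5 (about 0.049 and 0.055) have very few examples to learn words from.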

I'm thinking I have to create 6 different dictionaries (one for each class) containing the words used in the short descriptions, excluding the usual !"£$%^&*()<>,./?:;@'#~][{}
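For Naive Bayes the per-class dictionaries are more useful as word counts than as plain word lists, because the method needs to know how often each word appears in each class. A sketch with `collections.Counter`, assuming the same `\w+` pattern as the `RegexpTokenizer(r'\w+')` used in the post (`re.findall` gives the same tokens without NLTK); `tokenize` and `build_class_dictionaries` are hypothetical names:

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and keep runs of word characters, dropping punctuation."""
    return re.findall(r"\w+", text.lower())

def build_class_dictionaries(tickets):
    """Map each class to a Counter of the words used in its descriptions."""
    dictionaries = defaultdict(Counter)
    for description, label in tickets:
        dictionaries[label].update(tokenize(description))
    return dictionaries

tickets = [
    ("data backup not working", "backup"),
    ("email change in groups", "notes"),
    ("backup directory not found", "backup"),
    ("email > global - lotus notes", "notes"),
]
dicts = build_class_dictionaries(tickets)
print(dicts["backup"]["backup"])  # 2
print(sorted(dicts["notes"]))     # ['change', 'email', 'global', 'groups', 'in', 'lotus', 'notes']
```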

Then, when I run the program, it would tokenize the short description using NLTK and compare it against the dictionaries, and whichever one has the most matches would determine the class.
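Counting matches per dictionary is a reasonable heuristic, but it is not quite Naive Bayes: it ignores how common each class is and how often each word occurs within it. Multinomial Naive Bayes instead scores each class as log P(class) + Σ log P(word | class) and picks the highest score, using add-one (Laplace) smoothing so a single unseen word cannot zero out a class. A from-scratch sketch under those assumptions; the class and method names are hypothetical, and skipping words that never occurred in training is one common choice among several:

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    # lowercase and keep runs of word characters (punctuation dropped)
    return re.findall(r"\w+", text.lower())

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, tickets):
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        for description, label in tickets:
            self.class_counts[label] += 1
            self.word_counts[label].update(tokenize(description))
        self.vocab = {w for wc in self.word_counts.values() for w in wc}
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        self.n_tickets = sum(self.class_counts.values())
        return self

    def score(self, description, label):
        # log prior plus the log likelihood of each known word under this class
        s = math.log(self.class_counts[label] / self.n_tickets)
        denom = self.total_words[label] + len(self.vocab)
        for word in tokenize(description):
            if word not in self.vocab:
                continue  # words never seen in training carry no evidence here
            s += math.log((self.word_counts[label][word] + 1) / denom)
        return s

    def classify(self, description):
        return max(self.class_counts, key=lambda c: self.score(description, c))

tickets = [
    ("data backup not working", "backup"),
    ("email change in groups", "notes"),
    ("backup directory not found", "backup"),
    ("email > global - lotus notes", "notes"),
]
nb = NaiveBayes().fit(tickets)
print(nb.classify("backup failed"))  # backup
print(nb.classify("lotus notes."))   # notes
```

Working in log space avoids multiplying many small probabilities together, which would underflow on longer descriptions.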

Am I going about this the right way? How many tickets should I be using as a sample?
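There is no fixed number, but a common approach is to hold out part of the labelled tickets (say 20 to 30%) for testing and train on the rest, splitting within each class so the small classes (16 and 18 tickets) still appear in both sets. A standard-library sketch of such a stratified split; the 0.2 fraction, the seed, and the placeholder tickets are arbitrary assumptions:

```python
import random
from collections import defaultdict

def stratified_split(tickets, test_fraction=0.2, seed=0):
    """Split (description, class) pairs into train and test sets, class by class."""
    by_class = defaultdict(list)
    for ticket in tickets:
        by_class[ticket[1]].append(ticket)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for items in by_class.values():
        rng.shuffle(items)
        # at least one test ticket per class, even for the small classes
        n_test = max(1, round(len(items) * test_fraction))
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test

# placeholder tickets matching the class counts in the post
counts = {"class1": 60, "class2": 77, "class3": 65,
          "class4": 16, "class5": 18, "class6": 93}
tickets = [(f"ticket {i}", label) for label, n in counts.items() for i in range(n)]
train, test = stratified_split(tickets)
print(len(train), len(test))  # 263 66
```

With only 3 or 4 test tickets for class4 and class5, accuracy on those classes will be noisy; k-fold cross-validation reuses every ticket for testing once and gives a steadier estimate.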

The following is what I have at the moment. It runs through a CSV file named after the class and outputs a file with the punctuation removed and the words in lower case, each in a separate cell. This data would be used for the dictionary. I'm not sure if I'm going about the whole thing the right way, though.

```python
import csv
from nltk.tokenize import RegexpTokenizer

# \w+ keeps runs of word characters, so punctuation is removed
tokenizer = RegexpTokenizer(r'\w+')

# read the CSV for one class and write the tokenized words to result.csv
with open('backup.csv', 'r', newline='') as readfile, \
     open('result.csv', 'w', newline='') as resultfile:
    reader = csv.reader(readfile)
    wr = csv.writer(resultfile)
    # for every row, convert the description to lowercase and tokenize it,
    # then write the words out, one per cell
    for row in reader:
        wr.writerow(tokenizer.tokenize(row[0].lower()))
```

Edit: I have now started using the following, which takes in data from a 2-column CSV file:

```python
from textblob.classifiers import NaiveBayesClassifier
from textblob import TextBlob

# train on the 2-column CSV (short description, class)
with open('train.csv', 'r') as fp:
    cl = NaiveBayesClassifier(fp, format="csv")

print(cl.classify("backup"))        # "backup"
print(cl.classify("lotus notes."))  # "lotus" etc.
```

I'm pretty sure I need a better sample size of training and test data, and then to feed in a CSV of short descriptions and update it with the class that has been calculated.
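Writing the calculated class back out amounts to reading each description, classifying it, and writing it out with a second column. A sketch assuming one description per row and any `classify(text) -> str` callable, such as the TextBlob classifier's `cl.classify`; `label_descriptions` and the keyword stand-in classifier are hypothetical and only for demonstration:

```python
import csv
import io

def label_descriptions(infile, outfile, classify):
    """Write each short description with its predicted class as a second column."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    for row in reader:
        if not row:
            continue  # skip blank lines
        description = row[0]
        writer.writerow([description, classify(description)])

# stand-in classifier, for demonstration only
def keyword_classify(text):
    return "backup" if "backup" in text.lower() else "notes"

src = io.StringIO("backup job failed\nlotus notes crashes\n")
dst = io.StringIO()
label_descriptions(src, dst, keyword_classify)
print(dst.getvalue())
```

For real files, open both with `newline=''` so the `csv` module controls the line endings.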

From a functionality point of view it seems to work, unless I've made any glaring mistakes?
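One way to look for mistakes is to measure accuracy on tickets held out from training and to count which classes get confused with each other, comparing the result with a trivial baseline such as always guessing the largest class. A sketch assuming a list of held-out `(description, class)` pairs and any `classify` callable; `evaluate` and `always_backup` are hypothetical (TextBlob's classifier also provides its own `accuracy` method):

```python
from collections import Counter

def evaluate(test_set, classify):
    """Return accuracy and a Counter of (actual, predicted) pairs."""
    confusion = Counter()
    correct = 0
    for description, actual in test_set:
        predicted = classify(description)
        confusion[(actual, predicted)] += 1
        if predicted == actual:
            correct += 1
    return correct / len(test_set), confusion

test_set = [
    ("backup directory not found", "backup"),
    ("email > global - lotus notes", "notes"),
    ("email change in groups", "notes"),
]

def always_backup(text):  # hypothetical baseline that always predicts one class
    return "backup"

acc, confusion = evaluate(test_set, always_backup)
print(f"accuracy: {acc:.2f}")  # accuracy: 0.33
for (actual, predicted), n in sorted(confusion.items()):
    print(actual, "->", predicted, n)
```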

