excel - Naive Bayes Ticket Classification Python

- February 15, 2013

I have a CSV export from our ticketing system with 2 columns: short description and class.

Both are created by the agent when logging the ticket, e.g.:

data backup not working,backup
email change in groups,notes
backup directory not found,backup
email > global - lotus notes,notes

I have been asked to write a Naive Bayes program using Python that reads the short description in the CSV file and decides how it should be classified.

I have 329 tickets that have been classified into 6 different classes.

The following is the count of each:

class1 60
class2 77
class3 65
class4 16
class5 18
class6 93
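Since the classes are quite uneven (16 tickets in the smallest, 93 in the largest), it may help to see what share of the data each class has, because Naive Bayes uses that share as the prior for each class. A minimal sketch, assuming the training set is exactly the 329 tickets counted above:

```python
# Ticket counts per class, copied from the list above.
counts = {
    "class1": 60,
    "class2": 77,
    "class3": 65,
    "class4": 16,
    "class5": 18,
    "class6": 93,
}

total = sum(counts.values())  # 329 tickets in all

# Prior probability of each class: its share of the training tickets.
priors = {label: n / total for label, n in counts.items()}

for label, p in sorted(priors.items()):
    print(f"{label}: {p:.3f}")
```

With priors this uneven, a ticket with few recognisable words will tend to be pulled towards class6 and class2.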

I was thinking I would have to create 6 different dictionaries (one for each class) containing the words used in the short description, excluding the usual !"£$%^&*()<>,./?:;@'#~][{}

Then when I run the program, it would tokenize the short description using NLTK and compare it with the dictionaries, and whichever one has the highest number of matches would determine the class.
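For comparison, here is a rough sketch of how the one-dictionary-per-class idea maps onto multinomial Naive Bayes. It is not the plain match count described above: it adds a class prior and add-one (Laplace) smoothing, and it uses a regular expression in place of NLTK so that it runs with the standard library only. The function names and the four training rows (taken from the sample data) are just for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Four sample rows from the export; a real run would load all 329 from the CSV.
TRAIN = [
    ("data backup not working", "backup"),
    ("email change in groups", "notes"),
    ("backup directory not found", "backup"),
    ("email > global - lotus notes", "notes"),
]


def tokenize(text):
    """Lowercase the text and keep only runs of word characters."""
    return re.findall(r"\w+", text.lower())


def train(rows):
    """Build one word-count dictionary per class, plus per-class ticket counts."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in rows:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counter in word_counts.values() for word in counter}
    return word_counts, class_counts, vocab


def classify(text, word_counts, class_counts, vocab):
    """Return the class with the highest log posterior, using add-one smoothing."""
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
print(classify("backup failed", *model))      # backup
print(classify("lotus notes email", *model))  # notes
```

The main difference from counting matches is that a word shared by several classes counts for less than a word that appears almost only in one class, and larger classes get a head start through the prior.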

Am I going about this the right way? How many tickets should I be using as a sample?

The following is what I have at the moment. It runs through a CSV file named after the class and outputs a file with the punctuation removed, the words in lower case and in separate cells. This data would be used as the dictionary. I'm not sure if I'm going about the whole thing the right way though.

    import csv
    from nltk.tokenize import RegexpTokenizer

    # removes punctuation
    tokenizer = RegexpTokenizer(r'\w+')

    # read the csv for one class and write the tokenized rows to a new csv
    with open('backup.csv', 'r', newline='') as readfile, \
            open('result.csv', 'w', newline='') as resultfile:
        csv_readfile = csv.reader(readfile)
        wr = csv.writer(resultfile)

        # for every row in the file, tokenize and convert to lowercase,
        # then write the tokenized words to the .csv file
        for row in csv_readfile:
            wr.writerow(tokenizer.tokenize(row[0].lower()))

Edit: I have started using the following, which takes in data from a 2 column CSV file:

    from textblob.classifiers import NaiveBayesClassifier
    from textblob import TextBlob

    with open('train.csv', 'r') as fp:
        cl = NaiveBayesClassifier(fp, format="csv")

    print(cl.classify("backup"))  # "backup"
    print(cl.classify("lotus notes."))  # "lotus" etc..

I'm pretty sure I need a better sample size of training and test data, and then need to feed in a CSV of short descriptions so that each one is updated with the class that has been calculated.
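The "feed in a CSV of short descriptions" step could look roughly like the sketch below. The helper name `label_descriptions` and the keyword-based stand-in classifier are made up for illustration; in practice the `classify` argument would be something like `cl.classify` from the trained model, and the two file objects would come from `open(...)` rather than `io.StringIO`.

```python
import csv
import io


def label_descriptions(infile, outfile, classify):
    """Read one description per row and write (description, predicted class) rows."""
    writer = csv.writer(outfile)
    for row in csv.reader(infile):
        if not row:
            continue  # skip blank lines
        writer.writerow([row[0], classify(row[0])])


# Stand-in classifier for illustration only; swap in the trained model's classify.
def keyword_classify(text):
    return "backup" if "backup" in text.lower() else "notes"


# In-memory files keep the sketch self-contained.
source = io.StringIO("backup directory not found\nemail change in groups\n")
result = io.StringIO()
label_descriptions(source, result, keyword_classify)
print(result.getvalue())
```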

From a functionality point of view it seems to work, unless I've made any glaring mistakes?
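One way to check whether it really works is to hold back part of the labelled tickets and measure accuracy on them. Below is a minimal sketch of a stratified hold-out split in plain Python, assuming a 20% test share; the ticket texts are placeholders generated from the class sizes above, and the "always class6" baseline is only there as a number any real classifier should beat.

```python
import random
from collections import defaultdict


def stratified_split(rows, test_fraction=0.2, seed=0):
    """Split (text, label) rows so each class keeps a similar share in both sets."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)
    rng = random.Random(seed)
    train_rows, test_rows = [], []
    for label_rows in by_label.values():
        label_rows = label_rows[:]  # copy so the caller's data is not reordered
        rng.shuffle(label_rows)
        n_test = max(1, round(len(label_rows) * test_fraction))
        test_rows.extend(label_rows[:n_test])
        train_rows.extend(label_rows[n_test:])
    return train_rows, test_rows


def accuracy(classify, test_rows):
    """Fraction of test rows whose predicted label equals the true label."""
    correct = sum(1 for text, label in test_rows if classify(text) == label)
    return correct / len(test_rows)


# Placeholder rows with the class sizes from the question; the texts are made up.
sizes = {"class1": 60, "class2": 77, "class3": 65,
         "class4": 16, "class5": 18, "class6": 93}
rows = [(f"ticket {i} for {label}", label)
        for label, n in sizes.items() for i in range(n)]

train_rows, test_rows = stratified_split(rows)

# Baseline: always predict the largest class.
baseline = accuracy(lambda text: "class6", test_rows)
print(len(train_rows), len(test_rows), round(baseline, 3))
```

With only 16 and 18 tickets in the two smallest classes, the test set holds just a handful of examples of each, so the accuracy figure for those classes will be noisy.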
