cmusphinx - Build a new acoustic model, dictionary, and language model for speech recognition in an uncommon language


I want to build a new acoustic model, a new dictionary, and a new language model for Sinhala speech recognition. Sinhala characters are Unicode based, for example a=අ, i=ඉ, u=උ, ka=ක, ba=බ. I went through the CMUSphinx tutorial for developers, but it did not help me: it only works for English.

The language model should be an ARPA model. How can I map Sinhala Unicode to English phonemes, and how can I train the language model with different voices? Is there a tool available to generate a Unicode-based language model?

Overall, it is not that complex. First you need to split the task into parts: build a phonetic dictionary, build a language model, and build an acoustic model. Start with the phonetic dictionary.

You need to write a Python script that maps the Unicode input to a transliteration:

රට          r a tt a
එකඟයි       e k a ng a yi
අවසර දිම    a v a s a r a d i m a

Basically, for every letter you write the corresponding transliteration. That is all you need to do. Later you can feed a list of words to the script and get a dictionary in CMUSphinx format. This part is covered in the tutorial:

http://cmusphinx.sourceforge.net/wiki/tutorialdict
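
A minimal sketch of such a script, under stated assumptions: the mapping tables below are hypothetical and deliberately incomplete, the phone names (`tt`, `ng`, `aa`) are illustrative choices, and the function names `transliterate` and `make_dictionary` are made up for this example. It assumes a bare consonant letter carries an inherent "a", that a vowel sign replaces it, and that the hal kirima (U+0DCA) removes it.

```python
# Hypothetical, incomplete tables: extend them to cover the full Sinhala
# alphabet and pick phone names that suit your own phone set.
VOWELS = {"අ": "a", "ඉ": "i", "උ": "u", "එ": "e"}
CONSONANTS = {
    "ක": "k", "බ": "b", "ර": "r", "ට": "tt", "ව": "v",
    "ස": "s", "ද": "d", "ම": "m", "ය": "y", "ඟ": "ng",
}
# Dependent vowel signs, written as escapes because they are combining marks.
VOWEL_SIGNS = {"\u0dcf": "aa", "\u0dd2": "i", "\u0dd4": "u", "\u0dd9": "e"}
VIRAMA = "\u0dca"   # hal kirima: suppresses the inherent vowel
INHERENT = "a"      # assumed inherent vowel of a bare consonant


def transliterate(word):
    """Return the list of phones for one Sinhala word."""
    phones = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in VOWELS:
            phones.append(VOWELS[ch])
        elif ch in CONSONANTS:
            phones.append(CONSONANTS[ch])
            if nxt == VIRAMA:
                i += 1                      # consonant without a vowel
            elif nxt in VOWEL_SIGNS:
                phones.append(VOWEL_SIGNS[nxt])
                i += 1                      # vowel sign replaces inherent vowel
            else:
                phones.append(INHERENT)
        else:
            # Fail loudly so gaps in the tables are easy to spot.
            raise ValueError(f"no mapping for U+{ord(ch):04X} in {word!r}")
        i += 1
    return phones


def make_dictionary(words):
    """Return dictionary lines of the form 'word phone phone ...'."""
    return [f"{w} {' '.join(transliterate(w))}" for w in sorted(set(words))]


print("\n".join(make_dictionary(["රට", "කබ"])))
```

Raising an error on unmapped characters is a design choice for this sketch: it is usually better to find out that a letter is missing from the table than to emit a silently wrong pronunciation.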

Once you have the transliteration tool, you can proceed to the language model. You need a lot of text to build a language model. You can download texts from Wikipedia or from a local newspaper. Then you can use a language model toolkit to create an ARPA model. SRILM, MITLM, and IRSTLM all support Unicode, so you can use any of them. This part is covered in the tutorial:

http://cmusphinx.sourceforge.net/wiki/tutoriallm
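
Before the downloaded text goes to a toolkit it usually needs cleaning. The following is a rough sketch of that step, not a definitive recipe: the function name `prepare_corpus` is hypothetical, the choice to keep only characters from the Sinhala Unicode block (U+0D80 to U+0DFF) plus the zero-width joiner is an assumption, and so is splitting sentences on `.`, `!` and `?`. The `<s> ... </s>` markers follow the convention used in the CMUSphinx tutorial; some toolkits add them on their own, so they can be switched off.

```python
import re

# Anything that is not a Sinhala character, a zero-width joiner, or whitespace.
NON_SINHALA = re.compile(r"[^\u0d80-\u0dff\u200d\s]")
# Assumed sentence terminators; adjust for your sources.
SENTENCE_END = re.compile(r"[.!?]+")


def prepare_corpus(raw_text, markers=True):
    """Return one cleaned sentence per list item, ready to write to a file."""
    lines = []
    for sentence in SENTENCE_END.split(raw_text):
        # Replace Latin letters, digits and punctuation with spaces.
        words = NON_SINHALA.sub(" ", sentence).split()
        if not words:
            continue
        text = " ".join(words)
        lines.append(f"<s> {text} </s>" if markers else text)
    return lines


sample = "රට ABC 123. මම!"
for line in prepare_corpus(sample):
    print(line)
```

The resulting lines, written to a UTF-8 text file, are the input for whichever toolkit you choose. The words left in the cleaned corpus are also a natural word list to feed to the transliteration script.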

The third step is to create an acoustic model. You need to record audio or segment existing recordings, and then start training. This part is covered in the tutorial:

http://cmusphinx.sourceforge.net/wiki/tutorialam
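
Training needs a list of the recordings and a matching list of transcripts. As far as I understand the tutorial, the file list holds one path per line without the extension, and each transcript line has the form `<s> text </s> (file_id)`; treat that layout as an assumption and check it against the tutorial. The helper name `training_lists` and the paths in the usage example are hypothetical.

```python
from pathlib import PurePosixPath


def training_lists(recordings):
    """Build file-id lines and transcript lines.

    `recordings` maps a relative audio path (for example
    'speaker_1/file_1.wav') to the text spoken in that file.
    """
    fileids, transcripts = [], []
    for rel_path, text in sorted(recordings.items()):
        file_id = PurePosixPath(rel_path).with_suffix("")  # drop '.wav'
        words = " ".join(text.split())                     # normalise spaces
        fileids.append(str(file_id))
        # The utterance id in parentheses is the file name without its folder.
        transcripts.append(f"<s> {words} </s> ({file_id.name})")
    return fileids, transcripts


ids, trans = training_lists({
    "speaker_1/file_1.wav": "රට",
    "speaker_2/file_2.wav": "මම  රට",
})
print("\n".join(ids))
print("\n".join(trans))
```

Recording several speakers and listing all of their files here is how the model gets trained on different voices. Every word in the transcripts must also appear in the phonetic dictionary.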

