CMUSphinx - Build a new acoustic model, dictionary, and language model for speech recognition in an uncommon language
I want to build a new acoustic model, a new dictionary, and a new language model for Sinhala speech recognition. Sinhala characters are Unicode based, for example a=අ, i=ඉ, u=උ, ka=ක, ba=බ. I went through the CMUSphinx tutorial for developers, but it did not help me; it works for the English language.
The language model should be an ARPA model. How can I map Sinhala Unicode to English phonemes, and how do I train the language model with different voices? Is there a tool available to generate a Unicode-based language model?
Overall, it is not that complex. First you need to split the task into parts: build a phonetic dictionary, build a language model, and build an acoustic model. Start with the phonetic dictionary.
You need to write a Python script that maps Unicode input to a transliteration:
රට r tt
එකඟයි e k ng yi
අවසර දිම v s r d m
Basically, for every letter you write the corresponding transliteration. Once that is done, you can feed a list of words to the script and get a dictionary in CMUSphinx format. This part is covered in the tutorial:
http://cmusphinx.sourceforge.net/wiki/tutorialdict
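A minimal sketch of such a script is below. The mapping tables are illustrative and incomplete, the phone symbols are my own choice rather than a fixed CMUSphinx set, and the rule that a bare consonant carries an inherent "a" is a simplifying assumption that you would need to refine for real Sinhala pronunciation. The function names `transliterate` and `dictionary_lines` are hypothetical.

```python
# Illustrative, incomplete tables: extend them to cover the whole Sinhala block.
VOWELS = {"අ": "a", "ඉ": "i", "උ": "u", "එ": "e"}
CONSONANTS = {
    "ක": "k", "ඟ": "ng", "ට": "tt", "ද": "d", "බ": "b",
    "ම": "m", "ය": "y", "ර": "r", "ව": "v", "ස": "s",
}
# Dependent vowel signs that replace the inherent vowel of a consonant.
VOWEL_SIGNS = {"\u0dcf": "aa", "\u0dd2": "i", "\u0dd4": "u", "\u0dd9": "e"}
VIRAMA = "\u0dca"       # hal kirima: removes the inherent vowel
INHERENT_VOWEL = "a"    # assumption: every bare consonant is followed by "a"


def transliterate(word):
    """Return the list of phones for one Sinhala word."""
    phones = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in VOWELS:
            phones.append(VOWELS[ch])
        elif ch in CONSONANTS:
            phones.append(CONSONANTS[ch])
            if nxt == VIRAMA:
                i += 1                      # consonant without a vowel
            elif nxt in VOWEL_SIGNS:
                phones.append(VOWEL_SIGNS[nxt])
                i += 1                      # consonant plus explicit vowel sign
            else:
                phones.append(INHERENT_VOWEL)
        else:
            raise ValueError(f"no mapping for character U+{ord(ch):04X}")
        i += 1
    return phones


def dictionary_lines(words):
    """Format unique words as 'word phone phone ...' lines, one per word."""
    return [f"{w} {' '.join(transliterate(w))}" for w in sorted(set(words))]
```

Calling `dictionary_lines` on your word list and writing the result to a UTF-8 text file gives one entry per line, which is the layout the dictionary tutorial describes. Raising an error on unmapped characters is deliberate: it shows you which letters the tables still lack.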
Once you have the transliteration tool, you can proceed to the language model. You need a lot of text to build a language model. You can download texts from Wikipedia or a local newspaper. Then you can use a language model toolkit to create an ARPA model. These toolkits support Unicode: SRILM, MITLM, IRSTLM; you can use any of them. This part is covered in the tutorial:
http://cmusphinx.sourceforge.net/wiki/tutoriallm
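Before running any toolkit, the downloaded text usually needs cleaning. The sketch below is one possible way to do it, not a required step from the tutorial: it splits raw text into sentences, keeps only characters from the Sinhala Unicode block (plus the zero-width joiner), and optionally wraps each sentence in `<s>` and `</s>` markers, which some toolkits expect. The function name `clean_corpus` and the sentence-splitting rule are assumptions.

```python
import re

# Anything outside the Sinhala block (U+0D80..U+0DFF), ZWJ, or whitespace.
NOT_SINHALA = re.compile(r"[^\u0D80-\u0DFF\u200D\s]")
# Assumption: sentences end at '.', '!', '?' or a line break.
SENTENCE_END = re.compile(r"[.!?\n]+")


def clean_corpus(text, tags=False):
    """Return one cleaned sentence per list item, ready to write to a file."""
    sentences = []
    for raw in SENTENCE_END.split(text):
        # Replace foreign characters with spaces, then collapse whitespace.
        cleaned = " ".join(NOT_SINHALA.sub(" ", raw).split())
        if not cleaned:
            continue
        sentences.append(f"<s> {cleaned} </s>" if tags else cleaned)
    return sentences
```

Write the returned lines to a UTF-8 file and pass that file to the toolkit of your choice. Every word that survives cleaning should also appear in the phonetic dictionary.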
The third step is to create the acoustic model. You need to record audio or segment existing recordings, and then start training. This part is covered in the tutorial
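Training needs a list of audio files and a matching list of transcriptions. The sketch below builds both from (path, text) pairs. The layout I assume here, a file-ID list of paths without extension and transcription lines of the form `<s> text </s> (utterance_id)`, is how I understand the SphinxTrain convention; check it against the training tutorial before relying on it. The function name `training_lists` is hypothetical.

```python
from pathlib import PurePosixPath


def training_lists(recordings):
    """Build file-ID and transcription lines from (wav_path, text) pairs.

    Assumed layout: the file ID is the path without its extension, and the
    utterance ID in parentheses is the file name without its extension.
    """
    fileids = []
    transcriptions = []
    for wav_path, text in recordings:
        path = PurePosixPath(wav_path)
        fileids.append(path.with_suffix("").as_posix())
        transcriptions.append(f"<s> {' '.join(text.split())} </s> ({path.stem})")
    return fileids, transcriptions
```

Recording several speakers into separate folders, for example `speaker_1/` and `speaker_2/` (hypothetical names), keeps the file IDs unique and lets the model see different voices.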