cmusphinx - Build a new acoustic model, dictionary, and language model for speech recognition of an uncommon language

- April 15, 2010

I want to build a new acoustic model, a new dictionary, and a new language model for Sinhala speech recognition. Sinhala characters are Unicode-based, for example a=අ, i=ඉ, u=උ, ka=ක, ba=බ. I went through the CMUSphinx tutorial for developers, but it did not help me: it works for the English language.

The language model should be an ARPA model. How can I map Sinhala Unicode to English phonemes, and how do I train the language model with different voices? Is there a tool available to generate a Unicode-based language model?

Overall, this is not that complex. First, you need to split the task into parts: build a phonetic dictionary, build a language model, and build an acoustic model. Start with the phonetic dictionary.

You need to write a Python script that maps Unicode input to a transliteration:

රට  r tt
එකඟයි   e k ng yi
අවසර දිම    v s r d m

Basically, for every letter you write the corresponding transliteration. That is all you need to do; later you can feed a list of words to the script and get a dictionary in CMUSphinx format. This part is covered in the tutorial:

http://cmusphinx.sourceforge.net/wiki/tutorialdict
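A minimal sketch of such a script could look like the following. The mapping tables are deliberately incomplete, and the phoneme names, the `transliterate` and `make_dictionary` helpers, and the decision to emit an inherent "a" after a bare consonant are all assumptions made for illustration, not part of CMUSphinx. Adjust them to the phone set you settle on.

```python
# Illustrative, incomplete tables: extend them to cover the full Sinhala script.
# The phoneme names on the right are assumptions, not an official phone set.
VOWELS = {"අ": "a", "ඉ": "i", "උ": "u", "එ": "e"}
CONSONANTS = {"ක": "k", "බ": "b", "ර": "r", "ට": "tt", "ව": "v",
              "ස": "s", "ද": "d", "ම": "m", "ය": "y", "ඟ": "ng"}
VOWEL_SIGNS = {"\u0dd2": "i", "\u0dd4": "u"}  # dependent vowel signs
VIRAMA = "\u0dca"        # al-lakuna: suppresses the inherent vowel
INHERENT_VOWEL = "a"     # assumed vowel after a consonant with no sign


def transliterate(word):
    """Return the list of phonemes for one Sinhala word."""
    phones = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in VOWELS:
            phones.append(VOWELS[ch])
        elif ch in CONSONANTS:
            phones.append(CONSONANTS[ch])
            if nxt == VIRAMA:
                i += 1  # bare consonant, skip the virama
            elif nxt in VOWEL_SIGNS:
                phones.append(VOWEL_SIGNS[nxt])
                i += 1  # skip the vowel sign we just consumed
            else:
                phones.append(INHERENT_VOWEL)
        else:
            # Fail loudly so gaps in the tables are noticed early.
            raise ValueError(f"no mapping for {ch!r} in {word!r}")
        i += 1
    return phones


def make_dictionary(words):
    """Format unique words as 'word phone phone ...', one entry per line."""
    return "\n".join(
        f"{w} {' '.join(transliterate(w))}" for w in sorted(set(words))
    )
```

Calling `make_dictionary` on the word list from your text corpus gives the lines to save as the dictionary file. Raising an error on unmapped characters is a design choice here: it makes missing table entries visible instead of silently producing short pronunciations.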

Once you have the transliteration tool, you can proceed to the language model. You need a lot of text to build a language model. You can download texts from Wikipedia or a local newspaper. You can then use a language model toolkit to create the ARPA model. Toolkits such as SRILM, MITLM, and IRSTLM support Unicode, and you can use any of them. This part is covered in the tutorial:

http://cmusphinx.sourceforge.net/wiki/tutoriallm
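Before handing downloaded text to a toolkit, it usually helps to normalize it. The sketch below is one possible cleanup pass, assuming that only characters from the Sinhala Unicode block (U+0D80 to U+0DFF) should survive and that `.`, `!`, `?`, and newlines end a sentence. The `clean_corpus` and `vocabulary` names are hypothetical helpers, not part of any toolkit.

```python
import re

# Keep the Sinhala block (U+0D80..U+0DFF) plus the zero-width joiner, which
# Sinhala uses inside some conjuncts; every other run becomes a single space.
NON_SINHALA = re.compile(r"[^\u0d80-\u0dff\u200d]+")
# Assumed sentence boundaries: Latin punctuation and line breaks.
SENTENCE_END = re.compile(r"[.!?\n]+")


def clean_corpus(raw_text):
    """Return a list of cleaned sentences, each a space-separated word string."""
    sentences = []
    for chunk in SENTENCE_END.split(raw_text):
        words = NON_SINHALA.sub(" ", chunk).split()
        if words:
            sentences.append(" ".join(words))
    return sentences


def vocabulary(sentences):
    """Return the sorted list of unique words in the cleaned sentences."""
    return sorted({word for s in sentences for word in s.split()})
```

Writing the sentences one per line gives a training text for the toolkit, and the vocabulary is the word list to run through the transliteration script so that every word in the language model also has a dictionary entry.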

The third step is to create the acoustic model. You need to record audio or segment existing recordings, and then start training. This part is covered in the tutorial:

http://cmusphinx.sourceforge.net/wiki/tutorialam
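Training needs a list of audio files and a matching transcription for each one. The sketch below generates both from (file id, text) pairs. The line layouts, a file id per line and `<s> words </s> (utterance_name)` per transcript line, follow my understanding of the format described in the acoustic model tutorial, so check them against it before training. The `training_lists` function and the `speaker_1/file_1` id are hypothetical.

```python
def training_lists(utterances):
    """Build the file-id list and transcription text for a set of recordings.

    utterances: iterable of (file_id, text) pairs, where file_id is the audio
    path relative to the recordings folder, without the file extension.
    """
    fileids = []
    transcripts = []
    for file_id, text in utterances:
        utt_name = file_id.split("/")[-1]       # name without the folder part
        words = " ".join(text.split())          # collapse repeated whitespace
        fileids.append(file_id)
        transcripts.append(f"<s> {words} </s> ({utt_name})")
    return "\n".join(fileids) + "\n", "\n".join(transcripts) + "\n"
```

For example, `training_lists([("speaker_1/file_1", "රට අවසර")])` returns the two text blobs to save as the file-id list and the transcription file. Every word in the transcripts should also appear in the phonetic dictionary.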
