#julius #speechrecognition #sre #voxforge #ubuntu #linux #cli #speech2text
Step-by-step Bash script file: /voxforge/mb-scratch/stepbystep.sh
Add the tool directories to PATH for easy access:
PATH=$PATH:$HOME/voxforge/bin/htk/bin:$HOME/voxforge/bin/julius-4.3.1/bin:$HOME/voxforge/bin/julia-1.0.2/bin
Add a word from the dictionary to the user vocabulary:
- cat ~/voxforge/lexicon/VoxForgeDict.txt | awk -F"\t|( ){2,}" '/PENCERE\s/ {print $1 " \t " $3}'
- cat ~/voxforge/lexicon/VoxForgeDict.txt | awk -F"\t|( ){2,}" '/PENCERE\s/ {print $1 " \t " $3}' >> sample.voca
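The lookup above can also be wrapped in a small function. This is only a sketch: the name lookup_word and the demo entries (including their phoneme strings) are made up for illustration, and it assumes dictionary lines look like WORD, [WORD], phonemes, separated by a tab or by two or more spaces. It matches the headword exactly instead of using /PENCERE\s/, since \s is a GNU awk extension and a regex match can also hit other parts of the line.

```shell
# lookup_word WORD DICT: print "WORD <tab> phonemes" for an exact headword match.
# Assumption: fields are separated by a tab or by a run of two or more spaces.
lookup_word() {
  awk -F'\t|  +' -v w="$1" '$1 == w { print $1 " \t " $3 }' "$2"
}

# Demo on a throwaway dictionary (entries invented for illustration only).
demo_dict=$(mktemp)
printf 'PENCERE\t[PENCERE]\tp e n dZ e r e\n' > "$demo_dict"
printf 'PENCEREDEN  [PENCEREDEN]  p e n dZ e r e d e n\n' >> "$demo_dict"
lookup_word PENCERE "$demo_dict"
```

To append the result to the vocabulary file as in the second command: lookup_word PENCERE ~/voxforge/lexicon/VoxForgeDict.txt >> sample.voca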
After that, update the user dictionary:
- julia ../bin/mkdfa.jl sample
# Create the DFA after editing the mb.voca and mb.grammar files. Both files must share the same base name.
# Pass only the base name, without an extension.
julia ../bin/mkdfa.jl /home/m1/sil/net/julius/grammar/mb/mb
This will generate: mb.dfa mb.term mb.dict
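Since mkdfa needs both files under one base name, a quick pre-check can catch a missing or misnamed file before the run. A minimal sketch, assuming the .grammar and .voca extensions used in these notes; the function name grammar_pair_ok is hypothetical:

```shell
# grammar_pair_ok PREFIX: succeed only if PREFIX.grammar and PREFIX.voca both exist.
grammar_pair_ok() {
  local prefix ext status
  prefix=${1%.grammar}      # tolerate an extension passed by accident
  prefix=${prefix%.voca}
  status=0
  for ext in grammar voca; do
    if [ ! -f "$prefix.$ext" ]; then
      echo "missing: $prefix.$ext" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage, with the mkdfa.jl path from these notes:
# grammar_pair_ok mb && julia ../bin/mkdfa.jl mb
```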
# Julius's -h option accepts both hmmdefs and binhmm files.
# To load the model faster and shrink the file, convert hmmdefs to binhmm format:
mkbinhmm '/home/m1/voxforge/run/acoustic2/hmmdefs' '/home/m1/voxforge/run/acoustic2/binhmm'
#file size reduced from 6.2 MB to 1.9 MB
For live, compact output:
./julius -C mb.jconf -input mic -demo
To print only the result:
./julius -C mb.jconf -input mic -quiet
# Transcribe audio files:
$ ls test.wav > test.txt
$ julius -input rawfile -filelist test.txt -C julian.jconf
https://linuxsagas.digitaleagle.net/2012/02/25/voice-recognition-in-ubuntu/
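For more than one recording, the file list can be built from a whole directory. A rough sketch: the function name make_filelist is hypothetical, and it assumes the recordings end in .wav and sit directly in the given directory.

```shell
# make_filelist DIR OUT: write the path of every .wav file in DIR to OUT, one per line.
make_filelist() {
  local dir out f
  dir=$1
  out=$2
  : > "$out"                      # start with an empty list
  for f in "$dir"/*.wav; do
    if [ -f "$f" ]; then          # skips the unexpanded pattern when nothing matches
      printf '%s\n' "$f" >> "$out"
    fi
  done
}

# Usage, followed by the same julius call as above:
# make_filelist ~/voxforge/live_records test.txt
# julius -input rawfile -filelist test.txt -C julian.jconf
```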
ENVR-v5.4.Dnn.Bin.zip gives better results for English:
http://www.repository.voxforge1.org/downloads/Main/Trunk/AcousticModels/Julius/
Add an extra dictionary:
-adddict '/home/m1/voxforge/tutorial/sample.dict'
[-adddict dictfile] (n-gram) load extra dictionary
[-addentry entry] (n-gram) load extra word entry
Record live audio as WAV files in a directory:
- julius -C '/home/m1/voxforge/howto/julius.jconf' -input mic -quiet -record ~/voxforge/live_records/
Look up the phonemes of a word:
http://www.voxforge.org/home/dev/autoaudioseg/step-2
- festival> (lex.lookup "internet")
("internet" nil (((ih n t) 1) ((er n) 0) ((eh t) 1)))
"ih n t er n eh t" can be used in the lexicon.
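Flattening the lex.lookup output by hand gets tedious for many words. A small sketch of doing it with sed and tr, assuming the output has the one-line shape shown above (quoted word, a single part-of-speech token such as nil, then syllables with stress digits); the function name festival_to_phones is hypothetical:

```shell
# festival_to_phones: read one lex.lookup result on stdin, print the phonemes
# as a single space-separated string.
festival_to_phones() {
  # 1) drop the leading ("word" pos prefix
  # 2) turn parentheses and stress digits into spaces
  # 3) squeeze repeated spaces and trim both ends
  sed -e 's/^("[^"]*" *[^ ]* *//' -e 's/[()0-9]/ /g' \
    | tr -s ' ' \
    | sed -e 's/^ //' -e 's/ $//'
}

echo '("internet" nil (((ih n t) 1) ((er n) 0) ((eh t) 1)))' | festival_to_phones
```

Because every digit is treated as a stress marker, this would mangle any phone set whose symbols themselves contain digits.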
Another way to get a phonetic transcription:
- espeak -v tr -x
Türkçe
tYRktS'E