m1gin 421

#julius #speechrecognition #sre #voxforge #ubuntu #linux #cli #speech2text

Step by step Bash script file: /voxforge/mb-scratch/stepbystep.sh

for easy access:
PATH=$PATH:$HOME/voxforge/bin/htk/bin:$HOME/voxforge/bin/julius-4.3.1/bin:$HOME/voxforge/bin/julia-1.0.2/bin
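
The PATH line above can also be written as a small guard, so a directory is only appended when it exists and is not already listed. A minimal sketch; `path_add` is a made-up helper name, not part of HTK, Julius or Julia:

```shell
# Hypothetical helper: append a directory to PATH only if it exists
# and is not already listed.
path_add() {
    case ":$PATH:" in
        *":$1:"*) ;;                                  # already present, nothing to do
        *) if [ -d "$1" ]; then PATH="$PATH:$1"; fi ;; # skip missing directories
    esac
}

# Same three directories as in the PATH line above.
path_add "$HOME/voxforge/bin/htk/bin"
path_add "$HOME/voxforge/bin/julius-4.3.1/bin"
path_add "$HOME/voxforge/bin/julia-1.0.2/bin"
export PATH
```

Putting this in a shell startup file should make repeated sourcing harmless, since duplicates are skipped.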


Add a word from the dictionary to the user vocabulary

  • cat ~/voxforge/lexicon/VoxForgeDict.txt | awk -F"\t|( ){2,}" '/PENCERE\s/ {print $1 " \t " $3}'
  • cat ~/voxforge/lexicon/VoxForgeDict.txt | awk -F"\t|( ){2,}" '/PENCERE\s/ {print $1 " \t " $3}' >> sample.voca
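
The same lookup as a reusable function, as a rough sketch. `lookup_word` is a made-up name. It assumes the lexicon columns are separated by a tab or by two or more spaces (as the awk separator above implies), and it matches the first column exactly instead of by regex, so PENCERE does not also pull in longer words containing it:

```shell
# Hypothetical helper: print "WORD<TAB>PHONEMES" for an exact word match.
# $1 = word as written in the lexicon, $2 = lexicon file
lookup_word() {
    # Field separator: one tab, or a run of two or more spaces.
    # Column 1 is the word, column 3 the phoneme string.
    awk -F'\t|  +' -v w="$1" '$1 == w { print $1 "\t" $3 }' "$2"
}

# Usage (paths from the commands above):
# lookup_word PENCERE ~/voxforge/lexicon/VoxForgeDict.txt >> sample.voca
```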

After that, update the user dictionary:

  • julia ../bin/mkdfa.jl sample


#Create the DFA after editing the mb.voca and mb.grammar files. Both files must have the same base name.
#Run with the base name only, without an extension.
julia ../bin/mkdfa.jl /home/m1/sil/net/julius/grammar/mb/mb

This will generate: mb.dfa mb.term mb.dict
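
Since mkdfa needs both files under the same base name, a quick pre-check can catch a missing or misnamed file before running it. A minimal sketch; `have_grammar_pair` is a made-up helper, not part of Julius:

```shell
# Hypothetical pre-check: succeeds only if BASE.voca and BASE.grammar both exist.
# $1 = base name without extension
have_grammar_pair() {
    [ -f "$1.voca" ] && [ -f "$1.grammar" ]
}

# Usage (base path from the command above):
# base=/home/m1/sil/net/julius/grammar/mb/mb
# have_grammar_pair "$base" && julia ../bin/mkdfa.jl "$base"
```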

#julius' -h parameter accepts both hmmdefs and binhmm files.
#Converting hmmdefs to binhmm format makes julius load the model faster
#and reduces the file size:
mkbinhmm '/home/m1/voxforge/run/acoustic2/hmmdefs' '/home/m1/voxforge/run/acoustic2/binhmm'
#file size reduced from 6.2 MB to 1.9 MB

for live and short output:
./julius -C mb.jconf -input mic -demo

only result:
./julius -C mb.jconf -input mic -quiet

#transcribe audio files
$ ls test.wav > test.txt
$ julius -input rawfile -filelist test.txt -C julian.jconf
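
For more than one recording, the file list can be built from a whole directory. A small sketch, assuming all recordings are .wav files in one directory; `build_filelist` is a made-up name:

```shell
# Hypothetical helper: print one .wav path per line for a directory.
# $1 = directory holding the recordings
build_filelist() {
    for f in "$1"/*.wav; do
        # If nothing matches, the pattern stays literal; skip it.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Usage (same julius call as above):
# build_filelist ~/voxforge/live_records > test.txt
# julius -input rawfile -filelist test.txt -C julian.jconf
```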

https://linuxsagas.digitaleagle.net/2012/02/25/voice-recognition-in-ubuntu/

ENVR-v5.4.Dnn.Bin.zip gives better results for English:
http://www.repository.voxforge1.org/downloads/Main/Trunk/AcousticModels/Julius/


add extra dictionary:
-adddict '/home/m1/voxforge/tutorial/sample.dict'
[-adddict dictfile] (n-gram) load extra dictionary
[-addentry entry] (n-gram) load extra word entry


Record live audio as WAV files in a directory:

  • julius -C '/home/m1/voxforge/howto/julius.jconf' -input mic -quiet -record ~/voxforge/live_records/


Look up the phonemes of a word

http://www.voxforge.org/home/dev/autoaudioseg/step-2

  • festival> (lex.lookup "internet")
    ("internet" nil (((ih n t) 1) ((er n) 0) ((eh t) 1)))
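
The syllable groups and stress digits in a lookup result like this can be flattened into a plain phoneme string with sed. A rough sketch, assuming the result is on a single line in the shape shown above; `festival_to_phones` is a made-up name:

```shell
# Hypothetical helper: read one lex.lookup result on stdin and
# print the phonemes as a single space-separated string.
festival_to_phones() {
    sed -e 's/^("[^"]*" [^ ]* //' \
        -e 's/) [0-9]*)/ /g' \
        -e 's/[()]//g' |
    tr -s ' ' |
    sed -e 's/^ //' -e 's/ $//'
}

# Usage:
# echo '("internet" nil (((ih n t) 1) ((er n) 0) ((eh t) 1)))' | festival_to_phones
```

The first sed expression drops the quoted word and the part-of-speech field, the second removes the stress digit that closes each syllable, and the third strips the remaining parentheses.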

"ih n t er n eh t" can be used in lexicon.

Another way to get a phonetic transcription:

  • espeak -v tr -x
    Türkçe
    tYRktS'E



m1gin 0

#whisper #speechrecognition #sre #speech2text

  • https://github.com/openai/whisper
  • https://github.com/guillaumekln/faster-whisper
  • https://github.com/jordimas/whisper-ctranslate2


whisper models with auto_subtitle

large

  • + whole sentence
  • + punctuations
  • + subtitle lines are short
  • - takes too much time on CPU

medium.en

  • - doesn't care much about punctuation

medium

  • + whole sentence.
  • + punctuations.
  • - subtitle lines are too long.

small.en

  • + accuracy
  • + works on GPU
  • - every subtitle line starts with a capital letter, regardless of whether it is the beginning of a sentence.


to edit subtitles online:

https://subtitle-horse.com/editor/create-captions


install faster-whisper and ctranslate2

install cuda: https://developer.nvidia.com/cuda-zone

next:

sudo apt-get install libcudnn8

not sure about the following; they may not be necessary.

sudo apt-get install libcudnn8-dev

sudo apt install nvidia-cudnn

fix errors:

RuntimeError: Library libcublas.so.11 is not found or cannot be loaded

  • export LD_LIBRARY_PATH=/usr/local/cuda-12.2/targets/x86_64-linux/lib/:${LD_LIBRARY_PATH}
  • sudo ln -s /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcublas.so /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcublas.so.11
Add to: