OpenAI Whisper: speech recognition

2023-04-16 m1gin 1107

#julius #speechrecognition #sre #voxforge #ubuntu #linux #cli #speech2text

Step-by-step Bash script file: /voxforge/mb-scratch/stepbystep.sh

For easy access, add the HTK, Julius, and Julia binaries to PATH:
PATH=$PATH:$HOME/voxforge/bin/htk/bin:$HOME/voxforge/bin/julius-4.3.1/bin:$HOME/voxforge/bin/julia-1.0.2/bin

Add a word from the dictionary to the user vocabulary:

# Preview the matching entry (word and phonemes):
awk -F"\t|( ){2,}" '/PENCERE\s/ {print $1 " \t " $3}' ~/voxforge/lexicon/VoxForgeDict.txt
# Append it to the vocabulary file:
awk -F"\t|( ){2,}" '/PENCERE\s/ {print $1 " \t " $3}' ~/voxforge/lexicon/VoxForgeDict.txt >> sample.voca
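
The pattern above matches PENCERE followed by whitespace anywhere in a line. A minimal sketch of the same lookup as a reusable function follows. lookup_word is a hypothetical helper name, the exact match on the first column is an assumption about the desired behavior, and the three-column layout (word, bracketed word, phonemes) is inferred from the fields the command above prints. The demo runs on a throwaway file with placeholder phonemes, not on the real VoxForgeDict.txt.

```shell
#!/bin/sh
# lookup_word WORD LEXICON: print "WORD<TAB>phonemes" for exact matches only.
# Hypothetical helper; assumes columns are separated by a tab or by 2+ spaces.
lookup_word() {
    awk -F'\t|  +' -v w="$1" '$1 == w { print $1 "\t" $3 }' "$2"
}

# Demo on a throwaway file; the phonemes are placeholders, not real entries.
sample=$(mktemp)
printf 'PENCERE\t[PENCERE]\tp e n c e r e\n' > "$sample"
printf 'PENCERELER  [PENCERELER]  p e n c e r e l e r\n' >> "$sample"
lookup_word PENCERE "$sample"    # only the PENCERE row is printed
rm -f "$sample"
```

Appending works the same way as before: lookup_word PENCERE ~/voxforge/lexicon/VoxForgeDict.txt >> sample.voca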

After that, update the user dictionary:

julia ../bin/mkdfa.jl sample

# Create the DFA after editing the mb.voca and mb.grammar files. Both files must have the same base name.
# Pass only the base name, without the extension.
julia ../bin/mkdfa.jl /home/m1/sil/net/julius/grammar/mb/mb

This generates mb.dfa, mb.term, and mb.dict.
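
Since mkdfa.jl takes only the shared base name, a quick check of the inputs before the run and of the outputs after it might look like this. A minimal sketch: has_grammar_pair and list_outputs are hypothetical helper names, not part of Julius, and the three output extensions are the ones listed above.

```shell
#!/bin/sh
# has_grammar_pair BASE: succeed only if BASE.grammar and BASE.voca both exist.
has_grammar_pair() {
    [ -f "$1.grammar" ] && [ -f "$1.voca" ]
}

# list_outputs BASE: print whichever of BASE.dfa, BASE.term, BASE.dict exist.
list_outputs() {
    for ext in dfa term dict; do
        [ -f "$1.$ext" ] && printf '%s\n' "$1.$ext"
    done
    return 0
}

base=sample
if has_grammar_pair "$base"; then
    echo "inputs OK for $base"
else
    echo "missing $base.grammar or $base.voca"
fi
list_outputs "$base"
```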

# The -h parameter of julius accepts both hmmdefs and binhmm files.
# Converting hmmdefs to the binhmm format makes julius load the model faster
# and reduces the file size:
mkbinhmm '/home/m1/voxforge/run/acoustic2/hmmdefs' '/home/m1/voxforge/run/acoustic2/binhmm'
# File size reduced from 6.2 MB to 1.9 MB.
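
Going from 6.2 MB to 1.9 MB is a reduction of about 69%. To check the figure for another model, a small helper could compare the two files directly. A minimal sketch: size_reduction is a hypothetical name, and it only compares byte counts, nothing Julius-specific.

```shell
#!/bin/sh
# size_reduction ORIGINAL CONVERTED: print the size reduction as a whole percent.
size_reduction() {
    before=$(( $(wc -c < "$1") ))
    after=$(( $(wc -c < "$2") ))
    if [ "$before" -le 0 ]; then
        echo "size_reduction: $1 is empty" >&2
        return 1
    fi
    echo $(( (before - after) * 100 / before ))
}

# Usage with the paths from the mkbinhmm call above:
#   size_reduction /home/m1/voxforge/run/acoustic2/hmmdefs /home/m1/voxforge/run/acoustic2/binhmm
```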

For live, short output:
./julius -C mb.jconf -input mic -demo

For the result only:
./julius -C mb.jconf -input mic -quiet

# Transcribe audio files: list them in a text file, then pass that list to julius.
ls test.wav > test.txt
julius -input rawfile -filelist test.txt -C julian.jconf
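
For more than one recording, the file list can be built from a whole directory. A minimal sketch: make_filelist is a hypothetical helper name and ~/recordings in the usage comment is a placeholder directory; the julius call is the one from the note above.

```shell
#!/bin/sh
# make_filelist DIR OUT: write the path of every .wav file in DIR to OUT, one per line.
make_filelist() {
    : > "$2"                        # start with an empty list
    for f in "$1"/*.wav; do
        [ -e "$f" ] || continue     # the glob matched nothing
        printf '%s\n' "$f" >> "$2"
    done
}

# Usage:
#   make_filelist ~/recordings test.txt
#   julius -input rawfile -filelist test.txt -C julian.jconf
```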

https://linuxsagas.digitaleagle.net/2012/02/25/voice-recognition-in-ubuntu/

ENVR-v5.4.Dnn.Bin.zip gives better results for English:
http://www.repository.voxforge1.org/downloads/Main/Trunk/AcousticModels/Julius/

Add an extra dictionary:
-adddict '/home/m1/voxforge/tutorial/sample.dict'
[-adddict dictfile] (n-gram) load extra dictionary
[-addentry entry] (n-gram) load extra word entry

Record live audio as WAV files in a directory:

julius -C '/home/m1/voxforge/howto/julius.jconf' -input mic -quiet -record ~/voxforge/live_records/

Look up the phonemes of a word:

http://www.voxforge.org/home/dev/autoaudioseg/step-2

festival> (lex.lookup "internet")
("internet" nil (((ih n t) 1) ((er n) 0) ((eh t) 1)))
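
The syllable grouping and stress digits in this output have to be stripped before the phonemes go into a lexicon. A minimal sketch of that step with sed: festival_to_phones is a hypothetical helper name, and it assumes one lex.lookup result per input line in the shape shown above.

```shell
#!/bin/sh
# festival_to_phones: read lex.lookup results on stdin, print flat phoneme strings.
festival_to_phones() {
    sed -E \
        -e 's/^\("[^"]*" [^ ]+ //' \
        -e 's/\) [0-9]+\)/ /g' \
        -e 's/[()]//g' \
        -e 's/ +/ /g' \
        -e 's/^ //' \
        -e 's/ $//'
}
# The expressions, in order: drop the quoted word and part-of-speech field,
# drop the per-syllable stress digits, remove remaining parentheses,
# collapse repeated spaces, trim both ends.

echo '("internet" nil (((ih n t) 1) ((er n) 0) ((eh t) 1)))' | festival_to_phones
# prints: ih n t er n eh t
```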

"ih n t er n eh t" can be used in lexicon.

Another way to get a phonetic transcription:

espeak -v tr -x
Türkçe
tYRktS'E

2023-10-21 m1gin 0

OpenAI Whisper: speech recognition

#whisper #speechrecognition #sre #speech2text

https://github.com/openai/whisper
https://github.com/guillaumekln/faster-whisper
https://github.com/jordimas/whisper-ctranslate2

Whisper models tested with auto_subtitle:

large

+ whole sentences
+ punctuation
+ subtitle lines are short
- takes too long on CPU

medium.en

- pays little attention to punctuation

medium

+ whole sentences
+ punctuation
- subtitle lines are too long

small.en

+ accuracy
+ works on GPU
- subtitle lines start with a capital letter, regardless of whether the line begins a sentence
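
To put a number on "short" versus "too long" subtitle lines, the longest text line of each generated file can be measured. A minimal sketch, assuming the subtitles are plain .srt files: longest_subtitle_line is a hypothetical helper name, and a subtitle text line made up only of digits would be mistaken for a cue number and skipped.

```shell
#!/bin/sh
# longest_subtitle_line FILE.srt: print the length of the longest text line,
# ignoring cue numbers and timestamp lines.
longest_subtitle_line() {
    awk '{ sub(/\r$/, "") }                 # tolerate CRLF line endings
         !/^[0-9]+$/ && !/-->/ { n = length($0); if (n > max) max = n }
         END { print max + 0 }' "$1"
}

# Demo on a throwaway file:
srt=$(mktemp)
printf '1\n00:00:00,000 --> 00:00:02,000\nHello world\n\n' > "$srt"
printf '2\n00:00:02,000 --> 00:00:04,000\nA much longer subtitle line\n' >> "$srt"
longest_subtitle_line "$srt"
rm -f "$srt"
```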

To edit subtitles online:

https://subtitle-horse.com/editor/create-captions

Install faster-whisper and ctranslate2

Install CUDA: https://developer.nvidia.com/cuda-zone

sudo apt-get install libcudnn8

Not sure about the following packages; they may not be necessary:

sudo apt-get install libcudnn8-dev
sudo apt install nvidia-cudnn

Fix errors:

RuntimeError: Library libcublas.so.11 is not found or cannot be loaded

export LD_LIBRARY_PATH=/usr/local/cuda-12.2/targets/x86_64-linux/lib/:${LD_LIBRARY_PATH}
sudo ln -s /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcublas.so /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcublas.so.11
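
If the export line goes into a shell startup file, it gets prepended again on every new shell. A minimal sketch of a guard that adds the directory only once: prepend_ld_path is a hypothetical helper name, and the CUDA path is the one used above.

```shell
#!/bin/sh
# prepend_ld_path DIR: put DIR at the front of LD_LIBRARY_PATH unless it is already listed.
prepend_ld_path() {
    dir=${1%/}                               # drop one trailing slash
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$dir:"*) ;;                       # already present, nothing to do
        *) LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

prepend_ld_path /usr/local/cuda-12.2/targets/x86_64-linux/lib/
echo "$LD_LIBRARY_PATH"
```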
