**Installing Kaldi**
cd /home/liam
mkdir git
cd git
git clone https://github.com/kaldi-asr/kaldi.git
cd kaldi/tools
sudo apt-get update
bash extras/check_dependencies.sh
extras/check_dependencies.sh: Compiler 'g++' is not installed.
extras/check_dependencies.sh: You need g++ >= 4.8.3, Apple Xcode >= 5.0 or clang >= 3.3.
extras/check_dependencies.sh: make is not installed.
extras/check_dependencies.sh: automake is not installed.
extras/check_dependencies.sh: autoconf is not installed.
extras/check_dependencies.sh: unzip is not installed.
extras/check_dependencies.sh: sox is not installed.
extras/check_dependencies.sh: gfortran is not installed
extras/check_dependencies.sh: neither libtoolize nor glibtoolize is installed
extras/check_dependencies.sh: subversion is not installed
extras/check_dependencies.sh: python2.7 is not installed
extras/check_dependencies.sh: Intel MKL does not seem to be installed.
... Run extras/install_mkl.sh to install it. Some distros (e.g., Ubuntu 20.04) provide
... a version of MKL via the package manager, but verify that it is up-to-date.
... You can also use other matrix algebra libraries. For information, see:
... http://kaldi-asr.org/doc/matrixwrap.html
extras/check_dependencies.sh: Some prerequisites are missing; install them using the command:
sudo apt-get install g++ make automake autoconf unzip sox gfortran libtool subversion python2.7
sudo apt-get install g++ make automake autoconf unzip sox gfortran libtool subversion python2.7
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package python2.7
E: Couldn't find any package by glob 'python2.7'
E: Couldn't find any package by regex 'python2.7'
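apt-get installs nothing when any requested package cannot be found, so rerun the command without python2.7:
sudo apt-get install g++ make automake autoconf unzip sox gfortran libtool subversion
Then build Python 2.7 from source (https://tecadmin.net/install-python-2-7-on-ubuntu-and-linuxmint/):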
cd /usr/src/
sudo wget https://www.python.org/ftp/python/2.7.18/Python-2.7.18.tgz
sudo tar xzf Python-2.7.18.tgz
cd Python-2.7.18/
sudo ./configure --enable-optimizations
sudo make altinstall
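Before going back to the Kaldi tools, it may be worth confirming that the new interpreter is visible on the PATH. With the default configure prefix, `make altinstall` installs it as `python2.7` under /usr/local/bin and leaves the system `python` alone. A minimal sketch; the helper name `check_tool` is hypothetical and not part of Kaldi:

```shell
# check_tool NAME: report whether NAME resolves to an executable on the PATH.
# (check_tool is an illustrative helper, not a Kaldi script.)
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found"
        return 1
    fi
}

# python2.7 is the name check_dependencies.sh reported as missing.
check_tool python2.7 || echo "python2.7 is still missing; re-check the altinstall step"
```

The same helper can be pointed at any of the other tools listed by check_dependencies.sh, for example `check_tool sox`.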
cd /home/liam/git/kaldi/tools
make
sudo apt-get install zlib1g-dev
make
cd ../src
./configure --shared
cd ../tools
sudo extras/install_mkl.sh -sp debian intel-mkl-64bit-2020.0-088
cd ../src
./configure --shared
make depend -j 8
make -j 8
cd ../cmake
mkdir -p build
sudo apt-get install cmake
cmake -DCMAKE_INSTALL_PREFIX=../dist ..
cmake --build . --target install -- -j8
**Kaldi is installed!**
Create the working directory /home/liam/git/kaldi/egs/abair/asr
Populate it with models from the Setanta server:
sudo rsync -azv -e 'ssh -A -J lonergan@phoneticsrv3.lcs.tcd.ie' liam@134.226.89.152:/media/storage/phonetics/kaldi/egs/exp_220802_train11 .
sudo rsync -azv -e 'ssh -A -J lonergan@phoneticsrv3.lcs.tcd.ie' liam@134.226.89.152:/media/storage/phonetics/kaldi/egs/exp_train_march23 .
These directories contain experiments from different points in time: exp_train_march23 is the most recent (March of this year), while exp_220802_train11 holds systems trained last year. The most recently trained model tends to perform better.
Remake "online" versions of the model directories to ensure correct paths:
steps/online/nnet3/prepare_online_decoding.sh exp_train_march23/chain/tdnn1i_sp_ep9_specaug_subsample/graph_CNnG+CnCB+wp_uniq-0.9-paracrawl+conll173gr/ exp_220802_train11/nnet3/extractor/ exp_train_march23/chain/tdnn1i_sp_ep9_specaug_subsample/ exp_train_march23/chain/tdnn1i_sp_ep9_specaug_subsample_online
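The same call is easier to read, and to adapt to another experiment, if the long paths are held in variables. A sketch, assuming it is run from the working directory that contains `steps/` and the two experiment directories; the variable names are illustrative only:

```shell
# Illustrative variable names; the paths are the ones used in the command above.
model_dir=exp_train_march23/chain/tdnn1i_sp_ep9_specaug_subsample
graph_dir=$model_dir/graph_CNnG+CnCB+wp_uniq-0.9-paracrawl+conll173gr
extractor_dir=exp_220802_train11/nnet3/extractor
online_dir=${model_dir}_online

# Arguments, in order: graph directory, i-vector extractor, model directory, output directory.
# The script is only run when it is present, so the snippet is harmless elsewhere.
if [ -x steps/online/nnet3/prepare_online_decoding.sh ]; then
    steps/online/nnet3/prepare_online_decoding.sh \
        "$graph_dir" "$extractor_dir" "$model_dir" "$online_dir"
else
    echo "prepare_online_decoding.sh not found; run this from the asr working directory" >&2
fi
```

Switching to a different model then only means changing `model_dir` (and `graph_dir`, if the graph name differs).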
Start the TCP decoding server:
online2-tcp-nnet3-decode-faster --config=exp_train_march23/chain/tdnn1i_sp_ep9_specaug_subsample_online/conf/online.conf --cmvn-config=exp_train_march23/chain/tdnn1i_sp_ep9_specaug_subsample_online/conf/online_cmvn.conf --max-active=7000 --beam=15.0 --lattice-beam=8.0 --acoustic-scale=1 --frame-subsampling-factor=3 exp_train_march23/chain/tdnn1i_sp_ep9_specaug_subsample_online/final.mdl exp_train_march23/chain/tdnn1i_sp_ep9_specaug_subsample/graph_CNnG+CnCB+wp_uniq-0.9-paracrawl+conll173gr/HCLG.fst exp_train_march23/chain/tdnn1i_sp_ep9_specaug_subsample/graph_CNnG+CnCB+wp_uniq-0.9-paracrawl+conll173gr/words.txt
The positional arguments of this command are as follows: final.mdl is the final acoustic model; HCLG.fst is the decoding graph, a composition of the HMM structure (H), phonetic context dependency (C), pronunciation lexicon (L) and language model (G); words.txt is the word symbol table, which maps every word the system can recognise to the integer ID used in the graph.
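Once the server is running, audio can be streamed to it over TCP as raw 16-bit mono PCM. The sketch below rests on assumptions that are not in these notes: the server was started without --port-num (so it listens on the default port 5050), the model expects 16 kHz audio, and `nc` is the OpenBSD netcat that provides `-N`. The function name `send_wav` is hypothetical.

```shell
# send_wav FILE [HOST] [PORT]: stream FILE to the decoder as raw 16-bit mono PCM.
# Assumptions: default server port 5050 and a 16 kHz model; adjust both if your setup differs.
send_wav() {
    wav=$1
    host=${2:-localhost}
    port=${3:-5050}
    if [ ! -f "$wav" ]; then
        echo "no such file: $wav" >&2
        return 1
    fi
    # sox converts to headerless signed 16-bit mono; nc -N closes the socket at end of input.
    sox "$wav" -t raw -c 1 -b 16 -r 16k -e signed-integer - | nc -N "$host" "$port"
}
```

Calling `send_wav recording.wav` from a second terminal should print whatever text the server sends back for that file; `recording.wav` is a placeholder name.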