Authors: Linde Kuijpers (student assistant), Mario Ganzeboom (PhD student, m.ganzeboom@let.ru.nl), Xing Wei (PhD, X.Wei@let.ru.nl)
Last changes in code: 14-04-2019
Last changes in readme: 19-06-2019
Current location: /vol/tensusers/xwei/clst-asr_forced-aligner/
Function: forced alignment of speech recordings using NNet2 online acoustic models trained with the Kaldi ASR toolkit (http://kaldi-asr.org). Currently, this tool includes acoustic models for Dutch trained on the Spoken Dutch Corpus (SDC, or 'CGN' in Dutch) by the Open-source Nederlandse spraakherkenning project (http://www.opensource-spraakherkenning.nl).
The directories given as arguments to the run.sh script are submitted as separate jobs to the Slurm job queue manager on Ponyland (using the Slurm 'sbatch' command). In other words, the directories are processed in parallel, each in its own job, which speeds up the forced alignment.
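The one-job-per-directory submission can be sketched as follows. This is a hypothetical illustration, not the actual contents of run.sh: the job script name align_one_dir.sh is an assumption, and the sketch only prints the sbatch command lines (a dry run). The log path mirrors the ~/clst-asr-fa/slurm-logs/slurm-<job-id>.out location described below; sbatch replaces %j in --output with the job id.

```python
"""Dry-run sketch: one Slurm job per input directory (hypothetical job script)."""
import os
import shlex
import sys
from pathlib import Path

# Hypothetical script that aligns a single directory; the real run.sh
# internals are not documented in this README.
JOB_SCRIPT = "align_one_dir.sh"
LOG_DIR = os.path.expanduser("~/clst-asr-fa/slurm-logs")


def build_sbatch_commands(directories, job_script=JOB_SCRIPT, log_dir=LOG_DIR):
    """Return one sbatch argument list per (absolute) input directory."""
    commands = []
    for directory in directories:
        path = Path(directory)
        if not path.is_absolute():
            raise ValueError(f"expected an absolute path, got {directory!r}")
        commands.append([
            "sbatch",
            # Slurm substitutes %j with the job id, giving slurm-<job-id>.out
            f"--output={Path(log_dir) / 'slurm-%j.out'}",
            job_script,
            str(path),
        ])
    return commands


if __name__ == "__main__":
    # Print the commands only; pass each list to subprocess.run(cmd, check=True)
    # to actually submit the jobs.
    for cmd in build_sbatch_commands(sys.argv[1:]):
        print(shlex.join(cmd))
```

Called with two absolute directory paths, the sketch prints two independent sbatch lines, one per directory; because each becomes its own Slurm job, the directories can run in parallel.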
Acoustic model details: the models are trained on all SDC components, to which 3-fold speed perturbation is applied. This amounts to 1170 hours of speech data in total. The models are trained on the default high-resolution MFCCs (40 bins). Training parameters: initial_effective_lrate=0.0015, final_effective_lrate=0.00015.
Input: one or more directories to which you have write access, containing wav files and their transcription files in Praat TextGrid format. Each transcription file must have the same name as its wav file, but with a .tg extension; both the short and the long TextGrid format are accepted. Note: the forced alignment script contains a subscript that can convert the wav files to the required 16 kHz mono format using sox. You can configure this in the forced alignment script's config file (see steps 2 and 3 below).
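Before submitting a directory, it can help to check these input requirements yourself. The sketch below is an illustration, not part of the tool: it assumes the wav extension is lowercase .wav and that the files are uncompressed PCM wav (the only kind Python's wave module reads). It reports wav files without a matching .tg file and wav files that are not 16 kHz mono.

```python
"""Sketch: pre-flight check of an input directory before forced alignment."""
import sys
import wave
from pathlib import Path

REQUIRED_RATE = 16000  # Hz, the sample rate the aligner expects
REQUIRED_CHANNELS = 1  # mono


def check_input_dir(directory):
    """Return a list of problems: missing .tg files and non-16 kHz-mono wavs."""
    problems = []
    # Note: glob is case-sensitive, so files ending in .WAV are not checked.
    for wav_path in sorted(Path(directory).glob("*.wav")):
        tg_path = wav_path.with_suffix(".tg")  # same base name, .tg extension
        if not tg_path.exists():
            problems.append(f"{wav_path.name}: missing transcription {tg_path.name}")
        # The wave module only reads uncompressed PCM wav files.
        with wave.open(str(wav_path), "rb") as w:
            rate, channels = w.getframerate(), w.getnchannels()
        if (rate, channels) != (REQUIRED_RATE, REQUIRED_CHANNELS):
            problems.append(
                f"{wav_path.name}: {rate} Hz, {channels} channel(s); "
                "needs conversion to 16 kHz mono"
            )
    return problems


if __name__ == "__main__":
    for directory in sys.argv[1:]:
        for problem in check_input_dir(directory):
            print(f"{directory}: {problem}")
```

Files flagged as not 16 kHz mono can be left to the tool's own sox conversion, or converted by hand, for instance with `sox input.wav -r 16000 -c 1 output.wav`.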
Output: alignments in Praat TextGrid format (*_aligned.TextGrid) at word and phone level. These files additionally contain a tier with the transcription (and any other tiers from the transcription TextGrids). By default, the alignment files are stored in the input directory that holds the wav and transcription files (in its results subfolder, see below).
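Because a tier name in align_config.rc that does not match the .tg files is an easy mistake, it can be useful to list the tier names a TextGrid actually contains; the same helper shows which tiers ended up in an *_aligned.TextGrid file. The sketch below is a simplified reader written for illustration, not a full TextGrid parser: it handles the long and short text formats, tries UTF-8 and then UTF-16 (Praat can save text files as UTF-16), and does not unescape doubled quotes inside names.

```python
"""Sketch: list the tier names in a Praat TextGrid (long or short text format)."""
import re
import sys
from pathlib import Path

# Long format: each tier has a line such as   name = "words"
LONG_NAME = re.compile(r'^name\s*=\s*"(.*)"$')
# Short format: the tier class line is directly followed by the quoted tier name
TIER_CLASSES = {'"IntervalTier"', '"TextTier"'}


def tier_names(text):
    """Return the tier names found in TextGrid `text`, in file order."""
    lines = [line.strip() for line in text.splitlines()]
    long_names = [m.group(1) for m in map(LONG_NAME.match, lines) if m]
    if long_names:
        return long_names
    return [
        following.strip('"')
        for current, following in zip(lines, lines[1:])
        if current in TIER_CLASSES and following.startswith('"')
    ]


def read_tier_names(path):
    """Read a TextGrid file, trying UTF-8 first and UTF-16 second."""
    raw = Path(path).read_bytes()
    for encoding in ("utf-8", "utf-16"):
        try:
            return tier_names(raw.decode(encoding))
        except UnicodeDecodeError:
            continue
    raise ValueError(f"cannot decode {path} as UTF-8 or UTF-16")


if __name__ == "__main__":
    for textgrid in sys.argv[1:]:
        print(textgrid, read_tier_names(textgrid))
```

Comparing the printed names of a few .tg files against the tier name set in align_config.rc catches a mismatch before any Slurm job is queued.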
How to run the script:
1. Log in to one of the ponies (do not use the old applejack, because it has an older CUDA version!).
2. From your home directory, run /vol/tensusers/xwei/clst-asr_forced-aligner/run.sh <absolute-path-to-directory-with-recordings>. The script creates the directory ~/clst-asr-fa in your home directory and copies the default config and lexicon files into it.
3. Open ~/clst-asr-fa/align_config.rc in your favourite editor and adjust the configuration settings as needed (the defaults work well in most cases). Make sure the tier name in this config file matches the tier name in your *.tg files!
4. Run the command from step 2 again. A job is then added to the Slurm queue manager that starts the forced alignment of the given directory. The logs of this job can be found in ~/clst-asr-fa/slurm-logs/slurm-<job-id>.out. To queue multiple jobs at once, provide multiple input directories on the command line.
5. Once all Slurm jobs have completed, the forced alignment logs can be found in <absolute-path-to-directory-with-recordings>/logs.
6. Some words in the transcriptions may be missing from the lexicon provided with the acoustic models. Run /vol/tensusers/xwei/clst-asr_forced-aligner/list-missing-words.sh <absolute-path-to-directory-with-recordings> to print these words and their corresponding transcription files to the command line. You can then add phonemic transcriptions of these words to your custom lexicon file ~/clst-asr-fa/lexicon.txt; it is recommended to base the new transcriptions on parts of existing ones. Afterwards, rerun the script as in step 4.
7. All audio files and *.tg files are moved to a folder named source_files (<absolute-path-to-directory-with-recordings>/source_files), and a folder named results is created alongside it to hold all generated *_aligned.TextGrid files. The logs, source_files and results folders are created automatically.
8. After processing has finished, check the log files in ~/clst-asr-fa/slurm-logs/ for errors.
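The idea behind list-missing-words.sh can be illustrated with a small sketch. This is not the script's actual implementation: it assumes that each line of lexicon.txt starts with the word followed by its phones (the usual Kaldi lexicon layout), that transcription words are separated by whitespace, and that matching is exact and case-sensitive. The transcriptions are passed in as plain strings, for example the texts of the transcription tier.

```python
"""Sketch: find transcription words that are missing from a lexicon."""


def load_lexicon_words(lines):
    """Collect the first field (the word) of every non-empty lexicon line."""
    return {line.split()[0] for line in lines if line.strip()}


def missing_words(transcriptions, lexicon_words):
    """Map each unknown word to the sorted names of the files it occurs in.

    `transcriptions` maps a transcription file name to its text.
    Matching is exact and case-sensitive (an assumption of this sketch).
    """
    missing = {}
    for file_name, text in transcriptions.items():
        for word in text.split():
            if word not in lexicon_words:
                missing.setdefault(word, set()).add(file_name)
    return {word: sorted(files) for word, files in sorted(missing.items())}
```

Since load_lexicon_words accepts any iterable of lines, an open file object for ~/clst-asr-fa/lexicon.txt can be passed to it directly. Grouping the files per missing word makes it easy to see which recordings need to be rerun after the lexicon has been extended.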
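Checking the Slurm logs for errors can be partly automated. The sketch below is a simple illustration, not part of the tool: it scans the slurm-*.out files in ~/clst-asr-fa/slurm-logs/ and reports every line containing the word "error" in any letter case. Kaldi prints its errors with an ERROR prefix, but the keyword is an assumption, so adjust the pattern if your logs use other markers.

```python
"""Sketch: report lines mentioning errors in the Slurm job logs."""
import os
import re
import sys
from pathlib import Path

DEFAULT_LOG_DIR = os.path.expanduser("~/clst-asr-fa/slurm-logs")


def find_error_lines(log_dir=DEFAULT_LOG_DIR, pattern=r"error"):
    """Return (log name, line number, line) for every matching log line."""
    regex = re.compile(pattern, re.IGNORECASE)  # also matches ERROR, Error
    hits = []
    for log in sorted(Path(log_dir).glob("slurm-*.out")):
        with log.open(encoding="utf-8", errors="replace") as handle:
            for line_number, line in enumerate(handle, start=1):
                if regex.search(line):
                    hits.append((log.name, line_number, line.rstrip()))
    return hits


if __name__ == "__main__":
    log_dir = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_LOG_DIR
    for name, line_number, line in find_error_lines(log_dir):
        print(f"{name}:{line_number}: {line}")
```

An empty result does not guarantee a successful run; the alignment logs in <absolute-path-to-directory-with-recordings>/logs are still worth a look.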
For the English forced aligner, use /vol/tensusers/xwei/clst-eng_forced-aligner/run.sh in step 2; the directory created in your home directory is then ~/clst-eng-fa.