Authors: Linde Kuijpers (student assistant), Mario Ganzeboom (PhD student, m.ganzeboom@let.ru.nl)
Last changes in code: 16-03-2018
Last changes in readme: 22-03-2018
Current location: /vol/tensusers/mganzeboom/clst-asr_forced-alignment
Function: forced alignment of speech recordings using NNet2 online acoustic models trained with the Kaldi ASR toolkit (http://kaldi-asr.org). The tool currently includes acoustic models for Dutch, trained on the Spoken Dutch Corpus (SDC, or 'CGN' in Dutch) by the Open-source Nederlandse spraakherkenning project (http://www.opensource-spraakherkenning.nl). Each directory passed as an argument to the run.sh script is submitted as a separate job to the Slurm job queue manager on Ponyland (via the Slurm 'sbatch' command). The directories are therefore processed in parallel, which speeds up the forced alignment.
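The one-job-per-directory submission described above can be sketched as follows. This is only an illustration, not the contents of run.sh: the worker script name align_one_dir.sh, the "fa_" job-name prefix and the example directories are assumptions made for this sketch. The commands are built and printed rather than executed, so the sketch can be run on a machine without Slurm.

```python
import shlex
from pathlib import Path

# Hypothetical name for the per-directory worker script; the real name
# used inside run.sh is not given in this readme.
WORKER_SCRIPT = "align_one_dir.sh"


def build_sbatch_commands(directories, worker=WORKER_SCRIPT):
    """Return one sbatch command (as an argument list) per input directory."""
    commands = []
    for directory in directories:
        name = Path(directory).name or "root"
        commands.append([
            "sbatch",
            f"--job-name=fa_{name}",       # label shown in the Slurm queue
            f"--output=fa_{name}_%j.log",  # %j expands to the Slurm job ID
            worker,
            str(directory),
        ])
    return commands


if __name__ == "__main__":
    # Example directories are placeholders, not real locations.
    for cmd in build_sbatch_commands(["/data/session1", "/data/session2"]):
        # Print instead of executing, so the sketch is safe to run anywhere.
        print(shlex.join(cmd))
```

Because every directory becomes its own job, splitting a large collection over several directories is what allows Slurm to run the alignments side by side.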
Details of the acoustic models: trained on all SDC components, with 3-fold speed perturbation applied. This amounts to 1170 hours of speech data in total. The models are trained on the default high-resolution MFCCs (40 bins). Training parameters: initial_effective_lrate=0.0015, final_effective_lrate=0.00015.
Input: One or more directories to which you have write access, containing wav files and the corresponding transcription files in Praat TextGrid format. Each transcription file must have the same name as its wav file, with a .tg extension; either the short or the long TextGrid format is accepted. Note: the forced alignment script contains a subscript that can convert the wav files to the required 16 kHz mono format using sox. You can configure this in the forced alignment script's config file (see step 2 below).
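Before submitting a directory, it can help to check the input requirements above yourself. The sketch below is not part of the tool; the function names are made up for this example. It lists wav files without a matching .tg file, checks the wav header for 16 kHz mono, and builds a sox command for the conversion. The sox call shown is a common way to resample and mix down, and is not necessarily the exact call the subscript makes. Python's wave module only reads uncompressed PCM wav files.

```python
import wave
from pathlib import Path


def missing_transcriptions(directory):
    """Return the wav files in `directory` that lack a same-named .tg file."""
    wavs = sorted(Path(directory).glob("*.wav"))
    return [w for w in wavs if not w.with_suffix(".tg").exists()]


def is_16khz_mono(wav_path):
    """True if the wav header reports one channel at 16000 Hz."""
    with wave.open(str(wav_path), "rb") as handle:
        return handle.getnchannels() == 1 and handle.getframerate() == 16000


def sox_convert_command(src, dst):
    """Build a sox call that resamples to 16 kHz and mixes down to mono."""
    # Output options (-r, -c) go directly before the output file name.
    return ["sox", str(src), "-r", "16000", "-c", "1", str(dst)]
```

For example, missing_transcriptions("/path/to/recordings") returns an empty list when every wav file has its transcription in place.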
Output: Alignments in Praat TextGrid format (*_aligned.TextGrid) at word and phone level. These files also contain a tier with the transcription (and possibly other tiers from the transcription TextGrids). By default, the alignment files are stored in the same directory as the wav and transcription files.
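To see which recordings still lack an alignment after the jobs have finished, a check along the following lines may be useful. It assumes that the part before "_aligned.TextGrid" is the base name of the wav file and that the default output location (the input directory) is used. Both are assumptions based on the description above, and the function names are made up for this example.

```python
from pathlib import Path


def expected_alignment_path(wav_path):
    """Return the assumed output path: <wav base name>_aligned.TextGrid."""
    wav_path = Path(wav_path)
    return wav_path.parent / f"{wav_path.stem}_aligned.TextGrid"


def unaligned_recordings(directory):
    """Return the wav files in `directory` without an alignment file."""
    wavs = sorted(Path(directory).glob("*.wav"))
    return [w for w in wavs if not expected_alignment_path(w).exists()]
```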
How to run the script: