
CLST ASR Forced Aligner

Authors: Linde Kuijpers (student assistant), Mario Ganzeboom (PhD student, m.ganzeboom@let.ru.nl)
Last changes in code: 16-03-2018
Last changes in readme: 22-03-2018
Current location: /vol/tensusers/mganzeboom/clst-asr_forced-alignment

Function: forced alignment of speech recordings using NNet2 online acoustic models trained with the Kaldi ASR toolkit (http://kaldi-asr.org). The tool currently includes acoustic models for Dutch, trained on the Spoken Dutch Corpus (SDC, or 'CGN' in Dutch) by the Open-source Nederlandse spraakherkenning project (http://www.opensource-spraakherkenning.nl). Each directory passed as an argument to the run.sh script is submitted as a separate job to the Slurm job queue manager on Ponyland (via the Slurm 'sbatch' command). In other words, the directories are processed in parallel, one job per directory, to speed up the forced alignment.
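The one-job-per-directory idea can be sketched as follows. This is only an illustration: the internals of run.sh are not shown in this document, align_job.sh is a hypothetical job script name, and the SBATCH variable is an assumption added here so the loop can be tried without a Slurm installation. The log path follows the location mentioned in the steps further down.

```shell
# Sketch only: not the actual contents of run.sh.
# align_job.sh is a hypothetical per-directory job script.
queue_dirs() {
    local dir
    for dir in "$@"; do
        # One sbatch call per directory, so each directory becomes its own job.
        # Slurm replaces %j in the --output pattern with the job ID.
        ${SBATCH:-sbatch} --output="$HOME/clst-asr-fa/slurm-logs/slurm-%j.out" \
            align_job.sh "$dir"
    done
}

# Dry run without Slurm: print the commands instead of submitting them.
# SBATCH="echo sbatch" queue_dirs /absolute/path/one /absolute/path/two
```

Because every directory gets its own job, Slurm can schedule the jobs independently and run them side by side when resources allow.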

Acoustic model details: trained on all SDC components, with 3-fold speed perturbation applied. This amounts to 1170 hours of speech data in total. The models are trained on the default high-resolution MFCCs (40 bins). Training parameters: initial_effective_lrate=0.0015, final_effective_lrate=0.00015.

Input: one or more directories to which you have write access, containing wav files and the corresponding transcription files in Praat TextGrid format. Either the short or the long TextGrid format is accepted. Each transcription file must have the same name as its wav file, with a .tg extension. Note: the forced alignment script contains a subscript that can convert the wav files to the required 16 kHz mono format using sox. You can configure this in the forced alignment script's config file (see steps 2 and 3 below).
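Before queuing a directory, it can help to check that every wav file has its .tg counterpart. The helper below is a minimal sketch and not part of the aligner; the function names are made up for illustration. The second function shows one common way to produce 16 kHz mono audio with sox by hand; whether the aligner's own subscript uses exactly these options is an assumption.

```shell
# Hypothetical helper (not part of the aligner): print every wav file in a
# directory that has no transcription file with the same name and a .tg extension.
missing_transcriptions() {
    local dir="$1" wav
    for wav in "$dir"/*.wav; do
        # Skip the unexpanded pattern when the directory holds no wav files.
        [ -e "$wav" ] || continue
        # recording.wav is expected to be paired with recording.tg
        [ -f "${wav%.wav}.tg" ] || printf '%s\n' "$wav"
    done
}

# Manual conversion to 16 kHz mono with sox (requires sox to be installed).
# Assumption: the aligner's subscript may use different options.
to_16k_mono() {
    sox "$1" -r 16000 -c 1 "$2"
}

# Example calls; the paths are placeholders:
# missing_transcriptions /absolute/path/to/recordings
# to_16k_mono original.wav converted.wav
```

An empty result from missing_transcriptions means every recording in the directory has a transcription file next to it.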

Output: alignments in Praat TextGrid format (*_aligned.TextGrid) at word and phone level. These files also contain a tier with the transcription (and possibly other tiers from the transcription TextGrids). By default, the alignment files are stored in the same directory as the wav and transcription files.

How to run the script:

1. Log in to one of the ponies (do not use the old applejack, because it has an older CUDA version!).
2. Run from your Home directory: /vol/tensusers/mganzeboom/clst-asr-forced-aligner/run.sh <absolute-path-to-directory-with-recordings>. On this first run, the script creates the directory ~/clst-asr-fa in your Home directory and copies the default config and lexicon files into it.
3. Open ~/clst-asr-fa/align_config.rc with your favourite editor and change the configuration settings to your liking (the defaults work well in most cases).
4. Run the command from step 2 again. This time a job is added to the Slurm queue manager, which starts the forced alignment of the provided directory. The logs of this job can be found in ~/clst-asr-fa/slurm-logs/slurm-<job-id>.out. Provide multiple input directories at the command line to queue multiple jobs at once.
5. When all Slurm jobs have completed, the forced alignment logs can be found at <absolute-path-to-directory-with-recordings>/logs.
6. Some words from the transcriptions may not be in the lexicon provided with the acoustic models. Run the script /vol/tensusers/mganzeboom/clst-asr-forced-aligner/list-missing-words.sh <absolute-path-to-directory-with-recordings> to print a list of these words and their corresponding transcription files to the command line. You can then add a phonemic transcription of these words to your custom lexicon file in ~/clst-asr-fa/lexicon.txt. It is recommended to base these new transcriptions on parts of existing ones. Afterwards, rerun the script from step 4.
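Since run.sh expects absolute paths to directories you can write to, a small pre-check can catch mistakes before anything is queued. The following is a sketch under those assumptions; check_dirs is a hypothetical helper and not part of the aligner, and the example paths are placeholders.

```shell
# Path to run.sh as given in the steps above.
ALIGNER=/vol/tensusers/mganzeboom/clst-asr-forced-aligner/run.sh

# Hypothetical helper: succeed only if every argument is an absolute path
# to an existing directory that the current user can write to.
check_dirs() {
    local d
    for d in "$@"; do
        case "$d" in
            /*) ;;  # absolute path, continue with the next check
            *) echo "not an absolute path: $d" >&2; return 1 ;;
        esac
        if [ ! -d "$d" ] || [ ! -w "$d" ]; then
            echo "not a writable directory: $d" >&2
            return 1
        fi
    done
}

# Queue several directories at once (placeholder paths):
# check_dirs /data/session1 /data/session2 &&
#     "$ALIGNER" /data/session1 /data/session2
```

The check stops at the first problematic argument, so nothing is submitted unless all directories pass.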
