fa-evaluation

**Montreal Forced Aligner (MFA)**: [[https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner]]\\
"MFA is quite user-friendly and easy to use. There are several options, you can use the pre-trained models or you can train the models on your own data/speakers. Attached you will find some examples of why I am not too happy with MFA (especially with the pretrained models), but I am not quite sure how to get a better system. It is especially hard because some of the speakers reduce and the dictionary is not equipped to handle reduction. I have not formally evaluated MFA yet, so I cannot make a general statement about its performance. However looking at just a few files, it seems like that training on the speaker yields better results." ([[https://www.ru.nl/english/people/marcoux-k/|Katherine Marcoux]])\\
By default MFA trains the acoustic models to the stage of speaker-adapted triphones (HMM-GMM). The pretrained models on their website also seem to be HMM-GMM. I assume Katherine uses these pretrained models and trains speaker-adapted models using the pretrained models as a basis?\\
They do have code to also train acoustic models based on DNNs (Kaldi's NNet2), but they say:
"The DNN framework for the Montreal Forced aligner is operational, but may not give a better result than the alignments produced by the standard HMM-GMM pipeline. Preliminary experiments suggest that results may improve when the DNN model used to produce alignments is pre-trained on a corpus similar in quality (conversational vs. clean speech) and longer in length than the test corpus." --> https://montreal-forced-aligner.readthedocs.io/en/latest/alignment_techniques.html#deep-neural-networks-dnns ([[https://www.ru.nl/english/people/ganzeboom-m/|Mario Ganzeboom]])\\
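As a rough illustration of the two options discussed above (aligning with a pretrained model vs. training on your own data/speakers), the MFA 1.x command line looks approximately like this. All paths are placeholders, not taken from this page; check the MFA documentation for the exact invocation on your version:

```shell
# Sketch only -- paths and model names are placeholders.

# Option 1: align a corpus using a pretrained acoustic model
# (downloaded from the MFA website), as Katherine describes.
mfa_align /path/to/corpus /path/to/lexicon.dict /path/to/pretrained_model.zip /path/to/output

# Option 2: train acoustic models (through to speaker-adapted
# triphones, HMM-GMM) on your own data and align in one run.
mfa_train_and_align /path/to/corpus /path/to/lexicon.dict /path/to/output
```

Note that using Option 2 on the target speaker's own recordings is what appeared to yield the better alignments in Katherine's informal comparison.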
  
===== Kaldi vs. HTK =====
fa-evaluation.1543316399.txt.gz · Last modified: 2018/11/27 11:59 by mganzeboom