
language_modeling

language_modeling [2015/03/31 16:40] (current) by mganzeboom; previous revision [2015/03/31 16:30] by mganzeboom
==== Commands to create a simple unsmoothed bigram text model ====
Also see this short, practical tutorial from a Linguistics course at UC San Diego [[http://idiom.ucsd.edu/~rlevy/teaching/2015winter/lign165/lectures/lecture13/lecture13_ngrams_with_SRILM.pdf|here]] for some notes on smoothed and unsmoothed models.\\
When your vocabulary and corpus text files are ready, the following command from the SRILM toolkit creates a bigram language model and stores it in the ARPA backoff N-gram text format. This text file can then be converted to the SPRAAK binary format with the [[http://www.spraak.org/documentation/doxygen/doc/html/spr__lm__arpabo_8c.html|spr_lm_arpabo]] utility.
  
''ngram-count -vocab <path_to_vocab_file> -text <path_to_corpus_file> -order <max_length_n-grams_(2_for_bigrams)> -addsmooth <0-9_add_smoothing_of_lm_0_for_none> -lm <path_to_store_n-gram_model_in_n-gram_text_format>''
  
For an explanation of this command and its options, please refer to the tutorial above or the [[http://www.speech.sri.com/projects/srilm/manpages/ngram-count.1.html|man page]].
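To make the ''-addsmooth'' option more concrete, here is a minimal Python sketch of the textbook add-D bigram estimate P(w2 | w1) = (c(w1 w2) + D) / (c(w1) + D·|V|). In this formula, |V| is the number of words that can be predicted (the vocabulary plus ''</s>''). Setting D = 0 gives the unsmoothed maximum-likelihood model from the heading.

The sketch makes several assumptions. It uses whitespace-tokenized sentences and pads them with the ''<s>'' and ''</s>'' sentence markers that SRILM uses. It also simply drops bigrams that contain out-of-vocabulary words. The function name ''bigram_probs'' is hypothetical.

This is not SRILM's implementation. ''ngram-count'' also writes unigram entries and backoff weights to the ARPA file, so its numbers need not match this sketch exactly.

```python
from collections import Counter


def bigram_probs(sentences, vocab, add_d=0.0):
    """Estimate P(w2 | w1) with add-D smoothing; add_d=0 gives the unsmoothed MLE.

    Hypothetical helper for illustration; not SRILM's algorithm.
    """
    words = set(vocab)
    histories = words | {"<s>"}  # tokens that can start a bigram
    targets = words | {"</s>"}   # tokens that can be predicted (|V| in the formula)
    pair_counts = Counter()
    history_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for w1, w2 in zip(tokens, tokens[1:]):
            # Simplifying assumption: skip bigrams with out-of-vocabulary words.
            if w1 in histories and w2 in targets:
                pair_counts[(w1, w2)] += 1
                history_counts[w1] += 1
    probs = {}
    for w1 in sorted(histories):
        denom = history_counts[w1] + add_d * len(targets)
        if denom == 0:
            continue  # unseen history without smoothing: no estimate
        for w2 in sorted(targets):
            num = pair_counts[(w1, w2)] + add_d
            if num > 0:  # unsmoothed models leave unseen bigrams out entirely
                probs[(w1, w2)] = num / denom
    return probs


if __name__ == "__main__":
    corpus = ["the cat sat", "the dog sat"]
    vocab = ["the", "cat", "dog", "sat"]
    mle = bigram_probs(corpus, vocab)             # comparable to -addsmooth 0
    smoothed = bigram_probs(corpus, vocab, 1.0)   # comparable to -addsmooth 1
    print(mle[("the", "cat")])       # 0.5
    print(smoothed[("the", "cat")])  # (1 + 1) / (2 + 1 * 5) = 2/7
```

With this toy corpus, the unsmoothed estimate P(cat | the) is 1/2. Add-1 smoothing lowers it to 2/7, because each of the five possible successors of ''the'' receives one extra count.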

To convert this text format to the binary format required by the SPRAAK ASR toolkit, run the following command (assuming the toolkit binaries are on your PATH):\\
''spr_lm_arpabo -i <path_to_sri-lm_in_text_format> -o <path_to_store_SPRAAK_binary_lm>''
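The ARPA backoff file that ''ngram-count -lm'' writes is plain text with the following parts:
  * a ''\data\'' header listing how many n-grams exist per order;
  * one ''\N-grams:'' section per order, where each line holds a log10 probability, the n-gram and an optional log10 backoff weight;
  * a closing ''\end\'' marker.

Before passing the file to ''spr_lm_arpabo'', it can help to check that the file is complete. Below is a minimal Python sketch that parses this layout and compares each section's size with the header. The function names ''parse_arpa'' and ''read_arpa'' are hypothetical, and UTF-8 encoding is an assumption. The sketch does not reproduce whatever checks ''spr_lm_arpabo'' itself performs.

```python
def parse_arpa(lines):
    """Parse ARPA backoff LM text into {order: {ngram: (log10_prob, log10_backoff)}}.

    log10_backoff is None when an entry has no backoff weight.
    Raises ValueError if a section's size disagrees with the data header.
    """
    declared = {}   # n-gram order -> count announced in the \data\ header
    model = {}
    section = None  # None before \data\, "data" inside the header, else the order
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line == "\\data\\":
            section = "data"
        elif line == "\\end\\":
            break
        elif line.startswith("\\") and line.endswith("-grams:"):
            section = int(line[1:line.index("-")])  # "\2-grams:" -> 2
            model[section] = {}
        elif section == "data" and line.startswith("ngram "):
            order, count = line[len("ngram "):].split("=")  # "ngram 2=5"
            declared[int(order)] = int(count)
        elif isinstance(section, int):
            fields = line.split()
            words = tuple(fields[1:1 + section])
            backoff = float(fields[1 + section]) if len(fields) > 1 + section else None
            model[section][words] = (float(fields[0]), backoff)
    for order, count in declared.items():
        found = len(model.get(order, {}))
        if found != count:
            raise ValueError(f"{order}-grams: header declares {count}, found {found}")
    return model


def read_arpa(path):
    """Read an ARPA file such as the one written by ngram-count -lm (UTF-8 assumed)."""
    with open(path, encoding="utf-8") as f:
        return parse_arpa(f)
```

Because ARPA probabilities are stored as log10 values, ''10 ** logprob'' recovers the probability itself.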
  
language_modeling.1427812223.txt.gz · Last modified: 2015/03/31 16:30 by mganzeboom