AI Research Review - Spelling and ASR
The paper "Towards Contextual Spelling Correction for Customization of End-to-end Speech Recognition Systems" proposes a general ASR biasing solution that is domain-insensitive and can be adopted in various scenarios. A Seq2Seq model corrects the spelling of rare words or proper nouns by considering both ASR hypotheses and external context words/phrases. Combining Shallow Fusion with Contextual Spelling Correction reduces Word Error Rate (WER). The model is efficient for large context phrase lists during training and inference. It works well on high OOV rate test sets, indicating that it learns error patterns at the subword level rather than word-level. ASR biasing post-processing can improve proper noun detection in end-to-end ASR compared to encoder biasing methods like Contextual RNN-T or CLAS. Non-AutoRegressive (NAR) models are faster for inference, speeding it up by 2.1 times compared to AutoRegressive solutions.
Company
AssemblyAI
Date published
Sept. 8, 2022
Author(s)
Taufiquzzaman Peyash
Word count
215
Hacker News points
None found.
Language
English