AI Research Review - Spelling and ASR
The paper "Towards Contextual Spelling Correction for Customization of End-to-end Speech Recognition Systems" proposes a general ASR biasing solution that is domain-insensitive and can be adopted in various scenarios. A Seq2Seq model corrects the spelling of rare words or proper nouns by considering both ASR hypotheses and external context words/phrases. Combining Shallow Fusion with Contextual Spelling Correction reduces Word Error Rate (WER). The model is efficient for large context phrase lists during training and inference. It works well on high OOV rate test sets, indicating that it learns error patterns at the subword level rather than word-level. ASR biasing post-processing can improve proper noun detection in end-to-end ASR compared to encoder biasing methods like Contextual RNN-T or CLAS. Non-AutoRegressive (NAR) models are faster for inference, speeding it up by 2.1 times compared to AutoRegressive solutions.
Company
AssemblyAI
Date published
Sept. 8, 2022
Author(s)
Taufiquzzaman Peyash
Word count
215
Language
English
Hacker News points
None found.