/plushcap/analysis/assemblyai/ai-research-review-spelling-and-asr

AI Research Review - Spelling and ASR

What's this blog post about?

The paper "Towards Contextual Spelling Correction for Customization of End-to-end Speech Recognition Systems" proposes a general ASR biasing solution that is domain-insensitive and can be adopted in various scenarios. A Seq2Seq model corrects the spelling of rare words or proper nouns by considering both ASR hypotheses and external context words/phrases. Combining Shallow Fusion with Contextual Spelling Correction reduces Word Error Rate (WER). The model is efficient for large context phrase lists during training and inference. It works well on high OOV rate test sets, indicating that it learns error patterns at the subword level rather than word-level. ASR biasing post-processing can improve proper noun detection in end-to-end ASR compared to encoder biasing methods like Contextual RNN-T or CLAS. Non-AutoRegressive (NAR) models are faster for inference, speeding it up by 2.1 times compared to AutoRegressive solutions.

Company
AssemblyAI

Date published
Sept. 8, 2022

Author(s)
Taufiquzzaman Peyash

Word count
215

Language
English

Hacker News points
None found.


By Matt Makai. 2021-2024.