Releasing our v8 Transcription Model - 18.72% Better Accuracy

Company

AssemblyAI

Date Published

Oct. 19, 2021

Author

Dylan Fox

Word count

787

Language

English

Hacker News points

None

URL

www.assemblyai.com/blog/releasing-our-v8-transcription-model-major-accuracy-improvements

Summary

AssemblyAI has released version 8 (v8) of its Speech Recognition model, its most accurate to date. v8 delivers significant accuracy improvements across many types of audio and video data, including a major improvement in proper noun recognition. The company's research team, made up of AI researchers and engineers from leading technology companies, continuously researches and improves the models that power its Speech-to-Text API and related features such as Topic Detection. By the end of 2022, AssemblyAI aims to build speech recognition models that approach human-level accuracy on challenging audio and video files with heavy accents and background noise. The v8 improvements include better use of Transformers, Convolutional Neural Network layers interleaved between Transformer layers, improved regularization via Layer Norm, a jointly trained Language Model, and predicting word pieces rather than individual characters.
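To illustrate what predicting word pieces instead of characters means, here is a minimal sketch. The summary does not say which subword algorithm or vocabulary v8 uses. This example assumes BERT-style WordPiece splitting (greedy longest-match-first, with "##" marking non-initial pieces), and the function `wordpiece_tokenize` and vocabulary `VOCAB` are hypothetical, chosen only for illustration.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into word pieces.

    Assumption: BERT-style convention where non-initial pieces carry "##".
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking until a vocab hit.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece matches, so the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, for illustration only.
VOCAB = {"trans", "##cription", "speech", "model", "##s"}

word = "transcription"
print(len(list(word)))                    # 13 character-level prediction steps
print(wordpiece_tokenize(word, VOCAB))    # ['trans', '##cription']: 2 steps
```

A character-level model must emit 13 correct symbols in a row for "transcription", while a word-piece model emits 2 larger units. Because the output sequence is shorter and each unit carries more linguistic context, the model has fewer opportunities to make spelling errors.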