Company
Date Published
Author
Dylan Fox
Word count
787
Language
English
Hacker News points
None

Summary

AssemblyAI has released version 8 (v8) of its Speech Recognition model, its most accurate to date. The v8 model delivers significant accuracy improvements across many types of audio and video data and is markedly better at recognizing proper nouns. The company's research team, made up of AI researchers and engineers from leading technology companies, continually researches and improves the models that power its Speech-to-Text API and related features such as Topic Detection. By the end of 2022, AssemblyAI aims to develop speech recognition models that approach human-level accuracy on challenging audio and video files with heavy accents and background noise. The v8 improvements come from enhanced use of Transformers, Convolutional Neural Network layers interleaved between Transformer layers, improved regularization via Layer Norm, a jointly trained Language Model, and predicting word pieces instead of individual characters.
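
To make the last point concrete, here is a minimal sketch of why word pieces shorten the prediction sequence compared with individual characters. It is not AssemblyAI's tokenizer: the greedy longest-match split, the `##` continuation marker (borrowed from the common WordPiece convention), the `wordpiece_tokenize` helper, and the toy vocabulary are all assumptions made for illustration.

```python
def wordpiece_tokenize(word, vocab):
    """Greedy longest-match-first split of `word` into pieces from `vocab`.

    Pieces that continue a word carry a "##" prefix (an assumed convention).
    Returns ["[UNK]"] if some part of the word cannot be matched.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return ["[UNK]"]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary; a real one would be learned from training text.
VOCAB = {"trans", "##form", "##er", "##s", "re", "##cogn", "##ition"}

word = "transformers"
char_targets = list(word)                        # one prediction per character
piece_targets = wordpiece_tokenize(word, VOCAB)  # one prediction per word piece

print(len(char_targets), char_targets)
print(len(piece_targets), piece_targets)
```

With this toy vocabulary the model would need to emit 4 word pieces rather than 12 characters for the same word, so each prediction carries more linguistic context.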