Deep Learning Paper Recaps - Modality Matching and Masked Autoencoders

What's this blog post about?

This week's recaps cover two Deep Learning papers: MAESTRO and Masked Autoencoders that Listen.

The first paper proposes a method for learning unified representations from the speech and text modalities, outperforming the previous state of the art on ASR tasks. Key findings include the incorporation of lexical information from text-only inputs, improved performance in both monolingual and multilingual setups, and efficient representation unification with minimal supervised data.

The second paper presents a novel extension of masked autoencoders to audio. The model splits mel spectrograms into patches, masks most of them, and reconstructs the masked patches with an encoder-decoder architecture. Key findings include the method's applicability to temporal data such as audio and video, very high masking ratios yielding models that are more robust in both quality and bias evaluations, and local attention outperforming global attention in the speech domain.
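For intuition, here is a minimal PyTorch sketch of the patch-and-mask step described above. The function name, patch size, and masking ratio are illustrative assumptions, not details from the recap; the real model additionally embeds the visible patches and adds positional information before encoding.

```python
import torch

def mask_mel_patches(mel: torch.Tensor, patch_size: int = 16, mask_ratio: float = 0.8):
    """Split a mel spectrogram into square patches and randomly mask most of them.

    `patch_size` and `mask_ratio` are illustrative assumptions; the recap only
    states that most patches are masked, not the exact values.
    """
    freq, time = mel.shape  # both assumed divisible by patch_size
    # Cut the spectrogram into non-overlapping patch_size x patch_size tiles,
    # then flatten each tile into a vector: (num_patches, patch_size**2).
    patches = (
        mel.reshape(freq // patch_size, patch_size, time // patch_size, patch_size)
        .permute(0, 2, 1, 3)
        .reshape(-1, patch_size * patch_size)
    )
    num_patches = patches.shape[0]
    num_visible = int(num_patches * (1 - mask_ratio))

    # Shuffle patch indices and keep only a small visible subset; the encoder
    # sees the visible patches, and the decoder reconstructs the masked ones.
    shuffled = torch.randperm(num_patches)
    visible_idx = shuffled[:num_visible]
    masked_idx = shuffled[num_visible:]
    return patches[visible_idx], visible_idx, masked_idx


# Example: a synthetic 128-mel x 512-frame spectrogram yields 256 patches,
# of which 51 remain visible at an 80% mask ratio.
mel = torch.randn(128, 512)
visible, visible_idx, masked_idx = mask_mel_patches(mel)
print(visible.shape)  # torch.Size([51, 256])
```

A practical consequence of this design, as in masked autoencoders generally, is that the encoder only processes the small visible subset of patches, which keeps pretraining on long spectrograms computationally cheap.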

Company
AssemblyAI

Date published
July 27, 2022

Author(s)
Luka Chkhetiani, Ruben Bousbib

Word count
332

Language
English

Hacker News points
None found.


By Matt Makai. 2021-2024.