Deep Learning Paper Recap - Redundancy Reduction and Sparse MoEs
"Barlow Twins" introduces a novel self-supervised learning (SSL) solution that doesn't require negative instances. Unlike most SSL algorithms based on contrastive learning, Barlow Twins avoids collapse by measuring the cross correlation matrix between outputs of two identical networks fed with distorted versions of a sample, aiming to make it as close to the identity matrix as possible. Additionally, batch normalization of features prior to the Barlow Twins loss is crucial for preventing collapse. This technique has shown competitive performance compared to state-of-the-art contrastive methods like SimCLR. In "Sparse MoEs Meet Efficient Ensembles," the paper explores using sparse Mixtures of Experts (MoE) and model ensembles together. MoE are neural networks that use dynamic routing at the token level to execute subgraphs, allowing for a larger parameter count than dense counterparts while maintaining the same compute requirements. The results show that sparse MoEs and static ensembles can have complementary features and benefits, providing higher accuracy, more robustness, and better calibration when used together. This suggests that even as the number of experts in an MoE increases, there is still additional value added by incorporating more models into a traditional model ensemble.
Company
AssemblyAI
Date published
Aug. 17, 2022
Author(s)
Domenic Donato, Kevin Zhang
Word count
470
Language
English
Hacker News points
None found.