Introducing Our New Punctuation Restoration and Truecasing Models
AssemblyAI has introduced new Punctuation Restoration and Truecasing models that outperform its previous production models across a range of data and metrics. The new models handle casing markedly better for challenging word types: mixed-case words (+39% F1 score), acronyms (+20% F1 score), and capital-case words (+11% F1 score). Averaged across test datasets, upper-case letter classification improves by 17% in relative terms, and punctuation accuracy improves by 11% (F1 score). The new models are already in production, so API users benefit from the upgrade automatically.
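To make the reported metrics concrete, here is a minimal sketch of how a per-class casing F1 score and a relative improvement between two models could be computed. It is not AssemblyAI's evaluation code: the function names (`casing_class`, `class_f1`, `relative_improvement`), the rules for assigning a word to a casing class, and the example words are all assumptions made for illustration, and it assumes reference and predicted words are already aligned one-to-one.

```python
def casing_class(word: str) -> str:
    """Assign a word to a casing class (hypothetical rules, for illustration)."""
    letters = [c for c in word if c.isalpha()]
    if not letters or all(c.islower() for c in letters):
        return "lower"
    if all(c.isupper() for c in letters):
        # A single upper-case letter such as "I" is treated as capital-case.
        return "acronym" if len(letters) > 1 else "capital"
    if letters[0].isupper() and all(c.islower() for c in letters[1:]):
        return "capital"
    return "mixed"


def class_f1(reference: list[str], predicted: list[str], target: str) -> float:
    """F1 score for one casing class over aligned reference/predicted words."""
    tp = fp = fn = 0
    for ref_word, pred_word in zip(reference, predicted):
        ref_is_target = casing_class(ref_word) == target
        pred_is_target = casing_class(pred_word) == target
        if ref_is_target and pred_is_target:
            tp += 1
        elif pred_is_target:
            fp += 1
        elif ref_is_target:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def relative_improvement(old_score: float, new_score: float) -> float:
    """Relative change of new_score over old_score, e.g. 0.2 means +20%."""
    return (new_score - old_score) / old_score


# Illustrative words only; these are not taken from the test datasets.
reference = ["Paris", "NASA", "iPhone", "hello"]
predicted = ["Paris", "Nasa", "iPhone", "hello"]

for name in ("capital", "acronym", "mixed"):
    print(name, round(class_f1(reference, predicted, name), 3))
```

Under these assumptions, a relative improvement is the change in score divided by the old score, so moving from an F1 of 0.50 to 0.60 would count as +20% relative even though the absolute gain is 0.10.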
Company
AssemblyAI
Date published
Nov. 8, 2023
Author(s)
Marco Ramponi
Word count
1759
Hacker News points
None found.
Language
English