Company
Date Published
Jan. 13, 2025
Author
Dane Corneil
Word count
1073
Language
English
Hacker News points
None

Summary

The authors use inter-model variability to evaluate how well a synthetic math dataset aligns with a downstream task, such as solving math problems on a real benchmark. Using the GSM8K-Synthetic dataset, they measure the correlation between performance on the synthetic task and performance on the downstream task, and find a strong logarithmic relationship between the two. The correlation is strongest with downstream GSM8K performance, followed closely by MMLU, which suggests that the synthetic dataset taps into the same math reasoning capabilities that the real benchmark requires. The approach can serve as a sanity check that a synthetic dataset engages the same skills as the target task. In future work, the authors plan to explore using these signals to improve the quality of the trained model.
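
As a rough illustration of this kind of sanity check, the sketch below correlates per-model scores on a synthetic task with scores on a downstream benchmark after a log transform. It is not the authors' code. The function names, the choice to log-transform the synthetic score rather than the downstream one, and the example numbers are all assumptions made for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def log_alignment(synthetic_scores, downstream_scores):
    """Fit downstream = intercept + slope * log(synthetic) across models.

    Returns (r, slope, intercept), where r is the Pearson correlation
    between log(synthetic) and downstream. Synthetic scores must be > 0.
    Assumption: the log is applied to the synthetic score; the source
    only states that the relationship is logarithmic.
    """
    log_x = [math.log(s) for s in synthetic_scores]
    r = pearson(log_x, downstream_scores)
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(downstream_scores) / n
    sxx = sum((x - mean_x) ** 2 for x in log_x)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log_x, downstream_scores))
    slope = sxy / sxx  # ordinary least-squares slope
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


# Hypothetical per-model scores, invented for illustration only.
synthetic = [0.05, 0.12, 0.30, 0.55, 0.80]
downstream = [0.10, 0.28, 0.47, 0.60, 0.68]
r, slope, intercept = log_alignment(synthetic, downstream)
print(f"r = {r:.3f}, slope = {slope:.3f}, intercept = {intercept:.3f}")
```

Running the same function against several downstream benchmarks and comparing the resulting correlations would be one way to check which benchmark the synthetic task tracks most closely.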