Company
Date Published
Author
-
Word count
2823
Language
English
Hacker News points
None

Summary

The study examines how a single ReAct agent architecture performs as it is given more domains, tools, and context. The results show that both more context and more tools degrade the agent's performance, and that agents requiring longer trajectories degrade more quickly. The top-performing models are o1, o3-mini, and claude-3.5-sonnet, while gpt-4o and llama-3.3-70B perform poorly. Adding irrelevant domains causes a sharp drop in performance for o3-mini, but a much smaller one for claude-3.5-sonnet. The study also finds that agents given more context tend to forget niche instructions, which leads to task failures. The authors plan to explore multi-agent architectures and cross-domain tasks to further test the limits of single-agent architectures.