Company
Date Published
Sept. 12, 2024
Author
Bryce Dubayah, Philip Kiely
Word count
604
Language
English

Summary

The TensorRT-LLM Engine Builder now supports structured output during LLM inference. This comes in two forms: JSON mode, where model output is guaranteed to match a provided JSON schema, and function calling, where the LLM selects from a set of provided tools to accomplish a task. Both capabilities add negligible latency, with no marginal impact on tokens per second, and are available for every LLM deployed with the Engine Builder. They address a long-standing challenge in integrating LLMs with structured data: developers can call an LLM and rely on the structure of its output. A sketch of what such requests might look like follows below.
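
To make the two capabilities concrete, here is a minimal sketch of request payloads for JSON mode and function calling. It assumes an OpenAI-style chat completions endpoint with `response_format` and `tools` fields and uses a pydantic model to produce the JSON schema; the endpoint URL, API key header, and exact field names are placeholders for illustration, not the Engine Builder's documented API.

```python
# Sketch of structured-output requests against a deployed LLM endpoint.
# The URL, auth header, and payload field names are assumptions modeled on
# OpenAI-style APIs; consult your deployment's docs for the actual shape.
import requests
from pydantic import BaseModel


class Person(BaseModel):
    """Target structure for JSON mode: output must match this schema."""
    name: str
    age: int


# JSON mode: constrain generation so the output is valid JSON matching the schema.
json_mode_payload = {
    "messages": [{"role": "user", "content": "Extract: Ada Lovelace, age 36."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {"name": "person", "schema": Person.model_json_schema()},
    },
    "max_tokens": 256,
}

# Function calling: describe available tools and let the model choose one
# and emit structured arguments for it.
function_calling_payload = {
    "messages": [{"role": "user", "content": "What's the weather in Berlin?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "max_tokens": 256,
}

if __name__ == "__main__":
    # MODEL_URL and API_KEY are placeholders for your own deployment.
    MODEL_URL = "https://example.com/v1/chat/completions"
    API_KEY = "YOUR_API_KEY"
    resp = requests.post(
        MODEL_URL,
        headers={"Authorization": f"Api-Key {API_KEY}"},
        json=json_mode_payload,
        timeout=60,
    )
    print(resp.json())
```

Swapping `json_mode_payload` for `function_calling_payload` in the request would exercise tool selection instead; in both cases the structure of the response is enforced at generation time rather than validated after the fact.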