Introducing function calling and structured output for open-source and fine-tuned LLMs
The TensorRT-LLM Engine Builder now supports structured output during LLM inference. The feature has two parts: JSON mode, where the model's output conforms to a supplied JSON schema, and function calling, where the LLM selects from a set of provided tools to accomplish a task. Both are available for all LLMs deployed using the Engine Builder and have no marginal impact on tokens per second. Together they address the difficulty of integrating LLMs with structured data: developers can call an LLM with a guaranteed output structure at negligible added latency.
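To make the two modes concrete, here is a minimal Python sketch of what a caller does with each kind of output. It does not call the Engine Builder or any model: the model responses are hard-coded strings, and the schema, the `get_weather` tool, and the `{"name": ..., "arguments": ...}` tool-call shape are all hypothetical choices made for illustration rather than the product's actual API.

```python
import json

# Hypothetical schema for JSON mode (illustrative, not from the article).
PERSON_SCHEMA = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

# Only the JSON Schema types used in this sketch are mapped.
TYPE_MAP = {"string": str, "integer": int}


def matches_schema(raw: str, schema: dict) -> bool:
    """Check a flat JSON object against a tiny subset of JSON Schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    if any(key not in data for key in schema.get("required", [])):
        return False
    for key, spec in schema.get("properties", {}).items():
        if key in data:
            value = data[key]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, TYPE_MAP[spec["type"]]):
                return False
    return True


def get_weather(city: str) -> str:
    """Hypothetical tool the model can choose to call."""
    return f"Weather lookup for {city}"


# Registry of tools offered to the model, keyed by name.
TOOLS = {"get_weather": get_weather}


def dispatch(tool_call_json: str) -> str:
    """Run the tool named in a (simulated) function-calling response."""
    call = json.loads(tool_call_json)
    return TOOLS[call["name"]](**call["arguments"])


# Simulated model outputs standing in for real inference responses.
json_mode_output = '{"name": "Ada", "age": 36}'
tool_call_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'

print(matches_schema(json_mode_output, PERSON_SCHEMA))
print(dispatch(tool_call_output))
```

With a guaranteed output structure, a check like `matches_schema` becomes a safeguard rather than a retry loop, and `dispatch` can parse the tool call without defensive string handling.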
Company
Baseten
Date published
Sept. 12, 2024
Author(s)
Bryce Dubayah, Philip Kiely
Word count
604
Language
English
Hacker News points
None found.