
How to build function calling and JSON mode for open-source and fine-tuned LLMs

What's this blog post about?

Baseten has announced support for function calling and structured output for LLMs deployed with its TensorRT-LLM Engine Builder, adding model-server-level support for two key features. Function calling lets users pass a set of defined tools to an LLM as part of the request body, while structured output enforces an output schema defined as part of the LLM input. Both features are built into Baseten's customized version of the Triton inference server and use logit biasing to ensure that only valid tokens are generated during inference. The implementation adds minimal latency overhead after the first call with a given schema, allowing efficient use of both features.
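As a rough illustration of the two features the post covers, here is a sketch of what a request body with both a tool definition (function calling) and an enforced output schema (structured output) might look like. The field names below follow the common OpenAI-style convention and are assumptions for illustration; they may differ from the actual Engine Builder API described in the post.

```python
import json

# Hypothetical request body for an LLM served behind a TensorRT-LLM
# Engine Builder deployment. Field names ("tools", "response_format")
# are illustrative assumptions, not confirmed API details.
payload = {
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    # Function calling: a set of defined tools passed in the request body.
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    # Structured output: a JSON schema the model's reply must conform to;
    # the server enforces this via logit biasing during decoding.
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "type": "object",
            "properties": {"answer": {"type": "string"}},
            "required": ["answer"],
        },
    },
}

print(json.dumps(payload, indent=2))
```

With logit biasing, the server masks out tokens that would violate the schema at each decoding step, which is why only the first call with a new schema pays a noticeable latency cost (compiling the constraint), while subsequent calls reuse it.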

Company
Baseten

Date published
Sept. 12, 2024

Author(s)
Bryce Dubayah, Philip Kiely

Word count
1339

Language
English

Hacker News points
1
