Company
Date Published
Author
Abu Qader, Philip Kiely
Word count
939
Language
English
Hacker News points
2

Summary

The TensorRT-LLM Engine Builder automates building optimized model serving engines for open-source and fine-tuned large language models (LLMs), cutting a process that previously took hours of manual work down to minutes. It uses the TensorRT-LLM performance optimization toolbox to produce inference servers with low latency and high throughput, and it supports more than 50 LLMs and similar models. The Engine Builder is built into Truss, an open-source model packaging framework, and gives users full control over the model server, including autoscaling, logging, and metrics, along with secure and compliant inference. Depending on the user's goals, such as supporting concurrent requests or minimizing latency, it can build inference engines optimized for latency, throughput, cost, or a balance of the three.
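
The goal-driven choice described in the summary can be pictured as selecting a preset before building an engine. The following is a minimal sketch only: `BuildProfile`, `PROFILES`, `pick_profile`, the field names, and every number are hypothetical and chosen for illustration. They are not the Engine Builder's or Truss's actual configuration schema, and the values are not benchmarks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildProfile:
    """Hypothetical build settings; not the Engine Builder's real schema."""

    max_batch_size: int  # upper bound on requests batched together
    quantized: bool      # trade some precision for speed and lower cost


# Illustrative presets only: the numbers are placeholders, not measurements.
PROFILES = {
    "latency": BuildProfile(max_batch_size=8, quantized=False),
    "throughput": BuildProfile(max_batch_size=128, quantized=False),
    "cost": BuildProfile(max_batch_size=128, quantized=True),
    "balanced": BuildProfile(max_batch_size=32, quantized=False),
}


def pick_profile(goal: str) -> BuildProfile:
    """Return the preset for a goal, rejecting unknown goals explicitly."""
    try:
        return PROFILES[goal]
    except KeyError:
        raise ValueError(
            f"unknown goal {goal!r}; expected one of {sorted(PROFILES)}"
        ) from None


if __name__ == "__main__":
    for goal in PROFILES:
        print(goal, pick_profile(goal))
```

In this sketch, smaller batches stand in for a latency-oriented build and larger batches for one that serves more concurrent requests. A real build would take its settings from the tool's own configuration rather than from presets like these.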