9 | Show HN: Baseten Chains – Framework and SDK for Multi-Model AI Products | 2024-06-27 |
2 | Open Source Inference Engine Baseten Raises $40M from IVP, Spark and Greylock | 2024-03-14 |
2 | How to double tokens per second for Llama 3 with Medusa | 2024-08-20 |
2 | Show HN: Automatically Build Nvidia TRT-LLM Engines | 2024-08-01 |
2 | FP8: Efficient model inference with 8-bit floating point numbers | 2024-03-08 |
1 | How to build function calling and JSON mode for open-source and fine-tuned LLMs | 2024-09-12 |
1 | Show HN: 60% higher tokens per second for 70B custom LLMs | 2024-07-31 |
1 | Introduction to quantizing machine learning models | 2024-02-16 |
1 | Deploying custom ComfyUI workflows as APIs | 2024-11-20 |