| Points | Title | Date |
|---:|---|---|
| 113 | A guide to open-source LLM inference and performance | 2023-11-20 |
| 51 | How we got Stable Diffusion XL inference to under 2 seconds | 2023-08-31 |
| 9 | Show HN: Baseten Chains – Framework and SDK for Multi-Model AI Products | 2024-06-27 |
| 3 | SDXL inference in under 2 seconds | 2023-08-31 |
| 2 | Open Source Inference Engine Baseten Raises $40M from IVP, Spark and Greylock | 2024-03-14 |
| 2 | Faster Mixtral inference with TensorRT-LLM and quantization | 2023-12-27 |
| 2 | How to double tokens per second for Llama 3 with Medusa | 2024-08-20 |
| 2 | Show HN: Automatically Build Nvidia TRT-LLM Engines | 2024-08-01 |
| 2 | FP8: Efficient model inference with 8-bit floating point numbers | 2024-03-08 |
| 1 | How to build function calling and JSON mode for open-source and fine-tuned LLMs | 2024-09-12 |
| 1 | Show HN: 60% higher tokens per second for 70B custom LLMs | 2024-07-31 |
| 1 | Introduction to quantizing machine learning models | 2024-02-16 |
| 1 | Three techniques to adapt LLMs for any use case | 2023-06-15 |
| 402 | Show HN: ChatLLaMA – A ChatGPT style chatbot for Facebook's LLaMA | 2023-03-22 |
| 16 | Show HN: Fine-tune generative models in 1 line of code | 2023-03-01 |
| 1 | Deploying custom ComfyUI workflows as APIs | 2024-11-20 |