Company
Date Published
Nov. 14, 2023
Author
Jesse Kipp, Celso Martinho
Word count
863
Language
English
Hacker News points
None

Summary

Cloudflare has launched Workers AI, a serverless GPU-powered inference platform that offers off-the-shelf models running alongside its existing Workers service. The platform lets developers build scalable AI applications quickly. New features include streaming responses for all Large Language Models (LLMs), larger context and sequence windows, and a full-precision Llama-2 model variant. Streamed text responses from LLMs are delivered as server-sent events, which browser APIs support. The higher precision and the longer context and sequence lengths are intended to improve the user experience and enable new applications built on large language models.
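
To illustrate how a client might consume a streamed text response, here is a minimal Python sketch of parsing a server-sent event stream. It rests on assumptions not stated in the summary: each event is taken to be a `data:` line carrying a JSON object with a `response` field, and the stream is taken to end with a `[DONE]` sentinel. The function name `iter_sse_tokens` and the sample stream are hypothetical, and the actual wire format should be checked against the Workers AI documentation.

```python
import json
from typing import Iterable, Iterator


def iter_sse_tokens(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from the lines of a server-sent event stream."""
    for raw in lines:
        line = raw.strip()
        # Skip blank event separators, comments (": ..."), and non-data fields.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        # Assumed end-of-stream sentinel; not confirmed by the summary.
        if payload == "[DONE]":
            break
        # Assumed payload shape: {"response": "<text fragment>"}.
        yield json.loads(payload).get("response", "")


# Hypothetical stream, standing in for lines read from an HTTP response body.
sample_stream = [
    'data: {"response": "Hello"}',
    "",
    'data: {"response": ", world"}',
    "",
    "data: [DONE]",
]
text = "".join(iter_sse_tokens(sample_stream))
```

Because the parser is a generator, each fragment can be shown to the user as soon as it arrives instead of waiting for the whole completion, which is the main benefit of streaming.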