Streaming and longer context lengths for LLMs on Workers AI
Cloudflare's Workers AI is a serverless, GPU-powered inference platform offering off-the-shelf models that integrate with the existing Workers developer platform, letting developers build powerful, scalable AI applications quickly. This update adds streaming responses for all Large Language Models (LLMs) on the platform, delivered as server-sent events so they can be consumed with standard browser APIs, along with larger context and sequence windows and a full-precision Llama-2 model variant. Together, higher precision and longer context and sequence lengths improve the user experience and enable new applications built on large language models.
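As a rough illustration of the server-sent events (SSE) streaming described above, the sketch below parses an SSE text chunk into response tokens. The `data:` line format, the JSON `response` field, and the `[DONE]` sentinel follow the common SSE convention for LLM streaming and are assumptions, not details confirmed by this summary.

```javascript
// Sketch: parse one chunk of a server-sent events (SSE) stream into
// LLM response tokens. Field names and the [DONE] sentinel are assumed
// from the common SSE convention for streaming LLM output.
function parseSSEChunk(chunk) {
  const tokens = [];
  for (const line of chunk.split("\n")) {
    if (!line.startsWith("data: ")) continue; // skip blank lines and comments
    const payload = line.slice("data: ".length).trim();
    if (payload === "[DONE]") break;          // end-of-stream sentinel
    tokens.push(JSON.parse(payload).response); // assumed token field name
  }
  return tokens;
}

// Example: two events followed by the terminator.
const chunk =
  'data: {"response":"Hello"}\n\n' +
  'data: {"response":" world"}\n\n' +
  'data: [DONE]\n';
console.log(parseSSEChunk(chunk).join("")); // "Hello world"
```

In a browser, the same parsing could be driven by `fetch` with a `ReadableStream` reader, or by the built-in `EventSource` API, appending each token to the page as it arrives.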
Company
Cloudflare
Date published
Nov. 14, 2023
Author(s)
Jesse Kipp, Celso Martinho
Word count
863
Language
English
Hacker News points
None found.