AI21 Labs Co-CEO Yoav Shoham discusses how the company built Jamba-Instruct, a foundation model with a 256K-token context window, to close the gap between claimed and effective context length. The model is designed to serve long-context workflows efficiently and offers a longer context window than most competing models. Key questions addressed include: does having a long context window mean the model actually does something useful with it; can long-context models be served with acceptable latency and unit economics; and does long context still matter in the era of RAG?