Scraping web pages is a useful way to fetch content for retrieval-augmented generation (RAG) applications, but extracting the content that matters can be challenging because pages are full of irrelevant material such as headers and footers. Mozilla's open-source library Readability.js extracts just the main content of a web page, so that retrieval returns higher-quality results. Used in a data pipeline, it lets developers strip out everything else and keep the main subject of the page, which makes it easier to build RAG-powered applications with high relevance and low latency. The library is battle-tested: it powers Firefox's reader mode, and it can be used directly or integrated into frameworks like LangChain.js for more complex applications.
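
As a rough illustration of where Readability.js could sit in such a pipeline, here is a minimal TypeScript sketch. The names `Article`, `Extractor`, `CleanDocument`, and `toCleanDocument` are hypothetical and exist only for this example. The extractor is passed in as a function so the cleaning step can be exercised without network access or a DOM; the commented wiring at the bottom assumes the `@mozilla/readability` and `jsdom` packages are installed.

```typescript
// Minimal shape of what an article extractor returns. Readability's parse()
// result has more fields; only the two used here are modeled.
interface Article {
  title: string | null | undefined;
  textContent: string | null | undefined;
}

// Anything that turns raw HTML into an Article, or null when no article is found.
type Extractor = (html: string, url: string) => Article | null;

// Hypothetical record handed to the next pipeline stage (chunking, embedding, ...).
interface CleanDocument {
  url: string;
  title: string;
  text: string;
}

// Hypothetical helper: run the extractor and normalize its output.
function toCleanDocument(
  html: string,
  url: string,
  extract: Extractor
): CleanDocument | null {
  const article = extract(html, url);
  // Skip pages where no main content could be identified.
  if (!article || !article.textContent) return null;
  return {
    url,
    title: (article.title ?? "").trim(),
    // Collapse whitespace runs so later chunking sees compact text.
    text: article.textContent.replace(/\s+/g, " ").trim(),
  };
}

// Wiring with the real libraries (assumes `npm install @mozilla/readability jsdom`):
//
//   import { Readability } from "@mozilla/readability";
//   import { JSDOM } from "jsdom";
//
//   const readabilityExtract: Extractor = (html, url) =>
//     new Readability(new JSDOM(html, { url }).window.document).parse();
//
//   const doc = toCleanDocument(html, url, readabilityExtract);
```

Keeping the extractor behind a small function type is one possible design choice: it makes it easy to swap Readability.js for another extractor, or for a stub in tests, without touching the rest of the pipeline.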