Retrieval-augmented generation (RAG) applications provide users with a natural-language interface by searching for information relevant to a query and passing it to a large language model (LLM), which generates a response grounded in that context. Context quality is crucial: without the right context, answers are far less useful. Building a RAG application is therefore fundamentally different from building a plain LLM interface, because it requires custom rules and constraints to keep responses accurate.

To enable search, documents are embedded as vectors. This requires choosing a chunk size and applying techniques such as a sliding-window strategy (overlapping chunks) so that each chunk captures the keywords, meaning, and synonyms needed for retrieval. Model settings also need attention when configuring LLMs for RAG: temperature controls how deterministic the model's output is, and an accuracy (relevance) threshold filters out weakly matching context.

Approximate nearest neighbor (ANN) search is used to find the embedded vectors most relevant to a query, returning relevance scores that are critical for deciding which context to pass to the model when generating a meaningful response.
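The sliding-window chunking mentioned above can be sketched as follows. This is a minimal illustration, not a production chunker; the function name and default sizes are assumptions, and real systems often split on sentence or token boundaries rather than raw characters.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks (a simple sliding-window strategy).

    The overlap keeps keywords that fall on a chunk boundary present in
    at least one complete chunk, so their meaning and nearby synonyms are
    not lost at retrieval time.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

Larger overlaps improve recall at the cost of storing and searching more vectors, so the two parameters are usually tuned together.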
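To make the role of relevance scores concrete, here is a brute-force sketch of vector retrieval with a score threshold. A real RAG system would use an ANN index (such as HNSW or IVF in a vector database) rather than this exact linear scan, and the function name and cutoff value are assumptions for illustration.

```python
import numpy as np

def top_k_context(query_vec: np.ndarray, doc_vecs: np.ndarray,
                  k: int = 3, min_score: float = 0.5) -> list[tuple[int, float]]:
    """Rank document vectors by cosine similarity to a query vector.

    Returns (index, score) pairs for the top-k documents whose score
    clears min_score; this threshold is what keeps weakly related
    chunks from being passed to the LLM as context.
    """
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                          # cosine similarity per document
    order = np.argsort(scores)[::-1][:k]    # best-scoring documents first
    return [(int(i), float(scores[i])) for i in order if scores[i] >= min_score]
```

The same interface shape (indices plus scores) is what ANN libraries return, so swapping in a real index changes the speed, not the downstream logic.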
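Finally, a sketch of how temperature fits into the generation step, assuming an OpenAI-style chat-completion payload. The model name, prompt wording, and default temperature are illustrative assumptions, not a specific vendor's required schema.

```python
def build_rag_request(question: str, context_chunks: list[str],
                      temperature: float = 0.1) -> dict:
    """Assemble a chat-completion payload for answering from retrieved context.

    A low temperature keeps the model close to the retrieved context
    instead of improvising, which is usually what a RAG application wants.
    """
    context = "\n\n".join(context_chunks)
    return {
        "model": "gpt-4o-mini",       # hypothetical model choice
        "temperature": temperature,   # low = more deterministic output
        "messages": [
            {"role": "system",
             "content": "Answer only from the provided context. "
                        "If the context is insufficient, say so."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    }
```

The system message is one example of the "custom rules" a RAG application adds on top of a bare LLM interface.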