Recent advances in multimodal deep learning make it possible to extract high-quality data from PDF documents and add it to a Weaviate workflow. Optical Character Recognition (OCR) converts visual documents into machine-readable text, and newer models such as LayoutLMv3 and Donut use multimodal transformers to combine textual and visual information. Unstructured, an open-source company working at the cutting edge of PDF processing, lets businesses ingest diverse data sources and convert them into data that can be passed to a large language model (LLM). This lets users chat with their PDFs: a company's private documents are converted into text that an LLM can read.
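To make the "PDF to text to LLM" flow concrete, here is a minimal sketch of the step between extraction and prompting. It assumes a PDF parser has already produced a flat list of labeled elements. The `Element` and `Chunk` classes, the `"Title"` and `"NarrativeText"` labels, and the sample text are all hypothetical stand-ins chosen for illustration; the sketch calls no Unstructured or Weaviate API and skips the retrieval step entirely.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Element:
    """Hypothetical stand-in for one element extracted from a PDF."""
    category: str  # assumed labels: "Title" or "NarrativeText"
    text: str


@dataclass
class Chunk:
    """A title plus the text that follows it, ready to store or prompt with."""
    title: str
    body: List[str] = field(default_factory=list)

    def as_text(self) -> str:
        return f"{self.title}\n" + " ".join(self.body)


def group_by_title(elements: List[Element]) -> List[Chunk]:
    """Start a new chunk at every title and attach the following text to it."""
    chunks: List[Chunk] = []
    for el in elements:
        if el.category == "Title":
            chunks.append(Chunk(title=el.text))
            continue
        if not chunks:
            # Text that appears before any title gets a placeholder heading.
            chunks.append(Chunk(title="Untitled"))
        chunks[-1].body.append(el.text)
    return chunks


def build_prompt(question: str, chunks: List[Chunk]) -> str:
    """Concatenate chunk text into a context block followed by the question."""
    context = "\n\n".join(c.as_text() for c in chunks)
    return (
        "Answer using only the context below.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


# Made-up sample elements, as a parser might emit them for a two-section PDF.
sample = [
    Element("Title", "Refund Policy"),
    Element("NarrativeText", "Refunds are issued within 30 days."),
    Element("Title", "Shipping"),
    Element("NarrativeText", "Orders ship in 2 business days."),
]
prompt = build_prompt("How long do refunds take?", group_by_title(sample))
```

In a real pipeline the chunks would typically be stored in a vector database and only the most relevant ones retrieved for each question, rather than all of them being placed in the prompt as this sketch does.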