All blog post summaries for Zilliz
2024
SingleStore vs Neo4j Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 19, 2024
Author(s)
Chloe Williams
Language
English
Word count
2073
Hacker News points
None found.
SingleStore and Neo4j are two popular vector databases used in AI applications, each with its strengths and weaknesses. SingleStore integrates vector search with relational data, scaling for big data, while Neo4j pairs semantic vector search with graph analytics for relationship-based insights. The choice between the two depends on the specific use case, such as hybrid queries across structured data or contextual graph-based recommendations. To evaluate these tools, users can utilize open-source benchmarking tools like VectorDBBench, which allows them to test and compare different vector database systems using their own datasets. By considering factors such as scalability, performance, and ease of use, developers can make informed decisions about which tool best fits their needs.
SingleStore vs Redis Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 19, 2024
Author(s)
Chloe Williams
Language
English
Word count
1802
Hacker News points
None found.
SingleStore and Redis are two popular vector databases designed to store and query high-dimensional vectors, enabling efficient similarity searches crucial for AI applications such as e-commerce product recommendations, content discovery platforms, anomaly detection in cybersecurity, medical image analysis, natural language processing tasks, and Retrieval Augmented Generation (RAG) techniques. SingleStore integrates vector search directly into its SQL database system, allowing users to store vectors in standard database tables and combine vector searches with regular SQL operations, while Redis builds its vector search capabilities on top of its existing in-memory architecture through the Redis Vector Library, providing fast query execution and hybrid search capabilities that combine vector similarity with metadata filtering. The choice between SingleStore and Redis depends on data size, query complexity, performance needs, and whether a full database or a vector search solution is required. Thorough benchmarking with actual datasets and query patterns will be key to making an informed decision.
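To make the SingleStore pattern concrete, here is a hedged sketch of a hybrid query issued over pymysql (SingleStore speaks the MySQL wire protocol). It assumes SingleStore 8.5+'s native VECTOR type; the table, data, and the `:>` cast of a JSON-array literal are illustrative and may differ from what the post itself uses.

```python
import pymysql  # SingleStore is MySQL wire-protocol compatible

conn = pymysql.connect(host="localhost", port=3306, user="root",
                       password="pass", database="demo")

with conn.cursor() as cur:
    # Vectors live in an ordinary table next to relational columns.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS products (
            id BIGINT PRIMARY KEY,
            category VARCHAR(64),
            embedding VECTOR(4)            -- 4 dims for brevity
        )
    """)
    cur.execute("INSERT INTO products VALUES (1, 'shoes', '[0.1, 0.2, 0.3, 0.4]')")

    # Hybrid query: a regular SQL filter combined with vector similarity ranking.
    cur.execute("""
        SELECT id,
               DOT_PRODUCT(embedding, '[0.1, 0.2, 0.3, 0.4]' :> VECTOR(4)) AS score
        FROM products
        WHERE category = 'shoes'
        ORDER BY score DESC
        LIMIT 5
    """)
    print(cur.fetchall())
```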
SingleStore vs Milvus Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 19, 2024
Author(s)
Chloe Williams
Language
English
Word count
2012
Hacker News points
None found.
SingleStore and Milvus are two popular vector databases designed to store and query high-dimensional vectors, enabling efficient similarity searches crucial for AI applications such as e-commerce product recommendations, content discovery platforms, and natural language processing tasks. SingleStore integrates vector search into a full database, storing vectors in columnstore tables alongside structured data and allowing seamless filtering and aggregation with standard SQL queries. It offers both exact k-Nearest Neighbors (kNN) and Approximate Nearest Neighbors (ANN) search, with flexible configuration options for hybrid workloads that combine traditional SQL queries with vector search. Milvus is an open-source vector database designed from the ground up for vector and similarity search, supporting 11+ indexing methods and offering horizontal scalability as a core feature, making it suitable for large-scale deployments and AI workloads. The choice between SingleStore and Milvus depends on the specific use case and ecosystem, with SingleStore being ideal for hybrid solutions that combine structured data processing with vector search and Milvus being more specialized for unstructured data-heavy workloads.
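To ground the Milvus side, a minimal sketch using pymilvus's MilvusClient quick-start API with Milvus Lite; the collection name, 4-dimensional vectors, and sample data are illustrative rather than taken from the post.

```python
from pymilvus import MilvusClient

# Milvus Lite stores data in a local file; a server URI also works here.
client = MilvusClient("./milvus_demo.db")

# Hypothetical collection: 4-dim vectors for brevity (real embeddings are larger).
client.create_collection(collection_name="docs", dimension=4)

client.insert(
    collection_name="docs",
    data=[{"id": 1, "vector": [0.1, 0.2, 0.3, 0.4], "title": "hello"}],
)

# Approximate nearest-neighbor search over the stored vectors.
hits = client.search(
    collection_name="docs",
    data=[[0.1, 0.2, 0.3, 0.4]],
    limit=3,
    output_fields=["title"],
)
print(hits)
```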
SingleStore vs Weaviate Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 19, 2024
Author(s)
Chloe Williams
Language
English
Word count
2042
Hacker News points
None found.
SingleStore is designed for high performance and scalability, especially when combining vector search with structured data queries. It has robust SQL and enterprise-grade security, making it suitable for large distributed data environments like recommendation systems, financial analysis, and AI business intelligence. Weaviate, on the other hand, excels in hybrid or multi-modal search capabilities, particularly with unstructured data like text, images, or videos. Its developer-friendly setup and ease of experimentation make it a great choice for proof-of-concept AI applications, content classification, or semantic search. Ultimately, the choice between SingleStore and Weaviate depends on your project requirements, data types, and performance needs. Assessing your use case is crucial to determining which tool fits best. Thorough benchmarking with your own datasets and query patterns will be key to making an informed decision.
SingleStore vs Pinecone Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 19, 2024
Author(s)
Chloe Williams
Language
English
Word count
1981
Hacker News points
None found.
A vector database is specifically designed to store and query high-dimensional vectors, which are numerical representations of unstructured data, enabling efficient similarity searches in AI applications such as e-commerce product recommendations, content discovery platforms, anomaly detection, medical image analysis, natural language processing tasks, and Retrieval Augmented Generation. SingleStore is a distributed relational SQL database management system with vector search capabilities built-in, allowing developers to build complex AI applications using SQL syntax while maintaining performance and scale. Pinecone is a SaaS-based vector database that handles infrastructure complexity, provides real-time updates, machine learning model compatibility, and proprietary indexing techniques for fast vector search, making it suitable for pure vector search scenarios and startups. The choice between SingleStore and Pinecone depends on the data and operational needs of the application, with SingleStore being a full solution combining traditional database operations with vector search and Pinecone being a more focused managed service designed for vector-specific applications. Thorough benchmarking with actual datasets and query patterns is key to making an informed decision.
SingleStore vs MongoDB Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 19, 2024
Author(s)
Chloe Williams
Language
English
Word count
2071
Hacker News points
None found.
A vector database is specifically designed to store and query high-dimensional vectors, which encode complex information such as semantic meaning of text or visual features of images. Vector databases play a pivotal role in AI applications, allowing for efficient similarity searches and enabling advanced data analysis and retrieval. Two popular options are SingleStore and MongoDB, both with their own strengths and weaknesses. SingleStore has multiple vector search options to fit different use cases, uses a structured approach based on columnstore tables, scales through data distribution across multiple nodes, and combines vector search with SQL operations efficiently. MongoDB Atlas Vector Search takes a more focused approach using the HNSW algorithm for indexing and searching vector data, supports flexible document-based storage, and scales through dedicated Search Nodes for vector search workloads. The choice between SingleStore and MongoDB depends on the use case, existing tech stack, team expertise, and whether precise SQL-based operations or flexibility and ease of AI integration are needed. Thorough benchmarking with actual datasets and query patterns is key to making a decision.
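As a hedged sketch of the MongoDB side, the snippet below runs an Atlas `$vectorSearch` aggregation with pymongo; the connection string, database, collection, and index name are placeholders, and the vector search index must already exist in Atlas.

```python
from pymongo import MongoClient

# Placeholder Atlas connection string; $vectorSearch requires MongoDB Atlas.
client = MongoClient("mongodb+srv://user:pass@cluster.example.mongodb.net")
products = client["shop"]["products"]

pipeline = [
    {
        "$vectorSearch": {
            "index": "vector_index",       # pre-created Atlas vector index
            "path": "embedding",           # field holding the vectors
            "queryVector": [0.1, 0.2, 0.3, 0.4],
            "numCandidates": 100,          # candidates examined by HNSW
            "limit": 5,
        }
    },
    {"$project": {"name": 1, "score": {"$meta": "vectorSearchScore"}}},
]

for doc in products.aggregate(pipeline):
    print(doc)
```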
SingleStore vs Faiss Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 19, 2024
Author(s)
Chloe Williams
Language
English
Word count
2259
Hacker News points
None found.
SingleStore is a distributed, relational SQL database management system that integrates vector search capabilities directly into its SQL engine, making it suitable for companies that need both traditional database operations and AI features. Combining a SQL database with vector search makes it a strong fit for e-commerce platforms, content recommendation systems, and customer analytics where fast similarity matching matters. Faiss is an open-source library developed by Meta that provides highly efficient tools for fast similarity search and clustering of dense vectors; it is designed for large-scale nearest-neighbor search and handles both approximate and exact searches in high-dimensional vector spaces. It excels in pure AI and machine learning environments where vector search performance is the top priority, making it a good match for research teams, computer vision applications, large-scale similarity search engines, and AI model development with GPU acceleration. The choice between SingleStore and Faiss depends on technical requirements and organizational context, considering the existing tech stack, team expertise, performance requirements, and whether a full database or a vector-search-only solution is needed.
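Because Faiss is a library rather than a server, getting started takes only a few lines. A minimal sketch on random data, contrasting exact search (IndexFlatL2) with approximate HNSW search; the dimensionality and dataset sizes are arbitrary.

```python
import numpy as np
import faiss

d = 64                                                  # vector dimensionality
xb = np.random.random((10_000, d)).astype("float32")    # database vectors
xq = np.random.random((5, d)).astype("float32")         # query vectors

# Exact (brute-force) L2 search.
flat = faiss.IndexFlatL2(d)
flat.add(xb)
D, I = flat.search(xq, k=4)          # distances and neighbor ids per query

# Approximate search with an HNSW graph (32 links per node).
hnsw = faiss.IndexHNSWFlat(d, 32)
hnsw.add(xb)
D_ann, I_ann = hnsw.search(xq, k=4)
```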
SingleStore vs Qdrant Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 19, 2024
Author(s)
Chloe Williams
Language
English
Word count
2105
Hacker News points
None found.
SingleStore and Qdrant are two different vector databases that cater to distinct use cases. SingleStore is an all-in-one database that embeds vector search with SQL, making it suitable for complex enterprise workloads that require a mix of transactional and analytical capabilities. Its distributed architecture allows it to handle large datasets and mixed data types, making it ideal for high concurrency applications. Qdrant, on the other hand, is specifically designed for similarity search and machine learning applications, offering flexible data modeling, robust security features, and strong integrations with popular ML frameworks. It's better suited for AI-driven workflows that require high-performance search and filtering. The choice between SingleStore and Qdrant depends on the specific use case, data types, and scalability requirements of the application. Thorough benchmarking with actual datasets and query patterns is crucial to make an informed decision.
Leveraging Milvus and Friendli Serverless Endpoints for Advanced RAG and Multi-Modal Queries
Date published
Dec. 18, 2024
Author(s)
Wonook Song
Language
English
Word count
1374
Hacker News points
None found.
FriendliAI specializes in generative AI infrastructure, offering solutions that enable organizations to efficiently deploy and manage large language models (LLMs) and other generative AI models. Milvus is an open-source vector database that stores, indexes, and searches billion-scale unstructured data through high-dimensional vector embeddings, making it well suited to modern AI applications such as retrieval augmented generation (RAG), semantic search, multimodal search, and recommendation systems. Combining RAG with multi-modal models improves AI systems by supporting diverse and rich input types, up-to-date information, and context-aware interactions, leading to more accurate and relevant responses. The tutorial demonstrates how to use Milvus with Friendli Serverless Endpoints to perform RAG on specific documents and materials and to execute multi-modal queries that incorporate images and other visual content, enabling more sophisticated AI applications that can understand and process diverse types of information.
Zilliz Cloud vs MyScale Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 17, 2024
Author(s)
Chloe Williams
Language
English
Word count
1915
Hacker News points
None found.
Zilliz Cloud vs MyScale: A comparison of two vector databases designed to store and query high-dimensional vectors, used in AI applications such as e-commerce product recommendations, content discovery platforms, anomaly detection, medical image analysis, and natural language processing tasks. Zilliz Cloud is a purpose-built vector database with automatic performance optimization, enterprise features, and cost management options, while MyScale is a cloud-based database built on top of ClickHouse architecture, offering native SQL support, hybrid search capabilities, and scalability for AI applications.
SingleStore vs Apache Cassandra Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 17, 2024
Author(s)
Chloe Williams
Language
English
Word count
1756
Hacker News points
None found.
A vector database is a specific type of database designed to store and query high-dimensional vectors, which encode complex information such as the semantic meaning of text or visual features of images. Vector databases play a pivotal role in AI applications by enabling efficient similarity searches, allowing for more advanced data analysis and retrieval. SingleStore and Apache Cassandra are two popular vector databases that offer different approaches to vector search, with SingleStore having native vector search capabilities and Cassandra offering vector search through its Storage-Attached Indexes (SAI) feature. Both databases have strong scalability features but differ in their design approach, with SingleStore distributing data across nodes for horizontal scaling and Cassandra's masterless architecture providing high availability. SingleStore integrates vector search with standard SQL syntax, making it more familiar to teams with SQL backgrounds, while Cassandra requires learning its own query language and data modeling concepts. The choice between SingleStore and Apache Cassandra depends on technical requirements and constraints, with SingleStore suitable for companies needing ACID compliance and Cassandra ideal for use cases requiring horizontal scalability and high availability. Thorough benchmarking with VectorDBBench or other tools will be key to making an informed decision between these two powerful approaches to vector search in distributed database systems.
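To make Cassandra's SAI-based vector search concrete, a hedged sketch using the DataStax Python driver with Cassandra 5.0 CQL; the keyspace, table, and 4-dimensional vectors are hypothetical, and exact syntax may vary by version.

```python
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("demo")  # hypothetical keyspace "demo"

session.execute("""
    CREATE TABLE IF NOT EXISTS products (
        id int PRIMARY KEY,
        embedding vector<float, 4>
    )
""")

# Storage-Attached Index (SAI) enabling ANN queries on the vector column.
session.execute("""
    CREATE CUSTOM INDEX IF NOT EXISTS products_embedding_idx
    ON products (embedding) USING 'StorageAttachedIndex'
""")

# CQL vector literals use bracket syntax.
session.execute("INSERT INTO products (id, embedding) VALUES (1, [0.1, 0.2, 0.3, 0.4])")

rows = session.execute(
    "SELECT id FROM products ORDER BY embedding ANN OF [0.1, 0.2, 0.3, 0.4] LIMIT 3"
)
for row in rows:
    print(row.id)
```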
SingleStore vs Deep Lake Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 17, 2024
Author(s)
Chloe Williams
Language
English
Word count
2262
Hacker News points
None found.
SingleStore and Deep Lake are two vector database solutions designed for different use cases. SingleStore is a distributed, relational SQL database management system that supports vectors within columnstore tables, making it ideal for structured data combined with vector operations. It offers flexibility through SQL queries, supporting exact and approximate vector search strategies, and combines vector search with traditional SQL operations. Deep Lake, on the other hand, specializes in managing unstructured data—images, audio, video, and text—alongside vector embeddings. It acts as both a data lake and vector store, making it suitable for AI/ML workflows where unstructured or multimedia data plays a significant role. Both tools offer robust security features, but SingleStore excels in scalability and performance, especially when combined with SQL operations. When choosing between SingleStore and Deep Lake, consider the type of data you're working with and the specific use case. If you need to combine structured data queries with vector similarity searches, SingleStore is a better fit. For AI/ML environments where unstructured data and multimedia embeddings are the focus, Deep Lake's flexibility and performance make it a more streamlined solution. Ultimately, thorough benchmarking with your own datasets and query patterns will be key to making an informed decision between these two powerful approaches to vector search in distributed database systems.
Introducing Milvus 2.5: Built-in Full-Text Search, Advanced Query Optimization, and More 🚀
Date published
Dec. 17, 2024
Author(s)
Steffi Li
Language
English
Word count
769
Hacker News points
None found.
Milvus 2.5 marks a significant milestone in its journey to build the world's most complete solution for all search workloads, combining different search paradigms and introducing built-in full-text search powered by Sparse-BM25. This release brings powerful text processing capabilities, simplifying implementation complexity and enabling seamless integration of semantic understanding and keyword precision in a single system. Enhanced text and data processing features include text match, bitmap indexes, and nullable and default values, alongside new beta features such as a cluster management WebUI and clustering compaction, as well as numerous improvements that optimize performance and security.
SingleStore vs Elasticsearch Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 17, 2024
Author(s)
Chloe Williams
Language
English
Word count
1695
Hacker News points
None found.
SingleStore and Elasticsearch are vector databases designed to store and query high-dimensional vectors, enabling efficient similarity searches crucial for AI applications such as e-commerce product recommendations, content discovery platforms, anomaly detection in cybersecurity, medical image analysis, natural language processing tasks, and Retrieval Augmented Generation. SingleStore integrates vector search into its SQL database, allowing users to combine vector searches with regular database operations, whereas Elasticsearch uses the HNSW algorithm for vector search implemented through Apache Lucene, creating a graph where similar vectors connect to each other. Both databases support exact k-nearest neighbors (kNN) and Approximate Nearest Neighbor (ANN) search methods but differ in their data management and storage approaches. SingleStore is suitable for applications that need to combine SQL with vector capabilities, while Elasticsearch excels at combining vector similarity with its existing search functionality. The choice between the two databases depends on the specific use case, considering factors such as the primary function of the application, query patterns, and scalability requirements.
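As a sketch of the Elasticsearch approach described here, the snippet below creates a dense_vector mapping and runs an approximate kNN query with the official Python client (8.x API); the index name, 4-dimensional vectors, and parameter values are illustrative.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# dense_vector field indexed for approximate kNN (HNSW via Lucene).
es.indices.create(
    index="docs",
    mappings={
        "properties": {
            "title": {"type": "text"},
            "embedding": {
                "type": "dense_vector",
                "dims": 4,
                "index": True,
                "similarity": "cosine",
            },
        }
    },
)

es.index(index="docs", id="1",
         document={"title": "hello", "embedding": [0.1, 0.2, 0.3, 0.4]},
         refresh=True)

resp = es.search(
    index="docs",
    knn={
        "field": "embedding",
        "query_vector": [0.1, 0.2, 0.3, 0.4],
        "k": 3,
        "num_candidates": 10,   # breadth of the HNSW candidate pass
    },
)
print(resp["hits"]["hits"])
```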
SingleStore vs Aerospike Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 17, 2024
Author(s)
Chloe Williams
Language
English
Word count
1902
Hacker News points
None found.
SingleStore is a distributed relational SQL database that integrates vector search capabilities, allowing users to combine traditional database operations with vector search in one system. It offers multiple index types and SQL integration, making it suitable for applications with structured data alongside vectors, such as e-commerce platforms and content recommendation systems. Aerospike, on the other hand, is a NoSQL database designed for high-performance real-time applications, with its vector search capability currently in Preview and requiring early access from Aerospike. Its HNSW implementation and concurrent processing make it suitable for use cases like real-time recommendation engines and live image similarity search. The choice between SingleStore and Aerospike depends on the user's needs, tech stack, team expertise, and real-time requirements, with thorough benchmarking using tools like VectorDBBench being key to making a decision.
Build RAG with LangChain, Milvus, and Strapi
Date published
Dec. 13, 2024
Author(s)
Denis Kuria
Language
English
Word count
4804
Hacker News points
None found.
The Retrieval-Augmented Generation (RAG) system described in this tutorial combines AI models, a vector database, and a content management system to provide accurate, relevant answers to user queries. It consists of three main components: Milvus for vector storage, Strapi for content management, and LangChain for workflow coordination. RAG bridges the gap between generic AI responses and specialized knowledge by integrating a retrieval mechanism with the generation process: text is converted into vectors with an embedding model, relevant content is retrieved from Milvus, and OpenAI's GPT-3.5 model generates the response. The system is well suited to applications like customer support, knowledge management, and educational tools, providing answers grounded in real, up-to-date knowledge that can be tailored to specific needs with a clear understanding of the architecture and this step-by-step guide.
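The skeleton of such a pipeline is compact. Below is a hedged sketch of the retrieve-then-generate step with langchain-milvus and OpenAI (an OPENAI_API_KEY is assumed); the document content, model name, and Milvus Lite URI are placeholders, and fetching content from Strapi's REST API is omitted.

```python
from langchain_core.documents import Document
from langchain_milvus import Milvus
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# In the real pipeline these documents would come from Strapi's REST API.
docs = [Document(page_content="Strapi is an open-source headless CMS.")]

# Embed the documents and store them in Milvus (Milvus Lite local file here).
store = Milvus.from_documents(
    docs, OpenAIEmbeddings(), connection_args={"uri": "./rag_demo.db"}
)
retriever = store.as_retriever()

llm = ChatOpenAI(model="gpt-3.5-turbo")
question = "What is Strapi?"
context = "\n".join(d.page_content for d in retriever.invoke(question))
answer = llm.invoke(f"Answer from this context:\n{context}\n\nQuestion: {question}")
print(answer.content)
```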
Matryoshka Representation Learning Explained: The Method Behind OpenAI’s Efficient Text Embeddings
Date published
Dec. 12, 2024
Author(s)
Ruben Winastwan
Language
English
Word count
2545
Hacker News points
None found.
The Matryoshka Representation Learning (MRL) approach enables machine learning models to produce feature representations of varying sizes, providing flexibility to optimize for either speed or accuracy depending on the use case and resources. By enabling any model to generate smaller or larger embeddings, MRL balances the cost-performance trade-off in machine learning, making it a promising advancement for more efficient and versatile solutions. This approach has been evaluated across multiple domains, including text, vision, and multimodal tasks, with comparable or improved performance compared to traditional fixed-size models.
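In practice, consuming a Matryoshka embedding amounts to truncating its leading coordinates and re-normalizing; OpenAI's text-embedding-3 models expose the same idea through a dimensions parameter. A minimal sketch of the truncation itself, with an arbitrary stand-in vector:

```python
import numpy as np

def shorten(embedding: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` coordinates of a Matryoshka embedding and
    re-normalize so cosine similarity remains meaningful."""
    v = embedding[:dim]
    return v / np.linalg.norm(v)

full = np.random.randn(3072)   # stand-in for a full-size embedding
small = shorten(full, 256)     # cheaper to store and compare
print(small.shape)             # (256,)
```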
Couchbase vs Zilliz Cloud Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 11, 2024
Author(s)
Chloe Williams
Language
English
Word count
1996
Hacker News points
None found.
Couchbase and Zilliz Cloud are two vector databases that cater to different needs in AI applications. Couchbase, a distributed NoSQL database, can be used for general-purpose applications and has workarounds for vector search, making it suitable for complex multi-functional systems where vector search is not the main focus. In contrast, Zilliz Cloud is a purpose-built vector database designed specifically for large-scale vector search in AI/ML workloads, offering features like AutoIndex for automatic performance optimization, hybrid search across multiple data types, and managed services. The choice between Couchbase and Zilliz Cloud depends on the specific use case, data management requirements, and importance of vector search in the application. Evaluating these options with a tool like VectorDBBench can help make an informed decision based on actual performance results.
Introducing IBM Data Prep Kit for Streamlined LLM Workflows
Date published
Dec. 11, 2024
Author(s)
Yesha Shastri
Language
English
Word count
1669
Hacker News points
None found.
IBM's Data Prep Kit (DPK) is an open-source toolkit designed to streamline unstructured data preparation for developers building Large Language Models (LLMs). DPK tackles common challenges like toxicity, overfitting, and bias in data by providing modular and scalable solutions to manage diverse data processing challenges. It simplifies data preprocessing with reusable transforms, allowing users to quickly start processing their data without requiring deep knowledge of underlying frameworks or runtimes. The kit's workflow begins by converting input files into standardized Parquet format, applying predefined or custom transforms, and generating document embeddings. These embeddings can be leveraged for advanced applications such as fine-tuning models, implementing RAG pipelines, or instruct-tuning. By automating and standardizing the data preparation process, DPK empowers developers to focus on building and refining their AI models, scaling from laptops to cluster-based environments with ease. Integrating DPK with Milvus enables the retrieval of contextually relevant documents and enhances LLM outputs with reliable and fact-based responses.
Zilliz Cloud vs Rockset Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 10, 2024
Author(s)
Chloe Williams
Language
English
Word count
1694
Hacker News points
None found.
Zilliz Cloud and Rockset are two vector databases designed to store and query high-dimensional vectors, which encode complex information in AI applications such as e-commerce product recommendations, content discovery platforms, anomaly detection, medical image analysis, natural language processing tasks, and Retrieval Augmented Generation. Zilliz Cloud is a fully managed vector database service built on top of the open-source Milvus engine, offering automatic performance optimization through its AutoIndex technology, enterprise features like cross-cloud deployment, strong security controls, and cost optimization through tiered storage. Rockset, on the other hand, is a real-time search and analytics database with vector search capabilities as an add-on, supporting K-Nearest Neighbors and Approximate Nearest Neighbors search methods, Converged Index for scalability, and algorithm agnosticism. When choosing between Zilliz Cloud and Rockset, consider your use case requirements around data update frequency, response time, and whether vector search is the main use case or part of a broader data processing strategy, as both databases have different data handling and optimization approaches. Thorough benchmarking with a tool like VectorDBBench can help make an informed decision between these powerful but different approaches to vector search in distributed database systems.
Qdrant vs Vearch Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 10, 2024
Author(s)
Chloe Williams
Language
English
Word count
1765
Hacker News points
None found.
Qdrant and Vearch are two purpose-built vector databases designed specifically for storing and querying high-dimensional vectors, which encode complex information such as text or image features. Qdrant is known for its flexible data modeling capabilities, ACID compliant transactions, and powerful query language with visual tools to explore vector relationships. It excels in applications requiring strong data consistency and complex querying. Vearch, on the other hand, focuses on scalability, real-time indexing, and hardware flexibility, making it suitable for large-scale AI applications like image similarity search or product recommendations. The choice between Qdrant and Vearch depends on specific requirements such as data volume, query complexity, and need for real-time updates. Thorough benchmarking with actual datasets and query patterns is essential to make an informed decision.
Qdrant vs Vald Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 10, 2024
Author(s)
Chloe Williams
Language
English
Word count
1984
Hacker News points
None found.
Qdrant and Vald are two purpose-built vector databases that cater to different needs in AI applications, particularly those requiring similarity search and machine learning capabilities. While both offer efficient indexing and querying features, they differ in their approach to scalability, flexibility, and data handling. Qdrant excels with its flexible data modeling, ACID compliant transactions, and powerful query language, making it suitable for complex queries and hybrid search scenarios. In contrast, Vald focuses on cloud-native scalability, horizontal scaling, and real-time indexing capabilities, ideal for large-scale deployments and applications requiring high availability and speed. Ultimately, the choice between Qdrant and Vald depends on specific use cases, data types, and performance requirements, with thorough benchmarking using tools like VectorDBBench being crucial in making an informed decision.
Zilliz Cloud vs ClickHouse Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 10, 2024
Author(s)
Chloe Williams
Language
English
Word count
1905
Hacker News points
None found.
Zilliz Cloud and ClickHouse are two popular vector databases designed to store and query high-dimensional vectors, which encode complex information in AI applications. Zilliz Cloud is a fully managed vector database service built on top of the open-source Milvus engine, offering automatic performance optimization through AutoIndex technology, enterprise features like cross-cloud deployment and strong security controls, and tiered storage for cost management. ClickHouse, on the other hand, is an open-source OLAP database with vector search capabilities as an add-on, excelling in scenarios where vector operations are combined with SQL-based analysis and traditional data filtering and aggregation. The choice between Zilliz Cloud and ClickHouse depends on technical requirements and organizational capabilities, with Zilliz Cloud being suitable for pure vector search applications and ClickHouse for more complex analytical queries that combine vector similarity with traditional data analysis.
Zilliz Cloud vs Deep Lake Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 10, 2024
Author(s)
Chloe Williams
Language
English
Word count
1963
Hacker News points
None found.
Zilliz Cloud and Deep Lake are two powerful vector databases designed for different use cases. Zilliz Cloud is a fully managed vector database service built on top of the open-source Milvus engine, focusing on large-scale distributed data management and efficient vector search. It offers advanced indexing techniques using IVF and graph-based algorithms, robust security features, and tiered storage to optimize cost. This makes it suitable for organizations with massive datasets where performance, scalability, and ease of use matter. On the other hand, Deep Lake is a specialized database built for handling multimedia data such as images, audio, video, and unstructured types, widely used in AI and machine learning. It functions as both a data lake and a vector store, offering seamless integration with tools like LangChain and LlamaIndex to boost productivity. Its strengths lie in dataset visualization and managing AI-focused data pipelines. When choosing between the two, consider your use case, data types, and performance requirements to select the tool that aligns with your development goals. Thorough benchmarking with your own datasets and query patterns will be key to making a decision.
Zilliz Cloud vs Aerospike Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 10, 2024
Author(s)
Chloe Williams
Language
English
Word count
2157
Hacker News points
None found.
Zilliz Cloud is a fully managed vector database service built on top of the open-source Milvus engine, designed for large-scale AI applications, offering automatic performance optimization, horizontal scalability, and hybrid search capabilities across multimodal data. It excels in AI and machine learning workflows that require efficient handling of vector embeddings, scalability, and ease of use. In contrast, Aerospike is a distributed, scalable NoSQL database with vector search capabilities as an add-on, supporting Hierarchical Navigable Small World (HNSW) indexes for high-dimensional similarity searches, but requiring manual configuration and indexing parameter tuning. While both platforms have their strengths, Zilliz Cloud is better suited for applications that require ease of use, scalability, and hybrid search, whereas Aerospike is more suitable for organizations already using its NoSQL capabilities and needing vector search as an add-on. Ultimately, the choice between Zilliz Cloud and Aerospike depends on the specific needs of your AI application, requiring thorough evaluation based on factors such as data types, scalability, integration, and operational complexity.
Qdrant vs MyScale Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 10, 2024
Author(s)
Chloe Williams
Language
English
Word count
2083
Hacker News points
None found.
Qdrant and MyScale are two distinct vector databases designed to meet different needs in AI applications. Qdrant is a purpose-built vector database optimized for high-dimensional vector data and advanced AI use cases, providing flexible data modeling, ACID compliance, and HNSW indexing. In contrast, MyScale is a unified platform combining SQL capabilities with advanced vector search, catering to hybrid use cases that require real-time analytics alongside vector search. When choosing between these two options, it's essential to evaluate based on your specific use case, considering factors such as data modalities, performance requirements, and scalability needs. Thorough benchmarking with tools like VectorDBBench can help make an informed decision. Ultimately, Qdrant excels in high-dimensional vector data and advanced AI applications, while MyScale is better suited for hybrid use cases that require real-time analytics and structured data processing.
Qdrant vs Neo4j Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 10, 2024
Author(s)
Chloe Williams
Language
English
Word count
2497
Hacker News points
None found.
Qdrant and Neo4j are two vector databases that serve different primary needs. Qdrant is ideal for pure vector search scenarios with high performance requirements, while Neo4j shines when combining vector similarity with graph relationships. Both use the Hierarchical Navigable Small World (HNSW) algorithm for vector search, but each has its own implementation: Qdrant has a custom HNSW built for high-dimensional vector spaces, while Neo4j supports vectors up to 4096 dimensions with both cosine and Euclidean similarity functions. Qdrant excels at flexible data modeling, storing vectors alongside payload data while maintaining consistency through ACID compliant transactions; its performance optimizations include automatic sharding and replication, on-disk text and geo indexing, intelligent caching, and scalar, product, and binary quantization to reduce memory usage without compromising search quality. Neo4j handles data through its graph architecture, with support for vector indexes on node and relationship properties, and its queries center on that graph heritage while integrating well with vector similarity searches. The choice between the two should depend on specific needs, considering factors such as existing infrastructure, team expertise, and the value of additional graph database features; thorough benchmarking with an open-source tool like VectorDBBench helps ground the decision in actual performance results rather than marketing claims or hearsay.
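To ground the Neo4j side, a hedged sketch of creating and querying a vector index with the Neo4j Python driver, using Neo4j 5.x Cypher syntax; the label, index name, credentials, and 4-dimensional vectors are hypothetical.

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Vector index on a node property, configured for cosine similarity.
driver.execute_query("""
    CREATE VECTOR INDEX doc_embeddings IF NOT EXISTS
    FOR (d:Document) ON (d.embedding)
    OPTIONS {indexConfig: {
        `vector.dimensions`: 4,
        `vector.similarity_function`: 'cosine'
    }}
""")

driver.execute_query(
    "CREATE (:Document {title: $title, embedding: $vec})",
    title="hello", vec=[0.1, 0.2, 0.3, 0.4],
)

# k-NN query; the YIELDed nodes can feed further graph traversal.
records, _, _ = driver.execute_query("""
    CALL db.index.vector.queryNodes('doc_embeddings', 3, $vec)
    YIELD node, score
    RETURN node.title AS title, score
""", vec=[0.1, 0.2, 0.3, 0.4])
for r in records:
    print(r["title"], r["score"])
```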
Qdrant vs Rockset Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 10, 2024
Author(s)
Chloe Williams
Language
English
Word count
1761
Hacker News points
None found.
Qdrant and Rockset are two vector databases designed to store and query high-dimensional vectors, which encode complex information such as the semantic meaning of text or product attributes. Qdrant is a purpose-built vector database optimized for performance and flexible data modeling, allowing it to handle high-dimensional vector data and combine vector similarity with metadata filtering. It uses the HNSW algorithm for indexing and supports complex queries like Facet API for aggregation and counting unique values in the data. Qdrant's query language works seamlessly with vector search and supports trade-offs between search precision and performance. Rockset is a real-time search and analytics database that supports structured and unstructured data, including vector embeddings, and has Converged Indexing built on mutable RocksDB for efficient updates of vectors and metadata. It can handle high velocity event streams and change data capture feeds with 1-2 second latency. Both databases have different strengths in vector search - Qdrant is great for pure vector search performance and AI-focused features, while Rockset excels in real-time processing and SQL-based analytics. Choosing between the two depends on technical requirements, such as data update frequency, query patterns, and the need for real-time analytics alongside vector search. Thorough benchmarking with actual datasets and query patterns is key to making a decision between these powerful but different approaches to vector search in distributed database systems.
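To illustrate the Qdrant pattern of combining vector similarity with payload filtering, a minimal sketch using qdrant-client's in-memory mode; the collection, vectors, and payload fields are illustrative.

```python
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, FieldCondition, Filter, MatchValue, PointStruct, VectorParams,
)

client = QdrantClient(":memory:")  # in-process instance, no server needed

client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
)

client.upsert(
    collection_name="docs",
    points=[PointStruct(id=1, vector=[0.1, 0.2, 0.3, 0.4], payload={"lang": "en"})],
)

# Vector search constrained by payload metadata.
hits = client.search(
    collection_name="docs",
    query_vector=[0.1, 0.2, 0.3, 0.4],
    query_filter=Filter(must=[FieldCondition(key="lang", match=MatchValue(value="en"))]),
    limit=3,
)
print(hits)
```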
Zilliz Cloud vs Neo4j Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 10, 2024
Author(s)
Chloe Williams
Language
English
Word count
2038
Hacker News points
None found.
Zilliz Cloud is a fully managed vector database service designed specifically for AI applications, offering scalable multimodal data handling with minimal management overhead. Its key features include automatic performance optimization, hybrid search capabilities, and seamless scalability. Zilliz Cloud is ideal for developers who want rapid deployment and cost-efficient operations for growing data. In contrast, Neo4j is a graph database with vector search capabilities as an add-on, offering fine-grained control over vector index behavior but requiring more setup and optimization effort. The choice between Zilliz Cloud and Neo4j ultimately depends on the project's requirements, with Zilliz Cloud being suitable for AI-centric applications and Neo4j being better suited for applications where graph relationships are key. To evaluate these platforms effectively, users can utilize open-source benchmarking tools like VectorDBBench.
Zilliz Cloud vs Vearch Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 10, 2024
Author(s)
Chloe Williams
Language
English
Word count
2052
Hacker News points
None found.
Zilliz Cloud and Vearch are two purpose-built vector databases designed to store and query high-dimensional vectors, which encode complex information from unstructured data. Zilliz Cloud excels in hybrid search capabilities, supports various similarity metrics, and offers automatic horizontal scaling, making it suitable for large-scale applications that need enterprise-grade security and ease of use. Vearch offers more direct control over system behavior, real-time update capabilities, and flexibility in deployment options, making it ideal for teams that want to fine-tune their vector search implementation or have established infrastructure. When choosing between the two, consider factors such as technical expertise, scaling needs, and whether you prefer a managed service or hands-on control, using tools like VectorDBBench to evaluate and compare performance on your own datasets.
Qdrant vs Deep Lake Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 10, 2024
Author(s)
Chloe Williams
Language
English
Word count
1843
Hacker News points
None found.
Qdrant and Deep Lake are two vector databases designed to store and query high-dimensional vectors, which encode complex information from unstructured data such as text, images, or product attributes. Qdrant is a purpose-built vector database with flexible data modeling, ACID compliant transactions, and a custom version of the HNSW algorithm for indexing, making it suitable for applications requiring strong vector search combined with complex filtering and aggregation operations. In contrast, Deep Lake is a specialized database built for handling vector and multimedia data, supporting version control for unstructured data like images, audio, and video, and providing seamless integration with AI development tools like LangChain and LlamaIndex. The choice between Qdrant and Deep Lake depends on specific needs, including data types, expected growth, and required features such as version control or multimedia support. Thorough benchmarking with a tool like VectorDBBench can help make an informed decision between these two powerful but different approaches to vector search in distributed database systems.
Zilliz Cloud vs Vald Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 10, 2024
Author(s)
Chloe Williams
Language
English
Word count
1914
Hacker News points
None found.
Zilliz Cloud and Vald are two purpose-built vector databases designed to efficiently store, manage, and search high-dimensional vectors, which encode complex information in AI applications. Zilliz Cloud is a fully managed service built on top of the open-source Milvus engine, offering automatic performance optimization, enterprise-grade security, and cost-effective tiered storage. It excels in hybrid search across multiple data types and supports strong security features, making it suitable for big AI applications with minimal ops. Vald, on the other hand, is a powerful tool for searching through huge amounts of vector data quickly, using its NGT algorithm, and offering real-time indexing, Kubernetes native customization, and high configurability. It's ideal for developers who want a highly customizable solution and are comfortable with managing distributed systems. Ultimately, the choice between Zilliz Cloud and Vald depends on your specific use case, data diversity, ops needs, and level of control, and evaluating these factors with tools like VectorDBBench can help make an informed decision.
Building a RAG Application with Milvus and Databricks DBRX
Date published
Dec. 10, 2024
Author(s)
Benito Martin
Language
English
Word count
2032
Hacker News points
None found.
This tutorial explores how to build a robust Retrieval Augmented Generation (RAG) application using Milvus, a scalable vector database, and DBRX, an open-source large language model with a fine-grained mixture-of-experts (MoE) architecture. The combination of these two technologies enables contextually accurate and domain-specific responses in RAG systems, making them highly valuable in use cases such as knowledge management, customer support, content creation, and scientific research. DBRX's MoE design allows it to dynamically adapt to diverse tasks, ensuring computational efficiency and exceptional performance across a variety of use cases. Milvus complements this architecture by enabling RAG systems to easily handle massive knowledge bases. The tutorial demonstrates how to implement a RAG pipeline using Milvus as a vector store, DBRX as the language model, and LangChain as the framework.
Vespa vs Neo4j Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 9, 2024
Author(s)
Chloe Williams
Language
English
Word count
2021
Hacker News points
None found.
Vespa and Neo4j are two popular vector databases with different strengths. Vespa is a powerful search engine and vector database that can handle multiple types of searches, including vector search, text search, and structured data search. It's great for big data applications and supports tensor-based search. On the other hand, Neo4j is designed for graph data where relationships are as important as the nodes themselves. Its vector search capabilities combined with graph traversal make it a great option for developers who want to add semantic similarity matching to traditional graph queries. When choosing between them, consider your use case requirements, your data, and performance requirements of your application.
Vespa vs Vearch Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 9, 2024
Author(s)
Chloe Williams
Language
English
Word count
1998
Hacker News points
None found.
Vespa and Vearch are purpose-built vector databases designed to store and query high-dimensional vectors, which are numerical representations of unstructured data. They play a crucial role in AI applications by enabling efficient similarity searches for tasks like e-commerce product recommendations, content discovery platforms, anomaly detection in cybersecurity, medical image analysis, and natural language processing (NLP). Vespa is a powerful search engine and vector database that can handle multiple types of searches all at once. It supports vector search, text search, and searching through structured data. Vearch is a tool for developers building AI applications that need fast and efficient similarity searches. It’s built to handle vector embeddings that power modern AI tech. Vespa's key features include its ability to do vector search, tensor operations support, auto scaling capabilities, and comprehensive TLS encryption. Vearch supports hybrid search, real-time updates, flexible schema definitions, and GPU acceleration support. Both systems have different cost structures and operational considerations. The choice between Vespa and Vearch depends on the technical requirements, operational capabilities, and business needs of the user. Vespa is best for large scale enterprise applications that need multiple types of search, while Vearch is ideal for specialized vector search applications where GPU acceleration can bring significant performance gains.
Vespa vs Rockset Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 9, 2024
Author(s)
Chloe Williams
Language
English
Word count
1702
Hacker News points
None found.
Vespa and Rockset are both powerful vector databases designed to store and query high-dimensional vectors, which represent complex information such as the semantic meaning of text or visual features of images. They play a crucial role in AI applications by enabling efficient similarity searches for tasks like e-commerce product recommendations, content discovery platforms, anomaly detection in cybersecurity, medical image analysis, and natural language processing (NLP). Vespa is a purpose-built vector database that can handle multiple types of searches at once, including vector search, text search, and searching through structured data. It's built to be fast and efficient, with the ability to automatically scale up to handle more data or traffic. Vespa supports any number of vector fields per document and high-dimensional tensors, making it suitable for large-scale applications that need to handle a lot of traffic and data. Rockset is a real-time search and analytics database with vector search capabilities as an add-on. It's designed for ingesting, indexing, and querying data in real-time, making it great for applications that require up-to-the-second insights. Rockset supports both streaming and bulk data ingestion, can process high velocity event streams and change data capture (CDC) feeds in 1-2 seconds, and has a unique Converged Indexing system built on mutable RocksDB for efficient updates to vectors and metadata. When choosing between Vespa and Rockset for vector search, consider factors such as search performance, data management and updates, scaling and architecture, integration and APIs, team expertise, existing infrastructure, budget, and long-term maintenance. Additionally, thorough benchmarking with your own datasets and query patterns using tools like VectorDBBench can help you make an informed decision based on actual vector database performance.
Vespa vs Deep Lake Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 9, 2024
Author(s)
Chloe Williams
Language
English
Word count
2048
Hacker News points
None found.
Vespa and Deep Lake are both vector databases designed to store and query high-dimensional vectors, which are numerical representations of unstructured data such as text, images, or product attributes. They play a crucial role in AI applications by enabling efficient similarity searches for advanced data analysis and retrieval. Common use cases include e-commerce product recommendations, content discovery platforms, anomaly detection in cybersecurity, medical image analysis, and natural language processing (NLP) tasks. Vespa is a powerful search engine and vector database that can handle multiple types of searches all at once, including vector search, text search, and searching through structured data. It's built to be super fast and efficient, with the ability to automatically scale up to handle more data or traffic. Vespa is great for complex, distributed search scenarios with multiple data types and lots of customization for enterprise scale. Deep Lake is a specialized database built for handling vector and multimedia data, such as images, audio, video, and other unstructured types, widely used in AI and machine learning. It functions as both a data lake and a vector store, allowing users to store and search vector embeddings and related metadata (e.g., text, JSON, images). Deep Lake is great for AI and machine learning workflows that heavily rely on unstructured or multimedia data like images, audio, and video. When deciding between Vespa and Deep Lake as a vector search tool, understanding the differences across the key dimensions will help you choose the right one for your use case. Factors to consider include search methodology, data handling, scalability and performance, flexibility and customization, integration and ecosystem, ease of use, cost, and security. To evaluate these tools further, users can utilize VectorDBBench, an open-source benchmarking tool for vector database comparison. This will allow users to make decisions based on actual vector database performance rather than marketing claims or hearsay.
Vespa vs Vald Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 9, 2024
Author(s)
Chloe Williams
Language
English
Word count
1849
Hacker News points
None found.
Vespa and Vald are both purpose-built vector databases designed to store and query high-dimensional vectors, which are numerical representations of unstructured data. They play a crucial role in AI applications by enabling efficient similarity searches for tasks such as e-commerce product recommendations, content discovery platforms, anomaly detection in cybersecurity, medical image analysis, and natural language processing (NLP). Vespa is a powerful search engine and vector database that can handle multiple types of searches all at once, including vector search, text search, and searching through structured data. It's built to be super fast and efficient, with the ability to automatically scale up to handle more data or traffic. Vespa supports hybrid search, combining vector search with text and structured data search, making it very versatile for applications that need multi-modal search like e-commerce or document repositories. Vald is a powerful tool for searching through huge amounts of vector data really fast, using the NGT (Neighborhood Graph and Tree) algorithm for high speed approximate nearest neighbor (ANN) search. It's built for vector only workloads and can easily grow as your needs get bigger. Vald scales by distributing vector indexes across machines and has features like dynamic indexing and index replication to ensure it performs well under high traffic or frequent updates. The key differences between Vespa and Vald include their search methods, data handling capabilities, scalability and performance, flexibility and customization, integration and ecosystem, usability, cost, and security. Ultimately, the choice between these two vector search tools depends on your specific use case, data diversity, and performance requirements.
Vespa vs MyScale Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 9, 2024
Author(s)
Chloe Williams
Language
English
Word count
1845
Hacker News points
None found.
Vespa and MyScale are two popular vector databases used in AI applications. A vector database is designed to store and query high-dimensional vectors, which represent complex information such as the semantic meaning of text or visual features of images. Common use cases for vector databases include e-commerce product recommendations, content discovery platforms, anomaly detection in cybersecurity, medical image analysis, and natural language processing tasks. Vespa is a powerful search engine and vector database that can handle multiple types of searches simultaneously, including vector search, text search, and structured data search. It uses its own special C++ engine for memory management and query processing, making it efficient even when dealing with complex queries and large amounts of data. Vespa also supports auto-scaling across multiple machines to optimize resource usage and costs. MyScale is a cloud-based database built on top of ClickHouse that combines vector search capabilities with SQL analytics. It integrates vector search directly with SQL, supporting multiple index types and common distance metrics. MyScale's proprietary MSTG vector engine uses NVMe SSDs to increase data density, outperforming specialized vector databases in both performance and cost. The choice between Vespa and MyScale depends on the specific requirements of your project. Vespa is ideal for applications that need multiple search types working together seamlessly, while MyScale works best for teams already using SQL databases who want to add vector search capabilities without learning new query languages. Ultimately, thorough benchmarking with your own datasets and query patterns will be key to making a decision between these two powerful but different approaches to vector search in distributed database systems.
Chroma vs ClickHouse: Choosing the Right Vector Database for Your Needs
Date published
Dec. 9, 2024
Author(s)
Chloe Williams
Language
English
Word count
2084
Hacker News points
None found.
Chroma and ClickHouse are two open-source databases that offer vector search capabilities, which are essential in AI applications. Chroma is an AI-native vector database designed to simplify the process of building AI applications by providing tools for managing vector data and enabling efficient similarity searches. It supports various types of data and integrates seamlessly with other AI tools and frameworks. ClickHouse, on the other hand, is a real-time OLAP database known for its high-speed query processing and full SQL support. It excels at handling large datasets and can integrate vector search functionality into its SQL framework. The choice between Chroma and ClickHouse depends on specific use cases. Chroma is best suited for teams building AI applications that require simplicity and speed, particularly for projects with in-memory datasets and no complex SQL operations. ClickHouse is the better choice when an application needs both vector search and complex data operations, especially for large-scale analytics platforms or enterprise applications where vector search needs to integrate with existing data warehousing solutions.
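Chroma's simplicity claim is easy to see in code: a minimal sketch using the chromadb client in its ephemeral in-memory mode, letting Chroma's default embedding function embed the documents; the collection name and documents are illustrative.

```python
import chromadb

client = chromadb.Client()                     # ephemeral, in-memory instance
collection = client.create_collection("docs")

# Chroma embeds the documents with its default embedding function.
collection.add(
    ids=["1", "2"],
    documents=["ClickHouse is an OLAP database.", "Chroma is AI-native."],
    metadatas=[{"topic": "olap"}, {"topic": "vector"}],
)

results = collection.query(query_texts=["Which database is AI-native?"], n_results=1)
print(results["documents"])
```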
Chroma vs Deep Lake: Choosing the Right Vector Database for Your Needs
Date published
Dec. 9, 2024
Author(s)
Chloe Williams
Language
English
Word count
1928
Hacker News points
None found.
Chroma and Deep Lake are two popular vector databases that cater to different needs in AI applications. Chroma is an open-source, AI-native vector database designed for simplicity and developer productivity, making it ideal for text-based LLM workflows and natural language processing tasks. On the other hand, Deep Lake is a specialized data lake system optimized for handling diverse multimedia embeddings, making it suitable for applications involving multiple data types such as images, videos, and audio files. The choice between Chroma and Deep Lake depends on the specific use case, with Chroma being more appropriate for text-heavy LLM workflows and Deep Lake better suited to multimedia or large-scale AI pipelines where flexibility and data lake capabilities are key.
Apache Cassandra vs Weaviate: Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 9, 2024
Author(s)
Chloe Williams
Language
English
Word count
1988
Hacker News points
None found.
Apache Cassandra and Weaviate are two notable vector databases designed to handle complex data structures like vector embeddings essential for AI applications. Apache Cassandra is an open-source, distributed NoSQL database system known for its high scalability, fault tolerance, and ability to operate in distributed environments with minimal downtime or performance degradation. With the release of Cassandra 5.0, it supports vector embeddings and vector search. Weaviate is an open-source vector database designed to simplify AI application development, offering built-in vector and hybrid search capabilities, easy integration with machine learning models, and a focus on data privacy. Choosing between Apache Cassandra and Weaviate for vector search depends on your needs. Key differences include their search methodology, data handling, scalability and performance, flexibility and customization, integration and ecosystem, usability, and cost. Apache Cassandra is good at scale, security, and handling diverse workloads, making it a great choice for enterprise-scale applications. Weaviate is good at simplicity, AI application development, and semantic search, making it suitable for small to mid-sized projects focused on AI innovation. Ultimately, the decision between these two powerful but different approaches to vector search in distributed database systems should be based on your use cases, data types, and performance requirements.
Qdrant vs ClickHouse Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 9, 2024
Author(s)
Chloe Williams
Language
English
Word count
1782
Hacker News points
None found.
Qdrant and ClickHouse are both vector databases designed to store and query high-dimensional vectors, which are numerical representations of unstructured data. They play a crucial role in AI applications, allowing for more advanced data analysis and retrieval. While Qdrant is a purpose-built vector database, ClickHouse is an open-source column-oriented database with vector search capabilities as an add-on. Qdrant excels in performance optimization and can work with high-dimensional vector data, making it a top choice for developers working on AI-driven projects. It offers flexible data modeling, rich query options, and features like automatic sharding and replication to help users scale as their data and query load grow. ClickHouse is great for vector search when you need to combine vector matching with metadata filtering or aggregation, especially for very large vector datasets that need parallel processing and when you combine vector search with SQL-based filtering and aggregation. Both systems have different approaches to vector search and serve different needs. Qdrant is a specialized vector database with optimized search algorithms and full vector operations, perfect for dedicated vector search applications. ClickHouse is a powerful analytical database that brings vector search into the SQL world, great for combining vector operations with broader data analytics. Choose what fits your use case, data volume, search requirements, existing infrastructure, and team expertise.
Qdrant vs Aerospike Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 9, 2024
Author(s)
Chloe Williams
Language
English
Word count
2019
Hacker News points
None found.
Qdrant and Aerospike are both vector databases designed to store and query high-dimensional vectors, which are numerical representations of unstructured data. They play a crucial role in AI applications, enabling efficient similarity searches for tasks such as e-commerce product recommendations, content discovery platforms, anomaly detection in cybersecurity, medical image analysis, and natural language processing (NLP). Qdrant is a purpose-built vector database that excels in performance optimization and can work with high-dimensional vector data. It allows you to store and index not just vectors but also payload data associated with each vector, enabling more powerful and nuanced search capabilities. Qdrant uses a custom version of the HNSW algorithm for indexing, allowing fast approximate nearest neighbor search. Aerospike is a distributed, scalable NoSQL database with vector search capabilities as an add-on. It supports Hierarchical Navigable Small World (HNSW) indexes for vector search and uses concurrent processing across nodes and advanced CPU instructions for scalability. Aerospike's vector search functionality is still in preview and its query ecosystem is evolving. Key differences between Qdrant and Aerospike include their search methodology, data handling, scalability and performance, flexibility and customization, integration and ecosystem, usability, pricing, and security features. The choice between the two depends on the project's use case, data and scalability requirements, and how these technologies fit into your long-term plans.
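As an illustration of the vector-plus-payload model described above, here is a hedged sketch using the qdrant-client Python library; the collection name, payload field, and vectors are hypothetical, and it assumes `pip install qdrant-client`.

```python
from qdrant_client import QdrantClient
from qdrant_client.models import (Distance, VectorParams, PointStruct,
                                  Filter, FieldCondition, MatchValue)

client = QdrantClient(":memory:")  # in-process instance for demonstration
client.create_collection(
    collection_name="products",
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
)

# Each point carries both a vector and arbitrary payload metadata.
client.upsert(
    collection_name="products",
    points=[PointStruct(id=1, vector=[0.1, 0.2, 0.3, 0.4],
                        payload={"category": "shoes"})],
)

# Combine approximate nearest neighbor search with a payload filter in one query.
hits = client.search(
    collection_name="products",
    query_vector=[0.1, 0.2, 0.3, 0.4],
    query_filter=Filter(must=[FieldCondition(key="category",
                                             match=MatchValue(value="shoes"))]),
    limit=3,
)
print(hits)
```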
Vespa vs Aerospike Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 7, 2024
Author(s)
Chloe Williams
Language
English
Word count
2420
Hacker News points
None found.
Vespa and Aerospike are both vector databases designed to store and query high-dimensional vectors, which are numerical representations of unstructured data. They play a crucial role in AI applications by enabling efficient similarity searches for tasks such as e-commerce product recommendations, content discovery platforms, anomaly detection in cybersecurity, medical image analysis, and natural language processing (NLP). Vespa is a purpose-built vector database that supports multiple types of searches all at once, including vector search, text search, and searching through structured data. It is built to be super fast and efficient, with the ability to automatically scale up to handle more data or traffic. Aerospike, on the other hand, is a distributed, scalable NoSQL database with vector search capabilities as an add-on. Vespa supports multiple search types in one engine, while Aerospike's vector search is based on Hierarchical Navigable Small World (HNSW) indexing. Vespa can handle structured, semi-structured, and unstructured data in one document, whereas Aerospike is optimized for real-time storage of structured and semi-structured data. Both databases are built for scalability but do it differently, with Vespa designed to automatically distribute data and processing across multiple nodes and adjust resource allocation dynamically, while Aerospike uses a distributed architecture where data is partitioned across nodes and both reads and writes are optimized for low latency access. The choice between Vespa and Aerospike depends on the use case, data types, and the balance between search complexity and performance. Users can evaluate these databases using VectorDBBench, an open-source benchmarking tool that allows users to test and compare different vector database systems like Milvus and Zilliz Cloud (the managed Milvus) using their own datasets.
Apache Cassandra vs Chroma: Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 7, 2024
Author(s)
Chloe Williams
Language
English
Word count
1861
Hacker News points
None found.
Apache Cassandra and Chroma are two notable vector databases that handle complex data structures like vector embeddings essential for AI applications. While both offer robust capabilities, they cater to different needs. Apache Cassandra is great for large-scale distributed operations and combines traditional database capabilities with vector search, making it ideal for enterprise environments with huge datasets spread across many servers that require high availability. Chroma, on the other hand, takes a streamlined, AI-focused approach that prioritizes developer experience and fast implementation, making it perfect for teams building AI applications that need quick implementation and simple vector search. The choice between these two powerful but different approaches to vector search in distributed database systems should be based on factors such as technical expertise, scale requirements, existing infrastructure, development timeline, and long-term scaling needs.
Vespa vs ClickHouse Choosing the Right Vector Database for Your AI Apps
Date published
Dec. 7, 2024
Author(s)
Chloe Williams
Language
English
Word count
2063
Hacker News points
None found.
Vespa and ClickHouse are both powerful tools used in AI applications, but they serve different purposes. Vespa is a purpose-built vector database designed to handle multiple types of searches all at once, including vector search, text search, and structured data search. It's great for handling large amounts of data without slowing down and can automatically scale up to handle more data or traffic. ClickHouse, on the other hand, is an open-source column-oriented database with vector search capabilities as an add-on. It's great for analytical queries because of its fully parallelized query pipeline and high compression ratios. The choice between Vespa and ClickHouse depends on your specific use case and operational requirements.
Deliver RAG Applications 10x Faster with Zilliz and Vectorize
Date published
Dec. 6, 2024
Author(s)
Jamie Ferguson
Language
English
Word count
723
Hacker News points
None found.
Vectorize has integrated with Milvus and Zilliz Cloud to simplify building and maintaining retrieval-augmented generation (RAG) pipelines that connect to various data sources and AI platforms. The integration makes it fast and easy to get high-quality data into a vector database, ensuring the latest and most relevant information is always available for AI applications. Zilliz Cloud is 10x faster than Milvus and offers reliable vector storage and high-performance vector search capabilities. Vectorize automates RAG pipelines and keeps embeddings up-to-date, allowing AI engineers to focus on creating accurate, reliable AI applications.
Designing Multi-Tenancy RAG with Milvus: Best Practices for Scalable Enterprise Knowledge Bases
Date published
Dec. 4, 2024
Author(s)
Robert Guo
Language
English
Word count
2261
Hacker News points
None found.
Retrieval-Augmented Generation (RAG) has emerged as a trusted solution for large organizations to enhance their Language Model-powered applications, especially those with diverse users. As these applications grow, implementing a multi-tenancy framework becomes essential. Multi-tenancy provides secure, isolated access to data for different user groups, ensuring user trust, meeting regulatory standards, and improving operational efficiency. Milvus is an open-source vector database built to handle high-dimensional vector data and is an indispensable infrastructure component of RAG, storing and retrieving contextual information for LLMs from external sources. Milvus offers flexible multi-tenancy strategies for various needs, including database-level, collection-level, and partition-level multi-tenancy.
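As a rough illustration of the partition-level strategy mentioned above, here is a minimal pymilvus sketch (assuming the MilvusClient API and Milvus Lite for local testing); the collection, partition, and tenant names are hypothetical.

```python
from pymilvus import MilvusClient

client = MilvusClient("milvus_demo.db")  # Milvus Lite; a server URI also works
client.create_collection(collection_name="kb", dimension=4)

# One partition per tenant keeps each tenant's vectors physically separated.
client.create_partition(collection_name="kb", partition_name="tenant_a")
client.insert(
    collection_name="kb",
    data=[{"id": 1, "vector": [0.1, 0.2, 0.3, 0.4], "text": "tenant A doc"}],
    partition_name="tenant_a",
)

# Scoping the search to the tenant's partition enforces isolation at query time.
hits = client.search(
    collection_name="kb",
    data=[[0.1, 0.2, 0.3, 0.4]],
    limit=3,
    partition_names=["tenant_a"],
)
print(hits)
```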
Evaluating Retrieval-Augmented Generation (RAG): Everything You Should Know
Date published
Dec. 3, 2024
Author(s)
Benito Martin
Language
English
Word count
2617
Hacker News points
None found.
Retrieval Augmented Generation (RAG) is a widely adopted approach to enhance Generative AI applications powered by Large Language Models (LLMs). By integrating external knowledge sources, RAG improves the model's ability to provide accurate and contextually relevant responses. Despite its potential, RAG-generated answers are not always entirely accurate or consistent with the retrieved knowledge. In a recent webinar, Stefan Webb, Developer Advocate at Zilliz, explored evaluation strategies for RAG applications, focusing on methods to assess the performance of LLMs and addressing current challenges and limitations in the field. The talk covered various RAG pipeline architectures, retrieval and evaluation frameworks, and examples of biases and failures in LLMs. RAG architecture includes semantic search, which leverages vector databases for efficient searching over unstructured data to retrieve semantically similar contexts relevant to a user's query. A modular approach to building the RAG pipeline enables incremental improvements at each stage, addressing specific challenges and enhancing the quality of generated outputs. Evaluating foundation models requires a nuanced approach, as different aspects of the pipeline need to be assessed. Performance evaluation includes task-based evaluation (using standard benchmarks) and self-evaluation (focusing on internal measures or introspection). Introspection-based evaluation can be divided into generation-based evaluation and retrieval-based evaluation, with relevant metrics such as faithfulness, answer relevancy, context relevance, and context recall. Challenges and limitations of LLM-as-a-Judge include position bias, verbosity bias, and incorrect judgments, which can persist even with chain-of-thought reasoning. Open-source evaluation frameworks like RAGAS, DeepEval, ARES, and HuggingFace Lighteval provide structured methodologies and tools to evaluate retrieval and generation performance effectively. The future of RAG lies in its adaptability and continuous refinement. Addressing current limitations and embracing innovative evaluation methods will be essential for unlocking the full potential of AI applications.
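As a hedged illustration of the introspection metrics mentioned above, here is a small sketch using the RAGAS framework's classic API (which varies across versions); the sample row is hypothetical, and a judge-LLM API key (e.g. OPENAI_API_KEY) is assumed to be configured in the environment.

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_recall

# One hypothetical RAG interaction: question, generated answer,
# retrieved contexts, and a reference ground-truth answer.
data = Dataset.from_dict({
    "question": ["What does RAG add to an LLM?"],
    "answer": ["It grounds responses in retrieved external context."],
    "contexts": [["RAG retrieves documents and passes them to the generator."]],
    "ground_truth": ["RAG augments generation with retrieved knowledge."],
})

# Each metric is itself scored by a judge LLM (LLM-as-a-Judge).
result = evaluate(data, metrics=[faithfulness, answer_relevancy, context_recall])
print(result)
```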
Elasticsearch Was Great, But Vector Databases Are the Future
Date published
Dec. 2, 2024
Author(s)
Jiang Chen
Language
English
Word count
1264
Hacker News points
None found.
The article discusses how semantic search is becoming more popular as AI technology advances, with embedding models and vector databases playing a central role in this shift. Semantic search surpasses keyword matching by representing data as vector embeddings, providing a more nuanced understanding of search intent and transforming applications ranging from retrieval-augmented generation (RAG) to multimodal search. Many organizations are adopting a hybrid search approach, combining the strengths of both semantic and full-text search methods to balance flexible relevance with predictable exact keyword matching. Vector databases like Milvus are poised to surpass Elasticsearch as the unified solution for hybrid search due to their superior performance, scalability, and efficiency in integrating dense vector search with optimized sparse vector techniques.
Weaviate vs Aerospike: Choosing the Right Vector Database for Your Needs
Date published
Dec. 1, 2024
Author(s)
Zilliz
Language
English
Word count
1918
Hacker News points
None found.
Weaviate and Aerospike are two options in the vector database space. Vector databases store high-dimensional vectors, which represent unstructured data such as text semantics, image features, or product attributes. They enable efficient similarity searches, playing a crucial role in AI applications for advanced data analysis and retrieval. Common use cases include e-commerce recommendations, content discovery platforms, cybersecurity anomaly detection, medical image analysis, and natural language processing tasks. Weaviate is an open-source vector database designed to simplify AI application development, offering built-in vector and hybrid search capabilities, easy integration with machine learning models, and a focus on data privacy. It uses HNSW indexing for fast and accurate similarity searches and supports combining vector searches with traditional filters. Weaviate is suitable for developers building AI applications, data engineers working with large datasets, and data scientists deploying machine learning models. Aerospike is a distributed, scalable NoSQL database with added support for vector search capabilities called Aerospike Vector Search (AVS). It uses HNSW indexes for vector search and has specialized hardware instructions (AVX) for parallel processing. AVS processes indexing queues in batches across the cluster, using all available CPU cores and pre-hydrating index caches during ingestion to boost query performance. The choice between Weaviate and Aerospike depends on specific use cases, data nature, and future scalability needs. Both technologies continue to evolve, so it's worth keeping an eye on their development as you make your decision.
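For a sense of Weaviate's developer-facing API, here is an illustrative sketch using the v4 Python client with a self-provided vector (no vectorizer module); the collection name and vectors are hypothetical, and a local Weaviate instance plus `pip install weaviate-client` are assumed.

```python
import weaviate
from weaviate.classes.config import Configure

client = weaviate.connect_to_local()

# Create a collection that stores caller-supplied vectors.
articles = client.collections.create(
    name="Article",
    vectorizer_config=Configure.Vectorizer.none(),
)

articles.data.insert(
    properties={"title": "Intro to vector search"},
    vector=[0.1, 0.2, 0.3, 0.4],
)

# HNSW-backed nearest neighbor query against the stored vectors.
res = articles.query.near_vector(near_vector=[0.1, 0.2, 0.3, 0.4], limit=2)
for obj in res.objects:
    print(obj.properties)

client.close()
```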
Weaviate vs Neo4j: Choosing the Right Vector Database for Your Needs
Date published
Dec. 1, 2024
Author(s)
Chloe Williams
Language
English
Word count
2121
Hacker News points
None found.
Weaviate and Neo4j are two popular vector databases that offer efficient similarity searches, making them crucial in AI applications. While both technologies have their strengths and trade-offs, the choice between them depends on specific use cases, data types, query complexity, and the importance of relationships versus semantic similarity. Weaviate is great for vector-centric workloads, multi-modal data, and ease of use, making it perfect for AI-driven applications. On the other hand, Neo4j excels in scenarios where relationships are key, as it's a mature graph database with vector search capabilities. Users can make informed decisions by testing these technologies with their own datasets using VectorDBBench, an open-source benchmarking tool.
Elasticsearch vs Neo4j Selecting the Right Database for GenAI Applications
Date published
Nov. 30, 2024
Author(s)
Chloe Williams
Language
English
Word count
2027
Hacker News points
None found.
Elasticsearch and Neo4j are two prominent databases with vector search capabilities that play a crucial role in AI applications such as recommendation engines, image retrieval, and semantic search. While both support vector search, they have different strengths and use cases. Elasticsearch is great for large-scale document search with its mature full-text search and efficient vector search, while Neo4j is good for combining relationship-based queries with vector similarity search. The choice between these two powerful but different approaches to vector search in distributed database systems should be based on the specific requirements of the application, including data structure, scale, and the importance of relationships between data points.
Chroma vs Neo4j: Choosing the Right Vector Database for Your Needs
Date published
Nov. 30, 2024
Author(s)
Chloe Williams
Language
English
Word count
2225
Hacker News points
None found.
Chroma and Neo4j are two popular vector databases that offer efficient similarity searches, making them suitable for AI applications. Chroma is an open-source, AI-native vector database designed to streamline the development of AI-powered applications by providing tools for managing vector data and metadata. It supports various types of data and integrates seamlessly with other AI tools and frameworks. On the other hand, Neo4j is a graph database that offers vector search capabilities as an add-on. Its strength lies in handling structured, semi-structured, and unstructured data by combining graph queries with vector search for hybrid applications. When choosing between Chroma and Neo4j, consider factors such as search methodology, data types, scalability, flexibility, integration, ease of use, cost, and security. Chroma is good for simplicity and embedding-centric workflows, while Neo4j is suitable for graph modeling and semantic search. The choice should match your specific use case, data types, and performance requirements.
Couchbase vs Neo4j Choosing the Right Vector Database for Your AI Apps
Date published
Nov. 30, 2024
Author(s)
Chloe Williams
Language
English
Word count
2025
Hacker News points
None found.
Couchbase and Neo4j are both distributed databases with vector search capabilities added on, but they differ in their core technology and use cases. Couchbase is a NoSQL document-oriented database that can store vector embeddings within JSON documents for similarity searches. It offers flexibility in data modeling and integrates well with external tools and frameworks. Neo4j is a graph database that combines graph relationships with vector embeddings, allowing for seamless integration of vector search and graph queries. It's suitable for applications requiring both graph structures and high-dimensional vector data. The choice between Couchbase and Neo4j depends on the specific use case, type of data being stored, and performance or integration requirements.
Couchbase vs Weaviate Choosing the Right Vector Database for Your AI Apps
Date published
Nov. 30, 2024
Author(s)
Chloe Williams
Language
English
Word count
2104
Hacker News points
None found.
Couchbase and Weaviate are both distributed databases designed to store high-dimensional vectors, which are numerical representations of unstructured data such as text or images. They play a crucial role in AI applications by enabling efficient similarity searches for tasks like recommendation systems, content discovery platforms, anomaly detection, medical image analysis, and natural language processing (NLP). Couchbase is an open-source NoSQL database that can be used to build applications for cloud, mobile, AI, and edge computing. It combines the strengths of relational databases with the versatility of JSON and provides flexibility to implement vector search despite not having native support for vector indexes. Developers can store vector embeddings within Couchbase documents as part of their JSON structure and perform similarity searches using Full Text Search (FTS) or external integrations like FAISS or HNSW. Weaviate is an open-source vector database designed to simplify AI application development, offering built-in vector and hybrid search capabilities, easy integration with machine learning models, and a focus on data privacy. It uses HNSW indexing for fast and accurate similarity searches and supports combining vector searches with traditional filters for more granular queries. Key differences between Couchbase and Weaviate include their search methodology, data handling capabilities, scalability and performance, flexibility and customization options, integration and ecosystem support, ease of use, cost considerations, and security features. The choice between the two should be based on an application's priorities and specific requirements for vector search functionality, general-purpose database operations, or AI/ML workflows.
pgvector vs Neo4j: Choosing the Right Vector Database for Your Needs
Date published
Nov. 30, 2024
Author(s)
Chloe Williams
Language
English
Word count
2101
Hacker News points
None found.
Both pgvector and Neo4j are vector databases that store high-dimensional vectors to enable efficient similarity searches, which play a crucial role in AI applications. However, they have different approaches and features. pgvector is an extension for PostgreSQL that adds support for vector operations, allowing users to store and query vector embeddings directly within their PostgreSQL database. It supports both exact and approximate nearest neighbor search and integrates with PostgreSQL's indexing mechanisms. Neo4j is a graph database that allows developers to create vector indexes to search for similar data across their graph. It uses HNSW graphs for fast approximate k-nearest neighbor searches within the context of a graph database. Key differences between pgvector and Neo4j include:
1. Search Methodology: While both support distance metrics like cosine similarity and Euclidean distance, Neo4j's graph relationships add complexity for hybrid graph + vector search scenarios.
2. Data Handling: pgvector suits environments where structured and semi-structured data is handled natively by PostgreSQL, while Neo4j is optimized for graph data.
3. Scalability and Performance: Neo4j supports native distributed graph storage and query execution, making it better suited for large datasets or scenarios that benefit from a distributed architecture.
4. Flexibility and Customization: pgvector provides direct integration with PostgreSQL's indexing and querying mechanisms, while Neo4j allows customization through its query language (Cypher).
5. Integration and Ecosystem: Both systems integrate well with their respective ecosystems; the right fit depends on whether your stack revolves around relational or graph data tools.
6. Ease of Use: pgvector is easier to adopt for PostgreSQL users, while Neo4j has a steeper learning curve for teams without graph database experience.
7. Cost: pgvector is a free, open-source extension that runs on existing PostgreSQL infrastructure, while Neo4j's enterprise deployments involve commercial licensing.
8. Security: Both systems have robust security options, but implementation differs.
The choice between pgvector and Neo4j ultimately depends on your use case, data type, workload complexity, scaling needs, and integration requirements.
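As a concrete illustration of pgvector's PostgreSQL-native workflow, here is a hedged sketch using psycopg2; the table, index, and connection details are hypothetical and assume a PostgreSQL instance with the pgvector extension available.

```python
import psycopg2  # assumes `pip install psycopg2-binary`

conn = psycopg2.connect("dbname=mydb")  # hypothetical connection string
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute("CREATE TABLE IF NOT EXISTS items "
            "(id bigserial PRIMARY KEY, embedding vector(3));")
# HNSW index for approximate nearest neighbor search (IVFFlat is the other option).
cur.execute("CREATE INDEX IF NOT EXISTS items_hnsw "
            "ON items USING hnsw (embedding vector_l2_ops);")

cur.execute("INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');")

# `<->` is pgvector's Euclidean distance operator; `<=>` gives cosine distance.
cur.execute("SELECT id FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;")
print(cur.fetchall())

conn.commit()
cur.close()
conn.close()
```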
Couchbase vs Vespa Choosing the Right Vector Database for Your AI Apps
Date published
Nov. 30, 2024
Author(s)
Chloe Williams
Language
English
Word count
1774
Hacker News points
None found.
Couchbase and Vespa are both distributed databases with vector search capabilities, but they differ in their approach to handling vector data. Couchbase is a NoSQL database that can be adapted to support vector search through various methods such as Full Text Search (FTS) or integrating with external libraries like FAISS or HNSW. Vespa, on the other hand, is a purpose-built vector database with built-in vector search capabilities and supports multiple search types simultaneously. When choosing between Couchbase and Vespa for your AI applications, consider factors such as native support vs adapted solutions, performance and scalability, data handling, ease of implementation, and specific requirements for vector search implementation. Additionally, using open-source benchmarking tools like VectorDBBench can help you evaluate and compare the performance of these vector databases on your own datasets.
MongoDB vs Neo4j: Selecting the Right Database for GenAI Applications
Date published
Nov. 30, 2024
Author(s)
Chloe Williams
Language
English
Word count
1870
Hacker News points
None found.
MongoDB Atlas Vector Search and Neo4j are two prominent databases with vector search capabilities, essential for recommendation engines, image retrieval, and semantic search in AI-driven applications. Both use the Hierarchical Navigable Small World (HNSW) algorithm for indexing and searching vector data. MongoDB Atlas Vector Search is built into its document-based architecture, while Neo4j has vector search built into its graph structure. The choice between them depends on factors such as data model, application requirements, scaling needs, and integration preferences.
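To illustrate what an Atlas Vector Search query looks like in practice, here is a hedged pymongo sketch using the $vectorSearch aggregation stage; the index, field, and collection names are hypothetical and assume a vector index has already been created in Atlas.

```python
from pymongo import MongoClient

client = MongoClient("mongodb+srv://...")  # hypothetical Atlas connection string
coll = client["shop"]["products"]

pipeline = [{
    "$vectorSearch": {
        "index": "vector_index",   # name of the Atlas vector index
        "path": "embedding",       # document field holding the vector
        "queryVector": [0.1, 0.2, 0.3, 0.4],
        "numCandidates": 100,      # ANN candidates considered (HNSW)
        "limit": 5,
    }
}]

# Returns the five nearest documents by vector similarity.
for doc in coll.aggregate(pipeline):
    print(doc["_id"])
```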
Redis vs Neo4j: Choosing the Right Vector Database for Your Needs
Date published
Nov. 30, 2024
Author(s)
Chloe Williams
Language
English
Word count
1890
Hacker News points
None found.
Redis and Neo4j are two popular vector databases that offer efficient similarity searches, making them crucial in AI applications. While both support vector search capabilities, they differ in their core technologies, data handling methods, scalability, performance, integration, ease of use, cost considerations, and security features. Redis is an in-memory database with a simpler learning curve and faster real-time vector search operations, making it suitable for applications that require instant responses like recommendation engines or chatbots. On the other hand, Neo4j combines graph capabilities with vector search features, making it ideal for applications that need to analyze patterns in connected data such as knowledge graphs or social networks. The choice between Redis and Neo4j depends on specific use cases and performance requirements.
Couchbase vs Qdrant Choosing the Right Vector Database for Your AI Apps
Date published
Nov. 30, 2024
Author(s)
Chloe Williams
Language
English
Word count
2034
Hacker News points
None found.
Couchbase and Qdrant are both vector databases designed to store and query high-dimensional vectors, which are numerical representations of unstructured data. They play a crucial role in AI applications by enabling efficient similarity searches for tasks such as e-commerce product recommendations, content discovery platforms, anomaly detection in cybersecurity, medical image analysis, natural language processing (NLP), and Retrieval Augmented Generation (RAG). Couchbase is a distributed multi-model NoSQL document-oriented database with vector search capabilities. It can store vector embeddings within its JSON structure and perform similarity searches using Full Text Search (FTS) or application-side computations. Couchbase integrates with specialized libraries or algorithms like FAISS or HNSW for more advanced use cases. Qdrant is a purpose-built vector database designed specifically for similarity search and machine learning applications. It uses a custom version of the HNSW algorithm for indexing, allowing fast approximate nearest neighbor searches. Qdrant supports both vector similarity and metadata-based filtering, making it suitable for complex queries that combine these features. The choice between Couchbase and Qdrant depends on the specific use case, existing infrastructure, and priorities. Couchbase is best suited for general-purpose NoSQL functionality alongside occasional vector search capabilities, while Qdrant excels at managing and querying high-dimensional vector data with speed and precision, making it ideal for AI and machine learning applications where vector search is central to the application.
Couchbase vs Pinecone Choosing the Right Vector Database for Your AI Apps
Date published
Nov. 30, 2024
Author(s)
Chloe Williams
Language
English
Word count
1865
Hacker News points
None found.
Couchbase and Pinecone are both distributed databases designed to store and query high-dimensional vectors, which are numerical representations of unstructured data. They play a crucial role in AI applications by enabling efficient similarity searches for tasks like recommendation systems or retrieval-augmented generation. While Couchbase is an open-source NoSQL database that can be adapted for vector search, Pinecone is a purpose-built vector database with native support for vector indexes and compatibility with machine learning models. The choice between the two depends on factors such as infrastructure preferences, scaling needs, and whether vector search is primary or secondary in your application architecture.
Couchbase vs Redis Choosing the Right Vector Database for Your AI Apps
Date published
Nov. 30, 2024
Author(s)
Chloe Williams
Language
English
Word count
1746
Hacker News points
None found.
Couchbase is a distributed multi-model NoSQL document-oriented database that can be used to build applications for cloud, mobile, AI, and edge computing. It combines the strengths of relational databases with the versatility of JSON. Couchbase provides flexibility to implement vector search despite not having native support for vector indexes. Developers can store vector embeddings within Couchbase documents as part of their JSON structure. These vectors can be used in similarity search use cases, such as recommendation systems or retrieval-augmented generation, both of which rely on semantic search, where finding data points close to each other in a high-dimensional space is important. Couchbase enables efficient similarity searches by leveraging Full Text Search (FTS), which converts vector data into searchable fields, or by storing raw vector embeddings and performing the comparison at the application level; in the latter case, Couchbase serves as a storage layer for vectors while the application handles the mathematical comparison logic. For more advanced use cases, developers can integrate Couchbase with specialized libraries or algorithms that enable efficient vector search. Redis, on the other hand, is an in-memory database that has added vector search capabilities through its Redis Vector Library. Redis uses FLAT and HNSW (Hierarchical Navigable Small World) algorithms for approximate nearest neighbor search, which allow for fast and accurate search in high-dimensional vector spaces. One of the main strengths of Redis vector search is that it can combine vector similarity search with traditional filtering on other attributes. The Redis Vector Library provides a simple interface for developers to work with vector data in Redis, featuring flexible schema design, custom vector queries, and extensions for LLM-related tasks like semantic caching and session management. When choosing between Couchbase and Redis, the decision depends on specific needs such as data size, search speed requirements, and scaling needs. Redis is recommended for real-time applications that need fast vector similarity searches, while Couchbase offers flexibility and strong enterprise features, making it a good fit for complex, large-scale applications. Ultimately, thorough benchmarking with actual datasets and query patterns will be key to making a decision between these two powerful but different approaches to vector search in distributed database systems.
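As an illustration of combining vector similarity with attribute filtering in Redis, here is a hedged sketch using the lower-level redis-py search API (rather than the Redis Vector Library the post covers); the index name, fields, and vectors are hypothetical, and a Redis Stack instance plus `pip install redis numpy` are assumed.

```python
import numpy as np
import redis
from redis.commands.search.field import TagField, VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query

r = redis.Redis()

# Define an index with a tag field for filtering and an HNSW vector field.
r.ft("idx:docs").create_index(
    (TagField("category"),
     VectorField("embedding", "HNSW",
                 {"TYPE": "FLOAT32", "DIM": 4, "DISTANCE_METRIC": "COSINE"})),
    definition=IndexDefinition(prefix=["doc:"], index_type=IndexType.HASH),
)

vec = np.array([0.1, 0.2, 0.3, 0.4], dtype=np.float32)
r.hset("doc:1", mapping={"category": "shoes", "embedding": vec.tobytes()})

# KNN search restricted to documents whose `category` tag matches.
q = (Query("(@category:{shoes})=>[KNN 3 @embedding $vec AS score]")
     .sort_by("score").return_fields("score").dialect(2))
print(r.ft("idx:docs").search(q, query_params={"vec": vec.tobytes()}).docs)
```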
GLiNER: Generalist Model for Named Entity Recognition Using Bidirectional Transformer
Date published
Nov. 30, 2024
Author(s)
Haziqa Sajid
Language
English
Word count
2631
Hacker News points
None found.
GLiNER is an open-source Named Entity Recognition (NER) model using a bidirectional transformer encoder, designed to improve efficiency, scalability, and multilingual performance while maintaining accuracy. It outperforms both ChatGPT and fine-tuned LLMs like UniNER in zero-shot evaluations across various NER benchmarks, including those in multiple languages. GLiNER's architecture is effective across different BiLMs (Bidirectional Language Models) and achieves strong performance with smaller model sizes than large LLMs. Its ability to generalize across various domains and languages makes it a promising solution for scenarios with limited labeled data.
Mixture-of-Agents (MoA): How Collective Intelligence Elevates LLM Performance
Date published
Nov. 29, 2024
Author(s)
Ruben Winastwan
Language
English
Word count
2245
Hacker News points
None found.
The Mixture-of-Agents (MoA) approach combines multiple large language models (LLMs) with different specialties into a single system to improve overall performance and tackle multi-domain use cases. By leveraging the unique strengths of each LLM, MoA generates higher quality outputs compared to direct input prompts. The MoA framework consists of layers containing specialized LLMs that collaborate to solve tasks iteratively. It has been evaluated on benchmark datasets such as AlpacaEval 2.0 and MT-Bench, demonstrating superior performance over state-of-the-art models like GPT-4 family. However, the reliance on multiple LLMs increases latency, impacting user experience due to higher Time to First Token (TTFT). Future work aims to address this by implementing chunk-wise response aggregation while maintaining its performance.
Couchbase vs MongoDB Choosing the Right Vector Database for Your AI Apps
Date published
Nov. 28, 2024
Author(s)
Chloe Williams
Language
English
Word count
1991
Hacker News points
None found.
Couchbase and MongoDB are both NoSQL databases with vector search capabilities as an add-on. Couchbase is a distributed, open-source, multi-model database that can be adapted to handle vector search functionality using workarounds like tokenizing vectors for Full Text Search (FTS) or performing similarity computations at the application level. MongoDB Atlas Vector Search has native support for vector embeddings and indexing with HNSW for Approximate Nearest Neighbor (ANN) searches, as well as Exact Nearest Neighbors (ENN) for small scale queries. Key differences include search methodology, data handling, scalability and performance, flexibility and customization, integration and ecosystem, ease of use, cost, and security. The choice between Couchbase and MongoDB depends on the specific use case and requirements of the user.
Couchbase vs pgvector Choosing the Right Vector Database for Your AI Apps
Date published
Nov. 28, 2024
Author(s)
Chloe Williams
Language
English
Word count
1801
Hacker News points
None found.
Couchbase and pgvector are both distributed databases with vector search capabilities, but they differ in their approach to handling vector data. Couchbase is a NoSQL document-oriented database that can be adapted to handle vector search by storing vector embeddings within JSON documents or integrating with specialized libraries like FAISS. On the other hand, pgvector is an extension for PostgreSQL that adds support for vector operations directly within the relational database, offering built-in vector indexing options and native vector operations. When choosing between Couchbase and pgvector, consider factors such as your existing infrastructure, scaling needs, and whether you prefer built-in vector operations (pgvector) or implementation flexibility (Couchbase). Additionally, benchmarking with your own datasets and query patterns will be key to making a decision based on actual performance.
Couchbase vs FAISS Choosing the Right Vector Database for Your AI Apps
Date published
Nov. 28, 2024
Author(s)
Chloe Williams
Language
English
Word count
1540
Hacker News points
None found.
Couchbase and FAISS are both used in AI applications but serve different purposes. Couchbase is a distributed, open-source NoSQL document-oriented database that can be adapted to handle vector search functionality through Full Text Search or application-level calculations. Faiss (Facebook AI Similarity Search), on the other hand, is an open-source library designed for efficient vector similarity search and clustering of dense vectors. While Couchbase provides full database features including JSON document storage, indexing, querying, and ACID transactions, Faiss offers only vector operations. Therefore, Couchbase is best when you need a database that can do both traditional data operations and vector search, while Faiss is the clear winner for vector search only, especially in AI and machine learning applications where high-performance similarity search is key.
pgvector vs Vearch: Choosing the Right Vector Database for Your Needs
Date published
Nov. 27, 2024
Author(s)
Chloe Williams
Language
English
Word count
1654
Hacker News points
None found.
The choice between pgvector and Vearch as a vector database depends on various factors such as existing infrastructure, scale requirements, and specific features needed. Pgvector is an extension for PostgreSQL that adds support for vector operations, making it ideal for teams already using PostgreSQL and wanting to add vector search capabilities within their existing database setup. On the other hand, Vearch is a purpose-built vector database designed for large-scale AI applications requiring fast hybrid search capabilities and horizontal scalability. It offers distributed architecture with specialized nodes and supports GPU acceleration. To make an informed decision, users should consider their team's expertise with distributed systems versus traditional databases and thoroughly benchmark the performance of these tools using their own datasets and query patterns.
pgvector vs Vald: Choosing the Right Vector Database for Your Needs
Date published
Nov. 27, 2024
Author(s)
Chloe Williams
Language
English
Word count
1484
Hacker News points
None found.
Choosing between pgvector and Vald depends on specific use cases and requirements. Both are vector databases designed to store and query high-dimensional vectors, enabling efficient similarity searches in AI applications. However, they differ in their core technologies, search performance methodology, data management capabilities, scalability, integration ease, and cost analysis. pgvector is an extension for PostgreSQL that adds support for vector operations, allowing users to store and query vector embeddings directly within their PostgreSQL database. It supports both exact and approximate nearest neighbor searches through HNSW and IVFFlat indexes. pgvector integrates with PostgreSQL's indexing mechanisms and is suitable for applications that already use PostgreSQL and need vector search capabilities along with regular database operations. Vald, on the other hand, is a purpose-built vector database designed for massive vector datasets requiring high availability and real-time processing. It uses NGT (Neighborhood Graph and Tree) for approximate nearest neighbor search and excels in distributed systems with automatic sharding, replication, and live index updates across multiple nodes. Vald is ideal for large scale image recognition, real-time recommendation engines, and systems that need continuous index updates without downtime, especially when scaling across multiple machines. To make an informed decision between pgvector and Vald, developers should consider their specific use case, infrastructure, and operational requirements. Additionally, using open-source benchmarking tools like VectorDBBench can help evaluate these vector databases based on actual performance with custom datasets and query patterns.
pgvector vs MyScale: Choosing the Right Vector Database for Your Needs
Date published
Nov. 27, 2024
Author(s)
Chloe Williams
Language
English
Word count
1577
Hacker News points
None found.
The article compares two vector databases, pgvector and MyScale, to help users make an informed decision based on their specific needs. A vector database is designed to store and query high-dimensional vectors, which are numerical representations of unstructured data such as text or images. Common use cases for vector databases include e-commerce product recommendations, content discovery platforms, anomaly detection in cybersecurity, medical image analysis, and natural language processing (NLP) tasks. pgvector is an extension for PostgreSQL that adds support for vector operations, allowing users to store and query vector embeddings directly within their PostgreSQL database. It supports exact and approximate nearest neighbor search, integration with PostgreSQL's indexing mechanisms, and various distance metrics (Euclidean, cosine, inner product). MyScale is a cloud-based database built on top of the open source ClickHouse database, designed for AI and machine learning workloads. It combines vector search and SQL analytics with added vector search capabilities. MyScale supports multiple vector index types and similarity metrics to support different use cases and offers native SQL support, making it accessible to developers familiar with relational databases. Key differences between pgvector and MyScale include their search methodology, data handling, scalability and performance, flexibility and customization, integration and ecosystem, and ease of use. Users should choose pgvector when they already use PostgreSQL, need basic vector search capabilities, work with moderate-sized datasets, and want to avoid managing multiple databases. On the other hand, users should choose MyScale when they need advanced vector indexing options, combined vector and full-text search capabilities, high-performance scaling for large datasets, built-in monitoring for LLM systems, or plan to handle complex data types requiring sophisticated query operations. The article also introduces VectorDBBench, an open-source benchmarking tool that allows users to test and compare different vector database systems using their own datasets and find the one that fits their use cases.
Making Sense of the Vector Database Landscape
Date published
Nov. 27, 2024
Author(s)
Emily Kurze
Language
English
Word count
337
Hacker News points
None found.
By 2025, 90% of new data will be unstructured, creating challenges in modern data management but also opportunities to innovate in AI and search systems. Vector databases are designed to store and query high-dimensional vector embeddings, transforming unstructured data into actionable insights. However, the rapidly evolving landscape presents a challenge for organizations seeking the right solution. The Definitive Guide to Choosing a Vector Database provides insights on why purpose-built vector databases outperform traditional systems, how Approximate Nearest Neighbor (ANN) algorithms enable fast searches, and key features for AI applications. It also compares top players in the market and offers guidance on running benchmarks using open-source tools to evaluate performance, scalability, and cost-effectiveness.
LLaVA: Advancing Vision-Language Models Through Visual Instruction Tuning
Date published
Nov. 25, 2024
Author(s)
Ruben Winastwan
Language
English
Word count
2590
Hacker News points
None found.
LLaVA (Large Language and Vision Assistant) is a pioneering effort to implement text-based instruction for visual-based models, combining large language models with visual processing capabilities. It uses pre-trained LLMs like Vicuna to process textual instructions and the visual encoder from pre-trained CLIP, a ViT model, to process image information. LLaVA is fine-tuned on multimodal instruction-following data generated using GPT-4 or ChatGPT, enabling it to perform tasks like summarizing visual content, extracting information from images, and answering questions about visual data. The evaluation results demonstrate the effectiveness of visual instruction tuning, as LLaVA's performance consistently outperforms two other visual-based models: BLIP-2 and OpenFlamingo.
Advanced RAG Techniques: Bridging Text and Visuals for More Accurate Responses
Date published
Nov. 24, 2024
Author(s)
Fendy Feng and Simon Mwaniki
Language
English
Word count
2327
Hacker News points
None found.
Retrieval-Augmented Generation (RAG) is a technique that combines large language models' generative abilities with retrieval systems to fetch relevant information from external sources, improving the accuracy and contextual relevance of AI responses. Advanced RAG techniques like Small to Slide enhance performance when dealing with visual data such as presentations or documents with images. RAG methods require infrastructure to manage complex queries and retrieval operations, and emerging techniques like ColPali work directly with visual features of documents, enabling it to index and retrieve information without the error-prone step of text extraction.
Stop Waiting, Start Building: Voice Assistant With Milvus and Llama 3.2
Date published
Nov. 23, 2024
Author(s)
Stephen Batifol
Language
English
Word count
1335
Hacker News points
None found.
This blog guides users through building a voice assistant using open-source projects such as Milvus, Llama 3.2, and various GenAI technologies including Assembly AI, DuckDuckGo, and ElevenLabs. The voice assistant is designed for voice interactions and uses an agentic Retrieval-Augmented Generation (RAG) system. Key technologies used include Milvus, a high-performance vector database; Llama 3.2, an advanced large language model; Assembly AI for speech-to-text conversion; DuckDuckGo for privacy-focused search results; and ElevenLabs for voice synthesis. The architecture of the RAG system is broken down into multiple components, each handling a specific part of the process. The system retrieves information from various sources simultaneously, including the Milvus knowledge base, calendar integration, and web search fallback. The results showcase a modular design with full control, privacy-focused data management, and true ownership and control of the AI stack.
Elasticsearch vs Aerospike: Selecting the Right Database for GenAI Applications
Date published
Nov. 23, 2024
Author(s)
Chloe Williams
Language
English
Word count
2027
Hacker News points
None found.
Elasticsearch and Aerospike are two prominent databases with vector search capabilities, essential for applications such as recommendation engines, image retrieval, and semantic search. Both provide robust support for handling vector search, but they differ in their architecture, implementation, data management, performance, scalability, integration, and additional features. Elasticsearch is built on top of Apache Lucene and is a go-to search engine for heavy applications and log analytics. It has added vector search capabilities to support AI use cases like image recognition, document retrieval, and Generative AI. Aerospike is a NoSQL database for high-performance real-time applications with vector indexing and searching capabilities called Aerospike Vector Search (AVS). The choice between Elasticsearch and Aerospike depends on technical requirements, project timeline, existing infrastructure, data consistency needs, processing power, and whether the deployment is needed immediately or can work with preview features.
Elasticsearch vs ClickHouse: Selecting the Right Database for GenAI Applications
Date published
Nov. 23, 2024
Author(s)
Chloe Williams
Language
English
Word count
2281
Hacker News points
None found.
Elasticsearch and ClickHouse are two prominent databases with vector search capabilities, essential for recommendation engines, image retrieval, and semantic search in AI-driven applications. While both have strengths and weaknesses, the choice between them depends on specific requirements such as search methodology, data types, scalability, flexibility, integration, ease of use, cost, and security. Elasticsearch is good for real-time hybrid search with a mature ecosystem and user-friendly APIs, while ClickHouse is suitable for large scale analytics with SQL centric workflows and scalable architecture. Evaluating these databases using VectorDBBench can help users make an informed decision based on their use case.
Elasticsearch vs Vearch Selecting the Right Database for GenAI Applications
Date published
Nov. 23, 2024
Author(s)
Chloe Williams
Language
English
Word count
2279
Hacker News points
None found.
Elasticsearch and Vearch are two prominent databases with vector search capabilities that play a crucial role in AI applications such as recommendation engines, image retrieval, and semantic search. While both have vector search capabilities, they serve different needs and excel in different scenarios. Elasticsearch is versatile, with a mature ecosystem and hybrid search capabilities, making it suitable for traditional and emerging search workloads. Vearch is optimized for AI applications and delivers fast and efficient similarity search for embedding-heavy use cases. The choice between these two powerful but different approaches to vector search in distributed database systems depends on the specific requirements and goals of your project.
Elasticsearch vs Deep Lake: Selecting the Right Database for GenAI Applications
Date published
Nov. 23, 2024
Author(s)
Chloe Williams
Language
English
Word count
2293
Hacker News points
None found.
Elasticsearch and Deep Lake are two prominent databases with vector search capabilities, essential for applications such as recommendation engines, image retrieval, and semantic search. While both have vector search capabilities, they serve different use cases and requirements. Elasticsearch is a general-purpose search engine that can handle both traditional and vector search needs at scale, while Deep Lake is focused on AI/ML workloads and unstructured data management. The choice between these tools comes down to the specific needs of the user, including their existing infrastructure, use case, team expertise, and future scaling needs.
Elasticsearch vs Vald Selecting the Right Database for GenAI Applications
Date published
Nov. 23, 2024
Author(s)
Chloe Williams
Language
English
Word count
1869
Hacker News points
None found.
Elasticsearch and Vald are two prominent databases with vector search capabilities that play a crucial role in AI applications such as recommendation engines, image retrieval, and semantic search. While Elasticsearch is an open-source search engine built on Apache Lucene with vector search as an add-on, Vald is a purpose-built vector database. The choice between the two depends on specific requirements, with Elasticsearch being best for combined search needs and Vald being suitable for pure vector search at scale.
Elasticsearch vs Rockset Selecting the Right Database for GenAI Applications
Date published
Nov. 23, 2024
Author(s)
Chloe Williams
Language
English
Word count
2121
Hacker News points
None found.
Elasticsearch and Rockset are two prominent databases with vector search capabilities that play a crucial role in AI applications such as recommendation engines, image retrieval, and semantic search. Both offer robust capabilities for handling vector search but have different strengths and weaknesses. Elasticsearch is built on Apache Lucene and is known for real-time indexing and full-text search, while Rockset is a search and analytics database designed for structured and unstructured data, including vector embeddings. When choosing between the two, it depends on your use case, technical requirements, and constraints. Elasticsearch is good for its maturity, hybrid search, and text-heavy workloads, making it suitable for e-commerce, log analytics, and document retrieval where you need hybrid searches that combine full-text search and vector similarity. On the other hand, Rockset is better for real-time analytics and applications that require low latency updates, making it ideal for dynamic environments like event-driven architectures, live dashboards, and AI-powered applications. In conclusion, thorough benchmarking with your own datasets and query patterns will be key to making a decision between these two powerful but different approaches to vector search in distributed database systems.
Elasticsearch vs MyScale Selecting the Right Database for GenAI Applications
Date published
Nov. 23, 2024
Author(s)
Chloe Williams
Language
English
Word count
1932
Hacker News points
None found.
Elasticsearch and MyScale are two prominent databases with vector search capabilities that play a crucial role in AI applications such as recommendation engines, image retrieval, and semantic search. While Elasticsearch is built on Apache Lucene's library and focuses on search and analytics, MyScale is built on ClickHouse and designed for AI and machine learning workloads. The choice between the two depends on factors like technical requirements, existing infrastructure, team expertise, storage needs, and whether hybrid search capabilities or AI workload optimization are needed.
Zilliz Cloud’s Redesigned UI: A Streamlined and Intuitive User Experience
Date published
Nov. 20, 2024
Author(s)
Koko Lv
Language
English
Word count
2777
Hacker News points
None found.
Zilliz Cloud has released a redesigned user interface (UI) to streamline workflows, reduce cognitive load, and boost productivity for developers. The new UI is more intuitive and specifically designed to support enterprise-level GenAI applications. It includes features such as multi-replica, data migration, and an improved Cardinal vector search engine for a 10x performance boost. The redesign was driven by the need to maintain a great user experience in the competitive vector database market.
New for Zilliz Cloud: 10X Performance Boost and Enhanced Enterprise Features
Date published
Nov. 19, 2024
Author(s)
Steffi Li
Language
English
Word count
556
Hacker News points
None found.
Zilliz Cloud has released a new version with enhanced features and performance improvements. The key highlights include Cardinal, a vector search engine that delivers a 10X performance boost in production environments, and various enterprise-ready features such as multi-replica support for high-traffic applications, increased capacity of compute units by 50%, enhanced observability with Prometheus integration, simplified data migration from Qdrant and Pinecone Serverless, Auth0-based authentication system, global expansion to AWS Tokyo region, and developer experience improvements. These features are available now across all Zilliz Cloud deployments, with a free tier and 30-day enterprise trial offered.
Enabling Fine-Grained Access Control with Milvus Row-Level RBAC
Date published
Nov. 16, 2024
Author(s)
Ken Zhang
Language
English
Word count
2195
Hacker News points
None found.
Access control is crucial in modern data systems, especially for industries handling sensitive information like healthcare and finance. Milvus offers a fine-grained RBAC solution based on a permission model that uses bitmap indexing to enable row-level access control. This feature allows you to control access to specific Milvus resources and permissions based on user roles and privileges. The implementation of fine-grained access control not only enhances security but also offers flexibility for evolving business needs, ensuring that access policies can adapt as roles and responsibilities change. With its powerful tools and flexible permissions model, Milvus empowers organizations to create highly secure, scalable data systems that meet regulatory requirements while offering seamless access to the right people.
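As a rough illustration of the role-and-privilege primitives this model builds on, here is a hedged pymilvus sketch (assuming a Milvus server with authentication enabled); the user, role, and collection names are hypothetical, and the row-level policies described in the post layer on top of these roles.

```python
from pymilvus import MilvusClient

# Connect as an admin user; "root:Milvus" is the documented default credential.
client = MilvusClient(uri="http://localhost:19530", token="root:Milvus")

client.create_user(user_name="analyst", password="example-pass")
client.create_role(role_name="readonly")

# Allow the role to run searches on one specific collection only.
client.grant_privilege(role_name="readonly", object_type="Collection",
                       privilege="Search", object_name="patient_records")
client.grant_role(user_name="analyst", role_name="readonly")
```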
Learn Llama 3.2 and How to Build a RAG Pipeline with Llama and Milvus
Date published
Nov. 15, 2024
Author(s)
Benito Martin
Language
English
Word count
2764
Hacker News points
None found.
Meta has released a series of powerful open-source models called Llama, including Llama 3, Llama 3.1, and Llama 3.2 in just six months. These models are designed to narrow the gap between proprietary and open-source tools, offering developers valuable resources to push the boundaries of their projects. The recent Unstructured Data Meetup hosted by Zilliz discussed the rapid evolution of the Llama models since 2023, advancements in open-source AI, and the architecture of these models. The talk covered up to Llama 3.1, with some notes on Llama 3.2 focusing mainly on size and version differences. The Llama series is based on a decoder-only transformer architecture and can be divided into two main categories: core models and safeguards. The core models are further categorized by size and purpose, while the safeguard tools include LlamaGuard 3, Prompt Guard, CyberSecEval 3, and Code Shield. These models have been trained and fine-tuned on representative datasets and evaluated rigorously for harmful content to ensure safe and reliable use in AI applications. In addition to the core models, Meta has released specialized models like LlamaGuard to promote responsible and safe AI development. The Llama System (Llama Stack API) is a set of standard interfaces that can be used to build adapters for different applications. By providing high-performance models to the public, Meta is fostering innovation in AI and encouraging collaboration within the open-source community.
Deploying a Multimodal RAG System Using vLLM and Milvus
Date published
Nov. 13, 2024
Author(s)
Stephen Batifol
Language
English
Word count
1636
Hacker News points
None found.
This blog post guides users through creating a Multimodal Retrieval Augmented Generation (RAG) system using the open-source solutions Milvus and vLLM. The tutorial demonstrates how to self-host an AI application, providing full control over the technology while enhancing its capabilities. By leveraging the power of an open-source vector database combined with open-source LLM inference, users can design a system capable of processing and understanding multiple types of data: text, images, audio, and even video. The resulting multimodal RAG system is flexible, scalable, and under complete user control, mitigating risks associated with relying solely on cloud API providers.
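Because the post's emphasis is self-hosted inference, it helps to see the generation stage in miniature. This sketch covers only that stage with vLLM's offline API, assuming retrieval via Milvus has already produced the context; the model name and prompt are illustrative.

```python
# Generation stage of a self-hosted RAG pipeline using vLLM's offline API.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.2-3B-Instruct")      # illustrative model
params = SamplingParams(temperature=0.2, max_tokens=256)

context = "Retrieved passage describing the queried image..."
prompt = f"Answer using only this context:\n{context}\n\nQuestion: What is shown?"

outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```

In the full multimodal pipeline the same pattern applies, with image-derived embeddings driving the Milvus retrieval step.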
Transformers4Rec: Bringing NLP Power to Modern Recommendation Systems
Date published
Nov. 12, 2024
Author(s)
ShriVarsheni R
Language
English
Word count
1660
Hacker News points
None found.
Transformers4Rec is a powerful library designed for creating sequential and session-based recommendation systems with PyTorch, integrating with transformer models from natural language processing (NLP). It includes four main components that work together to make predictions: Feature Aggregation, Sequence Masking, Sequence Processing, and the Prediction Head. Transformers4Rec supports various architectures for sequence processing, including XLNet, GPT-2, and LSTM, allowing users to choose the most suitable model for their recommendation system. Metrics such as precision, recall, MAP, and NDCG measure system effectiveness, ensuring recommendations meet user needs. Challenges of scaling Transformers4Rec include infrastructure costs, storage needs, and handling new or frequently changing product catalogs.
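Of the metrics listed, NDCG is the least self-explanatory, so a from-scratch computation is useful. The graded relevance values below are illustrative (say, 2 for a purchase and 1 for a click).

```python
# NDCG@k computed from scratch: discounted gain of the model's ranking,
# normalized by the gain of the ideal ranking.
import math

def dcg(relevances):
    return sum((2**rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances))

def ndcg_at_k(relevances, k):
    ideal = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal if ideal > 0 else 0.0

# Relevance of the top-5 recommendations, in the order the model ranked them.
print(round(ndcg_at_k([2, 0, 1, 0, 2], k=5), 2))  # ~0.86
```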
How Inkeep and Milvus Built a RAG-driven AI Assistant for Smarter Interaction
Date published
Nov. 8, 2024
Author(s)
Ruben Winastwan
Language
English
Word count
2334
Hacker News points
None found.
Inkeep and Milvus have developed an AI-powered assistant to enhance interaction with technical documentation, saving developers time when searching through platforms or services. The AI assistant is built using Retrieval Augmented Generation (RAG), a method that combines advanced NLP techniques such as vector search and LLMs to generate accurate answers to users' queries. Inkeep handles the ingestion and generation parts, while Zilliz provides support in the indexing and retrieval steps. The AI assistant is currently available on both the Zilliz and Milvus documentation sites.
Safe RAG with HydroX AI and Zilliz: PII Masking for Responsible GenAI
Date published
Nov. 7, 2024
Author(s)
Jiang Chen and Victor Bian
Language
English
Word count
837
Hacker News points
None found.
Zilliz and HydroX AI have partnered to introduce PII Masker, an advanced tool designed to enhance data privacy in AI applications. The collaboration aims to protect Personally Identifiable Information (PII) during model training and inference in Generative AI (GenAI) workflows such as Retrieval Augmented Generation (RAG). With the increasing use of unstructured data in AI, ensuring PII safety is crucial for responsible GenAI usage. PII Masker automatically detects and masks sensitive information with high precision using the DeBERTa-v3 NLP model. The tool integrates seamlessly with both the Milvus and Zilliz Cloud vector databases, allowing users to build compliant GenAI applications while protecting user data. Future iterations of PII Masker will expand language support and improve the detection of contextually implied PII.
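The detect-and-mask flow is easy to picture with a short sketch. The one below substitutes a public Hugging Face NER pipeline for PII Masker's DeBERTa-v3 detector, so the model, entity labels, and placeholders are illustrative stand-ins rather than PII Masker's actual API.

```python
# Generic detect-and-mask pattern: run token classification, then replace
# each detected span with a typed placeholder (right-to-left so offsets hold).
from transformers import pipeline

ner = pipeline("token-classification",
               model="dslim/bert-base-NER",        # stand-in detector
               aggregation_strategy="simple")

def mask_pii(text: str) -> str:
    for ent in sorted(ner(text), key=lambda e: e["start"], reverse=True):
        text = text[:ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text

print(mask_pii("Contact Jane Doe at Acme Corp in Berlin."))
# -> "Contact [PER] at [ORG] in [LOC]."
```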
Couchbase vs Chroma Choosing the Right Vector Database for Your AI Apps
Date published
Nov. 3, 2024
Author(s)
Chloe Williams
Language
English
Word count
2321
Hacker News points
None found.
Couchbase and Chroma are both vector databases designed to store and query high-dimensional vectors, which are numerical representations of unstructured data. They play a crucial role in AI applications by enabling efficient similarity searches for tasks such as e-commerce product recommendations, content discovery platforms, anomaly detection in cybersecurity, medical image analysis, and natural language processing (NLP). Couchbase is a distributed multi-model NoSQL document-oriented database with vector search capabilities. It combines the strengths of relational databases with the versatility of JSON and provides flexibility to implement vector search despite not having native support for vector indexes. Developers can store vector embeddings within Couchbase documents as part of their JSON structure, allowing for similarity search use cases like recommendation systems or retrieval-augmented generation based on semantic search. Chroma is an open-source, AI-native vector database that simplifies the process of building AI applications by making knowledge, facts, and skills easily accessible to large language models (LLMs). It provides tools for managing vector data, allowing developers to store embeddings along with their associated metadata, which enables efficient similarity searches and data retrieval based on vector relationships. When choosing between Couchbase and Chroma, consider factors such as search methodology, data storage requirements, scalability and performance, flexibility and customization, integration and ecosystem, cost and security, and the specific needs of your application. Couchbase is a full-featured database that adds vector search alongside enterprise features, strong security, and proven scalability, while Chroma is simple and vector-focused, perfect for AI-first applications where vector search is the top priority.
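Chroma's "simple and vector-focused" positioning is easiest to appreciate in code. A quick-start sketch follows; the collection name, documents, and metadata are illustrative, and Chroma embeds the documents with its default embedding function.

```python
# Chroma in a few lines: create a collection, add documents with metadata,
# and run a similarity query constrained by a metadata filter.
import chromadb

client = chromadb.Client()
products = client.create_collection("products")
products.add(
    ids=["p1", "p2"],
    documents=["waterproof hiking boots", "lightweight trail running shoes"],
    metadatas=[{"category": "boots"}, {"category": "shoes"}],
)
results = products.query(
    query_texts=["footwear for rainy mountain hikes"],
    n_results=1,
    where={"category": "boots"},   # metadata filter alongside similarity
)
print(results["documents"])
```

The equivalent in Couchbase would involve storing the embedding as a field inside each JSON document and implementing the similarity query yourself, which is the flexibility-versus-convenience trade-off the comparison describes.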
Couchbase vs Elasticsearch Choosing the Right Vector Database for Your AI Apps
Date published
Nov. 3, 2024
Author(s)
Chloe Williams
Language
English
Word count
2113
Hacker News points
None found.
Couchbase and Elasticsearch are both distributed databases with vector search capabilities as an add-on. Couchbase is a NoSQL document-oriented database, while Elasticsearch is a search engine based on Apache Lucene. Both can be adapted to handle vector search functionality for various AI tasks that rely on similarity searches. Elasticsearch has native vector search through Apache Lucene and uses the HNSW algorithm for efficient similarity search. It manages vector search performance through its segment-based architecture, which allows concurrent search without locks. Elasticsearch treats vector data as a native data type and automatically maintains vector indexes. Couchbase stores vectors as part of JSON documents, giving developers full control over the structure and organization of their vectors. It requires more setup for vector search integration but offers flexibility in implementation methods. Couchbase's performance for vector search varies depending on the chosen implementation method, with its core strength being efficient document storage and retrieval. The choice between Elasticsearch and Couchbase depends on technical requirements and development resources. Elasticsearch is a ready-to-use vector search solution with performance optimizations and text search integration, while Couchbase offers more flexibility and control over vector search implementation with strong distributed computing and edge capabilities.
Catch a Cute Ghost this Halloween with Milvus
Date published
Oct. 31, 2024
Author(s)
Tim Spann
Language
English
Word count
1420
Hacker News points
None found.
This article discusses practical applications of Multimodal Retrieval Augmented Generation (RAG) using Milvus, a vector database. It covers two Halloween-themed use cases: identifying if something is a ghost and finding the cutest cat ghost. The first application involves image search with filters and uses Ollama, LLaVA 7B, and LLM reranking to determine if an object is a ghost by comparing it to a database of ghost images. The second application focuses on finding the cutest cat ghost using a visualized BGE model. Both applications demonstrate how multimodal RAG can be used for various tasks beyond text-based search. Additionally, the article highlights running advanced RAG techniques locally with Milvus Lite, Ollama, and LLaVA 7B.
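The "image search with filters" step reduces to a vector search constrained by a scalar expression. A sketch with Milvus Lite is below; the 512-dimensional vectors, field names, and filter are illustrative assumptions, with real embeddings coming from an image encoder such as the visualized BGE model the article mentions.

```python
# Filtered image search on Milvus Lite: vector similarity plus a scalar
# filter on metadata. Vectors here are placeholders for image embeddings.
from pymilvus import MilvusClient

client = MilvusClient("halloween.db")   # Milvus Lite: a local file, no server
client.create_collection(collection_name="ghosts", dimension=512)
client.insert(collection_name="ghosts", data=[
    {"id": 0, "vector": [0.1] * 512, "category": "cat_ghost", "file": "boo.png"},
    {"id": 1, "vector": [0.2] * 512, "category": "pumpkin", "file": "jack.png"},
])

hits = client.search(
    collection_name="ghosts",
    data=[[0.1] * 512],                  # embedding of the query image
    limit=3,
    filter='category == "cat_ghost"',    # scalar filter applied in the search
    output_fields=["file"],
)
print(hits[0])
```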
Chroma vs Aerospike: Choosing the Right Vector Database for Your Needs
Date published
Oct. 31, 2024
Author(s)
Chloe Williams
Language
English
Word count
2084
Hacker News points
None found.
Chroma and Aerospike are two options in the vector database space. Vector databases store and query high-dimensional vectors, which represent unstructured data such as text semantics, image features, or product attributes. They enable efficient similarity searches for applications like e-commerce recommendations, content discovery platforms, cybersecurity anomaly detection, medical image analysis, and natural language processing (NLP). Chroma is an open-source, AI-native vector database that simplifies the process of building AI applications by providing tools for managing vector data. It supports various types of data and can work with different embedding models. Chroma integrates seamlessly with other AI tools and frameworks and has a commitment to ongoing development and support. Aerospike is a distributed, scalable NoSQL database that added support for vector indexing and searching. Its vector search capability uses the Hierarchical Navigable Small World (HNSW) index exclusively. Aerospike shines in scalability with its concurrent distributed indexing system and smart caching through "pre-hydration" of the index cache. When choosing between Chroma and Aerospike, consider factors such as search methodology, data handling, scalability and performance, flexibility and customization, integration and ecosystem, ease of use, and cost considerations. For newer AI projects prioritizing development speed, Chroma is often the better choice. For enterprise applications requiring scalability and precise control, especially those already using Aerospike, Aerospike Vector Search (AVS) is likely the better fit.
Chroma vs Vearch: Choosing the Right Vector Database for Your Needs
Date published
Oct. 31, 2024
Author(s)
Chloe Williams
Language
English
Word count
2223
Hacker News points
None found.
Chroma and Vearch are two popular vector databases that enable efficient similarity searches in AI applications. Chroma is an open-source, AI-native vector database designed to simplify the process of building AI applications by providing tools for managing vector data and enabling efficient similarity searches. It supports various types of data and can work with different embedding models. Vearch is a tool for developers building AI applications that need fast and efficient similarity searches. It uses a hybrid search system, allowing users to combine vector similarity search with traditional database filtering. Both Chroma and Vearch have their strengths and are suitable for different use cases. When choosing between the two, consider factors such as search methodology and performance, data and storage, scalability, integrations, usability, and cost and deployment.
Chroma vs Vald: Choosing the Right Vector Database for Your Needs
Date published
Oct. 31, 2024
Author(s)
Chloe Williams
Language
English
Word count
1953
Hacker News points
None found.
Chroma and Vald are two popular vector databases that offer efficient similarity searches, making them suitable for AI applications such as e-commerce product recommendations, content discovery platforms, and natural language processing tasks. Chroma is an open-source, AI-native vector database designed to simplify the process of building AI applications by providing tools for managing vector data and enabling efficient similarity searches. It supports various types of data and can work with different embedding models, allowing users to choose the best approach for their specific use case. Vald is a powerful tool for searching through massive amounts of vector data quickly and reliably. It uses the NGT algorithm for similarity searches and has features like index replication and distributed computing that make it suitable for large-scale deployments. The choice between Chroma and Vald depends on factors such as scale requirements, technical expertise, and whether simplicity or maximum performance is more important.
The Role of LLMs in Modern Travel: Opportunities and Challenges Ahead
Date published
Oct. 25, 2024
Author(s)
Fendy Feng and Yesha Shastri
Language
English
Word count
1451
Hacker News points
None found.
Large Language Models (LLMs) are revolutionizing various industries, including tourism. GetYourGuide (GYG), an online marketplace for travel experiences, is leveraging LLMs to enhance customer experiences and streamline operations. One of the primary applications of LLMs at GYG is content translation and localization, enabling real-time translation of travel information in users' native languages. Additionally, LLMs are used for content generation and customer support through automated FAQs and multi-turn conversations. However, challenges such as hallucinations, prompt leakage, and role consistency arise when using LLMs like ChatGPT. To address these issues, Retrieval-Augmented Generation (RAG) is proposed as a solution. RAG combines an LLM, a vector database, and an embedding model to mitigate hallucinations by retrieving relevant context and feeding it to the LLM for more accurate responses. While fine-tuning a model can improve its understanding of domain-specific language, RAG offers flexibility and cost efficiency in handling diverse or dynamic queries without extensive re-training. Combining fine-tuning with RAG can result in a more robust and effective solution that meets both general and specialized requirements.
The Practical Guide to Self-Hosting Compound LLM Systems
Date published
Oct. 23, 2024
Author(s)
Trevor Trinh
Language
English
Word count
1807
Hacker News points
None found.
The article discusses self-hosting large language models (LLMs) and offers actionable advice for teams that want control and customization while matching the performance of a managed API. It highlights BentoML's research insights in AI orchestration and the solutions it developed for common performance issues when self-hosting models. The LLM DOOM stack, spanning Data, Operations, Orchestration, and AI Models, is introduced as a framework, and the article explains the benefits of using vector databases like Zilliz/Milvus in LLM-powered systems, particularly retrieval-augmented generation (RAG). It weighs the challenges of self-hosting LLMs, such as control, customization, and long-term cost benefits, and presents key approaches to address them: inference optimization techniques like batching requests, token streaming, quantization, kernel optimizations, and model parallelism; scaling LLM inference with concurrency-based autoscaling; prefix caching for cost savings; and solutions to the cold-start problem. Finally, the article shows how integrating BentoML and Milvus enables more powerful LLM applications, particularly RAG, and points to resources for building RAG and other GenAI apps with these tools.
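Of the optimization techniques listed, request batching is the easiest to sketch. The toy asyncio illustration below collects concurrent requests into a single "forward pass"; it is a conceptual sketch only, not BentoML's implementation, and the stand-in model just uppercases prompts.

```python
# Toy dynamic batching: requests queue up, a worker drains them into
# batches (up to max_batch or until max_wait elapses), and one batched
# call serves them all.
import asyncio

async def batch_worker(queue, model_fn, max_batch=8, max_wait=0.01):
    loop = asyncio.get_running_loop()
    while True:
        batch = [await queue.get()]
        deadline = loop.time() + max_wait
        while len(batch) < max_batch and (left := deadline - loop.time()) > 0:
            try:
                batch.append(await asyncio.wait_for(queue.get(), left))
            except asyncio.TimeoutError:
                break
        outputs = model_fn([prompt for prompt, _ in batch])  # one batched call
        for (_, fut), out in zip(batch, outputs):
            fut.set_result(out)

async def infer(queue, prompt):
    fut = asyncio.get_running_loop().create_future()
    await queue.put((prompt, fut))
    return await fut

async def main():
    queue = asyncio.Queue()
    model_fn = lambda prompts: [p.upper() for p in prompts]  # stand-in "LLM"
    worker = asyncio.create_task(batch_worker(queue, model_fn))
    print(await asyncio.gather(*(infer(queue, f"req-{i}") for i in range(5))))
    worker.cancel()

asyncio.run(main())
```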
Combining Images and Text Together: How Multimodal Retrieval Transforms Search
Date published
Oct. 22, 2024
Author(s)
David Wang
Language
English
Word count
3733
Hacker News points
None found.
The rise of multimodal models has led to a shift in search methods, with multimodal retrieval gaining popularity due to its ability to combine inputs from multiple modalities such as text and images. This approach allows for more nuanced and precise ways to capture users' search intents by leveraging the strengths of both modalities. One common task within multimodal retrieval is Composed Image Retrieval (CIR), where users provide a query that includes a reference image along with a descriptive caption. This dual-input approach enables the retrieval of specific images by combining visual content with textual instructions, creating a more detailed and accurate query. Various techniques have been developed for CIR, including Pic2Word, CompoDiff, CIReVL, and MagicLens. Each of these builds on the foundational capabilities of CLIP while adopting different approaches to improve retrieval. For example, Pic2Word transforms images into text tokens embedded in a text-based search, leveraging CLIP text embeddings for highly versatile, text-driven image retrieval. CompoDiff employs text-guided denoising, refining noisy visual embeddings with text input to conditionally reconstruct image embeddings, improving search precision. MagicLens uses Transformer models to process text and images in parallel, generating a unified embedding that captures both modalities and enhances retrieval performance. The post closes with an online demo of multimodal search powered by the Milvus vector database, in which users upload an image and enter text instructions that a composed image retrieval model processes to find matching images based on both visual and textual input.
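All four techniques start from CLIP's shared image-text embedding space; the simplest conceivable baseline fuses the two query embeddings directly. The sketch below shows that naive fusion, where the plain average is an illustrative stand-in for the learned combiners in Pic2Word, CompoDiff, and MagicLens, and the image file is hypothetical.

```python
# Naive composed image retrieval baseline: embed the reference image and the
# modification text with CLIP, average them into one query vector, and use
# that vector for nearest-neighbor search over an image index.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("reference_dress.jpg")   # hypothetical reference image
caption = "the same dress but in red with long sleeves"

inputs = processor(text=[caption], images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                      attention_mask=inputs["attention_mask"])

query = torch.nn.functional.normalize((img_emb + txt_emb) / 2, dim=-1)
# `query` can now be searched against normalized image embeddings in Milvus.
```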
MongoDB vs Vearch: Selecting the Right Database for GenAI Applications
Date published
Oct. 21, 2024
Author(s)
Chloe Williams
Language
English
Word count
2098
Hacker News points
None found.
MongoDB Atlas Vector Search and Vearch are two prominent databases with vector search capabilities, essential for AI applications such as recommendation engines, image retrieval, and semantic search. Both offer robust vector search features but have different strengths. MongoDB integrates well with document-based data and is a managed service within the MongoDB ecosystem, making it suitable for projects that need to combine vector similarity searches with document filtering. Vearch offers flexibility in indexing methods, hardware optimization, and scalable architecture, making it ideal for projects that need real-time indexing, can handle multiple vector fields in a single document, or require scaling out to handle massive amounts of vector data. The choice between these two should be based on the specific use case, existing infrastructure, performance requirements, and team expertise.
MongoDB vs Vald: Selecting the Right Database for GenAI Applications
Date published
Oct. 21, 2024
Author(s)
Chloe Williams
Language
English
Word count
2037
Hacker News points
None found.
MongoDB and Vald are two prominent databases with vector search capabilities, essential for applications such as recommendation engines, image retrieval, and semantic search. While both offer powerful vector data handling, they have different approaches and strengths. MongoDB integrates vector search with its flexible document model, making it great for applications that require contextual searches where you need to consider both vector similarity and other document attributes. Vald provides high-performance vector search at massive scale with continuous indexing, making it ideal for applications with billions of vectors that need fast, efficient similarity searches. The choice between them should be based on the use case, the type of data, and performance requirements.
MongoDB vs Rockset: Selecting the Right Database for GenAI Applications
Date published
Oct. 21, 2024
Author(s)
Chloe Williams
Language
English
Word count
1950
Hacker News points
None found.
MongoDB Atlas Vector Search and Rockset are two prominent databases with vector search capabilities, essential for applications such as recommendation engines, image retrieval, and semantic search. Both offer robust support for handling vector search but have different strengths that align with specific use cases and data handling needs. MongoDB Atlas Vector Search integrates with the existing MongoDB ecosystem and is great for applications that need vector search to be seamlessly integrated with document querying. Rockset, on the other hand, suits real-time analytics and high-dimensional vector search, with an indexing approach built for fast queries over fast-changing data. The choice between these two ultimately depends on factors such as existing infrastructure, the nature of the data, the dimensionality of vector embeddings, and the importance of real-time analytics in an application.
Best Practices in Implementing Retrieval-Augmented Generation (RAG) Applications
Date published
Oct. 21, 2024
Author(s)
Ruben Winastwan
Language
English
Word count
3361
Hacker News points
None found.
Retrieval-Augmented Generation (RAG) is a method that improves large language models' (LLMs') responses and addresses hallucinations by supplying the model with retrieved context. RAG consists of several components, including query processing, context chunking, context retrieval, context reranking, and response generation. Choosing the best approach for each component leads to optimal RAG performance. Query classification helps determine whether a query requires context retrieval or can be processed directly by the LLM. Chunking techniques split long input documents into smaller segments, improving the LLM's granular context understanding. Vector databases store and retrieve relevant contexts efficiently. Retrieval techniques improve the quality of fetched contexts, while reranking and repacking techniques reorder and present the most relevant contexts to the LLM. Summarization techniques condense long contexts while preserving key information. Fine-tuning an LLM is not always necessary but can be done for smaller models to improve their robustness when generating responses related to specific use cases.
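To ground the chunking component, here is a minimal fixed-size chunker with overlap, a common baseline for the techniques the post compares (an assumption for illustration; the sizes should be tuned per corpus).

```python
# Fixed-size chunking with overlap: consecutive chunks share `overlap`
# characters so sentences cut at a boundary still appear intact somewhere.
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64):
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]

doc = "RAG grounds a language model in retrieved context. " * 60
chunks = chunk_text(doc)
print(len(chunks), "chunks, first 40 chars:", chunks[0][:40])
```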
MongoDB vs ClickHouse: Selecting the Right Database for GenAI Applications
Date published
Oct. 20, 2024
Author(s)
Chloe Williams
Language
English
Word count
2166
Hacker News points
None found.
MongoDB Atlas Vector Search and ClickHouse are two prominent databases with vector search capabilities, essential for applications such as recommendation engines, image retrieval, and semantic search. Both provide robust capabilities for handling vector search but have different approaches to it. MongoDB is great for handling flexible, document-based data structures and integrates well with AI services and tools. ClickHouse is best when you have massive datasets that require complex queries combining vector search with SQL filtering and aggregation. The choice between these should be driven by your use case, data types, and performance requirements.
MongoDB vs Deep Lake: Selecting the Right Database for GenAI Applications
Date published
Oct. 20, 2024
Author(s)
Chloe Williams
Language
English
Word count
2094
Hacker News points
None found.
MongoDB Atlas Vector Search and Deep Lake are two prominent databases with vector search capabilities, essential for recommendation engines, image retrieval, and semantic search. MongoDB is a NoSQL database that stores data in JSON-like documents while Deep Lake is a data lake optimized for vector embeddings. Both use the Hierarchical Navigable Small World (HNSW) algorithm for indexing and searching vector data. MongoDB Atlas Vector Search supports both Approximate Nearest Neighbor (ANN) and Exact Nearest Neighbors (ENN) search, integrates with popular AI services and tools, and allows combining vector similarity searches with traditional document filtering. It also supports hybrid search, combining vector search with full text search for more granular results. Deep Lake is designed for storing and searching vector embeddings and related metadata, including text, JSON, images, audio, and video files. It integrates seamlessly with tools like LangChain and LlamaIndex, allowing developers to easily build Retrieval Augmented Generation (RAG) applications. When choosing between MongoDB and Deep Lake as a vector search tool, consider the differences in search methodology, data types supported, scalability and performance, flexibility and customization, integration and ecosystem, ease of use, cost, and security features. The choice should be guided by your specific needs and requirements.
MongoDB vs MyScale: Selecting the Right Database for GenAI Applications
Date published
Oct. 20, 2024
Author(s)
Chloe Williams
Language
English
Word count
2175
Hacker News points
None found.
MongoDB Atlas Vector Search and MyScale are two prominent databases with vector search capabilities, essential for applications such as recommendation engines, image retrieval, and semantic search. Both provide robust capabilities for handling vector search, but their strengths fit different scenarios and dev environments. MongoDB integrates seamlessly with your existing MongoDB deployment, has powerful vector search, and can combine vector search with document filtering. MyScale is a single platform for SQL, vector, and full-text search with flexible indexing and native SQL support for vector queries. Users should consider factors like integration with document data, SQL-based querying, types of data they're working with, and scalability needs when choosing between these two powerful but different approaches to vector search in distributed database systems.
MongoDB vs Aerospike: Selecting the Right Database for GenAI Applications
Date published
Oct. 20, 2024
Author(s)
Chloe Williams
Language
English
Word count
2236
Hacker News points
None found.
MongoDB Atlas Vector Search and Aerospike Vector Search (AVS) are two prominent databases with vector search capabilities essential for AI applications such as recommendation engines, image retrieval, and semantic search. Both use the Hierarchical Navigable Small World (HNSW) algorithm for indexing and searching vector data. MongoDB Atlas Vector Search is great for applications that need a flexible data model and integration with regular queries, hybrid searches, and AI tool ecosystems. Aerospike Vector Search excels in high-performance, real-time scenarios where low latency and high throughput are key. The choice between MongoDB and Aerospike should be driven by application requirements, data complexity, performance needs, and scalability demands.
Pinecone vs Aerospike: Selecting the Right Database for GenAI Applications
Date published
Oct. 18, 2024
Author(s)
Chloe Williams
Language
English
Word count
1954
Hacker News points
None found.
Pinecone and Aerospike are two prominent databases with vector search capabilities that play a crucial role in AI applications, such as recommendation engines, image retrieval, and semantic search. While both support vector search, they differ in their approach and features. Pinecone is a purpose-built vector database designed for machine learning applications, offering real-time updates, compatibility with ML models, and proprietary indexing techniques for fast searches. Aerospike, on the other hand, is a distributed NoSQL database that has added support for vector search as an add-on feature called Aerospike Vector Search (AVS). Pinecone's key features include real-time updates, machine learning model compatibility, metadata filtering, and serverless offering. It supports hybrid search, which combines dense and sparse vector embeddings to balance semantic understanding with keyword matching. Pinecone integrates with popular ML frameworks and cloud services, making it a complete solution for vector search in AI applications. Aerospike's AVS uses HNSW indexes for approximate nearest neighbor search and supports multiple vectors and indexes per record. It is designed for high-performance real-time applications and can handle large scale, high throughput workloads. Aerospike has flexibility in data modeling and indexing, as well as a wide range of connectors and integrations. When choosing between Pinecone and Aerospike, consider factors such as search methodology, data types, scalability and performance, flexibility and customization, integration and ecosystem, ease of use, cost, and security features. Ultimately, the decision should be based on your specific use case, data types, performance requirements, and team expertise.
Pinecone vs MyScale: Selecting the Right Database for GenAI Applications
Date published
Oct. 18, 2024
Author(s)
Chloe Williams
Language
English
Word count
1588
Hacker News points
None found.
Pinecone and MyScale are two prominent databases with vector search capabilities that play a crucial role in AI applications such as recommendation engines, image retrieval, and semantic search. Pinecone is a purpose-built vector database, while MyScale is built on ClickHouse and combines vector search and SQL analytics. Both offer robust vector search capabilities but differ in their features, performance, and ecosystems. Developers and engineers should consider factors such as search methodology, data handling, scalability, flexibility, integration, ease of use, and cost when choosing between these two powerful tools for their specific requirements.
Pinecone vs Deep Lake: Selecting the Right Database for GenAI Applications
Date published
Oct. 18, 2024
Author(s)
Chloe Williams
Language
English
Word count
1837
Hacker News points
None found.
Pinecone and Deep Lake are two prominent databases with vector search capabilities that play a crucial role in AI applications such as recommendation engines, image retrieval, and semantic search. While both offer robust vector search capabilities, they have some key differences. Pinecone is a purpose-built vector database designed for machine learning applications requiring fast vector search even with billions of vectors. It supports real-time updates, machine learning model compatibility, metadata filtering, and hybrid search. Deep Lake, on the other hand, is a specialized data lake optimized for vector embeddings that can handle multiple data types, including multimedia and has versioning for datasets. The choice between these two should be based on specific use cases, data requirements, performance needs, and preference for managed or self-hosted solutions.
Pinecone vs ClickHouse: Selecting the Right Database for GenAI Applications
Date published
Oct. 18, 2024
Author(s)
Chloe Williams
Language
English
Word count
1913
Hacker News points
None found.
Pinecone and ClickHouse are two prominent databases with vector search capabilities that play a crucial role in AI applications, such as recommendation engines, image retrieval, and semantic search. Pinecone is a purpose-built vector database designed for machine learning applications, while ClickHouse is an open-source column-oriented database with vector search capabilities as an add-on. Both databases have their unique features and strengths, making them suitable for different use cases in vector search. Pinecone uses a proprietary indexing technique for fast similarity searches across billions of vectors and supports real-time updates, machine learning model compatibility, metadata filtering, and hybrid search. It is designed for storing and querying vector embeddings and integrates with popular ML frameworks and multiple languages. Pinecone's serverless offering makes database management easy and cost-effective. As an OLAP database, ClickHouse supports fast query processing, especially for large datasets. It has a SQL interface, making it powerful for combining vector search with traditional data operations like filtering and aggregation. ClickHouse also offers experimental Approximate Nearest Neighbor (ANN) indices for faster approximate matching and exact matching through linear scans with parallel processing. When choosing between Pinecone and ClickHouse, consider factors such as search method, data types, scalability and performance, flexibility and customization, integration and ecosystem, ease of use, cost, and security. Ultimately, the decision should be based on your specific requirements and long-term scalability needs.
Pinecone vs Vearch: Selecting the Right Database for GenAI Applications
Date published
Oct. 18, 2024
Author(s)
Chloe Williams
Language
English
Word count
1726
Hacker News points
None found.
Pinecone and Vearch are two prominent databases with vector search capabilities that play a crucial role in AI applications such as recommendation engines, image retrieval, and semantic search. Both offer robust capabilities for handling vector search but have different strengths. Pinecone is great for ease of use, managed infrastructure, and strong ML ecosystem integration, while Vearch offers deployment flexibility, indexing methods, and hardware optimization. The best fit will depend on how you align their strengths with your project's needs.
Pinecone vs Vald: Selecting the Right Database for GenAI Applications
Date published
Oct. 18, 2024
Author(s)
Chloe Williams
Language
English
Word count
1741
Hacker News points
None found.
Pinecone and Vald are two prominent databases with vector search capabilities that play a crucial role in AI applications, such as recommendation engines, image retrieval, and semantic search. Both databases have their own strengths and can handle large scale vector data but differ in features like search methodology, data handling, scalability, flexibility, integration, ease of use, cost, and security. Pinecone is a fully managed service with strong machine learning integration, real-time updates, hybrid search, metadata filtering, and auto scaling for large datasets. Vald is highly customizable, can handle billions of vectors, has a distributed architecture that allows concurrent indexing and searching, and works well in cloud environments. The choice between the two should be based on factors like data volume, level of control required, resources for infrastructure management, and integration with existing systems.
Pinecone vs Rockset: Selecting the Right Database for GenAI Applications
Date published
Oct. 18, 2024
Author(s)
Chloe Williams
Language
English
Word count
1900
Hacker News points
None found.
Pinecone and Rockset are two prominent databases with vector search capabilities that play a crucial role in AI applications such as recommendation engines, image retrieval, and semantic search. While both offer robust vector search capabilities, they have different approaches that may fit different use cases. Pinecone is designed for vector embeddings and associated metadata, works well with unstructured data converted into vector representations, and has auto-scaling to handle billions of vectors efficiently. Rockset can handle structured, semi-structured, and unstructured data, including vector embeddings, supports multiple query types out of the box, and is algorithm-agnostic, allowing users more control over search implementation. The choice between Pinecone and Rockset depends on factors such as the scale of vector data, complexity of queries, need for real-time analytics, and team expertise in database management.
The Importance of Data Engineering for Successful AI with Airbyte and Zilliz
Date published
Oct. 17, 2024
Author(s)
Sydney Blanchard
Language
English
Word count
518
Hacker News points
None found.
The article discusses the importance of data engineering in supporting AI projects at an enterprise level. It highlights how adhering to best practices in data engineering can help resolve common challenges associated with deploying and scaling effective AI usage. Airbyte, an open-source data movement company, enables over 20,000 data and AI professionals to manage diverse data across multi-cloud environments. Zilliz's Milvus is a high-performance, open-source vector database built for scale, which makes unstructured data searchable and helps organizations make sense of it. The article emphasizes the need for efficient handling of unstructured data in enabling AI success.
OpenSearch vs ClickHouse: Selecting the Right Database for GenAI Applications
Date published
Oct. 16, 2024
Author(s)
Chloe Williams
Language
English
Word count
2211
Hacker News points
None found.
OpenSearch and ClickHouse are two prominent databases with vector search capabilities that play a crucial role in AI applications such as recommendation engines, image retrieval, and semantic search. Both databases have evolved to include vector search capabilities as an add-on. OpenSearch is built on Apache Lucene and supports various machine learning-powered methods for vector search, while ClickHouse has integrated vector search capabilities into its SQL engine. Key differences between the two include their search methodology, data handling, scalability, flexibility, integration, ease of use, cost considerations, and security features. Depending on specific application needs, developers may choose OpenSearch or ClickHouse for GenAI applications. For large-scale, high-performance vector search tasks, specialized vector databases like Milvus and Zilliz Cloud are recommended.
Unlocking Rich Visual Insights with RGB-X Models
Date published
Oct. 16, 2024
Author(s)
Simon Mwaniki
Language
English
Word count
3865
Hacker News points
None found.
RGB-X models are advanced machine learning models in computer vision that extend traditional RGB (Red, Green, Blue) data by incorporating additional channels such as depth, infrared, or surface normals. These models have found applications across various industries and use cases, including object tracking across frames and surveying difficult terrain. Recent advancements in RGB-X model development have led to significant improvements in performance and capabilities, with challenges and considerations related to data complexity, model interpretability, and ethics and privacy. Integrating RGB-X models with vector databases like Milvus enhances their applications by enabling efficient storage, indexing, and retrieval of the rich embeddings produced by these models.
OpenSearch vs Aerospike: Selecting the Right Database for GenAI Applications
Date published
Oct. 15, 2024
Author(s)
Chloe Williams
Language
English
Word count
2155
Hacker News points
None found.
OpenSearch and Aerospike are two prominent databases with vector search capabilities that play a crucial role in AI applications such as recommendation engines, image retrieval, and semantic search. Both platforms have evolved to address modern data challenges, offering powerful and flexible solutions tailored to diverse application needs. While OpenSearch is highly scalable and excels in handling complex searches, Aerospike offers peak performance and efficient management of large data volumes. The choice between the two largely depends on specific use cases, requirements for search capabilities, performance needs, and system architecture.
Securing AI: Advanced Privacy Strategies with PrivateGPT and Milvus
Date published
Oct. 15, 2024
Author(s)
ShriVarsheni R and Fendy Feng
Language
English
Word count
2238
Hacker News points
None found.
As organizations increasingly adopt AI tools like Large Language Models (LLMs), concerns about data privacy and security are rising. To mitigate these risks, companies are exploring advanced privacy strategies such as compliant SaaS, data anonymization, local execution, in-house development, and on-premises, infrastructure-agnostic solutions. PrivateGPT is a framework designed to develop context-aware LLMs with enhanced data privacy controls, offering flexibility for users to customize configurations and select the APIs or models that best meet their needs. By integrating tools like PrivateGPT with vector databases such as Milvus, businesses can create robust and efficient AI systems while upholding strict data protection standards.
OpenSearch vs Deep Lake: Selecting the Right Database for GenAI Applications
Date published
Oct. 13, 2024
Author(s)
Chloe Williams
Language
English
Word count
1935
Hacker News points
None found.
OpenSearch and Deep Lake are two prominent databases with vector search capabilities that play a crucial role in AI applications such as recommendation engines, image retrieval, and semantic search. Both databases have evolved to include vector search capabilities as an add-on. OpenSearch is a robust, open-source search and analytics suite that manages diverse data types and supports various machine learning-powered search methods. Deep Lake is a specialized database system designed for handling vector and multimedia data, making it ideal for complex media search applications. Choosing between the two depends on specific application needs, such as advanced text search capabilities, scalable analytics and visualization, or robust support for storing and searching vector embeddings.
Weaviate vs MyScale: Choosing the Right Vector Database for Your Needs
Date published
Oct. 12, 2024
Author(s)
Chloe Williams
Language
English
Word count
2105
Hacker News points
None found.
Weaviate and MyScale are two popular vector databases that offer efficient storage and retrieval of high-dimensional vectors, which are numerical representations of unstructured data. These databases play a crucial role in AI applications by enabling advanced data analysis and retrieval. While both databases have their strengths, they differ in search methodology, data handling capabilities, scalability, flexibility, integration, ease of use, and security features. Weaviate is an open-source vector database designed for simplicity and efficiency in AI application development. It supports fast and accurate similarity searches using HNSW indexing and hybrid queries that combine vector searches with traditional filters. Weaviate is suitable for projects requiring quick implementation, flexibility with different data types, and easy integration with the GenAI ecosystem. MyScale, on the other hand, is a cloud-based database built on top of ClickHouse designed for AI and machine learning workloads. It supports both structured and vector data and offers native SQL support, making it perfect for teams familiar with relational databases. MyScale's architecture can handle large datasets and high query loads, making it ideal for enterprise-level applications that require high performance analytics and machine learning workloads. When choosing between Weaviate and MyScale, consider your use cases, data types, and performance requirements. Weaviate might be suitable for teams looking for a user-friendly approach to vector search with fast similarity searches, hybrid queries, and easy integration with AI ecosystems. In contrast, MyScale may be better for organizations that need a full SQL-based solution for large-scale data processing and AI-driven analytics.
Weaviate vs Rockset: Choosing the Right Vector Database for Your Needs
Date published
Oct. 12, 2024
Author(s)
Chloe Williams
Language
English
Word count
1895
Hacker News points
None found.
Weaviate and Rockset are two popular vector databases that offer efficient similarity searches, making them crucial in AI applications. While both have their strengths, they cater to different needs. Weaviate is an open-source vector database designed for simplifying AI application development, offering built-in vector and hybrid search capabilities, easy integration with machine learning models, and a focus on data privacy. On the other hand, Rockset is a real-time search and analytics database that excels in ingesting, indexing, and querying data in real-time. When choosing between Weaviate and Rockset, consider factors such as search methodology, data types supported, scalability and performance, flexibility and customization, integration and ecosystem, ease of use, and security features. Ultimately, the choice should align with your project's specific needs, taking into account data volume, update frequency, query complexity, and the balance between vector and traditional search.
Weaviate vs Vald: Choosing the Right Vector Database for Your Needs
Date published
Oct. 12, 2024
Author(s)
Chloe Williams
Language
English
Word count
1802
Hacker News points
None found.
Weaviate and Vald are two purpose-built vector databases designed to store and query high-dimensional vectors, which represent unstructured data such as text semantics, image features, or product attributes. Both technologies enable efficient similarity searches, playing a crucial role in AI applications for advanced data analysis and retrieval. Weaviate is an open-source vector database that offers built-in vector and hybrid search capabilities, easy integration with machine learning models, and focuses on data privacy. It uses HNSW indexing to enable fast vector searches and supports combining vector searches with traditional filters for powerful hybrid queries. Weaviate is suitable for developers building AI applications, semantic search systems, or recommendation engines when working with different data types like text, images, and audio. Vald is a high-performance tool designed to handle large amounts of vector data quickly and reliably. It uses NGT for fast approximate nearest neighbor searches and can handle billions of vectors. Vald is built for scalability from the ground up, using distributed indexing so searches can continue even while the index is being updated. The choice between Weaviate and Vald depends on specific project needs such as data volume, search complexity, and integration with existing systems. For projects that require versatility and ease of integration, especially for smaller to medium-sized projects, Weaviate may be a better choice. On the other hand, if handling massive vector datasets with high performance and scalability is crucial, Vald would be more suitable.
Weaviate vs Deep Lake: Choosing the Right Vector Database for Your Needs
Date published
Oct. 12, 2024
Author(s)
Chloe Williams
Language
English
Word count
1894
Hacker News points
None found.
Weaviate and Deep Lake are two popular vector databases designed to store and query high-dimensional vectors, which represent unstructured data such as text, images, audio, video, or product attributes. Both technologies play a crucial role in AI applications by enabling efficient similarity searches for advanced data analysis and retrieval. Weaviate is an open-source vector database that offers built-in vector and hybrid search capabilities, easy integration with machine learning models, and focuses on data privacy. It uses HNSW (Hierarchical Navigable Small World) indexing to enable fast and accurate similarity searches and supports combining vector searches with traditional filters for powerful hybrid queries. Deep Lake is a specialized database system designed to handle the storage, management, and querying of vector and multimedia data, such as images, audio, video, and other unstructured data types. It provides robust vector search capabilities for various data types like text, JSON, images, audio, and video files. When choosing between Weaviate and Deep Lake, consider the project requirements, data types, scalability, data complexity, integration needs, and long-term technology strategy. Weaviate is suitable for fast similarity search and hybrid queries, great for structured data, and quick AI development. In contrast, Deep Lake is ideal for unstructured multimedia data and complex deep learning scenarios with large datasets.
Weaviate vs ClickHouse: Choosing the Right Vector Database for Your Needs
Date published
Oct. 12, 2024
Author(s)
Chloe Williams
Language
English
Word count
2056
Hacker News points
None found.
Weaviate and ClickHouse are two open-source vector databases with different strengths and use cases. Weaviate is designed for AI-focused applications, offering built-in vector search capabilities, easy integration with machine learning models, and a focus on data privacy. It supports multi-modal data and has deep integration with the GenAI ecosystem. ClickHouse, on the other hand, is an OLAP database for real-time analytics with full SQL support and fast query processing. It can handle large vector datasets without being memory-bound and supports filtering and aggregation on metadata. Weaviate is best for AI-focused projects with diverse data types, while ClickHouse is great for massive datasets and powerful SQL-based vector operations alongside traditional analytics.
Weaviate vs Vearch: Choosing the Right Vector Database for Your Needs
Date published
Oct. 12, 2024
Author(s)
Chloe Williams
Language
English
Word count
1777
Hacker News points
None found.
Weaviate and Vearch are both purpose-built vector databases designed to store and query high-dimensional vectors, which represent unstructured data such as text, images, audio, or video. They enable efficient similarity searches in AI applications, playing a crucial role in tasks like recommendation systems, content discovery platforms, anomaly detection, medical image analysis, and natural language processing (NLP). Weaviate is an open-source vector database that offers built-in vector and hybrid search capabilities, easy integration with machine learning models, and focuses on data privacy. It uses HNSW indexing for fast and accurate similarity searches and supports combining vector searches with traditional filters. Weaviate is suitable for developers building AI applications, data engineers working with large datasets, and data scientists deploying machine learning models. Vearch is a tool for developers building AI applications that need fast and efficient similarity searches. It uses hybrid search capabilities to search by vectors and filter by regular data types like numbers or text. Vearch supports multiple indexing methods, including IVFPQ and HNSW, and has both CPU and GPU versions. The choice between Weaviate and Vearch depends on the specific use case, considering factors such as data types, scale, performance requirements, development resources, and integration needs. Both tools have their strengths and are suitable for different contexts.
ColPali: Enhanced Document Retrieval with Vision Language Models and ColBERT Embedding Strategy
Date published
Oct. 12, 2024
Author(s)
Stephen Batifol
Language
English
Word count
1622
Hacker News points
None found.
ColPali is a document retrieval model that uses Vision Language Models (VLMs) to index documents through their visual features, capturing both textual and visual elements. It generates ColBERT-style multi-vector representations of text and images, encoding document images directly into a unified embedding space. This approach bypasses complex extraction processes, improving retrieval accuracy and efficiency. The model is built upon Google's PaliGemma-3B model and uses a late interaction similarity mechanism to compare query and document embeddings at query time. ColPali faces challenges due to its high storage demands and computational complexity but has significant potential in transforming how we retrieve visually rich content with textual context in Retrieval Augmented Generation (RAG) systems.
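The late interaction mechanism is compact enough to show directly: every query-token embedding is compared with every document-patch embedding, the best match per query token is kept (MaxSim), and the per-token maxima are summed into the page score. In the sketch below, random vectors stand in for real ColPali embeddings and the shapes are illustrative.

```python
# ColBERT-style late interaction scoring over multi-vector page embeddings.
import numpy as np

rng = np.random.default_rng(0)
query = rng.normal(size=(16, 128))        # one embedding per query token
pages = [rng.normal(size=(1030, 128)),    # one embedding per page patch/token
         rng.normal(size=(1030, 128))]

def late_interaction_score(query, page):
    sim = query @ page.T                  # (query_tokens, page_patches)
    return sim.max(axis=1).sum()          # MaxSim per token, then sum

scores = [late_interaction_score(query, p) for p in pages]
print("best page:", int(np.argmax(scores)))
```

Storing one matrix like this per page, rather than a single vector, is exactly the storage overhead the post flags as ColPali's main cost.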
OpenSearch vs MyScale: Selecting the Right Database for GenAI Applications
Date published
Oct. 12, 2024
Author(s)
Chloe Williams
Language
English
Word count
1939
Hacker News points
None found.
OpenSearch and MyScale are two prominent databases with vector search capabilities that play a crucial role in AI applications such as recommendation engines, image retrieval, and semantic search. OpenSearch is an open-source search and analytics suite built on Apache Lucene, while MyScale is a cloud-based database built on ClickHouse designed for AI and machine learning workloads. Both offer robust search capabilities but with different focuses: OpenSearch emphasizes advanced search functionalities like vector search, semantic search, and hybrid models, whereas MyScale integrates SQL with vector and full-text searches. When selecting the right database for GenAI applications, developers should consider factors such as search methodology, data handling, scalability and performance, flexibility and customization, integration and ecosystem, ease of use, cost considerations, and security features. OpenSearch is suitable for complex search needs across various data types, real-time analytics and visualization, scalable search operations, and community-driven features and support. On the other hand, MyScale is ideal for AI and machine learning workloads, unified database solutions, high-performance requirements for large datasets, and ease of use with SQL. To evaluate and compare vector databases on your own, you can use VectorDBBench, an open-source benchmarking tool designed to test the performance of different vector database systems using custom datasets.
Scaling Search for AI: How Milvus Outperforms OpenSearch
Date published
Oct. 11, 2024
Author(s)
Stephen Batifol
Language
English
Word count
1093
Hacker News points
None found.
Milvus is an open-source vector database designed for scalable and high-performance vector search, specifically tailored for AI and large-scale data applications. It efficiently handles and searches billion-scale high-dimensional vectors, making it an excellent choice for AI-powered systems such as retrieval-augmented generation (RAG), image and video search, and recommendation engines. Milvus offers a wide range of search functionalities, including traditional full-text search, scalar filtering, hybrid search, and multimodal search capabilities. This comprehensive feature set allows Milvus to serve diverse search needs, making it a versatile and scalable platform for AI-driven and data-intensive applications.
OpenSearch vs Vald: Selecting the Right Database for GenAI Applications
Date published
Oct. 11, 2024
Author(s)
Chloe Williams
Language
English
Word count
2003
Hacker News points
None found.
OpenSearch and Vald are two prominent databases with vector search capabilities, essential for recommendation engines, image retrieval, and semantic search in AI-driven applications. OpenSearch is a robust open-source search and analytics suite that supports various data types and machine learning-powered search methods. Vald is a powerful tool for searching through massive amounts of vector data quickly and reliably. Comparing the two, OpenSearch offers advanced text search capabilities, real-time analytics, diverse data type handling, scalability, customization options, and an extensive integration ecosystem. It's ideal for applications requiring complex text-based querying and analysis, real-time analytics, and diverse data types. Vald is designed for high-performance vector search, efficient resource management, real-time indexing updates, and handling large volumes of high-dimensional vector data. Choosing between OpenSearch and Vald depends on the specific needs of your application, such as whether advanced text search capabilities or high-performance vector search is more critical. Additionally, users can utilize VectorDBBench to evaluate and compare vector databases based on their own datasets.
Industrial Problem-Solving through Domain-Specific Models and Agentic AI: A Semiconductor Manufacturing Case Study
Date published
Oct. 9, 2024
Author(s)
Simon Mwaniki
Language
English
Word count
2816
Hacker News points
None found.
The semiconductor industry faces a critical shortage of specialized expertise, impacting project timelines and innovation. General-purpose AI models often fall short in specialized industrial applications. Domain-specific language models like SemiKong are being developed to address this gap by incorporating domain-specific knowledge. Aitomatic's Open Small Specialist Agents (OpenSSA) architecture leverages the deep industry knowledge embedded in SemiKong to create agentic AI systems capable of complex decision-making in semiconductor manufacturing. Milvus, a high-performance vector database, plays a crucial role in enabling advanced AI applications in industrial settings by providing efficient retrieval and storage of complex manufacturing data. The combination of domain-specific language models, agentic AI systems, and vector databases has several implications for the semiconductor industry, including addressing expertise shortages, accelerating innovation in manufacturing processes, and enhancing process optimization and efficiency.
OpenSearch vs Vearch: Selecting the Right Database for GenAI Applications
Date published
Oct. 9, 2024
Author(s)
Chloe Williams
Language
English
Word count
2026
Hacker News points
None found.
OpenSearch and Vearch are two prominent databases with vector search capabilities, essential for recommendation engines, image retrieval, and semantic search in AI-driven applications. OpenSearch is an open-source search and analytics suite that supports a variety of machine learning-powered search methods, while Vearch specializes in fast and efficient similarity searches for AI applications. Key differences between the two include their search methodology, data handling capabilities, scalability and performance features, flexibility and customization options, integration and ecosystem support, ease of use, cost considerations, and security features. OpenSearch is ideal for comprehensive search and analytics needs, real-time data visualization, and multi-purpose applications, while Vearch is best suited for AI-driven similarity searches, hybrid search requirements, scalability in AI applications, and developer-friendly rapid AI development workflows.
OpenSearch vs Rockset: Selecting the Right Database for GenAI Applications
Date published
Oct. 8, 2024
Author(s)
Chloe Williams
Language
English
Word count
2032
Hacker News points
None found.
OpenSearch and Rockset are two prominent databases with vector search capabilities that play a crucial role in AI applications such as recommendation engines, image retrieval, and semantic search. Both offer robust capabilities for handling vector search but have different strengths and use cases. OpenSearch is an open-source search and analytics suite that manages diverse data types and integrates machine learning-powered search methods, making it ideal for complex queries and large datasets. Rockset focuses on real-time search and analytics with advanced indexing and querying techniques, making it highly efficient in delivering up-to-the-second insights for real-time applications. The choice between OpenSearch and Rockset depends on specific needs such as data type management, scalability requirements, and the complexity of search and query needs.
A Different Angle: Retrieval Optimized Embedding Models
Date published
Oct. 7, 2024
Author(s)
Denis Kuria
Language
English
Word count
3002
Hacker News points
None found.
In this blog post, we explored Generalized Contrastive Learning (GCL), a solution introduced by Marqo to address the limitations of traditional embedding models in modern data retrieval systems. GCL enhances these models by incorporating rank and query awareness into the training process, significantly improving the relevance and ranking of retrieval results. We discussed how GCL can be fine-tuned for specific tasks and real-world applications, such as e-commerce search optimization and academic research paper retrieval. Additionally, we examined advanced techniques in GCL that further improve performance in production environments. Finally, we looked at how to integrate GCL with Milvus, a leading vector database, to create optimized Retrieval-Augmented Generation (RAG) systems.
Annoy vs Voyager: Choosing the Right Vector Search Tool for GenAI
Date published
Oct. 7, 2024
Author(s)
Chloe Williams
Language
English
Word count
2480
Hacker News points
None found.
Annoy and Voyager are two widely used vector search tools that offer distinct advantages. Vector search is a key element in recommendation systems, image retrieval, natural language processing (NLP), and other fields where finding similar items in high-dimensional data is critical. Both libraries focus on approximate nearest neighbor search but have different strengths and use cases. Annoy is known for its speed in performing approximate nearest-neighbor searches and is particularly useful when working with large datasets where exact matches aren't as important as quickly finding "close enough" results. Voyager, on the other hand, offers more than 10 times the speed of Annoy at the same recall rate, and up to 50% higher accuracy at the same speed. It is also highly memory-efficient and supports multithreaded index creation and querying, making it well suited to memory-constrained environments and large-scale deployments involving multiple data types.
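To make Annoy's approach concrete, here is a minimal sketch of its Python API; the dimensionality, metric, and tree count below are illustrative choices, not values taken from the post.

```python
import random
from annoy import AnnoyIndex

dim = 40                                # illustrative vector dimensionality
index = AnnoyIndex(dim, "angular")      # angular distance ~ cosine similarity
for i in range(1000):
    index.add_item(i, [random.gauss(0, 1) for _ in range(dim)])
index.build(10)                         # more trees: better recall, bigger index

# Approximate neighbors of item 0: "close enough" rather than exact matches
print(index.get_nns_by_item(0, 5, include_distances=True))
```

The tree count fixed at build time is Annoy's main knob for trading recall against index size and build time.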
Redis vs MyScale: Choosing the Right Vector Database for Your Needs
Date published
Oct. 6, 2024
Author(s)
Chloe Williams
Language
English
Word count
1915
Hacker News points
None found.
Redis and MyScale are two options in the vector database space, designed to store and query high-dimensional vectors. Both technologies have vector search capabilities as an add-on. Redis is known for its in-memory speed and ability to combine vector similarity search with attribute filtering, making it great for applications that need low latency and real-time data processing. MyScale is a unified platform for SQL, vector, and full-text search, with strong scalability for large AI and ML workloads. It can handle diverse data types and complex queries, making it ideal for advanced analytics platforms, complex search engines, or AI-driven business intelligence tools. The choice between Redis and MyScale depends on the specific use case, data volume, query complexity, and existing infrastructure.
Redis vs Deep Lake: Choosing the Right Vector Database for Your Needs
Date published
Oct. 6, 2024
Author(s)
Chloe Williams
Language
English
Word count
1917
Hacker News points
None found.
Redis and Deep Lake are two popular vector databases used in AI applications. Redis is an in-memory database with vector search capabilities, while Deep Lake is a data lake optimized for vector embeddings. Both technologies have their strengths and use cases. Redis is great for high-performance in-memory processing and hybrid search in real-time applications with structured data. On the other hand, Deep Lake is ideal for managing and querying many data types, particularly unstructured multimedia data in AI and machine learning workflows. The choice between these two technologies should be based on specific use cases, the type of data being worked with, and performance requirements.
pgvector vs Aerospike: Choosing the Right Vector Database for Your AI Apps
Date published
Oct. 6, 2024
Author(s)
Chloe Williams
Language
English
Word count
1922
Hacker News points
None found.
A vector database is a type of database specifically designed to store and query high-dimensional vectors, which are numerical representations of unstructured data such as text, images, or product attributes. They play a crucial role in AI applications by enabling efficient similarity searches for advanced data analysis and retrieval. Common use cases include e-commerce recommendations, content discovery platforms, anomaly detection in cybersecurity, medical image analysis, and natural language processing tasks. pgvector is an extension for PostgreSQL that adds support for vector operations, allowing users to store and query vector embeddings directly within their PostgreSQL database. It supports both exact and approximate nearest neighbor search with two types of approximate indexes: HNSW (Hierarchical Navigable Small World) and IVFFlat (Inverted File Flat). Aerospike is a distributed, scalable NoSQL database that has added support for vector indexing and searching. Its vector capability, called Aerospike Vector Search (AVS), only supports HNSW indexes for vector search. AVS uses concurrent indexing across all nodes in the cluster and builds the index asynchronously from an indexing queue. When choosing between pgvector and Aerospike for vector search, consider factors such as search methodology, data handling, scalability and performance, flexibility and customization, integration and ecosystem, ease of use, cost, and security. The choice should be based on the specific use case, existing infrastructure, data volume, and performance requirements.
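As a concrete illustration of the pgvector side of this comparison, here is a minimal sketch using psycopg2; the connection string, table name, and tiny three-dimensional vectors are hypothetical, and the HNSW index assumes a pgvector version that supports it (0.5.0 or later).

```python
import psycopg2

conn = psycopg2.connect("dbname=vectordemo")  # hypothetical connection string
cur = conn.cursor()
cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
cur.execute("CREATE TABLE IF NOT EXISTS items (id bigserial PRIMARY KEY, embedding vector(3))")
# HNSW is one of pgvector's two approximate index types; IVFFlat is the other
cur.execute("CREATE INDEX IF NOT EXISTS items_hnsw ON items USING hnsw (embedding vector_cosine_ops)")
cur.execute("INSERT INTO items (embedding) VALUES ('[0.1, 0.2, 0.3]')")
conn.commit()
# <=> is pgvector's cosine-distance operator; <-> is Euclidean (L2) distance
cur.execute("SELECT id FROM items ORDER BY embedding <=> '[0.1, 0.2, 0.25]' LIMIT 5")
print(cur.fetchall())
```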
Redis vs Rockset: Choosing the Right Vector Database for Your Needs
Date published
Oct. 6, 2024
Author(s)
Chloe Williams
Language
English
Word count
1828
Hacker News points
None found.
Redis and Rockset are two popular vector databases that offer efficient similarity searches in high dimensional spaces, making them crucial for AI applications. While both technologies have their strengths, the choice between them depends on specific use cases. Redis is best for low latency real-time applications with simple data models, while Rockset is suitable for complex, changing data with analytics and real-time search and analytics on multiple data types. To make an informed decision, users should evaluate these databases based on their requirements, including data types, query complexity, latency, and scalability. VectorDBBench, an open-source benchmarking tool, can assist in this process by allowing users to test and compare the performance of different vector database systems using their own datasets.
Redis vs Aerospike: Choosing the Right Vector Database for Your Needs
Date published
Oct. 6, 2024
Author(s)
Chloe Williams
Language
English
Word count
1957
Hacker News points
None found.
Redis and Aerospike are two options in the vector database space, each with its own strengths and weaknesses. Redis is known for its in-memory data storage and has added vector search capabilities through the Redis Vector Library. It uses FLAT and HNSW algorithms for approximate nearest neighbor search and supports hybrid search, combining vector similarity with attribute filtering. Aerospike, on the other hand, is a distributed NoSQL database that supports vector indexing and searching. Its vector capability, called Aerospike Vector Search (AVS), only uses HNSW indexes for vector search and updates vector records asynchronously across all AVS nodes in the cluster. When choosing between Redis and Aerospike for vector search, consider factors such as search methodology, data handling, scalability and performance, flexibility and customization, integration and ecosystem, ease of use, cost, and security. Redis is great for applications that need real-time vector search and traditional data operations, while Aerospike is better for high scalability and performance with large datasets, especially when dealing with high-dimensional vectors. Ultimately, the best choice will be the one that fits your project's unique needs and long-term scalability requirements.
Redis vs Vearch: Choosing the Right Vector Database for Your Needs
Date published
Oct. 6, 2024
Author(s)
Chloe Williams
Language
English
Word count
1672
Hacker News points
None found.
Redis and Vearch are two popular vector databases used in AI applications. A vector database is designed to store and query high-dimensional vectors, which represent unstructured data such as text semantics or image features. They enable efficient similarity searches, crucial for tasks like recommendation systems, content discovery platforms, and natural language processing (NLP). Redis is an in-memory database with added vector search capabilities through its Redis Vector Library. It uses FLAT and HNSW algorithms for approximate nearest neighbor search and supports hybrid queries combining vector similarity and attribute filtering. Vearch is a purpose-built vector database designed for developers working on AI applications requiring fast and efficient similarity searches. It has hybrid search capability, can handle vector embeddings and regular data types in one system, and uses a cluster setup to distribute tasks and scale horizontally. When choosing between Redis and Vearch, consider factors such as search method, data handling, scalability and performance, flexibility and customization, integration and ecosystem, ease of use, cost, and security. Redis is best for applications needing real-time vector search with traditional data operations, while Vearch is ideal for large-scale AI applications requiring complex similarity searches across massive data. VectorDBBench is an open-source benchmarking tool that helps users evaluate and compare the performance of different vector databases using their own datasets. It's crucial to thoroughly benchmark with specific datasets and query patterns to make informed decisions between these two powerful vector search approaches.
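To ground the Redis side, here is a minimal hybrid-search sketch with redis-py; it assumes a Redis Stack instance on localhost, and the index name, key prefix, and four-dimensional vectors are hypothetical.

```python
import numpy as np
import redis
from redis.commands.search.field import TagField, VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query

r = redis.Redis(host="localhost", port=6379)

# An HNSW vector field plus a tag field enables hybrid (vector + attribute) queries
schema = (
    TagField("category"),
    VectorField("embedding", "HNSW",
                {"TYPE": "FLOAT32", "DIM": 4, "DISTANCE_METRIC": "COSINE"}),
)
r.ft("docs").create_index(
    schema, definition=IndexDefinition(prefix=["doc:"], index_type=IndexType.HASH))

vec = np.array([0.1, 0.2, 0.3, 0.4], dtype=np.float32)
r.hset("doc:1", mapping={"category": "news", "embedding": vec.tobytes()})

# KNN search restricted to documents tagged 'news'
q = Query("(@category:{news})=>[KNN 3 @embedding $vec AS score]").sort_by("score").dialect(2)
print(r.ft("docs").search(q, query_params={"vec": vec.tobytes()}).docs)
```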
Redis vs Vald: Choosing the Right Vector Database for Your Needs
Date published
Oct. 6, 2024
Author(s)
Chloe Williams
Language
English
Word count
1918
Hacker News points
None found.
Redis and Vald are two popular vector databases used in AI applications. A vector database is designed to store and query high-dimensional vectors, which represent unstructured data such as text semantics or image features. They enable efficient similarity searches, making them crucial for advanced data analysis and retrieval. Redis is an in-memory database with vector search capabilities added through the Redis Vector Library. It uses FLAT and HNSW algorithms for approximate nearest neighbor search, allowing hybrid search combining vector similarity with attribute filtering. Redis supports both structured and unstructured data and can handle real-time processing. Vald is a purpose-built vector database designed for handling billions of vectors. It uses the NGT algorithm for fast similarity searches across large datasets. Vald's distributed indexing allows it to spread data across multiple machines, ensuring high availability during index updates. When choosing between Redis and Vald, consider factors such as search methodology, data handling, scalability, flexibility, integration, ease of use, cost, and security. Redis is suitable for diverse real-time applications with moderate data size, while Vald excels at massive vector datasets requiring high-speed searches and scalability. Ultimately, the right choice depends on your specific project requirements and team expertise.
Couchbase vs OpenSearch: Choosing the Right Vector Database for Your AI Apps
Date published
Oct. 6, 2024
Author(s)
Chloe Williams
Language
English
Word count
1754
Hacker News points
None found.
Couchbase and OpenSearch are both open source tools that can be used for vector search in AI applications. Couchbase is a distributed, multi-model NoSQL document-oriented database that allows developers to store vector embeddings within JSON documents. It supports Full Text Search (FTS) and application-level computations for vector similarity searches. OpenSearch is an open source search and analytics platform with built-in vector search capabilities through its k-NN plugin, supporting both approximate and exact k-NN search methods. Both tools have their strengths and can be used depending on the specific use case, existing tech stack, and performance requirements.
Redis vs ClickHouse: Choosing the Right Vector Database for Your Needs
Date published
Oct. 6, 2024
Author(s)
Chloe Williams
Language
English
Word count
2016
Hacker News points
None found.
Redis and ClickHouse are two popular vector databases that offer efficient similarity searches, making them crucial in AI applications. While both have vector search capabilities, they differ in their core technologies, features, and use cases. Redis is an in-memory database with hybrid search capabilities, combining vector similarity search with traditional filtering on other attributes. It's great for real-time applications that need low latency and can handle datasets that fit in memory. ClickHouse, on the other hand, is an open-source column-oriented database designed for real-time analytics with full SQL support. It can handle large-scale vector datasets and combines vector search with metadata filtering or aggregation. The choice between Redis and ClickHouse depends on specific use cases, considering data volume, query complexity, response time, and integration requirements.
pgvector vs Clickhouse: Choosing the Right Vector Database for Your AI Apps
Date published
Oct. 6, 2024
Author(s)
Chloe Williams
Language
English
Word count
1746
Hacker News points
None found.
A vector database is a type of database specifically designed to store and query high-dimensional vectors, which are numerical representations of unstructured data such as text, images, or product attributes. They enable efficient similarity searches and play a crucial role in AI applications like e-commerce recommendations, content discovery platforms, cybersecurity anomaly detection, medical image analysis, and natural language processing (NLP) tasks. pgvector is an extension for PostgreSQL that adds support for vector operations, allowing users to store and query vector embeddings directly within their PostgreSQL database. It supports exact and approximate nearest neighbor search with HNSW and IVFFlat indexing methods. ClickHouse is an open-source OLAP database for real-time analytics with full SQL support and fast query processing. It has vector search functionality through SQL functions, including exact matching with parallel processing and experimental Approximate Nearest Neighbor (ANN) indices. When choosing between pgvector and ClickHouse for vector search, consider factors such as search methodology, scalability and performance, flexibility and customization, integration and ecosystem, ease of use, cost, and security. Use pgvector when you're already using PostgreSQL and want to add vector search to your existing relational database setup, while ClickHouse is better for very large vector datasets with high-performance analytical processing and vector search needs.
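To make the ClickHouse approach concrete, here is a sketch of exact vector search expressed as plain SQL, using the clickhouse-connect driver; the host, table layout, and toy vectors are hypothetical.

```python
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost")
client.command("""
    CREATE TABLE IF NOT EXISTS vectors (
        id UInt64,
        embedding Array(Float32)
    ) ENGINE = MergeTree ORDER BY id
""")
client.insert("vectors",
              [[1, [0.1, 0.2, 0.3]], [2, [0.9, 0.8, 0.7]]],
              column_names=["id", "embedding"])
# Exact matching: cosineDistance is computed over all rows with parallel processing
result = client.query("""
    SELECT id, cosineDistance(embedding, [0.1, 0.2, 0.25]) AS dist
    FROM vectors
    ORDER BY dist ASC
    LIMIT 5
""")
print(result.result_rows)
```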
Modern Analytics & Reporting with Milvus Vector DB and GenAI
Date published
Oct. 6, 2024
Author(s)
Yesha Shastri
Language
English
Word count
1552
Hacker News points
None found.
The integration of Milvus and Qarbine can transform the way unstructured data is analyzed, offering excellent efficiency and insight. Milvus is a leading vector database designed to store, manage, and query high-dimensional data represented as vectors, while Qarbine simplifies the analytics and reporting process by acting as a bridge between developers and analysts. The combination of these two tools can be used to create advanced Generative AI (GenAI) applications, such as Retrieval Augmented Generation (RAG), which combines retrieval-based and generation-based methods to enhance the capabilities of language models. This collaboration enhances efficiency and grants stakeholders across various departments access to harness the power of AI without needing deep technical knowledge.
HNSWlib vs Voyager: Choosing the Right Vector Search Tool for Your GenAI Application
Date published
Oct. 6, 2024
Author(s)
Chloe Williams
Language
English
Word count
2157
Hacker News points
None found.
HNSWlib and Voyager are both libraries designed to efficiently perform nearest-neighbor searches in high-dimensional spaces, a key component of many AI applications. While HNSWlib is known for its speed and accuracy, Spotify's Voyager addresses some limitations of HNSWlib and offers additional features such as multithreading and support for both Python and Java. The choice between the two depends on factors like data size, infrastructure requirements, and desired level of customization. Additionally, purpose-built vector databases like Milvus offer comprehensive solutions for large-scale vector data management, including persistent storage, real-time updates, and advanced querying capabilities. Benchmarking tools such as ANN benchmarks and VectorDBBench can help evaluate the effectiveness of different ANN algorithms and vector database systems.
pgvector vs Rockset: Choosing the Right Vector Database for Your Needs
Date published
Oct. 5, 2024
Author(s)
Chloe Williams
Language
English
Word count
1795
Hacker News points
None found.
The choice between pgvector and Rockset as a vector database depends on specific use cases, existing tech stack, data scale, real-time requirements, and search complexity. Pgvector integrates with PostgreSQL for adding vector search to existing applications, suitable for moderate scale vector search within a single database instance, and preferable for those who prefer an open source, self-hosted solution with full control. On the other hand, Rockset is designed for real-time analytics across multiple data types, ideal for large scale, distributed data environments that need to handle multiple data formats and sources, and for those who prefer a managed service that scales automatically.
Couchbase vs Deeplake: Choosing the Right Vector Database for Your AI Apps
Date published
Oct. 5, 2024
Author(s)
Chloe Williams
Language
English
Word count
1799
Hacker News points
None found.
Couchbase and Deeplake are two popular vector databases used in AI applications. Couchbase is a distributed, open source NoSQL document-oriented database with vector search capabilities as an add-on, while Deep Lake is a data lake optimized for vector embeddings. Both systems have their strengths and weaknesses depending on the use case, data types, and performance requirements. Couchbase excels in handling structured and semi-structured data, primarily working with JSON documents, and can store vector embeddings within these documents. It uses Full Text Search (FTS) for approximate vector search by converting vector data into searchable fields or allows developers to store raw vector embeddings with similarity calculations done at the application level. Deep Lake is designed to handle unstructured data types like images, audio, and video, alongside vector embeddings and metadata. It provides built-in support for vector operations and similarity search, making it a good fit for machine learning and AI projects focused on vector and multimedia data management. When choosing between Couchbase and Deep Lake, consider your use case, data types, performance requirements, existing infrastructure, size of your vector search operations, and team's expertise. Test both with your data and use cases to get more insight into their performance and suitability for your specific needs.
Couchbase vs Singlestore: Choosing the Right Vector Database for Your AI Apps
Date published
Oct. 5, 2024
Author(s)
Chloe Williams
Language
English
Word count
1837
Hacker News points
None found.
Couchbase and SingleStore are both distributed databases that offer vector search capabilities as an add-on. Couchbase is a NoSQL document-oriented database, while SingleStore is a SQL database with vector processing features. Key differences between the two include their search methodology, data handling, scalability and performance, flexibility and customization, integration and ecosystem, ease of use, cost considerations, and security features. The choice between Couchbase and SingleStore depends on factors such as data types, use cases, team expertise, existing tech stack, scalability requirements, and the importance of vector search in the overall application architecture. VectorDBBench is an open-source benchmarking tool that can assist users in evaluating and comparing vector databases based on their specific datasets and query patterns.
pgvector vs Deeplake: Choosing the Right Vector Database for Your AI Apps
Date published
Oct. 5, 2024
Author(s)
Chloe Williams
Language
English
Word count
1562
Hacker News points
None found.
This article compares two vector databases, pgvector and Deep Lake, which are designed to store and query high-dimensional vectors that represent unstructured data such as text, images, or product attributes. Both technologies play a crucial role in AI applications by enabling efficient similarity searches for advanced data analysis and retrieval. pgvector is an extension for PostgreSQL that adds support for vector operations, allowing users to store and query vector embeddings directly within their PostgreSQL database. It supports exact and approximate nearest neighbor search algorithms with HNSW and IVFFlat indexes for approximate search. Deep Lake is a specialized database system designed to handle the storage, management, and querying of vector and multimedia data, such as images, audio, video, and other unstructured data types. It can be used as a data lake and a vector store, offering seamless integration with AI/ML tools like LangChain and LlamaIndex. The key differences between the two technologies include their search methodology, data handling capabilities, scalability and performance, flexibility and customization options, integration and ecosystem support, ease of use, cost considerations, and security features. Choosing between pgvector and Deep Lake depends on factors such as current infrastructure, data types, scale of vector search requirements, and need for specialized AI features. For projects that require seamless integration with PostgreSQL-based systems and moderate-sized datasets, pgvector is a suitable choice. On the other hand, Deep Lake is best suited for machine learning workflows dealing with diverse data types, especially unstructured multimedia data. The article also introduces VectorDBBench, an open-source benchmarking tool designed to compare vector database performance using custom datasets and query patterns. This can help users make informed decisions when selecting a vector database for their specific use case.
Couchbase vs LanceDB: Choosing the Right Vector Database for Your AI Apps
Date published
Oct. 5, 2024
Author(s)
Chloe Williams
Language
English
Word count
1628
Hacker News points
None found.
Couchbase and LanceDB are both vector databases designed to store and query high-dimensional vectors, which are numerical representations of unstructured data. They play a crucial role in AI applications by enabling efficient similarity searches for tasks such as recommendation systems or retrieval-augmented generation. While Couchbase is a distributed multi-model NoSQL document-oriented database with vector search added on, LanceDB is a serverless vector database. Couchbase allows developers to store vector embeddings within its JSON structure and perform vector search through Full Text Search (FTS) or by storing raw vector embeddings for application-level calculations. It can be used for various AI and machine learning use cases that require similarity search. LanceDB, on the other hand, is an open-source vector database for AI applications, offering both exhaustive k-nearest neighbors (kNN) and approximate nearest neighbor (ANN) search using an IVF_PQ index. It supports various distance metrics for vector similarity and can handle large-scale multimodal data and embeddings. The choice between Couchbase and LanceDB depends on the specific use case, data types, performance requirements, and integration needs. Couchbase is suitable for large-scale distributed systems that require both traditional database features and vector search, while LanceDB is ideal for AI applications with a primary focus on efficient vector search operations.
Evaluating Safety & Alignment of LLM in Specific Domains
Date published
Oct. 4, 2024
Author(s)
Benito Martin
Language
English
Word count
1659
Hacker News points
None found.
Recent advancements in AI have led to sophisticated Large Language Models (LLMs) with potential transformative impacts across high-stakes domains such as healthcare, financial services, and legal industries. However, their use in critical decision-making requires thorough evaluation to ensure safety, accuracy, and ethical standards. Companies like Hydrox AI and AI Alliance are working on comprehensive evaluation frameworks for LLMs tailored to sensitive environments. Safety evaluations must consider factors such as accuracy, legal regulations, and ethical responsibilities, with regular testing and improvements essential to adapt to the changing landscape. The implications of inaccurate or biased AI outputs can be critical in high-stakes environments, making robust evaluation methodologies imperative.
Contributing to Open Source Milvus: A Beginner’s Guide
Date published
Oct. 3, 2024
Author(s)
Stefan Webb
Language
English
Word count
1128
Hacker News points
None found.
Open source software (OSS) relies on community contributions from developers, testers, writers, and designers to improve projects. The core team of maintainers or lead developers manage the project's direction, review contributions, ensure code quality, and make key decisions. Contributions are reviewed through issue tracking, pull requests, code reviews, feedback iterations, and automated testing before being merged into the main branch for release. To submit a pull request to an open source repository on GitHub, developers should fork the repository, clone it locally, create a new branch, make changes, commit them, push the changes to their fork, and finally create a pull request with a clear title and description. The maintainers will review the contribution and may ask for revisions or suggest improvements before merging it into the original project's codebase.
Couchbase vs Vearch: Choosing the Right Vector Database for Your AI Apps
Date published
Oct. 2, 2024
Author(s)
Chloe Williams
Language
English
Word count
1837
Hacker News points
None found.
Couchbase and Vearch are both distributed databases designed to handle high-dimensional vectors, which are numerical representations of unstructured data. They play a crucial role in AI applications by enabling efficient similarity searches. While Couchbase is a general-purpose NoSQL database with vector search capabilities as an add-on, Vearch is a purpose-built vector database designed for fast and efficient similarity searches. Couchbase offers flexibility in data modeling and queries, leveraging its JSON structure, while Vearch provides built-in vector search capabilities with options to customize indexing methods and supports multiple vector fields in a single document. Both systems offer scalable solutions and have their own strengths and weaknesses depending on the use case. When choosing between Couchbase and Vearch for vector search, factors such as search methodology, data handling, scalability, flexibility, integration, ease of use, and cost should be considered. Ultimately, thorough benchmarking with specific datasets and query patterns will be essential in making an informed decision between these two powerful approaches to vector search in distributed database systems.
Top 5 Reasons to Migrate from Open Source Milvus to Zilliz Cloud
Date published
Oct. 2, 2024
Author(s)
Steffi Li
Language
English
Word count
2622
Hacker News points
None found.
The article presents five reasons to migrate from Milvus, an open-source vector database, to Zilliz Cloud, a fully managed service built on Milvus. These reasons include performance advantages due to advanced automation and optimization tools in Zilliz Cloud; scalability benefits provided by its cloud native architecture and elastic scaling features; superior security and compliance measures offered by Zilliz Cloud; better availability and data management capabilities; and cost-effectiveness and resource optimization features that make it a more economical choice. The article also discusses the migration process, expert support from Milvus experts, and when to consider migrating from Milvus to Zilliz Cloud.
Couchbase vs Vald: Choosing the Right Vector Database for Your AI Apps
Date published
Oct. 1, 2024
Author(s)
Chloe Williams
Language
English
Word count
1826
Hacker News points
None found.
Couchbase and Vald are two popular vector databases used in AI applications. A vector database is specifically designed to store and query high-dimensional vectors, which are numerical representations of unstructured data. Common use cases include e-commerce product recommendations, content discovery platforms, anomaly detection in cybersecurity, medical image analysis, and natural language processing (NLP) tasks. Couchbase is a distributed multi-model NoSQL document-oriented database with vector search capabilities as an add-on. It combines the best of relational databases with the flexibility of JSON and allows developers to store vector embeddings within Couchbase documents as part of their JSON structure. These vectors can be used in similarity search use cases such as recommendation systems or retrieval-augmented generation based on semantic search. Vald is a purpose-built vector database designed to handle billions of vectors and scale easily as your needs grow. It uses the fast NGT (Neighborhood Graph and Tree) algorithm to find similar vectors and spreads the index across different machines, allowing searches to continue even during updates. Vald also automatically backs up your index data. When selecting between Couchbase and Vald for vector search, consider factors such as search methodology, data handling, scalability and performance, flexibility and customization, integration and ecosystem, ease of use, cost, and security features. Ultimately, the choice will depend on specific needs and priorities.
Couchbase vs Clickhouse Choosing the Right Vector Database for Your AI Apps
Date published
Oct. 1, 2024
Author(s)
Chloe Williams
Language
English
Word count
2026
Hacker News points
None found.
Couchbase and ClickHouse are both distributed databases with vector search capabilities as add-ons, but they differ in their core technologies and use cases. Couchbase is a NoSQL document-oriented database that combines the strengths of relational databases with JSON flexibility, making it suitable for diverse applications requiring both traditional database functionalities and vector search capabilities. ClickHouse is an open-source OLAP database known for its full SQL support and high-speed query processing, excelling in handling large-scale vector datasets without memory constraints and combining vector search operations with complex SQL queries. The choice between Couchbase and ClickHouse depends on factors such as dataset size, query complexity, team familiarity with SQL, and scalability requirements.
Couchbase vs Aerospike: Choosing the Right Vector Database for Your AI Apps
Date published
Oct. 1, 2024
Author(s)
Chloe Williams
Language
English
Word count
1991
Hacker News points
None found.
Couchbase and Aerospike are both distributed NoSQL databases with vector search capabilities, but they differ in their approach to handling vector data and their primary use cases. Couchbase is a flexible database that combines the features of relational databases with JSON support, allowing developers to implement custom vector search within a familiar environment. It's suitable for recommendation systems, content retrieval, and applications that can store and query both structured and unstructured data alongside vector embeddings. On the other hand, Aerospike excels in its dedicated high-performance vector search feature, optimized for real-time applications requiring fast and efficient processing of high-dimensional vector data at scale. It's great for machine learning, artificial intelligence, and advanced analytics where similarity searches are critical. Choose Couchbase when you need a flexible database that can handle many data types and vector search alongside other database operations, while Aerospike is more suitable if high-performance vector search in real-time applications is your primary focus.
Couchbase vs Rockset: Choosing the Right Vector Database for Your AI Apps
Date published
Oct. 1, 2024
Author(s)
Chloe Williams
Language
English
Word count
2100
Hacker News points
None found.
Couchbase and Rockset are both distributed databases with vector search capabilities, but they differ in their approach to handling vector data and their primary use cases. Couchbase is a flexible general-purpose NoSQL database that allows developers to implement custom vector search within a familiar environment. It's great for applications that need to balance traditional database operations with vector search and can handle diverse data types, including JSON documents. On the other hand, Rockset is designed for real-time search and analytics applications that require immediate insights from rapidly changing data. Its Converged Indexing and support for high-dimensional vectors make it a good choice for applications that need to process high-velocity data streams and frequent updates to vector embeddings. When choosing between Couchbase and Rockset, consider your use cases, data types, performance requirements, existing infrastructure, development team's expertise, and the type of data (static vs streaming).
From CLIP to JinaCLIP: General Text-Image Representation Learning for Search and Multimodal RAG
Date published
Oct. 1, 2024
Author(s)
Simon Mwaniki
Language
English
Word count
3342
Hacker News points
None found.
The modality gap is a significant challenge in multimodal embedding models, which are used to interpret text and images across various industries. This gap arises due to the spatial separation between embeddings from different input types, such as texts and images that are semantically similar but far apart in the vector space. Despite advancements in multimodal embedding models like OpenAI's CLIP, these models still face challenges in accurately capturing semantic relationships within data. To address this issue, JinaCLIP was developed to build upon the original CLIP architecture and improve its performance by expanding text input and using an adapted BERT v2 architecture for text encoding. The training process of JinaCLIP focuses on overcoming the challenges posed by short text inputs in image captions and introducing hard negatives, which significantly improves the model's text-only performance while maintaining strong performance in multimodal tasks. A practical example of how to build a multimodal retrieval system using Milvus, an open-source vector database, and JinaCLIP is also discussed. This system allows users to input either text or images and retrieve the most semantically relevant results from a mixed dataset. By understanding the reasons behind the modality gap and implementing strategies to mitigate its impact, multimodal retrieval systems can be optimized for more accurate and efficient performance across various applications.
Couchbase vs Kdb: Choosing the Right Vector Database for Your AI Apps
Date published
Sept. 30, 2024
Author(s)
Chloe Williams
Language
English
Word count
1707
Hacker News points
None found.
Couchbase and Kdb are both distributed databases with vector search capabilities, but they differ in their core technologies and use cases. Couchbase is a NoSQL document-oriented database that can handle JSON documents with vector embeddings, making it suitable for cloud, mobile, AI, and edge computing applications requiring vector search capabilities. It offers flexibility in implementing vector search through various approaches, such as adapting Full Text Search or integrating with specialized libraries. Kdb is a time-series database designed for real-time data processing without needing GPUs, handling raw data, generating vector embeddings, and running similarity searches all in real time. It's suitable for use cases that require multimodal performance across various data types, including streaming and time-series data. Users should evaluate these databases based on their specific use case and perform thorough benchmarking with their own datasets to make an informed decision.
Building a GraphRAG Agent With Neo4j and Milvus
Date published
Sept. 30, 2024
Author(s)
Jason Koo and Stephen Batifol
Language
English
Word count
1579
Hacker News points
None found.
This blog post details how to build a GraphRAG agent using the Neo4j graph database and the Milvus vector database. The agent combines the power of graph databases and vector search to provide accurate and relevant answers to user queries. In this example, we use LangGraph, Llama 3.1 8B with Ollama, and GPT-4o. The architecture of our GraphRAG agent follows three key concepts: routing, fallback mechanisms, and self-correction. These principles are implemented through a series of LangGraph components including retrieval, graph enhancement, and LLM integration. The GraphRAG architecture is visualized as a workflow with several interconnected nodes, such as question routing, retrieval, generation, evaluation, and refinement where needed.
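The routing idea can be sketched with LangGraph roughly as follows; the state fields, node bodies, and keyword-based routing rule are placeholders standing in for the post's actual Neo4j/Milvus retrieval and LLM calls.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class GraphState(TypedDict):
    question: str
    documents: list
    generation: str

def route_question(state: GraphState) -> str:
    # Placeholder rule; the post routes with an LLM, not keyword matching
    return "graph" if "related" in state["question"] else "vector"

def graph_retrieve(state: GraphState) -> dict:
    return {"documents": ["<context from a Neo4j Cypher query>"]}

def vector_retrieve(state: GraphState) -> dict:
    return {"documents": ["<context from a Milvus similarity search>"]}

def generate(state: GraphState) -> dict:
    return {"generation": f"answer grounded in {state['documents']}"}

workflow = StateGraph(GraphState)
workflow.add_node("graph_retrieve", graph_retrieve)
workflow.add_node("vector_retrieve", vector_retrieve)
workflow.add_node("generate", generate)
workflow.set_conditional_entry_point(
    route_question, {"graph": "graph_retrieve", "vector": "vector_retrieve"})
workflow.add_edge("graph_retrieve", "generate")
workflow.add_edge("vector_retrieve", "generate")
workflow.add_edge("generate", END)

app = workflow.compile()
print(app.invoke({"question": "How are these entities related?"}))
```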
Couchbase vs Apache Cassandra: Choosing the Right Vector Database for Your AI Apps
Date published
Sept. 30, 2024
Author(s)
Chloe Williams
Language
English
Word count
2119
Hacker News points
None found.
Couchbase and Apache Cassandra are both distributed NoSQL databases with vector search capabilities as an add-on. Couchbase is a flexible option that allows developers to implement custom vector search solutions, while Cassandra provides integrated vector search capabilities through its Storage-Attached Indexes (SAI) feature. The choice between the two depends on specific project requirements and team expertise. Couchbase is more suitable for projects that require a flexible NoSQL database with customizable vector search solutions, whereas Apache Cassandra is better suited for large-scale, distributed applications that demand native vector search functionality.
Couchbase vs MyScale: Choosing the Right Vector Database for Your AI Apps
Date published
Sept. 30, 2024
Author(s)
Chloe Williams
Language
English
Word count
2007
Hacker News points
None found.
Couchbase and MyScale are two popular vector databases used in AI applications. Couchbase is a distributed, open-source NoSQL document-oriented database with vector search capabilities as an add-on. It combines the strengths of relational databases with the versatility of JSON and can be adapted to handle vector search functionality through various methods like adapting Full Text Search or integrating external libraries. MyScale is a cloud-based database solution built on ClickHouse, designed specifically for AI and machine learning workloads. It provides native vector search capabilities and supports various vector index types and similarity metrics. When choosing between Couchbase and MyScale for vector search applications, consider factors such as your specific needs, team's expertise, the importance of native vector search support, and whether you need a general-purpose database or a specialized solution for AI and analytics tasks.
Couchbase vs Milvus: Choosing the Right Vector Database for Your AI Apps
Date published
Sept. 30, 2024
Author(s)
Chloe Williams
Language
English
Word count
2009
Hacker News points
None found.
Couchbase and Milvus are both distributed databases designed to handle high-dimensional vectors, which are numerical representations of unstructured data. They play a crucial role in AI applications by enabling efficient similarity searches. While Couchbase is a general-purpose NoSQL database with vector search capabilities as an add-on, Milvus is a purpose-built vector database designed specifically for vector search and similarity search at its core. Couchbase offers more flexibility as a general-purpose NoSQL database but may require additional integration with specialized libraries for vector search tasks. In contrast, Milvus provides extensive customization options for vector indexing and search algorithms, making it more efficient for native vector similarity searches. The choice between Couchbase and Milvus depends on the specific needs and project requirements of the user.
Couchbase vs TiDB: Choosing the Right Vector Database for Your AI Apps
Date published
Sept. 30, 2024
Author(s)
Chloe Williams
Language
English
Word count
1931
Hacker News points
None found.
Couchbase and TiDB are both distributed databases with vector search capabilities as add-ons, making them suitable for AI applications that require efficient similarity searches. Couchbase is a NoSQL document-oriented database that can store vector embeddings within JSON structures, while TiDB is a SQL database with hybrid transactional and analytical processing (HTAP) capabilities. Key differences between the two include their search methodology, data handling, scalability and performance, flexibility and customization, integration and ecosystem, ease of use, cost considerations, and security features. When choosing between Couchbase and TiDB for vector search, it's essential to evaluate your specific data management needs and existing infrastructure.
Zilliz is named a Leader in the Forrester Wave™ Vector Database Report
Date published
Sept. 27, 2024
Author(s)
Chris Churilo
Language
English
Word count
488
Hacker News points
None found.
Zilliz has been named a Leader in Forrester's Wave™ Vector Database Report, recognized for its cutting-edge, high-performance database for cloud scalability. The company scored highly in criteria such as vector dimensionality, indexing, performance, and scalability. As the creators of Milvus, the world's most popular open-source vector database, Zilliz is committed to innovation that benefits everyone. Their roadmap includes distributed indexing, advanced result reranking, more admin tools, data security certifications, multi-cloud support, and enhanced data intelligence.
Garbage In, Garbage Out: Why Poor Data Curation Is Killing Your AI Models
Date published
Sept. 26, 2024
Author(s)
Fendy Feng and ShriVarsheni R
Language
English
Word count
1907
Hacker News points
None found.
Poor data curation can significantly impact AI models' performance and reliability. Organizations must shift their focus from collecting large datasets to ensuring high-quality data. Effective data curation involves organizing, managing, and preparing data for model training or labeling, ensuring it is relevant and structured for the specific task. Cleaning and refining training data at scale is a major challenge, but meticulous curation and cleaning can improve model accuracy and performance. Modern pipelines should incorporate additional stages for enhanced data curation, such as verification, cleaning, and curating before proceeding to model training. Encord offers innovative approaches to tackle common data quality challenges like duplicates, corrupted data, and noisy samples through embedding-based approaches, NLP for data curation, persistence layers, metadata validation, and data cleaning techniques.
Stefan Webb: Why I Joined Zilliz
Date published
Sept. 25, 2024
Author(s)
Stefan Webb
Language
English
Word count
484
Hacker News points
None found.
Stefan is a newly appointed Developer Advocate at Zilliz, creators of Milvus—the leading open-source vector database. He shares his journey to becoming a Developer Advocate and explains that he was inspired by the advancements in Generative AI. Stefan realized that he enjoyed face-to-face interactions and community engagement most in his previous roles, particularly in developing and managing open-source software and sharing knowledge about new technologies. At Zilliz, he is excited to contribute to the growth of the Milvus open-source community, leveraging his passion for knowledge sharing and open-source development.
Unlock AI-powered search with Fivetran and Milvus
Date published
Sept. 23, 2024
Author(s)
Jiang Chen and Charles Wang
Language
English
Word count
912
Hacker News points
None found.
Fivetran now supports Milvus as a destination, making it easier to onboard every data source for retrieval-augmented generation (RAG) and AI-powered search. With the integration of Fivetran's automated data movement platform and Milvus's high-performance vector database, businesses can quickly build AI-powered search tools to extract insights from their unstructured datasets. The partnership simplifies data ingestion from various sources into Milvus, allowing developers to focus on creating business value rather than managing infrastructure complexities.
Chroma vs OpenSearch: Choosing the Right Vector Database for Your AI Applications
Date published
Sept. 21, 2024
Author(s)
Chloe Williams
Language
English
Word count
2095
Hacker News points
None found.
Chroma and OpenSearch are two popular vector databases used in AI applications. A vector database is designed to store and query high-dimensional vectors, which represent unstructured data such as text's semantic meaning or images' visual features. These technologies play a crucial role in AI applications, enabling efficient similarity searches for advanced data analysis and retrieval. Chroma is an open-source, AI-native vector database that simplifies the process of building AI applications by providing tools for managing vector data and associated metadata. It focuses on vector similarity search for AI applications and is particularly well-suited for projects that primarily deal with vector data and require quick integration of vector search capabilities. OpenSearch is a versatile search and analytics engine derived from Elasticsearch, designed to handle full-text search, log analytics, and vector search. It supports various data types, including structured, semi-structured, and unstructured data, making it suitable for diverse applications. OpenSearch offers more extensive customization through its query DSL, scripting capabilities, and plugin system. The choice between Chroma and OpenSearch depends on the specific needs of a project or organization. Chroma is ideal for AI-centric applications that primarily rely on vector similarity search, while OpenSearch provides a more comprehensive solution for diverse search and analytics needs. Additionally, specialized vector databases like Milvus and Zilliz Cloud are better suited for large-scale, high-performance vector search tasks.
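For a flavor of the developer experience Chroma targets, here is a minimal sketch of its Python client; the collection name and documents are invented for illustration.

```python
import chromadb

client = chromadb.Client()  # in-memory; PersistentClient(path=...) persists to disk
collection = client.create_collection("docs")

# Chroma embeds documents with a default model unless embeddings are supplied
collection.add(
    ids=["1", "2"],
    documents=["Vector databases store embeddings.",
               "OpenSearch also handles log analytics."],
    metadatas=[{"topic": "databases"}, {"topic": "search"}],
)

results = collection.query(query_texts=["what stores embeddings?"], n_results=1)
print(results["documents"])
```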
Chroma vs MyScale on Vector Search Capabilities
Date published
Sept. 21, 2024
Author(s)
Chloe Williams
Language
English
Word count
2626
Hacker News points
None found.
Chroma and MyScale are two popular vector databases used in AI applications. A vector database is designed to store and query high-dimensional vectors, which represent unstructured data such as text's semantic meaning or images' visual features. These databases enable efficient similarity searches, playing a crucial role in AI applications for advanced data analysis and retrieval. Chroma is an open-source, AI-native vector database that simplifies the process of building AI applications by providing tools for managing vector data and associated metadata. It supports various types of data and different embedding models, allowing users to choose the best approach for their specific use case. Chroma's API is designed to be intuitive and easy to use, reducing the learning curve for developers new to vector databases. MyScale is a cloud-based database solution built on the open-source ClickHouse database, designed specifically for AI and machine learning workloads. It can handle both structured and vector data, supporting real-time analytics and machine learning tasks. MyScale offers native SQL support, simplifying complex AI-driven queries by integrating vector search, full-text search, and traditional SQL queries in a unified system. The choice between Chroma and MyScale depends on the specific requirements of your project, including the complexity of your data operations, the size of your datasets, your team's expertise, and your long-term scalability requirements. Both technologies offer valuable tools for implementing vector search in modern AI and data-driven applications, each catering to different use cases and preferences.
Chroma vs Rockset: Choosing the Right Vector Database for Your AI Applications
Date published
Sept. 21, 2024
Author(s)
Chloe Williams
Language
English
Word count
2428
Hacker News points
None found.
Chroma and Rockset are two popular vector databases used in AI applications. A vector database is specifically designed to store and query high-dimensional vectors, which represent complex information such as text's semantic meaning or images' visual features. These technologies play a crucial role in AI applications, enabling efficient data analysis and retrieval. Chroma is an open-source, AI-native vector database that simplifies the process of building AI applications. It focuses on vector similarity search and embedding management, making it ideal for projects integrating vector search capabilities with large language models (LLMs) or AI frameworks. Chroma's API is designed to be intuitive and easy to use, offering flexible querying options. Rockset is a real-time search and analytics database designed to handle both structured and unstructured data, including vector embeddings. It supports streaming and bulk data ingestion, processing high-velocity event streams and change data capture (CDC) feeds within 1-2 seconds. Rockset's Converged Indexing technology allows for efficient handling of a wide range of query patterns out of the box. The choice between Chroma and Rockset should be driven by your project's specific requirements, such as primary use case, data types, need for real-time analytics, scale of vector operations, and your broader ecosystem of tools. For large-scale, high-performance vector search tasks, specialized vector databases like Milvus or Zilliz Cloud are recommended.
Chroma vs TiDB: Choosing the Right Vector Database for Your AI Applications
Date published
Sept. 21, 2024
Author(s)
Chloe Williams
Language
English
Word count
2322
Hacker News points
None found.
Chroma and TiDB are two popular options for handling vector data in AI applications. Chroma is an open-source, AI-native vector database that simplifies the process of building AI applications by providing tools for managing vector data and enabling efficient similarity searches. It focuses on simplicity and developer productivity, offering flexibility in terms of embedding models and data types. On the other hand, TiDB is an open-source distributed SQL database with HTAP capabilities, making it suitable for large enterprises or growing businesses that require MySQL compatibility but need to scale beyond traditional MySQL. The choice between Chroma and TiDB should be guided by specific use cases, data types, and performance requirements.
Challenges in Structured Document Data Extraction at Scale with LLMs
Date published
Sept. 21, 2024
Author(s)
Benito Martin
Language
English
Word count
1233
Hacker News points
None found.
The text discusses challenges in structured document data extraction at scale with large language models (LLMs). It highlights that while LLMs have advanced the ability to analyze and extract information from documents, they face notable limitations such as handling diverse data formats and varying layouts. Unstract, an open-source platform designed for unstructured data extraction and transformation into structured formats, is introduced as a solution to simplify data management by automating the structuring process. The text also explores how Unstract tackles various scenarios, including its integration with vector databases like Milvus, to bring structure to previously unmanageable data.
How Testcontainers Streamlines the Development of AI-Powered Applications
Date published
Sept. 20, 2024
Author(s)
Tim Mugabi
Language
English
Word count
2035
Hacker News points
None found.
Testcontainers is an open-source framework that streamlines the development of AI-powered applications by providing lightweight, modular instances of databases, browsers, message brokers, and other pre-configured dependencies that can run in a container. This reduces the operational cost of projects while encouraging experimentation and facilitating streamlined development. Testcontainers also enhances productivity by simplifying the integration of AI components within applications, allowing developers to easily make LLM calls from within the code and automating the deployment of AI models.
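As a small illustration of the pattern, here is a sketch that spins up a disposable PostgreSQL dependency with testcontainers-python; the image tag and query are arbitrary, and the same approach applies to message brokers, browsers, or vector stores.

```python
import sqlalchemy
from testcontainers.postgres import PostgresContainer

# A throwaway Postgres instance for an integration test; no local install required
with PostgresContainer("postgres:16") as postgres:
    engine = sqlalchemy.create_engine(postgres.get_connection_url())
    with engine.connect() as conn:
        print(conn.execute(sqlalchemy.text("SELECT version()")).scalar())
# The container is stopped and removed automatically when the block exits
```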
HNSWlib vs ScaNN: Choosing the Right Vector Search Tool for Your Application
Date published
Sept. 19, 2024
Author(s)
Chloe Williams
Language
English
Word count
2560
Hacker News points
None found.
HNSWlib and ScaNN are two popular vector search tools used in AI applications such as recommendation systems, image retrieval, natural language processing (NLP), and more. Both libraries offer fast approximate nearest neighbor searches but differ in their methodologies, data handling approaches, scalability, and flexibility. HNSWlib is a graph-based search algorithm that performs well for mid-sized datasets and real-time applications with minimal latency. ScaNN, on the other hand, uses partitioning and quantization techniques to handle large-scale datasets efficiently while maintaining a good balance between speed and accuracy. Developers should choose HNSWlib for smaller, static datasets and faster search speeds, while ScaNN is better suited for larger datasets and applications requiring integration with TensorFlow. Additionally, purpose-built vector databases like Milvus offer comprehensive systems designed for large-scale vector data management, including features like persistent storage, real-time updates, distributed architecture, and advanced querying capabilities.
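For reference, here is a minimal HNSWlib sketch; the dimensionality and the M/ef parameters are illustrative defaults rather than values from the comparison.

```python
import numpy as np
import hnswlib

dim, num_elements = 128, 10_000
data = np.random.rand(num_elements, dim).astype(np.float32)

index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=num_elements, ef_construction=200, M=16)
index.add_items(data, np.arange(num_elements))
index.set_ef(50)  # query-time recall/latency trade-off

labels, distances = index.knn_query(data[:1], k=5)
print(labels, distances)
```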
Multimodal RAG: Expanding Beyond Text for Smarter AI
Date published
Sept. 19, 2024
Author(s)
Stephen Batifol
Language
English
Word count
1479
Hacker News points
None found.
Retrieval Augmented Generation (RAG) has evolved from a text-based technique to Multimodal RAG, which integrates different data types such as images and videos to provide more reliable knowledge to AI models. The Milvus vector database enables the storage and search of diverse data types, while NVIDIA GPUs accelerate these complex operations. Key components of a multimodal RAG pipeline include Vision Language Models (VLMs), vector databases like Milvus, text embedding models, large language models (LLMs), and orchestration frameworks. Multimodal RAG systems offer multi-format processing, image analysis via VLMs, and efficient indexing and retrieval capabilities.
Unstructured Data Processing from Cloud to Edge
Date published
Sept. 19, 2024
Author(s)
Denis Kuria
Language
English
Word count
4362
Hacker News points
None found.
In this tutorial, we will create a real-time pose estimation system using Raspberry Pi and Milvus, an open-source vector database. The system leverages edge AI processing capabilities of the Raspberry Pi to perform object detection and pose estimation on live video streams. It utilizes a YOLOv8 model for object detection and a Hailo AI accelerator for efficient inference. The processed data is then stored in Milvus, allowing for fast and accurate similarity searches. We will also demonstrate how to integrate the system with Slack for real-time notifications and updates. This tutorial assumes that you have basic knowledge of Python programming and GStreamer, a framework for building multimedia applications. Here is an overview of the steps we will follow:
1. Set up the environment and install required dependencies.
2. Create a YOLOv8 model for pose estimation.
3. Implement a callback function to process video frames using the Hailo SDK.
4. Create a utility function for COCO keypoints.
5. Create the GStreamer pipeline.
6. Execute the program and observe the results.
7. Explore use cases of combining AI and vector databases.
By the end of this tutorial, you will have built a real-time pose estimation system that can be easily adapted for various applications in robotics, smart cities, industrial automation, healthcare, and more.
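As a rough sketch of the Milvus side of this pipeline (not the tutorial's full code), the snippet below stores a pose vector and runs a similarity search; it assumes pymilvus with Milvus Lite, and the 34 dimensions stand for 17 COCO keypoints with (x, y) coordinates.

```python
import numpy as np
from pymilvus import MilvusClient

# Milvus Lite persists to a local file; a server URI works the same way
client = MilvusClient("poses.db")
client.create_collection(collection_name="pose_vectors", dimension=34)

pose = np.random.rand(34).tolist()  # stand-in for a pose detected by YOLOv8
client.insert(collection_name="pose_vectors", data=[{"id": 0, "vector": pose}])

# Find the stored poses most similar to the current frame's pose
hits = client.search(collection_name="pose_vectors", data=[pose], limit=3)
print(hits)
```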
Ensuring Secure and Permission-Aware RAG Deployments
Date published
Sept. 18, 2024
Author(s)
Benito Martin
Language
English
Word count
2562
Hacker News points
None found.
Retrieval Augmented Generation (RAG) is a powerful approach to enhance the capabilities of generative models such as OpenAI's GPT series and Google's Gemini. However, with great potential comes significant responsibility, particularly when it comes to safeguarding sensitive data and ensuring compliance with privacy regulations. Organizations increasingly rely on AI-driven solutions, making understanding the security implications of these technologies crucial. Implementing strong security measures that not only protect data but also build user trust is essential for production-ready RAG applications. Key aspects of secure and permission-aware RAG deployments include data anonymization, strong encryption, input/output validation, and robust access controls.
Faiss vs ScaNN: Choosing the Right Vector Search Tool for Your Application
Date published
Sept. 18, 2024
Author(s)
Chloe Williams
Language
English
Word count
2424
Hacker News points
None found.
Faiss and ScaNN are two popular tools that offer vector search capabilities, each with distinct strengths optimized for different use cases. Faiss is designed to handle large-scale nearest neighbor searches and clustering of dense vectors, offering flexibility in choosing between exact and approximate nearest neighbor (ANN) searches. It supports GPU acceleration and various indexing methods to optimize memory usage and speed. ScaNN focuses on fast, approximate nearest neighbor searches in large-scale datasets, particularly those involving embeddings. It integrates seamlessly with TensorFlow and uses partitioning and quantization techniques to reduce the search space for faster query times. Faiss is better suited for applications requiring exact search capabilities or handling very large datasets, while ScaNN is ideal for machine learning models where fast approximate nearest neighbor searches are required.
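The exact-versus-approximate flexibility described above can be seen in a short Faiss sketch; the dataset sizes and the nlist/nprobe values are illustrative.
```python
import faiss
import numpy as np

dim = 64
xb = np.random.rand(10_000, dim).astype(np.float32)  # database vectors
xq = np.random.rand(5, dim).astype(np.float32)       # query vectors

# Exact search: brute-force L2 over every vector.
flat = faiss.IndexFlatL2(dim)
flat.add(xb)
D_exact, I_exact = flat.search(xq, 5)

# Approximate search: IVF partitions vectors into nlist cells and
# probes only nprobe of them per query.
quantizer = faiss.IndexFlatL2(dim)
ivf = faiss.IndexIVFFlat(quantizer, dim, 100)
ivf.train(xb)   # IVF indexes must be trained before vectors are added
ivf.add(xb)
ivf.nprobe = 10
D_approx, I_approx = ivf.search(xq, 5)
```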
Harnessing Function Calling to Build Smarter LLM Applications
Date published
Sept. 17, 2024
Author(s)
Simon Kiruri
Language
English
Word count
2794
Hacker News points
None found.
Large language models (LLMs) have evolved to handle more complex tasks through function calling, enabling interaction with external tools, databases, and APIs. This allows LLMs to work with real-world data and services beyond text generation. Function calling in LLMs involves a structured interaction between the model and an external API or service, allowing it to perform dynamic operations such as querying live databases, executing commands, or performing real-time calculations. The integration of function calling with other techniques like Retrieval Augmented Generation (RAG) can create more interactive systems capable of handling complex, real-world interactions in industries such as healthcare, finance, and customer service. However, challenges include ensuring security and privacy during data access, managing latency issues, and addressing ethical concerns around transparency and user consent.
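A minimal sketch of the structured interaction the summary describes, using the OpenAI Python SDK; `get_account_balance` is a hypothetical tool and the model name is illustrative.
```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Describe a hypothetical external function the model may call.
tools = [{
    "type": "function",
    "function": {
        "name": "get_account_balance",
        "description": "Look up a customer's current account balance.",
        "parameters": {
            "type": "object",
            "properties": {"customer_id": {"type": "string"}},
            "required": ["customer_id"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the balance for customer 42?"}],
    tools=tools,
)

# If the model decides to call the tool, it returns structured arguments
# instead of free text; the application runs the function and sends the
# result back in a follow-up message.
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    print(tool_calls[0].function.name, tool_calls[0].function.arguments)
```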
Tame High-Cardinality Categorical Data in Agentic SQL Generation with VectorDBs
Date published
Sept. 16, 2024
Author(s)
Jiang Chen and Gunther Hagleitner
Language
English
Word count
1824
Hacker News points
None found.
The article discusses the challenge of handling high-cardinality categorical data in text-to-SQL systems and how integrating vector databases with agentic workflows can address this issue. Traditional methods such as preprocessed database techniques and LLM-based translation often fall short when dealing with high-cardinality data, leading to a significant gap in translating natural language queries to accurate SQL. Vector databases like Milvus offer a solution by storing and efficiently querying high-dimensional vector representations of data, enabling semantic searches rather than keyword matches. By combining Waii's intelligent text-to-SQL capabilities with Zilliz Cloud's powerful vector storage, users can create robust, scalable, and accurate systems for handling high-cardinality categorical data in their text-to-SQL applications.
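A loose sketch of the underlying idea (not Waii's implementation): embed every distinct value of a high-cardinality column, then resolve the user's fuzzy phrasing to a canonical value semantically before generating SQL. The merchant names, embedding model, and Milvus Lite setup are illustrative.
```python
from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim embeddings
merchants = ["Acme Corporation", "ACME Corp (Europe)", "Apex Industries"]

client = MilvusClient("merchants.db")  # Milvus Lite: local, file-backed
client.create_collection("merchant_values", dimension=384)
client.insert("merchant_values", [
    {"id": i, "vector": v, "text": m}
    for i, (m, v) in enumerate(zip(merchants, model.encode(merchants).tolist()))
])

# "acme europe" has no exact match, but semantic search returns the
# canonical value to substitute into the generated SQL WHERE clause.
hits = client.search("merchant_values",
                     data=model.encode(["acme europe"]).tolist(),
                     limit=1, output_fields=["text"])
print(hits[0][0]["entity"]["text"])
```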
Faiss vs. HNSWlib: Choosing the Right Vector Search Tool for Your Application
Date published
Sept. 16, 2024
Author(s)
Chloe Williams
Language
English
Word count
2568
Hacker News points
None found.
Faiss and HNSWlib are two leading vector search libraries designed to handle large-scale datasets efficiently. While both tools focus on fast, approximate nearest neighbor searches, they differ in key areas such as search methodology, data handling, scalability, and performance. Faiss offers multiple ways to perform searches, including exact brute-force methods and approximate searches using product quantization or inverted file indices. It is designed to handle large datasets efficiently by leveraging various algorithms to balance speed and accuracy. HNSWlib uses a graph-based algorithm for vector search, which creates a navigable graph where each node is connected to its nearest neighbors, forming a structure that dramatically reduces the number of comparisons needed to find approximate nearest neighbors. Faiss is better suited for large datasets and applications requiring GPU acceleration, while HNSWlib excels when search speed is the primary concern, and your dataset can fit into memory.
Up to 50x Cost Savings for Building GenAI Apps Using Zilliz Cloud Serverless
Date published
Sept. 15, 2024
Author(s)
Ruben Winastwan
Language
English
Word count
2308
Hacker News points
None found.
Zilliz has introduced a new offering called Zilliz Cloud Serverless that allows users to store, index, and query massive amounts of vector embeddings at only a fraction of the cost compared to in-memory vector databases. The performance of Zilliz Cloud Serverless is also very competitive with other in-memory vector databases. This serverless offering is available on major cloud providers including AWS and GCP and will be available on Azure soon. It offers up to 50x cost savings through features such as pay-as-you-go pricing and auto-scaling that adapt to various workloads. Zilliz Cloud Serverless implements four key technologies: logical clusters and auto-scaling, disaggregation of streaming and historical data, tiered storage catered to different data storage needs, and multi-tenancy and hot-cold data separation.
Annoy vs ScaNN: Choosing the Right Vector Search Tool for Your Application
Date published
Sept. 15, 2024
Author(s)
Chloe Williams
Language
English
Word count
2447
Hacker News points
None found.
Annoy and ScaNN are two popular vector search tools that differ in their search methodology, data handling, scalability, performance, flexibility, integration, ease of use, cost considerations, and security features. Annoy is a lightweight library designed for fast approximate searches on large static datasets, while ScaNN is an open-source tool optimized for high-dimensional vector data in machine learning applications. Both tools have their strengths and are suitable for different use cases. When choosing between the two, consider factors such as dataset size, data dynamics, search accuracy requirements, integration with existing systems, and available computational resources.
Advanced Video Search: Leveraging Twelve Labs and Milvus for Semantic Retrieval
Date published
Sept. 14, 2024
Author(s)
Yesha Shastri
Language
English
Word count
1825
Hacker News points
None found.
In August 2024, James Le from Twelve Labs presented an insightful talk on advanced video search for semantic retrieval at the Unstructured Data Meetup in San Francisco. He discussed how cutting-edge multimodal models like those developed by Twelve Labs can help machines understand videos as intuitively as humans do, and how integrating these models with efficient vector databases such as Milvus by Zilliz can create exciting applications for semantic retrieval. Video understanding involves analyzing, interpreting, and extracting meaningful information from videos using computer vision and deep learning techniques. Twelve Labs' latest state-of-the-art video foundation model, Marengo 2.6, is capable of performing 'any-to-any' search tasks, significantly enhancing video search efficiency and allowing robust interactions across different modalities. By harnessing the power of advanced multimodal embeddings and integrating it with Milvus, developers can unlock new possibilities in video content analysis by creating applications such as search engines, recommendation systems, and content-based video retrieval.
Introducing Comprehensive Monitoring & Observability in Zilliz Cloud
Date published
Sept. 13, 2024
Author(s)
Steffi Li
Language
English
Word count
861
Hacker News points
None found.
Zilliz has introduced comprehensive monitoring and observability features in its cloud platform to help users maintain high-performance vector database applications. The new Metrics dashboard provides a detailed view of cluster performance, including resource usage, query performance, and data metrics. Customizable alerts have also been added for organization-related matters and operational aspects of clusters. Key features include real-time monitoring, customizable dashboards, flexible alert configuration, multiple notification channels, and access to historical data. The platform is designed to be easily accessible within the Zilliz Cloud console, with additional enhancements planned for future updates.
Milvus on GPUs with NVIDIA RAPIDS cuVS
Date published
Sept. 12, 2024
Author(s)
Ruben Winastwan
Language
English
Word count
2516
Hacker News points
None found.
NVIDIA's latest advancements in GPU-accelerated vector search through their cuVS library and CAGRA algorithm significantly improve the performance of AI applications, particularly in cases involving high recall values, high vector dimensionality, and a large number of vectors. The integration of cuVS into Milvus, a popular open-source vector database, allows for efficient scaling and improved cost-performance ratio compared to CPU-based solutions. While GPU operational costs are higher than CPUs, the performance benefits often outweigh the expenses in large-scale applications.
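A sketch of what selecting the CAGRA index might look like from pymilvus, assuming a Milvus deployment built with GPU support; the collection name and graph-degree parameters are illustrative and follow Milvus's documented GPU_CAGRA options.
```python
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")  # GPU-enabled Milvus assumed

index_params = client.prepare_index_params()
index_params.add_index(
    field_name="vector",
    index_type="GPU_CAGRA",               # cuVS CAGRA graph index
    metric_type="L2",
    params={
        "intermediate_graph_degree": 64,  # build-time graph width
        "graph_degree": 32,               # pruned degree of the final graph
    },
)
client.create_index(collection_name="docs", index_params=index_params)
```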
The Critical Role of VectorDBs in Building Intelligent AI Agents
Date published
Sept. 11, 2024
Author(s)
Stephen Batifol
Language
English
Word count
1271
Hacker News points
None found.
Agents are AI systems capable of autonomous thought and action, distinguishing them from traditional systems. They can reason, plan, and learn to perform complex tasks beyond simple input-output responses. To effectively learn and adapt, agents need a robust memory system like Milvus, an open-source vector database that provides efficient storage, rapid vector retrieval, and scalability. By offering these capabilities, Milvus equips agents with the power to store and retrieve massive amounts of data, make smarter decisions, and learn from past interactions, ultimately improving their performance over time.
Introducing Migration Services: Efficiently Move Unstructured Data Across Platforms
Date published
Sept. 11, 2024
Author(s)
James Luan
Language
English
Word count
1161
Hacker News points
None found.
Zilliz introduces its open-source Migration Services to address challenges in efficiently moving unstructured data across platforms, such as data fragmentation and format heterogeneity. The service is built on Apache SeaTunnel and supports real-time data streaming and offline batch imports. It also simplifies unstructured data transformation and ensures end-to-end data quality with robust monitoring and alerting mechanisms. By open-sourcing Migration Services, Zilliz aims to foster an open vector data ecosystem, attract contributors, enhance cloud service offerings, and gain valuable community input for future development.
New for Zilliz Cloud: Migration Service, Fivetran Connector, Multi-replica, and More
Date published
Sept. 10, 2024
Author(s)
Steffi Li
Language
English
Word count
1140
Hacker News points
None found.
Zilliz Cloud has introduced new features to enhance support for running AI workloads in production environments, including Migration Service, Fivetran Connector, Multi-replica, and Auto-scaling. These updates aim to provide developers with advanced tools to efficiently deploy and scale AI-driven applications while maintaining full ownership of their unstructured data. The new features address critical challenges such as managing large volumes of unstructured data, ensuring high performance at scale, and maintaining operational robustness in production environments.
Apache Cassandra vs. Rockset: Choosing the Right Vector Database for Your AI Applications
Date published
Sept. 9, 2024
Author(s)
Chloe Williams
Language
English
Word count
1799
Hacker News points
None found.
Apache Cassandra and Rockset are two popular options for handling vector data in AI applications. Both databases have their strengths, with Cassandra excelling in managing large-scale distributed data and offering high availability, fault tolerance, and scalability across multiple data centers. On the other hand, Rockset shines in real-time search and analytics scenarios, supporting quick ingestion and indexing of high-velocity data streams, in-place updates, and flexible vector search capabilities through its Converged Indexing technology. The choice between these technologies should be driven by specific project requirements, such as the scale of data distribution needed, the importance of real-time processing, the complexity of vector operations required, and how vector search fits into the overall data architecture of the application.
Apache Cassandra vs. Redis: Choosing the Right Vector Database for Your AI Applications
Date published
Sept. 9, 2024
Author(s)
Chloe Williams
Language
English
Word count
1819
Hacker News points
None found.
Apache Cassandra and Redis are two popular options for handling vector data in AI applications. Both databases have evolved to include vector search capabilities, but they cater to different use cases and requirements. Cassandra is ideal for large-scale distributed data with strong consistency and fault tolerance across multiple data centers, while Redis excels in scenarios demanding high-speed, real-time vector operations, particularly for datasets that can fit in memory. The choice between these technologies ultimately depends on specific project requirements, such as dataset size, the need for real-time processing, scalability needs, and the complexity of your data model.
Annoy vs HNSWlib: Choosing the Right Tool for Vector Search
Date published
Sept. 8, 2024
Author(s)
Chloe Williams
Language
English
Word count
2254
Hacker News points
None found.
Vector search has become a crucial element in modern AI applications such as recommendation engines, image retrieval systems, and natural language processing tasks. Unlike traditional search engines that rely on keyword matching, vector search allows us to retrieve information based on vector similarity, unlocking deeper insights from unstructured data like images, audio, and text embeddings. Two standout vector search solutions are Annoy and HNSWlib. Both are designed for fast and efficient vector search, but their strengths and use cases differ, making the choice between them crucial. Annoy (Approximate Nearest Neighbors Oh Yeah) is a lightweight open-source library developed by Spotify. It is specifically designed to handle large-scale, read-heavy vector searches. Its primary advantage lies in its minimal memory consumption and simplicity, making it ideal for static datasets that don't change frequently. HNSWlib (Hierarchical Navigable Small World Library) is a high-performance, graph-based library designed for approximate nearest neighbor (ANN) search. Its search algorithm relies on building a hierarchical graph structure, where nodes represent vectors, and edges represent the proximity between them. HNSWlib is widely used for vector similarity search tasks, where the goal is to find the closest vectors (or "neighbors") to a query vector from a large dataset of high-dimensional vectors. The key differences between Annoy and HNSWlib include their search methodology, data handling capabilities, scalability and performance, flexibility and customization options, integration and ecosystem support, ease of use, and cost considerations. When choosing between the two libraries, developers should consider factors such as dataset size, update frequency, memory resources, required accuracy, and desired level of control over the search algorithm.
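Annoy's write-once, memory-mapped design mentioned above is visible in a few lines; the dimension, tree count, and random data below are illustrative.
```python
from annoy import AnnoyIndex
import random

dim = 40
index = AnnoyIndex(dim, "angular")  # angular = cosine-style distance

# Annoy indexes are write-once: add every item first, then build.
for i in range(1000):
    index.add_item(i, [random.gauss(0, 1) for _ in range(dim)])

index.build(10)            # 10 trees; more trees = better recall, bigger index
index.save("vectors.ann")  # the file is memory-mapped, so processes can share it

print(index.get_nns_by_item(0, 10))  # 10 approximate neighbors of item 0
```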
Apache Cassandra vs Deep Lake: Choosing the Right Vector Database for Your AI Apps
Date published
Sept. 8, 2024
Author(s)
Chloe Williams
Language
English
Word count
1393
Hacker News points
None found.
Apache Cassandra and Deep Lake are both robust vector databases designed to handle complex data structures like vector embeddings essential for AI applications. While Cassandra is an open-source, distributed NoSQL database system that integrates vector search through extensions, Deep Lake is a specialized database system built with a focus on vector search and management. The choice between the two depends heavily on specific application needs, such as scalability, data handling, performance, flexibility, integration, cost, and ease of use. Apache Cassandra is suitable for applications requiring massive scalability, high availability, and flexible data management, while Deep Lake is ideal for projects involving vector data, AI workflows, and large volumes of multimedia or unstructured data.
Apache Cassandra vs. Clickhouse: Choosing the Right Vector Database for Your AI Applications
Date published
Sept. 8, 2024
Author(s)
Chloe Williams
Language
English
Word count
2324
Hacker News points
None found.
Apache Cassandra and ClickHouse are two popular options for handling vector data in AI applications. Both technologies have their strengths, with Cassandra excelling at large-scale, distributed systems that prioritize high availability and fault tolerance, while ClickHouse shines in environments requiring fast, real-time analytics on large datasets with advanced query capabilities. The choice between the two depends on specific use cases and requirements for handling vector data efficiently.
Apache Cassandra vs TiDB: Choosing the Right Database for Your AI Applications
Date published
Sept. 8, 2024
Author(s)
Chloe Williams
Language
English
Word count
2009
Hacker News points
None found.
Apache Cassandra and TiDB are both scalable distributed databases that can handle large datasets, but they differ in their core architecture and how they handle vector search functionality. Cassandra is a NoSQL database designed to handle massive amounts of unstructured or semi-structured data with its flexible schema, while TiDB is an open-source SQL database offering hybrid transactional and analytical processing (HTAP) capabilities. Both systems support vector search through integration with external libraries and plugins, but specialized vector databases like Milvus and Zilliz Cloud are better suited for large-scale, high-performance vector search tasks. When choosing between Cassandra and TiDB for vector search, consider factors such as data handling, scalability, flexibility, integration, ease of use, cost, and security features.
How to Load Test an LLM API with Gatling
Date published
Sept. 8, 2024
Author(s)
Simon Kiruri
Language
English
Word count
2332
Hacker News points
None found.
Load testing is crucial when building applications with large language models (LLMs) to ensure they can handle varying demand levels and maintain performance under different conditions. This approach helps identify potential bottlenecks and areas for improvement, ensuring the application remains reliable and responsive. Gatling, an open-source performance-testing framework, can be used to load test web applications and LLM APIs, such as RAG apps powered by vector databases like Milvus. Load testing involves capacity tests, stress tests, and soak tests to evaluate the system's behavior under specific load conditions, identify bottlenecks, and improve performance and response times.
Apache Cassandra vs. Vald: Choosing the Right Vector Database for Your AI Applications
Date published
Sept. 7, 2024
Author(s)
Chloe Williams
Language
English
Word count
1841
Hacker News points
None found.
Apache Cassandra and Vald are two popular options for handling vector data in AI applications. Cassandra is a traditional NoSQL database that has evolved to include vector search capabilities, while Vald is a purpose-built vector database designed from the ground up for efficient similarity searches. Both systems offer robust scalability but through different mechanisms: Cassandra provides a masterless architecture with tunable consistency, while Vald distributes vector indexes across multiple agents and supports horizontal scaling of memory and CPU resources. The choice between these technologies depends on specific use cases, data types, and performance requirements.
Apache Cassandra vs Qdrant: Choosing the Right Vector Database for Your Needs
Date published
Sept. 7, 2024
Author(s)
Chloe Williams
Language
English
Word count
1845
Hacker News points
None found.
Apache Cassandra and Qdrant are two popular options for handling vector data in AI applications. While both support vector search capabilities, they cater to different use cases. Cassandra is a distributed NoSQL database known for its scalability and availability, with vector search implemented as an extension of its existing architecture. On the other hand, Qdrant is a purpose-built vector database designed specifically for similarity search and machine learning applications. Key differences between the two include their search methodology, data handling capabilities, scalability and performance optimization, flexibility and customization options, integration with ecosystems, ease of use, cost considerations, and security features. The choice between these technologies ultimately depends on specific use cases, scale of vector data operations, and how they fit into an overall data architecture.
Apache Cassandra vs. Vespa: Choosing the Right Vector Database for Your AI Applications
Date published
Sept. 7, 2024
Author(s)
Chloe Williams
Language
English
Word count
2097
Hacker News points
None found.
Apache Cassandra and Vespa are two popular options for handling vector data in AI applications. While both databases offer scalability, performance, and flexibility, they differ in their approach to search methodology, data handling, and ecosystem integration. Cassandra is best suited for large-scale distributed data applications with basic vector search functionality, while Vespa excels in search-heavy applications requiring advanced multi-modal search capabilities. Choosing between the two depends on whether your focus is distributed data management or powerful, real-time search capabilities.
Apache Cassandra vs MongoDB: Choosing the Right Vector Database for AI Applications
Date published
Sept. 7, 2024
Author(s)
Chloe Williams
Language
English
Word count
1919
Hacker News points
None found.
Apache Cassandra and MongoDB are two leading NoSQL databases known for their scalability and flexibility, but they have fundamental differences that influence their suitability for different workloads. Both databases can handle vector search tasks, but specialized vector databases like Milvus and Zilliz Cloud offer better performance for large-scale, high-performance vector search tasks. Apache Cassandra is better for environments requiring high availability, fault tolerance, and massive scalability, particularly for write-heavy workloads. MongoDB offers more flexibility in handling unstructured data, real-time performance, and ease of use, making it a strong choice for AI applications that require similarity searches, recommendation engines, or NLP.
Apache Cassandra vs Faiss: Choosing the Right Tool for Vector Search
Date published
Sept. 7, 2024
Author(s)
Chloe Williams
Language
English
Word count
2160
Hacker News points
None found.
Apache Cassandra and Faiss are two technologies that handle vector data differently. While both can perform vector searches, they approach the task from different angles. Apache Cassandra is a distributed NoSQL database designed to handle large-scale structured data across many servers, ensuring high availability and scalability. It can be extended for vector search through integrations with vector search libraries or custom plugins like the DataStax integration. Faiss (Facebook AI Similarity Search) is an open-source library that provides highly efficient tools for fast similarity search and clustering of dense vectors, designed for large-scale nearest neighbor search in high-dimensional vector spaces. Key differences between the two include their search methodology, data handling capabilities, scalability and performance, flexibility and customization, integration and ecosystem support, ease of use, cost considerations, and security features. Apache Cassandra is suitable when vector search is not the primary focus, while Faiss is a better fit for high-performance vector search tasks. For large-scale, high-performance, and production vector search tasks, specialized vector databases like Milvus and Zilliz Cloud are recommended.
Apache Cassandra vs Elasticsearch: Choosing a Vector Database for Your Needs
Date published
Sept. 7, 2024
Author(s)
Chloe Williams
Language
English
Word count
2009
Hacker News points
None found.
Apache Cassandra and Elasticsearch are both traditional databases that have evolved to include vector search capabilities, making them suitable options for applications involving AI-driven tasks such as recommendation systems, image recognition, and natural language processing. While both technologies support vector search, they differ significantly in how they handle data, scale, and perform. Apache Cassandra is optimized for handling structured and semi-structured data with a strong focus on write-heavy workloads, while Elasticsearch excels at handling unstructured and semi-structured data, particularly in scenarios where real-time indexing and retrieval are needed. Both technologies have robust communities and ecosystems, but their ease of use, cost considerations, and security features vary. Apache Cassandra is a better choice when managing large-scale, distributed data with high write throughput and fault tolerance, while Elasticsearch is the go-to solution for real-time search and analytics, particularly when handling unstructured data or complex queries. For applications that rely on fast, accurate similarity searches over millions or billions of high-dimensional vectors, specialized vector databases like Milvus and Zilliz Cloud are a better fit.
Improving Analytics with Time Series and Vector Databases
Date published
Sept. 7, 2024
Author(s)
Ruben Winastwan
Language
English
Word count
2739
Hacker News points
None found.
Time series analysis plays a crucial role in many fields, particularly in Internet of Things (IoT) devices. With time series data, we can detect patterns and trends over particular periods, enabling us to forecast and analyze future time-dependent events. Common time series use cases include forecasting weather temperatures, predicting stock prices, and monitoring sensor data. InfluxDB is a highly optimized time series database for storing vast amounts of time series data. It offers efficient solutions for operations such as aggregations and downsampling. However, relying on time series databases alone can be challenging, especially if a use case demands similarity search. In a recent talk at the Zilliz Unstructured Data Meetup, Zoe Steinkamp, Developer Advocate at InfluxDB, discussed an approach to combining InfluxDB with Milvus to store, query, and perform similarity searches on time-dependent use cases. Milvus is a vector database that stores data in vectors, enabling efficient similarity searches using techniques like cosine similarity or Euclidean distance. By combining the two databases, the strengths of both systems can be fully utilized: time series data from sensors can be stored in InfluxDB, while vector data can be stored in Milvus. This integration allows for advanced use cases like anomaly detection in real-time traffic conditions.
Apache Cassandra vs Milvus: Choosing the Right Vector Database for Your Needs
Date published
Sept. 6, 2024
Author(s)
Chloe Williams
Language
English
Word count
2089
Hacker News points
None found.
Apache Cassandra and Milvus are both vector databases designed to handle high-dimensional vectors, which are numerical representations of unstructured data like text, images, and videos. They differ in their search methodology, data handling capabilities, scalability, flexibility, integration with other tools, ease of use, and cost considerations. Milvus is a specialized vector database designed for high-performance vector search and supports at least 11 indexing methods. It is suitable for AI-centric applications that rely on fast, accurate similarity searches over large volumes of high-dimensional vectors. Milvus offers three deployment options: Milvus Lite, Standalone, and Distributed. On the other hand, Apache Cassandra is a distributed NoSQL database known for its high availability, fault tolerance, and scalability across large clusters. It has added vector search capabilities through DataStax but remains primarily focused on traditional data management. Cassandra's strengths include linear scalability, handling various data types, and integrating with popular big data tools. The choice between Milvus and Apache Cassandra depends on the specific use case and the complexity of the data. Milvus is better suited for AI-heavy applications that require fast vector search capabilities, while Cassandra offers more versatility for environments where vector search is an add-on rather than the core focus.
Apache Cassandra vs. Vearch: Choosing the Right Vector Database for Your AI Applications
Date published
Sept. 6, 2024
Author(s)
Chloe Williams
Language
English
Word count
1917
Hacker News points
None found.
Apache Cassandra and Vearch are two popular options for handling vector data in AI applications. While both technologies offer strong scalability, they differ in their approach to vector search and data handling. Cassandra is a NoSQL database designed to handle structured and semi-structured data efficiently, with the addition of vector search capabilities through its Storage-Attached Indexes (SAI) feature. Vearch, on the other hand, is purpose-built for vector search and offers hybrid search capabilities, allowing users to perform complex queries that combine similarity searches with traditional filtering. When choosing between these two technologies, consider your specific needs in terms of data management, scalability, performance, and flexibility.
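For reference, a minimal sketch of what Cassandra-side vector search looks like in CQL (Cassandra 5.x syntax) via the Python driver; the keyspace, table, and toy 3-dimensional vectors are illustrative, and a running cluster with an existing keyspace is assumed.
```python
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("demo_ks")

session.execute("""
    CREATE TABLE IF NOT EXISTS products (
        id int PRIMARY KEY,
        embedding vector<float, 3>
    )
""")
# Vector search in Cassandra is served by a Storage-Attached Index (SAI).
session.execute("""
    CREATE CUSTOM INDEX IF NOT EXISTS ann_idx
    ON products (embedding) USING 'StorageAttachedIndex'
""")

session.execute("INSERT INTO products (id, embedding) VALUES (1, [0.1, 0.2, 0.3])")

# ANN OF orders rows by similarity to the query vector.
rows = session.execute(
    "SELECT id FROM products ORDER BY embedding ANN OF [0.1, 0.2, 0.25] LIMIT 3"
)
print(list(rows))
```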
Apache Cassandra vs MyScale: Choosing the Right Vector Database for Your AI Applications
Date published
Sept. 6, 2024
Author(s)
Chloe Williams
Language
English
Word count
1935
Hacker News points
None found.
Apache Cassandra and MyScale are two popular databases that offer vector search capabilities, but they have different strengths. Apache Cassandra is an open-source NoSQL database designed to handle large amounts of structured data across multiple servers, while MyScale is built on the ClickHouse database for AI and machine learning workloads. Both can perform vector search, but MyScale has native support for it, making it more straightforward to use for AI-heavy workloads. Cassandra excels in environments where scalability and high availability are critical, while MyScale is better suited for applications that rely heavily on vector search and real-time data processing. For large-scale, high-performance vector search tasks, specialized vector databases like Milvus and Zilliz Cloud are recommended.
Evaluating Multimodal RAG Systems Using Trulens
Date published
Sept. 6, 2024
Author(s)
Fendy Feng
Language
English
Word count
1831
Hacker News points
None found.
Multimodal architectures are gaining prominence in Generative AI (GenAI) as organizations increasingly build solutions using multimodal models such as GPT-4V and Gemini Pro Vision. These models can semantically embed and interpret various data types, making them more versatile and effective than traditional large language models across a broader range of applications. However, challenges arise in ensuring their reliability and accuracy due to hallucinations where they produce incorrect or irrelevant outputs. Multimodal Retrieval Augmented Generation (RAG) addresses these limitations by enriching models with relevant contextual information from external sources. Evaluation tools like TruLens help developers monitor performance, test reliability, and identify areas for improvement in multimodal RAG systems to ensure accuracy and relevance while minimizing hallucinations.
Apache Cassandra vs Pinecone: Choosing Your Vector Database
Date published
Sept. 5, 2024
Author(s)
Chloe Williams
Language
English
Word count
1518
Hacker News points
None found.
Apache Cassandra and Pinecone are two popular vector databases used in AI applications. While both can handle large amounts of data, they differ in their approach to vector search. Cassandra is an open-source database that has evolved to include vector search capabilities, while Pinecone is a proprietary SaaS built specifically for vector search. Cassandra's main advantage lies in its flexibility and ability to handle various types of data, including vectors. It also benefits from being part of the Apache ecosystem, which includes other popular tools like Spark and Hadoop. However, it can be complex to set up and manage, especially for those new to distributed systems. On the other hand, Pinecone is simpler to start with as a managed service that handles infrastructure and security. It's designed to work easily with machine learning frameworks and cloud services, making it an attractive choice for developers focusing on vector search performance. However, its specialized nature means less room for customization compared to Cassandra. The best choice between the two depends on specific project needs and team capabilities. Factors such as open-source availability, customization requirements, and expertise in managing distributed systems should be considered when making a decision.
Annoy vs Faiss: Choosing the Right Tool for Vector Search
Date published
Sept. 5, 2024
Author(s)
Chloe Williams
Language
English
Word count
2533
Hacker News points
None found.
In this blog post, we explored two powerful vector search tools, Annoy and Faiss, which are popular in high-dimensional data applications such as natural language processing (NLP), semantic search, or image retrieval. We clarified what vector search is and provided an overview of various solutions available on the market for performing vector searches. Annoy is an open-source library developed by Spotify that focuses on speed and memory efficiency for static data. It uses a method based on random projection trees to quickly find items similar to a given query item, making it suitable for applications where speed is critical and exact results aren't necessary. Annoy is widely praised for its simplicity, speed, and ease of use, especially for developers needing a fast static data search tool. Faiss is an open-source library developed by Meta (formerly Facebook) that provides highly efficient tools for fast similarity search and clustering of dense vectors. Faiss is designed for large-scale nearest-neighbor search and can handle both approximate and exact searches in high-dimensional vector spaces. It stands out for its ability to leverage GPU acceleration, providing a major boost in performance for large-scale applications. When deciding between Annoy and Faiss, several key factors must be considered, including search methodologies, data handling, performance, and scalability. While both tools perform well in terms of scalability, they are built with different goals in mind. Vector search libraries like Annoy and Faiss focus solely on search algorithms and require the developer to manage all other aspects, such as data storage, scalability, and infrastructure. In contrast, purpose-built vector databases like Milvus and Zilliz Cloud provide a more comprehensive solution, including data storage, scaling, indexing, replication, and query management. To verify that a search algorithm returns accurate results at high speed, benchmarking tools are essential. Two efficient options are ANN-Benchmarks and VectorDBBench, which allow developers to measure metrics like search speed, accuracy, and memory usage across various datasets. Using these tools, you can assess the trade-offs between speed and precision for algorithms like those found in Faiss, Annoy, HNSWlib, and other libraries.
Apache Cassandra vs Pinecone: Choosing Your Vector Database
Date published
Sept. 5, 2024
Author(s)
Chloe Williams
Language
English
Word count
1518
Hacker News points
None found.
Apache Cassandra and Pinecone are two popular vector databases that differ in their approach and capabilities. Cassandra is an open-source distributed database designed to handle large amounts of data across multiple computers, while Pinecone is a proprietary SaaS built specifically for vector search. Both can handle lots of data but in different ways: Cassandra allows users to add more machines to handle more data, and being open-source, developers have full control over this process; Pinecone handles scaling as a managed service. Apache Cassandra is good if you need to handle various types of data, not just vectors, and want control over your infrastructure or are already using other Apache tools. On the other hand, consider Pinecone when focusing on vector search without managing infrastructure, wanting to get started quickly, or needing a system that's easy to use with machine learning models.
Harnessing Embedding Models for AI-Powered Search
Date published
Sept. 5, 2024
Author(s)
Haziqa Sajid
Language
English
Word count
2136
Hacker News points
None found.
Embedding models and vector embeddings are crucial in handling vast amounts of unstructured data, particularly when dealing with modern datasets that require understanding meaning and context. These models transform unstructured data into numerical representations, enabling computers to understand, process, and analyze it more effectively. They capture the relationships and meanings within the data, allowing for tasks like question-answering, translation, and summarization. Advanced embedding models can handle multiple languages and data types such as text, images, and audio, making them important in building modern search systems that understand and retrieve relevant content using meaning rather than keywords.
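The meaning-over-keywords behavior described above can be demonstrated in a few lines with the sentence-transformers library; the model choice and sample sentences are illustrative.
```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

docs = ["The cat sat on the mat.",
        "A feline rested on the rug.",
        "Quarterly revenue grew by 4%."]
query = "kitten lying on a carpet"

doc_emb = model.encode(docs, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)

# Cosine similarity ranks by meaning, not shared keywords: the query
# shares no words with the top document, yet matches it semantically.
scores = util.cos_sim(query_emb, doc_emb)[0]
best = scores.argmax().item()
print(docs[best], float(scores[best]))
```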
Apache Cassandra vs pgvector: Choosing the Right Vector Database for Your Needs
Date published
Sept. 4, 2024
Author(s)
Chloe Williams
Language
English
Word count
1932
Hacker News points
None found.
Apache Cassandra and pgvector are two popular options in the vector database space. Both technologies have evolved from traditional databases to include vector search capabilities, enabling efficient similarity searches on high-dimensional data. Key differences between them include their search methodology, data handling, scalability, flexibility, integration, ease of use, cost considerations, and security features. Cassandra is well-suited for large-scale systems that require distributed architecture and vector similarity searches at scale, while pgvector offers a more accessible entry point into vector search for teams already familiar with relational databases. The choice between these technologies should depend on specific use cases, data volume, existing technology stack, and team expertise.
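As a quick sense of pgvector's relational-first approach, here is a minimal sketch via psycopg2, assuming a PostgreSQL instance with the extension available; the database name and toy 3-dimensional vectors are illustrative.
```python
import psycopg2

conn = psycopg2.connect("dbname=demo")
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
cur.execute("""
    CREATE TABLE IF NOT EXISTS items (
        id bigserial PRIMARY KEY,
        embedding vector(3)
    )
""")
cur.execute("INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[2,3,4]')")

# <-> is Euclidean distance; pgvector also offers <=> (cosine distance)
# and <#> (negative inner product).
cur.execute("SELECT id FROM items ORDER BY embedding <-> '[1,2,3]' LIMIT 5")
print(cur.fetchall())
conn.commit()
```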
Implementing Agentic RAG Using Claude 3.5 Sonnet, LlamaIndex, and Milvus
Date published
Sept. 4, 2024
Author(s)
Benito Martin
Language
English
Word count
2481
Hacker News points
None found.
The concept of Compound AI Systems is introduced by Bill Zhang, Director of Engineering at Zilliz, in his talk on the evolution of LLM app architectures. This modular approach integrates multiple components to handle various tasks rather than relying on a single AI model, delivering more tailored and efficient results. The architectural evolution of LLM applications is discussed, along with the concepts of Retrieval Augmented Generation (RAG) and Agentic RAG. Challenges and benefits of these systems are also highlighted. An example of building an Agentic RAG using Claude 3.5 Sonnet, LlamaIndex, and the Milvus vector database is provided step by step. The complete architecture of the agentic RAG built with Milvus, LlamaIndex, and Claude 3.5 Sonnet is also presented.
Apache Cassandra vs pgvector: Choosing the Right Vector Database for Your Needs
Date published
Sept. 4, 2024
Author(s)
Chloe Williams
Language
English
Word count
1932
Hacker News points
None found.
Apache Cassandra and pgvector are two options in the vector database space. A vector database is specifically designed to store and query high-dimensional vectors, which are numerical representations of unstructured data. These databases play a crucial role in AI applications, allowing for more advanced data analysis and retrieval. Both technologies continue to evolve, so it's worth monitoring their progress as you make your decision. Consider Cassandra when handling very large amounts of data across a distributed system, while pgvector is suitable for scenarios where vector search needs to be tightly integrated with traditional relational data.
Weaviate vs Elasticsearch: Choosing the Right Vector Database for Your Needs
Date published
Sept. 3, 2024
Author(s)
Fendy Feng
Language
English
Word count
1610
Hacker News points
None found.
Weaviate and Elasticsearch are two technologies that offer search capabilities but cater to different needs and use cases. Weaviate is an open-source, purpose-built vector database designed for semantic searches, while Elasticsearch is a NoSQL database with vector search capabilities as an add-on. Key differences between the two include their search methodologies (vector search vs inverted index-based search), data handling capabilities, integrations with AI and machine learning, scalability and performance, use cases, ease of use, ecosystems, data modeling and query languages, community support, and licensing. The choice between Weaviate and Elasticsearch depends on specific needs, nature of the data, and future scalability requirements.
Apache Cassandra vs. Aerospike: Choosing the Right Vector Database for Your AI Applications
Date published
Sept. 3, 2024
Author(s)
Chloe Williams
Language
English
Word count
1765
Hacker News points
None found.
Apache Cassandra and Aerospike are two popular distributed NoSQL databases that have evolved to include support for vector search capabilities, making them suitable for AI-driven applications requiring efficient handling of high-dimensional vector data. Both systems leverage their existing strengths while addressing the growing demand for efficient vector data storage and retrieval. Cassandra integrates vector search into its core database using Storage-Attached Indexes (SAI), allowing for flexible schema design with vector data stored alongside other attributes. Aerospike introduces a dedicated vector search layer (AVS) on top of its core database, focusing on low-latency, high-throughput operations. The choice between these two databases largely depends on specific use case requirements, such as data scale and complexity, performance needs, team expertise, and production timeline. Conducting proof-of-concept tests with specific datasets and query patterns is essential in making an informed decision. Additionally, using open-source benchmarking tools like VectorDBBench can assist in evaluating and comparing vector database performance based on actual results.
Apache Cassandra vs. Kdb: Choosing the Right Vector Database for Your AI Applications
Date published
Sept. 3, 2024
Author(s)
Chloe Williams
Language
English
Word count
2101
Hacker News points
None found.
Apache Cassandra and Kdb are two popular options for handling vector data in AI applications. Both databases have their strengths, with Cassandra excelling in large-scale distributed data management and Kdb offering superior real-time data processing and advanced vector search capabilities. The choice between the two depends on specific use cases, such as scalability, performance, flexibility, integration, ease of use, cost considerations, and security features. To make an informed decision, developers should evaluate these databases based on their own datasets and query patterns using tools like VectorDBBench.
Weaviate vs Elasticsearch: Choosing the Right Vector Database for Your Needs
Date published
Sept. 3, 2024
Author(s)
Fendy Feng
Language
English
Word count
1610
Hacker News points
None found.
Weaviate and Elasticsearch are two technologies that offer search capabilities but cater to different needs and use cases. Weaviate is an open-source, purpose-built vector database designed for semantic searches, while Elasticsearch is a NoSQL database with vector search capabilities as an add-on. The primary distinction between the two lies in their search methodologies: Weaviate uses vector search, whereas Elasticsearch primarily uses inverted index-based search. Both technologies are scalable and have different strengths in handling data and integrating AI and machine learning. Choosing between them depends on specific use cases, nature of data, and future scalability needs.
Apache Cassandra vs OpenSearch: Choosing the Right Vector Database for Your Needs
Date published
Sept. 2, 2024
Author(s)
Chris Churilo
Language
English
Word count
1454
Hacker News points
None found.
Apache Cassandra and OpenSearch are two options in the vector database space that have evolved to include vector search capabilities as an add-on. Both technologies offer distributed architectures, support for structured and unstructured data, and scalability features. However, they differ in their search methodology, data handling, flexibility, integration with other tools, ease of use, cost considerations, and security features. The choice between Cassandra and OpenSearch should depend on the specific use case, data types, scalability needs, and existing technology stack.
Apache Cassandra vs OpenSearch: Choosing the Right Vector Database for Your Needs
Date published
Sept. 2, 2024
Author(s)
Chris Churilo
Language
English
Word count
1454
Hacker News points
None found.
Apache Cassandra and OpenSearch are two popular options in the vector database space. Both technologies have evolved to include vector search capabilities as an add-on, making them suitable for AI-driven applications. Key differences between the two include their search methodology, data handling, scalability and performance, flexibility and customization, integration and ecosystem, ease of use, cost considerations, and security features. The choice between Cassandra and OpenSearch should depend on specific use cases, data types, scalability needs, and existing technology stacks.
Scaling Search with Milvus: Handling Massive Datasets with Ease
Date published
Aug. 26, 2024
Author(s)
Stephen Batifol
Language
English
Word count
3015
Hacker News points
None found.
Milvus is a powerful open-source vector database designed to handle massive datasets with ease. Its key features include a distributed architecture, optimized indexing techniques, and the ability to search through billions of vectors efficiently. In this blog post, we explore how Milvus can be used to work with 40 million vectors and demonstrate its metadata filtering capabilities, which significantly enhance search results. We also discuss advanced features like data partitioning and hybrid search for enhanced scalability.
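The metadata-filtering capability mentioned above combines a scalar predicate with the vector search in a single call; the sketch below uses Milvus Lite and random data for illustration, with field names and values that are purely hypothetical.
```python
from pymilvus import MilvusClient
import random

client = MilvusClient("filter_demo.db")  # Milvus Lite for illustration
client.create_collection("docs", dimension=8)

client.insert("docs", [
    {"id": i,
     "vector": [random.random() for _ in range(8)],
     "category": random.choice(["news", "blog"]),
     "year": random.choice([2023, 2024])}
    for i in range(1000)
])

# The scalar filter narrows candidates alongside the vector search, so
# results are both similar to the query and matching the predicate.
hits = client.search(
    "docs",
    data=[[random.random() for _ in range(8)]],
    filter='category == "news" and year == 2024',
    limit=5,
    output_fields=["category", "year"],
)
print(hits[0])
```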
Relational Databases vs Vector Databases
Date published
Aug. 25, 2024
Author(s)
Chris Churilo
Language
English
Word count
2144
Hacker News points
None found.
The article discusses the shift from traditional relational databases to specialized databases tailored to specific use cases, such as graph, search, time series, key-value, in-memory, and vector databases. It highlights that while relational databases remain dominant, purpose-built databases are gaining traction due to increasing demands for performance and advanced features. The article also provides an overview of vector databases and compares them with traditional relational databases, emphasizing the importance of selecting the right indexing strategy and benchmarking tools like VectorDBBench to optimize performance. Finally, it outlines various use cases for vector databases, such as Retrieval Augmented Generation (RAG), recommender systems, multimodal similarity search, and molecular similarity search.
Navigating the Challenges of ML Management: Tools and Insights for Success
Date published
Aug. 21, 2024
Author(s)
Fendy Feng
Language
English
Word count
1426
Hacker News points
None found.
The management and versioning of massive datasets and models in machine learning (ML) have become increasingly complex, requiring specialized solutions beyond traditional tools like Git. XetHub is a tool that extends Git's capabilities to handle petabyte-scale data efficiently, addressing the challenges of scalability, data management, collaboration, and observability in ML development. Vector databases such as Milvus and Zilliz Cloud are also crucial for managing high-dimensional unstructured data, particularly in applications like Retrieval Augmented Generation (RAG). By combining solutions like XetHub with vector databases and machine learning models, we can enhance the effectiveness of ML projects, ensuring they are well-managed and adaptable to new data.
How Metadata Lakes Empower Next-Gen AI/ML Applications
Date published
Aug. 19, 2024
Author(s)
Fendy Feng and ShriVarsheni R
Language
English
Word count
1533
Hacker News points
None found.
As AI technologies like large language models (LLMs) and Retrieval Augmented Generation (RAG) continue to evolve, the demand for flexible and efficient data infrastructure is growing. Metadata lakes are emerging as a key solution in this regard, offering a unified approach to data management by storing metadata from various sources in an organization. Metadata provides context and understanding of the stored data, including data source, quality, lineage, ownership, content, structure, and context. Metadata lakes can assist with RAG development, model registration, AI governance, and implementing advanced analytics. By providing a unified plane for data operations, metadata lakes empower teams to maintain observability in metadata analysis, ensure smooth transitions between different cloud environments and data sources like the Milvus vector database, and uphold governance frameworks seamlessly. As AI technologies advance, metadata lakes will play a key role in supporting next-generation AI/ML applications.
RoBERTa: An Optimized Method for Pretraining Self-supervised NLP Systems
Date published
Aug. 18, 2024
Author(s)
Haziqa Sajid
Language
English
Word count
3647
Hacker News points
None found.
RoBERTa (Robustly Optimized BERT Pretraining Approach) is an improved version of BERT designed to address its limitations and enhance performance across various NLP problems. It introduced several key improvements, including dynamic masking, removal of the next sentence prediction task, larger training data and extended duration, increasing batch sizes, and byte text encoding. These modifications led to significant improvements in model performance on downstream tasks compared to the originally reported BERT results.
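Because RoBERTa is pretrained with dynamic masked-language modeling, the base checkpoint can fill in masked tokens out of the box; a minimal sketch with the Hugging Face transformers pipeline (sentence chosen for illustration):
```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="roberta-base")

# RoBERTa uses <mask> (not BERT's [MASK]) due to its byte-level BPE tokenizer.
preds = fill_mask("The goal of pretraining is to learn good <mask> representations.")
for pred in preds:
    print(f"{pred['token_str'].strip():>12}  {pred['score']:.3f}")
```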
Streamlining the Deployment of Enterprise GenAI Apps with Efficient Management of Unstructured Data
Date published
Aug. 15, 2024
Author(s)
ShriVarsheni R and Fendy Feng
Language
English
Word count
1123
Hacker News points
None found.
Deploying enterprise GenAI apps while efficiently managing unstructured data is a challenge faced by many companies due to the complexity and volume of that data. Aparavi, a data management service provider, offers a comprehensive platform designed to simplify the management and utilization of unstructured data. The platform integrates seamlessly with various data sources and ensures data privacy by keeping it securely on-premises. It also supports advanced OCR capabilities for extracting text from images and includes built-in features for automating processes and controlling data ownership at a granular level. Integration with Milvus, an open-source vector database, enables Aparavi's platform to offer Enterprise RAG solutions with the Semantic Search Retriever and AI Data Loader. While there are still some challenges, such as a heavy footprint and a complex user interface, leveraging advanced data management platforms like Aparavi can help enterprises streamline their AI projects and scale their applications as their business grows.
Boosting Work Efficiency with Generative AI Use Cases
Date published
Aug. 12, 2024
Author(s)
Fendy Feng and Fariba Laiq
Language
English
Word count
1119
Hacker News points
None found.
Generative AI (GenAI) is transforming business operations by automating mundane tasks, enhancing productivity, and offering deeper insights. Advanced tools like Large Language Models (LLMs), multimodal models, vector databases, and embedding models are central to GenAI's success. Applications of GenAI span across industries, from customer service automation to supply chain optimization. Upstage AI is an example of a platform that utilizes these technologies to automate workflows and solve industry-specific challenges. The transformative potential of GenAI is evident in various applications such as process automation, customer support automation, content creation and personalization, supply chain optimization, and automated news generation.
How to Choose the Right Milvus Deployment Mode for Your AI Applications
Date published
Aug. 9, 2024
Author(s)
Robert Guo
Language
English
Word count
2061
Hacker News points
None found.
Milvus is an open-source vector database that offers three deployment options: Milvus Lite, Standalone, and Distributed. Milvus Lite is a lightweight Python library ideal for rapid prototyping and small-scale experiments. Milvus Standalone is suitable for early production environments with moderate data sizes and growing user demands. Milvus Distributed is designed for large-scale production deployments requiring high availability, scalability, and flexibility. The choice of deployment mode depends on the stage of application development, data size, and use case.
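A convenient property of these modes is that pymilvus's MilvusClient exposes the same API across all of them: a local file path runs Milvus Lite in-process, while a server URI targets a Standalone or Distributed deployment. A minimal sketch (file and collection names illustrative):
```python
from pymilvus import MilvusClient

# Milvus Lite: pass a local file path; the database runs in-process.
lite = MilvusClient("./prototype.db")

# Standalone / Distributed: pass the server URI instead; the client
# API is otherwise identical.
# server = MilvusClient(uri="http://localhost:19530")

lite.create_collection("experiments", dimension=128)
print(lite.list_collections())
```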
The Landscape of GenAI Ecosystem: Beyond LLMs and Vector Databases
Date published
Aug. 9, 2024
Author(s)
Jiang Chen
Language
English
Word count
2374
Hacker News points
None found.
The landscape of Generative Artificial Intelligence (GenAI) has significantly expanded beyond Large Language Models (LLMs) and vector databases, with a focus on Retrieval-Augmented Generation (RAG) and multimedia generation. RAG combines information retrieval techniques with generative language models to produce relevant outputs, while multimedia generation leverages generative models for complex visual content creation. The GenAI ecosystem includes various components such as data connectors, embedding models, LLM inference frameworks, agentic frameworks, and frontend UI experiences. Key projects within the ecosystem include LlamaIndex, Ragas, Airbyte, Voyage AI, vLLM, MemGPT, Bing API, Streamlit, WhyHow, AnythingLLM, Midjourney, and Zilliz Cloud.
How Vector Databases are Revolutionizing Unstructured Data Search in AI Applications
Date published
Aug. 8, 2024
Author(s)
Denis Kuria
Language
English
Word count
2693
Hacker News points
None found.
Vector databases are revolutionizing unstructured data search in AI applications by enabling efficient and semantically meaningful retrieval of relevant information. They store and search data based on semantic similarity rather than exact matches, allowing for more nuanced and context-aware information retrieval. Applications of vector databases include retrieval-augmented generation (RAG), recommender systems, molecular similarity search, and multimodal similarity search. These databases are transforming various fields by providing a unified way to represent and search across different types of data.
Optimizing Multi-agent Systems with Mistral Large, Mistral Nemo, and Llama-agents
Date published
Aug. 6, 2024
Author(s)
Stephen Batifol
Language
English
Word count
2813
Hacker News points
None found.
This blog post explores how to build agents using Llama-agents and Milvus, combining large language models (LLMs) with vector similarity search to create sophisticated agentic systems. It covers using Mistral Nemo for simpler tasks and Mistral Large for orchestrating the different agents, loading data into Milvus, defining the various tools available to the agents, and using an LLM to create metadata filters automatically.
Building a Multilingual RAG with Milvus, LangChain, and OpenAI LLM
Date published
Aug. 5, 2024
Author(s)
Tim Mugabi
Language
English
Word count
2061
Hacker News points
None found.
Retrieval Augmented Generation (RAG) is a popular technique for building GenAI applications powered by large language models (LLMs). It enhances an LLM's output by providing contextual information the model wasn't pre-trained on. Multilingual RAG extends this approach to handle text data in multiple languages. Building one relies on the same core components: embedding models, vector databases, and LLMs. The choice of embedding model is crucial, since it must support all of the target languages.
Building RAG with Milvus, vLLM, and Llama 3.1
Date published
Aug. 4, 2024
Author(s)
Christy Bergman
Language
English
Word count
1673
Hacker News points
None found.
The University of California, Berkeley has donated vLLM, a fast and easy-to-use library for LLM inference and serving, to LF AI & Data Foundation as an incubation-stage project. Large language models (LLMs) and vector databases are usually paired to build Retrieval Augmented Generation (RAG), a popular AI application architecture for addressing AI hallucinations. This blog demonstrates how to build and run a RAG with Milvus, vLLM, and Llama 3.1. The process includes embedding and storing text as vector embeddings in Milvus, using this vector store as a knowledge base to efficiently retrieve text chunks relevant to user questions, and leveraging vLLM to serve Meta's Llama 3.1-8B model to generate answers augmented by the retrieved text.
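A compressed sketch of that retrieve-then-generate flow, not the post's exact code: it assumes pymilvus, vllm, and sentence-transformers are installed, access to the Meta-Llama-3.1-8B-Instruct weights, and a pre-populated collection; the embedding model here is a stand-in, not the post's choice:

    from pymilvus import MilvusClient
    from sentence_transformers import SentenceTransformer
    from vllm import LLM, SamplingParams

    client = MilvusClient("./rag_demo.db")              # Milvus Lite for brevity
    encoder = SentenceTransformer("all-MiniLM-L6-v2")   # stand-in embedding model

    question = "How does vLLM speed up LLM inference?"
    hits = client.search(
        collection_name="docs",                         # assumed pre-populated
        data=[encoder.encode(question).tolist()],
        limit=3,
        output_fields=["text"],
    )
    context = "\n".join(h["entity"]["text"] for h in hits[0])

    # Serve Llama 3.1 with vLLM and generate an answer grounded in the context.
    llm = LLM(model="meta-llama/Meta-Llama-3.1-8B-Instruct")
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    out = llm.generate([prompt], SamplingParams(temperature=0.2, max_tokens=256))
    print(out[0].outputs[0].text)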
GraphRAG Explained: Enhancing RAG with Knowledge Graphs
Date published
Aug. 2, 2024
Author(s)
Cheney Zhang
Language
English
Word count
3308
Hacker News points
None found.
Retrieval Augmented Generation (RAG) is a technique that connects external data sources to enhance the output of large language models (LLMs), making it well suited for giving LLMs access to private or domain-specific data and for addressing hallucination issues. RAG has therefore been widely used to power GenAI applications such as AI chatbots and recommendation systems. Microsoft Research introduced GraphRAG, a new method that augments RAG retrieval and generation with knowledge graphs. Unlike a baseline RAG, which uses a vector database to retrieve semantically similar text, GraphRAG enhances retrieval by incorporating knowledge graphs (KGs): data structures that store entities and link them according to their relationships. A GraphRAG pipeline usually consists of two fundamental processes: indexing and querying. The indexing process includes four key steps: text unit segmentation; entity, relationship, and claims extraction; hierarchical clustering; and community summary generation. In the querying stage, GraphRAG offers two workflows tailored to different query types: Global Search and Local Search. A comparison of output quality shows that GraphRAG significantly improves multi-hop reasoning and complex information summarization, surpassing baseline RAG in both comprehensiveness and diversity.
Building a Multimodal Product Recommender Demo Using Milvus and Streamlit
Date published
July 30, 2024
Author(s)
Christy Bergman, David Wang, and Reina Wang
Language
English
Word count
1134
Hacker News points
None found.
The Milvus Multimodal RAG demo is a product recommendation system that uses Google's MagicLens multimodal embedding model to encode both images and text into a single multimodal vector. This vector is then used to search for the closest-matching Amazon products from a Milvus vector database. The technologies used in this demo include Google DeepMind's MagicLens, OpenAI's GPT-4o, Milvus, and Streamlit. The data comes from the Amazon Reviews 2023 dataset, with a subset of 5K items being used for the demonstration. The setup instructions for MagicLens involve setting up an environment, installing dependencies, and downloading model weights. The Milvus server is used to store, index, and search vectors, while Streamlit provides a user-friendly interface for uploading images and entering text instructions. The Ask GPT function utilizes OpenAI's GPT-4o mini multimodal generative model to provide AI-powered recommendations based on the search results.
Function Calling with Ollama, Llama 3.1 and Milvus
Date published
July 30, 2024
Author(s)
Stephen Batifol
Language
English
Word count
1242
Hacker News points
None found.
Function calling with LLMs allows developers to create powerful, context-aware applications by integrating large language models (LLMs) like Llama 3.1 with external tools such as user-defined functions or APIs. This enables solutions for data extraction, converting natural language into API calls or database queries, and conversational knowledge retrieval engines that interact with a knowledge base. In this blog post, we explore how to integrate Llama 3.1 with external tools like Milvus and APIs to build advanced applications. Integrating LLMs with external tools opens up new possibilities for developers to create versatile, powerful AI applications tailored to specific use cases and practical problems.
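For flavor, a minimal function-calling sketch along these lines, assuming the ollama Python package (with its 2024-era dict-style responses) and a locally pulled llama3.1 model; the search_milvus tool and its schema are hypothetical stand-ins:

    import ollama

    def search_milvus(query: str) -> str:
        # Hypothetical tool: would embed `query` and search a Milvus collection.
        return "top matching documents for: " + query

    response = ollama.chat(
        model="llama3.1",
        messages=[{"role": "user", "content": "Find docs about vector indexes"}],
        tools=[{
            "type": "function",
            "function": {
                "name": "search_milvus",
                "description": "Semantic search over a Milvus collection",
                "parameters": {
                    "type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"],
                },
            },
        }],
    )

    # If the model chose to call the tool, execute it with the supplied arguments.
    for call in response["message"].get("tool_calls") or []:
        if call["function"]["name"] == "search_milvus":
            print(search_milvus(**call["function"]["arguments"]))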
Techniques and Challenges in Evaluating Your GenAI Applications Using LLM-as-a-judge
Date published
July 24, 2024
Author(s)
Fariba Laiq
Language
English
Word count
2236
Hacker News points
None found.
Large language models (LLMs) are increasingly being adopted across various industries and production environments. Ensuring their outputs are accurate, reliable, and unbiased is crucial as they become more widespread. Traditional human evaluation methods often fall short due to their time-consuming nature and inconsistency in handling the complexity and scale of modern LLMs. One promising approach to this challenge is using LLMs as judges to evaluate their outputs. By leveraging their extensive training data and contextual understanding, LLMs can provide automated, scalable, and consistent assessments. During a meetup hosted by Zilliz, Sourabh Agrawal discussed the real-world difficulties of implementing LLM-as-a-judge techniques and highlighted four primary metrics for assessing LLM performance: response quality, context awareness, conversational quality, and safety. He also shared strategies for addressing challenges associated with using LLMs as judges, such as biases in evaluations, consistency problems, lack of domain-specific knowledge, and the complexity of evaluating complex responses. To tackle these limitations, developers can adopt objective evaluations, check for conciseness, use a grading system with "YES, NO, MAYBE" options, and maintain cost-effective evaluations by leveraging cheaper LLMs as much as possible. Additionally, fine-tuning the judge LLM for specific domains ensures more accurate and relevant evaluations. UpTrain AI is an open-source framework that developers can use to evaluate their LLM applications. It provides scores and explanations, breaking down long responses into subparts and evaluating each for a more objective measure of conciseness. The UpTrain dashboard logs all data, enabling comparison of models and prompts and monitoring performance.
Enhancing Your RAG with Knowledge Graphs Using KnowHow
Date published
July 23, 2024
Author(s)
Haziqa Sajid
Language
English
Word count
1740
Hacker News points
None found.
Retrieval Augmented Generation (RAG) is a technique that enhances large language models (LLMs) by providing them with additional knowledge and long-term memories through vector databases like Milvus and Zilliz Cloud. While RAG can address many LLM headaches, it may be insufficient for more advanced requirements such as customization or greater control of the retrieved results. Knowledge Graphs (KG) can be incorporated into the RAG pipeline to improve performance and accuracy. By integrating KGs with RAG systems, users can enhance contextual understanding, improve accuracy and factual consistency, enable multi-hop reasoning capabilities, facilitate efficient information retrieval, provide transparent and traceable outputs, synthesize knowledge across domains, and handle ambiguity more effectively.
Setting up Milvus on Amazon EKS
Date published
July 16, 2024
Author(s)
Christy Bergman
Language
English
Word count
3240
Hacker News points
None found.
This blog post provides a detailed guide on how to deploy an open-source vector database called Milvus on Amazon Elastic Kubernetes Service (EKS). The author explains the architecture of Milvus and its integration with EKS, AWS S3 for object storage, Amazon Managed Streaming for Apache Kafka (MSK) for message storage, and Amazon Elastic Load Balancing (ELB) as a load balancer. The post also covers prerequisites such as installing the AWS Command Line Interface (CLI), EKS tools like kubectl, eksctl, and helm, creating an S3 bucket with a KMS customer-managed key, and setting up an Amazon MSK instance. Next, the author guides users through creating an Amazon EKS cluster using eksctl, installing the AWS Load Balancer Controller, and deploying Milvus on EKS using Helm. The post also explains how to configure S3 as object storage, MSK as message storage, expose Milvus services for external access, and enable high availability deployment of Milvus core components. Finally, the author demonstrates how to access and manage Milvus endpoints through Kubernetes Services and Attu, an open-source Milvus administration tool. The post concludes with a test using Milvus' official example code to verify if the Milvus database is working properly.
Building a Conversational AI Agent with Long-Term Memory Using LangChain and Milvus
Date published
July 15, 2024
Author(s)
Rok Benko
Language
English
Word count
1894
Hacker News points
None found.
LangChain is an open-source framework that simplifies building conversational AI agents using large language models (LLMs). It provides tools and templates to create smart, context-aware chatbots and other applications. Conversational agents are software programs that interact with users in natural language, handling tasks like answering questions or translating languages. LangChain Agents use LLMs to interact with external tools and data sources, making them more powerful for various applications. To build a conversational agent using LangChain, developers need to install dependencies such as LangChain, langchain-openai, OpenAI API SDK, dotenv, Milvus, pymilvus, and tiktoken. They can then create a conversation chain with the ConversationChain class from langchain.chains, making predictions by passing user input to the conversation chain. To enhance conversational agents with long-term memory, developers can integrate Milvus Lite as a vector store to store and retrieve data efficiently. By incorporating memory into their agents using LangChain and Milvus Lite, developers can create more accurate and personalized responses based on previous interactions. This integration significantly enhances the capabilities of AI agents, allowing them to provide better assistance in various applications.
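A minimal sketch of the basic conversation loop described above, assuming langchain and langchain-openai are installed and OPENAI_API_KEY is set in the environment; the long-term-memory variant would additionally wire a Milvus Lite vector store into retrieval:

    from langchain.chains import ConversationChain
    from langchain.memory import ConversationBufferMemory
    from langchain_openai import ChatOpenAI

    # Short-term memory: the buffer replays prior turns into each prompt.
    conversation = ConversationChain(
        llm=ChatOpenAI(model="gpt-4o-mini"),
        memory=ConversationBufferMemory(),
    )

    print(conversation.predict(input="Hi, my name is Ada."))
    print(conversation.predict(input="What is my name?"))  # recalled from memory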
Metadata Filtering, Hybrid Search or Agent When Building Your RAG Application
Date published
July 12, 2024
Author(s)
Stephen Batifol
Language
English
Word count
825
Hacker News points
None found.
Retrieval Augmented Generation (RAG) is a technique that enhances large language models (LLMs) by integrating additional data sources. Milvus, a vector database, can boost the performance of RAG applications with its Metadata Filtering, Hybrid Search, and Agent capabilities. Metadata Filtering allows precise and efficient searches by enriching data with additional attributes. Hybrid Search expands search capabilities by allowing queries across multiple vector columns. Agents automate actions based on the LLM's output, enabling continuous updates to the RAG system.
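As a small illustration of the metadata filtering idea, a sketch assuming pymilvus and a collection whose entities carry year and source fields; the collection, field names, and placeholder embedding are hypothetical:

    from pymilvus import MilvusClient

    client = MilvusClient("./rag_demo.db")
    query_embedding = [0.0] * 768          # placeholder for a real query embedding

    # Restrict the vector search to entities whose metadata matches the filter.
    results = client.search(
        collection_name="articles",        # hypothetical collection
        data=[query_embedding],
        filter='year >= 2023 and source == "press"',
        limit=5,
        output_fields=["title", "year"],
    )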
Simplifying Legal Research with RAG, Milvus, and Ollama
Date published
July 11, 2024
Author(s)
Stephen Batifol
Language
English
Word count
1441
Hacker News points
None found.
In this blog post, we explore how Retrieval Augmented Generation (RAG) can be applied to legal data using Ollama and Milvus. RAG is a technique that enhances large language models (LLMs) by integrating additional data sources. We demonstrate how to set up a RAG system for legal data, leveraging Milvus as the vector database and Ollama for running LLMs locally. The process involves indexing the data, performing retrieval and generation at runtime, and using an LLM to generate a response based on the enriched context. This approach can significantly streamline legal research by making it faster and easier.
Building Production Ready Search Pipelines with Spark and Milvus
Date published
July 10, 2024
Author(s)
Ruben Winastwan
Language
English
Word count
2372
Hacker News points
None found.
Building a scalable vector search pipeline in production is challenging due to handling massive amounts of unstructured data and high query volumes. To address this, a combination of Milvus, an open-source vector database, and Apache Spark, a distributed computing framework, can be used. Milvus enables efficient vector search operations on large datasets, while Spark accelerates data processing tasks by distributing them across multiple computers in batches. By integrating these tools, developers can create production-ready applications that leverage AI models for improved information retrieval and search processes.
Safeguarding Data Integrity: On-Prem RAG Deployment with LLMware and Milvus
Date published
July 9, 2024
Author(s)
Haziqa Sajid
Language
English
Word count
2600
Hacker News points
None found.
Darren Oberst, CEO of AI Blocks, discussed deploying Retrieval Augmented Generation (RAG) on-premises for large financial and legal services companies during a recent Unstructured Data Meetup session. He highlighted the challenges enterprises face in adopting RAG, including data privacy and security concerns, elevated costs, and neglected retrieval strategies. To address these issues, Darren advocates deploying RAG on private cloud solutions, offering better data security, lower cost, and enhanced generation with retrieval capabilities. The session also covered the Dragon models, designed specifically for RAG and available in the Hugging Face Transformers library, and LLMware, a library designed for enterprise-level LLM-based applications.
Build Better Multimodal RAG Pipelines with FiftyOne, LlamaIndex, and Milvus
Date published
July 9, 2024
Author(s)
Denis Kuria
Language
English
Word count
1882
Hacker News points
None found.
The talk by Jacob Marks at the Unstructured Data Meetup hosted by Zilliz focused on building robust multimodal Retrieval Augmented Generation (RAG) pipelines using FiftyOne, LlamaIndex, and Milvus. RAG enhances large language models' capabilities by augmenting their knowledge with relevant external data. The architecture of a text-based RAG system is simple, integrating LLMs with vector databases like Milvus or Zilliz Cloud to provide users with more accurate and contextually relevant responses. Multimodal RAG proves invaluable for systems that need multiple data types to make informed decisions. It combines information retrieval and generative modeling to enhance the capabilities of multimodal LLMs, integrating various data types such as text, images, audio, and video. The fiftyone-multimodal-rag-plugin can be used to implement a multimodal RAG pipeline using FiftyOne, LlamaIndex, and Milvus.
Getting Started with LLMOps: Building Better AI Applications
Date published
July 8, 2024
Author(s)
Tim Mugabi
Language
English
Word count
2656
Hacker News points
None found.
OpenAI's ChatGPT has sparked a surge of interest in large language models (LLMs) among corporations, leading to increased demand for technology vendors that support LLM operations (LLMOps). These vendors provide comprehensive workflows for developing, fine-tuning, and deploying LLMs into production environments. Sage Elliott, a machine learning engineer at Union.ai, discussed deploying and managing LLMs during a recent Unstructured Data Meetup, focusing on ensuring the reliability and scalability of LLM applications in production settings. LLMOps stands for Large Language Model Operations, which are analogous to MLOps but specifically for large language models (LLMs). MLOps (Machine Learning Operations) refers to the practices and tools used to efficiently deploy and maintain machine learning models in production environments. It is an extension of DevOps (Development and Operations), which integrates application development and operations into a cohesive process. Continuous Integration/Continuous Deployment (CI/CD) is one of the core principles of LLMOps, automating the LLM application development lifecycle. Continuous integration (CI) involves automatically taking application updates and merging them with the main branch, while continuous delivery/deployment (CD) refers to the process of automatically deploying changes to the application in a production environment after integration and validation. LLMOps is essential for production-level AI applications, with the exact infrastructure dependent on the application's needs. Integrating LLMOps into your AI application offers benefits such as resource management and scalability, model updating and improvements, and ethical and responsible AI practices. A simplified LLMOps pipeline includes elements like Sys Prompt, Model, Guardrail, Data Store, Monitor, and CI/CD Orchestrator. The HuggingFace Spaces platform streamlines the shipping of your model into production, offering low-cost cloud GPUs to power LLMs. To get started with LLMOps, follow a simple three-step philosophy: ship the model, monitor its performance, and improve it based on insights gained from monitoring. Tools like LangKit, Ragas, Continuous Eval, TruLens-Eval, LlamaIndex, Phoenix, DeepEval, LangSmith, and OpenAI Evals can help evaluate LLM applications.
Infrastructure Challenges in Scaling RAG with Custom AI Models
Date published
July 6, 2024
Author(s)
Uppu Rajesh Kumar
Language
English
Word count
3730
Hacker News points
None found.
Retrieval Augmented Generation (RAG) systems have significantly enhanced AI applications by providing more accurate and contextually relevant responses. However, scaling and deploying these systems in production have presented considerable challenges as they become more sophisticated and incorporate custom AI models. BentoML is a valuable tool that simplifies the process of building and deploying inference APIs for custom models, optimizes serving performance, and enables seamless scaling. By integrating BentoML with the Milvus vector database, organizations can build more powerful, scalable RAG systems.
Building an End-to-End GenAI App with Ruby and Milvus
Date published
July 5, 2024
Author(s)
Ruben Winastwan
Language
English
Word count
2896
Hacker News points
None found.
The introduction of specialized GenAI frameworks like LangChain has enabled developers to build sophisticated AI applications quickly and easily by leveraging powerful large language models (LLMs) such as ChatGPT and LLaMA. These frameworks allow users to create Retrieval Augmented Generation (RAG) applications in just a few lines of code without requiring deep theoretical AI knowledge. However, such frameworks are typically written in Python, which may not be familiar to all full-stack engineers and software developers, creating a need for extensions in other programming languages. Andrei Bondarev introduced LangChain.rb, a Ruby extension of LangChain, to make it easier for full-stack engineers to build GenAI applications into their software projects.
Metrics-Driven Development of RAGs
Date published
July 4, 2024
Author(s)
Denis Kuria
Language
English
Word count
2351
Hacker News points
None found.
Jithin James and Shahul Es shared insights on leveraging metrics-driven development to evaluate Retrieval Augmented Generation (RAG) systems at the Zilliz Unstructured Data Meetup. They discussed both the theoretical foundations and practical applications of RAG system evaluation, explaining how understanding the theory behind the evaluation code can provide deeper insights into its functionality. The talk covered key metrics such as factual similarity and semantic similarity to determine the quality and relevance of the generated answer compared to the ground truth. Practical examples were provided on how to evaluate and improve a RAG system powered by Milvus, an open-source vector database known for its efficiency in similarity search and AI applications.
Exploring Three Key Strategies for Building Efficient Retrieval Augmented Generation (RAG)
Date published
July 3, 2024
Author(s)
Christy Bergman
Language
English
Word count
1100
Hacker News points
None found.
Retrieval Augmented Generation (RAG) is a technique that augments an AI chatbot with your own data. Three key strategies for optimizing RAG are smart text chunking, iterating on different embedding models, and experimenting with various LLMs or generative models. Smart text chunking breaks text into manageable pieces for efficient retrieval from the vector database; techniques include recursive character text splitting, small-to-big text splitting, and semantic text splitting. Iterating on embedding models matters because the embedding model determines how data is represented as vectors, which is crucial in AI applications. Finally, experimenting with different LLMs lets users choose the one best suited to their workload.
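To make the first strategy concrete, here is a sketch of recursive character splitting, assuming the langchain-text-splitters package; the chunk sizes and input file are illustrative, not the post's recommendation:

    from langchain_text_splitters import RecursiveCharacterTextSplitter

    splitter = RecursiveCharacterTextSplitter(
        chunk_size=512,      # target characters per chunk
        chunk_overlap=64,    # overlap preserves context across chunk boundaries
        separators=["\n\n", "\n", ". ", " "],   # try larger units first
    )
    chunks = splitter.split_text(open("report.txt").read())  # hypothetical file
    print(len(chunks), "chunks")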
Building RAG with Self-Deployed Milvus Vector Database and Snowpark Container Services
Date published
June 28, 2024
Author(s)
Ruben Winastwan
Language
English
Word count
3362
Hacker News points
None found.
In a recent talk at the Unstructured Data Meetup, Jiang Chen discussed how to build a Retrieval Augmented Generation (RAG) system using Milvus vector database and Snowflake ecosystem with Snowpark Container Service (SPCS). RAG is an advanced information retrieval method that enhances large language models' response quality by providing relevant context from internal knowledge bases. The integration of Milvus, a powerful open-source vector database, with Snowflake allows users to easily interact with the data stored in Snowflake and build sophisticated applications like RAG systems.
A Review of Hybrid Search in Milvus
Date published
June 27, 2024
Author(s)
Ken Zhang
Language
English
Word count
1877
Hacker News points
1
Milvus 2.4 introduces multi-vector columns within a single collection, enabling more advanced and flexible data searches by allowing simultaneous queries across multiple vector types and fields. This hybrid search feature supports multimodal search, hybrid sparse and dense search, and hybrid dense and full-text search. The results from each field are integrated and re-ranked using multiple reranking algorithms to deliver more accurate outcomes. Hybrid search is designed to handle complex and multimodal data representations, accommodating diverse facets of information and various types of vector embeddings. It also supports the fusion of multimodal vectors from different unstructured data types such as images, videos, audio, and text files. The latest Milvus releases support hybrid search to meet escalating demands for handling intricate datasets in AI-driven applications.
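Schematically, a hybrid search across two vector fields looks like the sketch below, assuming pymilvus 2.4+, a running Milvus server, and a collection with dense and sparse vector fields; the names and query vectors are hypothetical and assumed computed upstream:

    from pymilvus import AnnSearchRequest, Collection, RRFRanker, connections

    connections.connect(uri="http://localhost:19530")   # hypothetical server
    collection = Collection("products")                 # hypothetical collection

    dense_req = AnnSearchRequest(
        data=[dense_query_vector],    # assumed: output of a dense embedding model
        anns_field="dense_vector",
        param={"metric_type": "IP"},
        limit=10,
    )
    sparse_req = AnnSearchRequest(
        data=[sparse_query_vector],   # assumed: output of a sparse model
        anns_field="sparse_vector",
        param={"metric_type": "IP"},
        limit=10,
    )

    # Per-field results are fused with Reciprocal Rank Fusion before returning.
    results = collection.hybrid_search(
        [dense_req, sparse_req], rerank=RRFRanker(), limit=5
    )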
Building Intelligent RAG Applications with LangServe, LangGraph, and Milvus
Date published
June 25, 2024
Author(s)
Stephen Batifol
Language
English
Word count
840
Hacker News points
None found.
This blog post discusses how to build intelligent Retrieval Augmented Generation (RAG) applications using LangServe, LangGraph, and Milvus from the LangChain ecosystem. The author guides readers through setting up a FastAPI application, configuring LangServe and LangGraph, and utilizing Milvus for efficient data retrieval. The post also covers building an LLM agent with LangGraph and integrating Milvus for vector storage and retrieval. Key prerequisites include Python 3.9+, Docker, and basic knowledge of FastAPI and Docker.
Processing streaming data in Kafka with Timeplus Proton
Date published
June 24, 2024
Author(s)
Fariba Laiq
Language
English
Word count
1774
Hacker News points
None found.
In April 2024, Jove Zhong, co-founder of Timeplus, delivered a talk on "Processing Streaming Data in Kafka with Timeplus Proton" at the Seattle Unstructured Data Meetup. Timeplus is revolutionizing real-time data handling with its streaming SQL database and real-time analytics platform. The company's core engine, Timeplus Proton, serves as an alternative to platforms like ksqlDB and Apache Flink. It supports diverse data sources, including Apache Kafka, Confluent Cloud, and Redpanda, allowing for real-time insights and alerts. Jove also discussed the integration of streaming data with Large Language Models (LLMs) and vector databases, emphasizing their potential to enhance AI applications. The talk concluded with a look towards the future of streaming SQL and real-time processing, highlighting its growing importance in various industries.
Generative AI for Creative Applications Using Storia Lab
Date published
June 23, 2024
Author(s)
Denis Kuria
Language
English
Word count
1381
Hacker News points
None found.
Storia Lab is a suite of APIs designed to integrate advanced image editing functionalities into applications, making it easier for developers to perform nuanced edits while preserving the integrity of original images. The platform offers solutions such as text correction, background removal and replacement, removing unwanted elements, and sketch-to-image conversion. Storia Lab can also be integrated with Milvus, an open-source vector database designed to handle billion-scale vectors efficiently, for advanced multimodal applications like content creation, image search and recommendation, visual content curation, e-commerce, creative design tools, and visual content moderation.
Multilingual Narrative Tracking in the News
Date published
June 22, 2024
Author(s)
ShriVarsheni R
Language
English
Word count
1474
Hacker News points
None found.
Large language models (LLMs) have revolutionized tasks such as content generation and customer service chatbots, but ensuring their unbiased knowledge of current news is crucial for reliable responses. Multilingual narrative tracking helps achieve this by analyzing how a narrative is reported, in terms of volume and sentiment, across different languages and countries. The process involves tracking narratives, which are sequences of interconnected events, such as the Barbie movie campaign narrative with its diverse coverage and sentiments across regions and languages. LLMs should be exposed to diverse news across languages, countries, and other demographics to avoid biased reporting. Multilingual narrative tracking can increase transparency and ensure global perspectives and comprehensive event coverage by including voices from different regions and cultural backgrounds.
Decoding LLM Hallucinations: A Deep Dive into Language Model Errors
Date published
June 21, 2024
Author(s)
Abhiram Sharma
Language
English
Word count
1826
Hacker News points
None found.
Large language models (LLMs) can sometimes produce confident but incorrect information, a phenomenon known as hallucination. This issue is significant in industries like law and healthcare, where the accuracy of generated information is critical. There are two major categories of hallucinations: intrinsic and extrinsic. Intrinsic hallucinations contradict the source information given to the model, while extrinsic hallucinations occur when LLMs generate information that cannot be verified against the provided source data. Hallucinations can have far-reaching societal implications, undermining trust in reliable information sources and contributing to widespread confusion and mistrust among the public. Several methodologies are used to detect LLM hallucinations: self-evaluation, reference-based detection, uncertainty-based detection, and consistency-based detection. Implementing these approaches ensures the responsible deployment of LLMs and other generative AI technologies, maximizing their positive impact on society.
Introduction to LLM Customization
Date published
June 20, 2024
Author(s)
Ruben Winastwan
Language
English
Word count
1675
Hacker News points
None found.
Recent advancements in artificial intelligence have led to the development of large language models (LLMs), revolutionizing natural language processing. These powerful models, such as ChatGPT and Llama, demonstrate superior capabilities in understanding and generating human-like language but are limited by their training data cut-off date. To unlock their full potential, LLM customization is essential. Customization options include Retrieval Augmented Generation (RAG) and fine-tuning methods like supervised fine-tuning and Reinforcement Learning from Human Feedback (RLHF). RAG enhances response quality by injecting relevant contexts alongside the query, while fine-tuning involves training LLMs on specific data domains.
Clearing Up Misconceptions about Data Insertion Speed in Milvus
Date published
June 18, 2024
Author(s)
Christy Bergman
Language
English
Word count
932
Hacker News points
None found.
Misconceptions about data insertion speed in Milvus may arise due to users overlooking the detailed process steps involved. When using libraries like LangChain or LlamaIndex, these platforms convert unstructured data into vectors and then insert them into Milvus Lite. The abstraction of this complex process can create an illusion that the data insertion process takes a long time. However, the actual time-consuming step is generating embeddings from unstructured data, which is computationally intensive. In comparison, the average Milvus vector database insert time is only about a tenth of a second. Thus, around 97% of the "Milvus insert" time observed in LangChain or LlamaIndex is spent on embedding generation, while about 3% is spent on the actual database insertion step.
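A back-of-the-envelope way to check the split yourself, as a sketch assuming pymilvus and sentence-transformers; the exact percentages will vary with hardware and embedding model:

    import time
    from pymilvus import MilvusClient
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")
    docs = ["some example text to embed"] * 1000

    t0 = time.time()
    vectors = model.encode(docs)               # the computationally heavy step
    t1 = time.time()

    client = MilvusClient("./timing_demo.db")  # Milvus Lite
    client.create_collection("timing", dimension=384)
    rows = [{"id": i, "vector": v.tolist()} for i, v in enumerate(vectors)]
    client.insert(collection_name="timing", data=rows)
    t2 = time.time()

    print(f"embedding: {t1 - t0:.2f}s, insert: {t2 - t1:.2f}s")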
Full RAG: A Modern Architecture for Hyperpersonalization
Date published
June 17, 2024
Author(s)
Abdelrahman Elgendy
Language
English
Word count
1503
Hacker News points
None found.
Personalization is crucial for long-term customer satisfaction and retention in user-centric products like Netflix, Disney, or food delivery apps, and AI recommendation engines leverage historical data to provide those personalized experiences. Mike Del Balso, CEO of Tecton, discussed using the RAG architecture to improve AI recommendation engine personalization at a recent Unstructured Data Meetup hosted by Zilliz, highlighting that AI-powered personalization could add $5 trillion in value to global GDP. RAG (Retrieval Augmented Generation) is an effective technique for enhancing the response quality and relevance of large language models (LLMs). It consists of a retriever, which combines an embedding model and a vector database like Milvus or Zilliz Cloud, and a generator, which is the LLM itself. The RAG pipeline transforms all documents into vector embeddings stored in a vector database, converts user queries into vector embeddings, retrieves the top candidates from the vector database based on similarity to the query, and generates a coherent response from the query and top-K candidates. However, traditional RAG systems lack personalized context for users' likes and dislikes. Full RAG addresses this by adding context to the retrieval pipeline, for example context on candidate locations (such as weather and activities) and user preferences (such as historical sites and accommodation). Tecton has developed a feature platform that integrates different business data sources to create personalized context at various levels: Base, Batch Context, Batch + Streaming Data Context, and Batch + Streaming + Real-time Context. RAG is thus essential to enhancing AI recommendation engines' effectiveness and long-term customer retention. Tecton simplifies building streaming context by providing a Python SDK for coding context definitions and evaluating data in real time. Challenges remain, however, such as managing trade-offs between speed and cost, integrating third-party real-time data sources, and ensuring proper model governance, debugging, and version control.
🚀 What’s New with Metadata Filtering in Milvus v2.4.3
Date published
June 16, 2024
Author(s)
Christy Bergman
Language
English
Word count
745
Hacker News points
None found.
Milvus v2.4.3 introduces full-string metadata matching, allowing users to match strings using prefix, infix, postfix, or character wildcard searches. This update makes metadata filtering more versatile and powerful. The blog demonstrates how to use this feature with an example using IMDB movie data. It covers connecting to Milvus Lite, transforming movie text into vectors, inserting vectors and metadata into Milvus, and handling user queries by searching for similar data vectors. Additionally, the blog provides resources and further reading on using array fields in Milvus and filtering searches.
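The new matching modes boil down to LIKE expressions in the filter string, roughly as follows; a sketch assuming pymilvus against Milvus v2.4.3+, where the movies collection, title field, and placeholder embedding are hypothetical:

    from pymilvus import MilvusClient

    client = MilvusClient("./movies_demo.db")
    query_vector = [0.0] * 768   # placeholder for a real query embedding

    # Prefix match: titles that start with "The".
    hits = client.search(collection_name="movies", data=[query_vector],
                         filter='title like "The%"', limit=5)
    # Postfix match: titles that end with "Story".
    hits = client.search(collection_name="movies", data=[query_vector],
                         filter='title like "%Story"', limit=5)
    # Infix match: titles that contain "War" anywhere.
    hits = client.search(collection_name="movies", data=[query_vector],
                         filter='title like "%War%"', limit=5)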
Copilot Workspace: What It Is, How It Works, Why It Matters
Date published
June 15, 2024
Author(s)
Denis Kuria
Language
English
Word count
1375
Hacker News points
None found.
GitHub Next introduced Copilot Workspace, a task-oriented development environment that integrates generative AI models into coding environments. The workspace allows developers to brainstorm, plan, build, test, and run code using natural language conversations and prompts. It follows a task-to-code workflow, starting with task creation and flowing into specification, planning, and coding. Steering points are implemented between the given task and suggested code, allowing developers to guide the model when it misunderstands requests or misses edge cases. The workspace is useful for tasks such as fixing bugs and implementing features.
Voyage AI Embeddings and Rerankers for Search and RAG
Date published
June 14, 2024
Author(s)
Haziqa Sajid
Language
English
Word count
2199
Hacker News points
None found.
The article discusses Retrieval Augmented Generation (RAG), a technique that optimizes large language models by providing context from the query. It explains how embedding models convert unstructured data into vector embeddings, enabling computers to understand semantics. RAG is particularly useful in reducing hallucinations in generative AI models like ChatGPT. The article also introduces Voyage AI's domain-specific and general-purpose embedding models and rerankers that contribute significantly to search and RAG. Furthermore, it demonstrates how to integrate Zilliz Cloud Pipelines with Voyage AI for streamlined embedding generation and retrieval, using Cohere as the LLM to build a RAG application.
Local Agentic RAG with LangGraph and Llama 3
Date published
June 14, 2024
Author(s)
Stephen Batifol
Language
English
Word count
1304
Hacker News points
None found.
LLMs can gain important new capabilities through agents that use planning, memory, and tools to accomplish tasks. This post demonstrates how to build tool-calling agents using LangGraph with Llama 3 and Milvus. Agents can perform actions such as searching the web, browsing emails, performing corrective RAG, and more. The process involves setting up LangGraph, Ollama and Llama 3, and Milvus Lite. Using these tools, a custom local Llama 3-powered RAG agent is built with approaches like routing, fallback, and self-correction; examples of grading agents include the Hallucination Grader and the Answer Grader. The post concludes by compiling the LangGraph graph and testing it.
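The control flow underneath such an agent can be pictured as a small LangGraph state machine. The sketch below stubs out the retrieval, generation, and grading steps; it assumes the langgraph package, and all node logic is hypothetical:

    from typing import TypedDict
    from langgraph.graph import StateGraph, END

    class GraphState(TypedDict):
        question: str
        answer: str

    def retrieve(state: GraphState) -> GraphState:
        # Would search Milvus here; stubbed for the sketch.
        return {"question": state["question"], "answer": ""}

    def generate(state: GraphState) -> GraphState:
        # Would call the local Llama 3 via Ollama here; stubbed.
        return {"question": state["question"], "answer": "draft answer"}

    def grade(state: GraphState) -> str:
        # A grader node decides whether to accept the answer or loop back.
        return "accept" if state["answer"] else "retry"

    graph = StateGraph(GraphState)
    graph.add_node("retrieve", retrieve)
    graph.add_node("generate", generate)
    graph.set_entry_point("retrieve")
    graph.add_edge("retrieve", "generate")
    graph.add_conditional_edges("generate", grade,
                                {"accept": END, "retry": "retrieve"})
    app = graph.compile()
    print(app.invoke({"question": "What is Milvus?", "answer": ""}))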
How to Detect and Correct Logical Fallacies from GenAI Models
Date published
June 13, 2024
Author(s)
Abdelrahman Elgendy
Language
English
Word count
1482
Hacker News points
None found.
Large language models (LLMs) have revolutionized AI, particularly conversational AI and text generation. However, a critical issue remains: the occurrence of logical fallacies in LLM output, which can lead to flawed reasoning and misinformation. These fallacies arise for multiple reasons, including imperfect training data, small context windows, and the probabilistic nature of LLMs. Strategies such as human feedback, reinforcement learning, and prompt engineering have been proposed to tackle the problem. One interesting approach is RLAIF (Reinforcement Learning from AI Feedback), which uses AI to fix itself by detecting and correcting logical fallacies. The FallacyChain module in LangChain implements this approach, making LLM outputs more reliable and trustworthy.
Using Vector Search to Better Understand Computer Vision Data
Date published
June 11, 2024
Author(s)
Daniella Pontes
Language
English
Word count
1308
Hacker News points
None found.
Bad data can significantly impact AI-powered applications and workflows, leading to inaccurate results and frustrated users. To address this issue, Voxel51 has developed a solution that brings transparency and clarity to visual AI workflows, making it faster and more efficient to build high-quality datasets and models. By integrating vector databases with tools like Voxel51's FiftyOne open source project, users can test and assess models by feeding them the exact datasets they need for robust, accurate results. This approach accelerates the path to success in AI development, as better data leads to better models. Vector search capabilities are essential in computer vision, offering a powerful engine for data exploration, model evaluation, and innovative multimodal search using embeddings, concept interpolation, and traversal. As AI continues to evolve, integrating vector databases will play a crucial role in shaping the future of unstructured data-driven technologies.
How Delivery Hero Implemented the Safety System for AI-Generated Images
Date published
June 10, 2024
Author(s)
Ruben Winastwan
Language
English
Word count
2211
Hacker News points
None found.
Delivery Hero, a multinational online food delivery company, has implemented an AI safety system to generate high-quality images of products. The system consists of two stages: food image generation and building a safety system. For the first stage, they use DALL-E from OpenAI and implement an image inpainting method with Grounding DINO and DALL-E. In the second stage, four components are used to generate a final score for each image: image tagging, image centering, text detection, and image sharpness. The scores obtained from these components are combined with a weighted function to give each image one final score value. By applying a threshold, an image with a final score below the threshold will be filtered out and not recommended to vendors.
Training Text Embeddings with Jina AI
Date published
June 9, 2024
Author(s)
Denis Kuria
Language
English
Word count
1913
Hacker News points
None found.
Bo Wang from Jina AI discussed the development of state-of-the-art text embeddings, which power modern vector search and Retrieval-Augmented Generation (RAG) systems. The release of Jina-Embeddings-V2 garnered significant attention in the AI community, with over 3 million downloads on Hugging Face. It has been integrated into various AI frameworks like LangChain and LlamaIndex, as well as vector databases such as Milvus and Zilliz Cloud. Jina embeddings closely compete with OpenAI embeddings. Jina AI initially began by fine-tuning existing models like BERT but soon realized that the industry was not ready for fine-tuning techniques. This led them to develop their own embedding model from scratch, resulting in Jina-Embeddings-V1 and later V2. The latest version, V2, can handle sequences up to 8,192 tokens during inference while training on shorter sequences. Jina-Embeddings-V2 removes position embeddings and introduces Attention with Linear Biases (ALiBi) for dynamic context modeling. It also adapts ALiBi for bidirectional transformers and retrains BERT from scratch, resulting in JinaBERT as the backbone for V2. The model has been successful in handling multilingual data and consistently outperforms competitors like Multilingual E5 and Cohere Embed V3. When developing RAG applications using Jina-Embeddings-V2, it's essential to consider document length and the positioning of relevant information within these documents. The team at Jina AI is already working on Jina-Embeddings-V3, which promises improvements in speed, efficiency, multilingual support, real-world problem solving, task-specific enhancements, and chunk and schema awareness.
Introduction to MemGPT and Its Integration with Milvus
Date published
June 8, 2024
Author(s)
Haziqa Sajid
Language
English
Word count
1655
Hacker News points
None found.
During a meetup in May, Charles Packer discussed how MemoryGPT (MemGPT) aims to solve the problem of limited memory in large language models (LLMs). MemGPT introduces a virtually extended context window inspired by computer system design. It divides the LLM context into two parts: main context and external context. The main context has a limited bandwidth, while the external context is stored on persistent storage with an infinite window. MemGPT efficiently manages information flow between the two contexts, allowing for long-context memory applications like personal assistant chatbots.
Text as Data, From Anywhere to Anywhere
Date published
June 7, 2024
Author(s)
Denis Kuria
Language
English
Word count
2102
Hacker News points
None found.
In March 2024, AJ Steers discussed utilizing Airbyte and PyAirbyte to integrate structured and unstructured data from various sources across different platforms at the SF Unstructured Data Meetup. AJ Steers is an experienced architect, data engineer, software developer, and data ops expert who has designed end-to-end solutions at Amazon and created a vision for quantified-self data models. He currently works as a staff software engineer at Airbyte. Airbyte's focus so far has been on offering reliability, flexible deployment options, and a robust library of connectors to ensure seamless data integration for traditional tabular data. However, the platform has expanded its capabilities in recent months to cover unstructured data sources as well. This expansion includes support for vector database destinations like Milvus, ensuring effective utilization of data across various applications. PyAirbyte is a Python library that provides an interface for interacting with Airbyte, allowing users to control and manage their Airbyte instances from Python. It offers several advantages, such as the ability to run anywhere, reduced time to value, fast prototyping, and flexibility. Users can choose between the hosted version of Airbyte (a no-code approach) or PyAirbyte (a minimal-code approach) for integrating data sources with data destinations. In conclusion, whether you prefer a no-code or minimal-code approach, Airbyte and PyAirbyte offer robust solutions for integrating both structured and unstructured data from various sources across different platforms.
How to Connect to Milvus Lite Using LangChain and LlamaIndex
Date published
June 7, 2024
Author(s)
Christy Bergman
Language
English
Word count
787
Hacker News points
None found.
Milvus Lite is a new, lightweight open-source vector database that has become the default way for third-party connectors like LangChain and LlamaIndex to connect to Milvus. Comparing timings with the same HuggingFace embedding model showed that using the Milvus Lite APIs directly provides the best balance between fine control over Milvus settings and fast setup. The full code and timings are available on GitHub. This article covers connecting to Milvus Lite using LlamaIndex, LangChain, and the Milvus Lite APIs, as well as choosing the right method based on the control-versus-speed trade-off.
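Roughly, the three connection styles compared in the post look like the sketch below, assuming the pymilvus, langchain-milvus, and llama-index-vector-stores-milvus packages; the embeddings object and dimension are assumptions constructed elsewhere:

    # 1) Native Milvus Lite API: the most control over settings.
    from pymilvus import MilvusClient
    client = MilvusClient("./milvus_demo.db")

    # 2) Via LangChain: a local file path in connection_args selects Milvus Lite.
    from langchain_milvus import Milvus
    store = Milvus(
        embedding_function=embeddings,                # assumed embedding object
        connection_args={"uri": "./milvus_demo.db"},
    )

    # 3) Via LlamaIndex: the same idea through its vector store wrapper.
    from llama_index.vector_stores.milvus import MilvusVectorStore
    vector_store = MilvusVectorStore(uri="./milvus_demo.db", dim=384, overwrite=True)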
Are CPUs Enough? A Review Of Vector Search Running On Novel Hardware
Date published
June 6, 2024
Author(s)
Antony G.
Language
English
Word count
1159
Hacker News points
None found.
The article discusses the role of CPUs in vector search operations and whether they are sufficient to meet modern demands. It highlights a recent talk by George Williams, who explored how new hardware solutions could revolutionize vector search technology. The NeurIPS BigANN competitions aim to push the boundaries of vector search technology, with Zilliz being one of the winners. Zilliz's approach involves optimizing memory layout and access patterns for SSDs, as well as focusing on memory efficiency and maximizing data retrieval speed. The future of vector search may shift from relying on approximation techniques to leveraging brute force methods while still maintaining short timeframes.
Expanding Our Reach: Zilliz Cloud Now Available in 11 Regions across 3 Major Cloud Providers
Date published
June 5, 2024
Author(s)
Steffi Li
Language
English
Word count
395
Hacker News points
None found.
Zilliz Cloud, a fully managed version of the open-source vector database Milvus, is now available in 11 regions across three major cloud providers: AWS, Azure, and Google Cloud Platform (GCP). This expansion allows users to deploy Zilliz Cloud closer to their user base, reducing latency and improving performance. The availability includes five regions on AWS, four on GCP, and two on Azure, covering key areas in the Americas, EMEA, and APAC. By offering broader coverage than other vector database vendors, Zilliz Cloud provides more deployment options and ensures better service availability. This expansion reflects Zilliz's commitment to providing the best possible infrastructure for development needs, helping users reduce complexity and lower total cost of ownership (TCO).
Advanced Retrieval Augmented Generation (RAG) Apps with LlamaIndex
Date published
June 4, 2024
Author(s)
Abhiram Sharma
Language
English
Word count
1298
Hacker News points
None found.
Laurie Voss, VP of Developer Relations at LlamaIndex, discussed building advanced Retrieval Augmented Generation (RAG) apps with LlamaIndex at a recent Unstructured Data Meetup. RAG is designed to overcome the limitations of large language models (LLMs) by augmenting them with retrieval capabilities; the main drawback of LLMs is their limited context windows, which can hold only part of an organization's data at once. LlamaIndex is an open-source framework that connects your data to LLMs and simplifies the creation of RAG applications, allowing developers to build functional RAG systems with minimal code. It provides advanced data ingestion and querying features for RAG applications, such as data connectors, PDF parsing, embedding models, vector stores, a sub-question query engine, small-to-big retrieval, metadata filtering, hybrid search, and agents.
Elevating User Experience with Image-based Fashion Recommendations
Date published
June 4, 2024
Author(s)
Mostafa Ibrahim
Language
English
Word count
1131
Hacker News points
None found.
Joan Kusuma's innovative approach to enhancing the fashion retail experience involves using image-based recommendations. By utilizing convolutional neural networks (CNNs) and visual embeddings, she has created a personalized outfit recommendation system that can transform the fashion industry. The process includes image preprocessing, feature extraction, vector search with vector databases, indexing, and image recommendation model in action. Joan's work demonstrates the potential of AI in fashion retail by delivering personalized outfit suggestions using visual embeddings and vector databases.
The Path to Production: LLM Application Evaluations and Observability
Date published
June 2, 2024
Author(s)
Fendy Feng
Language
English
Word count
1538
Hacker News points
None found.
The text discusses the challenges faced by machine learning teams in deploying large language models (LLMs) into production, such as addressing hallucinations and ensuring responsible deployment. It highlights strategies for conducting quick and accurate LLM evaluations shared by Hakan Tekgul, an ML Solutions Architect at Arize AI, during a recent Unstructured Data Meetup. The article emphasizes the importance of leveraging evaluation tools for seamless LLM observability and explores five primary facets of LLM observability: LLM Evaluations, Spans and Traces, Prompt Engineering, Search and Retrieval, and Fine-tuning. It delves into the LLM Evaluation and LLM Spans and Traces categories in more detail to highlight their significance in optimizing LLM observability. The article concludes by reflecting on Hakan Tekgul's talk, emphasizing that deploying LLMs into production is challenging but can be achieved with a robust observability framework.
Vector Search and RAG - Balancing Accuracy and Context
Date published
June 1, 2024
Author(s)
Abdelrahman Elgendy
Language
English
Word count
2205
Hacker News points
None found.
Large language models (LLMs) have made significant strides in machine learning and natural language processing, but they face a unique issue called AI hallucination, where incorrect or false information is generated. This can happen due to lack of context, training data issues, overgeneralization, or design limitations. Retrieval Augmented Generation (RAG) is an advanced approach that aims to enhance the accuracy and reliability of AI models by providing relevant, current information related to a user's question. RAG helps ensure that models can access the newest data, like recent news or research, to give better answers and reduce mistakes. Building a RAG system involves several complex steps and decisions, including choosing an embedding model, selecting an index structure, chunking, deciding between keyword and semantic search, and integrating rerankers. RAG's ability to handle trillions of tokens makes it well suited to massive, ever-changing datasets, and combining RAG's precision with the adaptability of long-context models could lead to a powerful synergy. Evaluating LLMs can be challenging; one solution is to have LLMs evaluate each other by generating test cases and measuring the model's performance.
Improving Behavior Science Experiments with LLMs and Milvus
Date published
May 31, 2024
Author(s)
Daniella Pontes
Language
English
Word count
1509
Hacker News points
None found.
Dr. Damon Abraham, a behavioral scientist with a PhD in Psychology, has researched how reappraising an image can shift our valence and arousal emotions. His project aims to create a stimuli repository for experimental psychology using LLM and vector database technologies. The research involves collaboration with the University of Denver and other institutions, and it seeks to develop an open-source normative database of images and techniques to measure the dynamic psychological distance between images and their potential for successful reappraisal. In his presentation at the Zilliz Unstructured Data Meetup in Seattle on February 13, 2024, Dr. Abraham discussed how different contextual interpretations of images can change our feelings, providing insights into emotional regulation. The study also explores the concept of 'reappraisal affordances,' which examines how an image's inherent semantic properties and range of associations may affect its capacity for reinterpretation.
Tim Spann: Why I Joined Zilliz
Date published
May 29, 2024
Author(s)
Tim Spann
Language
English
Word count
608
Hacker News points
None found.
Tim Spann, an advocate of Open Source projects, has been working on the intersection of streaming and AI at Zilliz. He emphasizes the importance of a database for AI that can store and query any type of data in any mode needed. With Milvus, Towhee, Attu, and integrations with Kafka and LlamaX frameworks, Spann aims to build up a global group of unstructured data engineers and data superstars. He believes the future will see a rise in unstructured data engineering and processing like Spark, Flink, and Kafka for structured and semistructured data. The need for powerful, fast ways to do unstructured data processing and Vector ETL is evident and growing.
How to Build a LangChain RAG Agent with Reporting
Date published
May 24, 2024
Author(s)
Yujian Tang
Language
English
Word count
1531
Hacker News points
None found.
This tutorial demonstrates how to build an AI agent using LangChain, Milvus, and OpenAI. The agent performs Retrieval Augmented Generation (RAG) tasks by retrieving information from a vector database like Milvus. The monitoring tool Portkey is used to track token usage, token count, and request latency. The tech stack includes LangChain for orchestration, Milvus as the vector database, Portkey for monitoring, and OpenAI for the large language model (LLM).
Choosing the Right Embedding Model for Your Data
Date published
May 22, 2024
Author(s)
Christy Bergman
Language
English
Word count
1051
Hacker News points
None found.
Retrieval Augmented Generation (RAG) is an approach in generative AI that uses your own data to enhance the knowledge of a large language model (LLM) generator such as ChatGPT. RAG involves two models, an embedding model and a generator, both used in inference mode. The HuggingFace MTEB leaderboard provides a comprehensive list of text embedding models that users can filter by language or by specialty domain such as law, though they should be cautious when selecting models, as some may be overfitted and therefore rank deceptively high. ResNet50 is a popular convolutional neural network (CNN) model for image data, and PANNs are commonly used embedding models for audio data. Multimodal embedding models like SigLIP or Unum can handle text, image, audio, or video data simultaneously. For multimodal applications involving sound or video, a generative LLM is often employed to convert the input into text before applying RAG techniques.
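One practical upside: when models are loaded through sentence-transformers, swapping embedding models is a one-line change, which keeps the iteration the post recommends cheap. A sketch, where the model name is one MTEB-listed example rather than an endorsement:

    from sentence_transformers import SentenceTransformer

    # Any MTEB-listed sentence-transformers model can be dropped in here.
    model = SentenceTransformer("BAAI/bge-small-en-v1.5")
    vectors = model.encode(["a sentence to embed", "another sentence"])
    print(vectors.shape)  # (2, embedding_dimension)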
Approximate Nearest Neighbor Search in Recommender Systems
Date published
May 20, 2024
Author(s)
Tyler Falcon
Language
English
Word count
1971
Hacker News points
None found.
In February 2024, Yury Malkov discussed Approximate Nearest Neighbor (ANN) search and its key role in recommender systems at the SF Unstructured Data Meetup. ANN search is already integrated into the production stacks of popular tools worldwide, and the talk covered the key concepts and background driving its adoption in large-scale recommender systems. Malkov, a physicist, laser researcher, and inventor of HNSW (a graph-based indexing algorithm), now works as a research scientist at OpenAI. ANN search algorithms use various indexing techniques to return approximate nearest neighbors, making them core to many customer-facing applications and technologies today; from search engines like Google to social media sites, ANN and recommender systems are integrated throughout production stacks. Malkov notes that many mature ANN solutions exist, including LSH, graph-based indexes like HNSW and SCANN, quantization-based indexes like IVF_PQ and IVF_HNSW, DiskANN, and Annoy. ANN benchmarking is done through platforms like ANN-Benchmarks, which evaluate approximate nearest neighbor search algorithms and publish results split by distance measure and dataset; performance metrics include recall rate and queries per second (QPS), where higher QPS indicates better performance. The Milvus team built Knowhere, an open-source vector execution engine that incorporates several vector similarity search libraries such as Faiss, Hnswlib, and Annoy, and controls whether index building and search requests execute on CPU or GPU. Cardinal, Zilliz's core vector search engine, offers a threefold performance increase over the previous version. Recommender systems have large market potential due to their influence on consumer behavior; typical challenges at scale include generalizability, handling huge corpora, and efficiency. Traditional recommender systems use a multi-staged funnel with candidate generation, lightweight ranking, and full ranking stages. Novel solutions for item-query incompatibility include L2 distance on data vectors, bipartite graph ranking, text-focused graph re-ranking, and cascaded graph search. ANN algorithms have seen extensive adoption because they offer good-enough matching, flexibility, and maturity. Further resources are available through Yury Malkov's talk on YouTube.
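As a concrete example of putting a graph index to work, creating and querying an HNSW index in Milvus is mostly a matter of index parameters. A sketch assuming pymilvus, a running Milvus server at a hypothetical address, an existing collection with an embedding vector field, and a placeholder query vector; parameter values are illustrative:

    from pymilvus import Collection, connections

    connections.connect(uri="http://localhost:19530")   # hypothetical server
    collection = Collection("docs")                     # assumed existing collection

    collection.create_index(
        field_name="embedding",
        index_params={
            "index_type": "HNSW",
            "metric_type": "L2",
            "params": {"M": 16, "efConstruction": 200},  # graph degree / build beam
        },
    )
    collection.load()

    # At query time, `ef` trades recall for speed.
    query_vector = [0.0] * 768                           # placeholder embedding
    results = collection.search(
        data=[query_vector], anns_field="embedding",
        param={"metric_type": "L2", "params": {"ef": 64}}, limit=10,
    )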
Why Milvus Makes Building RAG Easier, Faster, and More Cost-Efficient
Date published
May 17, 2024
Author(s)
By Ken Zhang
Language
English
Word count
1185
Hacker News points
None found.
Milvus, an open-source vector database, enhances the development of Retrieval Augmented Generation (RAG) applications by streamlining processes and improving efficiency. Its integration with popular embedding models simplifies transforming text into searchable vectors, while its hybrid search capability supports multimodal data retrieval. Additionally, Milvus offers a cost-effective way to manage large knowledge bases by minimizing memory consumption, implementing tiered data storage, and leveraging intelligent caching and data-sharding techniques. Overall, Milvus helps developers build faster, more accurate, and more cost-efficient RAG applications.
Multimodal RAG locally with CLIP and Llama3
Date published
May 17, 2024
Author(s)
By Stephen Batifol
Language
English
Word count
744
Hacker News points
None found.
This tutorial demonstrates how to build a Multimodal Retrieval Augmented Generation (RAG) system that can work with different types of data such as images, audio, video, and text. The system uses OpenAI's CLIP to understand the connection between pictures and text, Milvus Standalone for efficient management of large-scale embeddings, Ollama to run Llama 3 on a laptop, and LlamaIndex as the query engine in combination with Milvus as the vector store. The tutorial provides code examples available on GitHub and explains how to run queries that involve both text and images.
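As a rough illustration of the CLIP step (the tutorial's exact code lives on GitHub), the following sketch embeds an image and a caption into CLIP's shared vector space using the HuggingFace transformers library; the image file name is hypothetical.

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# CLIP maps images and text into one shared embedding space, so a text
# query can retrieve images stored in a vector database (and vice versa).
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")  # hypothetical local file
inputs = processor(text=["a photo of a cat"], images=image,
                   return_tensors="pt", padding=True)

image_vec = model.get_image_features(pixel_values=inputs["pixel_values"])
text_vec = model.get_text_features(input_ids=inputs["input_ids"],
                                   attention_mask=inputs["attention_mask"])
# Both are 512-dimensional for this checkpoint and can be indexed in Milvus.
```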
Zilliz Achieves AWS Generative AI Competency Partner Designation, Driving Innovation in AI Solutions
Date published
May 16, 2024
Author(s)
By Sachi Tolani
Language
English
Word count
334
Hacker News points
None found.
Zilliz has achieved AWS Generative AI Competency Partner designation, demonstrating its commitment to advancing generative AI technologies. As an AWS Differentiated Partner, Zilliz provides critical infrastructure and best practices for implementing transformative generative AI applications such as image retrieval, video analysis, NLP, recommendation engines, customized search, intelligent customer service, fraud detection, and more. The company's expertise combined with the scalability, performance, and security of AWS Cloud enables customers to unlock new possibilities and gain a competitive edge in their industries. Zilliz remains dedicated to fostering collaboration, knowledge sharing, and responsible AI practices while working closely with AWS and its customers to shape the future of generative AI.
Running Llama 3, Mixtral, and GPT-4o
Date published
May 15, 2024
Author(s)
By Christy Bergman
Language
English
Word count
1801
Hacker News points
None found.
This blog post discusses various ways to run the generation ("G") part of Retrieval Augmented Generation (RAG) using different models and inference endpoints. The author provides step-by-step instructions on how to use Llama 3 from Meta, Mixtral from Mistral, and the newly announced GPT-4o from OpenAI, and also covers running these models locally or through Anyscale, OctoAI, and Groq endpoints. Additionally, the author explains how to evaluate answers using Ragas and provides a summary table of results for each model endpoint. The conclusion emphasizes the importance of weighing answer quality, latency, and cost when choosing a model and inference endpoint for the generation step of RAG.
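Most of these hosted endpoints expose an OpenAI-compatible API, so switching generators largely comes down to swapping the base URL and model name. A minimal sketch; the endpoint URL and model identifier below are assumptions and should be checked against the provider's current documentation.

```python
from openai import OpenAI

# Point the standard OpenAI client at an OpenAI-compatible endpoint.
# Base URL and model name are assumptions; verify them before use.
client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="llama3-70b-8192",
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": "Context: ...\n\nQuestion: ..."},
    ],
)
print(response.choices[0].message.content)
```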
Harnessing Generative Feedback Loops in AI Systems with Milvus
Date published
May 10, 2024
Author(s)
By Uppu Rajesh Kumar
Language
English
Word count
2958
Hacker News points
None found.
Milvus, an open-source vector database designed to store, index, and search massive amounts of vector data in real-time, can be integrated with LLMs in a generative feedback loop. This allows for continuous learning and improvement of the AI system. Feedback loops are crucial in ensuring the ongoing refinement of model outputs in AI systems, offering benefits such as adaptability to new data, reduced bias and errors, personalized model outputs, and enhanced creativity and innovation. Milvus's features make it suitable for enhancing the data-handling capabilities of LLMs, particularly in scenarios where feedback loops are used to refine predictive and generative accuracies.
Exploring DSPy and Its Integration with Milvus for Crafting Highly Efficient RAG Pipelines
Date published
May 9, 2024
Author(s)
By David Wang
Language
English
Word count
2043
Hacker News points
None found.
DSPy is a programmatic framework designed to optimize prompts and weights in language models (LMs), particularly in use cases where LMs are integrated across multiple pipeline stages. It provides composable, declarative modules for instructing LMs in Pythonic syntax. Unlike traditional prompt engineering, which relies on manually crafting and tweaking prompts, DSPy learns from query-answer examples and uses that learning to generate optimized prompts for more tailored results. This allows the entire pipeline to be reassembled dynamically and tailored explicitly to the nuances of the task, eliminating the need for ongoing manual prompt adjustments. Milvus has been integrated into the DSPy workflow as a retrieval module in the form of the MilvusRM client, making it easier to implement a fast and efficient RAG pipeline. In this demonstration, we'll build a simple RAG application using GPT-3.5 (gpt-3.5-turbo) for answer generation, with Milvus as the vector store through MilvusRM and DSPy to configure and optimize the RAG pipeline.
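A minimal sketch of what such a pipeline can look like, assuming the MilvusRM import path and constructor arguments of recent dspy releases (both vary between versions):

```python
import dspy
from dspy.retrieve.milvus_rm import MilvusRM  # import path varies by dspy version

# Configure the LM and the Milvus-backed retriever.
retriever = MilvusRM(collection_name="rag_docs", uri="http://localhost:19530")
dspy.settings.configure(lm=dspy.OpenAI(model="gpt-3.5-turbo"), rm=retriever)

class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        context = self.retrieve(question).passages
        return self.generate_answer(context=context, question=question)

print(RAG()(question="What does DSPy optimize?").answer)
```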
Milvus Reference Architectures
Date published
May 9, 2024
Author(s)
By Christy Bergman
Language
English
Word count
1017
Hacker News points
None found.
This blog discusses resource allocation for Milvus, an open-source vector database. It provides reference architectures based on specific numbers of users or requests per second (RPS) and different mixes of READ and WRITE operations. The article emphasizes the importance of understanding workload characteristics to determine Milvus' computational power and memory requirements. It also outlines a method for estimating resource needs, load testing, benchmarking, and concludes with recommendations for resource allocation based on data size and QPS requirements.
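The core of any such estimate is simple arithmetic over vector count and dimensionality. A back-of-the-envelope sketch; the workload numbers and the 2x index-overhead multiplier are illustrative assumptions, not figures from the article.

```python
# Rough memory sizing for float32 vectors, before index overhead.
num_vectors = 10_000_000   # assumed workload
dim = 768                  # assumed embedding dimensionality
bytes_per_float = 4

raw_gib = num_vectors * dim * bytes_per_float / 1024**3
# Graph indexes such as HNSW add overhead on top of the raw vectors;
# a ~2x multiplier is a common planning heuristic, not an exact figure.
planned_gib = raw_gib * 2

print(f"raw: {raw_gib:.1f} GiB, planned with index overhead: {planned_gib:.1f} GiB")
```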
Revolutionizing Search with Zilliz and Azure OpenAI
Date published
May 2, 2024
Author(s)
By Daniella Pontes
Language
English
Word count
1910
Hacker News points
None found.
Zilliz and Azure OpenAI have integrated to redefine similarity and semantic search, offering remarkable speed, intelligence, and safeguards. The collaboration combines Azure OpenAI's advanced generative AI capabilities with Zilliz's scalable search solutions, enhancing AI search functionalities and data retrieval. This partnership enables seamless integration of AI models and scalable search solutions for developers. Zilliz is a specialized data management system optimized for managing high-dimensional vector data on a large scale, while Azure OpenAI provides additional features like private networking, regional availability, and responsible AI content filtering. The integration of these technologies offers robust data storage, sophisticated indexing options, and comprehensive similarity metrics and retrieval mechanisms, enabling developers to create scalable and efficient AI-driven search solutions.
Hybrid Search with Milvus
Date published
April 30, 2024
Author(s)
By Stephen Batifol
Language
English
Word count
1100
Hacker News points
None found.
Milvus 2.4 introduces multi-vector and hybrid search capabilities, allowing simultaneous queries across multiple vector fields in the same collection and merging the results with re-ranking strategies. This tutorial demonstrates how to leverage Milvus's hybrid search using the eSci dataset and the BGE-M3 model. The steps include preparing the dataset, generating embeddings with BGE-M3, setting up a Milvus collection, inserting the data, and executing hybrid searches in Milvus.
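A condensed sketch of the final search step with pymilvus 2.4, assuming an existing collection with dense and sparse vector fields and precomputed BGE-M3 query vectors; collection and field names here are hypothetical.

```python
from pymilvus import AnnSearchRequest, Collection, RRFRanker, connections

connections.connect(uri="http://localhost:19530")
col = Collection("esci_products")  # hypothetical collection with two vector fields

# One sub-request per vector field; query vectors are assumed precomputed.
dense_req = AnnSearchRequest(data=[query_dense_vec], anns_field="dense_vector",
                             param={"metric_type": "IP"}, limit=10)
sparse_req = AnnSearchRequest(data=[query_sparse_vec], anns_field="sparse_vector",
                              param={"metric_type": "IP"}, limit=10)

# Reciprocal Rank Fusion merges the two ranked lists into one result set.
results = col.hybrid_search([dense_req, sparse_req], rerank=RRFRanker(),
                            limit=5, output_fields=["title"])
```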
Vector Databases Are the Base of RAG Retrieval
Date published
April 28, 2024
Author(s)
By Ken Zhang
Language
English
Word count
1523
Hacker News points
None found.
Implementing Retrieval Augmented Generation (RAG) technology in chatbots can significantly enhance customer support by combining large language models with knowledge stored in vector databases from various fields. RAG systems consist of two core components: the Retriever and the Generator, which work synergistically to handle complex queries effectively. Compared to traditional LLMs, RAG offers several advantages such as reduced hallucination issues, enhanced data privacy and security, and real-time information retrieval. While advancements in LLMs also address these challenges, RAG remains a robust, reliable, and cost-effective solution due to its transparency, operability, and private data management capabilities. RAG technology is often integrated with vector databases, leading to the development of popular solutions like the CVP stack. Vector databases are favored in RAG implementations for their efficient similarity retrieval capabilities, superior handling of diverse data types, and cost-effectiveness. Ongoing engineering optimizations aim to enhance the retrieval quality of vector databases by improving precision, response speed, multimodal data handling, and interpretability. As demand for RAG applications grows across various industries, RAG technology will continue to evolve and revolutionize information retrieval and knowledge acquisition processes.
The Landscape of Open Source Licensing in AI: A Primer on LLMs and Vector Databases
Date published
April 28, 2024
Author(s)
By Emily Kurze
Language
English
Word count
1467
Hacker News points
None found.
This guide provides an overview of open-source licensing in the context of AI technology, specifically vector databases and large language models (LLMs). Open source allows creators to make software or hardware available for free, often developed and maintained by community efforts. Understanding different license types is crucial as changes can significantly impact companies and businesses that rely on open-source software. The benefits of open-source vector databases and LLMs include rapid prototyping, increased trust and transparency, and reduced costs for developers. Various types of licenses exist, including permissive licenses (e.g., MIT License), copyleft licenses (e.g., GNU General Public License), weak copyleft licenses (e.g., GNU Affero General Public License), non-commercial licenses (e.g., Creative Commons Non-Commercial License), and public domain releases. Key organizations like the Open Source Initiative, Free Software Foundation, and Apache Software Foundation govern open-source licensing standards. The degrees of openness in different licensing models influence collaboration, innovation, and transparency in AI development. Licensing plays a vital role in shaping AI technologies' trajectory by governing accessibility, adaptability, and equitable distribution.
Ensuring Data Privacy in AI Search with Langchain and Zilliz Cloud
Date published
April 27, 2024
Author(s)
By Antony G.
Language
English
Word count
1330
Hacker News points
None found.
LangChain and Zilliz Cloud offer an effective combination to create AI-powered search systems. These systems use natural language processing (NLP) and machine learning algorithms to enhance the accuracy and relevance of information retrieval across business-specific data. With the rise of generative models, AI-powered search applications have become more prominent compared to traditional search methods. However, ensuring user privacy in these applications is critical due to ethical and legal implications. The integration of LangChain with Zilliz Cloud allows for the creation of custom search engines that prioritize data privacy while offering tailored solutions based on specific needs and data. Both tools provide robust frameworks for ensuring privacy and safety when utilizing large language models (LLMs), effectively preventing private data misuse and generating harmful or unethical content.
Practical Tips and Tricks for Developers Building RAG Applications
Date published
April 27, 2024
Author(s)
By James Luan
Language
English
Word count
2804
Hacker News points
None found.
Vector search is a technique used in RAG applications and information retrieval systems to find items or data points that are similar or closely related to a given query vector. While many vector database providers market their offerings as easy, user-friendly, and simple, building a scalable real-world application requires considering factors beyond the coding itself, including search quality, scalability, availability, multi-tenancy, cost, and security. To deploy a vector database effectively in a RAG production environment with Milvus, follow these best practices: design an effective schema, plan for scalability, select the optimal index, and fine-tune performance.
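As a sketch of the schema-design practice, here is a minimal pymilvus example of an explicit schema with typed scalar fields for filtering; field names, sizes, and index parameters are illustrative, not recommendations from the post.

```python
from pymilvus import Collection, CollectionSchema, DataType, FieldSchema, connections

connections.connect(uri="http://localhost:19530")

# Typed scalar fields keep filtered vector search fast and predictable.
fields = [
    FieldSchema(name="doc_id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="tenant", dtype=DataType.VARCHAR, max_length=64),
    FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=4096),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=768),
]
schema = CollectionSchema(fields, description="RAG chunks with tenant isolation")
col = Collection("rag_chunks", schema)

# Index choice is a tunable accuracy/latency/memory tradeoff.
col.create_index("embedding", {"index_type": "HNSW", "metric_type": "COSINE",
                               "params": {"M": 16, "efConstruction": 200}})
```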
Demystifying the Milvus Sizing Tool
Date published
April 26, 2024
Author(s)
By Christy Bergman
Language
English
Word count
658
Hacker News points
None found.
Milvus is an open-source vector database that enables efficient search over large amounts of data. When deploying Milvus, it's crucial to select the optimal configuration to ensure efficient performance and resource utilization. Key points to consider include index selection, which balances memory usage, disk space, cost, speed, and accuracy; segment size and deployment configuration; and additional customization options available in the Enterprise version of Zilliz Cloud. The Milvus sizing tool provides a starting point for these configurations, but users should also weigh their specific needs and requirements when choosing an index algorithm or segment size.
An Overview of Milvus Storage System and Techniques to Evaluate and Optimize Its Performance
Date published
April 24, 2024
Author(s)
By Fendy Feng, and Jay Zhu
Language
English
Word count
1593
Hacker News points
None found.
This guide explores Milvus, an open-source vector database known for its horizontal scalability and fast performance. At the core of Milvus lies its robust storage system, which comprises meta storage, log broker, and object storage. The architecture is organized into four key layers: access layer, coordinator service, worker nodes, and storage. Milvus uses three main storage components to ensure data integrity and availability: meta storage (etcd), object storage (MinIO), and a log broker (Pulsar or Kafka). To evaluate and optimize the performance of Milvus storage, it is crucial to monitor disk write latency, I/O throughput, and disk drive performance. The guide provides recommendations for selecting appropriate block storage options from various cloud providers and offers strategies to enhance MinIO's throughput performance by using SSD or NVMe-type drives.
RAG Without OpenAI: BentoML, OctoAI and Milvus
Date published
April 23, 2024
Author(s)
By Yujian Tang
Language
English
Word count
2820
Hacker News points
None found.
This tutorial demonstrates how to build retrieval augmented generation (RAG) applications using large language models (LLMs) without relying on OpenAI. The process involves serving embeddings with BentoML, inserting data into a vector database for RAG, setting up an LLM for RAG, and providing instructions to the LLM. Key components include BentoML for serving embeddings, OctoAI for accessing open-source models, and Milvus as the vector database. The example uses BentoML's Sentence Transformers Embeddings repository, a local Milvus instance using Docker Compose, and the Nous Hermes fine-tuned Mixtral model from OctoAI for RAG.
Spring AI and Milvus: Using Milvus as a Spring AI Vector Store
Date published
April 22, 2024
Author(s)
Abhiram Sharma
Language
English
Word count
1377
Hacker News points
None found.
Milvus is an open-source vector database designed to efficiently manage and retrieve high-dimensional vector data, making it ideal for use in artificial intelligence and machine learning applications. By integrating Milvus with Spring AI, developers can leverage advanced search capabilities and optimize their applications' performance and scalability. This integration allows users to perform complex queries and similarity searches quickly and accurately, enhancing user experiences and enabling more intelligent application behavior. Key features of Milvus include support for various indexing strategies, compatibility with different metric types, and the ability to handle large volumes of vector data through partitioning and sharding. Use cases for Milvus in Spring AI applications span across recommendation systems, content search engines, image and video recognition, and AI-driven chatbots and customer support.
Kickstart Your Local RAG Setup: A Beginner's Guide to Using Llama 3 with Ollama, Milvus, and Langchain
Date published
April 19, 2024
Author(s)
By Stephen Batifol
Language
English
Word count
844
Hacker News points
8
This guide provides a beginner's approach to setting up a Retrieval Augmented Generation (RAG) system using Ollama, Llama 3, Milvus, and Langchain. The RAG technique enhances large language models (LLMs) by integrating additional data sources. In this tutorial, we will build a question-answering chatbot that can answer questions about specific information. Key components of the setup include indexing data using Milvus, retrieval and generation with Llama 3, and interaction with data using Langchain. The guide assumes familiarity with Docker and Docker Compose, as well as installation of Milvus Standalone, Ollama, and other necessary tools.
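A compressed sketch of the wiring, assuming Ollama is serving llama3 locally and Milvus Standalone is listening on its default port; the langchain_community import paths reflect recent LangChain releases and may differ in older ones.

```python
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import Milvus

embeddings = OllamaEmbeddings(model="llama3")  # Ollama must be running locally

# Index a toy document in Milvus Standalone.
store = Milvus.from_texts(
    ["Milvus Standalone can be run locally with Docker Compose."],
    embedding=embeddings,
    connection_args={"uri": "http://localhost:19530"},
)

llm = Ollama(model="llama3")
docs = store.similarity_search("How do I run Milvus locally?", k=1)
print(llm.invoke(f"Context: {docs[0].page_content}\n\n"
                 f"Question: How do I run Milvus locally?"))
```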
Milvus Server Docker Installation and Packaging Dependencies
Date published
April 17, 2024
Author(s)
By Christy Bergman
Language
English
Word count
682
Hacker News points
None found.
Milvus is an open-source vector database with significant traction in Generative AI and RAG use cases. It offers flexible deployment options, including local and cloud (Zilliz) services. The main dependencies for the Milvus standalone server include FAISS, etcd, Pulsar/Kafka, Tantivy, RocksDB, MinIO/S3/GCS/Azure Blob Storage, Kubernetes, StorageClass, Persistent Volumes, Prometheus, and Grafana. The Docker image for the Milvus standalone container is around 300MB. Milvus has a frequent release cycle, with approximately one major release per month, and six SDKs are available: Python, Node, Go, C#, Java, and Ruby. Understanding these details can help organizations better plan and prepare for integrating Milvus into their technology stack.
Emerging Trends in Vector Database Research and Development
Date published
April 16, 2024
Author(s)
By Li Liu
Language
English
Word count
2159
Hacker News points
2
The future of vector databases is closely tied to the evolution of product requirements and user demands. Key areas of development include cost-efficiency, hardware advancements, collaboration with advanced machine learning models, prioritizing retrieval accuracy, optimizing for offline use cases, expanding feature sets for diverse industries, and more. As AI continues to mature, these advancements will enable vector databases to support a broader range of applications across various sectors, enhancing their overall functionality and versatility in production environments.
Streamlining Data Processing with Zilliz Cloud Pipelines: A Deep Dive into Document Chunking
Date published
April 16, 2024
Author(s)
Ehsanullah Baig
Language
English
Word count
3056
Hacker News points
None found.
Streamlining data processing with Zilliz Cloud Pipelines starts with document chunking, a key step in transforming unstructured data into a searchable vector collection. The platform enables semantic search over text documents and provides a critical building block for Retrieval-Augmented Generation (RAG) applications. Zilliz Cloud Pipelines includes functions like SEARCH_DOC_CHUNK, which converts query text into a vector embedding and then retrieves the top-K most relevant document chunks, making it easy to find related information based on the query's meaning. The engineers at Zilliz designed Zilliz Cloud Pipelines for busy GenAI developers: the pipeline takes unstructured data from various sources, splits it, converts it to embeddings, indexes it, and stores it in Zilliz Cloud with the designated metadata.
The Evolution and Future of AI and Its Influence on Vector Databases: Insights from Charles, CEO of Zilliz
Date published
April 15, 2024
Author(s)
By Charles Xie
Language
English
Word count
1604
Hacker News points
None found.
Charles Xie, CEO of Zilliz, discusses the evolution and future of AI and its influence on vector databases. He highlights how Zilliz developed Milvus, a vector database, before the advent of large language models (LLMs), emphasizing the importance of data management for unstructured data. The article also explores the transition from enterprise-centric to democratized AI, as well as the significance of vector databases in the age of Foundation Models and LLMs. Furthermore, it delves into the role of Milvus 3.0 in enhancing retrieval accuracy for RAG systems and how ChatGPT and vector databases complement each other in semantic search. Lastly, Xie shares his vision for Affordable General Intelligence within five years, aiming to make AI solutions accessible to all individuals and businesses.
Embedding Inference at Scale for RAG Applications with Ray Data and Milvus
Date published
April 12, 2024
Author(s)
By Christy Bergman, and Cheng Su
Language
English
Word count
1761
Hacker News points
None found.
This blog discusses the use of Retrieval Augmented Generation (RAG) applications with open-source tools such as Ray Data and Milvus. The author highlights the performance boost achieved using Ray Data during the embedding step, where data is transformed into vectors. Using just four workers on a Mac M2 laptop with 16GB RAM, Ray Data was found to be 60 times faster than Pandas. The blog also presents an open-source RAG stack that includes the BGE-M3 embedding model, Ray Data for fast, distributed embedding inference, and the Milvus or Zilliz Cloud vector database. The author provides a step-by-step guide on how to set up these tools and use them to generate embeddings from an IMDB movie poster dataset downloaded from Kaggle. Additionally, the blog discusses the benefits of using the bulk import features in Milvus and Zilliz Cloud for efficient batch loading of vector data into a vector database.
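The gist of the Ray Data pattern is a stateful callable fanned out with map_batches; a simplified sketch follows, where the input path is hypothetical and the concurrency/compute argument name varies across Ray versions.

```python
import ray
from sentence_transformers import SentenceTransformer

class Embedder:
    def __init__(self):
        # One model instance per worker process.
        self.model = SentenceTransformer("BAAI/bge-m3")

    def __call__(self, batch):
        batch["embedding"] = self.model.encode(list(batch["text"]))
        return batch

ds = ray.data.read_parquet("imdb_posters.parquet")  # hypothetical input
# map_batches distributes batches across workers for parallel inference.
ds = ds.map_batches(Embedder, concurrency=4, batch_size=128)
ds.write_parquet("embedded/")  # ready for bulk import into Milvus
```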
Monitoring Milvus with Grafana and Loki
Date published
April 11, 2024
Author(s)
By Stephen Batifol
Language
English
Word count
1333
Hacker News points
None found.
This guide provides step-by-step instructions on setting up Grafana and Loki to effectively monitor Milvus deployments. Milvus is a distributed vector database designed for storing, indexing, and managing massive embedding vectors. Grafana is an open-source platform for monitoring and observability, while Loki pairs with Grafana as a log aggregation system. Together, they offer a solid monitoring setup for Milvus and beyond. The prerequisites include Docker, Kubernetes, Helm, and kubectl. After setting up the K8s cluster, users can deploy Grafana and Loki using Helm. Finally, configure Grafana data sources and dashboard to visualize and query logs effectively.
The Cost of Open Source Vector Databases: An Engineer’s Guide to DIY Pricing
Date published
April 8, 2024
Author(s)
Steffi Li
Language
English
Word count
1764
Hacker News points
None found.
The cost of open source vector databases can be complex and challenging to quantify. Engineers often start projects using free software like Milvus, but hardware costs soon arise. Running a distributed database requires setting up dependencies such as Kafka or Pulsar for WAL, etcd for metadata storage, and Kubernetes for orchestration. Additionally, costs include load balancers, monitoring and logging tools, EC2 instances for worker nodes, and storage solutions like S3 or Azure Blob. Some aspects of running an open-source vector database are difficult to quantify, such as capacity planning, setup phase tasks, routine maintenance, troubleshooting latency issues, and disaster recovery plans. Other costs include time to market, engineering morale and retention, and risk mitigation. To assess costs in vector database management, performance tests should be conducted to gather data on how the database handles real-life workloads. Optimizing for cost involves adopting dynamic scaling, adjusting recall accuracy, latency, and throughput according to project needs, and using MMap to store less data in memory. The decision on how to manage a vector database ultimately depends on comparing costs and making an intelligent economic choice based on the most cost-effective option.
Redis tightens its license: How can an OSS company survive in the Cloud Era
Date published
April 5, 2024
Author(s)
James Luan
Language
English
Word count
1076
Hacker News points
None found.
Redis, an open-source database software, has transitioned from the BSD license to the Server Side Public License (SSPLv1), causing some controversy. This change may lead to multiple Linux distributors dropping Redis from their codebases, but alternative options like Valkey and Microsoft's Garnet are available. The shift in open-source licensing has been driven by cloud computing's impact on the traditional business model of open-source software companies. Some open-source projects have adopted more restrictive licenses to protect their profits, while others continue to offer permissive licenses and focus on commercial services. Companies like Zilliz are finding new ways to balance open-source and commercialization by offering unique capabilities in their managed services while maintaining compatibility with the open-source API.
The Evolution and Future of Vector Databases: Insights from Charles, CEO of Zilliz
Date published
April 4, 2024
Author(s)
Charles Xie
Language
English
Word count
1737
Hacker News points
1
Charles, CEO of Zilliz, discusses the evolution and future of vector databases in AI applications. He explains that vector databases are designed to manage and query unstructured data like images, videos, and natural languages through deep learning algorithms and semantic queries. They are widely used in recommendation systems, chatbots, and semantic search. The current landscape of vector databases includes purpose-built ones like Milvus, traditional databases with a vector search plugin like Elasticsearch, lightweight vector databases like Chroma, and more technologies with vector search capabilities like FAISS. Charles shares insights into building the Milvus vector database system, emphasizing its support for heterogeneous computing, both vertical and horizontal scalability, and offering a smooth developer experience from prototyping to production. He also provides guidance on choosing the right vector database for businesses based on performance requirements and projected data volume growth. Charles predicts that future vector databases will extend their capabilities beyond similarity-based search to include exact search or matching, as well as support additional vector computing workloads like clustering and classification.
Building a Tax Appeal RAG with Milvus, LlamaIndex, and GPT
Date published
April 3, 2024
Author(s)
Ash Naik
Language
English
Word count
794
Hacker News points
1
A group of four strangers, including a Product Manager, full-stack developers, and an AI enthusiast, came together during a monthly Hackathon in Seattle to build the SaveHaven project. The team developed a Retrieval Augmented Generation (RAG) app called SaveHaven that helps individuals contest property and income tax assessments by leveraging technologies like LlamaIndex, Milvus, and GPT from OpenAI. By automating data collection and analysis from public records, the app simplifies the tax appeal process for the general public. The team's experience serves as an example for future entrepreneurs to build meaningful innovations using GenAI technologies.
An LLM Powered Text to Image Prompt Generation with Milvus
Date published
April 2, 2024
Author(s)
Werner Oswald
Language
English
Word count
797
Hacker News points
None found.
The author discovered their love for open-source image-generating AI systems and started searching through webpages to find cool images and the prompts that produced them. Reusing those prompts for their own images worked, but took a lot of time. To speed up the process, they downloaded millions of prompts and put them into a Milvus vector database; the system could then fetch similar results from simple prompts entered into a UI. Users found that the system produced better results than their regular prompts had. The author chose Milvus for performance reasons, as it was five times faster than pgvector with almost the same code. They also added instructions telling the LLM that it was a prompt engineer and provided example conversation history to get it producing high-quality images. The next step is to add the same function for negative prompts, which also play an important role in steering image generation.
JSON and Metadata Filtering in Milvus
Date published
March 26, 2024
Author(s)
Christy Bergman
Language
English
Word count
1140
Hacker News points
None found.
JSON, or JavaScript Object Notation, is a flexible data format used for storage and transmission. It employs key-value pairs adaptively, making it ideal for NoSQL databases and API results. Milvus Client, a wrapper around the Milvus collection object, uses a flexible JSON "key": value format to allow schema-less data definitions, which is faster and less error-prone than defining a full schema upfront. The schema-less approach fixes only the id and vector fields; the remaining fields are determined flexibly when data is inserted into Milvus. JSON data can be uploaded directly into Milvus, which also supports metadata filtering on JSON fields and JSON array data types.
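A minimal sketch with the MilvusClient quick-setup path, which in recent pymilvus releases enables dynamic fields so rows can carry arbitrary extra JSON keys; the collection and field names are illustrative.

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")
# Quick setup: only id and vector are fixed; extra keys land in a
# dynamic JSON field (enabled by default in recent pymilvus versions).
client.create_collection("articles", dimension=768)

client.insert("articles", data=[
    {"id": 1, "vector": [0.1] * 768, "topic": "news", "tags": ["ai", "db"]},
])

# Metadata filtering over the dynamic JSON keys at search time.
hits = client.search("articles", data=[[0.1] * 768], limit=3,
                     filter='topic == "news"', output_fields=["topic", "tags"])
print(hits)
```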
Community and Open Source Contributions in Vector Databases
Date published
March 26, 2024
Author(s)
Stephen Batifol
Language
English
Word count
1077
Hacker News points
None found.
Vector databases, designed to store high-dimensional data points, are particularly useful in handling unstructured data such as image recognition, natural language processing, and recommendation systems. The open-source nature of many vector database projects allows for diverse contributions from various individuals and organizations, fostering innovation and transparency. Open source also promotes accessibility, enabling a wider range of projects and innovations. Community collaboration is crucial in the development of vector databases, with knowledge sharing and inclusive participation playing significant roles. Resources such as well-maintained documentation, chat channels, hackathons, meetups, and conferences contribute to fostering a sense of community and driving innovation. Contributing to open-source vector databases involves finding contribution opportunities, engaging with the community, and understanding that contributions are not limited to coding. Success stories include improvements in scalability, performance, usability, and accessibility of vector databases due to open-source contributions and active community engagement. Challenges faced by these projects include managing a high volume and variety of contributions and balancing diverse interests and visions. However, with robust systems for tracking, reviewing, and integrating contributions, as well as transparent decision-making processes, these challenges can be addressed effectively. In conclusion, the open-source model has proven to be a driving force in advancing vector databases, breaking down barriers, and democratizing access to cutting-edge technology. The diverse community of contributors ensures that these tools are continually improving in terms of robustness, efficiency, and versatility.
Milvus 2.4 Unveils CAGRA: Elevating Vector Search with Next-Gen GPU Indexing
Date published
March 20, 2024
Author(s)
Li Liu
Language
English
Word count
1587
Hacker News points
4
Milvus 2.4 introduces CAGRA (CUDA ANN GRAph-based), a GPU-based graph index that significantly enhances vector search performance. Leveraging the parallel capabilities of GPUs, CAGRA offers improved efficiency in both small and large batch queries compared to traditional methods like HNSW. Additionally, CAGRA accelerates index building by approximately 10 times. The integration of CAGRA into Milvus marks a significant milestone in overcoming challenges associated with GPU-based vector search algorithms and sets the stage for future advancements in high recall, low latency, cost efficiency, and scalability in vector search.
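Configuring the index from pymilvus looks much like any other Milvus index type. The sketch below assumes a GPU-enabled Milvus 2.4 deployment and an existing pymilvus Collection object named `collection`; the parameter values are illustrative, not tuning advice.

```python
# Build a GPU_CAGRA index on an existing pymilvus Collection object.
index_params = {
    "index_type": "GPU_CAGRA",
    "metric_type": "L2",
    "params": {
        "intermediate_graph_degree": 64,  # degree of the graph built during construction
        "graph_degree": 32,               # degree of the final, pruned graph
    },
}
collection.create_index(field_name="embedding", index_params=index_params)
```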
What’s New in Milvus 2.4.0?
Date published
March 20, 2024
Author(s)
Steffi Li
Language
English
Word count
629
Hacker News points
None found.
Milvus 2.4, a significant update in search capabilities for large datasets, has been released. This version accelerates search efficiency and broadens the horizons towards a unified search platform capable of fulfilling diverse search use cases with exceptional speed and precision. Key highlights include support for NVIDIA's CAGRA Index, Multi-vector Search, Grouping Search, beta support for sparse vector embeddings, and other key enhancements. These updates significantly boost Milvus's performance and versatility for complex data operations.
Build Real-time GenAI Applications with Zilliz Cloud and Confluent Cloud for Apache Flink®
Date published
March 19, 2024
Author(s)
Jiang Chen
Language
English
Word count
762
Hacker News points
None found.
Zilliz Cloud has partnered with Confluent to unlock semantic search for real-time updates powered by Apache Kafka, Apache Flink, and the Milvus vector database. The new cloud-native, serverless Apache Flink service is now available directly alongside cloud-native Apache Kafka on Confluent's fully managed data streaming platform. This integration enables users to easily build high-quality, reusable data streams for real-time GenAI applications. By leveraging Kafka and Flink as a unified platform, teams can connect to data sources across any environment, clean and enrich data streams on the fly, and deliver them in real-time to the Milvus vector database for efficient semantic search or recommendation.
Using Similarity Search - How Not to Lose Meetup Content on the Internet
Date published
March 19, 2024
Author(s)
Stephen Batifol
Language
English
Word count
1207
Hacker News points
None found.
The author discusses the problem of losing valuable content from Meetup events and how similarity search techniques can be used to address this issue. They introduce Milvus, an open-source vector database that excels in managing complex data landscapes, and SentenceTransformers, a Python framework for generating text embeddings. The author demonstrates how to use these tools to create a system that searches for similar content within Meetup descriptions. By using OpenAI GPT-3.5-turbo to summarize the content of Meetups, they aim to improve search results by reducing noise in event descriptions.
RAG Evaluation Using Ragas
Date published
March 18, 2024
Author(s)
Christy Bergman
Language
English
Word count
1018
Hacker News points
2
Retrieval Augmented Generation (RAG) is an approach to building AI-powered chatbots that answer questions grounded in data the model was not necessarily trained on. However, natural language retrieval accuracy remains low, so RAG parameters need to be tuned experimentally before deployment. Large Language Models (LLMs) are increasingly being used as judges for modern RAG evaluation, automating and speeding up evaluation while offering scalability and saving the time and cost of manual human labeling. Two primary flavors of LLM-as-judge for RAG evaluation are MT-Bench and Ragas, with the latter emphasizing automation and scalability. The key data points needed for a Ragas evaluation are the question, the retrieved contexts, the generated answer, and the ground truth answer.
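A minimal Ragas sketch over those four data points; column names shift slightly between Ragas releases, and the LLM judge reads an OpenAI key from the environment by default.

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, faithfulness

# The four data points Ragas needs, wrapped in a HuggingFace Dataset.
data = Dataset.from_dict({
    "question": ["What is Milvus?"],
    "contexts": [["Milvus is an open-source vector database."]],
    "answer": ["Milvus is an open-source vector database."],
    "ground_truth": ["Milvus is an open-source vector database."],
})

# Each metric is scored by an LLM judge rather than a human labeler.
scores = evaluate(data, metrics=[faithfulness, answer_relevancy, context_precision])
print(scores)
```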
Building an AI-driven Car Repair Assistant with Milvus and the OpenAI LLM
Date published
March 13, 2024
Author(s)
Lin Liu
Language
English
Word count
467
Hacker News points
None found.
The AI-driven car repair assistant project aims to create an interactive and reliable platform for drivers seeking automotive advice and solutions. Using the OpenAI LLM model with Milvus vector database, the product combines user inputs with AI capabilities to provide relevant diagnostic suggestions. This tool hopes to revolutionize car maintenance and repair in today's digital world by refining the process of identifying car issues and broadening access to expert advice.
Zilliz Cloud Now Available on Azure Marketplace
Date published
March 11, 2024
Author(s)
Steffi Li
Language
English
Word count
414
Hacker News points
None found.
Zilliz Cloud is now available on Azure Marketplace, following its successful integration into AWS and GCP marketplaces. This expansion simplifies subscription management and billing, allowing smoother integration into developers' existing Azure workflows. Getting started with Zilliz Cloud on Azure Marketplace involves searching for "Zilliz Cloud," subscribing, configuring the project and SaaS details, linking the Azure Marketplace subscription with a Zilliz Cloud account, and setting Azure Marketplace as the payment method. This integration enables developers to easily incorporate Zilliz Cloud's powerful capabilities into their AI projects.
Building an AI Agent for RAG with Milvus and LlamaIndex
Date published
March 11, 2024
Author(s)
Yujian Tang
Language
English
Word count
1380
Hacker News points
None found.
In 2023, large language models (LLMs) gained immense popularity, leading to the development of two main types of LLM applications: retrieval augmented generation (RAG) and AI agents. RAG involves using a vector database like Milvus to inject contextual data, while AI Agents use LLMs to utilize other tools. This article combines these two concepts by building an AI Agent for RAG using Milvus and LlamaIndex. The tech stack includes Milvus, LlamaIndex, and OpenAI (or alternatively OctoAI or HuggingFace). The process involves spinning up Milvus, loading data into it via LlamaIndex, creating query engine tools for the AI Agent, and finally building the AI Agent for RAG. This architecture allows an AI Agent to perform RAG on documents by providing it with the necessary tools for querying a vector database.
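A condensed sketch of that architecture with llama-index 0.10-style imports; `documents` is assumed to be loaded already, and the 1536 dimension matches OpenAI's default embeddings.

```python
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.vector_stores.milvus import MilvusVectorStore

# Load data into Milvus via LlamaIndex (documents assumed already loaded).
vector_store = MilvusVectorStore(uri="http://localhost:19530", dim=1536)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# Wrap the query engine as a tool the agent can choose to invoke.
tool = QueryEngineTool(
    query_engine=index.as_query_engine(),
    metadata=ToolMetadata(name="docs", description="Answers questions about the docs."),
)
agent = ReActAgent.from_tools([tool], verbose=True)
print(agent.chat("What do the docs say about vector databases?"))
```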
Stephen Batifol - Why I Joined Zilliz
Date published
March 6, 2024
Author(s)
Stephen Batifol
Language
English
Word count
371
Hacker News points
None found.
Stephen Batifol, Developer Advocate at Zilliz in Berlin, is organizing events and creating content to help people understand and use Milvus. With experience as an Android developer, data scientist, machine learning engineer, and now a developer advocate, he has always aimed to simplify the work of data scientists and software engineers. His interest in open-source projects led him to Zilliz, where he is excited to build a community from scratch and engage with people at various events. Batifol plans to immerse himself technically in the domain, start a new Meetup series in Berlin, and release open-source projects soon. He encourages interested candidates to join Zilliz as Developer Advocates across different regions.
Will Retrieval Augmented Generation (RAG) Be Killed by Long-Context LLMs?
Date published
March 5, 2024
Author(s)
James Luan
Language
English
Word count
1858
Hacker News points
38
Google's Gemini 1.5, an LLM capable of handling contexts up to 10 million tokens, and OpenAI's Sora, a text-to-video model, have sparked discussions about the future of AI, particularly the role and potential demise of Retrieval Augmented Generation (RAG). Gemini 1.5 Pro supports ultra-long contexts of up to 10 million tokens and multimodal data processing. In a "needle-in-a-haystack" evaluation, Gemini 1.5 Pro achieves 100% recall from up to 530,000 tokens and maintains over 99.7% recall from up to 1M tokens. Even with a super-long document of 10M tokens, the model retains an impressive 99.2% recall rate. While Gemini excels at managing extended contexts, it grapples with persistent challenges encapsulated as the 4Vs: Velocity, Value, Volume, and Variety. These 4Vs challenges include hurdles in achieving sub-second response times over extensive contexts, considerable inference costs for generating high-quality answers in long contexts, the vastness of unstructured data that may not be adequately captured by an LLM, and the diverse range of structured data. Strategies for optimizing RAG effectiveness include enhancing long-context understanding, utilizing hybrid search for improved search quality, and leveraging advanced technologies to enhance RAG's performance. The RAG framework is still a linchpin for the sustained success of AI applications. Its provision of long-term memory for LLMs proves indispensable for developers seeking an optimal balance between query quality and cost-effectiveness.
Using Your Vector Database as a JSON (or Relational) Datastore
Date published
March 4, 2024
Author(s)
Frank Liu
Language
English
Word count
1436
Hacker News points
43
The blog post discusses the use of vector databases, such as Milvus or Zilliz Cloud, as a JSON (or relational) datastore. It explains how to create a collection in Milvus and perform CRUD operations on JSON data stored within it. The author demonstrates querying, updating, and deleting records using Python code snippets. Additionally, the post introduces a package called milvusmongo that implements basic CRUD functionality across collections using Milvus as the underlying database instead of MongoDB. The author emphasizes that vector databases are not meant to replace NoSQL databases or lexical text search engines but can be used as an efficient data store for solo developers and small teams, with the option to optimize infrastructure usage later as they grow.
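A sketch of the update and delete halves of that CRUD workflow with MilvusClient, against a hypothetical "articles" collection; method names follow pymilvus 2.4 and may differ in older releases.

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")

# Update-as-upsert: replace the row whose primary key is 1.
client.upsert("articles", data=[
    {"id": 1, "vector": [0.2] * 768, "status": "edited"},
])

# Read back with a scalar filter instead of a vector search.
rows = client.query("articles", filter="id == 1", output_fields=["status"])
print(rows)

# Delete by primary-key expression.
client.delete("articles", filter="id == 1")
```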
Zilliz Cloud Introduces BYOC for Greater Data Sovereignty and Compliance
Date published
Feb. 29, 2024
Author(s)
Steffi Li
Language
English
Word count
820
Hacker News points
None found.
Zilliz has introduced Zilliz Cloud Bring Your Own Cloud (BYOC) to provide greater data sovereignty and compliance. This solution allows customers to use Zilliz Cloud's managed services while keeping their data within their private network. The architecture of Zilliz Cloud BYOC is built on two main pillars: the Data Plane, which encompasses all essential components for data collection, management, and query processing; and the Control Plane, responsible for deployment, management, and seamless coordination across all instances of the Zilliz Data Plane. The new deployment model in BYOC lets customers deploy the Data Plane within their own Virtual Private Cloud (VPC) while the Control Plane remains managed by Zilliz. This setup offers benefits such as data security and compliance, fine-grained control, and cost savings. Security measures include adherence to the Principle of Least Privilege, controlled access for software updates, and data plane access restrictions. BYOC is currently available on AWS with plans to expand to other cloud providers in the future.
How Sohu Enhances Personalized News Recommendation with Milvus
Date published
Feb. 28, 2024
Author(s)
Fendy Feng
Language
English
Word count
650
Hacker News points
None found.
Sohu, a NASDAQ-listed company, partnered with Milvus to enhance its news recommendation system. The outdated legacy vector search stack in the recommender system was struggling to deliver real-time, personalized news due to slow retrieval and scalability issues. Milvus, an open-source vector database, provided a solution for handling large datasets and improving classification accuracy of short-text news articles. Sohu News integrated Milvus into its recommender system using a dual-tower structure and achieved a 10x faster vector retrieval speed and significantly improved recommendation accuracy. The collaboration with Milvus has transformed the user experience by offering more personalized and engaging news content.
Finding the Right Fit: Automatic Embeddings Support for AI Retrieval (RAG) in Zilliz Cloud Pipelines from OSS, VoyageAI, and OpenAI
Date published
Feb. 27, 2024
Author(s)
Christy Bergman
Language
English
Word count
1579
Hacker News points
None found.
This blog post discusses the use of embedding models in Retrieval Augmented Generation (RAG) applications. RAG is an approach used to enhance question-answering bots by integrating domain knowledge into AI's knowledge base. The process involves using embedding models to generate vector embeddings of chunks of text from all documents, followed by indexing and search using the same embedding model. Finally, a large language model (LLM) generates an answer based on the given domain knowledge. The most common type of embedding model is SBERT (Sentence-BERT), which specializes in understanding complete sentences. The HuggingFace MTEB Leaderboard provides a list of embedding models sorted by retrieval performance, making it easier for developers to choose the best model for their needs. Zilliz Cloud Pipelines support various embedding models, including BAAI/bge-base-en(or zh)-v1.5, VoyageAI's voyage-2 and voyage-code-2, and OpenAI's text-embedding-3-small(or large). Each model has its advantages and is best suited for different use cases. In conclusion, embedding models play a crucial role in enhancing AI retrieval capabilities by integrating domain knowledge into the AI's knowledge base. The choice of an appropriate embedding model depends on factors such as context length, embedding dimensions, and specific use case requirements.
Building RAG Apps Without OpenAI - Part Two: Mixtral, Milvus and OctoAI
Date published
Feb. 26, 2024
Author(s)
Yujian Tang
Language
English
Word count
2044
Hacker News points
None found.
This blog discusses building Retrieval Augmented Generation (RAG) applications without using OpenAI's GPT models. The author demonstrates how to build RAG apps with Mixtral, Milvus, and OctoAI, and provides an overview of the tools involved: Mixtral as the LLM, Milvus as the vector database, OctoAI for serving the LLM and the embedding model, and LangChain as the orchestrator. The tutorial covers setting up the RAG tools, loading data into a vector database, querying data with OctoAI and Mixtral, and leveraging Mixtral's multilingual capabilities.
Exploring Multimodal Embeddings with FiftyOne and Milvus
Date published
Feb. 23, 2024
Author(s)
Yujian Tang
Language
English
Word count
1514
Hacker News points
None found.
This tutorial explores the concept of multimodal embeddings using open-source tools like Voxel51 and Milvus. It covers the meaning of "multimodal", how Milvus handles multimodal embeddings, examples of multimodal models, and how to use FiftyOne and Milvus for multimodal embedding exploration. The tutorial uses Fashion MNIST dataset with CLIP-VIT model from OpenAI to demonstrate the process. It also discusses how to further customize FiftyOne for data exploration with Milvus and provides a summary of exploring multimodal embeddings with Voxel51 and Milvus.
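The FiftyOne side of the workflow reduces to a couple of brain calls. A sketch, assuming Milvus is configured as FiftyOne's similarity backend; the zoo dataset and model identifiers are real FiftyOne names, but behavior may differ between versions.

```python
import fiftyone.brain as fob
import fiftyone.zoo as foz

# Load a small sample dataset and index it in Milvus via the FiftyOne brain.
dataset = foz.load_zoo_dataset("quickstart")
fob.compute_similarity(
    dataset,
    model="clip-vit-base32-torch",  # CLIP model from the FiftyOne model zoo
    backend="milvus",
    brain_key="milvus_index",
)

# Because CLIP is multimodal, a text prompt can rank the images.
view = dataset.sort_by_similarity("a red car", brain_key="milvus_index", k=25)
```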
Building Zilliz Cloud in 18 months: Lessons learned while creating a scalable Vector Search Service on the public cloud
Date published
Feb. 16, 2024
Author(s)
James Luan
Language
English
Word count
3009
Hacker News points
2
The article details the creation of Zilliz Cloud, a fully managed service powered by Milvus, the most adopted open-source vector database, developed from the ground up over eighteen months. It covers the design choices and invaluable insights gained during the journey to build this cloud service. The author emphasizes maximizing the use of mature third-party products, simplifying architecture, anticipating day 2 challenges from day 1, and focusing on cloud finops as key principles for building a successful cloud service. They also discuss the lessons learned while creating a scalable Vector Search Service on the public cloud and acknowledge the support of their users in this endeavor.
TL;DR Milvus regression in LangChain v0.1.5
Date published
Feb. 12, 2024
Author(s)
Christy Bergman
Language
English
Word count
577
Hacker News points
None found.
A recent regression in LangChain v0.1.5 causes a "KeyError: 'pk'" error when connecting to Milvus, because no automatically generated primary key field is present. The temporary workaround is to downgrade to LangChain version <= v0.1.4; a permanent fix that handles insertions where "pk" is absent will land in an upcoming release.
Zilliz Cloud Pipelines February Release - 3rd Party Embedding Models and Usability Improvements!
Date published
Feb. 9, 2024
Author(s)
Jiang Chen
Language
English
Word count
857
Hacker News points
None found.
Zilliz has released an update to its Cloud Pipelines, focusing on embedding models and usability improvements. The February release includes new 3rd-party embedding models from OpenAI and Voyage AI, providing a total of six options for users. Additionally, the platform now supports all dedicated vector db clusters in GCP's us-west-2 region, enhancing performance and reliability. Usability improvements include a new "Run Pipeline" page, local file upload feature, and support for running pipelines on any type of vector database cluster.
Zilliz Joins the AI Alliance: Advancing Open Innovation in AI for a Better Future
Date published
Feb. 8, 2024
Author(s)
Charles Xie
Language
English
Word count
393
Hacker News points
None found.
Zilliz has joined the AI Alliance, a consortium promoting open innovation in AI for responsible development and safe practices. The company's journey into open source began with its vector database Milvus, which was donated to the Linux Foundation. Open-source projects foster transparency, collaboration, innovation, and accessibility. Zilliz is committed to working alongside other industry players within the AI Alliance to shape a future where AI benefits everyone and positively impacts society.
Introducing the Databricks Connector, a Well-Lit Solution to Streamline Unstructured Data Migration and Transformation
Date published
Feb. 8, 2024
Author(s)
Jiang Chen
Language
English
Word count
1107
Hacker News points
None found.
The Databricks Connector enables developers to effortlessly transfer data from Spark/Databricks to Milvus/Zilliz Cloud, whether in real-time or batch mode, so they can streamline their workflows and focus on building efficient, scalable AI solutions. The integration approach connects Spark to Milvus through a shared filesystem such as S3 or MinIO buckets: by granting access to Spark or Databricks, a Spark job can use the Milvus connectors to write data to the bucket in batch and then bulk-insert the entire collection for serving. To help developers get started quickly, we have prepared a notebook example that walks through the streaming and batch data transfer processes with Milvus and Zilliz Cloud. For more information on this integration and its use cases, check out the official documentation for the Databricks Connector for Apache Arrow.
The High-performance Vector Database Zilliz Cloud Now Available on Google Cloud Marketplace
Date published
Feb. 7, 2024
Author(s)
Steffi Li
Language
English
Word count
623
Hacker News points
None found.
Zilliz Cloud, a fully managed Milvus vector database that supports various AI applications, is now available on Google Cloud Marketplace. This integration simplifies billing, as charges appear directly on the developer's regular Google Cloud bill. Users can easily subscribe to the Zilliz service using their existing GCP account and access all features without upfront costs. A 100-credit bonus is available for new sign-ups, enabling developers to kickstart their journey with Zilliz Cloud on GCP.
Crafting Superior RAG for Code-Intensive Texts with Zilliz Cloud Pipelines and Voyage AI
Date published
Feb. 7, 2024
Author(s)
Jiang Chen
Language
English
Word count
694
Hacker News points
None found.
Zilliz Cloud Pipelines has integrated the Voyage AI embedding models voyage-2 and voyage-code-2, which have shown outstanding performance in retrieval tasks related to source code and technical documentation as well as general tasks. Incorporating these models strengthens RAG systems built for code-related tasks. Notably, when compared to other popular embedding models on code datasets, Voyage's models demonstrate significantly better retrieval capability and lead to improvements of over ten percentage points in Answer Correctness and overall performance scores.
Zilliz Cloud Enhances Data Protection with More Granular RBAC
Date published
Feb. 6, 2024
Author(s)
Sarah Tang
Language
English
Word count
1157
Hacker News points
None found.
Zilliz Cloud has introduced enhanced Role-Based Access Control (RBAC) functionality, providing more nuanced RBAC capabilities to improve access management and data isolation. The updated system features two primary categories of roles - Operation Layer Roles and Data Layer Roles - catering to diverse developer requirements. In the operational layer, Zilliz Cloud has four predefined Organization and Project Roles: Organization Owner, Organization Member, Project Owner, and Project Member. Additionally, it offers three predefined Cluster Roles in the data layer: Admin, Read-Write, and Read-Only. Users can also create custom roles to fine-tune permissions for specific collections or operations. The enhanced RBAC capabilities are exemplified through real-world use cases such as cross-team collaboration in a medium-sized company and managing a RAG-based knowledge base. These features ensure effective data management, improved security, and efficient resource allocation.
Choosing a Vector Database: Milvus vs. Chroma
Date published
Feb. 5, 2024
Author(s)
Fendy Feng
Language
English
Word count
1401
Hacker News points
None found.
In this comparison, we delve into the functionalities and performance of two open-source vector databases: Milvus and Chroma. We assess these platforms based on their capabilities in handling vector data storage, indexing, searching, scalability, and ecosystem support. Additionally, we examine the purpose-built features and performance trade-offs between Milvus and Chroma. Milvus is a versatile and comprehensive open-source vector database, offering extensive support for various index types, including 11 different options. It supports hybrid search operations and offers flexible in-memory and on-disk indexing configurations. Furthermore, Milvus ensures strong consistency and provides multi-language SDKs encompassing Python, Java, JavaScript, Go, C++, Node.js, and Ruby. On the other hand, Chroma is a relatively simpler vector database with a primary focus on enabling easy initiation and usage. It currently supports only the HNSW algorithm for its KNN search operations and lacks advanced features such as RBAC support. Additionally, it offers limited SDK options, primarily focusing on Python and JavaScript. While Chroma's simplicity may be adequate for specific applications, its limitations could restrict its adaptability across diverse use cases. With its comprehensive functionality and extensive feature set, Milvus emerges as a more versatile and scalable solution for addressing a broader spectrum of vector data management needs. In the upcoming Milvus 2.4 release, we plan to support the inverted index with tantivy, which promises significant enhancements to prefiltering speed. This update further solidifies Milvus as a cutting-edge open-source vector database that continues to evolve and adapt to emerging requirements in the AI ecosystem. In summary, while Chroma offers simplicity and ease of use, Milvus distinguishes itself with its comprehensive feature set, extensive index type support, and robust multi-language SDKs. As a result, Milvus remains a highly recommended open-source vector database for developers and organizations seeking to optimize their applications' performance, scalability, and data management capabilities. Milvus Lite, a lightweight alternative to the full Milvus version, has also been introduced. It aims to preserve the ease of initiation while retaining an extensive set of features, making it particularly useful for specific use cases such as integration into Python applications without adding extra weight or spinning up a Milvus instance in Colab or Notebook for quick experiments.
An Introduction to Milvus Architecture
Date published
Feb. 2, 2024
Author(s)
Yujian Tang
Language
English
Word count
1420
Hacker News points
None found.
Milvus is a distributed system designed to scale vector operations, addressing the challenges of scalability in vector databases. Unlike traditional databases, vector data doesn't require complex transactions and has diverse use cases that necessitate tunable tradeoffs between performance and consistency. Some vector data operations are computationally expensive, requiring elastic resource allocation. Milvus achieves horizontal scaling through its deliberate design as a distributed system, overcoming the limitations of single-instance databases. It contains four layers: access, coordination, worker, and storage. The separation of concerns in querying, data ingestion, and indexing allows for independent scaling of each operation. Milvus ensures large-scale write consistency through sharding and supports pre-filtering metadata search to enhance efficiency. Its unique architecture provides benefits such as horizontal scaling and flexibility, making it a suitable choice for cloud-native vector databases catering to diverse use cases.
Introducing Cardinal: The Most Performant Engine For Vector Searches
Date published
Feb. 1, 2024
Author(s)
Alexandr Guzhva
Language
English
Word count
1665
Hacker News points
None found.
Cardinal is a new vector search engine developed by Zilliz that delivers a threefold performance increase over its previous version and search throughput (QPS) up to ten times that of open-source Milvus. Cardinal can perform brute-force search, create and modify ANNS indices, work with various input data formats, and filter results during the search based on user-provided criteria. The key to Cardinal's speed lies in its algorithmic, engineering, and low-level optimizations, along with an AutoIndex feature for search strategy selection.
Nurturing Innovation: Our Approach to Feature Deployment from Open-Source Milvus to Zilliz Cloud
Date published
Jan. 30, 2024
Author(s)
James Luan
Language
English
Word count
580
Hacker News points
None found.
James Luan, VP of Engineering at Zilliz, discusses the company's commitment to innovation and community collaboration through open-source projects like Milvus. The four essential freedoms of open source, as emphasized by Richard Stallman, guide their approach to feature deployment from Milvus to Zilliz Cloud. They follow three fundamental principles: iteration with precision, testing the waters, and quality over speed. Despite occasional delays in feature deployment, they prioritize maintaining a robust and reliable platform while encouraging community feedback for continuous improvement.
The Best Vector Database Just Got Better
Date published
Jan. 30, 2024
Author(s)
Frank Liu
Language
English
Word count
1034
Hacker News points
None found.
In 2023, vector databases gained popularity due to the widespread adoption of ChatGPT and other large language models (LLMs). Zilliz Cloud, a vector database service, has seen increased usage in retrieval-augmented generation systems as well as various search and retrieval applications. The platform aims to help computers understand human-generated data such as text, images, bank transactions, and user behaviors. Zilliz Cloud recently introduced new features like range search, multi-tenancy & RBAC, up to 10x improved search & indexing performance, and more in response to customer demand. These enhancements have proven critical for users developing applications that require a purpose-built vector database supporting essential database features and various workloads. Three real-world use cases demonstrate the importance of these new features: efficient autonomous agents, product recommendation systems, and AI-powered drug discovery. In each case, Zilliz Cloud's performance optimizations, adaptability, and range search feature have enabled users to overcome challenges in their respective applications. The platform's ability to handle diverse data types and workloads makes it a valuable tool for developers working with vector databases.
New for Zilliz Cloud: Cardinal Search Engine, GCP Marketplace, Databricks Connector and More
Date published
Jan. 30, 2024
Author(s)
Steffi Li
Language
English
Word count
1575
Hacker News points
1
Zilliz has introduced new features to its cloud product, enhancing vector search performance and ensuring enterprise-grade security. The latest updates include the Cardinal Search Engine, which delivers a 10x performance boost; Milvus 2.3, offering advanced vector search capabilities for production workloads; GCP Marketplace integration, simplifying budget planning, payment, and procurement processes; and the Databricks Connector, enabling data migration and transformation without custom code. Additionally, Zilliz Cloud now supports role-based access control (RBAC) across both control and data layers for enhanced security and compliance.
Sharding, Partitioning, and Segments - Getting the Most From Your Database
Date published
Jan. 29, 2024
Author(s)
Christy Bergman
Language
English
Word count
1219
Hacker News points
None found.
This blog delves into the concepts of sharding, partitioning, and segments in distributed databases like Milvus. Sharding refers to horizontal data partitioning across multiple servers, enabling faster writes by leveraging a distributed system. Partitioning organizes data for efficient retrieval, optimizing targeted reads. Automatic partitioning is recommended because it minimizes errors and ensures consistent performance. Each shard and partition is made up of segments, the smallest unit Milvus uses for load balancing; a segment is either growing (still accepting data) or sealed. The default segment size is 512 MB, and it should only be adjusted when ample machine resources are available.
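For readers who want to see where these knobs surface in code, here is a minimal pymilvus sketch; the collection name, shard count, and partition name are illustrative, not taken from the post:

```python
from pymilvus import (
    connections, Collection, CollectionSchema, FieldSchema, DataType
)

connections.connect(host="localhost", port="19530")

# Minimal schema: an auto-generated primary key plus a vector field.
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128),
]
schema = CollectionSchema(fields)

# shards_num spreads incoming writes across shards for parallel ingestion.
collection = Collection("demo_collection", schema, shards_num=2)

# Partitions group data for targeted reads; a search can be limited to
# one partition so fewer segments are scanned.
collection.create_partition("2023_uploads")
```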
Zilliz Vector Search Algorithm Dominates All Four Tracks of BigANN
The BigANN challenge is an important competition in the vector search domain, fostering the development of indexing data structures and search algorithms. Zilliz's solution dominated all four tracks of BigANN 2023, achieving up to a 2.5x performance improvement. This year's BigANN introduced more significant challenges, with larger datasets and complex scenarios across four tracks: filtered, out-of-distribution, sparse, and streaming variants of ANNS. Zilliz's solution is based on graph algorithms and optimizations driven by the specific characteristics of each track. The company plans to integrate these insights into its products, extending their impact to a broader range of problems.
Building RAG Apps Without OpenAI - Part One
Date published
Jan. 17, 2024
Author(s)
Yujian Tang
Language
English
Word count
1615
Hacker News points
None found.
This post discusses the creation of a conversational Retrieval Augmented Generation (RAG) application without using OpenAI. The tech stack includes LangChain, Milvus, and Hugging Face for embedding models. The process involves setting up the conversational RAG stack, creating a conversation, asking questions, and testing the app's memory retention. The example demonstrates how to use Nebula, a conversational LLM created by Symbl AI, in place of OpenAI's GPT-3.5.
How Mozat's Stylepedia and Milvus Are Redefining Your Closet
Date published
Jan. 16, 2024
Author(s)
Fendy Feng
Language
English
Word count
564
Hacker News points
None found.
Singapore-based tech company Mozat has developed an innovative wardrobe management approach with its app, Stylepedia. The app is designed to redefine how users engage with fashion by integrating Milvus, an open-source vector database, to power its smart image search system. This integration allows Stylepedia to manage a rapidly growing database of clothing images, respond to user queries in milliseconds, and handle user-uploaded photos with varying resolutions. By leveraging Milvus, Stylepedia offers personalized style recommendations, facilitates user connections, and enables image searches for similar clothing items.
What’s New in Milvus 2.3.4
Date published
Jan. 15, 2024
Author(s)
Steffi Li
Language
English
Word count
476
Hacker News points
None found.
Milvus 2.3.4, the latest update of the vector database platform, introduces enhancements to improve availability and usability. The release focuses on streamlining monitoring, data import, and search efficiency. Key highlights include access logs for improved system performance insights, Parquet file support for efficient large-scale data operations, Binlog index on growing segments for faster search within expanding datasets, and other improvements such as increased collection/partition support, enhanced memory efficiency, clearer error messaging, faster data loading speeds, and better query shard balance. Developers are encouraged to visit the release notes for a comprehensive overview of all new features and enhancements in Milvus 2.3.4.
Understanding Consistency Models for Vector Databases
Date published
Jan. 11, 2024
Author(s)
Yujian Tang
Language
English
Word count
1479
Hacker News points
None found.
Distributed systems are crucial for vector search applications, offering scalability, fault tolerance, enhanced performance, and global accessibility. Consistency is a key principle in distributed systems, ensuring that data remains accurate across all replicas. The fully distributed Milvus vector database offers Tunable Consistency through its unique architecture, allowing users to scale out data writing while maintaining consistency without additional tools. Consistency levels in Milvus include Eventual, Session, Bounded, and Strong. Eventual consistency ensures that data will eventually be consistent across all replicas, prioritizing speed over immediate data updates. Session consistency maintains up-to-date data within a single session, while Bounded Consistency forces instances and replicas to sync within a certain period. Strong consistency ensures immediate data availability but comes with increased latency. Understanding the levels of consistency is essential for building resilient, high-performing applications that utilize distributed systems.
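As a concrete illustration, the consistency level in Milvus can be chosen per request through the pymilvus client. The sketch below assumes an existing 128-dimensional collection with illustrative names; it is not code from the post:

```python
from pymilvus import Collection

collection = Collection("demo_collection")  # assumes this collection exists
collection.load()

results = collection.search(
    data=[[0.1] * 128],                     # one 128-dimensional query vector
    anns_field="embedding",
    param={"metric_type": "L2", "params": {"nprobe": 10}},
    limit=5,
    # Per-request consistency: "Strong", "Bounded", "Session", or "Eventually".
    consistency_level="Bounded",
)
print(results[0].distances)
```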
Dissecting OpenAI's Built-in Retrieval: Unveiling Storage Constraints, Performance Gaps, and Cost Concerns
Date published
Jan. 9, 2024
Author(s)
Robert Guo
Language
English
Word count
2284
Hacker News points
None found.
OpenAI's built-in retrieval feature has storage constraints, performance gaps, and cost concerns. The current pricing model of $0.2 per GB per day is expensive compared to traditional document services like Office 365 and Google Workspace. However, the server cost for serving vectors is only about $0.30 per day, which is a bargain compared to the pricing. The architecture of OpenAI Assistants' retrieval feature has limitations such as a maximum of 20 files per assistant, a cap of 512MB per file, and a hidden limitation of 2 million tokens per file. The current architecture may not scale well enough to support larger businesses with more extensive data requirements. To address these challenges and reduce costs, the service's architecture needs to be optimized. A refined vector database solution, hybrid disk/memory vector storage, streamlining disaster recovery by pooling system data, and multi-tenancy support for diverse user base are suggested improvements. Among popular vector databases, Milvus is considered the most mature open-source option with effective separation of system and query components, isolation of query components through Resource Group feature, hybrid memory/disk architecture, and application-level multi-tenancy facilitated by RBAC and Partition features. However, no single vector database solution can comprehensively address all challenges and meet every design requirement for imminent infrastructure development. The choice of vector databases should be tailored to specific requirements to effectively navigate the complexities of optimizing OpenAI Assistants' architecture.
OpenAI RAG vs. Your Customized RAG: Which One Is Better?
Date published
Jan. 5, 2024
Author(s)
Cheney Zhang
Language
English
Word count
2134
Hacker News points
None found.
The OpenAI Assistants' retrieval feature has been a topic of discussion in the AI community, as it incorporates Retrieval Augmented Generation (RAG) capabilities for question-answering. A comparison between OpenAI's built-in RAG and a customized RAG using Milvus shows that while the former slightly outperforms in answer similarity, the latter performs better in context precision, faithfulness, answer relevancy, and correctness. The Milvus-powered Customized RAG system also has higher Ragas Scores than OpenAI's built-in RAG. This superior performance is attributed to factors such as effective utilization of external data, better document segmentation and data retrieval, and the ability for users to adjust parameters in the customized RAG pipeline.
Demystify Benchmark Result Divergence: Milvus vs. Qdrant
Date published
Jan. 4, 2024
Author(s)
Steffi Li
Language
English
Word count
859
Hacker News points
1
The blog post examines why Qdrant's published benchmark results diverge from those produced by VectorDBBench when testing Milvus. It highlights three reasons for the differences: an outdated Milvus version used in testing, improper use of Milvus that relied only on Growing Segments, and benchmark-driven optimizations for Qdrant that may compromise operational flexibility in real-world scenarios. The post emphasizes the importance of trustworthy and comprehensive benchmarking for vector databases and suggests developers access truthful, precise benchmarks or run their own tests against their own data to make informed decisions when choosing a vector database.
2023
Optimizing RAG Applications: A Guide to Methodologies, Metrics, and Evaluation Tools for Enhanced Reliability
Date published
Dec. 29, 2023
Author(s)
Cheney Zhang
Language
English
Word count
1700
Hacker News points
1
Optimizing Retrieval Augmented Generation (RAG) applications involves using methodologies, metrics, and evaluation tools to enhance their reliability. Three categories of metrics are used in RAG evaluations: those based on the ground truth, those without the ground truth, and those based on LLM responses. Ground truth metrics involve comparing RAG responses with established answers, while metrics without ground truth focus on evaluating the relevance between queries, context, and responses. Metrics based on LLM responses consider factors such as friendliness, harmfulness, and conciseness. Evaluation tools like Ragas, LlamaIndex, TruLens-Eval, and Phoenix can help assess RAG applications' performance and capabilities.
Harmony in Pixels: Picdmo's Leap into Seamless Photo Management with Zilliz Cloud
Date published
Dec. 27, 2023
Author(s)
Fendy Feng
Language
English
Word count
614
Hacker News points
None found.
Picdmo, an AI-powered photo management app, sought to improve its search performance and user experience. The team initially used Milvus, an open-source vector database, but found it labor-intensive and financially burdensome. They then integrated Zilliz Cloud, a fully managed Milvus service, into their infrastructure. This resulted in response times plummeting from 8 seconds to less than 1 second, even under extreme data loads. The adoption of Zilliz Cloud brought efficient search performance, substantial time and cost savings, and responsive support from the Zilliz team. As Picdmo evolves into a comprehensive multimedia application, its collaboration with Zilliz remains crucial for future features.
How To Evaluate a Vector Database?
Date published
Dec. 26, 2023
Author(s)
Li Liu
Language
English
Word count
1363
Hacker News points
None found.
In the data-driven world, the exponential growth of unstructured data has led to the rise of vector databases. These powerful tools specialize in storing, indexing, and searching unstructured data through high-dimensional numerical representations known as vector embeddings. They are used for building recommender systems, chatbots, and applications for searching similar images, videos, and audio. When selecting a vector database, scalability, functionality, and performance are the top three most crucial metrics to consider. Scalability is essential for accommodating growing data demands effectively, while functionality includes both vector-oriented features like support for multiple index types and database-oriented features such as Change Data Capture (CDC) and multi-tenancy support. Performance is evaluated using benchmarking tools like ANN-Benchmark and VectorDBBench, which measure recall rate, QPS, latency, and other metrics. Various vector search technologies are available beyond vector databases, including vector search libraries, lightweight vector databases, vector search plugins, and purpose-built vector databases. Each type has its strengths and weaknesses, so the choice depends on specific business needs.
What Is A Dynamic Schema?
Date published
Dec. 25, 2023
Author(s)
Yujian Tang
Language
English
Word count
1506
Hacker News points
None found.
This post discusses database schemas, focusing on vector databases and their dynamic schema feature. It explains that SQL databases use predefined schemas, while NoSQL databases are typically dynamic or schemaless. The Milvus vector database supports dynamic schema, allowing users to add data in JSON format without declaring every attribute when creating the collection. The article covers how to use dynamic schema with Milvus and how the feature is implemented, then weighs the pros and cons: easy setup and flexibility, but slower filtered search compared to fixed schemas.
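A minimal sketch of what this looks like with pymilvus, assuming Milvus 2.3 or later; the collection and field names are illustrative:

```python
from pymilvus import (
    connections, Collection, CollectionSchema, FieldSchema, DataType
)

connections.connect(host="localhost", port="19530")

fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128),
]
# enable_dynamic_field lets rows carry JSON attributes that were
# never declared in the schema.
schema = CollectionSchema(fields, enable_dynamic_field=True)
collection = Collection("products", schema)

# "brand" and "price" were never declared; they are stored dynamically
# and remain filterable in search expressions.
collection.insert([
    {"embedding": [0.1] * 128, "brand": "acme", "price": 12.5},
])
```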
Unlocking Next-Level APK Security: Trend Micro's Journey with Milvus
Date published
Dec. 21, 2023
Author(s)
Fendy Feng
Language
English
Word count
913
Hacker News points
None found.
Trend Micro, a global leader in cybersecurity, has integrated Milvus, an open-source vector database, into their security infrastructure to enhance APK (Android application package) security. The company initially used MySQL for APK similarity search but faced scalability issues as the dataset grew. They then shifted focus to Faiss, which excelled in speed but lacked critical features required for a production environment. Milvus addressed these challenges with seamless integration with mainstream vector index libraries and simple, intuitive APIs. The implementation of Milvus has resulted in low query latency and high data import speed, significantly enhancing Trend Micro's ability to detect and neutralize harmful APKs.
Metadata Filtering with Zilliz Cloud Pipelines
Date published
Dec. 17, 2023
Author(s)
Christy Bergman
Language
English
Word count
1014
Hacker News points
None found.
The text discusses vector databases like Milvus and Zilliz Cloud, which support hybrid vector and scalar searches. It explains how metadata filtering produces more precise, need-specific results by constraining searches with boolean expressions over scalar fields or the primary key field. The text also provides a step-by-step guide to creating collections and pipelines in Zilliz Cloud and to searching via the web console or API calls.
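The boolean-expression filter syntax is shared with Milvus, so a hedged pymilvus sketch conveys the idea; the field names and values below are invented for illustration:

```python
from pymilvus import Collection

collection = Collection("articles")  # assumes an existing collection
collection.load()

results = collection.search(
    data=[[0.1] * 128],
    anns_field="embedding",
    param={"metric_type": "L2", "params": {"nprobe": 10}},
    limit=10,
    # Boolean expression over scalar fields, applied alongside the
    # vector similarity search.
    expr='year >= 2022 and category == "tech"',
)
```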
Optimizing User Experience: BIGO Leverages Milvus for Duplicate Video Removal
Date published
Dec. 14, 2023
Author(s)
Fendy Feng
Language
English
Word count
659
Hacker News points
None found.
BIGO, the owner of short video platform Likee, has leveraged Milvus, an open-source vector database, to optimize its duplicate video removal process. With millions of daily uploads on Likee, the proliferation of duplicate videos posed a threat to content quality and user experience. Previously, BIGO used FAISS for similarity search but faced limitations in managing massive vectors. Milvus provided faster query responses and scalability, improving throughput and efficiency. The transformation involved converting new video frames into feature vectors and matching them against an extensive database of existing content using cutting-edge technologies like Kafka, deep learning models, and relational databases. BIGO plans to extend Milvus's capabilities for content moderation, restriction, and customized video services in the future.
Improving ChatGPT’s Ability to Understand Ambiguous Prompts
Date published
Dec. 12, 2023
Author(s)
Cheney Zhang
Language
English
Word count
1531
Hacker News points
None found.
Prompt engineering techniques are being used to help large language models (LLMs) handle pronouns and other complex coreferences in retrieval augmented generation (RAG) systems. RAG combines the power of LLMs with a vector database acting as long-term memory, enhancing the accuracy of generated responses. One example is Akcio, an open source project that offers a robust question-answer system. However, implementing RAG systems introduces challenges, particularly in multi-turn conversations involving coreference resolution. Researchers are turning to LLMs like ChatGPT for coreference resolution tasks, but they occasionally produce direct answers instead of following the prompt instructions. A refined approach using few-shot prompts and Chain of Thought (CoT) methods has been developed to guide ChatGPT through coreference resolution, resulting in coherent responses.
Similarity Metrics for Vector Search
Date published
Dec. 11, 2023
Author(s)
Yujian Tang
Language
English
Word count
1490
Hacker News points
3
This article discusses vector similarity search metrics and how they work. It covers three primary distance metrics: L2 or Euclidean distance, cosine similarity, and inner product. Additionally, it mentions other interesting vector similarity or distance metrics such as Hamming Distance and Jaccard Index. The article explains the concept of vectors in terms of orientation and magnitude, and how these metrics can be used to compare any data that can be vectorized. It also provides examples of when each metric should be used.
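The three primary metrics are easy to compute directly. This short NumPy sketch shows how they relate, using two toy vectors where one is a scaled copy of the other:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])

# L2 (Euclidean) distance: straight-line distance; smaller means closer.
l2 = np.linalg.norm(a - b)

# Inner product: rewards both alignment and magnitude; larger means closer.
ip = np.dot(a, b)

# Cosine similarity: inner product of the normalized vectors, so it
# measures orientation only and ignores magnitude.
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(l2, ip, cosine)  # b is a scaled copy of a, so cosine == 1.0
```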
Shaping Tomorrow: How Milvus Powers Shopee's Multimedia Ambition
Date published
Dec. 7, 2023
Author(s)
Fendy Feng
Language
English
Word count
588
Hacker News points
None found.
In order to stay competitive in the e-commerce industry, Shopee ventured into short video services. However, they faced challenges handling vast amounts of unstructured data such as videos, images, audio, and text. Milvus emerged as a solution due to its ability to handle billions of vectors, scalability, and seamless integration with Shopee's internal ecosystem. The migration from Milvus 1.x to 2.x improved stability, scalability, and multi-replica capabilities, resulting in low-latency and high-availability retrieval services. With Milvus, Shopee has elevated its real-time search capabilities and streamlined offline data retrieval for copyright video matching and video deduplication processes.
Introducing Zilliz Cloud Pipelines: A One-Stop Service for Building AI-Powered Search
Date published
Dec. 6, 2023
Author(s)
Steffi Li
Language
English
Word count
983
Hacker News points
2
Zilliz has introduced its new service, Zilliz Cloud Pipelines, which simplifies the process of creating and retrieving unstructured data as vectors. This solution is designed to empower developers in building high-quality semantic searches without requiring extensive customization or infrastructure adjustments. The platform consists of three specific pipelines: Ingestion, Search, and Deletion. Zilliz Cloud Pipelines currently focuses on semantic search in text documents but will be expanded to include image search, video copy detection, and multi-modal search capabilities in the future.
Create a Movie Recommendation Engine with Milvus and Python
Date published
Dec. 4, 2023
Author(s)
Gourav Bais
Language
English
Word count
1594
Hacker News points
None found.
This article explains how to build a movie recommender system using the open source vector database, Milvus. The process involves setting up the environment, collecting and preprocessing data, connecting to Milvus, generating embeddings for movies, sending embeddings to Milvus, and finally recommending new movies using Milvus. By leveraging vector storage and similarity search, Milvus can help build an efficient and scalable movie recommendation system, enhancing user engagement and showcasing the role of advanced vector-based models in modern recommendation systems.
Building an Open Source Chatbot Using LangChain and Milvus in Under 5 Minutes
Date published
Nov. 29, 2023
Author(s)
Christy Bergman
Language
English
Word count
2068
Hacker News points
None found.
This blog post demonstrates how to build an open source chatbot using LangChain and Milvus in under 5 minutes. The process involves creating a retrieval augmented generation (RAG) stack with LangChain, which answers questions about custom data while reducing hallucinations. Responses are grounded in factual, custom data such as product documentation to ensure accuracy. The source code for the live chatbot is available on GitHub. The blog post also explains how to use Milvus, a high-performance vector database optimized for fast storage, indexing, and searching of embeddings. OpenAI language models such as the GPT series handle generation. Overall, the RAG question-answering chatbot on custom documents proves efficient and cost-effective: retrieval, evaluation, and development iterations run essentially for free against the local data, with a paid OpenAI call needed only for the final chat generation step.
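A condensed sketch of such a RAG stack, assuming the LangChain APIs as they existed around the post's publication and an OPENAI_API_KEY in the environment; the collection name is illustrative and this is not the post's exact code:

```python
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Milvus

# Assumes a running Milvus instance already populated with document chunks.
vector_store = Milvus(
    embedding_function=OpenAIEmbeddings(),
    connection_args={"host": "localhost", "port": "19530"},
    collection_name="product_docs",  # illustrative name
)

# Retrieval runs against the local vector store; the paid LLM call
# happens only at generation time.
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo"),
    retriever=vector_store.as_retriever(),
)
print(qa.run("How do I configure the index?"))
```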
Transforming Ad Recommendations: SmartNews's Journey with Milvus
Date published
Nov. 29, 2023
Author(s)
Fendy Feng
Language
English
Word count
609
Hacker News points
None found.
SmartNews, a leading news app, faced the challenge of optimizing ad recommendations for its highly engaged user base. The company turned to Milvus after researching solutions that could handle high-throughput and low-latency queries. Milvus's vector similarity search capabilities were instrumental in optimizing SmartNews's dynamic ad vector recall. Adopting Milvus led to more relevant ads, increasing click-through rates and driving up ad revenue. The company has upgraded its Milvus to 2.2.4 and is looking forward to leveraging new features for building even more real-time and reliable systems.
Kicking Off the Open Source Advent
Date published
Nov. 27, 2023
Author(s)
Yujian Tang
Language
English
Word count
558
Hacker News points
3
The Open Source Advent is a project that aims to introduce participants to open-source software. For 25 days in December, one open-source project will be featured on social media along with a tutorial for quick start-up. Participants can earn points by starring the project's GitHub repo, creating repos using the project, and making posts tagging the company page. Extra points are awarded for writing a PR that gets merged or writing a blog about their experience. The top three scorers will receive swag packs from Zilliz and partners, as well as shoutouts on social media. Participants can join the Open Source Advent Discord Channel to submit their entries between December 26th, 2023, and January 2nd, 2024. Winners will be announced on January 8th, 2024.
Getting Started with a Milvus Connection
Date published
Nov. 24, 2023
Author(s)
Christy Bergman
Language
English
Word count
595
Hacker News points
None found.
Milvus is an open-source vector database designed for building AI applications using unstructured data embeddings. It provides four SDKs, including Java, Python, React, and Ruby. The text outlines the steps to install and start a Milvus server, connect to it, create a collection with schema and index, insert data into the collection, and query the collection. Additionally, it mentions using LangChain and Milvus for building chatbots in an upcoming blog post. Resources are provided to get started with Milvus and Zilliz.
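A compact sketch of those steps with the Python SDK, assuming a local Milvus server on the default port; the collection name, dimension, and index settings are illustrative:

```python
from pymilvus import (
    connections, utility, Collection, CollectionSchema, FieldSchema, DataType
)

# Connect to a local Milvus server (default port 19530).
connections.connect(alias="default", host="localhost", port="19530")

# Create a collection with a schema: auto-generated IDs plus 8-dim vectors.
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=8),
]
collection = Collection("quickstart", CollectionSchema(fields))

# Insert two toy vectors (column format), then index and load for querying.
collection.insert([[[0.1] * 8, [0.9] * 8]])
collection.create_index(
    "embedding",
    {"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 128}},
)
collection.load()
print(utility.list_collections())
```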
How Milvus Powers Credal’s Mission for “Useful AI, Made Safe”
Date published
Nov. 22, 2023
Author(s)
Anya Sage
Language
English
Word count
1020
Hacker News points
None found.
Credal, an enterprise AI platform, aims to make Generative AI integration safer and more accessible for businesses. Their solution focuses on seamlessly integrating data from various sources while ensuring privacy and security. At the core of their offering is Milvus, an open-source vector database that enables efficient search, filtering, and data curation capabilities. Credal's architecture prioritizes high-quality data interpretations and effective communication with GenAI models. The platform offers observability and governance tools for administrators and IT teams, including features like PII redaction, audit logging, and data access controls. Milvus's scalability and robustness make it a game-changer for Credal, enabling them to deliver "Useful AI, made safe" to businesses worldwide.
Zilliz Cloud Now Available on Microsoft Azure
Date published
Nov. 21, 2023
Author(s)
Steffi Li
Language
English
Word count
278
Hacker News points
None found.
Zilliz Cloud is now available on Microsoft Azure, expanding its presence across major cloud platforms including AWS and Google Cloud. This integration allows Azure-centric developers and enterprises to leverage the unique capabilities of Zilliz Cloud for vector database workloads. The move also signifies seamless access to Azure's cutting-edge AI services such as Semantic Kernel. Future enhancements include expansion into new Azure regions, availability on the Azure Marketplace, and continuous integration efforts to ensure data security and optimize application performance.
Milvus 2.3 Beta and Enterprise Upgrades on Zilliz Cloud
Date published
Nov. 21, 2023
Author(s)
Steffi Li
Language
English
Word count
460
Hacker News points
None found.
Zilliz Cloud has released the beta version of Milvus 2.3, introducing new features to enhance data management and querying processes for developers. The update includes Cosine similarity integration, Range Search feature, Upsert functionality, raw vector returns, JSON_CONTAINS filter, entity count, and more. Additionally, Zilliz Cloud has introduced enhanced enterprise features such as improved Role-Based Access Control (RBAC), expanded geographical options with the general availability of AWS EU Frankfurt region, and Self-Service Account and Organization Deletion feature. Furthermore, Zilliz Cloud is now available on Microsoft Azure in the azure-east-us region, completing its availability across major cloud platforms including AWS and Google Cloud. The company invites feedback from developers to shape the future of their vector database technology.
Enhancing Data Flow Efficiency: Zilliz Introduces Upsert, Kafka Connector, and Airbyte Integration
Date published
Nov. 20, 2023
Author(s)
Steffi Li
Language
English
Word count
1348
Hacker News points
None found.
Zilliz has introduced Upsert, a Kafka Connector, and Airbyte integration to enhance data flow efficiency in its vector database. Upsert simplifies updates by atomically inserting new records or replacing existing ones in a single operation. The Kafka Connector enables real-time streaming of vector data from Confluent/Kafka into Milvus or Zilliz vector databases, enhancing capabilities for Generative AI and e-commerce recommendations. The Airbyte integration streamlines data transfer and processing for LLMs and vector databases, improving search functionality. These enhancements aim to improve search performance and streamline the entire data pipeline, making it more efficient and developer-friendly.
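A minimal sketch of the upsert call via pymilvus, assuming an existing collection with an explicit (non-auto) primary key; the names are illustrative:

```python
from pymilvus import Collection

collection = Collection("users")  # assumes an existing collection

# If primary key 42 is absent the row is inserted; if present, the
# stored row is replaced: one atomic call instead of delete + insert.
collection.upsert([
    {"id": 42, "embedding": [0.2] * 128},
])
```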
What’s New in Milvus 2.3.2 & 2.3.3
Date published
Nov. 20, 2023
Author(s)
Steffi Li
Language
English
Word count
544
Hacker News points
None found.
Milvus, the vector database system, has released versions 2.3.2 and 2.3.3 with significant improvements aimed at enhancing performance and user experience. The latest updates include support for array data types, complex delete expressions, integration of TiKV for metadata storage, FP16 vector type, and vector index MMAP. Other enhancements include a rolling upgrade experience, performance optimization, upgraded CDC (Change Data Capture), bulk insert of binlog data with partition keys, and the return of binary metric types such as SUBSTRUCTURE and SUPERSTRUCTURE. The developer community's contributions have been instrumental in shaping these updates, and feedback is welcome for future enhancements.
How LangChain Implements Self Querying
Date published
Nov. 16, 2023
Author(s)
Yujian Tang
Language
English
Word count
890
Hacker News points
None found.
LangChain, an open-source library for LLM orchestration, recently added the "Self Query" retriever. This feature allows users to query vector databases like Milvus using LangChain. The implementation of this self-query retriever is covered in lines 189 to 233 of the base.py file in the self-query folder. The only class method for the self-query base class is from_llm, which has eight specified parameters and one allowing keyword arguments (kwargs). Four required parameters are llm, vectorstore, document_contents, and metadata_field_info. Other optional parameters include structured_query_translator, chain_kwargs, enable_limit, and use_original_query. The self-query retriever implementation involves parsing the self-query parameters, creating an LLM chain, and returning a self-query retriever. This feature enables users to build simple retrieval augmented generation (RAG) applications using an LLM, vector database, and prompts to interface with the LLM.
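Putting those four required parameters together, a hedged sketch of from_llm usage might look like the following, with an invented Milvus-backed movie collection (this is not the post's exact code):

```python
from langchain.chains.query_constructor.base import AttributeInfo
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.vectorstores import Milvus

# Hypothetical Milvus-backed store of movie plot summaries.
vectorstore = Milvus(
    embedding_function=OpenAIEmbeddings(),
    connection_args={"host": "localhost", "port": "19530"},
    collection_name="movies",
)

metadata_field_info = [
    AttributeInfo(name="year", description="Release year", type="integer"),
    AttributeInfo(name="genre", description="Movie genre", type="string"),
]

# The four required parameters: llm, vectorstore, document_contents,
# and metadata_field_info; the optional ones fall back to their defaults.
retriever = SelfQueryRetriever.from_llm(
    llm=OpenAI(temperature=0),
    vectorstore=vectorstore,
    document_contents="Brief plot summaries of movies",
    metadata_field_info=metadata_field_info,
)
docs = retriever.get_relevant_documents("sci-fi movies released after 2010")
```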
Join us at AWS re:Invent 2023
Date published
Nov. 15, 2023
Author(s)
Emily Kurze
Language
English
Word count
400
Hacker News points
None found.
AWS re:Invent, one of the largest global cloud computing events, will take place in Las Vegas from November 27 to December 1, 2023. Zilliz invites attendees to visit their booth (#1339) and meet the team behind Milvus, a vector database solution. The event offers opportunities for innovative solution demos, problem-solving expertise, collaboration, community engagement, and swag. Attendees can also book private demos or meetings with Zilliz's experts to discuss specific projects or use cases. Additionally, users are invited to join the team for dinner to share project updates and feedback. Resources on vector databases are recommended for those interested in learning more before the event.
Grounding Our Chat Towards Data Science Results
Date published
Nov. 15, 2023
Author(s)
Yujian Tang
Language
English
Word count
940
Hacker News points
None found.
In this tutorial, we learn how to ground our Retrieval Augmented Generation (RAG) results using LlamaIndex and citations. We start by setting up the necessary libraries and environment variables for our chatbot. Next, we define the parameters of our RAG chatbot, including the embedding model, vector database, and data abstractions. Finally, we implement citations via LlamaIndex's CitationQueryEngine module to ensure grounded results. This tutorial uses Zilliz Cloud, a fully managed and optimized version of Milvus, to persist data across multiple projects.
Do We Still Need Vector Databases for RAG with OpenAI's Releasing of Its Built-In Retrieval?
Date published
Nov. 13, 2023
Author(s)
Jael Gu
Language
English
Word count
1281
Hacker News points
None found.
OpenAI's built-in Retrieval feature in its Assistants API has some limitations, such as scalability constraints and lack of customization. These issues can be addressed by integrating a custom retriever powered by a vector database like Milvus or Zilliz Cloud. This approach allows developers to optimize and configure the retrieval process according to their specific needs, improving overall efficiency.
Unlock Advanced Recommendation Engines with Milvus' New Range Search
Date published
Nov. 9, 2023
Author(s)
Leon Cai
Language
English
Word count
1198
Hacker News points
None found.
Milvus, an open-source vector database, has introduced a new feature called Range Search to enhance its similarity search capabilities. This feature allows developers to specify a distance range for relevant vectors in their searches, addressing limitations of traditional KNN searches in recommendation systems where results can be either too similar or too diverse. The technical architecture and usage guide for Range Search are outlined, along with details on when to use it over Top-K search. The feature is not limited to recommendation engines but has broader applications in areas like content matching, anomaly detection, and NLP search tasks. It is now available for public preview on Zilliz Cloud.
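In pymilvus, a range search is expressed by adding radius and range_filter to the search parameters. The values below are illustrative; note that for similarity metrics such as IP, where larger scores mean closer, the meaning of the two bounds flips:

```python
from pymilvus import Collection

collection = Collection("items")  # assumes an existing collection
collection.load()

results = collection.search(
    data=[[0.1] * 128],
    anns_field="embedding",
    param={
        "metric_type": "L2",
        "params": {
            "radius": 1.0,        # drop results farther than this distance
            "range_filter": 0.2,  # drop results closer than this distance
        },
    },
    limit=100,
)
```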
Zilliz at HackNC 2023
Date published
Nov. 8, 2023
Author(s)
Yujian Tang
Language
English
Word count
203
Hacker News points
None found.
HackNC 2023, an annual hackathon hosted by the University of North Carolina at Chapel Hill, saw over 1,300 registrations and 650 participating hackers. Zilliz, the company behind the Milvus vector database, was represented at the event with a workshop and keynote speech. The winning project, "wellSpent," is an expense-tracking app that provides users with a dynamic pie chart of their expenses, transaction lists, and various financial planning tools. Congratulations to the team behind wellSpent for their victory in the Best Use of Zilliz category.
Zilliz at CalHacks 2023
Date published
Nov. 3, 2023
Author(s)
Christy Bergman
Language
English
Word count
1137
Hacker News points
None found.
CalHacks, a hackathon event held in San Francisco from October 27-29, featured over 1000 students from around the world participating in various projects. With $137,650 in prize money and sponsor awards, several innovative projects were awarded for their use of Milvus, an open-source vector database. The winning project, Second Search, utilized Milvus to search lecture videos by embedding video caption text into vectors and returning relevant sections based on user queries. Other notable projects included Jarvis, which described visual scenes to visually impaired users, an AI 911 agent that assessed emergency situations, and Mental Maps, a chatbot for mental well-being tracking.
Announcing Confluent's Kafka Connector for Milvus and Zilliz Cloud: Unlocking the Power of Real-Time AI
Date published
Nov. 3, 2023
Author(s)
Fendy Feng
Language
English
Word count
966
Hacker News points
None found.
Confluent, a data streaming platform, has announced the availability of its Kafka Connector for open-source Milvus and Zilliz Cloud. This collaboration enables seamless real-time vector data streaming from Confluent to Milvus or Zilliz vector databases, significantly enhancing real-time Generative AI powered by large language models (LLMs) like OpenAI's GPT-4. The integration of Zilliz and Confluent allows for continuous flow of real-time Confluent vector streams converted from unstructured data to be ported to Milvus/Zilliz, empowering developers to build applications for various use cases such as real-time semantic search, image/video/audio similarity search and retrieval augmented generation. The integration opens up possibilities for various sectors and applications, including enhancing Generative AI with a real-time knowledge base and optimizing personalized recommendations for e-commerce platforms.
Alexandr Guzhva: Why I Joined Zilliz
Date published
Nov. 2, 2023
Author(s)
Alexandr Guzhva
Language
English
Word count
404
Hacker News points
None found.
Alexandr Guzhva, an expert in performance optimization, joined Zilliz to outcompete its competitors and fully utilize his expertise. With over 15 years of experience in finance and two years at Meta, he has contributed significantly to the FAISS library and written more than 2 million lines of code. Zilliz's focus on advanced similarity search methods and integration with NVIDIA Raft attracted him to the company. His goal is to improve Zilliz products and contribute to Milvus OSS, potentially applying his knowledge of ANNS for time series prediction in the future.
How Troop Uses Milvus Vector Database to Unlock the Collective Power of Retail Investors
Date published
Nov. 1, 2023
Author(s)
Anya Sage
Language
English
Word count
722
Hacker News points
None found.
Troop, a tech company revolutionizing shareholder activism and engagement, leverages machine learning and AI technologies to enable investors to participate in corporate governance. Using the Milvus vector database, Troop built a solution that empowers individuals for collective financial activism in major corporations. The integration of Milvus enabled scalability, efficient handling of massive datasets, separation of storage and compute, rapid scaling of nodes, data partitioning, and improved semantic search capabilities. This infrastructure supports retrieval augmented generation (RAG) to process large volumes of unstructured data and build intelligent shareholder voting recommendation engines.
Evaluations for Retrieval Augmented Generation: TruLens + Milvus
Date published
Oct. 31, 2023
Author(s)
Josh Reini
Language
English
Word count
2154
Hacker News points
None found.
This article discusses the use of vector search technologies, such as Milvus and Zilliz Cloud, in building retrieval augmented generation (RAG) applications. RAGs are question-answering applications that allow large language models (LLMs) to access a verified knowledge base for context. The article highlights various configuration choices that can affect the quality of retrieval, including data selection, embedding model, index type, amount of context retrieved, and chunk size. It also introduces TruLens, an open-source library for evaluating and tracking the performance of LLM applications like RAGs. By using TruLens to evaluate different configurations and parameters, developers can identify failure modes and find the most performant combination for their specific use case.
Retrieval Augmented Generation on Notion Docs via LangChain
Date published
Oct. 30, 2023
Author(s)
Yujian Tang
Language
English
Word count
1042
Hacker News points
None found.
This tutorial demonstrates how to build a retrieval augmented generation (RAG) app using LangChain and Milvus. The process involves reviewing LangChain self-querying, working with Notion docs in LangChain, ingesting the documents, storing them in a vector database, and querying them. The tutorial uses LangChain as the orchestration framework and Milvus as the similarity engine. It covers how to load and parse a Notion document into sections to query in a basic RAG architecture, with future tutorials exploring different chunking strategies, embeddings, splitting strategies, and evaluation methods.
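A short sketch of the ingestion side under those assumptions, using LangChain's Notion loader and a markdown header splitter; the export path and header levels are illustrative:

```python
from langchain.document_loaders import NotionDirectoryLoader
from langchain.text_splitter import MarkdownHeaderTextSplitter

# Load a Notion workspace exported as a folder of markdown files.
loader = NotionDirectoryLoader("notion_export")  # illustrative path
docs = loader.load()

# Split each page on its markdown headers so individual sections
# can be embedded and queried in the RAG pipeline.
splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=[("#", "Header 1"), ("##", "Header 2")]
)
sections = []
for doc in docs:
    sections.extend(splitter.split_text(doc.page_content))
print(f"{len(docs)} pages -> {len(sections)} sections")
```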
Exploring LLM-Driven Agents in the Age of AI
Date published
Oct. 27, 2023
Author(s)
David Wang
Language
English
Word count
872
Hacker News points
None found.
Large Language Models (LLMs) are driving innovation in AI, with LLM-driven Agents at the forefront. These agents combine LLMs with planning, memory, and tool modules to make decisions and take actions autonomously. The AutoGPT project demonstrates their potential by generating tasks, prioritizing them, and executing them using external resources. However, challenges such as getting stuck in loops and prompt length constraints need to be addressed. Ongoing research is focused on improving LLMs' reasoning abilities, enhancing agent frameworks, and developing specialized agent applications for various scenarios.
Experimenting with Different Chunking Strategies via LangChain
Date published
Oct. 24, 2023
Author(s)
Yujian Tang
Language
English
Word count
1499
Hacker News points
None found.
This tutorial explores the impact of different chunking strategies on retrieval augmented generation applications using LangChain. Chunking is the process of dividing text into smaller parts, and the choice of strategy can significantly affect the output quality. The code for this post can be found in a GitHub repo on LLM experimentation. The tutorial covers setting up the environment, importing necessary tools, and creating a function that takes parameters for document ingestion and chunking experimentation. It then tests five different chunking strategies with varying lengths and overlaps. The results show that finding an ideal chunking size is challenging and depends on the desired output format. Future tutorials may cover testing overlaps and using other libraries to refine chunking strategies further.
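A minimal sketch of such an experiment loop, assuming LangChain's RecursiveCharacterTextSplitter; the file name and the size/overlap grid are illustrative:

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

with open("essay.txt") as f:  # any long document
    text = f.read()

# Sweep a few size/overlap settings; the best choice depends on the
# data and the desired answer format, as the post's experiments show.
for chunk_size, overlap in [(128, 0), (512, 64), (1024, 128)]:
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=overlap,
    )
    chunks = splitter.split_text(text)
    print(f"size={chunk_size} overlap={overlap} -> {len(chunks)} chunks")
```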
Jiang Chen: Why I Joined Zilliz
Date published
Oct. 16, 2023
Author(s)
Jiang Chen
Language
English
Word count
835
Hacker News points
None found.
Over the past decade, the author has specialized in various aspects of data infrastructure, including access control, data privacy, NoSQL databases, and web-scale data indexing. In recent years, big data emerged as a significant innovation, with technologies like MapReduce, distributed computing, and structured data storage leading the way. The AI era, however, requires a different technology stack, especially with the growing popularity of Large Language Models: embedding models and vector stores now take center stage, and they are precisely the focus of Zilliz. The author's experience includes working on search indexing at Google, where they built ultra-flexible infrastructure to understand billions of images and videos on the public web. They believe AI-native infrastructure holds the key to the future of business and are enthusiastic about democratizing this highly complex infrastructure for resource-limited startups. The author joined Zilliz for its ambitious mission, exceptional team, and challenging work environment. At Zilliz, they build a suite of tooling and services that ease information retrieval over unstructured data, including Towhee, Akcio, and a vector database for efficient storage and search of vector embeddings.
Milvus Introduced MMap for Redefined Data Management and Increased Storage Capability
Date published
Oct. 13, 2023
Author(s)
Yang Cen
Language
English
Word count
661
Hacker News points
None found.
Milvus introduces the MMap feature, which redefines how large data volumes are managed and promises cost efficiency without compromising functionality. MMap is memory-mapped file technology that lets Milvus map large files directly into its address space, presenting them as contiguous blocks of memory. This integration eliminates the need for explicit read and write operations, fundamentally changing how Milvus manages data. The feature benefits vector databases by enabling more efficient storage of large files and faster random access to them, though it may cause performance fluctuations as data volume grows. Enabling MMap in Milvus is straightforward, requiring only a modification of the milvus.yaml file. Future updates will refine memory usage and provide more granular control over the feature.
How to Choose a Vector Database: Qdrant Cloud vs. Zilliz Cloud
Date published
Oct. 13, 2023
Author(s)
Steffi Li
Language
English
Word count
1239
Hacker News points
None found.
This blog compares two vector databases, Qdrant and Zilliz/Milvus. While both are purpose-built for vector data, they serve different market needs. Qdrant is designed for developers who prioritize modern technology and minimal infrastructure maintenance, while Zilliz/Milvus is engineered for extreme scale, high performance, and low latency. The benchmark results show that Zilliz Cloud outperforms Qdrant Cloud in terms of queries per second (QPS), queries per dollar (QP$), and latency. Furthermore, the feature comparison highlights differences in scalability, functionality, and purpose-built features between the two vector databases.
Chat with Towards Data Science Using LlamaIndex
Date published
Oct. 12, 2023
Author(s)
Yujian Tang
Language
English
Word count
1338
Hacker News points
None found.
This tutorial demonstrates how to use LlamaIndex, an open-source data retrieval framework, to improve the performance of a chatbot built with Zilliz Cloud. The primary challenge addressed in this project is integrating an existing Milvus collection into LlamaIndex while handling differences in embedding vector dimensions and metadata field usage. By using LlamaIndex as a query engine, the chatbot's retrieval capabilities are significantly enhanced, providing more accurate and relevant responses to user queries.
Optimizing Data Communication: Milvus Embraces NATS Messaging
Date published
Oct. 11, 2023
Author(s)
Zhen Ye
Language
English
Word count
1055
Hacker News points
None found.
Milvus, an open-source vector database, has introduced NATS messaging integration in its latest version, 2.3. This feature improves the handling of substantial data volumes and complex scenarios compared to its predecessor, RocksMQ. NATS is a distributed-system connectivity technology implemented in Go that supports various communication modes, such as Request-Reply and Publish-Subscribe, across systems. Milvus 2.3 offers a new control option, mq.type, which allows users to specify the type of MQ they want to use; to enable NATS, set mq.type=natsmq. The migration from RocksMQ to NATS is seamless and involves stopping write operations, flushing data, modifying configurations, and verifying the migration through Milvus logs. Performance testing shows that NATS outperforms RocksMQ for larger data packets (over 64 KB), offering much faster response times. In extensive testing with a 100-million-vector dataset, NATS showed lower vector search and query latency than RocksMQ.
Use Milvus and Airbyte for Similarity Search on All Your Data
Date published
Oct. 10, 2023
Author(s)
Joe Reuter
Language
English
Word count
1909
Hacker News points
None found.
Milvus is an open-source vector database used to store, index, and efficiently search high-dimensional vector data. It is particularly useful in applications involving similarity searches across unstructured data, such as generative chat responses and product recommendations. With Airbyte, it is straightforward to transfer data from many different sources into Milvus, calculating vector embeddings of texts along the way. The power of embeddings lies in the ability to surface relevant information even when similar concepts are phrased differently. This article demonstrates how to use Zilliz Cloud as the vector store, Airbyte to extract and load the data, the OpenAI embedding API to calculate embeddings, and Streamlit to build a smart submission form that surfaces relevant data.
Christy Bergman: Why I Joined Zilliz
Date published
Oct. 6, 2023
Author(s)
Christy Bergman
Language
English
Word count
1432
Hacker News points
None found.
Christy Bergman, a new Developer Advocate at Zilliz, shares her journey of discovering and choosing Milvus, the world's most popular open-source vector database. She explains how she explored various vector databases, including FAISS, Qdrant, Chroma, Weaviate, Pinecone, and finally settled on Milvus due to its user-friendly experience, speed in loading vectors and querying, and additional features. Christy also discusses her role at Zilliz and her plans for organizing events, writing blogs, improving documentation, and helping developers learn how to use Milvus.
Efficient Vector Similarity Search in Recommender Workflows Using Milvus with NVIDIA Merlin
Date published
Oct. 4, 2023
Author(s)
Burcin Bozkaya
Language
English
Word count
3087
Hacker News points
None found.
This blog post discusses the integration of NVIDIA Merlin, an open-source framework developed for training end-to-end models to make recommendations at any scale, with Milvus, an efficient vector database created by Zilliz. The integration is beneficial in the item retrieval stage with a highly efficient top-k vector embedding search. The post also highlights how Milvus complements Merlin in recommender systems workflows and provides benchmark results showing impressive speedups with GPU-accelerated Milvus that uses NVIDIA RAFT with the vector embeddings generated by Merlin Models.
How to Get the Right Vector Embeddings
Date published
Oct. 3, 2023
Author(s)
Yujian Tang
Language
English
Word count
1846
Hacker News points
None found.
Vector embeddings are crucial when working with semantic similarity. They represent input data as a series of numbers, allowing mathematical operations to be performed on the data instead of relying on qualitative comparisons. The appropriate vector embeddings must be obtained before use; using an image model for text, or vice versa, will yield poor results. Vector embeddings are essential for many tasks, particularly semantic search. They are created by removing the last layer of a deep learning model (often called an embedding model) and taking the output of the second-to-last layer, so the dimensionality of an embedding equals the size of that layer. Common vector dimensionalities include 384, 768, 1,536, and 2,048. A single dimension in a vector embedding means nothing on its own; taken together, however, the dimensions capture the semantic meaning of the input data. The dimensions represent high-level, abstract attributes that depend on the training data and the model itself, which is why different models generate different embeddings. To obtain proper vector embeddings, identify the type of data you wish to embed (images, text, audio, videos, or multimodal data) and use an appropriate open-source embedding model from Hugging Face or PyTorch. For example, ResNet-50 is a popular image recognition model, while MiniLM-L6-v2 and MPNet-Base-V2 are text embedding models. Vector databases like Milvus and Zilliz Cloud store, index, and search across massive datasets of unstructured data through vector embeddings, employing Approximate Nearest Neighbor (ANN) algorithms to calculate spatial distances between query vectors and stored vectors.
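As an illustration, one of the text models named above can be loaded through the sentence-transformers library; this sketch assumes that package is installed:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# MiniLM-L6-v2, one of the text models named above, outputs
# 384-dimensional embeddings.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = ["A man is eating food.", "Someone is having a meal."]
emb = model.encode(sentences)
print(emb.shape)  # (2, 384)

# Paraphrases land close together in embedding space.
cos = np.dot(emb[0], emb[1]) / (np.linalg.norm(emb[0]) * np.linalg.norm(emb[1]))
print(cos)
```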
How to Migrate Your Data to Milvus Seamlessly: A Comprehensive Guide
Date published
Oct. 2, 2023
Author(s)
Wenhui Zhang
Language
English
Word count
1741
Hacker News points
None found.
Milvus is an open-source vector database designed for similarity search, offering robust storage, processing, and retrieval for billions of vectors with minimal latency. As of September 2023, it has garnered almost 23,000 stars on GitHub and is used by tens of thousands of users across various industries. The latest release introduces new features such as GPU support and MMap storage for increased performance and capacity. To ease migration from older versions of Milvus (1.x), FAISS, and Elasticsearch 7.0 and beyond to the latest Milvus 2.x versions, a data migration tool called Milvus Migration has been developed. Written in Go, it supports multiple interaction modes, including a command-line interface (CLI) built on the Cobra framework, a Restful API with built-in Swagger UI, and integration as a Go module in other tools. Milvus Migration simplifies the migration process through a robust feature set: migration paths from Milvus 1.x, Elasticsearch 7.0 and beyond, and FAISS to Milvus 2.x; multiple storage backends, including local files, Amazon S3, Object Storage Service (OSS), and Google Cloud Platform (GCP); and flexible Elasticsearch integration for migrating dense_vector fields along with other field types such as long, integer, short, boolean, keyword, text, and double. The migration itself involves configuring a migration.yaml file with details about the data source, target, and other relevant settings, then executing the migration job via the command line or the Restful API. Once the job completes, users can view the total number of successfully migrated rows and perform other collection-related operations using Attu, an all-in-one vector database administration tool. Future plans for Milvus Migration include supporting more data sources such as Redis and MongoDB, adding resumable migration, and simplifying commands by merging the dump and load processes into one.
Getting Started with GPU-Powered Milvus: Unlocking 10x Higher Performance
Date published
Sept. 29, 2023
Author(s)
Jaken Ma
Language
English
Word count
803
Hacker News points
None found.
Milvus 2.3 introduces GPU support, unlocking a 10x increase in throughput and significant reductions in latency. This strategic innovation is aimed at enhancing vector searching capabilities, particularly with the rise of Large Language Models (LLMs) like GPT-3. The integration of Milvus and NVIDIA GPUs allows for efficient searching through massive datasets and expands the AI landscape. To get started with the Milvus GPU version, users need to install CUDA drivers, configure Milvus GPU settings, build Milvus locally, and run it in standalone mode or using a provided docker-compose file.
Using LangChain to Self-Query a Vector Database
Date published
Sept. 28, 2023
Author(s)
Yujian Tang
Language
English
Word count
1206
Hacker News points
None found.
LangChain, known for orchestrating interactions with large language models (LLMs), has introduced self-querying capabilities. This tutorial demonstrates how to perform self-querying on Milvus, the world's most popular vector database. The process involves setting up LangChain and Milvus, obtaining necessary data, informing the model about expected data format, and finally, performing self-querying. Self-querying allows an LLM to query itself using the underlying vector store, creating a simple retrieval augmented generation (RAG) app in the CVP framework.
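As a rough illustration of the flow the tutorial describes, the sketch below wires LangChain's self-query retriever to a Milvus vector store; it assumes the LangChain APIs of that period (plus the lark dependency the query constructor needs), and the documents and metadata fields are invented for the example.

```python
# Hedged sketch: self-querying over Milvus with LangChain.
# Requires: pip install lark (used by the query constructor).
from langchain.llms import OpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Milvus
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.chains.query_constructor.base import AttributeInfo
from langchain.schema import Document

docs = [Document(page_content='A sci-fi film about rogue AI.',
                 metadata={'year': 2014, 'rating': 8.0})]  # illustrative data
vectorstore = Milvus.from_documents(
    docs, OpenAIEmbeddings(),
    connection_args={'host': 'localhost', 'port': '19530'})

metadata_field_info = [
    AttributeInfo(name='year', description='Release year', type='integer'),
    AttributeInfo(name='rating', description='Viewer rating, 1-10', type='float'),
]
retriever = SelfQueryRetriever.from_llm(
    OpenAI(temperature=0), vectorstore,
    'Brief summary of a movie', metadata_field_info)

# The LLM turns the natural-language filter into a structured query.
print(retriever.get_relevant_documents('movies about AI rated above 7'))
```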
Zilliz x Galileo: The Power of Vector Embeddings
Date published
Sept. 27, 2023
Author(s)
Yujian Tang
Language
English
Word count
1119
Hacker News points
None found.
Unstructured data, which makes up 80% of global data, is becoming increasingly prevalent. Vector embeddings are numerical representations used to work with unstructured data such as text, images, audio, and videos. They can be extracted from trained machine-learning models and have high dimensionality to store complex data. Vector embeddings are the de facto way to work with unstructured data, allowing for comparisons between data points. When generating embedding vectors, factors like vector size, training data quality, and quantity should be considered. Vector embeddings can be used to debug training data by detecting errors through clustering, finding samples not present in the training data, identifying hallucinations, and fixing errors in retrieval augmented generation (RAG). Additionally, they can be indexed, stored, and queried using vector databases like Milvus or Zilliz Cloud. The power of vector embeddings is evident from their wide range of use cases, making them a valuable tool for working with unstructured data in machine learning applications.
Zilliz Makes Real-Time AI a Reality with Confluent
Date published
Sept. 26, 2023
Author(s)
Steffi Li
Language
English
Word count
976
Hacker News points
None found.
Zilliz Cloud has integrated with Confluent Cloud, allowing users of both platforms to access real-time data streams across their entire business for building AI applications. The integration enables the ingestion, parsing, and processing of real-time data into Zilliz Cloud using Confluent's Kafka producer and consumer APIs. This collaboration opens new avenues for leveraging Generative Artificial Intelligence (GenAI) in real-time scenarios, such as personalized responses and content generation platforms. The integration also enhances traditional AI use cases like recommender systems and anomaly detection. With easy access to data streams from across their entire business, Zilliz users can now create a real-time knowledge base, build governed, secured, and trusted AI applications, and experiment, scale, and innovate faster.
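The integration itself is managed, but the underlying pattern looks roughly like the sketch below: consume events from a Kafka topic and insert their embeddings into a collection. The topic, field names, endpoint, and collection are all illustrative assumptions, not the product's actual wiring.

```python
# Hedged sketch of streaming ingestion: Kafka consumer -> Milvus insert.
import json
from kafka import KafkaConsumer          # kafka-python
from pymilvus import connections, Collection

connections.connect(uri='YOUR_ZILLIZ_ENDPOINT', token='YOUR_API_KEY')
collection = Collection('realtime_events')   # assumed existing collection
                                             # (auto-id pk, vector, text)

consumer = KafkaConsumer(
    'events',                                # assumed topic name
    bootstrap_servers='localhost:9092',
    value_deserializer=lambda m: json.loads(m.decode('utf-8')))

for message in consumer:
    event = message.value                    # expects {'embedding': [...], 'text': '...'}
    # Column-based insert matching the assumed schema order.
    collection.insert([[event['embedding']], [event['text']]])
```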
Exploring the Marvels of Knowhere 2.0
Date published
Sept. 25, 2023
Author(s)
Patrick Xu
Language
English
Word count
803
Hacker News points
None found.
Milvus 2.3 has been released with significant updates, including the transformative upgrade of Knowhere 2.0. Key features of Knowhere 2.0 include support for GPU indexes, Cosine similarity, ScaNN index, ARM architecture, range search, optimized filter queries, code structure and compilation enhancements, MMap support, and retrieval of original vectors. These improvements aim to elevate Milvus's performance and user experience in vector databases.
How to Choose A Vector Database: Weaviate Cloud vs. Zilliz Cloud
Date published
Sept. 21, 2023
Author(s)
Steffi Li
Language
English
Word count
1234
Hacker News points
None found.
This blog compares two vector databases, Weaviate and Zilliz/Milvus. While both are designed to manage vector data, they serve different needs. Weaviate is a strong choice for developers seeking quick and straightforward implementation, while Zilliz/Milvus excels in handling large-scale, high-performance, low-latency applications. The benchmark results show that Zilliz Cloud outperforms Weaviate Cloud in terms of queries per second (QPS), queries per dollar (QP$), and latency. Furthermore, a feature comparison reveals differences in scalability, functionality, and purpose-built features between the two vector databases.
Chat Towards Data Science: Building a Chatbot with Zilliz Cloud
Date published
Sept. 20, 2023
Author(s)
Yujian Tang
Language
English
Word count
2347
Hacker News points
1
In the first part of the Chat Towards Data Science blog series, we guide you through building a chatbot using your dataset as the knowledge backbone. We employ web scraping techniques to collect data for our knowledge base and store it in Zilliz Cloud, a fully managed vector database service built on Milvus. The tutorial covers creating a chatbot for the Towards Data Science publication, demonstrating how to prompt the user for a query, vectorize the query, and query the vector database. However, we discovered that while the results are semantically similar, they are not exactly what we desire. In the next part of this blog series, we will explore using LlamaIndex to route queries and see if we can achieve better results.
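The query step the post walks through reduces to a few lines of PyMilvus; this sketch assumes a collection and field names invented for illustration, with a local embedding model standing in for whichever model the tutorial uses.

```python
# Minimal sketch: embed the user's question and search the collection.
from pymilvus import connections, Collection
from sentence_transformers import SentenceTransformer

connections.connect(uri='YOUR_ZILLIZ_ENDPOINT', token='YOUR_API_KEY')
collection = Collection('tds_articles')        # hypothetical collection
model = SentenceTransformer('all-MiniLM-L6-v2')

query_vec = model.encode('What is a vector database?').tolist()
results = collection.search(
    data=[query_vec],
    anns_field='embedding',                    # assumed vector field name
    param={'metric_type': 'L2', 'params': {'nprobe': 10}},
    limit=3,
    output_fields=['paragraph'])               # assumed scalar field

for hit in results[0]:
    print(hit.distance, hit.entity.get('paragraph'))
```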
Getting Started with Pgvector: A Guide for Developers Exploring Vector Databases
Date published
Sept. 15, 2023
Author(s)
Siddhant Varma
Language
English
Word count
2072
Hacker News points
None found.
This guide explores the use of Pgvector, an extension of PostgreSQL that allows developers to store and query vector data. It covers setting up Pgvector, integrating it with PostgreSQL, using it for similarity searches, understanding its indexes and limitations, and comparing it with dedicated vector databases like Milvus and Zilliz. The article also discusses the advantages of using dedicated vector databases over traditional relational databases and provides benchmarking results to help developers choose the best solution for their projects.
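The core pgvector operations the guide covers fit in a few SQL statements; here they are driven from Python with psycopg2, with the table and vectors invented for illustration.

```python
# Hedged sketch of pgvector basics: enable the extension, store vectors,
# and run a nearest-neighbor query with the '<->' L2 distance operator.
import psycopg2

conn = psycopg2.connect('dbname=demo user=postgres')  # assumed local DB
cur = conn.cursor()

cur.execute('CREATE EXTENSION IF NOT EXISTS vector;')
cur.execute('CREATE TABLE IF NOT EXISTS items '
            '(id bigserial PRIMARY KEY, embedding vector(3));')
cur.execute("INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');")

# Nearest neighbors to [3,1,2], closest first.
cur.execute("SELECT id, embedding <-> '[3,1,2]' AS distance "
            "FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;")
print(cur.fetchall())
conn.commit()
```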
Comparing Llama 2 Chat and ChatGPT: How They Perform in Question Answering
Date published
Sept. 13, 2023
Author(s)
Towhee team
Language
English
Word count
2113
Hacker News points
2
Meta AI has released its open-source large language model (LLM), Llama 2, which is available for free use in commercial applications. It comes in three sizes and supports context lengths of up to 4096 tokens. Llama 2 Chat, the fine-tuned version of Llama 2, has been trained on over 1 million human annotations and is specifically tailored for conversational AI scenarios. The performance of Llama 2 in answering questions was compared with that of ChatGPT, showing that both models excel at answering questions based on real-world knowledge. However, Llama 2 struggles to maintain answer quality when confronted with complex text formatting. Llama 2 stands out by not requiring high-end GPUs; it can run smoothly on desktop-level GPUs, especially after low-bit quantization.
An Engineering Perspective: Why Milvus is a Compelling Option for Your Apps?
Date published
Sept. 10, 2023
Author(s)
Owen jiao
Language
English
Word count
688
Hacker News points
None found.
Milvus 2.3, the latest version of the pioneering vector database, offers numerous enhancements and new features that make it an excellent choice for users looking to build applications ranging from recommendation systems and chatbots to artificial general intelligence (AGI) and retrieval augmented generation (RAG). The updated version balances performance, cost, and scalability while providing multiple deployment options. It also empowers developers with simplicity by enhancing its API and supporting data integration with other products. Furthermore, Milvus 2.3 ensures stability and second-level availability through improved system reliability features. Future updates will introduce additional cutting-edge features to enhance the user experience further.
How to Choose A Vector Database: Elastic Cloud vs. Zilliz Cloud
Date published
Sept. 5, 2023
Author(s)
Chris Churilo
Language
English
Word count
1221
Hacker News points
None found.
This blog compares Elastic Cloud and Zilliz Cloud, two vector database cloud services. It delves into benchmarks to offer a performance perspective and performs an in-depth feature analysis of both platforms. The results show that Zilliz outperforms Elastic Cloud in terms of QPS, queries per dollar (QP$), and latency. Additionally, the blog highlights the features of each platform, such as scalability, multi-tenancy, data isolation, API support, and user interface/administrative console. It also provides a migration tutorial for moving from Elasticsearch to Zilliz Cloud.
What’s New in Milvus 2.3
Date published
Aug. 30, 2023
Author(s)
Steffi Li
Language
English
Word count
364
Hacker News points
None found.
Milvus 2.3.0 has been released, featuring numerous enhancements and improvements. Key features include computational upgrades with GPU & ARM64 support, search & indexing enhancements such as range search and ScaNN index integration, data pipeline tools like iterator in Pymilvus and upsert operation, and system optimizations for better operability, load balancing, and query performance. The release also includes bug fixes and updates to existing tools like Birdwatcher and Attu. Developers are encouraged to integrate these updates and provide feedback.
Building LLM Apps with 100x Faster Responses and Drastic Cost Reduction Using GPTCache
Date published
Aug. 28, 2023
Author(s)
Fendy Feng
Language
English
Word count
1461
Hacker News points
None found.
The article discusses the challenges developers face when building applications on large language models (LLMs), such as the high cost of API calls and poor performance due to response latency. It introduces GPTCache, an open-source semantic cache designed to improve the efficiency and speed of GPT-based applications. GPTCache stores LLM responses in a cache, allowing users to retrieve previously requested answers without calling the LLM again. The article explains how GPTCache works and its benefits, including drastic cost reduction, faster response times, improved scalability, and better availability. It also provides an example of OSS Chat, an AI chatbot that utilizes GPTCache and the CVP stack for more accurate results.
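GPTCache's drop-in pattern, per its documented quickstart, looks roughly like this; the cache here is the default exact-match setup, with the semantic configuration left to the library's docs.

```python
# Minimal sketch based on GPTCache's quickstart: the adapter mirrors the
# OpenAI client and checks the cache before calling the LLM.
from gptcache import cache
from gptcache.adapter import openai

cache.init()            # default cache; semantic setup takes a data manager
cache.set_openai_key()  # reads OPENAI_API_KEY from the environment

response = openai.ChatCompletion.create(
    model='gpt-3.5-turbo',
    messages=[{'role': 'user', 'content': 'What is a vector database?'}])
print(response['choices'][0]['message']['content'])
# Asking the same (or, with semantic caching, a similar) question again
# is now served from the cache instead of a paid API call.
```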
Comparing Different Vector Embeddings
Date published
Aug. 21, 2023
Author(s)
Yujian Tang
Language
English
Word count
2436
Hacker News points
None found.
This article discusses the differences between vector embeddings generated by different neural networks and how to evaluate them in Jupyter Notebook. Vector embeddings are numerical representations of unstructured data, such as images, videos, audio, text, and molecular images. They are generated by running input data through a pre-trained neural network and taking the output of the second-to-last layer. The article provides an example of comparing vector embeddings from three different multilingual models based on MiniLM from Hugging Face using L2 distance metric and an inverted file index as the vector index. It also demonstrates how to compare vector embeddings directly in a Jupyter Notebook with Milvus Lite, a lightweight version of Milvus.
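A stripped-down version of the comparison looks like the sketch below: embed the same sentence pair with two different models and compare L2 distances within each model's own embedding space (vectors from different models are not directly comparable).

```python
# Hedged sketch: same sentences, two models, per-model L2 distances.
import numpy as np
from sentence_transformers import SentenceTransformer

s1 = 'A dog plays in the park.'
s2 = 'A puppy runs on the grass.'

for name in ('paraphrase-multilingual-MiniLM-L12-v2', 'all-MiniLM-L6-v2'):
    model = SentenceTransformer(name)
    v1, v2 = model.encode([s1, s2])
    print(name, np.linalg.norm(v1 - v2))  # smaller = judged more similar
```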
How to Build an AI Chatbot with Milvus and Towhee
Date published
Aug. 18, 2023
Author(s)
Eric Goebelbecker
Language
English
Word count
2364
Hacker News points
None found.
In this tutorial, we will create an intelligent chatbot using Milvus and Towhee, built from the following components:

1. Milvus: An open-source vector database for efficient similarity search and AI applications.
2. Towhee: A Python library that provides a set of pre-built machine learning models and tools for processing unstructured data.
3. OpenAI API: A service that allows developers to access powerful language generation models like GPT-3.5.
4. Gradio: An open-source Python library for creating interactive demos of machine learning models.

First, we need to install the required packages:

```bash
pip install milvus pymilvus towhee gradio
```

Next, let's define some variables and answer the prompt for the API key. Run this code to do so:

```python
import os
import getpass

MILVUS_URI = 'http://localhost:19530'
[MILVUS_HOST, MILVUS_PORT] = MILVUS_URI.split('://')[1].split(':')

DROP_EXIST = True
EMBED_MODEL = 'all-mpnet-base-v2'
COLLECTION_NAME = 'chatbot_demo'
DIM = 768

OPENAI_API_KEY = getpass.getpass('Enter your OpenAI API key: ')

# Start from a clean chat-history database
if os.path.exists('./sqlite.db'):
    os.remove('./sqlite.db')
```

Sample pipeline

Now, let's download some data and store it in Milvus. But before you do that, let's look at a sample pipeline for downloading and processing unstructured data. You'll use the Towhee documentation pages for this example; you can try different sites to see how the code processes different data sets. This code uses Towhee pipelines:

- input - begins a new pipeline with the source passed into it
- map - uses ops.text_loader() to retrieve the URL and map it to 'doc'
- flat_map - uses ops.text_splitter() to process the document into "chunks" for storage
- output - closes and prepares the pipeline for use

Pass this pipeline to DataCollection to see how it works:

```python
from towhee import pipe, ops, DataCollection

pipe_load = (
    pipe.input('source')
        .map('source', 'doc', ops.text_loader())
        .flat_map('doc', 'doc_chunks', ops.text_splitter(chunk_size=300))
        .output('source', 'doc_chunks')
)

DataCollection(pipe_load('https://towhee.io')).show()
```

The output from show() confirms that the pipeline created five chunks from the document.

Sample embedding pipeline

The pipeline retrieved the data and created chunks, but you need to create embeddings, too. Let's take a look at another sample pipeline. This one uses map() to run ops.sentence_embedding.sbert() on each chunk; in this example, we're passing in a single block of text.

```python
from towhee import pipe, ops, DataCollection

pipe_embed = (
    pipe.input('doc_chunk')
        .map('doc_chunk', 'vec', ops.sentence_embedding.sbert(model_name=EMBED_MODEL))
        .map('vec', 'vec', ops.np_normalize())
        .output('doc_chunk', 'vec')
)

text = '''SOTA Models

We provide 700+ pre-trained embedding models spanning 5 fields (CV, NLP, Multimodal, Audio, Medical), 15 tasks, and 140+ model architectures.
These include BERT, CLIP, ViT, SwinTransformer, data2vec, etc.
'''

DataCollection(pipe_embed(text)).show()
```

Running this code shows how the pipeline embeds the single text block.

Prepare Milvus

Now, you need a collection to hold the data.
This function defines create_collection(), which uses MILVUS_HOST and MILVUS_PORT to connect to Milvus, drop any existing collection with the specified name, and create a new one with this schema:

- id - an integer identifier
- embedding - a vector of floats for the embeddings
- text - the corresponding text for the embeddings

```python
from pymilvus import (
    connections, utility, Collection,
    CollectionSchema, FieldSchema, DataType
)


def create_collection(collection_name):
    connections.connect(host=MILVUS_HOST, port=MILVUS_PORT)

    has_collection = utility.has_collection(collection_name)
    if has_collection:
        collection = Collection(collection_name)
        if DROP_EXIST:
            collection.drop()
        else:
            return collection

    # Create the collection
    fields = [
        FieldSchema(name='id', dtype=DataType.INT64, is_primary=True, auto_id=True),
        FieldSchema(name='embedding', dtype=DataType.FLOAT_VECTOR, dim=DIM),
        FieldSchema(name='text', dtype=DataType.VARCHAR, max_length=500)
    ]
    schema = CollectionSchema(
        fields=fields,
        description="Towhee demo",
        enable_dynamic_field=True
    )
    collection = Collection(name=collection_name, schema=schema)

    index_params = {
        'metric_type': 'IP',
        'index_type': 'IVF_FLAT',
        'params': {'nlist': 1024}
    }
    collection.create_index(
        field_name='embedding',
        index_params=index_params
    )
    return collection
```

Insert pipeline

It's time to process your input text and insert it into Milvus. Let's start with a pipeline that combines what you learned above. This function:

- Creates the new collection
- Retrieves the data
- Splits it into chunks
- Creates embeddings using EMBED_MODEL
- Inserts the text and embeddings into Milvus

```python
from towhee import pipe, ops, DataCollection

load_data = (
    pipe.input('collection_name', 'source')
        .map('collection_name', 'collection', create_collection)
        .map('source', 'doc', ops.text_loader())
        .flat_map('doc', 'doc_chunk', ops.text_splitter(chunk_size=300))
        .map('doc_chunk', 'vec', ops.sentence_embedding.sbert(model_name=EMBED_MODEL))
        .map('vec', 'vec', ops.np_normalize())
        .map(('collection_name', 'vec', 'doc_chunk'), 'mr',
             ops.ann_insert.osschat_milvus(host=MILVUS_HOST, port=MILVUS_PORT))
        .output('mr')
)
```

Here it is in action:

```python
project_name = 'towhee_demo'
data_source = 'https://en.wikipedia.org/wiki/Frodo_Baggins'

mr = load_data(COLLECTION_NAME, data_source)
print('Doc chunks inserted:', len(mr.to_list()))
```

Search knowledge base

Now, with the embeddings and text stored in Milvus, you can search it. This function creates a query pipeline; the most important step is this one:

ops.ann_search.osschat_milvus(host=MILVUS_HOST, port=MILVUS_PORT, **{'metric_type': 'IP', 'limit': 3, 'output_fields': ['text']})

The osschat_milvus operator searches the embeddings for matches to the submitted text. Here is the whole pipeline:

```python
from towhee import pipe, ops, DataCollection

pipe_search = (
    pipe.input('collection_name', 'query')
        .map('query', 'query_vec', ops.sentence_embedding.sbert(model_name=EMBED_MODEL))
        .map('query_vec', 'query_vec', ops.np_normalize())
        .map(('collection_name', 'query_vec'), 'search_res',
             ops.ann_search.osschat_milvus(host=MILVUS_HOST, port=MILVUS_PORT,
                                           **{'metric_type': 'IP', 'limit': 3,
                                              'output_fields': ['text']}))
        .flat_map('search_res', ('id', 'score', 'text'), lambda x: (x[0], x[1], x[2]))
        .output('query', 'text', 'score')
)
```

Try it:

```python
query = 'Who is Frodo Baggins?'
DataCollection(pipe_search(COLLECTION_NAME, query)).show()
```

The model does a good job of pulling three closely matched nodes.

Add an LLM

Now, it's time to add a large language model (LLM) so users can hold a conversation with the chatbot. We'll use ChatGPT and the OpenAI API for this example.

Chat history

To get better results from the LLM, you need to store chat history and present it with queries. You'll use SQLite for this step. Here's a function for retrieving the history:

```python
from towhee import pipe, ops, DataCollection

pipe_get_history = (
    pipe.input('collection_name', 'session')
        .map(('collection_name', 'session'), 'history',
             ops.chat_message_histories.sql(method='get'))
        .output('collection_name', 'session', 'history')
)
```

Here's the one to store it:

```python
from towhee import pipe, ops, DataCollection

pipe_add_history = (
    pipe.input('collection_name', 'session', 'question', 'answer')
        .map(('collection_name', 'session', 'question', 'answer'), 'history',
             ops.chat_message_histories.sql(method='add'))
        .output('history')
)
```

LLM query pipeline

Now, we need a pipeline to submit queries to ChatGPT. This pipeline:

- Searches Milvus using the user's query
- Collects the current chat history
- Submits the query, Milvus search results, and chat history to ChatGPT
- Appends the ChatGPT result to the chat history
- Returns the result to the caller

```python
from towhee import pipe, ops, DataCollection

chat = (
    pipe.input('collection_name', 'query', 'session')
        .map('query', 'query_vec', ops.sentence_embedding.sbert(model_name=EMBED_MODEL))
        .map('query_vec', 'query_vec', ops.np_normalize())
        .map(('collection_name', 'query_vec'), 'search_res',
             ops.ann_search.osschat_milvus(host=MILVUS_HOST, port=MILVUS_PORT,
                                           **{'metric_type': 'IP', 'limit': 3,
                                              'output_fields': ['text']}))
        .map('search_res', 'knowledge', lambda y: [x[2] for x in y])
        .map(('collection_name', 'session'), 'history',
             ops.chat_message_histories.sql(method='get'))
        .map(('query', 'knowledge', 'history'), 'messages',
             ops.prompt.question_answer())
        .map('messages', 'answer',
             ops.LLM.OpenAI(api_key=OPENAI_API_KEY,
                            model_name='gpt-3.5-turbo',
                            temperature=0.8))
        .map(('collection_name', 'session', 'query', 'answer'), 'new_history',
             ops.chat_message_histories.sql(method='add'))
        .output('query', 'history', 'answer')
)
```

Let's test this pipeline before connecting it to a GUI:

```python
session_id = 'sess_test'  # any session identifier works for a quick test

new_query = 'Where did Frodo take the ring?'
DataCollection(chat(COLLECTION_NAME, new_query, session_id)).show()
```

The pipeline works. Let's put together a Gradio interface.

Gradio GUI

First, you need functions to create a session identifier and to respond to queries from the interface. These functions create a session ID using a UUID, and accept a session and query for the query pipeline:

```python
import uuid


def create_session_id():
    uid = str(uuid.uuid4())
    suid = ''.join(uid.split('-'))
    return 'sess_' + suid


def respond(session, query):
    res = chat(COLLECTION_NAME, query, session).get_dict()
    answer = res['answer']
    response = res['history']
    response.append((query, answer))
    return response
```

Next, the Gradio interface uses these functions to build a chatbot. It uses the Blocks API to create a ChatBot interface.
The Send Message button uses the respond function to send requests to ChatGPT:

```python
import gradio as gr

with gr.Blocks() as demo:
    session_id = gr.State(create_session_id)

    with gr.Row():
        with gr.Column(scale=2):
            gr.Markdown('''## Chat''')
            conversation = gr.Chatbot(label='conversation').style(height=300)
            question = gr.Textbox(label='question', value=None)

            send_btn = gr.Button('Send Message')
            send_btn.click(
                fn=respond,
                inputs=[
                    session_id,
                    question
                ],
                outputs=conversation,
            )

demo.launch(server_name='127.0.0.1', server_port=8902)
```

Launching the demo opens the chat interface; now, you have an intelligent chatbot!

Summary

In this post, we created Towhee pipelines to ingest unstructured data, process it into embeddings, and store those embeddings in Milvus. Then, we created a query pipeline for the chat function and connected the chatbot with an LLM, giving us an intelligent chatbot. This tutorial demonstrates how easy it is to build applications with Milvus. Milvus brings numerous advantages when integrated into applications, especially those relying on machine learning and artificial intelligence. It offers highly efficient, scalable, and reliable vector similarity search and analytics capabilities critical in applications like chatbots, recommendation systems, and image or text recognition.
Building LLM Augmented Apps with Zilliz Cloud
Date published
Aug. 17, 2023
Author(s)
Steffi Li
Language
English
Word count
1272
Hacker News points
None found.
The release of GPT-3.5 and GPT-4 has revolutionized how users interact with data and applications, providing more natural and intuitive communication interfaces. However, implementing LLMs like ChatGPT in applications presents challenges such as lack of private data access, hallucination, outdated information, high costs, slow performance, and immutable pre-training data. Zilliz Cloud and GPTCache are innovative solutions that address these issues by improving accuracy, timeliness, cost-efficiency, and performance. The CVP Stack (ChatGPT/LLMs + a vector database + prompt-as-code) offers a robust framework for building LLM applications. OSS Chat is an example of a successful AI chatbot built with the CVP stack using Akcio and Zilliz Cloud. To learn more about these technologies, join the upcoming webinar on September 7.
Using AI to Find Your Celebrity Stylist (Part II)
Date published
Aug. 11, 2023
Author(s)
Yujian Tang
Language
English
Word count
2528
Hacker News points
1
In this tutorial, we extended our first celebrity-style project by using Milvus' new dynamic schema, filtering out certain segmentation IDs, and keeping track of the bounding boxes of our matches. We also sorted our search results to return the top three results based on the number of matches. Milvus' new dynamic schema allows us to add extra fields when we upload data using a dictionary format, changing the way we were initially batch-uploading a list of lists. It also facilitated adding crop coordinates without changing the schema. As a new preprocessing step, we filtered out certain IDs that aren't clothing-related based on the model card on Hugging Face; we filter these IDs out in the get_masks function. Fun fact: the obj_ids object in that function is actually a tensor. We also kept track of the bounding boxes, moving the embedding step into the image-cropping function and returning the embeddings with the bounding boxes and segmentation IDs. Then, we saved these embeddings into Milvus using a dynamic schema. At query time, we aggregated all the returned images by the number of bounding boxes they contained, allowing us to find the closest matching celebrity image via different articles of clothing. Now it's up to you. You can take my suggestions and make something else out of it, such as a fashion recommender system, a better style comparison system for you and your friends, or a generative fashion AI app.
Using AI to Find Your Celebrity Stylist (Part I)
Date published
Aug. 8, 2023
Author(s)
Yujian Tang
Language
English
Word count
2587
Hacker News points
None found.
The article discusses the use of AI in fashion, specifically focusing on a project called "Fashion AI" that utilizes a fine-tuned model to segment clothing in images. It explains how the project involves cropping out each labeled article and resizing the images to the same size before storing the embeddings generated from those images in Milvus, an open-source vector database. The article also provides detailed steps on how to generate image segmentation for fashion items, add your image data to Milvus, and find out which celebrity your dress is most like using this technology.
Zilliz Cloud Expands to AWS and GCP Singapore
Date published
Aug. 7, 2023
Author(s)
Steffi Li
Language
English
Word count
338
Hacker News points
None found.
Zilliz Cloud has expanded its services to AWS and GCP Singapore regions, following the positive response from users since its launch in April 2023. This expansion aims to meet increasing demand and provide greater flexibility for customers by offering more deployment options. As a result, Zilliz Cloud is now the first fully managed vector database available on AWS in the APAC region. The company invites users to explore new possibilities with this expansion and offers free trials of its Starter Plan and Standard plan with up to $200 worth of credits.
Retrieval Augmented Generation with Citations
Date published
Aug. 4, 2023
Author(s)
Yujian Tang
Language
English
Word count
1209
Hacker News points
2
This tutorial explains how to implement retrieval augmented generation (RAG) with citations using LlamaIndex and Milvus. RAG is a technique used in large language model (LLM) applications to supplement their knowledge, addressing the lack of up-to-date or domain-specific information. The process involves using a vector database like Milvus to inject knowledge into an app. Citations and attributions are crucial for determining trustworthy answers as more data is added. LlamaIndex and Milvus can be used together to create a citation query engine, allowing users to retrieve information with citations or attributions. The tutorial demonstrates this process using Python libraries and provides code examples for scraping data from Wikipedia, setting up the vector store in LlamaIndex, and querying the engine with citations.
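In code, the citation flow reduces to a few lines; this sketch assumes the llama_index APIs of that period (roughly 0.8.x) and a local ./data folder of documents, so module paths may differ in later releases.

```python
# Hedged sketch: a citation query engine over a local document set.
from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index.query_engine import CitationQueryEngine

documents = SimpleDirectoryReader('./data').load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = CitationQueryEngine.from_args(
    index,
    similarity_top_k=3,
    citation_chunk_size=512)    # granularity of the cited passages

response = query_engine.query('What does the source say about Milvus?')
print(response)                           # answer with [1], [2]-style markers
for node in response.source_nodes:        # the cited passages themselves
    print(node.node.get_text()[:120])
```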
What Is a Real Vector Database?
Date published
Aug. 3, 2023
Author(s)
Fendy Feng
Language
English
Word count
1135
Hacker News points
None found.
The emergence of ChatGPT has signaled the start of a new era in artificial intelligence (AI), with vector databases becoming an essential infrastructure. Vector databases store and retrieve unstructured data such as images, audio, videos, and text through high-dimensional numerical representations called embeddings. They are frequently used for similarity searches using Approximate Nearest Neighbor (ANN) algorithms. Specialized vector databases like Milvus and Zilliz Cloud offer many user-friendly features and are a more optimal solution for unstructured data storage and retrieval than vector search libraries. Vector databases are becoming vital infrastructure for AI-related tech stacks, powering LLM augmentation, recommender systems, image/audio/video/text similarity searches, anomaly detection, question-answering systems, and molecular similarity searches. To choose the most suitable vector database for your project, VectorDBBench is an open-source benchmarking tool that evaluates various vector database systems on QPS, latency, capacity, and other metrics.
Zilliz Cloud: a Fully-Managed Vector Database That Minimizes Users’ Costs for Building AI Apps
Date published
Aug. 1, 2023
Author(s)
James Luan
Language
English
Word count
1037
Hacker News points
None found.
Zilliz Cloud, a fully-managed vector database, aims to minimize users' costs for building AI applications. The latest release of Zilliz Cloud offers new features such as partition key, dynamic schema, and JSON support, making it more accessible and affordable for developers. By minimizing development, hardware, and maintenance costs, Zilliz Cloud enables traditional companies and startups to create innovative AI applications. Future updates will introduce unstructured data processing pipelines, support for complex aggregation functions, and global expansion of services.
Getting Started With the Milvus JavaScript Client
Date published
July 28, 2023
Author(s)
Eric Goebelbecker
Language
English
Word count
1833
Hacker News points
None found.
Milvus is an open-source database designed for vector search, offering robust scalability for various loads. It's ideal for machine learning deployments and includes best-of-class tooling like the JavaScript client. In this tutorial, we'll guide you through setting up a development environment with Milvus Lite and the Milvus Node.js SDK (client). We'll cover connecting to a server, creating databases and collections, inserting data, performing queries and searches, and more. With these tools, working with vector data in JavaScript using Milvus becomes simple and efficient.
Breaking Barriers: Democratizing Access to Vector Databases for All
Date published
July 27, 2023
Author(s)
Fendy Feng
Language
English
Word count
1340
Hacker News points
None found.
Vector databases, crucial infrastructure for AI applications and large language models (LLMs), have gained widespread attention from a broader user base. Unlike traditional relational or NoSQL databases that store structured data, vector databases are purpose-built to store and manage unstructured data in numeric representations called embeddings. They enable similarity searches using the approximate nearest neighbor (ANN) algorithm, making them valuable for various use cases such as recommender systems, anomaly detection, and question-and-answer systems. The democratization of vector databases is essential to make progress in AI technology. However, not all developers have equal access, due to barriers like proprietary technology, complex architecture and deployment, high costs, and poor user experience. To improve vector database democratization, it's crucial to evangelize knowledge, expertise, and technologies; open the source code to all developers; provide fully managed vector database services; offer free cloud options for individual developers and small teams; and prioritize a great user experience that meets users' needs. Choosing the right vector database for your project can be challenging due to the many available options. VectorDBBench, an open-source benchmarking tool, thoroughly evaluates and compares different vector database systems based on critical metrics such as queries per second (QPS), latency, throughput, and capacity.
Yujian Tang: Why I Joined Zilliz as Developer Advocate
Date published
July 26, 2023
Author(s)
Yujian Tang
Language
English
Word count
706
Hacker News points
None found.
Yujian Tang, a developer advocate at Zilliz, has an extensive background in computer science, statistics, and neuroscience. He previously worked as a software engineer at Amazon and researched machine learning. Tang chose to join Zilliz due to their focus on vector databases and the company's commitment to open-source ethos. As a developer advocate, he works with cutting-edge AI technologies and hosts meetups and conferences. He encourages others interested in DevRel roles to consider joining Zilliz.
Getting Started with the Zilliz REST API
Date published
July 25, 2023
Author(s)
Eric Goebelbecker
Language
English
Word count
1752
Hacker News points
None found.
Zilliz Cloud is a comprehensive vector database service that accelerates AI and analytics applications at scale. It's built on Milvus, an open-source vector database capable of handling billions of vector embeddings. The use cases for Milvus and Zilliz Cloud are broad and varied, including powering recommendation systems and building AI models in healthcare. The Zilliz REST API provides methods for managing clusters, collections, and vector data, allowing users to create, list, describe, drop, insert, delete, query, and search collections.
Zilliz Cloud: Igniting Vector Searching with Rocket-Like Speed
Date published
July 19, 2023
Author(s)
Li Liu
Language
English
Word count
978
Hacker News points
None found.
Zilliz recently launched an updated version of its cloud platform, introducing new features such as a free tier, dynamic schema and partition keys, and more affordable pricing plans. The latest update has significantly improved performance, making it twice as fast as the previous version and three to ten times faster than other vector databases like Milvus. Zilliz Cloud's speed is attributed to its robust vector indexing engine, optimized code structure, and AutoIndex feature for stable recall rates.
Frank Liu: Why I Joined a Vector Database Company
Date published
July 18, 2023
Author(s)
Frank Liu
Language
English
Word count
928
Hacker News points
None found.
The text discusses the importance of machine learning models and their corresponding embeddings, which are high-dimensional vectors that provide an abstract way to represent input data in the model. It explains how embeddings have been used in various applications such as image recognition and semantic search. The author shares their personal journey working with embeddings and vector search, highlighting their experiences at Yahoo and a startup they founded. They also discuss Zilliz's mission to build an affordable and scalable vector search solution for the enterprise AI infrastructure market. The text ends by inviting readers to join Zilliz in its efforts to democratize enterprise AI infrastructure.
What's New in Milvus 2.2.10 and 2.2.11
Date published
July 14, 2023
Author(s)
Steffi Li
Language
English
Word count
291
Hacker News points
None found.
Milvus has released versions 2.2.10 and 2.2.11, which include enhancements to improve functionality and user experience. Updates have been made based on community feedback, with a focus on performance and security improvements. The latest versions introduce the 'FlushAll' function and Database API for RBAC capabilities, optimize disk usage for RocksMq by enabling zstd compression, and replace CGO payload writer with Go payload writer to reduce memory usage. Additionally, several bug fixes and performance enhancements have been made in these releases.
Democratizing Vector Databases: Empowering Access & Equality
Date published
July 12, 2023
Author(s)
Yujian Tang
Language
English
Word count
1040
Hacker News points
None found.
The democratization of technology refers to making it widely available and accessible, particularly in the context of software engineering. This involves using one's knowledge to simplify the creation, adoption, and understanding of technological advances for others. In this article, the author discusses the process of democratizing vector databases, which are complex tools that have traditionally only been available to developers at large enterprises. The author highlights three pillars of technology democratization: education, increasing accessibility, and evangelism. By open-sourcing projects like Milvus, providing educational resources, and offering free tiers for cloud services, companies can help expand the adoption of vector databases and other advanced technologies.
Filip Haltmayer: Why I Joined Zilliz as Software Engineer
Date published
July 10, 2023
Author(s)
Filip Haltmayer
Language
English
Word count
701
Hacker News points
None found.
Filip Haltmayer, a software engineer at Zilliz in Redwood City, California, shares his journey into the company that leads in AI and vector search technology. His passion for software engineering led him to focus on distributed systems and machine learning during university. After graduation, he worked on personal projects in these areas before joining Zilliz. The technical interview with Zilliz aligned with his interests, and he was impressed by the team's intelligence and shared passion for pushing boundaries in vector search technology. Two years later, Haltmayer remains happy at Zilliz as it continues to grow and contribute significantly to the field of AI and vector searching.
Getting Started with PyMilvus
Date published
July 7, 2023
Author(s)
Eric Goebelbecker
Language
English
Word count
1806
Hacker News points
None found.
Milvus, an open-source vector database, paired with PyMilvus, its Python SDK, is a powerful tool for handling large data sets and performing advanced computations and searches. This tutorial guides you through installing and setting up a development environment for Milvus and PyMilvus. It then walks through example code for analyzing audio files, storing their data in Milvus, and using it to compare audio samples for similarities. The setup includes creating a virtual environment, installing Python dependencies, starting Redis, and installing and starting Milvus Lite. Finally, the tutorial demonstrates how to connect to Redis and Milvus, create a collection, store audio data, and search for similarities.
Setting Up With Facebook AI Similarity Search (FAISS)
Date published
July 4, 2023
Author(s)
Keshav Malik
Language
English
Word count
2231
Hacker News points
None found.
Facebook's AI Similarity Search (FAISS) is a library that provides efficient and reliable solutions to similarity search problems, especially when dealing with large-scale data. It functions on the concept of "vector similarity" and can handle millions or even billions of vectors quickly and accurately. FAISS has various applications, from image recognition and text retrieval to clustering and data analysis. To set up FAISS, you need Conda installed on your system. Once installed, FAISS can be used for tasks such as searching for similar text data in the Stanford Question Answering Dataset (SQuAD). Best practices include understanding your data, choosing the right index, preprocessing your data effectively, batching your queries, and tuning your parameters. Compared to FAISS, purpose-built vector databases like Milvus offer more advanced capabilities for scalable similarity search and AI applications.
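The brute-force flow at the heart of FAISS fits in a dozen lines; this self-contained example uses random vectors in place of real SQuAD embeddings.

```python
# Exact (flat) L2 search with FAISS: index a batch of vectors, then
# retrieve the nearest neighbors of each query vector.
import numpy as np
import faiss

d = 128                                                 # vector dimensionality
xb = np.random.random((10_000, d)).astype('float32')    # database vectors
xq = np.random.random((5, d)).astype('float32')         # query vectors

index = faiss.IndexFlatL2(d)      # exact search; no training required
index.add(xb)                     # add the database vectors
distances, ids = index.search(xq, 4)   # 4 nearest neighbors per query
print(ids)                        # shape (5, 4): row ids of the neighbors
```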
Webinar Recap: Retrieval Techniques for Accessing the Most Relevant Context for LLM Applications
Date published
July 3, 2023
Author(s)
Fendy Feng
Language
English
Word count
1635
Hacker News points
None found.
In a recent webinar, Harrison Chase and Filip Haltmayer discussed retrieval techniques for accessing the most relevant context for large language model (LLM) applications. Retrieval involves extracting information from connected external sources and incorporating it into queries to provide context. Semantic search is one of the most critical use cases for retrieval, which functions within a typical CVP architecture (ChatGPT+Vector store+Prompt as code). The webinar also covered edge cases of semantic searches, such as repeated information, conflicting information, temporality, metadata querying, and multi-hop questions. Various solutions to these challenges were proposed during the discussion.
How to Select the Most Appropriate CU Type and Size for Your Business?
Date published
June 30, 2023
Author(s)
Robert Guo
Language
English
Word count
1164
Hacker News points
None found.
Zilliz Cloud offers three types of Compute Units (CUs): Performance-optimized, Capacity-optimized, and Cost-optimized. The Performance-optimized CU is ideal for applications that demand rapid response times and high throughput, such as generative AI, recommender systems, search engines, chatbots, content moderation, augmenting LLMs' knowledge bases, and anti-fraud systems. Capacity-optimized CUs are suitable for large-scale searches over unstructured data like text, images, videos, and molecular structures, as well as copyright-violation detection and identity verification. Cost-optimized CUs are a good fit for offline tasks where budgets are tight and higher search latency is acceptable. The performance comparison shows that the Performance-optimized CU outperforms the others in latency and throughput, while capacity evaluation results indicate that Capacity-optimized and Cost-optimized CUs have equal capacities, five times larger than the Performance-optimized CU. Examples are provided to help businesses choose the most suitable option for their needs.
Persistent Vector Storage for LlamaIndex
Date published
June 27, 2023
Author(s)
Yujian Tang
Language
English
Word count
1040
Hacker News points
None found.
This article discusses the challenges and solutions in building applications using large language models (LLMs) such as OpenAI's ChatGPT. The three main challenges are high costs, lack of up-to-date information, and need for domain-specific knowledge. Two proposed frameworks to address these issues are fine-tuning and caching + injection. LlamaIndex is a powerful tool that can abstract much of the latter framework. The article introduces LlamaIndex as a "black box around your Data and an LLM" and explains its four main indexing patterns: list, vector store, tree, and keyword indices. It then demonstrates how to create and save a persistent vector index using LlamaIndex with both local and cloud vector databases (Milvus Lite and Zilliz). In summary, the article provides an overview of LlamaIndex, its applications in LLM-based applications, and offers guidance on creating and managing persistent vector store indices for real-world use cases.
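The Milvus-backed variant of that persistence looks roughly like the sketch below; it assumes the llama_index Milvus integration of that era, so constructor arguments (host/port versus a URI) may differ across versions.

```python
# Hedged sketch: a LlamaIndex vector store index persisted in Milvus.
from llama_index import VectorStoreIndex, SimpleDirectoryReader, StorageContext
from llama_index.vector_stores import MilvusVectorStore

documents = SimpleDirectoryReader('./data').load_data()

# Embeddings live in Milvus rather than in process memory, so the index
# survives restarts and can be shared across applications.
vector_store = MilvusVectorStore(host='localhost', port=19530,
                                 collection_name='llamaindex_demo')
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents,
                                        storage_context=storage_context)

print(index.as_query_engine().query('What does the document recommend?'))
```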
Enhancing ChatGPT's Intelligence and Efficiency: The Power of LangChain and Milvus
Date published
June 26, 2023
Author(s)
Silvia Chen
Language
English
Word count
2219
Hacker News points
None found.
The combination of LangChain and Milvus can enhance ChatGPT's intelligence and efficiency by harnessing vector stores' power. LangChain is a framework for developing applications powered by language models, while Milvus is a vector database that enables semantic search functionality. By integrating these tools, developers can create more reliable AI-Generated Content (AIGC) applications and address hallucination problems in ChatGPT. Additionally, using GPTCache and fine-tuning embedding models and prompts can improve the performance and search quality of AIGC applications.
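The integration pattern reduces to a few lines with LangChain's Milvus vector store; the document, chunk sizes, and connection details below are illustrative.

```python
# Hedged sketch: load a document into Milvus via LangChain, then run a
# semantic similarity search whose results can ground a ChatGPT prompt.
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Milvus

docs = TextLoader('state_of_the_union.txt').load()      # example document
chunks = CharacterTextSplitter(chunk_size=1000,
                               chunk_overlap=0).split_documents(docs)

vector_db = Milvus.from_documents(
    chunks, OpenAIEmbeddings(),
    connection_args={'host': 'localhost', 'port': '19530'})

for doc in vector_db.similarity_search('What was said about jobs?', k=2):
    print(doc.page_content[:120])
```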
The Philosophy Behind Zilliz Cloud’s Product Experience Optimization
The latest version of Zilliz Cloud introduces design optimizations to improve the product experience. Key updates include prioritizing ease of use, streamlining workflows with clear guidance, valuing user feedback, ensuring visually enjoyable experiences, and offering a smooth user journey. These enhancements aim to provide users with an intuitive interface and seamless navigation while using Zilliz Cloud's vector retrieval capabilities. The company encourages users to share their suggestions or ideas for further improvements through the support portal, LinkedIn, Twitter, or by contacting engineers directly.
Query Multiple Documents Using LlamaIndex, LangChain, and Milvus
Date published
June 19, 2023
Author(s)
Yujian Tang
Language
English
Word count
1974
Hacker News points
None found.
This tutorial demonstrates how to use Large Language Models (LLMs) like GPT in production by querying multiple documents using LlamaIndex, LangChain, and Milvus. The process involves setting up a Jupyter Notebook, building a Document Query Engine with LlamaIndex, starting the vector database, gathering documents, creating document indices in LlamaIndex, performing decomposable querying over your documents, comparing non-decomposed queries, and summarizing how to do multi-document querying using LlamaIndex. The use of decomposable queries allows for breaking down complex queries into simpler ones that can be answered by a single data source.
Improved Team Collaboration with Zilliz Cloud’s New Organizations and Roles Feature
Date published
June 16, 2023
Author(s)
Sarah Tang
Language
English
Word count
886
Hacker News points
None found.
Zilliz Cloud, a cloud service offering fast and scalable vector retrieval capabilities, has introduced the Organizations and Roles feature to simplify team access and permission management. The new feature includes three roles: Organization Owner, Organization Member, and Project Owner, each with unique access and permissions. This update aims to improve collaboration, security, and flexibility in users' workflows. To get started, users can sign up for a free account or log into their existing one, create an organization, invite new members, and manage billings collectively.
Introducing an Open Source Vector Database Benchmark Tool for Choosing the Ideal Vector Database for Your Project
The new open-source Vector Database Benchmark Tool is designed to help developers choose the ideal vector database for their projects. This tool enables users to measure performance across critical metrics and compare different options. Key features include flexibility, realistic workload simulation, interactive reports and visualization, and open-source community collaboration. VectorDBBench, written in Python, supports six vector databases: Milvus, Zilliz, Pinecone, Weaviate, Qdrant, and Elasticsearch. Users can download the tool from GitHub and install it using pip. The tool is actively maintained by a community of developers committed to improving its features and performance.
Zilliz Cloud Latest Update: A Game-Changer Bringing Elite Performance within Reach of All Developers
Date published
June 14, 2023
Author(s)
Robert Guo
Language
English
Word count
940
Hacker News points
1
Zilliz Cloud has released an update that introduces new features and more affordable pricing options, making it accessible to all developers regardless of budget. The latest release includes a free tier option with up to two collections handling 500,000 vectors each. Various pricing plans are available: Starter, Standard, Enterprise, and Self-hosted. A new Cost-Optimized CU offers the same storage capacity as the existing Capacity-Optimized CU but costs about 30% less. The Organizations and Roles feature enables users to manage team access and permissions easily. Zilliz Cloud now supports JSON data types, enabling users to store and manage JSON data alongside Approximate Nearest Neighbor (ANN) Search capabilities. Dynamic schema support is also available. A new benchmark tool, VectorDBBench, allows users to measure the performance of vector database solutions against other offerings in the market with their data.
Prompting in LangChain
Date published
June 12, 2023
Author(s)
Yujian Tang
Language
English
Word count
1472
Hacker News points
1
The recent emergence of large language models (LLMs) has introduced new tools, such as the LLM framework LangChain. This versatile tool offers features like different prompting methods, maintaining conversational context, and connecting to external tools. Prompting is a crucial task in building AI applications with LLMs, and this article extensively explores how to use LangChain for more complex prompts. The text covers:

1. Simple Prompts in LangChain: demonstrates the basic usage of LangChain prompting by creating a single prompt with the `PromptTemplate` object, and explains how to add an LLM and create an `LLMChain`.
2. Multi-Question Prompts: shows how to handle multiple questions within a single prompt using the same `PromptTemplate` object.
3. Few-Shot Learning with LangChain Prompts: introduces "few-shot learning," where users teach the AI how to behave by providing examples of desired responses, demonstrated with the `FewShotPromptTemplate`.
4. Token-Limiting Your LangChain Prompts: explains how to use the `LengthBasedExampleSelector` object to limit tokens in queries and keep costs down.
5. A Summary of Prompting in LangChain: concludes by summarizing the key points covered in the article.
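Two of those prompting styles, condensed into a runnable sketch against the legacy LangChain APIs the article is written for:

```python
# Simple and few-shot prompt templates in (legacy) LangChain.
from langchain import PromptTemplate, FewShotPromptTemplate

# 1. A simple single-variable prompt.
simple = PromptTemplate(
    input_variables=['topic'],
    template='Give me a one-sentence summary of {topic}.')
print(simple.format(topic='vector databases'))

# 2. Few-shot learning: show the model examples of the desired behavior.
examples = [
    {'word': 'happy', 'antonym': 'sad'},
    {'word': 'tall', 'antonym': 'short'},
]
example_prompt = PromptTemplate(
    input_variables=['word', 'antonym'],
    template='Word: {word}\nAntonym: {antonym}')

few_shot = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix='Give the antonym of every input.',
    suffix='Word: {input}\nAntonym:',
    input_variables=['input'])
print(few_shot.format(input='big'))
```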
Auto GPT Explained: A Comprehensive Auto-GPT Guide For Your Unique Use Case
Date published
June 8, 2023
Author(s)
Yujian Tang
Language
English
Word count
1789
Hacker News points
None found.
Auto-GPT is an open-source, autonomous AI application that utilizes large language models (LLMs) to perform tasks such as browsing the internet, speaking via text-to-speech tools, writing code, and keeping track of its inputs and outputs. It has garnered significant attention due to its potential for automating mundane tasks and enhancing productivity. This article provides a comprehensive guide on setting up Auto-GPT, configuring it, running tasks, and adding memory using Milvus vector database. The integration of Milvus as a backend storage solution allows users to search, retrieve, or edit data more efficiently than the default JSON file format.
What's New in Milvus version 2.2.9
Date published
June 6, 2023
Author(s)
Chris Churilo
Language
English
Word count
357
Hacker News points
None found.
The Milvus community has released Milvus 2.2.9, which includes new features such as JSON support, dynamic schema handling, and partition key usage. Additionally, the update allows for more efficient resource utilization by removing the limit on the number of partitions. Bug fixes and performance enhancements are also included in this release. For a complete list of changes, check out the release notes.
Get Ready for GPT-4 with GPTCache & Milvus, Save Big on Multimodal AI
Date published
May 31, 2023
Author(s)
Jael Gu
Language
English
Word count
2734
Hacker News points
None found.
OpenAI's ChatGPT, powered by GPT-3.5, has revolutionized natural language processing (NLP) and sparked interest in large language models (LLMs). As the adoption of LLMs grows across various industries, so does the need for more advanced AI models that can process multimodal data. The tech world is buzzing with anticipation for GPT-4, which promises to be even more powerful by enabling visual inputs. To prepare for this upcoming revolution, Zilliz has introduced GPTCache integrated with Milvus, a game-changing combination that can help businesses save big on multimodal AI.

Multimodal AI refers to integrating multiple modes of perception and communication, such as speech, vision, language, and gesture, to create more intelligent and effective AI systems. This approach allows AI models to better understand and interpret human interactions and environments and to generate more accurate and nuanced responses. Multimodal AI has applications in various fields, including healthcare, education, entertainment, and transportation.

GPTCache is a project developed to optimize response time and reduce the expense of API calls associated with large models. It enables the system to search for potential answers in the cache first before sending a request to a large model, speeding up the entire process and reducing the costs of running large models.

A semantic cache stores and retrieves knowledge representations of concepts in a structured way, so an AI system can better understand and respond to queries or requests. The idea behind a semantic cache is to provide faster access to relevant information through precomputed answers to commonly asked questions or queries, which improves the performance and efficiency of AI applications. One of the cornerstones of a semantic cache such as GPTCache is the vector database. Specifically, the embedding generator of GPTCache converts data to embeddings for vector storage and semantic search. Storing vectors in a vector database such as Milvus not only supports storage at a large data scale but also speeds up and improves the performance of similarity search, allowing more efficient retrieval of potential answers from the cache. The Milvus ecosystem provides helpful tools for database monitoring, data migration, and data size estimation, and for more straightforward implementation and maintenance of Milvus there is a cloud-native service, Zilliz Cloud. The combination of Milvus with GPTCache offers a powerful solution for enhancing the functionality and performance of multimodal AI applications.

Temperature in machine learning has become a valuable tool to balance randomness and coherence and to align with the specific needs and preferences of the user or application. The temperature in GPTCache largely retains the general concept of temperature in machine learning and is realized through three options in the workflow:

1. Select after evaluation
2. Call the model without the cache
3. Edit the result from the cache

GPTCache and Milvus represent an exciting and innovative approach to building intelligent multimodal systems. The following examples showcase how GPTCache and Milvus have been implemented in multimodal situations:

1. Text-to-Image: image generation
2. Image-to-Text: image captioning
3. Audio-to-Text: speech transcription

With its support for unstructured data, Milvus is an ideal solution for building and scaling multimodal applications.
Additional GPTCache features, such as session management, context awareness, and server support, enhance the capabilities of multimodal AI even further. With these advancements, multimodal AI models gain more potential uses and scenarios.
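Wiring GPTCache's semantic cache to Milvus follows the pattern in GPTCache's documentation; the sketch below reflects that pattern, though exact parameter names may vary between releases.

```python
# Hedged sketch: GPTCache with Milvus as the vector store for the cache.
from gptcache import cache
from gptcache.embedding import Onnx
from gptcache.manager import CacheBase, VectorBase, get_data_manager
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation

onnx = Onnx()                               # local embedding generator
data_manager = get_data_manager(
    CacheBase('sqlite'),                    # scalar store for cached answers
    VectorBase('milvus', host='localhost', port='19530',
               dimension=onnx.dimension))   # Milvus holds query embeddings

cache.init(
    embedding_func=onnx.to_embeddings,
    data_manager=data_manager,
    similarity_evaluation=SearchDistanceEvaluation())
cache.set_openai_key()
# From here, calls through gptcache.adapter.openai consult Milvus for
# semantically similar cached questions before hitting the LLM.
```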
GPTCache, LangChain, Strong Alliance
Date published
May 25, 2023
Author(s)
Sim Fu
Language
English
Word count
710
Hacker News points
None found.
The GPTCache project aims to build a semantic cache for storing large language model (LLM) responses, addressing the challenges of increasing costs and slow response times associated with high traffic levels. LangChain is a library that assists in developing applications combining LLMs with other computational or knowledge sources. Before integrating GPTCache, LangChain's caching was based on exact string matching (Memory Cache, SQLite Cache, and Redis Cache), so a cache hit required an identical question, which limited the cache utilization rate. Integrating GPTCache significantly improves cache functionality by performing embedding operations to obtain vectors and conducting approximate vector searches in the cache storage. This increases the cache hit rate, reduces LLM usage costs, and speeds up response times.
Data Mastery Made Easy: Exploring the Magic of Vector Databases in Jupyter Notebooks
Date published
May 24, 2023
Author(s)
Yujian Tang
Language
English
Word count
908
Hacker News points
72
This tutorial explores the use of vector databases in Jupyter Notebooks, particularly Milvus Lite. Vector databases are useful for working with unstructured data like images, text, or video and can help solve problems faced by large language models (LLMs) such as a lack of domain knowledge and up-to-date data. They also power similarity search applications, product recommendations, reverse image search, and semantic text search. The tutorial covers the basics of vector databases, Milvus Lite, and how to use them in Jupyter Notebooks. It provides examples for using a standalone vector database instance like Milvus Standalone and offers resources for understanding vector databases further.
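The notebook workflow reduces to a few lines; this sketch assumes the `milvus` pip package of that period, which bundled an embedded server for local development.

```python
# Hedged sketch: run Milvus Lite inside a notebook and connect to it.
from milvus import default_server
from pymilvus import connections, utility

default_server.start()                      # embedded Milvus in this process
connections.connect(host='127.0.0.1', port=default_server.listen_port)

print(utility.get_server_version())         # confirm the connection works
# ... create collections, insert vectors, and search as usual ...
default_server.stop()                       # shut down when finished
```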
Ultimate Guide to Getting Started with LangChain
Date published
May 22, 2023
Author(s)
Yujian Tang
Language
English
Word count
1446
Hacker News points
1
LangChain is a framework that enables the creation of applications using large language models (LLMs) like GPT. It provides functionalities such as token management and context management, allowing users to build with the CVP Framework. The two core LangChain functionalities for LLMs are data-awareness and agency. One primary use case is querying text data, which can be done using documents, vector stores, or GPT interactions. In this tutorial, we covered how to interact with GPT using LangChain and queried a document for semantic meaning using LangChain with a vector store.
What is Pymilvus?
Date published
May 20, 2023
Author(s)
Filip Haltmayer
Language
English
Word count
1159
Hacker News points
None found.
Pymilvus is a Python SDK built for Milvus and Zilliz Cloud, offering access to all features provided by Milvus. However, users have found the breadth of configuration options in the vector database system complex. To address this, MilvusClient was introduced to simplify the API for most users. It offers functions such as insert_data(), upsert_data(), search_data(), query_data(), get_vectors_by_pk(), delete_by_pk(), add_partition(), and remove_partition(). The main goal of MilvusClient is to provide easy-to-use operations that do not yet exist, or are unoptimized, on the Pymilvus side. As Pymilvus improves, these operations can be optimized behind the scenes while the client keeps its simple API.
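To give a flavor of the simplified API, here is a hypothetical sketch built from the function names listed above; exact signatures varied across Pymilvus releases, so treat the arguments as illustrative only:

```python
from pymilvus import MilvusClient

# Connection arguments and call shapes are illustrative; the post names
# these functions, but their signatures differ across releases.
client = MilvusClient(uri="http://localhost:19530", collection_name="articles")

client.insert_data([{"id": 1, "embedding": [0.1] * 768, "title": "hello"}])
hits = client.search_data([[0.1] * 768], top_k=5)   # vector similarity search
rows = client.get_vectors_by_pk([1])                # fetch by primary key
client.delete_by_pk([1])                            # remove by primary key
```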
Using a Vector Database to Search White House Speeches
Date published
May 19, 2023
Author(s)
Yujian Tang
Language
English
Word count
1967
Hacker News points
None found.
This tutorial demonstrates how to use semantic search with a vector database to analyze speeches given by the Biden administration during their first two years in office. The dataset used is "The White House (Speeches and Remarks) 12/10/2022" found on Kaggle. The process involves cleaning the data, setting up a vector database using Milvus Lite, getting vector embeddings from speeches, populating the vector database, and performing semantic searches based on descriptions. Semantic search allows for finding speeches with similar content rather than just matching exact phrases or sentences.
Getting Started with LlamaIndex
Date published
May 17, 2023
Author(s)
Yujian Tang
Language
English
Word count
1793
Hacker News points
2
LlamaIndex is a user-friendly, flexible data framework that connects private, customized data sources to large language models (LLMs). It helps address LLMs' lack of domain-specific knowledge by injecting data. The indexes in LlamaIndex include list index, vector store index, tree index, and keyword index. Each index is made up of "nodes" that represent a chunk of text from a document. LlamaIndex can build many types of indexes depending on the task at hand. It offers an efficient way to query large amounts of data for certain keywords or introduces similarity into LLM applications. The Basics of How to Use LlamaIndex section covers loading a text file, querying the vector store index, and saving and loading an index. Projects that can be created with LlamaIndex include chatbots, web apps, and more.
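A short sketch of the basics described above, assuming the llama_index 0.6-era API; the ./data and ./storage paths are placeholders:

```python
from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader

# Load every file under ./data into document nodes and index them.
documents = SimpleDirectoryReader("data").load_data()
index = GPTVectorStoreIndex.from_documents(documents)

# Query the vector store index.
query_engine = index.as_query_engine()
print(query_engine.query("What is this document about?"))

# Persist the index so it can be reloaded without re-embedding everything.
index.storage_context.persist(persist_dir="./storage")
```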
Revolutionizing Autonomous AI: Harnessing Vector Databases to Empower Auto-GPT
Date published
May 16, 2023
Author(s)
Sim Fu
Language
English
Word count
1019
Hacker News points
None found.
Auto-GPT is an experimental open-source project that combines a GPT language model with other tools to create an AI system capable of working independently without human intervention. It consists of two core parts: an LLM and a command set, which function as its "brain" and "hands" respectively. However, Auto-GPT has limitations in understanding and retaining extensive contextual information due to the token limit of the GPT model it leverages. Integrating Auto-GPT with a vector database like Milvus can enhance its memory and contextual understanding by converting commands and execution results into embeddings and storing them in the vector database. This integration allows for more precise information retrieval, improving the system's ability to generate aligned commands. Despite some limitations, such as unfiltered top-k results and inability to customize the embedding model, Auto-GPT has immense potential when combined with vector databases like Milvus, pushing the boundaries of AI technology and AIGC systems.
Webinar Recap: Boost Your LLM with Private Data Using LlamaIndex
Date published
May 15, 2023
Author(s)
Fendy Feng
Language
English
Word count
1267
Hacker News points
None found.
The popularity of large language models (LLMs) like ChatGPT has demonstrated their capabilities in generating knowledge and reasoning. However, these LLMs are pre-trained on publicly available data, which may not provide specific answers and results relevant to a business. LlamaIndex is one solution that can augment LLMs with private data by providing a simple, flexible, centralized interface connecting external data and LLMs. In a recent webinar, Jerry Liu, Co-founder and CEO of LlamaIndex, discussed how LlamaIndex could boost LLMs with private data. Two methods to enhance LLMs with private data were presented: fine-tuning and in-context learning. Fine-tuning requires retraining the network with private data but can be costly and lack transparency. In contrast, in-context learning involves pairing a pre-trained model with external knowledge and a retrieval model to add context to the input prompt. LlamaIndex is an open-source tool that provides central data management and query interface for LLM applications. It contains three main components: data connectors for ingesting data from various sources, data indices for structuring data for different use cases, and a query interface for inputting prompts and receiving knowledge-augmented output. LlamaIndex also manages interactions between the language model and private data to provide accurate and desired results. It operates like a black box, taking in detailed query descriptions and providing rich responses that include references and actions. The vector store index is a popular mode of retrieval and synthesis that pairs a vector store with a language model. LlamaIndex provides numerous integrations, including the integration of Milvus and LlamaIndex. Milvus is an open-source vector database capable of handling vast datasets containing millions, billions, or even trillions of vectors. With this integration, Milvus acts as the backend vector store for embeddings and text. LlamaIndex has various use cases, including semantic search, summarization, text to SQL (structured data), synthesis over heterogeneous data, compare/contrast queries, multi-step queries, exploiting temporal relationships, and recency filtering/outdated nodes.
Zilliz Cloud: a New Level of Usability and Performance
Date published
May 4, 2023
Author(s)
Sarah Tang
Language
English
Word count
614
Hacker News points
None found.
Zilliz Cloud has released an update that introduces six new features and enhancements, aiming to provide a more robust and cost-effective platform with an enhanced user experience. The latest release includes the Pricing Calculator for better cost estimates, improved system resiliency with data backup and restore on GCP, removal of storage quota for optimal user experience, automatic suspension of inactive databases for credit saving, custom timezone support for more accurate timestamps, and collection renaming for easier database management. Other improvements include a better billing interface, renamed CU types, and additional features to assist users in getting started with Zilliz Cloud.
Milvus 2.2.6: New Features and Updates
Date published
April 28, 2023
Author(s)
Chris Churilo
Language
English
Word count
93
Hacker News points
None found.
Milvus version 2.2.6 has been released with critical issues addressed from version 2.2.5. The new release includes bug fixes and performance enhancements, as detailed in the release notes. Users are advised to upgrade to this version for improved functionality. Key resources include PyPI package, documentation, Docker image, and GitHub release page.
The Fight for AI Supremacy
Date published
April 25, 2023
Author(s)
Filip Haltmayer
Language
English
Word count
1211
Hacker News points
None found.
LangChain is a framework designed to enhance the capabilities of Large Language Models (LLMs) by enabling users to chain together different computations and knowledge. It allows for the creation of domain-specific chatbots, action agents for specific computation, and more. Milvus, an open-source vector database, plays a crucial role in LangChain's integration as it enables efficient storage and retrieval of large documents or collections of documents. The integration involves extending the VectorStore class to implement functions such as add_texts(), similarity_search(), max_marginal_relevance_search(), and from_text(). However, challenges arise due to Milvus' inability to handle JSON natively, which may require additional work when dealing with existing collections or inserting data. Overall, LangChain offers a promising solution for improving LLMs' usefulness by providing working memory and knowledge base integration.
Yet another cache, but for ChatGPT
Date published
April 11, 2023
Author(s)
James Luan
Language
English
Word count
1949
Hacker News points
2
ChatGPT is an impressive technology that enables developers to create game-changing applications. However, the performance and cost of large language models (LLMs) are significant issues that hinder their widespread application in various fields. To address this, GPTCache was developed as a cache layer for LLM-generated responses. Similar in spirit to Redis and Memcached, this caching layer decreases the expense of generating content and provides faster real-time responses. With the help of GPTCache, developers can make their LLM applications up to 100 times faster. The cache reduces the number of ChatGPT calls by taking advantage of temporal and spatial locality in user access for AIGC applications.
Caching LLM Queries for performance & cost improvements
Date published
April 10, 2023
Author(s)
Chris Churilo
Language
English
Word count
1079
Hacker News points
None found.
GPTCache is an open-source semantic cache designed to improve the efficiency and speed of GPT-based applications by storing responses generated by language models. It allows users to customize the cache according to their needs, including options for embedding functions, similarity evaluation functions, storage location, and eviction policy management. The tool supports multiple popular databases for cache storage and provides a range of vector store options for finding the most similar requests based on extracted embeddings from input requests. GPTCache aims to provide flexibility and cater to a wider range of use cases by supporting multiple APIs and vector stores.
New Support for Backup and Restore of Zilliz Cloud Databases
Date published
April 7, 2023
Author(s)
Sarah Tang
Language
English
Word count
479
Hacker News points
None found.
Zilliz Cloud has introduced a Backup and Restore feature, allowing users to easily back up important data and restore it after an unexpected loss. The new feature provides an easy-to-use interface, automated backups, secure storage, and efficiency for big data. Access is limited to users subscribed to the Zilliz Cloud Enterprise plan. Future plans include flexible recovery options, point-in-time recovery, and cross-region backup. Backup and Restore is available now on the Enterprise plan, priced by usage at $0.025 per GB with free storage retention within 30 days.
Accelerate your migration experience from Milvus to Zilliz Cloud
Date published
April 6, 2023
Author(s)
Sarah Tang
Language
English
Word count
398
Hacker News points
None found.
Zilliz has introduced a new migration feature that allows customers to seamlessly move their local Milvus database to the fully managed cloud service, Zilliz Cloud. This feature ensures data safety and security during the migration process. The migration tool is free for Enterprise users subscribed to the Zilliz Cloud enterprise plan. Users can start a 30-day free trial with $100 worth of credit to explore the new features in Zilliz Cloud.
Zilliz Cloud Expands with Multi-Cloud Support
Date published
April 5, 2023
Author(s)
Emily Kurze
Language
English
Word count
483
Hacker News points
None found.
Zilliz Cloud, a vector database-as-a-service, is designed to help developers focus on creating innovative AI applications by handling the infrastructure and storage of embeddings. The platform offers multi-cloud and multi-region availability, currently supporting AWS and Google Cloud with plans for future expansions. Users can quickly scale their vector search storage capacity without re-provisioning hardware and benefit from streamlined procurement, consolidated billing, and leveraging pre-committed AWS spend through the AWS Marketplace. Zilliz Cloud is also available on Google Cloud as an official partner, with more regions and cloud providers planned for future releases to support developers' diverse needs.
ChatGPT+ Vector database + prompt-as-code - The CVP Stack
Date published
April 4, 2023
Author(s)
James Luan
Language
English
Word count
1242
Hacker News points
1
Zilliz has introduced OSS Chat, a chatbot designed to provide technical knowledge about open-source projects. Built using OpenAI's ChatGPT and a vector database, the service currently supports Hugging Face, Pytorch, and Milvus but plans to expand to more projects in the future. The new AI stack, called CVP Stack (ChatGPT+Vector database+prompt-as-code), is aimed at overcoming ChatGPT's limitations by using a vector database for accurate information retrieval. OSS Chat demonstrates this approach by leveraging GitHub repositories and their associated docs pages as the source of truth, converting data into embeddings, and storing them in Zilliz. When users interact with OSS Chat, it triggers a similarity search in Zilliz to find relevant matches and feeds the retrieved data into ChatGPT for precise responses.
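A rough sketch of the CVP flow under stated assumptions: the collection name, schema, prompt template, and search parameters below are illustrative, not taken from OSS Chat itself:

```python
import openai
from pymilvus import Collection, connections

connections.connect(host="localhost", port="19530")
docs = Collection("oss_docs")  # hypothetical collection of doc embeddings

def answer(question: str, question_embedding: list[float]) -> str:
    # 1. Vector database: retrieve the most relevant doc snippets.
    hits = docs.search(
        [question_embedding], "embedding",
        {"metric_type": "L2", "params": {"nprobe": 16}},
        limit=3, output_fields=["text"],
    )
    context = "\n".join(hit.entity.get("text") for hit in hits[0])
    # 2. Prompt-as-code: wrap the retrieved context around the question.
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    # 3. ChatGPT: generate a grounded response.
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp["choices"][0]["message"]["content"]
```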
Zilliz Cloud, the new billion-scale offering
Date published
April 4, 2023
Author(s)
Robert Guo
Language
English
Word count
970
Hacker News points
None found.
Zilliz has announced the general availability of an update to its cloud vector database service, raising the standard for usability, security, performance, and capability. The latest version supports billion-scale vector collections and offers a 2.5x reduction in search latency compared to the original release. Additionally, Zilliz Cloud is now available on Google Cloud Platform (GCP) and AWS Marketplace. New features include rolling upgrades, backup and restore functionality, recycler bin for data security, and database migration toolkits from open-source Milvus.
What's New in Milvus version 2.2.5
Date published
March 30, 2023
Author(s)
Chris Churilo
Language
English
Word count
233
Hacker News points
None found.
Milvus, an open-source vector database, has released version 2.2.5 with new features and improvements. Key updates include a security fix for MinIO (MinIO CVE-2023-28432) by updating to the latest release, and the addition of a First/Random replica selection policy that selects replicas in a round-robin fashion, improving throughput. The release also includes bug fixes and performance enhancements. For more information, check out the release notes or download Milvus to get started.
ChatGPT retrieval plugin with Zilliz and Milvus
Date published
March 23, 2023
Author(s)
Filip Haltmayer
Language
English
Word count
811
Hacker News points
None found.
OpenAI has open-sourced the code for a knowledge base retrieval plugin, allowing ChatGPT to augment its information by retrieving knowledge-based data from relevant document snippets. The plugin uses OpenAI's text-embedding-ada-002 embeddings model and stores the embeddings into a vector database like Milvus or Zilliz. Enterprises can benefit from this plugin by making their internal documents available to employees through ChatGPT, ensuring accurate and up-to-date information retrieval. The plugin also supports continuous processing and storage of documents from various data sources using incoming webhooks. Additionally, the memory feature allows ChatGPT to remember information from conversations and store it in a vector database for later use.
Milvus support for multiple Index types
Date published
March 23, 2023
Author(s)
Chris Churilo
Language
English
Word count
690
Hacker News points
None found.
Milvus is an open-source vector database that supports eight index types to optimize data querying and retrieval: FLAT, IVF_FLAT, IVF_SQ8, IVF_PQ, HNSW, ANNOY, BIN_FLAT, and BIN_IVF_FLAT. Each index type is best suited to specific scenarios based on factors such as data dimensionality, dataset size, search efficiency requirements, and available resources. Choosing the right index type can significantly improve search performance in AI applications.
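For example, building an IVF_FLAT index with Pymilvus looks roughly like this; the collection and field names are placeholders:

```python
from pymilvus import Collection, connections

connections.connect(host="localhost", port="19530")
collection = Collection("images")  # assumes an existing vector field "embedding"

# IVF_FLAT partitions vectors into nlist clusters; a reasonable default
# when memory allows and moderate recall/speed trade-offs are acceptable.
collection.create_index(
    field_name="embedding",
    index_params={
        "index_type": "IVF_FLAT",
        "metric_type": "L2",
        "params": {"nlist": 1024},
    },
)
```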
What’s New In Milvus 2.3 Beta - 10X faster with GPUs
Date published
March 21, 2023
Author(s)
Chris Churilo
Language
English
Word count
774
Hacker News points
4
The Beta release of Milvus 2.3 introduces new features and improvements aimed at boosting the performance of AI-powered applications. Key features include support for GPU acceleration, RAFT-based integration, range search capabilities, mmap file I/O, incremental backups, and change data capture (CDC). These enhancements enable faster and more efficient vector data searches, improved productivity, and better overall performance of AI systems. The release also includes bug fixes and improvements for a smoother user experience.
Milvus Performance Evaluation 2023
Date published
March 17, 2023
Author(s)
Chris Churilo
Language
English
Word count
554
Hacker News points
None found.
Developers often ask how Milvus compares to previous versions for embedding workloads, with concerns about performance degradation. Benchmarks conducted on Milvus v2.2.3 vs. v2.2.0 and v2.0.0 show that the latest version significantly improves search and indexing speeds. Specifically, Milvus 2.2.3 achieved a 2.5x reduction in search latency compared to the original Milvus 2.0.0 release and a 4.5x increase in QPS. The performance evaluation technical paper provides detailed methodology and results. Periodic re-running of benchmarks will update the findings, with all code available on Github for further verification or suggestions.
What’s New In Milvus 2.2.4
Date published
March 17, 2023
Author(s)
Chris Churilo
Language
English
Word count
319
Hacker News points
None found.
Milvus 2.2.4 has been released, featuring resource grouping for QueryNodes to improve performance and better manage resources in multi-tenant scenarios. Additionally, enhancements include collection renaming, Google Cloud Storage support, and a new option (ignore_growing) for search and query APIs. The release also includes bug fixes and performance improvements. For more information, check the release notes or download Milvus to get started.
How Zilliz Cloud Protects Your Data
Date published
March 9, 2023
Author(s)
Frank Liu
Language
English
Word count
1006
Hacker News points
None found.
The text discusses the importance of data protection, security, and availability when moving vector search workloads to the cloud. It highlights three pillars of information security - confidentiality, integrity, and availability. The text also mentions common data management mistakes and how Zilliz Cloud offers features to protect users' data and services by ensuring confidentiality, integrity, and availability.
What’s New In Milvus 2.2.3
Date published
Feb. 27, 2023
Author(s)
Chris Churilo
Language
English
Word count
253
Hacker News points
None found.
Milvus, an open-source vector database, has released version 2.2.3 with new features and improvements. The release includes Rolling Upgrade support for minimizing service disruptions during upgrades and Coordinator High Availability (HA) to ensure quick failure recovery times. Additionally, enhancements have been made to bulk-insert, memory usage reduction, monitoring metrics optimization, and Meta storage performance. However, a breaking change has reduced the maximum number of fields in a collection from 256 to 64. The release also includes bug fixes and improvements.
How to Integrate OpenAI Embedding API with Zilliz Cloud
Date published
Jan. 11, 2023
Author(s)
Frank Liu
Language
English
Word count
562
Hacker News points
None found.
In 2018, Zilliz developed Milvus, a vector database designed to enhance search and storage capabilities. The initial focus was on improving the user experience, reliability, performance, and scalability of the platform. As a result, the Milvus community has grown significantly in terms of users, contributors, and GitHub stars (nearing 15,000). Recently, the community emphasized the need to expand the vector database ecosystem by incorporating visualizations, tools, connectors, and more, with embedding model integrations being one of the most requested features. To address this demand, Zilliz will provide integration examples for Milvus and Zilliz Cloud with open-source or paid embedding models. Additionally, they have launched Towhee, a project that integrates hundreds of open-source models, embedding APIs, and in-house models to create end-to-end search pipelines backed by Milvus or Zilliz Cloud. The company plans to continue its support for the Milvus project while also focusing on integration and partnerships with the broader machine learning ecosystem.
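A hedged sketch of that integration path, using the 2023-era openai client and a URI/token connection to a Zilliz Cloud cluster. The endpoint values are placeholders and the collection schema is assumed:

```python
import openai
from pymilvus import Collection, connections

# Embed a piece of text with OpenAI's hosted model (1536-d for ada-002).
resp = openai.Embedding.create(
    model="text-embedding-ada-002",
    input="Vector databases store and search embeddings.",
)
vec = resp["data"][0]["embedding"]

# URI/token values are placeholders for a Zilliz Cloud cluster.
connections.connect(uri="https://YOUR-CLUSTER.zillizcloud.com", token="YOUR_API_KEY")
docs = Collection("docs")       # assumes an (id, embedding) schema
docs.insert([[1], [vec]])       # column-based insert: ids, then vectors
```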
2022
The Next Stop for Vector Databases: 8 Predictions for 2023
Date published
Dec. 9, 2022
Author(s)
James Luan
Language
English
Word count
1550
Hacker News points
None found.
In 2022, there was significant growth in the field of vector databases with multiple open-source products and cloud-based services emerging. This trend is expected to continue into 2023 as capital markets invest in these technologies. Key predictions for 2023 include differentiation and specialization among vector databases, a move towards a unified query interface, further integration of vector databases with traditional ones, significant cost reduction in vector databases, the emergence of the first serverless vector database, rise of open-source tools for vector databases, early adoption of AI for Database (AI4DB) in vector databases, and the second commercial company emerging from open-source Milvus. These developments indicate a promising year for vector databases, making them more cost-effective and efficient.
All You Need to Know About ANN Machine Learning
An Artificial Neural Network (ANN) is a machine learning model inspired by the structure and functions of the human brain. It consists of an input layer, several hidden layers, and an output layer (see the sketch after this summary). The most common types of ANNs include feedforward neural networks, convolutional neural networks, and recurrent neural networks. Applications of ANNs span various industries, including speech recognition, image recognition, text classification, forecasting, and social media analysis. While ANNs offer numerous advantages such as parallel processing capability and wide applicability, they also face challenges in scalability, testing, verification, and integration into modern environments. Vector databases, such as the one offered by Zilliz, are crucial for managing the massive embedding vectors generated by deep neural networks and other machine learning models.
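As a small illustration of the layered structure described above, here is a minimal feedforward network in PyTorch; the layer sizes are arbitrary:

```python
import torch.nn as nn

# A minimal feedforward network mirroring the input / hidden / output
# layer structure the summary describes.
model = nn.Sequential(
    nn.Linear(16, 32), nn.ReLU(),   # input layer -> first hidden layer
    nn.Linear(32, 32), nn.ReLU(),   # second hidden layer
    nn.Linear(32, 4),               # output layer (e.g. 4 classes)
)
```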
Understanding K-means Clustering in Machine Learning
Date published
Oct. 26, 2022
Author(s)
Zilliz
Language
English
Word count
2219
Hacker News points
None found.
K-means clustering is an unsupervised machine learning algorithm that groups objects based on attributes. It is widely used in various industries, such as customer segmentation, recommendation engines, and similarity search. The algorithm works by calculating the distance of each data element from the geometric center of a cluster and reconfiguring the cluster if it finds a point belonging to a specific cluster closer to the centroid of another cluster. K-means clustering is useful in areas such as image processing, information retrieval, recommendation engines, and data compression. The number of clusters can be chosen using methods like the elbow method or the silhouette method. Zilliz offers a one-stop solution for challenges in handling unstructured data, especially for enterprises that build AI/ML applications that leverage vector similarity search.
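A compact illustration of the elbow method mentioned above, using scikit-learn on toy data:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.random((500, 2))             # toy 2-D data

# Elbow method: inertia (within-cluster sum of squares) drops as k grows;
# the "elbow" where improvement flattens suggests a reasonable k.
for k in range(1, 10):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, round(km.inertia_, 2))
```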
What is K-Nearest Neighbors (KNN) Algorithm in Machine Learning? An Essential Guide
The K-Nearest Neighbors (KNN) algorithm is a supervised machine learning technique used for classification and regression problems. It is categorized as a lazy learner, meaning it only stores the training dataset without going through a training stage. KNN works by estimating the likelihood that an unobserved data point belongs to a given class based on its nearest neighbors in the dataset. The algorithm uses a voting mechanism where the class with the most votes is assigned to the relevant data point. Different distance metrics can be used to determine whether or not a data point is a neighbor, such as Euclidean, Manhattan, Hamming, Cosine, Jaccard, and Minkowski distances. KNN can be improved by normalizing data on the same scale, tuning hyperparameters like K and the distance metric, and using techniques like cross-validation to test different values of K. The algorithm is time-efficient, simple to tune, and easily adaptable to multi-class problems but may not perform well with high-dimensional or unbalanced data.
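A brief worked example of the tuning advice above (normalization plus cross-validated choice of K), using scikit-learn's KNN classifier on the Iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Normalize features, then cross-validate several values of K.
for k in (1, 3, 5, 7):
    model = make_pipeline(
        StandardScaler(),
        KNeighborsClassifier(n_neighbors=k, metric="euclidean"),
    )
    print(k, cross_val_score(model, X, y, cv=5).mean())
```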
From Text to Image: Fundamentals of CLIP
Date published
Oct. 4, 2022
Author(s)
Rentong Guo
Language
English
Word count
1508
Hacker News points
None found.
This blog introduces the fundamentals of CLIP, an advanced text-to-image service developed by OpenAI. It explains how search algorithms and semantic similarity are used to match texts with images. The process involves mapping the semantics of texts and images into a high-dimensional space where vectors representing similar semantics have small distances between them. A typical text-to-image service consists of three parts: request side (texts), search algorithm, and underlying databases (images). CLIP helps in creating a unified semantic space for both texts and images, enabling efficient cross-modal search. The next article will demonstrate how to build a prototype text-to-image service using these concepts.
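To make the shared text-image space concrete, here is a small sketch using the Hugging Face CLIP checkpoint; the blank test image stands in for a real photo:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.new("RGB", (224, 224))   # stand-in for a real photo
inputs = processor(
    text=["a photo of a cat", "a photo of a dog"],
    images=image, return_tensors="pt", padding=True,
)
with torch.no_grad():
    outputs = model(**inputs)
# Higher scores mean the text and image land closer in the shared space.
print(outputs.logits_per_image.softmax(dim=1))
```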
Anatomy of A Cloud Native Vector Database Management System
Date published
Sept. 15, 2022
Author(s)
Xiaomeng Yi
Language
English
Word count
3051
Hacker News points
None found.
The paper "Manu: A Cloud Native Vector Database Management System" discusses the design philosophy and principles behind Manu, a cloud native database purpose built for vector data management. The authors identify four common business requirements for vector databases that are difficult to address under the initial framework: ever-changing requirements, flexible consistency policy, component-level elasticity, and simpler transaction processing model. To meet these needs, they propose five broad objectives for Manu: long-term evolvability, tunable consistency, good elasticity, high availability, and high performance. The paper then delves into the architecture of Manu, which adopts a four-layer design that enables decoupling of read from write, stateless from stateful, and storage from computing. It also explains the data processing workflow inside Manu, including data insertion, index building, and query execution. The authors conduct an overall system performance evaluation and compare Manu with other vector search systems in terms of query performance. They conclude by discussing future directions for research into cloud-native vector database management systems.
ArXiv Scientific Papers Vector Similarity Search with Milvus 2.1
Date published
Aug. 9, 2022
Author(s)
Marie Stephen Leo
Language
English
Word count
3034
Hacker News points
2
In this post, the author demonstrates how to build a semantic similarity search engine for scientific papers using open-source tools like ArXiv, Dask, sentence-transformers, and the Milvus vector database. The process involves setting up an environment, downloading the arXiv dataset from Kaggle, loading data into Python using Dask, implementing a scientific paper semantic similarity search application on top of Milvus, and running queries to find similar papers. This approach can serve as a template for building any NLP semantic similarity search engine, not just one for scientific papers. The author also provides an overview of the SPECTER model, which is used to convert paper texts into embeddings.
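For reference, encoding papers with the SPECTER checkpoint via sentence-transformers looks roughly like this; the "allenai-specter" model name and the title[SEP]abstract input format are assumptions about the post's exact setup:

```python
from sentence_transformers import SentenceTransformer

# "allenai-specter" is the sentence-transformers release of SPECTER.
model = SentenceTransformer("allenai-specter")
papers = ["Attention Is All You Need[SEP]We propose the Transformer..."]
embeddings = model.encode(papers)   # one vector per title+abstract string
print(embeddings.shape)
```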
Introducing Zilliz Cloud : Fully-managed Vector Database Cloud Service in Preview
Date published
Aug. 3, 2022
Author(s)
Zilliz
Language
English
Word count
497
Hacker News points
None found.
Zilliz Cloud, a fully-managed vector database cloud service built around Milvus, has been launched in preview mode for early access application. The service is designed to manage and process feature vectors at scale and in real-time, addressing the needs of modern AI algorithms that represent the deep semantics of unstructured data with feature vectors. Zilliz Cloud supports much-desired Milvus features while relieving users from managing their own data infrastructure. The service is designed for enterprise-level AI development and offers a fully-managed experience, high performance, elastic deployment, and enterprise-level security. Currently in private preview, interested parties can apply for early access by filling out a form on the Zilliz website.
Podcast: Using AI to Supercharge Data-Driven Applications with Zilliz
Date published
June 16, 2022
Author(s)
Rosie Zhang
Language
English
Word count
260
Hacker News points
None found.
In the latest episode of That Digital Show, Frank Liu from Zilliz discusses how AI and machine learning are being used to extract value from unstructured data. Traditional databases struggle with handling large volumes of unstructured data, which makes up around 80% of the world's data. Milvus, Zilliz's open-source vector database, helps developers understand and analyze this type of data more effectively. The conversation also covers challenges in data operations, how databases have evolved to tackle these issues, and some interesting use cases for Milvus.
Visualize Reverse Image Search with Feder
Date published
May 25, 2022
Author(s)
Min Tian, transcreated by Angela Ni.
Language
English
Word count
1249
Hacker News points
None found.
Reverse image search is an application of vector search, or approximate nearest neighbor search, in which indexes are built to accelerate search on large datasets. This article discusses how to visualize reverse image search with Feder using the example of the IVF_FLAT index, which divides vectors in the vector space into clusters based on vector distance. During a vector similarity search, users provide a target vector and a configuration of search parameters, and Feder visualizes the whole search process. In this use case, the VOC 2012 dataset is used with an nlist of 256 to build an IVF_FLAT index. The system first calculates the distance between the target vector and the centroid of each cluster to find the nearest clusters, then compares the distance between the target vector and all vectors in the nprobe nearest clusters for a fine search. Feder provides two visualization modes for the fine search process: one based on cluster and vector distance, and one based on projection for dimension reduction. The values of the index-building parameters influence how the vector space is divided, and the nprobe parameter can be used to trade off search efficiency against accuracy.
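A sketch of the search-side knob described above: with an IVF_FLAT index, nprobe sets how many of the nearest clusters get fine-searched. The collection name, vector dimension, and parameter values are placeholders:

```python
from pymilvus import Collection, connections

connections.connect(host="localhost", port="19530")
images = Collection("voc2012")   # hypothetical IVF_FLAT-indexed collection

# Larger nprobe = more clusters fine-searched = better recall, slower query.
results = images.search(
    data=[[0.0] * 512],          # placeholder query vector
    anns_field="embedding",
    param={"metric_type": "L2", "params": {"nprobe": 16}},
    limit=10,
)
```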
Feder: A Powerful Visualization Tool for Vector Similarity Search
Date published
May 6, 2022
Author(s)
Min Tian, transcreated by Angela Ni.
Language
English
Word count
1145
Hacker News points
None found.
Feder is a tool that enables users to visualize approximate nearest neighbor search (ANNS) algorithms, specifically indexes like IVF_FLAT and HNSW. It helps users understand the structure of different indexes, how data are organized using each type of index, and how parameter configuration influences the indexing structure. Feder currently supports the HNSW from hnswlib but plans to support more indexes in the future. The tool is built with JavaScript and Python, allowing users to visualize index structures and search processes under IPython Notebook or as an HTML file for web service use.
Manage Your Milvus Vector Database with One-click Simplicity
Date published
March 10, 2022
Author(s)
Zilliz
Language
English
Word count
851
Hacker News points
None found.
Zhen Chen and Licen Wang have written an article about the open-source graphical user interface (GUI) called Attu, specifically designed for Milvus 2.0, an AI-oriented vector database system. The article provides a step-by-step guide on how to perform a vector similarity search using Attu and Milvus 2.0. Attu offers installers for Windows, macOS, and Linux, as well as plugins for adding customized functionality. It also provides complete system topology information for easier understanding and administration of a Milvus instance. The article demonstrates how to install Attu via GitHub or Docker, and how to use its features such as the Overview page, Collection page, Vector Search page, and System View page. The authors encourage users to develop their own Attu plugins to suit their application scenarios. They also invite feedback from users to help optimize Attu for a better user experience.
Zilliz Triumphed in Billion-Scale ANN Search Challenge of NeurIPS 2021
Date published
Jan. 21, 2022
Author(s)
Zilliz
Language
English
Word count
379
Hacker News points
None found.
On December 6th, 2021, Zilliz's research team won the first Approximate Nearest Neighbor (ANN) Search Challenge at NeurIPS 2021 with their Disk Performance Optimization Algorithm. The challenge focused on leveraging ANN search on billion-scale datasets and attracted participants from top institutions and companies. Zilliz's solution, BBAnn, performed exceptionally well in the SimSearchNet++ dataset, retrieving 88.573% of all relevant results compared to a baseline of 16.274%. The team plans to implement this achievement in Milvus, an open-source vector database with applications in new drug discovery, recommender systems, chatbots, and more.
2021
Get started with Milvus_CLI
Date published
Dec. 31, 2021
Author(s)
ChenZhuanghong & Chenzhen
Language
English
Word count
697
Hacker News points
None found.
Milvus_CLI is a command-line tool designed to simplify the use of the Milvus vector database. It supports various operations such as database connection, data import and export, and vector calculation using interactive commands in shells. The latest version of Milvus_CLI includes features like support for all platforms, online and offline installation with pip, portability, built on Milvus SDK for Python, help docs, and auto-complete. Users can install Milvus_CLI either online or offline using the provided commands. The tool also provides various usage examples such as connecting to Milvus, creating a collection, listing collections, calculating vector distances, and deleting a collection.
Accelerating Candidate Generation in Recommender Systems Using Milvus paired with PaddlePaddle
Date published
Nov. 26, 2021
Author(s)
Yunmei
Language
English
Word count
2670
Hacker News points
None found.
This article introduces an open-source vector database, Milvus, paired with PaddlePaddle, a deep learning platform, to address the issues faced in developing recommender systems. The basic workflow of a recommender system involves candidate generation and ranking stages. The product recommender system project uses three components: MIND (Multi-Interest Network with Dynamic Routing for Recommendation at Tmall), PaddleRec, and Milvus. MIND is an algorithm developed by Alibaba Group that processes multiple interests of one user during the candidate generation stage. PaddleRec is a large-scale search model library for recommendation, while Milvus is a vector database featuring a cloud-native architecture used for vector similarity search and vector management in this project. The system implementation involves data processing, model training, model testing, generating product item candidates, and data storage and search.
Frustrated with New Data? Our Vector Database can Help
Date published
Nov. 8, 2021
Author(s)
Zilliz
Language
English
Word count
3015
Hacker News points
None found.
In the era of Big Data, unstructured data represents roughly 80-90% of all stored data. Traditional analytical methods fail to pull useful information out of these growing data lakes. To address this issue, researchers are focusing on building general-purpose vector database systems that can handle high-dimensional vector data and support advanced query semantics. The article discusses the design and challenges faced when building such a system, including optimizing the cost-to-performance ratio relative to load, automated system configuration and tuning, and supporting advanced query semantics. It also introduces Milvus, an AI-oriented general-purpose vector database system developed by Zilliz's Research and Development team.
Zilliz CEO Shared Start-up Experience in 2021 SYNC
Date published
Oct. 30, 2021
Author(s)
Zilliz
Language
English
Word count
362
Hacker News points
None found.
The SYNC 2021 conference, hosted by PingWest and themed on "Reshape the Future", recently concluded successfully. In the session "New Opportunities: How Asian Entrepreneurs Change the World", Charles Xie, CEO and founder of Zilliz, shared his experiences in founding Zilliz and achieving business success. Other presenters included Brad Bao, Co-founder and Chairman of Lime, Jun Pei, CEO and Co-founder of Cepton, and Lake Dai, Partner at LDV Partners. Charles Xie is an experienced database expert who previously worked at Oracle's US headquarters before founding Zilliz, a company specializing in AI-driven unstructured data processing and analysis systems. With $43 million in financing led by Hillhouse Ventures, Zilliz set a record for the largest single Series B financing in the world of open-source infrastructure software. Charles encouraged young entrepreneurs to challenge themselves and stay true to their ideas. Currently, Zilliz is developing its market and hiring talent in Silicon Valley.
Building a Video Analysis System with Milvus Vector Database
Date published
Oct. 9, 2021
Author(s)
Shiyu Chen
Language
English
Word count
1231
Hacker News points
None found.
The text discusses the "tip of the tongue" (TOT) phenomenon experienced while watching movies and introduces an idea to build a video content analysis engine based on Milvus. It explains how object detection, feature extraction, and vector analysis can be used in this process. Key technologies mentioned include OpenCV for frame extraction, YOLOv3 for object detection, ResNet-50 for feature extraction, and Milvus as a vector database for analyzing extracted feature vectors. The text also provides an overview of the deployment process and concludes with the benefits of using Milvus in various fields such as image processing, computer vision, natural language processing, speech recognition, recommender systems, and new drug discovery.
Combine AI Models for Image Search using ONNX and Milvus
Date published
Sept. 26, 2021
Author(s)
Zilliz
Language
English
Word count
1014
Hacker News points
None found.
Open Neural Network Exchange (ONNX) is an open format that represents machine learning models, enabling AI developers to use models with various frameworks, tools, runtimes, and compilers. Milvus is an open-source vector database designed for massive unstructured data analysis. This article introduces how to use multiple models for image search based on ONNX and Milvus, using VGG16 and ResNet50 models as examples. The process involves converting pre-trained AI models into the ONNX format, extracting feature vectors from images using these models, storing vector data in Milvus, and searching for similar images based on Euclidean distance calculations between vectors.
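Exporting a pre-trained model to ONNX, the first step the article describes, looks roughly like this with PyTorch (ResNet50 shown; the VGG16 export works the same way):

```python
import torch
import torchvision

# Export a pre-trained ResNet50 to the ONNX format.
model = torchvision.models.resnet50(pretrained=True).eval()
dummy = torch.randn(1, 3, 224, 224)      # one RGB image at 224x224
torch.onnx.export(
    model, dummy, "resnet50.onnx",
    input_names=["input"], output_names=["features"],
)
```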
DiskANN: A Disk-based ANNS Solution with High Recall and High QPS on Billion-scale Dataset
Date published
Sept. 24, 2021
Author(s)
Zilliz
Language
English
Word count
3689
Hacker News points
None found.
"DiskANN: A Disk-based ANNS Solution with High Recall and High QPS on Billion-scale Dataset" is a paper published in NeurIPS 2019 that introduces an efficient method for index building and search on billion-scale datasets using a single machine. The proposed scheme, called DiskANN, builds a graph-based index on the dataset SIFT-1B with a single machine having 64GB of RAM and a 16-core CPU, achieving over 95% recall@1 at more than 5000 queries per second (QPS) with an average latency lower than 3ms. The paper also introduces Vamana, a new graph-based algorithm that minimizes the number of disk accesses and enhances search performance. DiskANN effectively supports search on large-scale datasets by overcoming memory restrictions in a single machine.
DNA Sequence Classification based on Milvus
Date published
Sept. 6, 2021
Author(s)
Mengjia Gu
Language
English
Word count
1305
Hacker News points
None found.
Mengjia Gu, a data engineer at Zilliz and open-source community member of Milvus, discusses the application of vector databases in DNA sequence classification. Traditional sequence alignment methods are unsuitable for large datasets, making vectorization a more efficient choice. The open-source vector database Milvus is designed to store vectors of nucleic acid sequences and perform high-efficiency retrieval, reducing research costs. By converting long DNA sequences into k-mer lists, data can be vectorized and used in machine learning models for gene classification. Milvus' approximate nearest neighbor search algorithm enables efficient management of unstructured data and recalling similar results among trillions of vectors within milliseconds. The author provides a demo showcasing the use of Milvus in building a DNA sequence classification system, highlighting its potential applications in genetic research and practice.
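A tiny sketch of the k-mer conversion described above; the choice of k here is arbitrary:

```python
def kmers(sequence: str, k: int = 4) -> list[str]:
    """Slide a window of size k over a DNA sequence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

print(kmers("ATGCGTAC", k=4))
# ['ATGC', 'TGCG', 'GCGT', 'CGTA', 'GTAC']
# The k-mer lists can then be vectorized (bag-of-words style, for example)
# and stored in Milvus for similarity-based classification.
```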
Zilliz attended VLDB Workshop 2021
Date published
Aug. 27, 2021
Author(s)
Zilliz
Language
English
Word count
510
Hacker News points
None found.
In 2021, significant advancements were made in the database industry. Zilliz, a leading company in this field, shared its latest research progress and achievements at VLDB Workshop 2021. The company introduced Milvus, an open-source vector database built for machine learning workloads. Milvus is designed for handling massive feature vectors and provides a complete framework for vector data updates, indexing, and similarity search. It has been widely used in artificial intelligence applications, and its performance surpasses that of comparable products. The research team behind Milvus also presented their design concept for the 2.0 version, which includes cloud-native, log-as-data, and unified batch-and-stream processing features.
Paper Reading|HM-ANN: When ANNS Meets Heterogeneous Memory
Date published
Aug. 26, 2021
Author(s)
Jigao Luo
Language
English
Word count
1789
Hacker News points
None found.
The research paper "HM-ANN: Efficient Billion-Point Nearest Neighbor Search on Heterogenous Memory" proposes a novel algorithm called HM-ANN for graph-based similarity search. This algorithm considers both memory heterogeneity and data heterogeneity in modern hardware settings, enabling billion-scale similarity search on a single machine without compression technologies. The paper discusses the challenges of existing approximate nearest neighbor (ANN) search solutions due to limited dynamic random-access memory (DRAM) capacity and presents HM-ANN as an efficient alternative that achieves low search latency and high search accuracy, especially when the dataset cannot fit into DRAM.
Building a Personalized Product Recommender System with Vipshop and Milvus
Date published
July 29, 2021
Author(s)
Zilliz
Language
English
Word count
1655
Hacker News points
None found.
Vipshop, an online discount retailer in China, built a personalized search recommendation system to optimize its customers' shopping experience. The core function of the e-commerce search recommendation system is to retrieve suitable products from a large catalog and display them to users according to their search intent and preferences. To achieve this, Vipshop used Milvus, an open-source vector database that, unlike the commonly used standalone Faiss, supports distributed deployment, multi-language SDKs, read/write separation, and more. The overall architecture consists of two main parts: a write process and a read process. Data such as product information, user search intent, and user preferences are all unstructured data that were converted into feature vectors using various deep learning models and imported into Milvus. With the excellent performance of Milvus, Vipshop's e-commerce search recommendation system can efficiently query the TopK vectors that are similar to the target vectors, with an average latency for recalling TopK vectors of about 30 ms.
Audio Retrieval Based on Milvus
Date published
July 27, 2021
Author(s)
Shiyu Chen
Language
English
Word count
1090
Hacker News points
None found.
Sound is an information-dense data type, with 83% of Americans ages 12 or older listening to terrestrial radio in a given week in 2020. Sound can be classified into three categories: speech, music, and waveform. Audio retrieval systems are used for searching and monitoring online media in real time to prevent intellectual property infringement and classify audio data. Feature extraction is crucial for audio similarity search, with deep learning-based models showing lower error rates than traditional ones. Milvus, an open-source vector database, can efficiently process feature vectors extracted by AI models and provides various common vector similarity calculations. The article demonstrates how to use an audio retrieval system powered by Milvus for non-speech audio data processing.
Quickly Test and Deploy Vector Search Solutions with the Milvus 2.0 Bootcamp
Date published
July 13, 2021
Author(s)
Zilliz
Language
English
Word count
1218
Hacker News points
None found.
The new and improved Milvus 2.0 bootcamp offers updated guides and easier-to-follow code examples for testing, deploying, and building vector search solutions. Users can stress-test their systems against 1 million and 100 million dataset benchmarks, and explore popular vector similarity search use cases such as image, video, and audio search, recommendation systems, molecular search, and question answering systems. The bootcamp also provides quick deployment solutions for fully built applications on any system and scenario-specific notebooks to easily deploy pre-configured applications. Additionally, users can learn how to deploy Milvus in different environments like Mishards, Kubernetes, and load-balancing setups.
Building a Milvus Cluster Based on JuiceFS
Date published
June 15, 2021
Author(s)
Changjian Gao and Jingjing Jia
Language
English
Word count
1094
Hacker News points
None found.
Collaborations between open-source communities have led to the integration of Milvus, the world's most popular vector database, and JuiceFS, a high-performance distributed POSIX file system designed for cloud-native environments. JuiceFS is commonly used for solving big data challenges, building AI applications, and log collection. A Milvus cluster built with JuiceFS works by splitting upstream requests using Mishards to cascade the requests down to its sub-modules. Benchmark testing reveals that JuiceFS offers major advantages over Amazon Elastic File System (EFS), including higher IOPS and I/O throughput in both single- and multi-job scenarios. The Milvus cluster built on JuiceFS offers high performance and flexible storage capacity, making it a valuable tool for AI applications.
Building an Intelligent News Recommendation System Inside Sohu News App
Date published
June 7, 2021
Author(s)
Zilliz
Language
English
Word count
1409
Hacker News points
None found.
Sohu, a NASDAQ-listed Chinese online media company, has built an intelligent news recommendation system inside its news app using semantic vector search. The system uses user profiles built from browsing history to fine-tune personalized content recommendations over time, improving user experience and engagement. It leverages Milvus, an open-source vector database built by Zilliz, to process massive datasets efficiently and accurately, reducing memory usage during search and supporting high-performance deployments. The recommendation system relies on the Deep Structured Semantic Model (DSSM), which uses two neural networks to represent user queries and news articles as vectors. It also utilizes BERT-as-service for encoding news articles into semantic vectors, extracting semantically similar tags from user profiles, and identifying misclassified short text. The use of Milvus has significantly improved the real-time performance of Sohu's news recommendation system and increased efficiency in identifying misclassified short text.
Accelerating Compilation 2.5X with Dependency Decoupling & Testing Containerization
Date published
May 28, 2021
Author(s)
Zhifeng Zhang
Language
English
Word count
1514
Hacker News points
None found.
The text discusses the challenges faced during large-scale AI or MLOps projects due to complex dependencies and evolving compilation environments. It highlights common issues such as prohibitively long compilation times, complex compilation environments, and third-party dependency download failures. To address these issues, the article recommends decoupling project dependencies and implementing testing containerization. These measures decreased average compile time by 60% in Milvus, an open-source embedding similarity search project. The text also provides detailed steps on how to decouple dependencies and optimize compilation between components, as well as within components. It concludes with further optimization measures such as regular cleanup of cache files and selective compile caching, and it emphasizes the benefits of containerized testing for reducing errors and improving stability and reliability.
Accelerating AI in Finance with Milvus, an Open-Source Vector Database
Date published
May 19, 2021
Author(s)
Zilliz
Language
English
Word count
674
Hacker News points
None found.
The financial industry has been an early adopter of open-source software for big data processing and analytics, with banks using platforms like Apache Hadoop, MySQL, MongoDB, and PostgreSQL. With the rise of artificial intelligence (AI), vector databases such as Milvus have become essential tools in managing vector data and enabling similarity searches on massive datasets. Applications of AI in finance include algorithmic trading, portfolio optimization, Robo-advising, virtual customer assistants, market impact analysis, regulatory compliance, and stress testing. Key areas where vector data is leveraged by banks and financial companies are enhancing customer experience with banking chatbots, boosting sales with recommender systems, and analyzing earnings reports and other unstructured financial data with semantic text mining.
Building a Search by Image Shopping Experience with VOVA and Milvus
Date published
May 13, 2021
Author(s)
Zilliz
Language
English
Word count
976
Hacker News points
None found.
VOVA, an e-commerce platform focusing on affordability and user experience, has integrated image search functionality into its platform using Milvus. The system works in two stages: data import and query. It uses YOLO for target detection and ResNet for feature vector extraction from images. Milvus is used to conduct vector similarity searches within the extensive product image library. VOVA's shop by image tool allows users to search for products using uploaded photos, enhancing the overall shopping experience on their platform.
Making with Milvus: Detecting Android Viruses in Real Time for Trend Micro
Date published
April 23, 2021
Author(s)
Zilliz
Language
English
Word count
1459
Hacker News points
5
Cybersecurity is a growing concern, with 86% of companies expressing data privacy concerns in 2020. Trend Micro, a global leader in hybrid cloud security, has developed an Android virus detection system called Trend Micro Mobile Security to protect users from malware. The system compares APKs (Android application packages) from the Google Play Store with a database of known malware using similarity search. Initially, Trend Micro used MySQL for its virus detection system but quickly outgrew it as the number of APKs with nefarious code in its database increased. Trend Micro then began searching for alternative vector similarity search solutions and eventually chose Milvus, an open-source vector database created by Zilliz. Milvus is highly flexible, reliable, and fast, offering a comprehensive set of intuitive APIs that allow developers to choose the ideal index type for their scenario. It also provides distributed solutions and monitoring services. Trend Micro's mobile security system uses Thash values to differentiate APKs, converting those Thash values into vectors for similarity retrieval. Milvus is used to conduct instantaneous vector similarity search on massive vector datasets converted from Thash values, with corresponding Sha256 values queried in MySQL. The system architecture also includes a Redis caching layer to map Thash values to Sha256 values, significantly reducing query time. The monitoring and alert system for Trend Micro's mobile security system is compatible with Prometheus and uses Grafana to visualize various performance metrics. With the help of Milvus, the system was able to meet the performance criteria set by Trend Micro.
Build Semantic Search at Speed
Date published
April 19, 2021
Author(s)
Elizabeth Edmiston
Language
English
Word count
1023
Hacker News points
None found.
Semantic search is an effective tool to help customers and employees find relevant products or information. However, slow semantic search can hinder its usefulness. To address this issue, Lucidworks has implemented semantic search using the semantic vector search approach. This involves encoding text into numerical vectors and using a vector search engine like Milvus to quickly find the best matches for customer searches or user queries. Milvus builds on FAISS, the similarity search library Facebook developed for its own machine learning initiatives. The combination of Milvus and other components allows semantic search to be fast and efficient while handling large datasets.
How to Make 4 Popular AI Applications with Milvus
Date published
April 8, 2021
Author(s)
Zilliz
Language
English
Word count
1141
Hacker News points
None found.
Milvus is an open-source vector database that supports efficient search of massive vector datasets created by AI models. It offers comprehensive APIs and support for multiple index libraries, accelerating machine learning application development and MLOps. Zilliz, the company behind Milvus, has developed demos showcasing its use in natural language processing (NLP), reverse image search, audio search, and computer vision. These include an AI-powered chatbot using BERT for NLP, a reverse image search system with VGG for feature extraction, an audio similarity search system with PANNs for pattern recognition, and a video object detection system leveraging OpenCV, YOLOv3, and ResNet50.
Operationalize AI at Scale with Software 2.0, MLOps, and Milvus
Date published
March 31, 2021
Author(s)
Zilliz
Language
English
Word count
1405
Hacker News points
None found.
MLOps is a systemic approach to AI model life cycle management, which involves monitoring a machine learning model throughout its lifecycle and governing everything from underlying data to the effectiveness of a production system that relies on a particular model. It is necessary for building, maintaining, and deploying AI applications at scale. Key components of MLOps include continuous integration/continuous delivery (CI/CD), model development environments (MDE), champion-challenger testing, model versioning, model store and rollback. Milvus is an open-source vector data management platform that supports the transition to Software 2.0 and manages model life cycles with MLOps.
Making With Milvus: AI-Infused Proptech for Personalized Real Estate Search
Date published
March 18, 2021
Author(s)
Zilliz
Language
English
Word count
855
Hacker News points
None found.
The application of artificial intelligence (AI) in real estate is transforming home search processes. With the help of AI, tech-savvy real estate professionals can assist clients in finding suitable homes faster and simplify property purchasing. The coronavirus pandemic has accelerated interest, adoption, and investment in property technology (proptech), indicating its growing role in the industry. This article explores how Beike utilized vector similarity search to develop a house hunting platform that provides personalized results and recommends listings in near real-time. Vector similarity search is useful for various AI, deep learning, and traditional vector calculation scenarios, as it helps make sense of unstructured data such as images, video, audio, behavior data, documents, and more. Beike uses Milvus, an open-source vector database, to manage its AI real estate platform. The platform converts property listing data into feature vectors, which are then fed into Milvus for indexing and storage. This enables efficient similarity searches based on user queries, improving the home search experience for house hunters and helping agents close deals faster.
Extracting Event Highlights Using iYUNDONG Sports App
Date published
March 15, 2021
Author(s)
Zilliz
Language
English
Word count
1164
Hacker News points
None found.
iYUNDONG is an Internet company that aims to engage sports lovers and participants in events such as marathons. It builds artificial intelligence (AI) tools that analyze media captured during sporting events to automatically generate highlights. A key feature of the iYUNDONG sports app, called "Find me in motion," lets users who took part in an event retrieve their photos or video clips from a massive media dataset by uploading a selfie. The app uses Milvus, an open-source vector database, to power its image retrieval system and achieve fast, large-scale vector search. iYUNDONG chose Milvus for its support for multiple index types, its efficient use of RAM, and its regular releases of powerful out-of-the-box features.
Making with Milvus: AI-Powered News Recommendation Inside Xiaomi's Mobile Browser
Date published
March 9, 2021
Author(s)
Zilliz
Language
English
Word count
1264
Hacker News points
None found.
Xiaomi, the multinational electronics manufacturer, has built an AI-powered news recommendation engine into its mobile web browser using Milvus, an open-source vector database designed for similarity search, as the application's core data management platform. The system cuts through the noise of the news feed by recommending articles relevant to a user's search history and interests. Xiaomi selected BERT as the language representation model in its recommendation engine; BERT serves as a general natural language understanding (NLU) model for a range of natural language processing tasks. The content recommendation system relies on three key components: vectorization, ID mapping, and an approximate nearest neighbor (ANN) service.
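How the three components fit together can be sketched roughly as follows, assuming pymilvus 2.x, a Redis hash for the ID-mapping step, and a 768-dimensional BERT vector; every name here is hypothetical:

```python
import redis
from pymilvus import connections, Collection

connections.connect(host="localhost", port="19530")
r = redis.Redis(host="localhost", port=6379)

# ANN service: find the article vectors nearest the user's interest vector.
articles = Collection("articles")   # hypothetical collection
user_vec = [0.0] * 768              # placeholder; output of the vectorization step
hits = articles.search(
    data=[user_vec],
    anns_field="embedding",
    param={"metric_type": "IP", "params": {"nprobe": 16}},
    limit=10,
)

# ID mapping: translate Milvus vector IDs back to article records.
recommended = [r.hget("article_ids", str(hit.id)) for hit in hits[0]]
```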
Building Personalized Recommender Systems with Milvus and PaddlePaddle
Date published
Feb. 24, 2021
Author(s)
Zilliz
Language
English
Word count
1090
Hacker News points
None found.
This article discusses building personalized recommender systems with Milvus and PaddlePaddle. A recommendation system helps users find relevant information or products by analyzing their historical behavior. The example uses the MovieLens 1M dataset (ml-1m), which contains 1 million ratings of roughly 4,000 movies from 6,000 users. A fusion recommendation model is implemented on PaddlePaddle's deep learning platform, and the movie feature vectors it generates are stored in Milvus, a vector similarity search engine. User feature vectors then serve as query vectors for searching Milvus to obtain recommended movies. The main process involves training the model, preprocessing the data, and implementing the personalized recommender with Milvus. This combination of technologies enables efficient, accurate recommendations based on user interests and needs.
How we used semantic search to make our search 10x smarter
Date published
Jan. 29, 2021
Author(s)
Rahul Yadav
Language
English
Word count
1060
Hacker News points
None found.
Tokopedia has introduced similarity search to improve the relevance of its product search results. The platform uses Elasticsearch for keyword-based search, which ranks products based on keyword frequency and proximity within a document. To compare meaning rather than keywords, the team adopted vector representations that encode words by their probable context. Milvus was chosen as the feature vector search engine for its ease of use and support for a wider range of index types. The team deployed one writable node, two read-only nodes, and one Mishards middleware instance on Google Cloud Platform (GCP) using Milvus Ansible. Indexing plays a crucial role in accelerating similarity searches on large datasets by organizing data efficiently. Tokopedia plans to improve its embedding models and to run multiple learning models simultaneously in future experiments such as image search and video search.
Vector Similarity Search Hides in Plain View
Date published
Jan. 5, 2021
Author(s)
Zilliz
Language
English
Word count
1542
Hacker News points
None found.
Artificial intelligence (AI) has the potential to revolutionize various industries and tasks. One example is race timing, where AI can replace traditional chip timers with video cameras and machine learning algorithms. This technology, known as vector similarity search, involves converting unstructured data into feature vectors using neural networks, then calculating similarities between these vectors. Vector similarity search has applications in e-commerce, security, recommendation engines, chatbots, image or video search, and chemical similarity search. Open-source software like Milvus and publicly available datasets make AI more accessible to developers and businesses.
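The core operation, measuring how similar two feature vectors are, fits in a few lines of NumPy; the vectors below are toy values invented for illustration:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity of two feature vectors: 1.0 means identical direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "feature vectors", e.g. two detections of a runner.
runner_a = np.array([0.9, 0.1, 0.4, 0.7])
runner_b = np.array([0.8, 0.2, 0.5, 0.6])
print(cosine_similarity(runner_a, runner_b))  # near 1.0 -> likely the same runner
```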
2020
Building a Graph-based Recommendation System with Milvus, PinSage, DGL, and MovieLens Datasets
Date published
Dec. 1, 2020
Author(s)
Zilliz
Language
English
Word count
1415
Hacker News points
None found.
This article explains how to build a graph-based recommendation system using open-source tools such as Milvus, PinSage, and DGL. Recommendation systems are algorithms that make relevant suggestions to users based on their preferences and behaviors; two common approaches are collaborative filtering and content-based filtering. In this example, the author uses the MovieLens datasets to build a user-movie bipartite graph. The PinSage model, originally designed to embed pins, is then used to generate embedding vectors for the movies, which serve as their feature vectors. These embeddings are loaded into Milvus, which returns the corresponding IDs and enables vector similarity search. Finally, the system recommends the movies most similar to a user's search query.
Making Sense of Unstructured Data with Zilliz Founder and CEO Charles Xie
Date published
Nov. 19, 2020
Author(s)
Zilliz
Language
English
Word count
999
Hacker News points
None found.
Charles Xie, founder and CEO of open-source software company Zilliz, discusses why platforms for processing and analyzing unstructured data matter today. An estimated 80% of all data is unstructured (images, videos, audio, molecular structures, gene sequences), yet only about 1% of it gets analyzed because of processing complexity; Zilliz aims to extract value from unstructured data by building tools accessible to everyone. The company innovates through an open-source development model, a culture of transparency, and a focus on teamwork. Despite the challenges posed by the COVID-19 pandemic, Zilliz has maintained its operations and continued developing its products. Xie emphasizes the importance of trusting oneself and the people around you when handling stress and uncertainty. The company plans to stay ahead of competitors by pursuing both breadth and depth in its offerings and by establishing itself as a global leader in AI-powered unstructured data science software.
ArtLens AI: Share Your View
Date published
Sept. 11, 2020
Author(s)
Anna Faxon and Haley Kedziora
Language
English
Word count
911
Hacker News points
None found.
The Cleveland Museum of Art (CMA) has launched ArtLens AI: Share Your View, an interactive tool that matches photos taken by users with art from the museum's collection. This initiative aims to provide a fun and engaging way for people to connect with art during these uncertain times. Users can upload their images on the CMA website or mention @ArtLensAI on Twitter to receive matching artwork. The tool uses machine learning and open-source vector similarity engine Milvus to recognize shapes, patterns, and objects in users' photos and find surprising matches from the museum's collection.
Item-based Collaborative Filtering for Music Recommender System
Date published
Sept. 7, 2020
Author(s)
Zilliz
Language
English
Word count
1286
Hacker News points
None found.
Wanyin App, an AI-based music sharing community, implemented an item-based collaborative filtering (I2I CF) recommender system to surface music of interest based on users' previous behavior. The system converts songs into mel-frequency cepstrum (MFC) representations, uses a convolutional neural network (CNN) to extract feature embeddings, and relies on Milvus as the feature vector similarity search engine. This approach generates music recommendations through embedding similarity search and accurately filters out duplicate songs.
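A rough sketch of the audio-featurization step using librosa; the file path is a placeholder, and the mean/std pooling below is a simple stand-in for the CNN the article actually uses:

```python
import librosa
import numpy as np

# Load an audio clip and compute its mel-frequency cepstral representation.
y, sr = librosa.load("song.mp3")                     # hypothetical file
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)   # shape: (20, n_frames)

# A simple fixed-length embedding: mean and std of each coefficient over time.
embedding = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])  # 40-dim
```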
4 Steps to Building a Video Search System
Date published
Aug. 29, 2020
Author(s)
Zilliz
Language
English
Word count
856
Hacker News points
None found.
The text describes a video search system that uses image similarity to retrieve videos from a repository. Videos are converted into embeddings by extracting key frames and turning their features into vectors. The workflow imports videos with the OpenCV library, cuts each video into frames, and inserts the extracted vectors (embeddings) into Milvus. For searching, the same VGG model converts an input image into a feature vector, which is queried against Milvus to find similar vectors; the corresponding videos are then retrieved from MinIO using mappings stored in Redis. The article also provides a sample dataset of 100,000 GIF files from Tumblr for building the end-to-end solution, outlines deployment with Docker images and a docker-compose.yml configuration file, and shows the system's interface, where users submit a target image and retrieve similar videos.
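The frame-cutting step might look like the following OpenCV sketch; the sampling interval is an arbitrary choice for illustration, not the article's setting:

```python
import cv2

def extract_frames(video_path: str, every_n: int = 30):
    """Sample one frame out of every `every_n` frames of a video file."""
    cap = cv2.VideoCapture(video_path)
    frames, i = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if i % every_n == 0:
            frames.append(frame)  # each sampled frame then goes VGG -> Milvus
        i += 1
    cap.release()
    return frames
```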
The Journey to Optimizing Billion-scale Image Search (2/2)
Date published
Aug. 10, 2020
Author(s)
Zilliz
Language
English
Word count
1987
Hacker News points
None found.
The second-generation search-by-image system uses a CNN + Milvus solution. Feature extraction is done with a convolutional neural network (CNN): the VGG16 model extracts image features, implemented with Keras and TensorFlow. Milvus, an open-source vector search engine, stores and manages the feature vectors, calculates similarity, and returns the vector data in the nearest-neighbor range. The system also applies image preprocessing techniques such as normalization, bytes conversion, and black-border removal.
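A plausible version of the VGG16 feature-extraction step with tf.keras, the classification head removed; the pooling choice and normalization are assumptions rather than details confirmed by the article:

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing import image

# VGG16 without its classifier acts as a generic image feature extractor.
model = VGG16(weights="imagenet", include_top=False, pooling="max")

def extract_features(path: str) -> np.ndarray:
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    feat = model.predict(x)[0]          # 512-dim vector with max pooling
    return feat / np.linalg.norm(feat)  # normalize before storing in Milvus
```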
The Journey to Optimizing Billion-scale Image Search (1/2)
Date published
Aug. 4, 2020
Author(s)
Zilliz
Language
English
Word count
1155
Hacker News points
None found.
Yupoo Picture Manager, which stores tens of billions of images for its users, urgently needed a way to quickly locate images within its growing gallery. To address this, the company developed a search-by-image service that went through two generations. The first-generation system used the perceptual hash (pHash) algorithm for feature extraction and Elasticsearch for similarity calculation, but it struggled to match images that had been modified. The second-generation system introduced a new underlying technology to overcome these limitations.
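For reference, the first-generation pHash approach can be reproduced in a few lines with the imagehash library (an assumption; the article does not name its implementation), which also makes its weakness easy to observe:

```python
from PIL import Image
import imagehash

# pHash yields a short fingerprint; a small Hamming distance means similar images.
h1 = imagehash.phash(Image.open("original.jpg"))
h2 = imagehash.phash(Image.open("resized_copy.jpg"))
print(h1 - h2)  # Hamming distance: small for near-duplicates, but it grows
                # quickly once an image is cropped or heavily edited -- the
                # limitation that motivated the second-generation system
```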
Building an AI-Powered Writing Assistant for WPS Office
Date published
July 28, 2020
Author(s)
Zilliz
Language
English
Word count
1244
Hacker News points
None found.
WPS Office is a productivity tool developed by Kingsoft and used by over 150 million users worldwide. The company's AI department built a smart writing assistant using semantic matching algorithms such as intent recognition and text clustering. The tool, available as a web application and a WeChat mini program, helps users quickly create outlines, individual paragraphs, and entire documents by entering a title and selecting up to five keywords. The writing assistant's recommendation engine uses Milvus, an open-source similarity search engine, to power its core vector processing module. Building the assistant involves making sense of unstructured textual data: extracting features with the TF-IDF model and a bi-directional LSTM-CNNs-CRF deep learning model, creating sentence embeddings with InferSent, and storing and querying the vectors with Milvus. AI isn't replacing writers; it's helping them write more efficiently and effectively.
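The TF-IDF feature-extraction step, shown here with scikit-learn as a stand-in for whatever implementation the team used; the corpus is invented:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Turn raw text into sparse TF-IDF feature vectors.
corpus = [
    "quarterly sales report for the retail division",
    "annual marketing plan and budget outline",
]
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(corpus)    # shape: (n_docs, n_terms)
print(vectorizer.get_feature_names_out())
```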
Building an Intelligent QA System with NLP and Milvus
Date published
May 12, 2020
Author(s)
Zilliz
Language
English
Word count
789
Hacker News points
None found.
Milvus is an open-source vector search engine well suited to building question answering (QA) systems. This article combines Google's BERT model with Milvus to create a Q&A bot based on semantic understanding. The system architecture covers data preparation, generating feature vectors with BERT, importing them into Milvus and PostgreSQL, and retrieving answers, with step-by-step instructions for building an online Q&A system for the insurance industry. With high performance and scalability, Milvus can support a corpus of up to hundreds of millions of texts.
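A minimal sketch of the retrieval path, assuming pymilvus 2.x and psycopg2; the collection, table, and column names are hypothetical:

```python
import psycopg2
from pymilvus import connections, Collection

connections.connect(host="localhost", port="19530")
questions = Collection("qa_questions")  # hypothetical collection of question vectors

def answer(query_vec):
    # 1) Find the stored question most similar to the user's query vector.
    hits = questions.search(
        data=[query_vec],
        anns_field="embedding",
        param={"metric_type": "IP", "params": {"nprobe": 16}},
        limit=1,
    )
    best_id = hits[0][0].id

    # 2) Look up the paired answer text in PostgreSQL by that vector ID.
    conn = psycopg2.connect("dbname=qa user=postgres")
    with conn.cursor() as cur:
        cur.execute("SELECT answer FROM qa_pairs WHERE milvus_id = %s", (best_id,))
        return cur.fetchone()[0]
```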
How Does Milvus Schedule Query Tasks
Date published
March 2, 2020
Author(s)
Zilliz
Language
English
Word count
1304
Hacker News points
None found.
Milvus is an open-source vector database that supports massive-scale data search. It schedules query tasks by dividing the data into multiple data blocks and creating SearchTasks for each block. The tasks are assigned to computing devices based on their estimated completion times, with priority given to devices with shorter times. The results of each task are then merged to form the final search result. To optimize performance, Milvus uses an LRU cache to store frequently accessed data blocks and overlaps data loading and computation stages for better resource usage. It also considers different transmission speeds between GPUs when scheduling tasks. Future work includes exploring query optimization techniques and handling more complex hardware environments.
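The completion-time heuristic can be illustrated with a greedy scheduler sketch; this shows the idea only and is not Milvus's actual implementation:

```python
import heapq

def schedule(tasks, devices):
    """Assign each SearchTask to the device expected to finish it earliest."""
    heap = [(0.0, d) for d in devices]  # (estimated finish time, device)
    heapq.heapify(heap)
    assignment = {}
    for task, cost in tasks:            # cost: estimated compute time per block
        finish, device = heapq.heappop(heap)
        assignment[task] = device
        heapq.heappush(heap, (finish + cost, device))
    return assignment

plan = schedule([("block-0", 1.2), ("block-1", 0.8), ("block-2", 1.0)],
                ["gpu0", "gpu1"])
print(plan)  # {'block-0': 'gpu0', 'block-1': 'gpu1', 'block-2': 'gpu1'}
```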
How to Select Index Parameters for IVF Index
Date published
Feb. 26, 2020
Author(s)
Zilliz
Language
English
Word count
661
Hacker News points
None found.
Following Best Practices for Milvus Configuration, this article introduces best practices for setting key parameters in Milvus clients to improve search performance. The index_file_size parameter affects both data storage and search efficiency: increasing its value generally improves search performance, but files that grow too large may fail to load into GPU or CPU memory. For the nlist and nprobe parameters, a trade-off between precision and efficiency is necessary, and the optimal values depend on the dataset's size and distribution.
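One way to apply the trade-off in practice, using the common nlist ≈ 4·√N rule of thumb (a community heuristic, not a recommendation from the article):

```python
import math

# Rule of thumb: number of IVF clusters scales with the square root of the data.
num_vectors = 10_000_000
nlist = 4 * int(math.sqrt(num_vectors))   # ~12,649 clusters

# At query time, nprobe trades recall for speed: probing more clusters
# improves precision but slows the search down.
search_params = {"metric_type": "L2", "params": {"nprobe": 32}}
```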
Accelerating New Drug Discovery
Date published
Feb. 6, 2020
Author(s)
Zilliz
Language
English
Word count
682
Hacker News points
None found.
Milvus is an open-source similarity search engine designed to handle massive-scale feature vectors. Used in conjunction with RDKit, a chemoinformatics software suite, it enables high-performance chemical structure similarity searches. The system generates Morgan fingerprints with RDKit and imports them into Milvus to build a chemical structure database. With different chemical fingerprints, Milvus can perform substructure search, similarity search, and exact search. This approach is faster and more efficient than traditional methods for identifying promising compounds in drug discovery research.
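The fingerprint-generation step with RDKit might look like this; the molecule and bit width are illustrative choices:

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Generate a 2048-bit Morgan fingerprint (radius 2) for a molecule.
mol = Chem.MolFromSmiles("CC(=O)OC1=CC=CC=C1C(=O)O")  # aspirin
fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)

# Convert to a plain bit list, ready to import into a Milvus binary-vector field.
bits = list(fp)
```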
2019
Accelerating Similarity Search on Really Big Data with Vector Indexing
Date published
Dec. 5, 2019
Author(s)
Zilliz
Language
English
Word count
1849
Hacker News points
None found.
This article discusses the role of vector indexing in accelerating similarity search and machine learning applications, particularly those involving large datasets. It covers the different inverted file (IVF) index types and the scenarios each suits. The IVF_FLAT index is best suited to searching relatively small (million-scale) datasets when near-perfect recall is required. Where disk, CPU, or GPU memory is limited, the IVF_SQ8 index is a better option: it converts each FLOAT to UINT8 through scalar quantization, reducing memory consumption by 70-75%. The hybrid GPU/CPU index, IVF_SQ8H, offers even faster query performance than IVF_SQ8 with no loss in search accuracy. Finally, the article introduces Milvus, an open-source vector data management platform that can power similarity search applications across a range of fields.
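For instance, building an IVF_SQ8 index with the current pymilvus 2.x client (the article predates this API; the collection and field names are placeholders) looks like this:

```python
from pymilvus import connections, Collection

connections.connect(host="localhost", port="19530")
images = Collection("images")  # hypothetical collection

# IVF_SQ8 quantizes each float to uint8, cutting memory use by roughly 70-75%.
images.create_index(
    field_name="embedding",
    index_params={"index_type": "IVF_SQ8",
                  "metric_type": "L2",
                  "params": {"nlist": 4096}},
)
```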