Company
Date Published
Oct. 17, 2024
Author
Haziqa Sajid
Word count
4452
Language
English
Hacker News points
None

Summary

In this tutorial, we will build an image search engine using OpenAI's CLIP model and PostgreSQL with the pgvector extension. We will use a sample dataset of images from Flickr30k and store their embeddings in a PostgreSQL table. Then, we will create a React application that allows users to input a textual query and retrieve image results from our server.

To get started, you need the following:

1. A PostgreSQL database with the pgvector extension installed.
2. The OpenAI CLIP model for generating embeddings.
3. A dataset of images (e.g., Flickr30k).
4. Node.js and npm installed on your machine.
5. React, Express, and Axios libraries installed in your project.

First, let's set up the PostgreSQL database with the pgvector extension. The setup function, createTableIfNotExists, performs the following steps:

1. Install pgvector extension: It ensures that the pgvector extension is installed in the database by executing `CREATE EXTENSION IF NOT EXISTS vector`.
2. Check table existence: It executes a query to check whether the Search_table exists in the public schema.
3. Create table: If the table does not exist, it executes a query to create the Search_table with the following columns:
   - id: a primary key with auto-increment
   - path: a text field for storing the image path
   - embedding: a vector field with 512 dimensions for storing image embeddings
4. Error handling: It catches and logs any errors that occur during the process.
5. Release connection: The client.release() method is used to return a database client back to the connection pool after it has been used.

Since this code will only be called once, we can invoke the function directly in database.js. We can do that when inserting the data.

Data insertion

This section will discuss the dataset used for the image application and how to insert it into our database.

Flickr30k

The Flickr30k dataset is a well-known benchmark for sentence-based image descriptions. It contains 31,783 images of people engaging in everyday activities and events.
It is widely used for evaluating models that generate sentence-based descriptions of images. The dataset is available on Kaggle and can be easily downloaded. Because the full dataset is large, this demo uses a sample of 100 images.

Insertion logic

The following code is a part of database.js:

```javascript
import pgvector from 'pgvector/pg';

export async function insertInTable(pool, filePaths) {
  // Check out a single client from the connection pool
  const client = await pool.connect();
  try {
    // Register the pgvector types on this client
    await pgvector.registerTypes(client);
    for (const filePath of filePaths) {
      try {
        // Compute the image embedding and convert it to pgvector's SQL format
        const vision_embedding = await visionEmbeddingGenerator(filePath);
        const embeddingSql = pgvector.toSql(Array.from(vision_embedding));
        console.log(`Embeddings for ${filePath}:`, embeddingSql);
        await client.query('INSERT INTO Search_table (path, embedding) VALUES ($1, $2)', [
          filePath,
          embeddingSql,
        ]);
      } catch (err) {
        // Log and continue so that one bad image does not stop the batch
        console.error(`Error processing ${filePath}:`, err);
      }
    }
  } finally {
    // Return the client, then close the pool once all insertions are done
    client.release();
    await pool.end();
  }
}
```

The insertInTable function checks out a client from the PostgreSQL connection pool and iterates over a list of image file paths. For each path, it computes image embeddings using the visionEmbeddingGenerator function and inserts these embeddings, along with the file path, into the Search_table table. Each image is wrapped in its own try/catch, so an error on one image is logged without aborting the rest of the batch, and the finally block ensures that the database connection is closed once all insertions are complete.

Let's include a function in utils.js to list the files in our dataset directory. We will use this in database.js to insert the images into the database.
Here’s the utility function:

```javascript
import fs from 'fs';
import path from 'path';

// Return the full path of every file in the given directory
export function getFilePaths(directory) {
  try {
    const files = fs.readdirSync(directory);
    const filePaths = files.map(file => path.join(directory, file));
    return filePaths;
  } catch (err) {
    console.error('Error reading directory:', err);
    return [];
  }
}
```

Now we can import it in database.js and execute the insertion:

```javascript
import { getFilePaths } from './utils.js';
import pkg from 'pg';
const { Pool } = pkg;

// main must be async because it awaits the table setup and the insertion
async function main() {
  const pool = new Pool({
    user: '<Your user>',
    host: '<Your host>',
    database: '<Your db>',
    password: '<Your Password>',
    port: <Your Port>, // replace with a number, e.g., 5432
    ssl: {
      rejectUnauthorized: false,
    },
  });
  const tableCreated = await createTableIfNotExists(pool);
  if (tableCreated) {
    await insertInTable(pool, getFilePaths('dataset'));
  }
}

main().catch(console.error);
```

Note: The preceding code remains unchanged in the file.

Now that this process is complete, we have inserted the images and their embeddings in the table, which will be retrieved depending on the query.

Building the Image Search Application

Search API

In this section, we will develop a POST route /search using Express.js that accepts a textual query from the user, transforms it into embeddings, and performs a database search. CLIP, a neural network model, maps images and text into a shared embedding space, allowing for direct comparisons between the two modalities within a single model.
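To make the idea of a shared embedding space more concrete, here is a minimal sketch that ranks images against a text query by vector distance. The 3-dimensional vectors and file names are made up for illustration (the embeddings in this tutorial have 512 dimensions), and `l2Distance` and `rankByDistance` are hypothetical helper names, not part of any library. It mirrors in plain JavaScript what ordering by a distance and taking the first few rows does inside the database.

```javascript
// Euclidean (L2) distance between two equal-length vectors.
function l2Distance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Return the paths of the `limit` rows closest to the query vector.
function rankByDistance(rows, queryEmbedding, limit) {
  return rows
    .map((row) => ({ path: row.path, distance: l2Distance(row.embedding, queryEmbedding) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, limit)
    .map((row) => row.path);
}

// Toy data: made-up vectors standing in for image embeddings.
const toyRows = [
  { path: 'a.jpg', embedding: [1, 0, 0] },
  { path: 'b.jpg', embedding: [0, 1, 0] },
  { path: 'c.jpg', embedding: [0.6, 0.8, 0] },
];
// Made-up vector standing in for the embedding of a text query.
const toyQuery = [0.8, 0.6, 0];

console.log(rankByDistance(toyRows, toyQuery, 2)); // [ 'c.jpg', 'a.jpg' ]
```

In the actual application, pgvector performs this ranking inside PostgreSQL, so the server never has to load every stored embedding into memory.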
```javascript
app.post('/search', async (req, res) => {
  try {
    await client.connect();

    // Read the search text from the request body (sent by curl and the React client),
    // falling back to the URL query string (used in the Postman steps).
    // req.body is only populated when express.json() and express.urlencoded() are registered.
    const searchText = req.body?.searchText ?? req.query['searchText'];

    // Compute text embeddings
    const text_emb = await textEmbeddingGenerator(searchText);
    const queryTextEmbedding = [pgvector.toSql(Array.from(text_emb))];
    console.log(queryTextEmbedding);

    // Perform similarity search (<-> is pgvector's L2 distance operator)
    const result = await client.query(
      `SELECT path FROM Search_table
       ORDER BY embedding <-> $1
       LIMIT 5`,
      queryTextEmbedding
    );
    res.json(result.rows);
    console.log(result.rows);
  } catch (error) {
    console.error('Error performing search', error);
    res.status(500).send('Error performing search');
  } finally {
    client.end();
  }
});
```

The app.post('/search') route processes POST requests to perform an image search based on a textual query. When a request is received, the code first connects to the PostgreSQL database. It then generates embeddings for the search text using the `textEmbeddingGenerator` function. These embeddings are converted into a format compatible with PostgreSQL using pgvector.toSql. The route then executes a similarity search against the Search_table table in the database, ordering results by their distance to the query embedding using the <-> operator (Euclidean distance in pgvector), so the closest images come first. It limits the results to the top five matches. The matching image paths are returned as a JSON response. If an error occurs during this process, a 500 status code is sent, and the database connection is closed in the finally block. Note that a pg client cannot be reused after end() is called, so a server that handles more than one request would typically create a Pool once at startup and call its query method directly instead of connecting and ending per request.

After running the server using node index.js, we can check our endpoint using Postman, a platform for building and testing APIs. If that seems a hassle, we can simply use wget or curl. Here’s how we can make a POST request with curl, which sends searchText as a form-encoded body:

```bash
curl -X POST "http://localhost:3000/search" -d "searchText=old man"
```

If you are using Postman, you will need a desktop version. After logging in and creating a workspace, let’s request our API:

1. Add a query parameter with the key searchText and the value old man.
2. Configure the request method as POST.
3. Set the URL to http://localhost:3000/search, where the server is listening.

Here are the paths retrieved from the database after semantic search:

Let's verify one of the images from the paths to ensure that the retrieved images match the query.

Now, our server is ready to search, given the query. Let’s complete it with our client side.

Final Touches

In this section, we will create a React application that a client will use to interact with the Search API. Here’s how you can create the client side: The first step is to create a component file named SearchBar.js, which will take the user's input. Let’s write some code in it.

```javascript
import React, { useState } from 'react';
import Timescale from './assets/1.jpeg'; // An icon saved in the assets

const SearchBar = () => {
  const [searchText, setSearchText] = useState('');
  const [clicked, setClicked] = useState(false);

  return (
    <div className="container">
      <div className="titleContainer">
        <img src={Timescale} alt="logo" className="logo" />
        <h1 className="title">Timescale Image Search Engine</h1>
      </div>
      <div className="searchContainer">
        <input
          type="text"
          value={searchText}
          onChange={(e) => setSearchText(e.target.value)}
          placeholder="Search..."
          className="input"
        />
        <button onClick={() => setClicked(true)} className="button">
          Search
        </button>
      </div>
    </div>
  );
};

export default SearchBar;
```

This React component, SearchBar, allows users to input a search query. At this stage it tracks only the search text and whether the Search button was clicked; the results, loading state, and any errors encountered during the search are added next. Let’s fill in with the useEffect hook to query the Search API.
```javascript
// Add inside the SearchBar component
// (also import useEffect from 'react' and axios from 'axios')
const [results, setResults] = useState([]);
const [error, setError] = useState(null);
const [loading, setLoading] = useState(false);

useEffect(() => {
  const performSearch = async () => {
    setLoading(true);
    try {
      const response = await axios.post('http://localhost:3000/search', { searchText });
      setResults(response.data);
    } catch (err) {
      setError(err);
    } finally {
      setLoading(false);
    }
  };

  if (clicked) {
    // Reset the flag first so the search runs once per click
    setClicked(false);
    performSearch();
  }
}, [clicked]);
```

This code snippet uses React's useState and useEffect hooks to manage search results, errors, and a loading flag. When the clicked state changes, useEffect triggers an asynchronous search function that sends a POST request to http://localhost:3000/search with the search text in the JSON body. Successful responses update the `results` state, and any errors update the error state. The clicked state is reset to prevent repeated searches.

Now, let’s look at the complete SearchBar component. Please note that additional components and custom hooks have been created to handle dynamic image imports. However, due to the scope of the article, we will skip the explanation. If you want, you can explore this further in our GitHub repository.
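The request logic inside the effect can also be read on its own, outside React. The sketch below is a hypothetical helper, `runSearch`, that takes the HTTP function as a parameter (in the app this would be `axios.post`) and returns either results or an error, the same two outcomes the effect stores in state. The helper name, the injected `post` parameter, and the empty-query guard are assumptions added for illustration and are not part of the original component.

```javascript
// Hypothetical helper: performs one search request and reports the outcome.
// `post` is any function shaped like axios.post: (url, body) => Promise<{ data }>.
async function runSearch(post, searchText) {
  const query = searchText.trim();
  if (query === '') {
    // Assumption: skip the round trip when there is nothing to search for
    return { results: [], error: null };
  }
  try {
    const response = await post('http://localhost:3000/search', { searchText: query });
    return { results: response.data, error: null };
  } catch (err) {
    // Report the failure instead of throwing, so the caller can store it in state
    return { results: [], error: err };
  }
}
```

Inside the component, the effect could then call `runSearch(axios.post, searchText)` and pass the two returned fields to `setResults` and `setError`. Keeping the HTTP function injectable also makes the logic easy to exercise without a running server.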
Here’s the complete component:

```javascript
// SearchBar.js
import React, { useEffect, useState } from 'react';
import Timescale from './assets/1.jpeg';
import axios from 'axios';
import Image from './Image';

const SearchBar = () => {
  const [searchText, setSearchText] = useState('');
  const [results, setResults] = useState([]);
  const [error, setError] = useState(null);
  const [loading, setLoading] = useState(false);
  const [clicked, setClicked] = useState(false);

  useEffect(() => {
    const performSearch = async () => {
      setLoading(true);
      try {
        const response = await axios.post('http://localhost:3000/search', { searchText });
        setResults(response.data);
      } catch (err) {
        setError(err);
      } finally {
        setLoading(false);
      }
    };

    if (clicked) {
      // Reset the flag first so the search runs once per click
      setClicked(false);
      performSearch();
    }
  }, [clicked]);

  return (
    <div style={styles.container}>
      <div style={styles.titleContainer}>
        <img src={Timescale} alt="logo" style={styles.logo} />
        <h1 style={styles.title}>Timescale Image Search Engine</h1>
      </div>
      <div style={styles.searchContainer}>
        <input
          type="text"
          value={searchText}
          onChange={(e) => setSearchText(e.target.value)}
          placeholder="Search..."
          style={styles.input}
        />
        <button onClick={() => setClicked(true)} style={styles.button}>
          Search
        </button>
      </div>
      <div style={styles.resultsContainer}>
        {results.length > 0 && (
          <ul style={styles.resultsList}>
            {results.map((item, index) => (
              <li key={index} style={styles.resultItem}>
                <Image fileName={item.path} alt={searchText} />
              </li>
            ))}
          </ul>
        )}
      </div>
    </div>
  );
};

const styles = {
  container: {
    display: 'flex',
    flexDirection: 'column',
    alignItems: 'center',
    justifyContent: 'center',
    backgroundColor: 'black',
    textAlign: 'center',
    padding: '20px',
  },
  titleContainer: {
    display: 'flex',
    alignItems: 'center',
    marginBottom: '20px',
  },
  logo: {
    width: '80px',
    height: '80px',
    marginRight: '10px',
  },
  title: {
    fontSize: '48px',
    color: '#F5FF80',
  },
  searchContainer: {
    display: 'flex',
    alignItems: 'center',
    justifyContent: 'center',
    width: '100%',
    marginBottom: '20px',
  },
  input: {
    padding: '15px',
    borderRadius: '5px',
    border: '1px solid #F5FF80',
    marginRight: '10px',
    width: '50%',
  },
  button: {
    padding: '15px 15px',
    borderRadius: '5px',
    border: 'none',
    backgroundColor: '#F5FF80',
    color: 'black',
    cursor: 'pointer',
  },
  resultsContainer: {
    width: '100%',
    textAlign: 'center', // Center align the results container
  },
  resultsList: {
    listStyleType: 'none',
    padding: 0,
    margin: 0,
    display: 'flex',
    flexWrap: 'wrap',
    justifyContent: 'center',
  },
  resultItem: {
    margin: '10px',
    color: '#F5FF80',
    textAlign: 'center',
  },
};

export default SearchBar;
```

The SearchBar.js component is also responsible for displaying images on the page. After retrieving the image paths from the database, it selects the corresponding assets and displays them. To dynamically add image imports in React, we have created the useImage hook and the Image component.
```javascript
// useImage.js
import { useEffect, useState } from 'react'

const useImage = (fileName) => {
  const [loading, setLoading] = useState(true)
  const [error, setError] = useState(null)
  const [image, setImage] = useState(null)

  useEffect(() => {
    const fetchImage = async () => {
      // Normalize Windows-style backslashes so the import path resolves
      const path = fileName.replace(/\\/g, '/');
      try {
        const response = await import(`./assets/${path}`) // change relative path to suit your needs
        setImage(response.default)
      } catch (err) {
        setError(err)
      } finally {
        setLoading(false)
      }
    }

    fetchImage()
  }, [fileName])

  return {
    loading,
    error,
    image,
  }
}

export default useImage
```

Note: An assets folder is created within the src directory, which contains the image dataset.

Let’s create a component to display the image, as you can see in the SearchBar.js:

```javascript
// Image.js
import useImage from "./useImage"

const Image = ({ fileName, alt }) => {
  const { loading, error, image } = useImage(fileName)

  // Log import failures, if any, to help with debugging
  if (error) console.log(error)

  return (
    <>
      <img src={image} alt={alt} />
    </>
  )
}

export default Image
```

The Image component takes the file name as a prop and uses the useImage hook to fetch the image. The hook exposes the loading and error states alongside the image itself; the component renders an img element with the fetched image source and the alternative text.

Now that we have completed our client-side application, let's test it by running the server using node index.js and opening the React app in a web browser. When you input a search query and click on the "Search" button, the application should send a POST request to the /search endpoint with the search text as payload. The server should then perform a similarity search against the Search_table table in the PostgreSQL database and return the top five matching image paths as a JSON response. Finally, the client-side application should display these images on the page using the Image component.

In conclusion, we have built an image search engine using OpenAI's CLIP model and PostgreSQL with the pgvector extension.
We used a sample dataset of images from Flickr30k and stored their embeddings in a PostgreSQL table. Then, we created a React application that allows users to input a textual query and retrieve image results from our server. This application demonstrates how to use advanced machine learning models like CLIP for semantic search applications and how to efficiently store and query high-dimensional vectors using PostgreSQL with the pgvector extension.
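For reference, the table-setup steps listed at the start of this tutorial (install the extension, check for the table, create it, handle errors, release the client) could be sketched as follows. This is a minimal sketch under assumptions, not the code from the original repository: it assumes `pool` is a pg Pool, that the function returns true when the table is ready and false on error, and that the existence check goes through information_schema. Because the table name is unquoted in the SQL, PostgreSQL stores it in lowercase, so the check looks for search_table.

```javascript
// Hypothetical sketch of createTableIfNotExists; `pool` is assumed to be a pg Pool.
async function createTableIfNotExists(pool) {
  const client = await pool.connect();
  try {
    // 1. Make sure the pgvector extension is installed
    await client.query('CREATE EXTENSION IF NOT EXISTS vector');

    // 2. Check whether the table exists in the public schema
    //    (unquoted identifiers are folded to lowercase by PostgreSQL)
    const check = await client.query(
      `SELECT EXISTS (
         SELECT 1 FROM information_schema.tables
         WHERE table_schema = 'public' AND table_name = 'search_table'
       ) AS "exists"`
    );

    // 3. Create the table with a 512-dimensional vector column if it is missing
    if (!check.rows[0].exists) {
      await client.query(
        `CREATE TABLE Search_table (
           id SERIAL PRIMARY KEY,
           path TEXT,
           embedding VECTOR(512)
         )`
      );
    }
    return true; // assumption: true means the table is ready for inserts
  } catch (err) {
    // 4. Log the error and signal failure to the caller
    console.error('Error creating table:', err);
    return false;
  } finally {
    // 5. Return the client to the connection pool
    client.release();
  }
}
```

With this return convention, calling the function a second time would still return true, so the caller would insert the images again; checking for existing rows first would be one way to avoid duplicates.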