In this tutorial, we will learn how to combine vector search with time-based filters in PostgreSQL using the pgvector extension and TimescaleDB's hypertables. We will create a table of vector embeddings, run similarity searches, and filter the results by timestamp.
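First, let's enable the necessary extensions. Both must already be installed on the server, and TimescaleDB must also be listed in `shared_preload_libraries`:
```sql
-- pgvector registers itself under the extension name "vector"
CREATE EXTENSION IF NOT EXISTS vector;
-- TimescaleDB provides create_hypertable(), used later in this tutorial
CREATE EXTENSION IF NOT EXISTS timescaledb;
```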
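Before going further, it can help to confirm that both extensions are available in the current database. A minimal check, assuming pgvector is registered as `vector` and TimescaleDB as `timescaledb` (their usual extension names), might look like this:

```sql
-- List the installed versions of the two extensions used in this tutorial.
-- A missing row means the corresponding extension is not enabled here.
SELECT extname, extversion
FROM pg_extension
WHERE extname IN ('vector', 'timescaledb');
```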
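Next, we will create a table that stores a vector embedding and a timestamp for each row. The embedding column uses pgvector's `VECTOR` type (not `TSVECTOR`, which is PostgreSQL's full-text search type):
```sql
CREATE TABLE wiki2 (
    id SERIAL,
    embedding VECTOR(3),  -- 3 dimensions keep the example small
    content TEXT,
    time TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    -- unique constraints on a hypertable must include the partitioning column
    PRIMARY KEY (id, time)
);
```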
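Now, let's insert some sample data into the table. The rows below are placeholders: random vectors and generated text stand in for real embeddings and articles, and the timestamps are spread out so that the time filters used later match some rows:
```sql
-- Timestamps are spaced 10 seconds apart starting at 2000-01-01,
-- so 100,000 rows cover roughly 11.5 days.
INSERT INTO wiki2 (embedding, content, time)
SELECT
    ARRAY[random(), random(), random()]::vector,
    'Sample article ' || i,
    '2000-01-01'::TIMESTAMPTZ + i * INTERVAL '10 seconds'
FROM generate_series(1, 100000) AS i;
```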
To perform a similarity search on the embeddings, we can use the `<=>` operator provided by the pgvector extension, which computes the cosine distance between two vectors (smaller values mean more similar):
```sql
SELECT id, embedding <=> '[0.1, 0.2, 0.3]'::vector AS dist
FROM wiki2
ORDER BY dist
LIMIT 10;
```
This query will return the 10 most similar rows based on the embedded vectors. However, it does not consider any time-based filters. To add a time filter to our search, we can modify the query as follows:
```sql
SELECT id, embedding <=> '[0.1, 0.2, 0.3]'::vector AS dist
FROM wiki2
WHERE '2000-01-04'::TIMESTAMPTZ <= time AND time < '2000-01-06'::TIMESTAMPTZ
ORDER BY dist
LIMIT 10;
```
This query will return the 10 most similar rows, but only among rows whose timestamps fall on or after '2000-01-04' and before '2000-01-06'.
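The `<=>` operator used above computes cosine distance. pgvector also provides `<->` for Euclidean (L2) distance and `<#>` for negative inner product, and the same time filter works with either. As a sketch, assuming the `embedding` column has type `vector(3)` and using the same illustrative query vector:

```sql
-- Same time-filtered search, ranked by Euclidean (L2) distance instead of
-- cosine distance. The query vector is illustrative only.
SELECT id, embedding <-> '[0.1, 0.2, 0.3]'::vector AS l2_dist
FROM wiki2
WHERE time >= '2000-01-04'::TIMESTAMPTZ AND time < '2000-01-06'::TIMESTAMPTZ
ORDER BY l2_dist
LIMIT 10;
```

For embeddings normalized to unit length, the three operators produce the same ranking, so the choice mainly matters for unnormalized vectors.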
To improve performance when dealing with large datasets, we can use TimescaleDB's hypertables. Hypertables automatically partition data into chunks based on time, allowing for more efficient querying and storage management. Because our table already contains rows, the conversion needs `migrate_data => true`. Note also that any primary key or unique index on the table must include the `time` column, otherwise the conversion fails:
```sql
SELECT create_hypertable('wiki2', 'time', migrate_data => true);
```
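To see how the data was partitioned, the chunks behind the hypertable can be listed. The following is a small sketch using TimescaleDB's `show_chunks` function; the `older_than` cutoff is an arbitrary value chosen for illustration:

```sql
-- List every chunk that backs the wiki2 hypertable.
SELECT show_chunks('wiki2');

-- List only chunks whose data is entirely older than the given timestamp.
SELECT show_chunks('wiki2', older_than => '2000-01-06'::TIMESTAMPTZ);
```

The number of chunks depends on the chunk interval, which defaults to 7 days and can be set with the `chunk_time_interval` argument of `create_hypertable`.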
Now, let's perform the same similarity search with a time filter using the hypertable:
```sql
SELECT id, embedding <=> '[0.1, 0.2, 0.3]'::vector AS dist
FROM wiki2
WHERE '2000-01-04'::TIMESTAMPTZ <= time AND time < '2000-01-06'::TIMESTAMPTZ
ORDER BY dist
LIMIT 10;
```
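With the hypertable in place, the planner can skip chunks whose time range falls outside the filter (chunk exclusion), so only the relevant chunk(s) are scanned. If a vector index such as HNSW or IVFFlat has been created on the `embedding` column, TimescaleDB maintains it per chunk, and the query can use an approximate nearest-neighbor search within the relevant chunks, which is faster than computing exact distances for every row. Without such an index, distances are still computed exactly, but only over the rows in the matching chunks. As your dataset grows, chunk exclusion keeps the amount of data scanned proportional to the time window rather than to the size of the whole table.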
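The approximate search described above depends on a vector index, which the steps so far have not created. A minimal sketch, assuming pgvector 0.5.0 or later (for HNSW support) and an `embedding` column of type `vector`; the index name `wiki2_embedding_hnsw_idx` is made up for this example:

```sql
-- HNSW index using cosine distance, matching the <=> operator in the queries.
-- On a hypertable, the index is created on each chunk.
CREATE INDEX wiki2_embedding_hnsw_idx
ON wiki2 USING hnsw (embedding vector_cosine_ops);

-- Inspect the plan to see which chunks are scanned and whether the
-- vector index is used. The actual plan depends on data and settings.
EXPLAIN
SELECT id, embedding <=> '[0.1, 0.2, 0.3]'::vector AS dist
FROM wiki2
WHERE time >= '2000-01-04'::TIMESTAMPTZ AND time < '2000-01-06'::TIMESTAMPTZ
ORDER BY dist
LIMIT 10;
```

The planner may still choose an exact scan if it estimates that to be cheaper, so the plan is worth checking against real data.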
In conclusion, by combining vector search with time-based filters in PostgreSQL using the pgvector extension and TimescaleDB's hypertables, we can efficiently retrieve more temporally relevant vectors while maintaining fast query times even as our dataset grows. This technique is particularly useful for AI applications that require contextually aware interactions based on both semantic similarity and temporal relevance.