AI/ML

Vector Databases Deep Dive

Explore vector databases, indexing algorithms, and their role in enhancing LLMs through Retrieval-Augmented Generation (RAG), with insights on scalability and challenges.

The emerging LLM tech stack includes Data, Model, Orchestration, and Operational layers, with vector databases playing a key role in Retrieval-Augmented Generation (RAG). Vector Databases store and retrieve high-dimensional embeddings, which are numerical representations of data.

Common indexing algorithms include IVF, LSH, ANNOY, HNSW, RP, PQ, and SQ, used to speed up Approximate Nearest Neighbor (ANN) searches.

Comparison with Traditional Databases : Vector databases focus on scalability, availability, and handling unstructured data, unlike SQL/NoSQL databases that prioritize consistency.

Querying : Vector databases use similarity measures like Euclidean distance, Cosine similarity, and Manhattan distance to find similar vectors. Challenges : Key challenges include scalability, real-time updates, and cost optimization. Differences between vector databases vs. traditional databases

Differences between vector databases vs. vector search add-ons

An embedding is a numerical representation that captures semantic meaning and is expressed as a vector. Embeddings can be compared to see how related they are. Embeddings of unstructured data can be quickly classified and searched with specialized techniques. A diagram showing vector data. Source: Pinecone The Rise of Vector Data

A transformer is typically used to create the embeddings. For example, OpenAI offers embedding models for turning text, such as natural language or code, into embeddings. However, if the input data is huge in size, then chunking is needed to break up the input data into smaller fragments. At the time of writing, OpenAI's text embedding models have a maximum input size of 8,192 tokens.

Vector databases use a combination of different algorithms to enable the approximate nearest neighbor (ANN) approach. A common pipeline involves the following stages: indexing, querying, and filtering.

Vector databases focus on storing and retrieving high-dimensional embeddings, alongside the original content chunks. Unlike relational databases that create indexes based on the table columns, vector databases utilize specialized algorithms for indexing. The indexing algorithms help improve vector search in two main ways :

Reduce search scope: This is done by partitioning and organizing vectors into data structures. Cluster-based index (e.g. IVF)

Reduce vector size: This is done by reducing the dimensions of the vectors or reducing the number of bits representing vector values. Compression-based index (e.g. RP, PQ, SQ) Read on to learn more about each indexing algorithm. An illustration of Inverted File indexing: Source

Maps similar vectors into the same hash tables using a set of hash functions. When searching, the query vector is hashed via the same functions and compared to vectors in the same table. By reducing the search space to vectors in the same hash table, searches are quicker. The search is non-exhaustive and provides approximate results. More hash functions improve accuracy but increase search computation.

An illustration of Locality-Sensitive Hashing indexing: Source Approximate Nearest Neighbors Oh Yeah (ANNOY) An illustration of hyperplanes used in Approximate Nearest Neighbors Oh Yeah indexing: Source An illustration of binary trees used in Approximate Nearest Neighbors Oh Yeah indexing: Source

An illustration of a layered Hierarchical Navigable Small World indexing: Source An illustration of Random Projection indexing: Source

Splits high-dimensional vectors into smaller parts, represents each part with a code, and then reassembles the parts. The code representations are generated by performing k-means clustering and collectively form a codebook. When searching, the query vector is simplified via the same process and matched against the same codebook. More codes in the codebook improve accuracy but increase search computation.

An illustration of Product Quantization indexing: Source

Compresses vectors by converting each component from a higher-precision format (e.g. 32-bit float) to a lower-precision format (e.g. 8-bit integer). This process maps continuous values to a finite set of discrete values. During indexing and search, all vectors undergo the same quantization, which reduces memory usage while approximately preserving vector relationships. Using more bits per component improves precision but increases memory requirements.

An illustration of Scalar Quantization mapping: Source

Providers of vector databases and add-ons offer different indexing options. As this is an evolving space, check out the providers' latest documentation on current indexing options available. A list of available indexing options for some popular providers (non-exhaustive, as of 11/29/2023): Source

A table of vector search metrics (memory, query time, recall) for various indexes from Pinecone: Source

In addition to storing the embeddings, metadata associated with each embedding can be stored. Examples of metadata are content type, category, source, etc. Hence, vector databases typically maintain a vector index and a metadata index. Traditional indexing methods are used for indexing the structured metadata. Some common indexing methods include:

Inverted Indexing: Creates an index that maps each unique term or metadata property to the list of records containing it, allowing for efficient keyword searches. Hash Indexing: Uses hash functions to map metadata keys to the storage locations of records, enabling fast exact-match lookups.

B-tree Indexing: Creates a balanced tree structure that stores structured metadata in sorted hierarchical order, allowing for efficient range queries and ordered traversals.

Unlike traditional databases (SQL and NoSQL) which look for exact matches, vector databases look for similar matches. The following are some common measures of similarity for comparing two vectors:

Euclidean or L2 Distance: Determines similarity by measuring the straight-line distance between two vectors in a vector space. The value ranges from 0 to infinity, where 0 represents identical vectors and larger values represent increasingly dissimilar vectors.

Manhattan or L1 Distance: Determines similarity by measuring the sum of the absolute differences between the coordinates of two vectors in a vector space. The value ranges from 0 to infinity, where 0 represents identical vectors and larger values represent increasingly dissimilar vectors.

Dot Product: Determines similarity by measuring the product of the magnitude of two vectors and the cosine of the angle between them. The value ranges from minus infinity to infinity, where a negative value represents vectors that point in opposite directions, 0 represents orthogonal vectors, and a positive value represents vectors that point in the same direction.

2D illustrations of vector similarity measures: Source

Aside from searching for similar vectors, vector databases can also filter the results based on a metadata query. The metadata filtering can be done before or after the vector similarity search.

Pre-filtering: Metadata filtering is performed before the vector search, which can reduce the search space. The drawback is that relevant results failing the metadata filter criteria may be overlooked.

Post-filtering: Metadata filtering is performed after the vector search, which ensures that relevant results aren't overlooked. However, this may add computational overhead to filter out the irrelevant results. Vector Databases vs. Traditional Databases A table comparing database types across several dimensions. Source: Author

Deciding which type of database to use depends on the type of data to store and retrieve:

Structured Data: Also referred to as quantitative data, this kind of data follows a predefined model or schema. For example, flight reservations follow a rigid model of reservation number, flight number, passenger name, etc.

Unstructured Data: Also referred to as qualitative data, this kind of data doesn't have an internal structure. It can consist of text, video, and images. For example, customer reviews and product photos can vary a lot and don't follow a rigid schema.

Semi-structured Data: An in-between of structured and unstructured data. It lacks a predefined structure but uses metadata to define data points. For example, JSON and XML objects have defined properties or tags. Different types of databases have distinct priorities:

SQL Databases: Focus on ACID principles of atomicity, consistency, isolation, and durability. Data distributed across partitions must be consistent before subsequent queries which limits horizontal scaling.

NoSQL Databases: Emphasize BASE principles of basically available, soft state, and eventual consistency. As eventual consistency is acceptable, subsequent queries can occur before data partitions are all consistent which improves horizontal scaling. Tolerate some imprecision in query results.

Vector Databases: Designed to handle large amounts of data and queries, excel at horizontal scaling. Fault tolerance is enabled to ensure high availability. A mapping of database types among main database properties: Source Based on the data type and prioritization, different database types are suitable for specific scenarios:

SQL Databases: Appropriate for critical data and transactions. For example, applications in banking and healthcare that handle sensitive data and require transactional integrity.

NoSQL Databases: Useful for higher volumes of data with exact lookups. For example, applications in social media to manage the numerous posts and connections.

Vector Databases: Befitting applications in AI/ML, NLP, and image recognition. For example, applications in ecommerce to analyze customer reviews and search for products by image. Vector Databases vs. Vector Search Add-ons A table of available providers of vector databases and vector search add-ons (non-exhaustive, as of 07/25/2024). Source: Author

Queries Per Second (QPS): The number of queries that the database can process per second.

Query Latency: The time it takes to execute a query and receive a response, usually expressed as percentile metrics (e.g. 95th and 99th percentiles). Recall: A measure of how accurate a query is in retrieving similar results.

The following are some benchmarking results from Zilliz, which offers a vector database solution, and Redis, which offers vector search integration. A graph of benchmarking mainstream vector databases and pgvector based on queries per second by Zilliz: Source A graph of benchmarking mainstream vector databases and Redis based on throughput speed by Redis: Source

Ultimately, the decision to use a vector database vs. a vector search add-on depends on your application's specific needs for embeddings storage and retrieval. For a small number of embeddings requiring simple vector search capability or to build a prototype, perhaps begin with an extension or integration. As your production system scales, consider setting up a separate vector database for performance optimization and flexibility.

Due to their purpose and architecture, vector databases face the following challenges:

Scalability and Performance: As the volume of high-dimensional embeddings grows, vector databases need to enhance data distribution and parallel processing to maintain performance and fault tolerance. Additionally, they may incorporate caching for frequently accessed data and memory optimization for reducing dimensionality to improve search speed.

Real-time Updates: Vector databases need to provide " fresh data ," meaning that newly inserted data are queryable with low latency (within a few seconds). There's also the need to balance the tradeoffs between search performance and metadata filtering accuracy.

Cost Optimization: To reduce costs, computation resources should be used only when needed, which means " decoupling the index storage from queries and searching only what is needed ." This is difficult when trying to keep latency low across partitions of the search space.

Diana Cheung (ex-LinkedIn software engineer, USC MBA, and Codesmith alum) is a technical writer on technology and business. She is an avid learner and has a soft spot for tea and meows.