Vector Databases — Embeddings, Similarity Search, Indexing & RAG Pipelines
Vector databases store and search high-dimensional vector embeddings — numerical representations of text, images, or audio — enabling similarity search at scale for applications like semantic search, recommendation systems, and Retrieval Augmented Generation (RAG).
What You’ll Learn
- Generating embeddings with OpenAI text-embedding-3, Cohere, and HuggingFace models
- Pinecone, Weaviate, Chroma, Qdrant, and Milvus — which to use and when
- Similarity search metrics: cosine similarity, euclidean distance, dot product
- Hybrid search combining vector search with keyword filtering
- Metadata filtering for precise result narrowing
- RAG pipeline design using vector stores as the retrieval layer
- Indexing algorithms: HNSW, IVF, PQ, and their trade-offs
Why Vector Databases Matter
Traditional databases excel at exact matches (WHERE name = 'Alice') but fail at semantic search ("find articles similar to this one"). Vector databases solve this by storing embeddings of unstructured data and finding the nearest neighbors of a query in vector space. Many LLM applications, including RAG-based chatbots, recommendation engines, and semantic search, use a vector database as their retrieval or memory layer.
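To make the contrast concrete, here is a minimal sketch of exact-match lookup versus brute-force nearest-neighbor search. The article titles and 3-dimensional vectors are invented for illustration; real embeddings have hundreds or thousands of dimensions, and a vector database replaces the full scan with an index.

```python
import math

# Toy "embeddings" (hypothetical values chosen by hand for this example).
ARTICLES = {
    "Intro to neural networks": [0.9, 0.1, 0.0],
    "Deep learning basics": [0.8, 0.2, 0.1],
    "Sourdough baking guide": [0.0, 0.1, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_match(title):
    """Equivalent of WHERE title = ...: all-or-nothing lookup."""
    return [t for t in ARTICLES if t == title]


def nearest(query_vec, k=2):
    """Brute-force nearest neighbors: score every vector, keep the top k."""
    ranked = sorted(ARTICLES, key=lambda t: cosine(query_vec, ARTICLES[t]), reverse=True)
    return ranked[:k]


print(exact_match("machine learning"))  # no row has this exact title
print(nearest([1.0, 0.1, 0.0]))         # the two closest articles by direction
```

The exact-match query finds nothing because no title equals the search string, while the nearest-neighbor query still returns the two most related articles.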
Doda Browser uses vector embeddings for smart content discovery and search. Durga Antivirus Pro leverages vector similarity for semantic malware detection — finding threats that match the behavior pattern of known malware even when the byte signatures differ.
Learning Path
flowchart LR
A["OpenAI API Guide"] --> B["Embeddings Concepts"]
B --> C["Vector Databases<br/>You are here"]
C --> D["RAG Pipeline Design"]
C --> E["Fine-Tuning LLMs"]
D --> F["AI Agents"]
style C fill:#f90,color:#fff
What Are Embeddings?
An embedding is a list of floating-point numbers (a vector) that captures the semantic meaning of content. Similar content has similar vectors.
# Generate embeddings with OpenAI
from openai import OpenAI
client = OpenAI()
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Vector databases store semantic meaning as mathematical vectors"
)
vector = response.data[0].embedding
print(f"Dimension: {len(vector)}")
print(f"First 5 values: {vector[:5]}")
Expected output (the values shown are illustrative):
Dimension: 1536
First 5 values: [0.0123, -0.0456, 0.0789, -0.0123, 0.0567]
Embedding Providers
| Provider | Model | Dimensions | Pricing |
|---|---|---|---|
| OpenAI | text-embedding-3-small | 1536 | $0.02/1M tokens |
| OpenAI | text-embedding-3-large | 3072 | $0.13/1M tokens |
| Cohere | embed-english-v3.0 | 1024 | $0.10/1K units |
| HuggingFace | all-MiniLM-L6-v2 | 384 | Free (local) |
| HuggingFace | BAAI/bge-large-en-v1.5 | 1024 | Free (local) |
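The Dimensions column also drives storage cost. As a rough sketch, assuming each value is stored as a 4-byte float32 and ignoring index overhead and metadata (both assumptions, since actual storage depends on the database), the raw size of one million vectors can be estimated like this:

```python
def raw_index_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Raw vector storage only; assumes float32 and no index or metadata overhead."""
    return num_vectors * dimensions * bytes_per_value


MODELS = [
    ("all-MiniLM-L6-v2", 384),
    ("embed-english-v3.0", 1024),
    ("text-embedding-3-small", 1536),
    ("text-embedding-3-large", 3072),
]

for model, dims in MODELS:
    gib = raw_index_bytes(1_000_000, dims) / 2**30
    print(f"{model:<24} {dims:>5} dims  {gib:.2f} GiB per 1M vectors")
```

Doubling the dimension doubles the raw footprint, which is one reason smaller models can be attractive for large collections.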
Vector Database Options
Pinecone
Fully managed, serverless or pod-based indexing.
from pinecone import Pinecone, ServerlessSpec
pc = Pinecone(api_key="your-api-key")
# Serverless index
pc.create_index(
    name="rag-index",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)
index = pc.Index("rag-index")
# Upsert vectors as (id, values, metadata) tuples
index.upsert([
    ("id1", [0.1, 0.2, ...], {"text": "Document about AI"}),
    ("id2", [0.3, 0.4, ...], {"text": "Document about databases"}),
])
# Query
results = index.query(
    vector=[0.15, 0.25, ...],
    top_k=5,
    include_metadata=True
)
print(results.matches[0].metadata["text"])
Weaviate
Open-source with built-in vectorizer modules.
import weaviate
import weaviate.classes as wvc
client = weaviate.connect_to_local()
# Create a collection with an explicit schema; vectors are supplied by the caller
collection = client.collections.create(
    name="Documents",
    vectorizer_config=wvc.config.Configure.Vectorizer.none(),
    properties=[
        wvc.config.Property(name="title", data_type=wvc.config.DataType.TEXT),
        wvc.config.Property(name="content", data_type=wvc.config.DataType.TEXT),
        wvc.config.Property(name="category", data_type=wvc.config.DataType.TEXT),
    ]
)
# Insert with pre-generated vectors
collection.data.insert(
    properties={"title": "AI Overview", "content": "...", "category": "tech"},
    vector=[0.1, 0.2, ...]
)
# Hybrid search (vector + keyword)
# No vectorizer is configured, so also pass vector=<query embedding> for the vector half
response = collection.query.hybrid(
    query="artificial intelligence",
    alpha=0.75,  # 75% vector, 25% keyword
    limit=5
)
client.close()  # release the connection when done
Chroma
Lightweight, embedded — runs in-process, no server needed.
import chromadb
client = chromadb.Client()
collection = client.create_collection("docs")
collection.add(
    documents=["Document about AI", "Document about databases"],
    metadatas=[{"category": "tech"}, {"category": "tech"}],
    ids=["doc1", "doc2"]
)
results = collection.query(
    query_texts=["machine learning"],
    n_results=2
)
print(results["documents"])
Qdrant
Rust-based, high performance with rich filtering.
from qdrant_client import QdrantClient, models
client = QdrantClient("localhost", port=6333)
client.create_collection(
    collection_name="products",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE),
)
client.upsert(
    collection_name="products",
    points=[
        models.PointStruct(
            id=1,
            vector=[0.1, 0.2, ...],
            payload={"name": "Laptop", "price": 999.99, "category": "electronics"},
        ),
    ]
)
# Query with metadata filter
hits = client.search(
    collection_name="products",
    query_vector=[0.15, 0.25, ...],
    query_filter=models.Filter(
        must=[
            models.FieldCondition(key="category", match=models.MatchValue(value="electronics")),
            models.FieldCondition(key="price", range=models.Range(lt=1500)),
        ]
    ),
    limit=5
)
Milvus
Cloud-native, designed for billion-scale vector search.
from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType
connections.connect(host="localhost", port="19530")
schema = CollectionSchema([
    FieldSchema("id", DataType.INT64, is_primary=True),
    FieldSchema("embedding", DataType.FLOAT_VECTOR, dim=1536),
    FieldSchema("title", DataType.VARCHAR, max_length=200),
])
collection = Collection("documents", schema)
# Create IVF_FLAT index
collection.create_index("embedding", {
    "index_type": "IVF_FLAT",
    "metric_type": "COSINE",
    "params": {"nlist": 256}
})
# Load the collection into memory before searching
collection.load()
results = collection.search(
    data=[[0.1, 0.2, ...]],
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"nprobe": 10}},
    limit=5,
)
Similarity Search Metrics
| Metric | Formula | Best For | Range |
|---|---|---|---|
| Cosine | A·B / (‖A‖ × ‖B‖) | Text embeddings, semantic search | -1 to 1 (higher = more similar) |
| Euclidean | √Σ(Aᵢ-Bᵢ)² | Clustering, anomaly detection | 0 to ∞ (lower = more similar) |
| Dot Product | Σ(Aᵢ×Bᵢ) | Recommendation, normalized vectors | -∞ to ∞ (higher = more similar) |
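The three metrics in the table are closely related once vectors are scaled to unit length. The sketch below, written with the standard library only, checks two identities: on unit vectors the dot product equals cosine similarity, and squared Euclidean distance equals 2 - 2·cosine. The helper names are chosen for this example.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def cosine_similarity(a, b):
    return dot(a, b) / (norm(a) * norm(b))


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(a):
    """Scale a vector to unit length."""
    n = norm(a)
    return [x / n for x in a]


a = [0.1, 0.2, 0.3]
b = [0.9, 0.8, 0.7]
a_hat, b_hat = normalize(a), normalize(b)
cos = cosine_similarity(a, b)

# Cosine ignores magnitude: scaling a vector leaves it unchanged.
print(cosine_similarity(a, [10 * x for x in a]))
# Dot product does not: the raw and unit-length versions differ.
print(dot(a, b), dot(a_hat, b_hat))
# On unit vectors, squared Euclidean distance is a function of cosine alone.
print(euclidean_distance(a_hat, b_hat) ** 2, 2 - 2 * cos)
```

Because of these identities, all three metrics produce the same ranking on normalized vectors, so the choice mostly matters when magnitudes vary.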
import numpy as np
def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def euclidean_distance(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

vec1 = np.array([0.1, 0.2, 0.3])
vec2 = np.array([0.15, 0.25, 0.35])
vec3 = np.array([0.9, 0.8, 0.7])
print(f"Cosine(vec1, vec2) = {cosine_similarity(vec1, vec2):.4f}")
print(f"Cosine(vec1, vec3) = {cosine_similarity(vec1, vec3):.4f}")
Expected output:
Cosine(vec1, vec2) = 0.9974
Cosine(vec1, vec3) = 0.8827
Indexing Methods
| Method | Type | Search Speed | Memory | Build Time | Best For |
|---|---|---|---|---|---|
| HNSW | Graph | Fastest | High | Slow | High-accuracy, moderate scale |
| IVF | Cluster | Fast | Medium | Medium | Large-scale, balanced |
| PQ | Compression | Medium | Low | Slow | Memory-constrained |
| HNSW+PQ | Hybrid | Fast | Low | Slow | Large-scale + memory constrained |
HNSW (Hierarchical Navigable Small World)
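One of the most widely used algorithms for high-recall approximate nearest-neighbor search. It builds a multi-layer graph: the bottom layer contains every vector, and each layer above holds a progressively sparser subset, so a query can take long hops at the top and refine its position as it descends.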
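The core search step can be sketched on a single graph layer. This is a deliberate simplification, not real HNSW: there is only one layer, and the graph is built by brute force (each node linked to its m nearest neighbors). It is meant only to show how a best-first walk with a candidate list of size ef trades accuracy for speed. All function names are hypothetical.

```python
import heapq
import math
import random


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_graph(vectors, m):
    """Link every node to its m nearest neighbors (brute force, undirected)."""
    graph = {i: set() for i in range(len(vectors))}
    for i, v in enumerate(vectors):
        others = (j for j in range(len(vectors)) if j != i)
        for j in sorted(others, key=lambda j: dist(v, vectors[j]))[:m]:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def search(vectors, graph, query, entry, k, ef):
    """Best-first graph walk that keeps the ef closest nodes seen so far."""
    d0 = dist(query, vectors[entry])
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: closest unexplored node first
    best = [(-d0, entry)]       # max-heap via negation: worst kept node on top
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(best) >= ef and d > -best[0][0]:
            break  # nothing unexplored can improve the kept set
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(query, vectors[nb])
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)  # drop the current worst
    ranked = sorted((-neg_d, node) for neg_d, node in best)
    return [node for _, node in ranked[:k]]


random.seed(0)
vectors = [[random.random() for _ in range(8)] for _ in range(200)]
graph = build_graph(vectors, m=8)
query = [random.random() for _ in range(8)]

approx = search(vectors, graph, query, entry=0, k=5, ef=50)
exact = sorted(range(len(vectors)), key=lambda i: dist(query, vectors[i]))[:5]
print("overlap with exact top-5:", len(set(approx) & set(exact)))
```

Raising ef makes the walk explore more of the graph and usually improves the overlap with the exact result, at the cost of more distance computations. Here m plays the role of M and ef the role of ef_search.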
# HNSW configuration in Pinecone
index = pc.Index("hnsw-index")
# HNSW parameters (Pinecone handles this automatically)
# M = 16 - connections per node
# ef_construction = 200 - build quality vs speed
# ef_search = 100 - query accuracy vs speed
IVF (Inverted File Index)
Clusters vectors into Voronoi cells. At query time, only the nearest cells are searched.
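The idea can be sketched in a few lines of plain Python. This toy version makes one loud assumption: centroids are simply sampled from the data, whereas real IVF implementations train them (typically with k-means). The function names are hypothetical, and nlist and nprobe mirror the parameters of the same name.

```python
import math
import random


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_centroids(vec, centroids, n):
    """Indices of the n centroids closest to vec."""
    return sorted(range(len(centroids)), key=lambda c: dist(vec, centroids[c]))[:n]


def build_ivf(vectors, centroids):
    """Assign every vector id to the cell of its closest centroid."""
    cells = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        cells[nearest_centroids(v, centroids, 1)[0]].append(i)
    return cells


def ivf_search(query, vectors, centroids, cells, k, nprobe):
    """Scan only the nprobe cells nearest to the query, then rank exactly."""
    candidates = [i for c in nearest_centroids(query, centroids, nprobe) for i in cells[c]]
    return sorted(candidates, key=lambda i: dist(query, vectors[i]))[:k]


random.seed(1)
vectors = [[random.random() for _ in range(8)] for _ in range(500)]
nlist = 16
centroids = random.sample(vectors, nlist)  # assumption: no k-means training
cells = build_ivf(vectors, centroids)
query = [random.random() for _ in range(8)]

exact = sorted(range(len(vectors)), key=lambda i: dist(query, vectors[i]))[:5]
for nprobe in (1, 4, nlist):
    found = ivf_search(query, vectors, centroids, cells, k=5, nprobe=nprobe)
    print(f"nprobe={nprobe:>2}  overlap with exact top-5: {len(set(found) & set(exact))}")
```

With nprobe equal to nlist every cell is scanned, so the result matches brute force. Smaller nprobe values scan fewer vectors and may miss neighbors that fall just across a cell boundary.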
# IVF configuration in Milvus
index_params = {
    "index_type": "IVF_FLAT",
    "params": {"nlist": 256},  # Number of clusters
}
# nprobe = 10 (number of clusters to search at query time)
RAG Pipeline with Vector Store
flowchart TB
DOC["Documents"] --> CHUNK["Chunking"]
CHUNK --> EMBED["Embedding Model"]
EMBED --> VECTOR["Vector Database"]
QUERY["User Query"] --> QUERY_EMBED["Query Embedding"]
QUERY_EMBED --> SEARCH["Similarity Search"]
VECTOR --> SEARCH
SEARCH --> CONTEXT["Retrieved Context"]
CONTEXT --> LLM["LLM Generation"]
QUERY --> LLM
LLM --> RESPONSE["Final Response"]
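The Chunking step in the diagram can be sketched as a sliding window with overlap. This minimal version approximates tokens with whitespace-separated words, which is an assumption (a real pipeline would count tokens with the embedding model's tokenizer). The function name and default sizes are illustrative.

```python
def chunk_words(text, chunk_size=300, overlap=50):
    """Split text into word windows of chunk_size that share overlap words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(sample, chunk_size=4, overlap=1):
    print(chunk)
```

Each chunk repeats the last word of the previous one, so a sentence that straddles a boundary still appears intact in at least one chunk. Each chunk would then be embedded and indexed as its own document.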
# Complete RAG pipeline
from openai import OpenAI
import chromadb

client = OpenAI()  # reads OPENAI_API_KEY from the environment
vector_db = chromadb.Client().get_or_create_collection("knowledge_base")

# Step 1: Index documents
def index_document(doc_id, text, metadata=None):
    embedding = client.embeddings.create(
        model="text-embedding-3-small", input=text
    ).data[0].embedding
    vector_db.add(
        embeddings=[embedding],
        documents=[text],
        metadatas=[metadata] if metadata else None,  # omit when no metadata is given
        ids=[doc_id]
    )

# Step 2: Retrieve
def retrieve(query, k=3):
    query_emb = client.embeddings.create(
        model="text-embedding-3-small", input=query
    ).data[0].embedding
    results = vector_db.query(query_embeddings=[query_emb], n_results=k)
    return results["documents"][0]

# Step 3: Generate
def rag_answer(query):
    context = retrieve(query)
    context_text = "\n\n".join(context)
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Answer using the provided context."},
            {"role": "user", "content": f"Context:\n{context_text}\n\nQuestion: {query}"}
        ]
    )
    return response.choices[0].message.content

# Example
index_document("1", "Vector databases store embeddings for similarity search")
print(rag_answer("What do vector databases store?"))
Expected output: The answer is generated using the retrieved context about vector databases storing embeddings.
Common Errors
1. Embedding dimension mismatch
You upserted 1536-dim vectors but the index expects 384-dim. Always verify the dimension matches between your embedding model and your vector database index configuration.
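2. Dot product with unnormalized vectors
Cosine similarity ignores vector magnitude, so it gives the same ranking whether or not the vectors are normalized. Dot product does not: with unnormalized vectors, longer vectors score higher regardless of direction. If the index uses a dot-product metric and you want cosine-like rankings, normalize first: vec / np.linalg.norm(vec).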
3. Metadata filter field type mismatch
Filtering price < 1500 fails if price is stored as a string. Ensure metadata field types match the filter type.
4. HNSW ef_search too low for production
Default ef_search values favor speed over accuracy. For production, increase ef_search (e.g., 100-500) for better recall at the cost of latency.
5. Chunking too large or too small
Large chunks (2000+ tokens) contain too much noise for precise retrieval. Small chunks (50 tokens) lose context. Sweet spot: 200-500 tokens with 50-token overlap.
6. Pinecone pod index stuck initializing
Serverless indexes initialize in seconds, while pod-based indexes can take 5-15 minutes. Wait until the index reports that it is ready before upserting; initialization time also depends on the pod type and size.
7. Milvus collection not loaded
After creating a collection and building its index, you must call collection.load() before searching. Searching a collection that has not been loaded fails with an error instead of returning results.
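Several of these errors can be caught before data ever reaches the database. The sketch below shows hypothetical pre-upsert checks for the first three items: dimension mismatch, normalization, and metadata typing. The helper names are invented for this example and are not part of any client library.

```python
import math


def prepare_vector(vec, expected_dim, normalize=True):
    """Validate the dimension and optionally scale the vector to unit length."""
    if len(vec) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(vec)}")
    if not normalize:
        return list(vec)
    length = math.sqrt(sum(x * x for x in vec))
    if length == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / length for x in vec]


def coerce_price(metadata):
    """Return a copy with 'price' stored as a float so range filters work."""
    cleaned = dict(metadata)
    if "price" in cleaned:
        cleaned["price"] = float(cleaned["price"])
    return cleaned


print(prepare_vector([3.0, 4.0], expected_dim=2))
print(coerce_price({"name": "Laptop", "price": "999.99"}))

try:
    prepare_vector([0.1, 0.2, 0.3], expected_dim=1536)
except ValueError as err:
    print("rejected:", err)
```

Failing fast in application code tends to produce clearer error messages than a rejected upsert or a filter that silently matches nothing.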
Practice Questions
What is the difference between cosine similarity and euclidean distance? Cosine measures angle between vectors (direction), euclidean measures straight-line distance (magnitude + direction). Cosine is preferred for text embeddings where magnitude often represents document length, not meaning.
When would you choose HNSW over IVF? HNSW when you need higher recall and can afford more memory. IVF when you have billions of vectors and need lower memory footprint.
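How does hybrid search combine vector and keyword search? It runs both searches and merges the results using a weighted alpha parameter (alpha=1 is pure vector, alpha=0 is pure keyword). Weaviate exposes this alpha directly; Qdrant also supports hybrid queries, though it combines the result sets through fusion methods rather than a single alpha weight.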
What is a RAG pipeline? Retrieval Augmented Generation: retrieve relevant documents from a vector store, inject them as context to an LLM, and generate an answer grounded in the retrieved data — reducing hallucination.
Why normalize embeddings before similarity search? With unit-length vectors, dot product equals cosine similarity, so a dot-product index returns cosine rankings. Euclidean distance on unit vectors also yields the same ranking as cosine, so differences in magnitude (such as document length) no longer affect the results.
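The alpha-weighted merge described in the hybrid search answer can be sketched as follows. This is only an illustration of the weighting idea, assuming min-max normalization of each score list; the fusion algorithms that Weaviate and other engines actually use differ in detail. All names and scores are made up for the example.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the 0..1 range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_merge(vector_scores, keyword_scores, alpha=0.75):
    """Rank documents by alpha * vector score + (1 - alpha) * keyword score."""
    v, kw = min_max(vector_scores), min_max(keyword_scores)
    fused = {
        doc: alpha * v.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
        for doc in set(v) | set(kw)
    }
    # Sort by fused score, breaking ties by id for a stable order.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


vector_scores = {"a": 0.9, "b": 0.5, "c": 0.1}  # e.g. cosine similarities
keyword_scores = {"c": 12.0, "b": 3.0}          # e.g. BM25 scores

print(hybrid_merge(vector_scores, keyword_scores, alpha=1.0))   # pure vector
print(hybrid_merge(vector_scores, keyword_scores, alpha=0.0))   # pure keyword
print(hybrid_merge(vector_scores, keyword_scores, alpha=0.25))  # keyword-leaning
```

Normalizing first matters because vector and keyword scores live on different scales; without it, the larger-valued list would dominate regardless of alpha.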
Challenge: Build a multi-modal RAG system for a support knowledge base with 10,000 documents. Design: (1) chunking strategy (chunk size, overlap), (2) embedding model selection, (3) vector database with hybrid search, (4) metadata filtering by category and date, (5) evaluation metric (recall@k, MRR) to tune chunk size and ef_search parameters.
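For step (5) of the challenge, the two evaluation metrics can be computed with a few lines of Python. This is a minimal sketch that assumes each test query comes with a hand-labeled list of relevant document ids; the ids below are invented.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents that appear in the top k results."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_reciprocal_rank(runs):
    """Average of 1/rank of the first relevant hit over (retrieved, relevant) pairs."""
    if not runs:
        return 0.0
    total = 0.0
    for retrieved, relevant in runs:
        relevant = set(relevant)
        for rank, doc in enumerate(retrieved, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(runs)


runs = [
    (["d1", "d7", "d3"], ["d3", "d9"]),  # first relevant hit at rank 3
    (["d2", "d5", "d8"], ["d2"]),        # first relevant hit at rank 1
]

print(recall_at_k(*runs[0], k=3))
print(mean_reciprocal_rank(runs))
```

Running the same labeled queries after each change to chunk size or ef_search gives a like-for-like comparison of retrieval quality.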
Try It Yourself
Build a local semantic search with Python and Chroma:
import chromadb
from sentence_transformers import SentenceTransformer
# Use a local embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.Client()
collection = client.create_collection("demo")
documents = [
    "Python is a programming language",
    "Vector databases enable semantic search",
    "Neural networks learn from data",
    "Databases store structured information",
]
# Embed and index
embeddings = model.encode(documents).tolist()
collection.add(
    embeddings=embeddings,
    documents=documents,
    ids=[f"doc{i}" for i in range(len(documents))]
)
# Search
query = "machine learning"
query_emb = model.encode([query]).tolist()
results = collection.query(query_embeddings=query_emb, n_results=2)
print("Query:", query)
print("Results:", results["documents"][0])
Expected output (the exact ranking may vary with the model version):
Query: machine learning
Results: ['Neural networks learn from data', 'Python is a programming language']
What’s Next
| Tutorial | What You’ll Learn |
|---|---|
| LangChain Guide | Building RAG pipelines with LangChain |
| OpenAI API Guide | Generating embeddings with OpenAI |
| Python Programming | Python fundamentals for vector DB scripting |
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro. Updated 2026-06-20.