Vector Databases — Embeddings, Similarity Search, Indexing & RAG Pipelines

DodaTech Updated Jun 20, 2026 9 min read

Vector databases store and search high-dimensional vector embeddings — numerical representations of text, images, or audio — enabling similarity search at scale for applications like semantic search, recommendation systems, and Retrieval Augmented Generation (RAG).

What You’ll Learn

  • Generating embeddings with OpenAI text-embedding-3, Cohere, and HuggingFace models
  • Pinecone, Weaviate, Chroma, Qdrant, and Milvus — which to use and when
  • Similarity search metrics: cosine similarity, euclidean distance, dot product
  • Hybrid search combining vector search with keyword filtering
  • Metadata filtering for precise result narrowing
  • RAG pipeline design using vector stores as the retrieval layer
  • Indexing algorithms: HNSW, IVF, PQ, and their trade-offs

Why Vector Databases Matter

Traditional databases excel at exact matches (WHERE name = 'Alice') but struggle with semantic queries such as "find articles similar to this one". Vector databases address this by storing embeddings of unstructured data and finding the nearest neighbors of a query in vector space. Many LLM applications, including RAG-based chatbots, recommendation engines, and semantic search, rely on a vector database as their retrieval or memory layer.
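To make "nearest neighbors in vector space" concrete, here is a minimal brute-force sketch in NumPy. The three-dimensional vectors and their labels are hypothetical toy values, not real embeddings, and the helper name nearest_neighbors is an assumption for illustration. A vector database does the same job, but with an index so it does not have to compare the query against every stored vector.

```python
import numpy as np

def nearest_neighbors(query, vectors, k=2):
    """Return the indices of the k vectors most similar to query (cosine)."""
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = v @ q                      # cosine similarity against every row
    top = np.argsort(-scores)[:k]       # highest similarity first
    return top.tolist()

# Toy "embeddings" (hypothetical values, for illustration only)
vectors = np.array([
    [0.9, 0.1, 0.0],   # 0: text about cats
    [0.8, 0.2, 0.1],   # 1: text about kittens
    [0.0, 0.1, 0.9],   # 2: text about tax law
])
query = np.array([1.0, 0.0, 0.0])

indices = nearest_neighbors(query, vectors, k=2)
print(indices)  # the two cat-related rows; the tax-law row is excluded
```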

Doda Browser uses vector embeddings for smart content discovery and search. Durga Antivirus Pro leverages vector similarity for semantic malware detection — finding threats that match the behavior pattern of known malware even when the byte signatures differ.

Learning Path

    flowchart LR
  A["OpenAI API Guide"] --> B["Embeddings Concepts"]
  B --> C["Vector Databases<br/>You are here"]
  C --> D["RAG Pipeline Design"]
  C --> E["Fine-Tuning LLMs"]
  D --> F["AI Agents"]
  style C fill:#f90,color:#fff
  

What Are Embeddings?

An embedding is a list of floating-point numbers (a vector) that captures the semantic meaning of content. Similar content has similar vectors.

# Generate embeddings with OpenAI
from openai import OpenAI
client = OpenAI()

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Vector databases store semantic meaning as mathematical vectors"
)
vector = response.data[0].embedding
print(f"Dimension: {len(vector)}")
print(f"First 5 values: {vector[:5]}")

Expected output (the embedding values shown are illustrative and will differ in practice):

Dimension: 1536
First 5 values: [0.0123, -0.0456, 0.0789, -0.0123, 0.0567]

Embedding Providers

| Provider | Model | Dimensions | Pricing |
|---|---|---|---|
| OpenAI | text-embedding-3-small | 1536 | $0.02/1M tokens |
| OpenAI | text-embedding-3-large | 3072 | $0.13/1M tokens |
| Cohere | embed-english-v3.0 | 1024 | $0.10/1K units |
| HuggingFace | all-MiniLM-L6-v2 | 384 | Free (local) |
| HuggingFace | BAAI/bge-large-en-v1.5 | 1024 | Free (local) |
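Dimension count drives storage cost as well as API cost. The sketch below estimates the raw size of the vectors alone, assuming each value is stored as a 4-byte float32. That is an assumption: real deployments also pay for index structures and metadata, and some engines store compressed vectors. The helper name raw_vector_bytes is hypothetical.

```python
def raw_vector_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Raw storage for the vectors alone (no index, no metadata)."""
    return num_vectors * dimensions * bytes_per_value

models = [
    ("text-embedding-3-small", 1536),
    ("text-embedding-3-large", 3072),
    ("all-MiniLM-L6-v2", 384),
]

for name, dim in models:
    gib = raw_vector_bytes(1_000_000, dim) / 2**30
    print(f"{name}: {gib:.2f} GiB per million vectors")
```

A model with a quarter of the dimensions needs roughly a quarter of the raw vector storage, which is one reason smaller local models are attractive for prototypes.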

Vector Database Options

Pinecone

Fully managed, serverless or pod-based indexing.

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="your-api-key")

# Serverless index
pc.create_index(
    name="rag-index",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)

index = pc.Index("rag-index")

# Upsert vectors
index.upsert([
    ("id1", [0.1, 0.2, ...], {"text": "Document about AI"}),
    ("id2", [0.3, 0.4, ...], {"text": "Document about databases"}),
])

# Query
results = index.query(
    vector=[0.15, 0.25, ...],
    top_k=5,
    include_metadata=True
)
print(results.matches[0].metadata["text"])

Weaviate

Open-source with built-in vectorizer modules.

import weaviate
import weaviate.classes as wvc

client = weaviate.connect_to_local()

# Create collection with explicit properties; vectors are supplied by the caller
collection = client.collections.create(
    name="Documents",
    vectorizer_config=wvc.config.Configure.Vectorizer.none(),
    properties=[
        wvc.config.Property(name="title", data_type=wvc.config.DataType.TEXT),
        wvc.config.Property(name="content", data_type=wvc.config.DataType.TEXT),
        wvc.config.Property(name="category", data_type=wvc.config.DataType.TEXT),
    ]
)

# Insert with pre-generated vectors
collection.data.insert(
    properties={"title": "AI Overview", "content": "...", "category": "tech"},
    vector=[0.1, 0.2, ...]
)

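# Hybrid search (vector + keyword)
# With no vectorizer configured, the query vector must be passed explicitly
response = collection.query.hybrid(
    query="artificial intelligence",
    vector=[0.15, 0.25, ...],
    alpha=0.75,  # 75% vector, 25% keyword
    limit=5
)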
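
The alpha weighting in the hybrid query above can be pictured as a weighted blend of two score lists. The sketch below is a simplified illustration, not Weaviate's actual fusion algorithm: it assumes min-max normalization of each score list and a linear blend, and the function names and sample scores are hypothetical.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the 0..1 range."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def hybrid_scores(vector_scores, keyword_scores, alpha=0.75):
    """Blend vector and keyword scores; alpha=1 is pure vector, alpha=0 pure keyword."""
    v = min_max(vector_scores) if vector_scores else {}
    kw = min_max(keyword_scores) if keyword_scores else {}
    fused = {
        doc_id: alpha * v.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in set(v) | set(kw)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

# Hypothetical scores: cosine similarities and BM25-style keyword scores
vector_scores = {"a": 0.9, "b": 0.7, "c": 0.2}
keyword_scores = {"b": 12.0, "c": 3.0, "d": 8.0}

for alpha in (1.0, 0.75, 0.0):
    print(alpha, hybrid_scores(vector_scores, keyword_scores, alpha)[0][0])
```

Document "b" scores well on both signals, so it wins at alpha=0.75 even though "a" has the highest vector score on its own.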

Chroma

Lightweight, embedded — runs in-process, no server needed.

import chromadb

client = chromadb.Client()
collection = client.create_collection("docs")

collection.add(
    documents=["Document about AI", "Document about databases"],
    metadatas=[{"category": "tech"}, {"category": "tech"}],
    ids=["doc1", "doc2"]
)

results = collection.query(
    query_texts=["machine learning"],
    n_results=2
)
print(results["documents"])

Qdrant

Rust-based, high performance with rich filtering.

from qdrant_client import QdrantClient, models

client = QdrantClient("localhost", port=6333)

client.create_collection(
    collection_name="products",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE),
)

client.upsert(
    collection_name="products",
    points=[
        models.PointStruct(id=1, vector=[0.1, 0.2, ...],
            payload={"name": "Laptop", "price": 999.99, "category": "electronics"}),
    ]
)

# Query with metadata filter
client.search(
    collection_name="products",
    query_vector=[0.15, 0.25, ...],
    query_filter=models.Filter(
        must=[
            models.FieldCondition(key="category", match=models.MatchValue(value="electronics")),
            models.FieldCondition(key="price", range=models.Range(lt=1500)),
        ]
    ),
    limit=5
)
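
Conceptually, a filtered query keeps only the points whose payload satisfies every condition and then ranks the survivors by similarity. The sketch below shows that logic by brute force over an in-memory list. It is an illustration under stated assumptions, not how Qdrant executes filters internally, and the sample points and the filtered_search helper are hypothetical.

```python
import numpy as np

# Hypothetical points with two-dimensional toy vectors
points = [
    {"id": 1, "vector": [0.9, 0.1], "payload": {"category": "electronics", "price": 999.99}},
    {"id": 2, "vector": [0.8, 0.2], "payload": {"category": "electronics", "price": 1999.00}},
    {"id": 3, "vector": [0.9, 0.1], "payload": {"category": "furniture", "price": 300.00}},
]

def filtered_search(query, points, category, max_price, limit=5):
    """Apply the payload conditions first, then rank survivors by cosine similarity."""
    q = np.asarray(query, dtype=float)

    def score(point):
        v = np.asarray(point["vector"], dtype=float)
        return float(v @ q / (np.linalg.norm(v) * np.linalg.norm(q)))

    candidates = [
        p for p in points
        if p["payload"]["category"] == category and p["payload"]["price"] < max_price
    ]
    ranked = sorted(candidates, key=score, reverse=True)
    return [p["id"] for p in ranked[:limit]]

print(filtered_search([1.0, 0.0], points, "electronics", 1500))
```

Only point 1 passes both conditions: point 2 is too expensive and point 3 is in another category, even though its vector is just as close to the query.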

Milvus

Cloud-native, designed for billion-scale vector search.

from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType

connections.connect(host="localhost", port="19530")

schema = CollectionSchema([
    FieldSchema("id", DataType.INT64, is_primary=True),
    FieldSchema("embedding", DataType.FLOAT_VECTOR, dim=1536),
    FieldSchema("title", DataType.VARCHAR, max_length=200),
])
collection = Collection("documents", schema)

# Create IVF_FLAT index
collection.create_index("embedding", {
    "index_type": "IVF_FLAT",
    "metric_type": "COSINE",
    "params": {"nlist": 256}
})

collection.load()
results = collection.search(
    data=[[0.1, 0.2, ...]],
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"nprobe": 10}},
    limit=5,
)

Similarity Search Metrics

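| Metric | Formula | Best For | Range |
|---|---|---|---|
| Cosine | A·B / (‖A‖ × ‖B‖) | Text embeddings, semantic search | -1 to 1 (higher = more similar) |
| Euclidean | √Σ(Aᵢ-Bᵢ)² | Clustering, anomaly detection | 0 to ∞ (lower = more similar) |
| Dot Product | Σ(Aᵢ×Bᵢ) | Recommendation, normalized vectors | -∞ to ∞ (higher = more similar) |
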
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def euclidean_distance(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

vec1 = np.array([0.1, 0.2, 0.3])
vec2 = np.array([0.15, 0.25, 0.35])
vec3 = np.array([0.9, 0.8, 0.7])

print(f"Cosine(vec1, vec2) = {cosine_similarity(vec1, vec2):.4f}")
print(f"Cosine(vec1, vec3) = {cosine_similarity(vec1, vec3):.4f}")

Expected output:

Cosine(vec1, vec2) = 0.9974
Cosine(vec1, vec3) = 0.8827
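
The metrics are closely related once vectors have unit length: squared euclidean distance equals 2 - 2 × cosine similarity, so both produce the same ranking. A short check of that identity, reusing two of the example vectors (the normalize helper is added here for illustration):

```python
import numpy as np

def normalize(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

a = normalize(np.array([0.1, 0.2, 0.3]))
b = normalize(np.array([0.9, 0.8, 0.7]))

cosine = float(np.dot(a, b))                    # dot product equals cosine on unit vectors
squared_euclidean = float(np.sum((a - b) ** 2))

print(np.isclose(squared_euclidean, 2 - 2 * cosine))  # True
```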

Indexing Methods

| Method | Type | Search Speed | Memory | Build Time | Best For |
|---|---|---|---|---|---|
| HNSW | Graph | Fastest | High | Slow | High-accuracy, moderate scale |
| IVF | Cluster | Fast | Medium | Medium | Large-scale, balanced |
| PQ | Compression | Medium | Low | Slow | Memory-constrained |
| HNSW+PQ | Hybrid | Fast | Low | Slow | Large-scale + memory constrained |
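
Product quantization (PQ) saves memory by splitting each vector into sub-vectors and storing, for each one, only the index of its nearest centroid in a small codebook. The following is a minimal NumPy sketch under stated assumptions: a toy k-means, random data, and hypothetical helper names (kmeans, pq_train, pq_encode, pq_decode). Production libraries add many optimizations on top of this idea.

```python
import numpy as np

def kmeans(data, k, iters=10, seed=0):
    """Toy Lloyd's k-means; returns k centroids."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = data[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids

def pq_train(vectors, m, k):
    """Learn one codebook of k centroids for each of the m sub-vector slices."""
    sub_dim = vectors.shape[1] // m
    return [kmeans(vectors[:, i * sub_dim:(i + 1) * sub_dim], k) for i in range(m)]

def pq_encode(vectors, codebooks):
    """Replace each sub-vector with the index of its nearest centroid (one byte each)."""
    sub_dim = vectors.shape[1] // len(codebooks)
    codes = np.empty((len(vectors), len(codebooks)), dtype=np.uint8)
    for i, codebook in enumerate(codebooks):
        sub = vectors[:, i * sub_dim:(i + 1) * sub_dim]
        dists = ((sub[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        codes[:, i] = dists.argmin(axis=1)
    return codes

def pq_decode(codes, codebooks):
    """Rebuild approximate vectors by concatenating the chosen centroids."""
    return np.hstack([codebook[codes[:, i]] for i, codebook in enumerate(codebooks)])

rng = np.random.default_rng(42)
vectors = rng.normal(size=(500, 16)).astype(np.float32)   # random stand-in data

codebooks = pq_train(vectors, m=4, k=16)    # k must stay <= 256 to fit in uint8
codes = pq_encode(vectors, codebooks)
approx = pq_decode(codes, codebooks)

print(vectors.nbytes, "bytes raw ->", codes.nbytes, "bytes of codes")
```

Here 16 float32 values (64 bytes) shrink to 4 one-byte codes per vector, excluding the shared codebooks. The price is that decoded vectors are approximations, which is why PQ trades some accuracy for memory.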

HNSW (Hierarchical Navigable Small World)

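HNSW is one of the most widely used algorithms for high-recall approximate vector search. It builds a multi-layer proximity graph: the bottom layer contains every vector, and each higher layer contains a progressively sparser subset that acts as an express lane. A query enters at the top layer, greedily moves toward the nearest node, and descends layer by layer to refine the result.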

# HNSW parameters
# Pinecone manages its index internals and does not expose these settings;
# self-hosted engines such as Qdrant, Weaviate, and Milvus let you tune them.
# M = 16                - connections per node
# ef_construction = 200 - build quality vs speed
# ef_search = 100       - query accuracy vs speed
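
To show what M and ef_search control, the sketch below runs a best-first search over a single-layer neighbor graph. It is a deliberate simplification: real HNSW stacks several such layers and builds the graph incrementally, whereas this version builds one brute-force k-nearest-neighbor graph. The function names and the random data are assumptions for illustration.

```python
import heapq
import numpy as np

def build_knn_graph(vectors, m):
    """Link every node to its m nearest neighbors (brute force, illustration only)."""
    dists = ((vectors[:, None, :] - vectors[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(dists, np.inf)            # a node is not its own neighbor
    return np.argsort(dists, axis=1)[:, :m]

def greedy_search(query, vectors, graph, entry, ef, k):
    """Best-first graph search that keeps the ef closest nodes seen so far."""
    def dist(i):
        return float(((vectors[i] - query) ** 2).sum())

    visited = {entry}
    candidates = [(dist(entry), entry)]        # min-heap: closest unexplored node first
    best = [(-dist(entry), entry)]             # max-heap via negation: current results
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(best) >= ef and d > -best[0][0]:
            break                              # remaining candidates cannot improve results
        for neighbor in graph[node]:
            neighbor = int(neighbor)
            if neighbor in visited:
                continue
            visited.add(neighbor)
            dn = dist(neighbor)
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, neighbor))
                heapq.heappush(best, (-dn, neighbor))
                if len(best) > ef:
                    heapq.heappop(best)        # drop the current worst result
    ranked = sorted((-neg_d, i) for neg_d, i in best)
    return [i for _, i in ranked[:k]]

rng = np.random.default_rng(0)
vectors = rng.normal(size=(200, 8))            # random stand-in data
query = rng.normal(size=8)

graph = build_knn_graph(vectors, m=8)          # m plays the role of M
result = greedy_search(query, vectors, graph, entry=0, ef=32, k=5)
print(result)
```

A larger ef keeps more candidates alive, so the search explores more of the graph before stopping, which tends to raise recall at the cost of more distance computations.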

IVF (Inverted File Index)

Clusters vectors into Voronoi cells. At query time, only the nearest cells are searched.

# IVF configuration in Milvus
index_params = {
    "index_type": "IVF_FLAT",
    "params": {"nlist": 256},  # Number of clusters
}
# nprobe = 10 (number of clusters to search at query time)
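
The roles of nlist and nprobe can be sketched in NumPy. This is a minimal illustration under stated assumptions: a toy k-means, random data, and hypothetical function names. It is not Milvus's implementation.

```python
import numpy as np

def assign_cells(vectors, centroids):
    """Index of the nearest centroid for every vector."""
    dists = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def build_ivf(vectors, nlist, iters=10, seed=0):
    """Cluster vectors into nlist cells and record which vectors fall in each cell."""
    rng = np.random.default_rng(seed)
    centroids = vectors[rng.choice(len(vectors), size=nlist, replace=False)].copy()
    for _ in range(iters):
        labels = assign_cells(vectors, centroids)
        for j in range(nlist):
            members = vectors[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    labels = assign_cells(vectors, centroids)
    inverted_lists = [np.flatnonzero(labels == j) for j in range(nlist)]
    return centroids, inverted_lists

def ivf_search(query, vectors, centroids, inverted_lists, nprobe, k):
    """Scan only the nprobe cells whose centroids are closest to the query."""
    nearest_cells = ((centroids - query) ** 2).sum(axis=1).argsort()[:nprobe]
    candidates = np.concatenate([inverted_lists[c] for c in nearest_cells])
    dists = ((vectors[candidates] - query) ** 2).sum(axis=1)
    return candidates[dists.argsort()[:k]].tolist()

rng = np.random.default_rng(1)
vectors = rng.normal(size=(1000, 8))           # random stand-in data
query = rng.normal(size=8)
centroids, inverted_lists = build_ivf(vectors, nlist=16)

approx = ivf_search(query, vectors, centroids, inverted_lists, nprobe=4, k=5)
exhaustive = ivf_search(query, vectors, centroids, inverted_lists, nprobe=16, k=5)
print(approx, exhaustive)
```

With nprobe equal to nlist, every cell is scanned and the result matches brute force. Lower nprobe values scan fewer vectors and may miss neighbors that sit just across a cell boundary.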

RAG Pipeline with Vector Store

    flowchart TB
  DOC["Documents"] --> CHUNK["Chunking"]
  CHUNK --> EMBED["Embedding Model"]
  EMBED --> VECTOR["Vector Database"]
  QUERY["User Query"] --> QUERY_EMBED["Query Embedding"]
  QUERY_EMBED --> SEARCH["Similarity Search"]
  VECTOR --> SEARCH
  SEARCH --> CONTEXT["Retrieved Context"]
  CONTEXT --> LLM["LLM Generation"]
  QUERY --> LLM
  LLM --> RESPONSE["Final Response"]
  
# Complete RAG pipeline
from openai import OpenAI
import chromadb

client = OpenAI()
vector_db = chromadb.Client().get_or_create_collection("knowledge_base")

# Step 1: Index documents
def index_document(doc_id, text, metadata=None):
    embedding = client.embeddings.create(
        model="text-embedding-3-small", input=text
    ).data[0].embedding
    
    vector_db.add(
        embeddings=[embedding],
        documents=[text],
        # Some Chroma versions reject empty metadata dicts, so pass None instead
        metadatas=[metadata] if metadata else None,
        ids=[doc_id]
    )

# Step 2: Retrieve
def retrieve(query, k=3):
    query_emb = client.embeddings.create(
        model="text-embedding-3-small", input=query
    ).data[0].embedding
    
    results = vector_db.query(query_embeddings=[query_emb], n_results=k)
    return results["documents"][0]

# Step 3: Generate
def rag_answer(query):
    context = retrieve(query)
    context_text = "\n\n".join(context)
    
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Answer using the provided context."},
            {"role": "user", "content": f"Context:\n{context_text}\n\nQuestion: {query}"}
        ]
    )
    return response.choices[0].message.content

# Example
index_document("1", "Vector databases store embeddings for similarity search")
print(rag_answer("What do vector databases store?"))

Expected output: The answer is generated using the retrieved context about vector databases storing embeddings.

Common Errors

1. Embedding dimension mismatch

You upserted 1536-dim vectors but the index expects 384-dim. Always verify the dimension matches between your embedding model and your vector database index configuration.
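
One way to catch this early is to validate vectors before they reach the database. A minimal sketch, where check_dimension is a hypothetical helper and not part of any client library:

```python
def check_dimension(vector, expected_dim):
    """Raise a clear error before the database rejects the vector."""
    if len(vector) != expected_dim:
        raise ValueError(
            f"Embedding has {len(vector)} dimensions, "
            f"but the index expects {expected_dim}"
        )
    return vector

# An all-MiniLM-L6-v2 sized vector passes for a 384-dim index
check_dimension([0.0] * 384, expected_dim=384)

# A text-embedding-3-small sized vector fails fast with a readable message
try:
    check_dimension([0.0] * 1536, expected_dim=384)
except ValueError as error:
    print(error)
```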

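2. Dot product with unnormalized vectors

Cosine similarity ignores vector magnitude by definition, and Qdrant normalizes vectors on insert when a collection uses the Cosine distance. The pitfall is the dot product metric: it matches cosine ranking only when every vector has unit length, so unnormalized vectors with large magnitudes can outrank better matches. If you use dot product, normalize first: vec / np.linalg.norm(vec).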
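
A small numeric sketch of why normalization matters when scores are computed as raw dot products. The two-dimensional vectors are toy values chosen for illustration:

```python
import numpy as np

def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

query = np.array([1.0, 0.0])
aligned_short = np.array([0.5, 0.0])   # same direction as the query, small magnitude
off_axis_long = np.array([3.0, 4.0])   # different direction, large magnitude

# Raw dot product rewards magnitude: the off-axis vector wins
raw_aligned = float(np.dot(query, aligned_short))
raw_off_axis = float(np.dot(query, off_axis_long))

# After normalization the dot product equals cosine: the aligned vector wins
norm_aligned = float(np.dot(unit(query), unit(aligned_short)))
norm_off_axis = float(np.dot(unit(query), unit(off_axis_long)))

print(raw_aligned, raw_off_axis)
print(norm_aligned, norm_off_axis)
```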

3. Metadata filter field type mismatch

Filtering price < 1500 fails if price is stored as a string. Ensure metadata field types match the filter type.
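
A minimal sketch of coercing types before upserting, so that range filters compare numbers with numbers. The coerce_numeric helper and the field names are hypothetical:

```python
def coerce_numeric(payload, numeric_fields=("price",)):
    """Return a copy of payload with the listed fields converted to float."""
    cleaned = dict(payload)
    for field in numeric_fields:
        if field in cleaned:
            cleaned[field] = float(cleaned[field])  # raises ValueError on bad input
    return cleaned

raw = {"name": "Laptop", "price": "999.99", "category": "electronics"}
payload = coerce_numeric(raw)

print(type(raw["price"]).__name__, "->", type(payload["price"]).__name__)
print(payload["price"] < 1500)
```

Converting at write time keeps every stored value consistent, which is simpler than trying to handle mixed types in each query.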

4. HNSW ef_search too low for production

Default ef_search values favor speed over accuracy. For production, increase ef_search (e.g., 100-500) for better recall at the cost of latency.

5. Chunking too large or too small

Large chunks (2000+ tokens) contain too much noise for precise retrieval. Small chunks (50 tokens) lose context. Sweet spot: 200-500 tokens with 50-token overlap.
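
A sliding-window chunker with overlap can be sketched as follows. It assumes whitespace-separated words as a stand-in for tokens; a real pipeline would count tokens with the embedding model's own tokenizer. The chunk_tokens helper is hypothetical.

```python
def chunk_tokens(tokens, chunk_size=300, overlap=50):
    """Split tokens into windows of chunk_size that share overlap tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break                      # the last window already reached the end
    return chunks

# Whitespace words as approximate tokens (an assumption for this sketch)
tokens = [f"w{i}" for i in range(700)]
chunks = chunk_tokens(tokens, chunk_size=300, overlap=50)

print([len(chunk) for chunk in chunks])  # [300, 300, 200]
```

The final 50 tokens of each chunk reappear at the start of the next, so a sentence that straddles a boundary is still retrievable in one piece.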

6. Pinecone pod index stuck initializing

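Serverless indexes are typically ready within seconds, while pod-based indexes can take several minutes to initialize. Poll the index status (for example with pc.describe_index) until it reports ready before upserting. Pod type affects query performance rather than startup: p1 pods are performance-optimized, while s1 pods are storage-optimized and trade query latency for capacity.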

7. Milvus collection not loaded

After creating a collection and its index, you must call collection.load() before searching. Searching a collection that has not been loaded raises an error rather than returning results.

Practice Questions

  1. What is the difference between cosine similarity and euclidean distance? Cosine measures angle between vectors (direction), euclidean measures straight-line distance (magnitude + direction). Cosine is preferred for text embeddings where magnitude often represents document length, not meaning.

  2. When would you choose HNSW over IVF? HNSW when you need higher recall and can afford more memory. IVF when you have billions of vectors and need lower memory footprint.

  3. How does hybrid search combine vector and keyword search? It runs both searches and merges the results, often with a weighted alpha parameter (alpha=1 = pure vector, alpha=0 = pure keyword). Weaviate exposes alpha directly; Qdrant supports hybrid queries by combining dense and sparse vectors with result fusion.

  4. What is a RAG pipeline? Retrieval Augmented Generation: retrieve relevant documents from a vector store, inject them as context to an LLM, and generate an answer grounded in the retrieved data — reducing hallucination.

  5. Why normalize embeddings before similarity search? For cosine similarity, normalization ensures results are based on direction alone. For euclidean, normalization ensures all vectors contribute equally regardless of magnitude.

Challenge: Build a multi-modal RAG system for a support knowledge base with 10,000 documents. Design: (1) chunking strategy (chunk size, overlap), (2) embedding model selection, (3) vector database with hybrid search, (4) metadata filtering by category and date, (5) evaluation metric (recall@k, MRR) to tune chunk size and ef_search parameters.
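
For the evaluation step of the challenge, recall@k and MRR can be computed from ranked result lists and a set of known relevant document IDs. A minimal sketch with hypothetical function names and sample data:

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

def mean_reciprocal_rank(all_retrieved, all_relevant):
    """Average of 1/rank of the first relevant result for each query."""
    if not all_retrieved:
        return 0.0
    total = 0.0
    for retrieved, relevant in zip(all_retrieved, all_relevant):
        relevant_set = set(relevant)
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant_set:
                total += 1.0 / rank
                break
    return total / len(all_retrieved)

# Hypothetical results for two queries
all_retrieved = [["x", "a", "c"], ["b", "y", "z"]]
all_relevant = [["a", "c"], ["b"]]

print(recall_at_k(all_retrieved[0], all_relevant[0], k=2))   # 0.5
print(mean_reciprocal_rank(all_retrieved, all_relevant))     # 0.75
```

Running these over a fixed set of labeled queries while varying chunk size or ef_search gives a repeatable way to compare configurations.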

FAQ

What is the difference between a vector database and a vector index?
A vector database (Pinecone, Weaviate) manages data lifecycle — insert, update, delete, filtering, replication. A vector index (FAISS) is just the search algorithm. Vector databases are built on top of vector indices.
Which vector database is best for production?
Pinecone is the easiest managed service. Weaviate offers the richest hybrid search. Milvus scales to billions. Chroma is best for prototyping. Evaluate based on scale, latency requirements, and operational resources.
Can I use a vector database without embeddings?
No — embeddings are the input. You generate them with an embedding model, then store and search them in the vector database.
How do I update an embedding for changed content?
Delete the old vector and upsert the new one. Most vector databases support upsert by ID (replace if exists, insert if new).
What is the recommended chunk size for RAG?
200-500 tokens with 10-25% overlap. This balances precision (small chunks retrieve specific facts) with context (enough surrounding text for the LLM to understand).
How much does a vector database cost?
Pinecone serverless bills separately for storage and for read and write operations, so cost depends on index size and query volume; check the current pricing page for rates. Self-hosted (Milvus, Qdrant): infrastructure costs only. Chroma: free when run in-process.

Try It Yourself

Build a local semantic search with Python and Chroma:

import chromadb
from sentence_transformers import SentenceTransformer

# Use a local embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.Client()
collection = client.create_collection("demo")

documents = [
    "Python is a programming language",
    "Vector databases enable semantic search",
    "Neural networks learn from data",
    "Databases store structured information",
]

# Embed and index
embeddings = model.encode(documents).tolist()
collection.add(
    embeddings=embeddings,
    documents=documents,
    ids=[f"doc{i}" for i in range(len(documents))]
)

# Search
query = "machine learning"
query_emb = model.encode([query]).tolist()
results = collection.query(query_embeddings=query_emb, n_results=2)
print("Query:", query)
print("Results:", results["documents"][0])

Expected output (the exact ranking may vary with the model version):

Query: machine learning
Results: ['Neural networks learn from data', 'Python is a programming language']

What’s Next

| Tutorial | What You’ll Learn |
|---|---|
| LangChain Guide | Building RAG pipelines with LangChain |
| OpenAI API Guide | Generating embeddings with OpenAI |
| Python Programming | Python fundamentals for vector DB scripting |

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro. Updated 2026-06-20.
