
Vector Databases Explained — Pinecone, Chroma, Weaviate

DodaTech 1 min read

This tutorial explains what vector databases are and how they search by meaning rather than by keyword. It walks through short Chroma and Pinecone examples and closes with guidelines for choosing among Chroma, Pinecone, and Weaviate.

What You'll Learn

Understand vector databases, how they store and search embeddings, and how to use Chroma (local) and Pinecone (cloud) for RAG.

Why It Matters

Vector databases are the backbone of most RAG pipelines. LLM-powered search and Q&A systems typically rely on one to retrieve the context passed to the model.

Real-World Use

Semantic search over millions of documents, RAG knowledge bases, image similarity search, and recommendation systems.

What is a Vector Database?

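A vector database stores embeddings (vectors) and lets you search by semantic similarity. In pseudo-SQL (similarity() is illustrative, not a standard SQL function), the difference looks like this:

-- Traditional database: exact keyword match
SELECT * FROM docs WHERE text LIKE '%machine learning%';

-- Vector database: the 5 rows most similar to the query embedding
SELECT * FROM docs ORDER BY similarity(embedding, query_embedding) DESC LIMIT 5;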

The vector database finds items "close" to your query in meaning, not just keyword matches.
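To make "close in meaning" concrete, here is a minimal sketch of ranking by cosine similarity, one common closeness measure. The three-dimensional embeddings and document names are hypothetical values chosen by hand for illustration; real embeddings come from a model and usually have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, hand-picked for illustration only.
docs = {
    "ml_intro": [0.9, 0.1, 0.0],
    "deep_learning": [0.8, 0.3, 0.1],
    "paris_travel": [0.0, 0.2, 0.9],
}
query_embedding = [1.0, 0.2, 0.0]

# Rank documents from most to least similar to the query.
ranked = sorted(
    docs,
    key=lambda name: cosine_similarity(docs[name], query_embedding),
    reverse=True,
)
top_two = ranked[:2]
print(top_two)
```

A real vector database avoids comparing the query against every stored vector by using an approximate nearest-neighbor index, but the ranking idea is the same.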

Chroma (Local, Open Source)

Best for development, small projects, and local apps.

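import chromadb

# In-memory client: data is lost when the process exits
client = chromadb.Client()
collection = client.create_collection("docs")

# Add documents (Chroma embeds them with its default embedding
# function, which downloads a small model on first use)
collection.add(
    documents=["Python is a programming language",
               "TensorFlow is a machine learning framework",
               "Paris is the capital of France"],
    ids=["1", "2", "3"]
)

# Search by meaning
results = collection.query(
    query_texts=["What language should I use for ML?"],
    n_results=2
)
# results["documents"] is a list of lists: one inner list per query text
print(results["documents"])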
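To show what add and query do conceptually, here is a toy in-memory collection written with the standard library only. This is a sketch, not Chroma's implementation: ToyCollection and toy_embed are hypothetical names, and toy_embed uses plain word counts as a stand-in for a real embedding model, so it only matches shared words rather than meaning.

```python
import math
from collections import Counter

def toy_embed(text):
    """Stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

class ToyCollection:
    """Hypothetical miniature of a vector collection (brute-force search)."""

    def __init__(self):
        self._rows = []  # list of (id, document, vector)

    def add(self, documents, ids):
        # Embed each document once at insert time and store the vector.
        for doc_id, doc in zip(ids, documents):
            self._rows.append((doc_id, doc, toy_embed(doc)))

    def query(self, query_text, n_results):
        # Embed the query, then rank every stored row by similarity.
        q = toy_embed(query_text)
        ranked = sorted(self._rows, key=lambda row: cosine(row[2], q), reverse=True)
        return [doc for _, doc, _ in ranked[:n_results]]

collection = ToyCollection()
collection.add(
    documents=["Python is a programming language",
               "TensorFlow is a machine learning framework",
               "Paris is the capital of France"],
    ids=["1", "2", "3"],
)
top = collection.query("machine learning framework", n_results=1)
print(top)
```

The two-step shape (embed on insert, embed the query and rank on search) is the part that carries over; a real store replaces the word counts with model embeddings and the full scan with an index.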

Pinecone (Cloud, Managed)

Best for production with large datasets (millions of vectors).

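from pinecone import Pinecone

pc = Pinecone(api_key="your-api-key")
# Connect to an existing index; "my-docs" must already be created
index = pc.Index("my-docs")

# Index vectors as (id, values, metadata) tuples.
# The vectors are truncated here; each must match the index dimension.
index.upsert([
    ("id-1", [0.1, 0.2, ...], {"text": "Python doc content"}),
    ("id-2", [0.3, 0.1, ...], {"text": "TensorFlow doc content"}),
])

# Search: return the 2 nearest vectors along with their stored metadata
results = index.query(vector=[0.15, 0.18, ...], top_k=2, include_metadata=True)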
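With millions of vectors, upserts are normally sent in batches rather than in one call. The helper below is a minimal sketch of that pattern: batched is a hypothetical helper, the batch size of 100 is an assumption (check Pinecone's current request limits), and the upsert call is left commented out so the snippet runs without an account.

```python
def batched(records, batch_size):
    """Yield consecutive slices of `records`, each at most `batch_size` long."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]

# Hypothetical records in the (id, values, metadata) shape used above.
records = [
    (f"id-{i}", [0.1, 0.2, 0.3], {"text": f"doc {i}"})
    for i in range(250)
]

batch_sizes = []
for batch in batched(records, batch_size=100):
    # index.upsert(vectors=batch)  # uncomment with a real Pinecone index
    batch_sizes.append(len(batch))

print(batch_sizes)
```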

Comparison

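| Feature     | Chroma              | Pinecone      | Weaviate             |
|-------------|---------------------|---------------|----------------------|
| Hosting     | Local/embedded      | Cloud managed | Self-hosted or cloud |
| Free tier   | Always free         | 1-index free  | Self-hosted free     |
| Scalability | Single machine      | Unlimited     | Cluster              |
| Setup       | pip install chromadb | API key      | Docker               |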

Best Practices

| Data Size     | Recommendation            |
|---------------|---------------------------|
| < 100K docs   | Chroma                    |
| 100K–10M docs | Pinecone                  |
| > 10M docs    | Weaviate or Elasticsearch |
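The sizing guideline above can be written as a small helper. This is only a sketch of the rule of thumb: recommend_store is a hypothetical function, and treating exactly 10M documents as part of the middle tier is an assumption, since the thresholds are rough.

```python
def recommend_store(num_docs):
    """Map a document count to the store suggested by the sizing guideline."""
    if num_docs < 100_000:
        return "Chroma"
    if num_docs <= 10_000_000:
        return "Pinecone"
    return "Weaviate or Elasticsearch"

for count in (5_000, 2_000_000, 50_000_000):
    print(count, "->", recommend_store(count))
```

In practice, hosting constraints, budget, and operational experience matter as much as raw document count.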
