Vector Databases Explained — Pinecone, Chroma, Weaviate
This tutorial explains what vector databases are, walks through practical examples with Chroma and Pinecone, and compares both with Weaviate.
What You'll Learn
Understand vector databases, how they store and search embeddings, and how to use Chroma (local) and Pinecone (cloud) for RAG.
Why It Matters
Vector databases are the backbone of most RAG pipelines: LLM-powered search and Q&A systems typically rely on one to retrieve relevant context.
Real-World Use
Semantic search over millions of documents, RAG knowledge bases, image similarity search, and recommendation systems.
What is a Vector Database?
A vector database stores embeddings (vectors) and lets you search by semantic similarity:
-- Traditional database: exact keyword match
SELECT * FROM docs WHERE text LIKE '%machine learning%';
-- Vector database (illustrative pseudo-SQL): rank rows by similarity to the query embedding
SELECT * FROM docs ORDER BY similarity(embedding, query_embedding) LIMIT 5;
The vector database finds items "close" to your query in meaning, not just keyword matches.
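To make "close in meaning" concrete, here is a minimal sketch of similarity search using cosine similarity and a brute-force scan. The three-dimensional vectors and the names `toy_embeddings`, `cosine_similarity`, and `top_k` are made up for illustration. Real embeddings have hundreds or thousands of dimensions, and production vector databases generally use approximate nearest-neighbor indexes rather than comparing the query against every item.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, items, k=2):
    """Return the k (doc_id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical 3-dimensional "embeddings", invented for this example.
toy_embeddings = {
    "python": [0.9, 0.1, 0.0],
    "tensorflow": [0.8, 0.3, 0.1],
    "paris": [0.0, 0.1, 0.9],
}
query_embedding = [1.0, 0.2, 0.0]

nearest = top_k(query_embedding, toy_embeddings, k=2)
print(nearest)
```

With these toy vectors, the two programming-related items rank above "paris" because their vectors point in nearly the same direction as the query.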
Chroma (Local, Open Source)
Best for development, small projects, and local apps.
import chromadb

# In-memory client: data is lost when the process exits
client = chromadb.Client()
collection = client.create_collection("docs")

# Add documents; Chroma embeds them with its default embedding function
collection.add(
    documents=[
        "Python is a programming language",
        "TensorFlow is a machine learning framework",
        "Paris is the capital of France",
    ],
    ids=["1", "2", "3"],
)

# Search by meaning
results = collection.query(
    query_texts=["What language should I use for ML?"],
    n_results=2,
)

# One list of matching documents per query text
print(results["documents"])
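A Chroma query result is a dictionary of nested lists, with one inner list per query text. The sketch below pairs each returned id with its document and distance. It assumes the default result fields (`ids`, `documents`, `distances`) are present; `sample_results` is a hand-written stand-in with invented distances, not real Chroma output, and `flatten_query_result` is a hypothetical helper.

```python
def flatten_query_result(results, query_index=0):
    """Pair each returned id with its document and distance for one query."""
    ids = results["ids"][query_index]
    docs = results["documents"][query_index]
    distances = results["distances"][query_index]
    return [
        {"id": doc_id, "document": doc, "distance": dist}
        for doc_id, doc, dist in zip(ids, docs, distances)
    ]

# Hand-written stand-in shaped like a query result; the distances are made up.
sample_results = {
    "ids": [["2", "1"]],
    "documents": [[
        "TensorFlow is a machine learning framework",
        "Python is a programming language",
    ]],
    "distances": [[0.41, 0.57]],
}

hits = flatten_query_result(sample_results)
for hit in hits:
    print(hit["id"], hit["distance"], hit["document"])
```

A smaller distance means a closer match, so results arrive ordered from most to least similar.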
Pinecone (Cloud, Managed)
Best for production with large datasets (millions of vectors).
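from pinecone import Pinecone

pc = Pinecone(api_key="your-api-key")
# Assumes an index named "my-docs" already exists
index = pc.Index("my-docs")

# Index vectors as (id, values, metadata) tuples.
# "..." stands for the remaining components; each vector must match the index dimension.
index.upsert([
    ("id-1", [0.1, 0.2, ...], {"text": "Python doc content"}),
    ("id-2", [0.3, 0.1, ...], {"text": "TensorFlow doc content"}),
])

# Search; include_metadata=True returns the stored text with each match
results = index.query(vector=[0.15, 0.18, ...], top_k=2, include_metadata=True)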
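When indexing many vectors, it is common to send them in batches rather than in one large request. The following is a sketch of a batching helper in plain Python. The batch size of 100 is an assumption, not a documented limit, and `batched` and `records` are hypothetical names. The upsert call is left commented out because it needs a live index.

```python
from itertools import islice

def batched(records, batch_size=100):
    """Yield successive lists of at most batch_size records."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

# Hypothetical records: (id, vector, metadata) tuples with toy 3-dimensional vectors.
records = [(f"id-{n}", [0.1 * n, 0.2, 0.3], {"text": f"doc {n}"}) for n in range(250)]

batches = list(batched(records, batch_size=100))
print([len(b) for b in batches])  # → [100, 100, 50]

# With a real index, each batch would be sent separately:
# for batch in batched(records):
#     index.upsert(vectors=batch)
```

Smaller batches keep each request modest in size and make it easier to retry a failed batch without resending everything.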
Comparison
| Feature | Chroma | Pinecone | Weaviate |
|---|---|---|---|
| Hosting | Local/embedded | Cloud managed | Self-hosted or cloud |
| Free tier | Always free | 1-index free | Self-hosted free |
| Scalability | Single machine | Managed scaling | Cluster |
| Setup | pip install chromadb | API key | Docker |
Best Practices
| Data Size | Recommendation |
|---|---|
| < 100K docs | Chroma |
| 100K–10M docs | Pinecone |
| > 10M docs | Weaviate or Elasticsearch |
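The sizing table above can be read as a simple decision rule. Here is a sketch that encodes it; `recommend_vector_store` is a hypothetical helper, and the thresholds are the rough guidelines from the table rather than hard limits.

```python
def recommend_vector_store(num_docs):
    """Map a document count to the recommendation in the sizing table."""
    if num_docs < 0:
        raise ValueError("num_docs must be non-negative")
    if num_docs < 100_000:
        return "Chroma"
    if num_docs <= 10_000_000:
        return "Pinecone"
    return "Weaviate or Elasticsearch"

for size in (5_000, 2_000_000, 50_000_000):
    print(size, recommend_vector_store(size))
```

Document count is only one input; hosting constraints and budget from the comparison table may point to a different choice.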