Vector Databases Explained — Pinecone, Chroma, Weaviate

DodaTech

In this tutorial, you'll learn how vector databases work, see practical examples with Chroma and Pinecone, and compare them with Weaviate so you can choose the right tool for your project.

What You'll Learn

Understand what vector databases are, how they store and search embeddings, and how to use Chroma (local) and Pinecone (cloud) for retrieval-augmented generation (RAG).

Why It Matters

Vector databases are the retrieval layer of most RAG pipelines: they fetch the passages an LLM needs to answer a question. Many LLM-powered semantic search and Q&A systems rely on one.
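In a RAG pipeline the vector database handles the retrieval step: the user's question is embedded, the nearest documents are fetched, and their text is placed into the LLM prompt as context. Below is a minimal sketch of that last step, assuming retrieval has already happened. The function name `build_rag_prompt` and the prompt wording are hypothetical, and the retrieved documents are hard-coded stand-ins for real search results.

```python
def build_rag_prompt(question: str, retrieved_docs: list[str]) -> str:
    """Assemble an LLM prompt from a question and the documents a vector search returned."""
    # One bullet per retrieved document, in the order the search ranked them.
    context = "\n".join(f"- {doc}" for doc in retrieved_docs)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical retrieval results, e.g. the top 2 matches from a vector database query.
docs = ["Python is a programming language",
        "TensorFlow is a machine learning framework"]
print(build_rag_prompt("What language should I use for ML?", docs))
```

The resulting string is what you would send to the LLM; the quality of the answer depends heavily on whether the vector search returned the right passages.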

Real-World Use

Semantic search over millions of documents, RAG knowledge bases, image similarity search, and recommendation systems.

What is a Vector Database?

A vector database stores embeddings (lists of numbers that encode the meaning of text, images, or other data) and lets you search them by semantic similarity rather than exact matches:

-- Traditional database: exact keyword match
SELECT * FROM docs WHERE text LIKE '%machine learning%';

-- Vector database (conceptual; actual query syntax varies by product)
SELECT * FROM docs ORDER BY similarity(embedding, query_embedding) DESC LIMIT 5;

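The vector database returns the items whose embeddings are closest to your query's embedding, so it matches on meaning rather than on exact keywords: a query about "ML frameworks" can find a document about TensorFlow that never uses that phrase.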
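To make "close in meaning" concrete, here is a minimal sketch of exact nearest-neighbour search with cosine similarity, assuming the documents have already been turned into embeddings. The three-dimensional vectors and the helper names `cosine_similarity` and `top_k` are hypothetical; real embedding models produce hundreds of dimensions, and production vector databases use approximate indexes (such as HNSW) instead of scoring every stored vector.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], docs: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Exact (brute-force) search: score every vector, keep the k most similar."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]


# Hypothetical 3-dimensional "embeddings" for the three example documents.
docs = {
    "python": [0.9, 0.1, 0.0],
    "tensorflow": [0.8, 0.3, 0.1],
    "paris": [0.0, 0.1, 0.9],
}
query = [0.85, 0.2, 0.05]
print(top_k(query, docs, k=2))
```

The two programming-related vectors point in nearly the same direction as the query, so they rank first, while "paris" scores close to zero and is left out.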

Chroma (Local, Open Source)

Best for development, small projects, and local apps.

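import chromadb

# In-memory client: data is lost when the process exits.
# Use chromadb.PersistentClient(path="./chroma_db") to keep data on disk.
client = chromadb.Client()
collection = client.create_collection("docs")

# Add documents. With no embedding_function given, Chroma embeds them
# with its default model (all-MiniLM-L6-v2, downloaded on first use).
collection.add(
    documents=["Python is a programming language",
               "TensorFlow is a machine learning framework",
               "Paris is the capital of France"],
    ids=["1", "2", "3"]
)

# Search by meaning: the query text is embedded with the same model,
# then the nearest stored documents are returned.
results = collection.query(
    query_texts=["What language should I use for ML?"],
    n_results=2
)
# Results hold one list per query text, e.g. [[doc_a, doc_b]]
print(results["documents"])
print(results["distances"])  # lower distance = closer match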
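Chroma's query results also include a `distances` field. By default Chroma measures squared L2 (Euclidean) distance, where lower means closer, whereas cosine similarity is higher for closer vectors. For unit-length vectors the two produce the same ranking, because squared L2 distance equals 2 - 2 × cosine similarity. A small sketch checking that identity; the helper names are hypothetical and the vectors are made up.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance (the metric behind Chroma's default "l2" space)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


a = normalize([0.9, 0.1, 0.0])
b = normalize([0.8, 0.3, 0.1])
# For unit vectors: |a - b|^2 = |a|^2 + |b|^2 - 2(a . b) = 2 - 2 * cosine
print(squared_l2(a, b), 2 - 2 * cosine(a, b))
```

Both printed values agree up to floating-point rounding, so "smallest distance" and "highest cosine similarity" pick the same neighbours when embeddings are normalized.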

Pinecone (Cloud, Managed)

Best for production with large datasets (millions of vectors).

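import os
from pinecone import Pinecone

# Read the key from the environment instead of hard-coding it.
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
# Assumes an index named "my-docs" already exists, created with the
# same dimension as your embedding model's vectors.
index = pc.Index("my-docs")

# Index vectors as (id, values, metadata) tuples.
# "..." stands for the remaining embedding values, omitted here;
# real vectors must have exactly the index's dimension.
index.upsert(vectors=[
    ("id-1", [0.1, 0.2, ...], {"text": "Python doc content"}),
    ("id-2", [0.3, 0.1, ...], {"text": "TensorFlow doc content"}),
])

# Search with a query embedding (from the same model); return the top 2 matches.
results = index.query(vector=[0.15, 0.18, ...], top_k=2, include_metadata=True)
for match in results.matches:
    print(match.id, match.score, match.metadata["text"])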
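`upsert` means "insert or update": writing a vector with an id that already exists replaces the old record instead of adding a duplicate. The sketch below mimics that behaviour locally with a hypothetical `ToyIndex` class that accepts the same `(id, values, metadata)` tuples as the snippet above. It scores by dot product for brevity and is only an illustration of the semantics, not Pinecone's implementation or API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """Hypothetical in-memory stand-in for a vector index keyed by id."""
    records: dict[str, tuple[list[float], dict]] = field(default_factory=dict)

    def upsert(self, vectors: list[tuple[str, list[float], dict]]) -> None:
        # New ids are inserted; existing ids are overwritten, never duplicated.
        for vec_id, values, metadata in vectors:
            self.records[vec_id] = (values, metadata)

    def query(self, vector: list[float], top_k: int) -> list[tuple[str, float, dict]]:
        # Score every record by dot product and return the top_k highest scores.
        scored = [
            (vec_id, sum(q * v for q, v in zip(vector, values)), metadata)
            for vec_id, (values, metadata) in self.records.items()
        ]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:top_k]


index = ToyIndex()
index.upsert([
    ("id-1", [0.1, 0.2], {"text": "Python doc content"}),
    ("id-2", [0.3, 0.1], {"text": "TensorFlow doc content"}),
])
# Re-upserting "id-1" replaces its vector and metadata; the record count stays at 2.
index.upsert([("id-1", [0.2, 0.2], {"text": "Updated Python doc"})])
print(len(index.records))
print(index.query([0.15, 0.18], top_k=2))
```

Because ids are the key, re-running an ingestion job with the same ids is safe: it refreshes records rather than piling up duplicates.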

Comparison

Feature	Chroma	Pinecone	Weaviate
Hosting	Local/embedded	Cloud managed	Self-hosted or cloud
Free tier	Open source, always free	Limited free plan	Free when self-hosted
Scalability	Single node	Managed scaling for large datasets	Multi-node cluster
Setup	`pip install chromadb`	`pip install pinecone` + API key	Docker (self-hosted) or managed cloud

Best Practices

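Treat these size thresholds as rough starting points rather than hard limits:

Data Size	Recommendation
< 100K docs	Chroma
100K–10M docs	Pinecone
> 10M docs	Weaviate or Elasticsearch

Whatever you choose, embed documents and queries with the same embedding model; vectors from different models are not comparable.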
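A quick back-of-the-envelope calculation shows why size drives the choice: the raw embeddings alone take roughly `count × dimensions × 4` bytes at float32 precision, before any index structure or metadata. The dimensions used below are assumptions (384 matches all-MiniLM-L6-v2, Chroma's default model; 1536 matches OpenAI's text-embedding-ada-002), and `raw_vector_bytes` is a hypothetical helper; substitute your own model's size.

```python
def raw_vector_bytes(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Memory for the raw embeddings only (float32 = 4 bytes per value).

    Excludes index structures (e.g. HNSW graph links), ids, and metadata.
    """
    return num_vectors * dimensions * bytes_per_value


# Size boundaries from the table above, with assumed embedding dimensions.
scenarios = [
    ("100K docs, 384-dim", 100_000, 384),
    ("10M docs, 384-dim", 10_000_000, 384),
    ("10M docs, 1536-dim", 10_000_000, 1536),
]
for label, count, dims in scenarios:
    gb = raw_vector_bytes(count, dims) / 1e9  # decimal gigabytes
    print(f"{label}: {gb:.2f} GB")
```

This prints 0.15 GB, 15.36 GB, and 61.44 GB. A few hundred megabytes fits comfortably in an embedded store on one machine, while tens of gigabytes of raw vectors (plus index overhead) helps explain why the larger tiers point to managed or clustered systems.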
