How to Build an Open Brain From Scratch
A two-hour walkthrough: Supabase Postgres with pgvector, an MCP server, and a minimal ingestion script. Under ten dollars a month in infrastructure.
Prerequisites
Building an open brain requires a specific stack of infrastructure and API access to handle vector storage and model context. The core requirements include a Supabase account; the free tier is sufficient for prototyping and initial deployment.
Technical Requirements
- Runtime: Python 3.11+ or Node.js for server-side logic.
- Embeddings API: An OpenAI API key for text-embedding-3-small, or an open-source alternative such as Nomic Embed via LM Studio.
- AI Client: A Model Context Protocol (MCP)-compatible interface such as Claude Desktop, Cursor, or Windsurf.
- Database: PostgreSQL with the pgvector extension enabled.
Estimated setup time varies by technical proficiency. A first-time builder should allocate 90-120 minutes to configure the environment and API keys. Engineers familiar with Postgres and Python can typically complete the installation in approximately 30 minutes.
Step 1: Supabase + pgvector
The foundation of an open brain is a vector-enabled database capable of performing similarity searches. Supabase provides a managed PostgreSQL instance that supports the pgvector extension, allowing for the storage and querying of high-dimensional embeddings.
Database Initialization
First, create a new project in the Supabase dashboard. Navigate to the SQL Editor and execute the following command to enable vector support:
CREATE EXTENSION IF NOT EXISTS vector;
Schema Design
To store long-term memories, implement a table that pairs raw text with its corresponding vector representation. For OpenAI's text-embedding-3-small model, the dimension is 1536.
CREATE TABLE memories (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  content TEXT,
  embedding VECTOR(1536),
  metadata JSONB,
  created_at TIMESTAMPTZ DEFAULT NOW()
);
Optimization
As the memory store grows, sequential scanning becomes inefficient. To maintain low retrieval latency, create an IVFFlat index using the cosine distance operator class. Note that IVFFlat computes its cluster centroids from existing rows, so build the index after loading an initial batch of data:
CREATE INDEX ON memories USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
This indexing strategy is important for those learning how to build an open brain: it keeps context retrieval in the low-millisecond range even as the memory store grows, at the cost of exact-match guarantees, since IVFFlat performs approximate nearest-neighbor search.
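Once the index exists, a nearest-neighbor query uses pgvector's cosine distance operator `<=>` directly. The vector literal below is a placeholder standing in for a real 1536-dimension embedding:

```sql
-- Retrieve the five memories closest to a query embedding.
-- '[...]' is a placeholder for a full 1536-dimension vector literal.
SELECT content,
       metadata,
       1 - (embedding <=> '[...]'::vector) AS similarity
FROM memories
ORDER BY embedding <=> '[...]'::vector
LIMIT 5;
```

Because `vector_cosine_ops` was used to build the index, ordering by `<=>` lets Postgres serve this query from the IVFFlat index rather than a sequential scan.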
Step 2: Embedding Pipeline
The embedding pipeline converts unstructured text into numerical vectors that represent semantic meaning. This process allows the system to find "related" concepts even if exact keywords do not match.
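To see why vectors capture "relatedness," compare cosine similarities directly. This toy sketch uses hand-made 3-dimensional vectors rather than real 1536-dimension embeddings, but the arithmetic is the same one pgvector performs:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity = dot(a, b) / (|a| * |b|); 1.0 means identical direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors: "dog" and "puppy" point in similar directions; "invoice" does not.
dog = [0.9, 0.8, 0.1]
puppy = [0.85, 0.75, 0.2]
invoice = [0.1, 0.2, 0.95]

print(cosine_similarity(dog, puppy) > cosine_similarity(dog, invoice))  # True
```

Real embedding models produce vectors where this same comparison holds for semantically related text, which is what makes keyword-free retrieval possible.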
Implementation Script
Using the openai and supabase-py libraries, a script can be written to automate the ingestion of memories into the vector store.
from openai import OpenAI
from supabase import create_client

url = "your-supabase-url"
anon_key = "your-anon-key"
supabase = create_client(url, anon_key)
client = OpenAI(api_key="your-openai-key")

def add_memory(text, meta=None):
    # Generate embedding using OpenAI's efficient small model
    response = client.embeddings.create(
        input=text,
        model="text-embedding-3-small"
    )
    vector = response.data[0].embedding
    # Insert into the Supabase memories table
    supabase.table("memories").insert({
        "content": text,
        "embedding": vector,
        "metadata": meta or {}
    }).execute()

add_memory("User prefers technical documentation over tutorials", {"category": "pref"})
Cost Analysis
Operating this pipeline is highly cost-effective. The text-embedding-3-small model costs approximately $0.02 per 1 million tokens. For personal knowledge bases or small-scale agents, the monthly expenditure is negligible, often falling within free credit tiers.
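As a back-of-envelope check, the monthly embedding cost can be estimated in a few lines. The note count and average length below are illustrative assumptions, not measurements:

```python
# Illustrative assumptions: 2,000 notes per month, ~300 tokens each.
notes_per_month = 2_000
avg_tokens_per_note = 300
price_per_million_tokens = 0.02  # USD, text-embedding-3-small

monthly_tokens = notes_per_month * avg_tokens_per_note  # 600,000 tokens
monthly_cost = monthly_tokens / 1_000_000 * price_per_million_tokens

print(f"${monthly_cost:.4f} per month")  # → $0.0120 per month
```

Even at ten times this volume, embedding costs remain around a dime per month; Supabase storage and compute dominate the budget, not the embeddings API.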
Step 3: MCP Server
The Model Context Protocol (MCP), introduced by Anthropic in 2024, standardizes how AI clients interact with external data. By implementing an MCP server, the AI can autonomously decide when to query the vector database to retrieve context.
Developing the Search Tool
The server acts as a bridge between the LLM and Supabase. It exposes a tool that takes a natural language string, converts it to a vector, and performs a cosine similarity search via a PostgreSQL RPC call or direct query.
from mcp.server.fastmcp import FastMCP
from openai import OpenAI
from supabase import create_client

app = FastMCP("open-brain-server")
supabase = create_client("your-supabase-url", "your-anon-key")
client = OpenAI(api_key="your-openai-key")

@app.tool()
async def search_memories(query: str) -> list[str]:
    """Search the open brain for relevant memories."""
    # Embed the query text
    emb = client.embeddings.create(
        input=query,
        model="text-embedding-3-small"
    ).data[0].embedding
    # Vector similarity search via a Postgres function that uses
    # pgvector's cosine distance operator (<=>)
    res = supabase.rpc("match_memories", {
        "query_embedding": emb,
        "match_threshold": 0.5,
        "match_count": 5
    }).execute()
    return [item["content"] for item in res.data]

if __name__ == "__main__":
    app.run()
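The match_memories RPC the server calls must exist in Postgres. A plausible definition, assuming the schema from Step 1, follows the pattern Supabase documents for semantic search; the function name, return columns, and threshold semantics are a sketch, not part of the MCP specification:

```sql
CREATE OR REPLACE FUNCTION match_memories(
  query_embedding VECTOR(1536),
  match_threshold FLOAT,
  match_count INT
)
RETURNS TABLE (id UUID, content TEXT, metadata JSONB, similarity FLOAT)
LANGUAGE sql STABLE
AS $$
  -- Cosine similarity = 1 - cosine distance (<=>)
  SELECT m.id, m.content, m.metadata,
         1 - (m.embedding <=> query_embedding) AS similarity
  FROM memories m
  WHERE 1 - (m.embedding <=> query_embedding) > match_threshold
  ORDER BY m.embedding <=> query_embedding
  LIMIT match_count;
$$;
```

Run this once in the Supabase SQL Editor; the Python client can then invoke it by name through supabase.rpc.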
Protocol Standards
This architecture follows the MCP specification detailed at modelcontextprotocol.io. By decoupling the data retrieval logic from the LLM, users can swap models (e.g., moving from Claude 3.5 Sonnet to a local Llama model) without rewriting the memory infrastructure. This modularity is essential for anyone learning how to build an open brain that remains future-proof.
Step 4: Wire It In
The final stage involves registering the MCP server with a compatible client. For Claude Desktop, this requires editing the claude_desktop_config.json file to include the server's executable path and environment variables.
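A minimal claude_desktop_config.json entry might look like the following; the server name, command path, and environment values are placeholders for your own setup:

```json
{
  "mcpServers": {
    "open-brain": {
      "command": "python",
      "args": ["/path/to/open_brain_server.py"],
      "env": {
        "SUPABASE_URL": "your-supabase-url",
        "SUPABASE_KEY": "your-anon-key",
        "OPENAI_API_KEY": "your-openai-key"
      }
    }
  }
}
```

Passing credentials through the env block keeps keys out of the server script itself, though the script must then read them from the environment rather than hardcoding them.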
Configuration and Testing
Once configured, restart the AI client. Test the integration by prompting: "What are my technical preferences recorded in my open brain?" The client will trigger the search_memories tool, fetch the top-5 results from Supabase, and synthesize a response based on retrieved facts.
Scaling Considerations
| Dataset Size | Performance Note | Optimization Required |
|---|---|---|
| < 10k rows | Sub-millisecond latency | None (Linear scan) |
| 10k - 1M rows | Fast retrieval | IVFFlat or HNSW Index |
| > 1M rows | Possible latency spikes | Partitioning / PgBouncer |
On a standard Supabase free tier, the system can handle roughly 100 queries per second with 50k rows before requiring advanced index tuning or connection pooling.
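Past roughly a million rows, pgvector's HNSW index (available since pgvector 0.5.0) typically offers a better recall-latency trade-off than IVFFlat. A sketch, with commonly used starting parameters rather than tuned recommendations:

```sql
-- HNSW index on cosine distance; m controls graph connectivity and
-- ef_construction controls build-time quality (higher = slower build,
-- better recall). The values below are the pgvector defaults.
CREATE INDEX ON memories USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
```

Unlike IVFFlat, HNSW does not depend on pre-existing data to compute centroids, so it can be created on an empty table, at the cost of slower inserts and a larger index.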