I Built a Personal AI Agent That Searches My Second Brain. Here's How It Works.
A personal AI agent is an LLM given tools and access to your own data. Here's the architecture I used to make one search my second brain and actually act.
I kept hitting the same wall with chatbots. I'd ask ChatGPT to plan a trip and it would invent restaurants I'd never heard of, while the actual list of places I wanted to go sat in a folder of screenshots on my phone. The model was smart. It just had no idea who I was or what I'd saved. So I built a personal AI agent that searches my second brain first, then answers. This is how it works, in plain terms, and why I think the second brain is the missing piece almost everyone forgets.
What is a personal AI agent, exactly?
A personal AI agent is an LLM given two things a chatbot lacks: tools it can call, and access to your own data. That combination lets it act, not just talk. Instead of answering from generic training, it can search your files, read a specific note, and chain those steps to finish a real task.
The distinction matters. A chatbot responds to whatever's in the prompt box. An agent decides *when to retrieve, which source to query, and whether the result is good enough* before answering, the pattern Neo4j and others now call agentic RAG. As one industry breakdown puts it, an agent "is only as good as the data it can access" — clean, structured personal data is the fuel. A model with no access to your life is a very articulate stranger. A model with tools pointed at your data is something closer to a chief of staff.
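Here is a minimal Python sketch of that agentic RAG loop, showing where each decision sits. A toy keyword matcher stands in for the LLM's judgment. `agent_answer`, the stopword list, and the `min_hits` threshold are hypothetical names for illustration, not any framework's API.

```python
import re

# Words that carry no topic signal for this toy matcher (an assumption).
STOPWORDS = {"what", "did", "i", "my", "about", "the", "a", "is", "of", "save", "saved"}
PERSONAL_MARKERS = {"i", "my", "saved"}

def tokens(text: str) -> set[str]:
    """Lowercase word tokens with punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def agent_answer(question: str, sources: dict[str, list[str]], min_hits: int = 1) -> str:
    q = tokens(question)
    # 1. When to retrieve: only if the question is about the user's own data.
    if not q & PERSONAL_MARKERS:
        return "ANSWER_FROM_MODEL"  # chatbot path: no retrieval at all
    terms = q - STOPWORDS

    def matches(items: list[str]) -> list[str]:
        return [item for item in items if terms & tokens(item)]

    # 2. Which source: the one with the most matching items.
    best = max(sources, key=lambda name: len(matches(sources[name])))
    hits = matches(sources[best])
    # 3. Good enough? Decline rather than invent when too little comes back.
    if len(hits) < min_hits:
        return "NOT_FOUND"
    return f"from {best}: " + "; ".join(hits)
```

In a real agent the model makes all three calls itself. The sketch only marks where each decision happens.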
The 2026 framing I find clearest: agents operate on a write path. They execute multi-step workflows, make decisions, and reach into outside systems, while a chatbot stays on a read-only path and just talks. Most chatbots-to-agents explainers draw the same line: agents can plan, connect tools, and produce an output you can act on. For a personal agent, the "outside system" it reaches into is simply *you*, meaning your saved life. That's the whole trick, and it's why the data source matters more than the model you pick.
Why a second brain is the perfect data source
Most personal-agent projects fail at the data layer, not the model. A second brain solves that, because it's already a curated, deduplicated record of what *you* found worth keeping — screenshots, voice notes, saved links, PDFs. The agent doesn't have to guess what matters to you. You already told it, every time you hit save.
This is the unglamorous truth behind "AI that knows me." Karpathy's recent LLM Wiki idea makes the same bet: instead of retrieving from scratch every time, you build a persistent, compounding knowledge base about yourself. A second brain *is* that base, minus the setup. The hard part of personal RAG, as people documenting their offline personal RAG builds on Medium keep finding, is getting your scattered life into one searchable place. If you've been capturing into a real second brain, that work is already done.
The architecture in three layers
The whole system is simpler than the hype suggests. It's three layers: a capture layer (your stuff), a connection layer (tools the agent can call), and a client (the LLM that does the calling). Here's the flow.
| Layer | What it is | What it does |
|---|---|---|
| Capture | Your second brain: screenshots, notes, links, PDFs, voice notes | Holds the data, already indexed and OCR'd |
| Connect | An MCP server exposing tools like `search_nemos`, `find_related_by_topic`, `build_context_pack` | Lets an AI call your data safely, one query at a time |
| Ask | An AI client (Claude, ChatGPT) | Decides which tools to call, reads results, composes the answer |
The middle layer is the one people miss. The Model Context Protocol is an open standard that lets an AI client "ping a server linked to your note app, grab the text, and provide information," as one PKM-and-MCP guide describes it. The server hands over only what each query asks for; it never dumps your whole library into the model. If MCP is new to you, it's worth reading up on what an MCP server is before going further.
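Under the hood, MCP messages are JSON-RPC 2.0. A client invokes a tool with a `tools/call` request that carries the tool's `name` and `arguments`. The sketch below hand-rolls one such exchange in Python rather than using an official SDK. The library contents, the `query`/`limit` arguments, and the substring matching are assumptions, not how Némos implements `search_nemos`.

```python
import json

# Toy capture layer (hypothetical items).
LIBRARY = [
    {"id": "s1", "text": "Screenshot: Lisbon, Time Out Market"},
    {"id": "s2", "text": "Link: cold plunge study"},
    {"id": "s3", "text": "Screenshot: Lisbon, Pasteis de Belem"},
]

def search_nemos(query: str, limit: int = 2) -> list[dict]:
    """Hypothetical tool body: substring match, capped at `limit` items."""
    q = query.lower()
    return [item for item in LIBRARY if q in item["text"].lower()][:limit]

TOOLS = {"search_nemos": search_nemos}

def error(req_id, code: int, message: str) -> str:
    return json.dumps({"jsonrpc": "2.0", "id": req_id,
                       "error": {"code": code, "message": message}})

def handle(raw: str) -> str:
    """Answer one JSON-RPC 2.0 `tools/call` request, MCP-style."""
    req = json.loads(raw)
    if req.get("method") != "tools/call":
        return error(req.get("id"), -32601, "Method not found")
    params = req.get("params", {})
    tool = TOOLS.get(params.get("name"))
    if tool is None:
        return error(req.get("id"), -32602, f"Unknown tool: {params.get('name')}")
    items = tool(**params.get("arguments", {}))
    # Only the matching items leave the server; LIBRARY itself never does.
    result = {"content": [{"type": "text", "text": json.dumps(items)}], "isError": False}
    return json.dumps({"jsonrpc": "2.0", "id": req.get("id"), "result": result})
```

The `limit` cap is the point. The server decides how much data leaves it on each request.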
Why a protocol instead of a bespoke plugin? Because it makes the agent portable. The same MCP server works whether I'm asking from Claude today or some other client next year — I write the tools once, and any compliant AI can call them. The ecosystem proves the pattern out: Obsidian alone has dozens of community MCP servers, and Notion ships an official one, so connecting an LLM to your notes is no longer a research project. The capture app exposes tools; the client does the reasoning. Clean separation, and you can swap either side without rebuilding the other.
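Portability comes from tool discovery. A client asks the server for `tools/list` and gets back each tool's `name`, `description`, and a JSON Schema `inputSchema`. Below is a sketch of what that list might look like for the three tools in this post. The argument names (`query`, `limit`, `item_id`) are illustrative, not Némos's published schema.

```python
# Hypothetical descriptors for the three tools named in this post. Argument
# names and descriptions are assumptions, not Némos's published schema.
TOOL_DESCRIPTORS = [
    {
        "name": "search_nemos",
        "description": "Semantic search across everything the user has saved.",
        "inputSchema": {
            "type": "object",
            "properties": {"query": {"type": "string"}, "limit": {"type": "integer"}},
            "required": ["query"],
        },
    },
    {
        "name": "find_related_by_topic",
        "description": "Items on the same topic as a given item.",
        "inputSchema": {
            "type": "object",
            "properties": {"item_id": {"type": "string"}},
            "required": ["item_id"],
        },
    },
    {
        "name": "build_context_pack",
        "description": "Bundle the most relevant items into one context blob.",
        "inputSchema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
]

def list_tools_result() -> dict:
    """Body of a `tools/list` result; every compliant client reads the same shape."""
    return {"tools": TOOL_DESCRIPTORS}

def missing_args(tool_name: str, arguments: dict) -> list[str]:
    """Required arguments a client forgot, checked before the tool runs."""
    schema = next(t["inputSchema"] for t in TOOL_DESCRIPTORS if t["name"] == tool_name)
    return [key for key in schema["required"] if key not in arguments]
```

Because the descriptors are plain data, swapping clients changes nothing on the server side.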
How the tools actually work in Némos
I use Némos as my capture layer, and it ships an MCP server, so this is my real setup rather than a toy demo. The agent never sees my raw library. It sees a small menu of tools and picks the right one for the question.
- `search_nemos` — semantic search across everything I've saved. "Find what I saved about cold plunges" returns the screenshots and links, not a hallucinated answer.
- `find_related_by_topic` — given one item, surfaces neighbors I never manually linked. This is where forgotten threads resurface.
- `build_context_pack` — bundles the most relevant items into one clean context blob the LLM can reason over without me copy-pasting.
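Here is a minimal sketch of the first two tools over a toy in-memory library. It uses bag-of-words cosine similarity as a stand-in for real embeddings, so it matches shared words rather than meaning. The library contents and the `k` parameter are hypothetical.

```python
import math
import re
from collections import Counter

# Toy library; a real second brain would store embeddings, not word counts.
LIBRARY = {
    "a": "cold plunge benefits study on recovery",
    "b": "screenshot of sauna and cold plunge routine",
    "c": "Lisbon restaurant list from Instagram",
    "d": "recovery tips after marathon training",
}

def vec(text: str) -> Counter:
    """Word-count vector for a piece of text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[w] * v[w] for w in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def search_nemos(query: str, k: int = 2) -> list[str]:
    """Ids of the k items most similar to the query (zero-similarity items dropped)."""
    q = vec(query)
    scored = sorted(((cosine(q, vec(text)), item_id) for item_id, text in LIBRARY.items()), reverse=True)
    return [item_id for score, item_id in scored if score > 0][:k]

def find_related_by_topic(item_id: str, k: int = 2) -> list[str]:
    """Ids of the k items most similar to an existing item, excluding itself."""
    base = vec(LIBRARY[item_id])
    scored = sorted(
        ((cosine(base, vec(text)), other) for other, text in LIBRARY.items() if other != item_id),
        reverse=True,
    )
    return [other for score, other in scored if score > 0][:k]
```

`find_related_by_topic("a")` surfaces item `d` only because both mention recovery. That is the kind of unlinked neighbor described above. With real embeddings, the same ranking logic would also catch synonyms.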
When I ask Claude a question, it calls one or two of these, reads what comes back, and answers grounded in my actual data. Retrieval runs over text. Even screenshots get described and OCR'd at capture time (the same indexing-time approach kapa.ai uses to index images for RAG), so a photo of a menu is just as searchable as a note. The agent decides what to pull; I don't babysit it.
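The key move is doing the image-to-text work once, at capture time, so queries only ever search text. Here is a runnable sketch under that assumption. `fake_ocr` is a hypothetical placeholder for a real on-device OCR and captioning step, and `Capture`/`search` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass
class Capture:
    kind: str             # "screenshot", "note", "link", ...
    raw: bytes | str      # original payload
    searchable: str = ""  # text derived once, at capture time

def fake_ocr(image: bytes) -> str:
    # Hypothetical stand-in for on-device OCR plus image description:
    # here the "pixels" are just UTF-8 text so the sketch stays runnable.
    return image.decode("utf-8", errors="ignore")

def capture(kind: str, raw: bytes | str) -> Capture:
    """Do the expensive image-to-text work once, when the item is saved."""
    text = fake_ocr(raw) if kind == "screenshot" else str(raw)
    return Capture(kind, raw, searchable=text)

def search(items: list[Capture], query: str) -> list[Capture]:
    """Queries only touch text, so a screenshot matches the same way a note does."""
    q = query.lower()
    return [item for item in items if q in item.searchable.lower()]
```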
Three things I actually use it for
Architecture is abstract until it does your chores. Here are the three workflows that earn the setup. Each is a multi-step task the agent finishes by chaining tool calls — the definition of an agentic workflow, not a single lookup.
Trip planning from saved screenshots
I screenshot restaurants, hotels, and "must-do" posts for months before a trip, then forget all of it. Now I ask: "Build me a 3-day Lisbon plan from what I've saved." The agent runs `search_nemos` for Lisbon, pulls the screenshots, reads the captions, and drafts an itinerary from places *I* chose. Zero invented restaurants.
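The trip-planning chain reduces to three steps: search, filter, and arrange. Here is a minimal Python sketch. It assumes each saved item has a hypothetical `caption` field and an optional `place` field, and a plain substring filter stands in for `search_nemos`.

```python
def plan_trip(saved: list[dict], city: str, days: int = 3) -> dict[int, list[str]]:
    """Chain: search saved items for the city, then spread its places across days."""
    # Step 1 (stand-in for search_nemos): keep items whose caption mentions the city.
    hits = [s for s in saved if city.lower() in s["caption"].lower()]
    # Step 2: keep only places the user actually saved; nothing is invented.
    places = [s["place"] for s in hits if s.get("place")]
    # Step 3: round-robin the places so each day gets a share.
    plan: dict[int, list[str]] = {d: [] for d in range(1, days + 1)}
    for i, place in enumerate(places):
        plan[i % days + 1].append(place)
    return plan
```

Round-robin is the simplest arrangement. A real agent would let the model group places by neighborhood, but every place would still come from the saved list.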
Weekly review
Every Sunday I ask the agent to summarize what I captured that week and flag loose ends. It calls a recent-items tool, clusters by topic, and hands me a synthesis — the "creative synthesis" step that makes personal RAG useful for learning. It catches the article I meant to read and the reminder I half-set.
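The weekly review is a time filter followed by a grouping step. This sketch assumes each item already carries a `topic` label and an optional `done` flag, both hypothetical fields. Real clustering would group items by embedding similarity rather than by a precomputed label.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def weekly_review(items: list[dict], now: datetime, days: int = 7) -> tuple[dict, list[str]]:
    """Group the last week's captures by topic and flag the unfinished ones."""
    cutoff = now - timedelta(days=days)
    recent = [item for item in items if item["saved_at"] >= cutoff]
    by_topic: dict[str, list[str]] = defaultdict(list)
    for item in recent:
        by_topic[item["topic"]].append(item["title"])
    # Loose ends: recent items explicitly marked as not done.
    loose_ends = [item["title"] for item in recent if not item.get("done", True)]
    return dict(by_topic), loose_ends
```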
Research synthesis
When I'm chasing an idea across a dozen saved sources, `build_context_pack` collapses them into one brief. The agent reads the pack and writes a grounded summary with the threads connected — work I'd never do by hand because the sources live in different folders.
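One plausible shape for a context pack works like this: rank the items, tag each with its source id so the summary can cite it, and skip anything that would push the pack past a size budget. The `score` and `id` fields and the character `budget` are assumptions. A real implementation would more likely budget in model tokens.

```python
def build_context_pack(items: list[dict], budget: int = 500) -> str:
    """Join the highest-scoring items, each tagged [id], within `budget` characters."""
    ranked = sorted(items, key=lambda item: item["score"], reverse=True)
    parts: list[str] = []
    used = 0  # characters committed so far, plus the separator the next block needs
    for item in ranked:
        block = f"[{item['id']}] {item['text'].strip()}"
        if used + len(block) > budget:
            continue  # too big to fit; a smaller, lower-ranked item still might
        parts.append(block)
        used += len(block) + 2  # +2 for the "\n\n" separator
    return "\n\n".join(parts)
```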
What this isn't (and the honest limits)
This is not magic, and it's not a model fine-tuned on you. It's retrieval plus tools, which is exactly why it's reliable: the agent cites your real items instead of inventing them. The trade-off is that it can only know what you've captured. Sparse data, sparse answers. The agent is a multiplier on your capture habit, not a replacement for it.
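Grounding can be checked mechanically rather than just hoped for. This small sketch assumes retrieved items carry an `id` and that answers cite them in `[id]` form, a hypothetical convention. It declines when nothing was retrieved, and it flags any citation that points at an item the agent never saw.

```python
import re

def grounded_answer(question: str, retrieved: list[dict]) -> str:
    """Answer only from retrieved items, citing each one by id."""
    if not retrieved:
        # Sparse data, sparse answers: decline instead of inventing.
        return f"I don't have anything saved about: {question}"
    return "\n".join(f"- {item['text']} [{item['id']}]" for item in retrieved)

def invented_citations(answer: str, retrieved_ids: set[str]) -> list[str]:
    """Citation ids in a model's answer that match no retrieved item."""
    cited = re.findall(r"\[([A-Za-z0-9_-]+)\]", answer)
    return [c for c in cited if c not in retrieved_ids]
```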
The other honest limit is privacy. Pointing an agent at your whole life means the connection layer matters. I keep capture and indexing on-device where I can, and only the specific items a query needs ever reach the model — never the full library. An agent that reads your life should leak as little of it as possible. That's a design constraint, not an afterthought.
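Here is a sketch of that constraint as a filter between retrieval and the model. It keeps the top-k hits, trims each snippet, and drops fields the answer doesn't need. The field names in `SENSITIVE_FIELDS` and the `k`/`max_chars` defaults are hypothetical, not a description of Némos's internals.

```python
# Hypothetical metadata fields that never need to reach the model.
SENSITIVE_FIELDS = {"gps", "file_path", "contact_phone"}

def minimize_for_model(hits: list[dict], k: int = 3, max_chars: int = 200) -> list[dict]:
    """Pass the model only the top-k hits, trimmed, with sensitive fields dropped."""
    top = sorted(hits, key=lambda h: h["score"], reverse=True)[:k]
    out = []
    for hit in top:
        # Build a new dict so the caller's records are left untouched.
        clean = {key: val for key, val in hit.items()
                 if key not in SENSITIVE_FIELDS and key != "score"}
        clean["text"] = clean["text"][:max_chars]
        out.append(clean)
    return out
```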
FAQ
What is the difference between a personal AI agent and a chatbot?
A chatbot answers from training data and whatever you paste into the prompt. A personal AI agent has tools and access to your own data, so it can search your notes, read a specific item, and chain those steps to finish a task. The agent decides what to retrieve; the chatbot can't retrieve at all.
Do I need to code to build an AI agent for my notes?
No. If your notes app ships an MCP server, you connect it to an AI client like Claude in a few clicks and the agent gets its tools instantly. Building one from scratch needs code, but using a packaged one — like the Némos MCP server — does not. The hard part is good captured data, not wiring.
Is personal RAG safe for private data?
It can be, if the connection layer is built right. A well-designed MCP server returns only the items a query needs, never your whole library, and on-device capture keeps indexing local. The risk is sending everything to a cloud model by default. Choose a setup that retrieves narrowly and keeps source data on your device.
Why is a second brain better than a folder of files for an agent?
A second brain is already curated, deduplicated, OCR'd, and semantically indexed, so the agent's search tools return relevant results instead of raw filenames. A loose folder forces the model to guess at structure. The capture layer doing that work upfront is what makes the agent's answers grounded and fast.
Taha built Némos after years of losing screenshots and voice memos across a dozen apps. He writes about on-device AI, personal knowledge management, and building privacy-first tools for iPhone.
@nemosapp