Nemos MCP Server: Give Claude and ChatGPT Access to Your Screenshots and Notes

The Nemos MCP server lets Claude, ChatGPT, and any MCP client search and reason over the screenshots, notes, links, and PDFs you saved in Nemos.

June 6, 2026 · By Taha Baalla

I capture everything in Nemos. Wifi passwords, recipes, half-read articles, voice memos, reels I swore I'd act on. The problem was never capture. It was retrieval months later, when I knew I'd saved *something* but couldn't surface it. With the Nemos MCP server, I stopped hunting. I just ask Claude.

What the Nemos MCP server actually is

The Nemos MCP server is a connector that exposes your Nemos library to AI agents through the Model Context Protocol — the open standard Anthropic introduced for plugging AI tools into data sources. Once connected, an agent like Claude can call Nemos tools to search and read what you captured, without you copying anything into the chat.

MCP is the reason this works at all. Anthropic describes it as a universal interface, a "USB-C port for AI applications", so any MCP-aware app can talk to any MCP server without custom glue code. In December 2025 Anthropic donated the protocol to the Agentic AI Foundation under the Linux Foundation, with OpenAI and Block as co-founders. Translation: this isn't a Claude-only trick. The same Nemos connection works across MCP clients.

Nemos is the server. Your AI app is the client. The agent doesn't get a dump of your library — it makes scoped tool calls ("search for X", "read item Y") and gets back exactly what it asked for. That distinction matters for both privacy and accuracy: the model only ever sees the slices it requests, and it cites real items from your library instead of guessing.
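
To make "scoped tool calls" concrete, here is a minimal sketch of what such a call looks like on the wire. MCP messages are JSON-RPC 2.0, and a tool invocation uses the `tools/call` method with a tool name and an arguments object. The argument names (`query`, `item_id`) and the item id are my assumptions for illustration; the real schemas come from whatever the Nemos server advertises.

```python
import json


def make_tool_call(request_id: int, tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical argument names and item id, used only for illustration.
search_request = make_tool_call(1, "search_nemos", {"query": "wifi password Airbnb"})
read_request = make_tool_call(2, "get_item", {"item_id": "item-123"})

print(json.dumps(search_request, indent=2))
```

Each request names one tool and one narrow set of arguments, which is why the model only ever sees the slice it asked for.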

Here's the mental model I use. Before MCP, getting an AI to use your data meant copy-pasting context into the chat box every single time, or wiring a custom integration for each tool — what Anthropic calls the M times N problem. MCP collapses that. Nemos implements the protocol once; your AI app implements it once; they connect. No bespoke plumbing, no pasting screenshots into a prompt and hoping the model reads them right.
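
The arithmetic behind that is worth a quick look. This is a back-of-the-envelope sketch with made-up counts, not a measurement: without a shared protocol every app needs its own integration with every tool, while with one each side implements the protocol once.

```python
def custom_integrations(num_apps: int, num_tools: int) -> int:
    """Every app wired to every tool separately: M times N integrations."""
    return num_apps * num_tools


def protocol_implementations(num_apps: int, num_tools: int) -> int:
    """Each app and each tool implements the shared protocol once: M plus N."""
    return num_apps + num_tools


# Illustrative numbers only.
apps, tools = 4, 10
print(custom_integrations(apps, tools))       # 40
print(protocol_implementations(apps, tools))  # 14
```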

Why "I can never find that screenshot" is a real problem

Screenshot search on iPhone is genuinely broken for most people, and it got worse, not better. After the iOS 18 Photos redesign, users flooded Apple's forums complaining they could hear the screenshot "click" but couldn't locate the image afterward, and that the new Media Types layout buried screenshots in a confusing subsection.

Apple's answer is a filter button that shows you *all* your screenshots in one grid. That's not search — that's a haystack with the hay sorted. If you took 40 screenshots last month, scrolling a grid to find the one with the door code is the same chore as before. You need to search by *what's inside the image*, not by file type.

That's the gap the Nemos MCP server fills. Nemos already runs OCR and on-device AI over what you capture, so the text inside a screenshot is searchable. Point an agent at it and "find the door code screenshot" becomes a sentence, not a scroll.

What you can ask it

Once the Nemos MCP server is connected, you talk to your library in plain language. The agent figures out which Nemos tools to call. Some prompts I actually use:

- "Claude, find the wifi password screenshot from the Airbnb." It runs a semantic search, opens the matching screenshot, OCRs it, and reads you the password.
- "Summarize every article I saved about sleep." It finds the saved links, extracts the article text, and gives you one combined summary instead of ten open tabs.
- "Build a research brief from my saved reels." It pulls captions and transcripts from the reels you captured and assembles a structured brief.
- "What did I save last week that I forgot about?" It grabs your recent items and surfaces the ones you haven't touched.
- "Find everything related to my apartment move and make a knowledge map." It clusters related items by topic and draws the connections.
- "Export my notes on the marketing launch to Obsidian." It packages the relevant items into Obsidian-ready markdown.

The pattern is the same every time: you describe the outcome, the agent does the digging across your own captured stuff. No folder spelunking.

Walk through the Airbnb one, because it shows the chain. The agent calls `search_nemos` with "wifi password Airbnb" and gets back the most likely items by meaning. It calls `analyze_image_or_screenshot` or `extract_ocr_from_image` on the top hit to read the text rendered in the picture. Then it answers with the actual password — pulled from a screenshot you took in March and never named. You never opened Photos. You never scrolled a grid. You asked a question and got the answer, sourced from your own library.
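
The chain above can be sketched in a few lines. To be clear about what's assumed: the two functions below are in-memory stand-ins that only borrow the tool names, the library contents are made up, and the search is a crude word-overlap ranking rather than the semantic search Nemos actually runs.

```python
from dataclasses import dataclass


@dataclass
class Item:
    item_id: str
    kind: str
    ocr_text: str  # text assumed to have been extracted from the image already


# Made-up library contents, for illustration only.
LIBRARY = [
    Item("a1", "screenshot", "Airbnb welcome guide WiFi password: sunny-harbor-42"),
    Item("b2", "screenshot", "Boarding pass gate B12"),
]


def search_nemos(query: str) -> list:
    """Stand-in search: rank items by how many query words appear in their text."""
    words = set(query.lower().split())
    scored = []
    for item in LIBRARY:
        item_words = set(item.ocr_text.lower().replace(":", " ").split())
        scored.append((len(words & item_words), item))
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [item for score, item in ranked if score > 0]


def extract_ocr_from_image(item_id: str) -> str:
    """Stand-in OCR: return the stored text for the requested item."""
    for item in LIBRARY:
        if item.item_id == item_id:
            return item.ocr_text
    raise KeyError(item_id)


def answer_wifi_question() -> str:
    """Step 1: search. Step 2: read the top hit. Step 3: pull out the answer."""
    hits = search_nemos("wifi password Airbnb")
    text = extract_ocr_from_image(hits[0].item_id)
    return text.split("password:")[1].strip()


print(answer_wifi_question())  # sunny-harbor-42
```

In the real flow the model, not hand-written code, decides which tool to call next and how to phrase the answer.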

The Nemos MCP tools, and what each one does

The server exposes a focused toolkit. Your agent picks the right ones automatically, but it's worth seeing what's under the hood so you know what's possible.

| Tool | What it does |
| --- | --- |
| `search_nemos` | Semantic search across your whole library: finds items by meaning, not just keywords |
| `get_recent_items` | Pulls your most recently captured screenshots, notes, links, and recordings |
| `get_item` | Reads a single item in full, with its metadata |
| `get_folder_items` | Lists everything inside a specific Nemos folder |
| `download_item_files` | Fetches the underlying files (images, PDFs, audio) for an item |
| `analyze_image_or_screenshot` | Has the agent visually describe and reason over a screenshot |
| `extract_ocr_from_image` | Pulls the text out of a screenshot or photo |
| `extract_pdf_text` | Extracts the full text of a saved PDF |
| `summarize_article` | Fetches and summarizes a saved web article |
| `summarize_pdf` | Summarizes a saved PDF document |
| `get_youtube_transcript` | Pulls the transcript of a saved YouTube video |
| `get_reel_transcript_or_caption` | Gets the caption or transcript of a saved reel |
| `build_context_pack` | Bundles related items into one context block the agent can reason over |
| `find_related_by_topic` | Finds items connected to a topic across your library |
| `generate_knowledge_map` | Draws the relationships between your captured items |
| `export_to_obsidian` | Packages items into Obsidian-ready markdown |

Notice what this means: the agent can search, read pixels and PDFs, pull video transcripts, and assemble briefs, all over content *you* own, never a generic web crawl.

Two ways to run it: local stdio and remote HTTP

The Nemos MCP server runs in two modes, and the difference is mostly about where the connection lives. Both talk only to *your* private Nemos library — the one synced through your CloudKit account.

Local stdio server. This runs on your own machine and connects directly to your private library. Your AI app launches it as a local process and talks to it over standard input/output — nothing routes through a third-party host. This is the tightest setup: your data and the connector both sit on hardware you control.
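
If "talks to it over standard input/output" sounds abstract, this is roughly what it means. In MCP's stdio transport, each JSON-RPC message is written as one line of JSON, with newlines as the delimiter. The sketch below uses an in-memory stream as a stand-in for the pipe between your AI app and the server process; the helper names are mine, not part of any Nemos API.

```python
import io
import json
from typing import Optional


def write_message(stream, message: dict) -> None:
    """Write one JSON-RPC message as a single newline-terminated line."""
    stream.write(json.dumps(message, separators=(",", ":")) + "\n")


def read_message(stream) -> Optional[dict]:
    """Read the next line and parse it; return None when the stream is exhausted."""
    line = stream.readline()
    return json.loads(line) if line else None


# In-memory stand-in for the pipe between client and server.
pipe = io.StringIO()
write_message(pipe, {"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
pipe.seek(0)
received = read_message(pipe)
print(received)
```

Nothing here opens a network socket, which is the point of the local mode: the conversation between app and connector stays inside your machine.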

Remote HTTP option. For clients that prefer a hosted connection (and for ChatGPT-style remote connectors), Nemos offers an HTTP server gated by a per-user token. Your token scopes access to your CloudKit data and nothing else. This mirrors how remote MCP works across the industry: scoped, revocable tokens rather than handing over a password. Your CloudKit data stays yours; the token is the only thing the agent holds.
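
Here is a hedged sketch of what a token-gated request could look like from the client side. The endpoint URL and token value are placeholders, and sending the token as a bearer `Authorization` header is my assumption about how the per-user token is presented. The `Accept` header follows MCP's HTTP transport, where a client accepts either a JSON reply or an event stream. The code only builds the request and does not send it.

```python
import json
import urllib.request


def build_remote_request(endpoint: str, token: str, message: dict) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST carrying one JSON-RPC message."""
    return urllib.request.Request(
        endpoint,
        data=json.dumps(message).encode("utf-8"),
        method="POST",
        headers={
            # Assumption: the per-user token is presented as a bearer credential.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
    )


# Placeholder endpoint and token, not real Nemos values.
request = build_remote_request(
    "https://example.com/mcp",
    "YOUR_NEMOS_TOKEN",
    {"jsonrpc": "2.0", "id": 1, "method": "tools/list"},
)
print(request.get_method(), request.full_url)
```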

Pick stdio if you want everything local. Pick the remote HTTP option if your AI app expects a URL-based connector. Either way, the trust model is the same one the rest of the MCP ecosystem settled on in 2026: scoped, revocable access at the connector level, not a master key. If you ever want to cut the agent off, you revoke the token or remove the connector — and it's gone.
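
"Scoped and revocable" is easier to trust once you see how little it takes. The sketch below is a toy model of the idea, not Nemos's implementation: the `TokenRegistry` class, user ids, and token strings are all hypothetical. A token maps to exactly one user's library, and revoking it makes every later check fail.

```python
from dataclasses import dataclass, field


@dataclass
class TokenRegistry:
    """Toy model: each token grants access to exactly one user's library."""

    owners: dict = field(default_factory=dict)  # token -> user id

    def issue(self, token: str, user_id: str) -> None:
        self.owners[token] = user_id

    def revoke(self, token: str) -> None:
        self.owners.pop(token, None)

    def authorize(self, token: str, requested_user_id: str) -> bool:
        """Allow access only if the token exists and belongs to that user."""
        return self.owners.get(token) == requested_user_id


registry = TokenRegistry()
registry.issue("tok-alice", "alice")

print(registry.authorize("tok-alice", "alice"))  # True: own library
print(registry.authorize("tok-alice", "bob"))    # False: someone else's library

registry.revoke("tok-alice")
print(registry.authorize("tok-alice", "alice"))  # False: token revoked
```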

How this compares to bolting a notes app onto Claude

People already wire Apple Notes or Evernote into Claude Desktop with community MCP servers, and that's genuinely useful for plain text. But those connectors mostly read *text records*. Your screenshots, the reel you saved, the PDF receipt — they're either invisible or just a filename.

Nemos was built capture-first for exactly that mixed media. The MCP tools don't stop at text: `analyze_image_or_screenshot` reasons over pixels, `get_reel_transcript_or_caption` reaches into video, `extract_pdf_text` cracks documents. So the agent isn't searching a notes table — it's searching your actual visual second brain, the messy real one where most of your useful stuff is an image.
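
One way to picture the difference is as a routing table from media type to tool. In practice the agent makes this choice itself from the tool descriptions. The mapping below is only an illustration: the `kind` labels are hypothetical, while the tool names are the ones listed in this article.

```python
# Hypothetical media-type labels mapped to the tool names from this article.
TOOL_FOR_KIND = {
    "screenshot": "analyze_image_or_screenshot",
    "photo": "extract_ocr_from_image",
    "pdf": "extract_pdf_text",
    "article": "summarize_article",
    "youtube": "get_youtube_transcript",
    "reel": "get_reel_transcript_or_caption",
}


def pick_tool(kind: str) -> str:
    """Choose a media-specific tool, falling back to a plain item read."""
    return TOOL_FOR_KIND.get(kind, "get_item")


for kind in ("screenshot", "reel", "pdf", "note"):
    print(kind, "->", pick_tool(kind))
```

A text-only connector effectively has just the fallback row, which is why images and video stay opaque to it.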

That's the bet: the things you most often can't find again aren't typed notes. They're screenshots. Nemos indexes those, and the MCP server hands that index to your AI.

FAQ

Is my data private with the Nemos MCP server?

Yes. Your library lives in your private CloudKit account, and the MCP server only ever talks to *your* data. The local stdio mode runs entirely on your machine. The remote HTTP option uses a per-user token scoped to your CloudKit data — revocable, and the only credential the agent ever holds. Nothing is shared into a public pool.

Which AI apps work with it?

Any MCP client. The Model Context Protocol is an open standard now stewarded by the Linux Foundation's Agentic AI Foundation, with adoption across Anthropic, OpenAI, Google, and Microsoft. Claude Desktop is the most common starting point, but the same Nemos server works with any app that speaks MCP — the whole point of the protocol is that you connect once.

Does it work with ChatGPT?

ChatGPT added custom MCP connector support through its Apps and Connectors settings, using remote MCP servers with OAuth-style scoped access. The Nemos remote HTTP option with a per-user token is built for exactly that style of connection, so you can point ChatGPT's connector at your Nemos library the same way you would Claude.

Do I need to be a developer?

No. You don't write code or build the server yourself. Nemos is bringing the MCP server to users as a guided setup — you connect your AI app to your Nemos library and start asking questions. If you can install a Claude Desktop extension or add a ChatGPT connector, you can run this. Join the waitlist to get it as it rolls out.