Read, listen, create: why your AI agent needs all three
Your agent already reasons. Give it three hands: read files, listen to recordings, create images. How image generation completes the Frenchie toolkit.
Your AI agent already has a brain. It can reason, plan, write code, argue with you about architecture decisions, and generate a plausible roadmap in under a minute. What it's missing is hands.
A brain without hands can describe what a bookshelf should look like, but it can't build one. For an agent to actually do work — the real work, the kind that ends in a shippable artifact — it needs capabilities that extend past pure reasoning. It needs to be able to read files it didn't write, listen to recordings it wasn't in, and create assets it can use downstream.
Those three verbs — read, listen, create — are what Frenchie gives your agent.
The shape of agent work
Watch an AI agent try to complete a real task end-to-end and you start to notice a pattern. The reasoning work — figuring out what to do, drafting text, writing code, proposing structure — is usually the easy part. Modern models are absurdly good at it.
The hard part is everything that isn't reasoning. The part where the agent has to:
- Read a scanned contract the user dropped into the chat.
- Listen to a thirty-minute client call that was recorded yesterday.
- Create a cover image for the blog post it just drafted.
Each of those is a capability gap. The agent understands the task. It just can't perform the underlying action (reading an image, decoding an audio stream, producing pixel data) because those aren't things a text-based LLM does natively.
You can patch each gap individually. Wire in an OCR API. Host Whisper. Call out to an image-generation service. Handle the upload, the polling, the retries, the file management, the credit accounting. Each patch takes a few hours to build and a few more to maintain. Do that three times and you've built a tools layer. Do it right and you've built Frenchie.
Read (OCR)
Your agent can read any PDF or image. Scanned contracts, handwritten notes, invoices from 2003, research papers with tables and figures. One MCP call, clean Markdown back — with figures pulled out as separate PNG files so your agent can reference them directly instead of guessing at what they contain.
This isn't new to Frenchie — OCR was the first capability we shipped. But it set the shape for everything after: one narrow tool, one clean output, zero infrastructure for you to run.
Listen (transcription)
Your agent can listen to audio and video. Meeting recordings, sales calls, podcast episodes, voicemails. Long files get chunked automatically and reassembled into one continuous transcript. Async by default — your agent kicks off the job, keeps working, and collects the Markdown when it's ready. No blocking, no five-minute pauses in the conversation while a model chews through a long recording.
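To make the chunk-and-reassemble, kick-off-and-collect flow concrete, here is a minimal Python sketch. It is not Frenchie's implementation: transcribe_chunk is a hypothetical stand-in that returns placeholder text, and the 10-minute chunk length is an assumption chosen purely for illustration.

```python
import asyncio

# Assumed chunk length for illustration; the real service's value is not stated in this post.
CHUNK_SECONDS = 600


def chunk_ranges(duration_s: int, chunk_s: int = CHUNK_SECONDS) -> list[tuple[int, int]]:
    """Split [0, duration_s) into consecutive (start, end) windows."""
    return [
        (start, min(start + chunk_s, duration_s))
        for start in range(0, duration_s, chunk_s)
    ]


async def transcribe_chunk(start: int, end: int) -> str:
    """Hypothetical stand-in for a speech-to-text call on one window."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"[{start}-{end}s] ..."


async def transcribe_long_file(duration_s: int) -> str:
    """Transcribe all chunks concurrently, then reassemble them in order."""
    parts = await asyncio.gather(
        *(transcribe_chunk(s, e) for s, e in chunk_ranges(duration_s))
    )
    return "\n".join(parts)  # gather returns results in input order


async def main() -> tuple[str, str]:
    # Kick off the job for a thirty-minute recording without blocking.
    job = asyncio.create_task(transcribe_long_file(1800))
    other_work = "drafted outline"  # the agent keeps working in the meantime
    transcript = await job  # collect the transcript once it is ready
    return other_work, transcript
```

Running asyncio.run(main()) returns the placeholder transcript as one string with the chunks in their original order, which is the property that matters when a long recording is split up.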
Most agents can't process audio at all without a tool like this. The model sees a file extension and politely declines. Frenchie turns that into a single transcribe_to_markdown call.
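For a sense of what that single call looks like on the wire, here is a rough sketch of an MCP tool invocation. MCP tool calls are JSON-RPC 2.0 requests with the method tools/call. The tool name comes from this post, but the file_path argument is a hypothetical name used only for illustration; check the Frenchie docs for the tool's actual parameters.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# `file_path` is an assumed argument name, not taken from the Frenchie docs.
request = build_tool_call(
    1,
    "transcribe_to_markdown",
    {"file_path": "calls/client-call.mp3"},
)
```

In practice your agent's MCP client builds and sends this message for you; the sketch only shows how little the agent has to specify.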
Create (image generation)
And now: your agent can create images. A product mockup to attach to a spec. A cover image for the blog post it just drafted. A concept sketch to include in a design review. Describe what you need in a prompt and your agent generates the image, saves it next to the work it's doing, and moves on.
This is the piece that was missing. An agent that can read a scanned contract and transcribe the related client call but still has to ask you to generate the accompanying visual is an agent that's stuck halfway through the work. Image generation closes that loop.
In Frenchie, it's the generate_image MCP tool. Same install flow, same flat pricing: 20 credits ($0.20) per image. The generated file lands in your agent's workspace automatically. No new dashboard. No separate playground to manage. No new billing relationship.
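The pricing arithmetic is simple enough to sketch. The per-image figures are the ones quoted above, and the 100-credit signup allowance is the one mentioned later in this post. The helper names are made up for illustration, and the sketch covers image generation only, since costs for the other tools are not stated here.

```python
CREDITS_PER_IMAGE = 20      # quoted above
CENTS_PER_IMAGE = 20        # $0.20 per image, kept in cents to avoid float rounding
FREE_SIGNUP_CREDITS = 100   # free credits on signup, per this post

# Implied price of one credit, in cents.
CENTS_PER_CREDIT = CENTS_PER_IMAGE // CREDITS_PER_IMAGE


def images_affordable(credits: int) -> int:
    """How many whole images a credit balance covers."""
    return credits // CREDITS_PER_IMAGE


def image_cost_cents(n_images: int) -> int:
    """Cost of n images in US cents."""
    return n_images * CENTS_PER_IMAGE
```

Under these numbers a credit works out to one cent, and the free signup credits would cover five images if spent on nothing else. Other tools draw from the same credit pool, so the real number depends on what else your agent runs.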
Why focused file capabilities, not ten product lines
A question we get a lot: why stop at focused file work? Why not add schema extraction, video generation, music generation, RAG, translation, summarization, the whole menu?
Because an agent's LLM already does most of that. Summarization is a prompt. Translation is a prompt. Field extraction is a prompt over clean text — and Frenchie's job is to get the clean text to the agent in the first place. The reasoning layer is already solved.
What wasn't solved, before Frenchie, was the inputs and outputs that the reasoning layer needs. Reading scanned files. Listening to recordings. Extracting Office and spreadsheet files. Producing images. Those are the capability gaps that show up in real agent workflows, and those are the jobs Frenchie keeps focused on.
Read, listen, extract, create. One MCP-first file utility. One credit pool. That's the whole product shape.
What this looks like in your day
The practical version: you're in Claude Code, Cursor, Codex — whatever your agent lives in — and you're building something real. A blog post. A technical spec. A weekly update. A pitch deck.
You drop in a scanned PDF. Your agent reads it.
You attach a recording from yesterday's customer call. Your agent transcribes it.
You ask for a cover image to match the piece. Your agent generates it.
None of those steps require you to switch tools, open a second dashboard, or paste a URL from a different product. Your agent already has the capability. It just picks the right tool and runs.
That's the shape of agent work we've been trying to make feel normal. Read, listen, extract, create — all through one MCP interface, all behind one API key, all priced in the same credit pool. One hundred free credits on signup, once per email, to try the full file workflow.
Ready to see it? Install Frenchie in your agent by running:
npx @lab94/frenchie install --api-key fr_your_key
Your agent gets OCR, transcription, Office/spreadsheet extraction, and image generation as native MCP tools the moment you restart. Read the docs for the full capability reference, or skip straight to the free tier.