# Frenchie — full site content

> Your agent's best friend. An MCP-first capability layer for AI agents: OCR for PDFs and images, transcription for audio and video, and image generation from text prompts.

This document aggregates the entire Frenchie site into a single Markdown file for AI ingestion. Generated dynamically at request time from the same source of truth as the human-facing pages.

Canonical site: https://www.getfrenchie.dev
Organization: LAB94 Co., Ltd. (https://lab94.io)
Contact: support@getfrenchie.dev
Package: https://www.npmjs.com/package/@lab94/frenchie

---

## Overview

Frenchie is an MCP server that gives AI agents three capabilities:

1. OCR for PDFs and images (returns Markdown)
2. Transcription for audio and video (returns Markdown)
3. Image generation from text prompts (returns PNG / JPEG / WebP via gpt-image-2)

It is NOT a platform, NOT a workflow builder, NOT a document management system. It is a narrow, purpose-built capability layer — your agent calls `ocr_to_markdown`, `transcribe_to_markdown`, or `generate_image` through MCP and gets back exactly what it needs.

Primary transport: stdio MCP (`npx @lab94/frenchie install --api-key fr_...`). Stdio mode auto-saves results to `.frenchie//result.md` for markdown capabilities and `.frenchie//generated.` for image generation — no second download step.

Fallback transport: HTTP at `mcp.getfrenchie.dev` for hosted agents that can't spawn local binaries. HTTP mode returns inline markdown for OCR / transcription, and a 30-minute presigned URL for generated images.

Tier-A auto-install supports Claude Code, Cursor, Codex, Antigravity, and Claude Desktop. Any other MCP-compatible client can be wired up manually with the same skill pack.

Shared result contract (spec section 6.1): every capability response wraps its typed payload in a `result` envelope — `{ status, jobId, creditsUsed, resultExpiresAt, result: { kind: "markdown" | "image", ... } }`. Agents branch on `result.kind`.
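That envelope can be sketched in TypeScript as a discriminated union. The top-level field names come from the contract quoted above; the exact per-kind payload fields and their optionality are illustrative assumptions, not a published SDK type:

```typescript
// Sketch of the shared result envelope (spec section 6.1).
// Top-level fields match the contract above; the per-kind payload
// fields (markdown, imageUrl, savedTo) are illustrative assumptions.
type MarkdownResult = { kind: "markdown"; markdown?: string; savedTo?: string };
type ImageResult = { kind: "image"; imageUrl?: string; savedTo?: string };

interface Envelope {
  status: string;
  jobId: string;
  creditsUsed: number;
  resultExpiresAt: string;
  result: MarkdownResult | ImageResult;
}

// Agents branch on result.kind, never on top-level fields.
function describe(res: Envelope): string {
  switch (res.result.kind) {
    case "markdown":
      return res.result.savedTo ?? res.result.markdown ?? "";
    case "image":
      return res.result.savedTo ?? res.result.imageUrl ?? "";
  }
}
```

The discriminated union means the type checker forces the agent-side code to handle both `kind` values before touching any payload field.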
No top-level `markdown`, `imageUrl`, or `savedTo` — read them off `result`.

Files and generated images are processed and deleted. Results expire 30 minutes after first delivery. No training on customer data, no long-term retention.

---

## Pricing

Pay-as-you-go. No subscriptions.

- $1 = 100 credits
- 1 credit per OCR page ($0.01)
- 2 credits per transcription minute ($0.02)
- 20 credits per generated image ($0.20)
- 100 free credits on signup — once per email, no card required
- 2 GB max file size per job
- Credits don't expire — refunded automatically on job failure

That $1 = 100 credits translates to 100 OCR pages, 50 minutes of transcription, or 5 generated images.

### Image generation limits

- 50 images / hour per user
- 250 images / day per user (shares the 5,000 credit / day global cap)
- v1 generates exactly one image per call — call the tool again for variants
- Supported output formats: PNG (default), JPEG, WebP. A transparent background is rejected when the output format is JPEG.
- Supported sizes: 1024x1024, 1536x1024, 1024x1536, or auto.
- Quality: low / medium / high / auto. Background: transparent / opaque / auto.

### Usage examples

- OCR a scanned book (400 pages) = $4.00
- OCR a week of invoices (50 files × 10 pages) = $5.00
- Transcribe a podcast episode (60 min) = $1.20
- Transcribe a sales call (45 min) = $0.90
- Daily standups for a month (20 × 30 min) = $12.00
- Parse a research paper with figures (15 pages) = $0.15
- Generate 5 marketing images for a blog post = $1.00
- Generate 20 product mockups = $4.00

### Billing FAQ

**Q: How do I pay?**
Top up from the dashboard with any major card — Visa, Mastercard, Amex. Minimum $1. You only pay for what you use.

**Q: Do I get a receipt or invoice?**
Yes. Every top-up generates a receipt emailed to your account address. The dashboard keeps a full history you can export any time.

**Q: Do credits expire?**
No. Top up once, use whenever. A credit you bought today still works a year from now.
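Because the rates are flat, job cost reduces to simple arithmetic. A small helper makes that concrete (the constant and function names here are mine for illustration, not part of any Frenchie SDK):

```typescript
// Flat Frenchie rates: $1 = 100 credits.
const CREDITS_PER_OCR_PAGE = 1;     // $0.01 per page
const CREDITS_PER_AUDIO_MINUTE = 2; // $0.02 per minute
const CREDITS_PER_IMAGE = 20;       // $0.20 per image

// Credits consumed by a mixed job from the shared balance.
function creditsFor(job: { pages?: number; minutes?: number; images?: number }): number {
  return (
    (job.pages ?? 0) * CREDITS_PER_OCR_PAGE +
    (job.minutes ?? 0) * CREDITS_PER_AUDIO_MINUTE +
    (job.images ?? 0) * CREDITS_PER_IMAGE
  );
}

// Dollars at the flat $1 = 100 credits rate.
const usd = (credits: number): number => credits / 100;

// Matches the usage examples above:
// 400-page scanned book  -> 400 credits -> $4.00
// 60-minute podcast      -> 120 credits -> $1.20
// 5 marketing images     -> 100 credits -> $1.00
```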
**Q: How much does each capability cost?**
One credit per OCR page ($0.01), 2 credits per transcription minute ($0.02), and 20 credits per generated image ($0.20). Same credit pool across all three — your AI agent can mix OCR, transcription, and image generation however it needs, billed from the same balance.

**Q: How much does image generation cost on Frenchie?**
20 credits per image, which is $0.20 per generated image at the flat $1 = 100 credits rate. No model-tier pricing, no quality surcharges — whatever prompt your agent sends, it's 20 credits.

**Q: What happens if a job fails?**
Failed jobs don't bill. If an OCR, transcription, or image generation job fails partway through, the credits are refunded to your balance automatically — no support ticket needed.

**Q: Can I refund unused credits?**
Credits are non-refundable once purchased. That's why we ship 100 free credits on signup — try before you top up. See /terms for the full policy.

**Q: Processing thousands of pages a month?**
Email support@getfrenchie.dev with a rough volume estimate. We'll figure out the right setup for your workload — we're small enough to actually reply.

**Q: Is there a free tier?**
Yes. 100 credits on your first signup, once per email, no card required. That's 100 OCR pages, 50 transcription minutes, or 5 generated images — mix and match however you want.

---

## General FAQ

**Q: What can my agent do with Frenchie?**
Three things: OCR (read PDFs and images → Markdown), transcription (audio and video → Markdown), and image generation (text prompts → image files). All through one MCP interface. One install, three capabilities.

**Q: What file types do you support?**
PDF, PNG, JPG, WebP for OCR. MP3, M4A, WAV, MP4, MOV, WEBM for transcription. Text prompts for image generation. If it's a file or a prompt, we probably handle it.

**Q: How good is the OCR?**
Good enough to read scanned contracts with tables, handwritten notes, and that one PDF your client exported from a fax machine in 2003.
Seriously — try the free credits and compare.

**Q: How much does image generation cost?**
20 credits per generated image. Since $1 = 100 credits, that's $0.20 per image. Flat rate, no model math, no quality-tier pricing. Your AI agent calls the `generate_image` MCP tool and the image lands in your working directory.

**Q: What image sizes and formats does generate_image return?**
PNG, JPEG, or WebP files at the size your agent requests — 1024×1024, 1536×1024, 1024×1536, or auto. The file is auto-saved next to your work in stdio mode, or returned as a short-lived presigned URL for hosted agents. No watermark, no attribution requirement.

**Q: Can I pick the image model on Frenchie?**
No. Frenchie ships one curated production image model and handles upgrades behind the scenes. If you need to pick between many models (FLUX, SDXL, SD3, niche fine-tunes), a platform like Replicate or Fal.ai is the better fit — we compare them honestly on /compare.

**Q: Can my agent read PDFs natively? Why do I need this?**
It can — but scanned PDFs come back empty, and large ones eat your tokens alive. Frenchie parses outside your agent and returns clean text. Fewer tokens, better results.

**Q: What happens to my files and generated images?**
Processed and deleted. We don't store your files or generated images after delivery. The result payload expires 30 minutes after first retrieval.

**Q: Is there a free tier?**
100 credits on your first signup, once per email, no credit card required. That's 100 pages of OCR, 50 minutes of transcription, or 5 generated images — mix and match however you want. Enough to know if it works for you.

**Q: I can build this myself.**
You absolutely can. If you're parsing more than 5,000 pages a month or generating hundreds of images a day, you probably should. For everyone else, that's days of setup and ongoing maintenance for something that costs a dollar.

**Q: What agents does it work with?**
Any MCP client that can spawn a local stdio server.
Tier-A auto-install supports Claude Code, Cursor, Codex, Antigravity, and Claude Desktop. Other MCP clients can be wired up manually with the same skill pack. Hosted/web agents (Lovable, Manus, Claude.ai, ChatGPT.com) connect to https://mcp.getfrenchie.dev over HTTP instead.

**Q: What's coming next?**
We keep the public promise tight: OCR for PDFs and images, transcription for audio and video, image generation from text prompts — all delivered over MCP. When we ship something new, it lands in the changelog first.

**Q: Does Frenchie work with non-English content?**
Yes. OCR, transcription, and image generation all handle Thai, Japanese, Chinese, Arabic, French, Spanish, and 50+ other languages out of the box. Mixed-language documents and multilingual prompts work too. No flags, no config — just send it.

**Q: How does Frenchie compare to Marker, LlamaParse, Whisper, or Replicate?**
They're all good tools. Marker is a library — you run it yourself. LlamaParse is a parsing API — you wire it into your pipeline yourself. Whisper is a model — you wire it up yourself. Replicate is a model hub — you pick and glue the models. Frenchie is an MCP server — your agent calls it directly for OCR, transcription, and image generation, no glue code, no ops. If you're running heavy volume, DIY wins on cost. If you want it to just work inside Claude Code or Cursor tomorrow, that's us.

**Q: What's the maximum file size?**
2 GB per file. For PDFs that's thousands of pages. For audio and video that covers most recordings. Image generation has no file-size limit — it's prompt-in, image-out. If you need more, email support@getfrenchie.dev — we'll figure it out.

**Q: Is Frenchie open source? Can I self-host?**
No on both. Frenchie is a managed service — you send files or prompts, we return Markdown or images. No infrastructure to run, no models to host. The MCP integration code on your side is fully in your hands; the extraction and generation pipelines stay with us.

**Q: How long does a job take?**
OCR is usually a few seconds per page.
Transcription runs around a tenth of the audio length — a 30-minute meeting lands in about three minutes. Image generation is typically 20–60 seconds per image. Larger jobs process async: your agent kicks off the job, keeps working, and collects the result when it's ready.

---

## Agent integrations

### Frenchie in Claude Code

Slash commands work out of the box. Results auto-save.

**Install:** `npx @lab94/frenchie install --agent claude --api-key fr_your_key_here`
**Primary invocation:** `/ocr TOR.pdf`
**Alternates:** `/transcribe meeting.mp3`, `/frenchie-status`

### Frenchie in Cursor

Natural language works. One setting toggle on first install.

**Install:** `npx @lab94/frenchie install --agent cursor --api-key fr_your_key_here`
**Primary invocation:** `Use Frenchie to OCR TOR.pdf`
**Alternates:** `Use Frenchie to transcribe meeting.mp3`

### Frenchie in Codex (Desktop, CLI, IDE)

`@frenchie`, `/frenchie`, or natural language — all three work.

**Install:** `npx @lab94/frenchie install --agent codex --api-key fr_your_key_here`
**Primary invocation:** `/frenchie TOR.pdf`
**Alternates:** `@frenchie ocr TOR.pdf`, `OCR TOR.pdf with Frenchie`

### Frenchie in Google Antigravity

Invoke MCP servers by name, not by skill name.

**Install:** `npx @lab94/frenchie install --agent antigravity --global --api-key fr_your_key_here`
**Primary invocation:** `/frenchie TOR.pdf`
**Alternates:** (none)

### Frenchie in VS Code (GitHub Copilot)

Slash commands work once MCP is registered.

**Install:** `npx @lab94/frenchie install --agent vscode --api-key fr_your_key_here`
**Primary invocation:** `/frenchie TOR.pdf`
**Alternates:** (none)

### Frenchie in Claude Desktop

Natural language. Requires `--global` install.

**Install:** `npx @lab94/frenchie install --agent claude-desktop --global --api-key fr_your_key_here`
**Primary invocation:** `Use Frenchie to OCR TOR.pdf`
**Alternates:** (none)

### Frenchie in Windsurf

Natural language. User-level install.
**Install:** `npx @lab94/frenchie install --agent windsurf --global --api-key fr_your_key_here`
**Primary invocation:** `OCR TOR.pdf via Frenchie`
**Alternates:** (none)

### Frenchie in Gemini CLI

Natural language. Terminal-launched, no PATH gap.

**Install:** `npx @lab94/frenchie install --agent gemini --api-key fr_your_key_here`
**Primary invocation:** `OCR TOR.pdf with Frenchie`
**Alternates:** (none)

### Frenchie in Zed

Assistant panel, user-level install.

**Install:** `npx @lab94/frenchie install --agent zed --global --api-key fr_your_key_here`
**Primary invocation:** `OCR TOR.pdf via Frenchie`
**Alternates:** (none)

---

## Comparisons

### Frenchie vs Marker

Marker is an open-source PDF-to-Markdown library you install, run, and operate. Frenchie is an MCP server your agent invokes directly — no glue code, no deploy pipeline, no GPU capacity to plan for. If you're building a batch ingestion pipeline you want to own end to end, Marker is a solid pick. If you want an agent to parse PDFs in the middle of a conversation, that's us.

**Side-by-side:**

- **Integration surface** — Marker: Python library, called from your own code / Frenchie: MCP server, called from any MCP client (Claude Code, Cursor, Codex, Antigravity, Claude Desktop)
- **Hosting model** — Marker: Self-hosted — you run it on your hardware / Frenchie: Fully managed — send file, receive Markdown
- **Primary use case** — Marker: Batch pipelines, research corpora, ETL jobs / Frenchie: Agent-driven OCR inside live conversations
- **Ops burden** — Marker: Dependencies, GPU/CPU capacity, queues, retries, monitoring / Frenchie: Zero infrastructure — ship an API key and go

**When Marker is the right call:**

- You're processing tens of thousands of pages a month and per-page cost dominates your budget.
- You want the output format, model weights, and pre/post-processing fully under your control.
- Your workload is batch-first — nightly jobs, research indexing, warehouse ETL — and fits inside a script you already maintain.

**When Frenchie is the right call:**

- Your agent needs to read a PDF mid-conversation and you don't want to wire a library into the agent process.
- You don't want to own OCR ops — dependency updates, infrastructure, scaling, retries.
- You're shipping an MCP-based product and want OCR to feel like a native tool call.

**Can you use both?**

Plenty of teams run both. Marker handles the scheduled overnight batch that feeds the warehouse. Frenchie handles the one-off PDFs your agent hits during a user session — where latency matters more than cost per page. Credits don't expire, so Frenchie sits quietly idle while the batch pipeline does its thing.

**Common questions:**

**Q: Is Frenchie cheaper than running Marker?**
At small-to-medium volume, Frenchie is cheaper once you count your time — you don't run infrastructure. At high volume (thousands of pages a day, predictable load), a well-tuned Marker deployment wins on unit economics.

**Q: Can I switch from Marker to Frenchie?**
Yes. Frenchie has no lock-in on your side — you call an MCP tool, you get Markdown. If you later want to move off, your code just calls a different tool. We don't hold onto files or training data.

**Q: Does Frenchie give me the same structured output as Marker?**
Clean Markdown, preserved table structure, extracted figures as PNGs, and page breaks. For most downstream workflows — agents, RAG indexes, human review — the two are interchangeable on output shape.

**Q: Why not just use Marker?**
Run it if your workflow is batch and you want full control. Call Frenchie if your workflow is agent-driven and you want zero ops. Those are different shapes — pick the one that matches yours.

### Frenchie vs LlamaParse

LlamaParse is a commercial PDF parsing API built around LlamaIndex workflows — if you're already invested in that framework, it slots in tightly.
Frenchie is built around MCP — native in Claude Code, Cursor, Codex, and every other MCP client. Both are managed services with pay-as-you-go pricing. The choice comes down to where your agent lives.

**Side-by-side:**

- **Integration surface** — LlamaParse: REST API, native in LlamaIndex loaders / Frenchie: MCP tool — appears as a callable function inside any MCP client
- **Ecosystem fit** — LlamaParse: LlamaIndex pipelines and retrievers / Frenchie: MCP-compatible agents (Claude Code, Cursor, Codex, etc.)
- **Pricing model** — LlamaParse: Tiered per-page (accuracy tier affects price) / Frenchie: Flat $1 = 100 credits, 1 credit per page
- **Output** — LlamaParse: Markdown, JSON, with optional structured extraction prompts / Frenchie: Clean Markdown + figures, metadata response for stdio

**When LlamaParse is the right call:**

- You're building on LlamaIndex and want parsing, indexing, and retrieval in one ecosystem.
- You need the tuneable accuracy tiers LlamaParse exposes for specific document types.
- Your extraction uses structured-output prompts that fit LlamaParse's pipeline idioms.

**When Frenchie is the right call:**

- Your agent runs in Claude Code, Cursor, or any MCP client and you want OCR as a native tool call.
- You prefer flat pricing you can reason about per job — no accuracy-tier math.
- You want the extraction step separated from the framework — no LlamaIndex lock-in.

**Can you use both?**

If you're building a product with a complex RAG backend, LlamaParse can do the heavy document parsing for your warehouse while Frenchie handles the agent-facing tool calls. Different layers of the same stack — document pipeline for offline, agent tool for online.

**Common questions:**

**Q: Is Frenchie cheaper than LlamaParse?**
Frenchie is flat: $0.01 per OCR page. LlamaParse varies by accuracy tier. At base tier, pricing is in the same ballpark. At higher tiers, LlamaParse is more expensive per page but offers extraction features Frenchie doesn't.
**Q: Can I use Frenchie inside a LlamaIndex app?**
Yes — call Frenchie via its MCP tool or HTTP endpoint, feed the Markdown into your existing LlamaIndex indexer. No custom loader required.

**Q: Does Frenchie do structured extraction like LlamaParse?**
No. Frenchie returns clean Markdown. Structured extraction — pulling specific fields, applying schemas — is something your agent handles after reading the Markdown, using the LLM it already has.

**Q: Why not just use LlamaParse?**
Use it if your stack is LlamaIndex and you want parsing + retrieval coupled. Use Frenchie if your stack is MCP and you want parsing as a standalone tool call.

### Frenchie vs Whisper

Whisper is an open-source transcription model you install, run, and scale yourself. Frenchie is a managed MCP server for agent workflows. Whisper fits if you have GPU capacity and want tight control over the pipeline. Frenchie fits if you want transcription delivered into your agent conversation without owning the ops.

**Side-by-side:**

- **Integration surface** — Whisper: Python library, Hugging Face, or your own inference server / Frenchie: MCP tool callable from any MCP client
- **Hosting model** — Whisper: Self-hosted — your GPUs, your containers, your monitoring / Frenchie: Fully managed — send file, receive Markdown
- **Primary use case** — Whisper: High-volume batch transcription, custom fine-tuning, offline workflows / Frenchie: Agent-driven transcription during live sessions
- **Ops burden** — Whisper: GPU capacity planning, model weights, chunking long audio, retries / Frenchie: Zero infrastructure — one API key

**When Whisper is the right call:**

- You have GPU capacity in place and transcription volume justifies the ops investment.
- You need to fine-tune the model on domain-specific audio or vocabulary.
- Your workload is batch — overnight, warehouse-bound — and fits inside a job scheduler you already run.
**When Frenchie is the right call:**

- Your agent needs to transcribe a recording in a conversation and you don't want to run a GPU fleet.
- You want async job handling (chunking, retries, scaling) delivered as a service, not engineered in-house.
- You're building MCP-native and want transcription to feel like a native tool call.

**Can you use both?**

A fine-tuned Whisper running on your cluster can handle the predictable batch — call-center archives, podcast backlogs, daily meeting batches. Frenchie handles the ad-hoc stuff: an audio file dropped into a Claude Code conversation, a one-off voicemail your agent needs to read. Different shapes, same underlying job.

**Common questions:**

**Q: Is Frenchie cheaper than running Whisper?**
At low volume, Frenchie is cheaper — you don't run hardware. At high sustained volume, a well-tuned self-hosted Whisper wins on unit cost but requires ops investment.

**Q: Can I switch from Whisper to Frenchie?**
Yes. You call an MCP tool or HTTP endpoint, you get Markdown. No lock-in on our side — files are deleted after delivery, no training data kept.

**Q: Does Frenchie handle long audio the way Whisper does?**
Yes. Frenchie chunks long audio automatically, transcribes in parallel, and merges the result — async, so your agent doesn't block while the job runs. Max 2 GB per file.

**Q: Why not just use Whisper?**
Run it if you have GPUs and ops appetite. Call Frenchie if you want transcription as a zero-ops tool your agent can invoke directly.

### Frenchie vs AssemblyAI

AssemblyAI is a transcription API with heavy audio-intelligence features — speaker diarization, sentiment analysis, content moderation, topic detection, summarization. If your product surface needs those, use AssemblyAI. Frenchie keeps the scope narrow on purpose: send audio or video, get clean Markdown back. Nothing else.
**Side-by-side:**

- **Feature scope** — AssemblyAI: Transcription plus diarization, sentiment, moderation, topics, summarization / Frenchie: Transcription to Markdown. Full stop.
- **Integration surface** — AssemblyAI: REST API, SDKs for most languages / Frenchie: MCP tool native in Claude Code, Cursor, Codex, and every MCP client
- **Pricing model** — AssemblyAI: Per-second tiered pricing plus add-on fees for audio-intelligence features / Frenchie: Flat 2 credits per minute = $0.02/min
- **Ideal surface** — AssemblyAI: Consumer-facing audio products, podcast apps, call centers / Frenchie: Agent workflows — Markdown in, Markdown out

**When AssemblyAI is the right call:**

- Your product surface shows speakers, sentiment, or content flags to end users — AssemblyAI's audio intelligence is a finished component.
- You need PII redaction, auto-chapters, or topic detection built in rather than layered on.
- You're building a consumer app where transcription is the product, not a step inside a larger agent workflow.

**When Frenchie is the right call:**

- You want transcription delivered as an MCP tool your agent can call — no SDK, no glue code.
- Your agent does its own summarization, sentiment analysis, or extraction on top of clean text — you don't need the extras baked in.
- You prefer simple flat pricing — no add-on feature fees to track.

**Can you use both?**

Reasonable combination: AssemblyAI for the consumer-facing product where diarization and sentiment feed directly into the UI, Frenchie for the internal agent workflow where Markdown is the right shape. Your credit balance is separate from your AssemblyAI spend, so you can keep the billing clean.

**Common questions:**

**Q: Does Frenchie have speaker diarization?**
No. Frenchie delivers a single Markdown transcript. If your product needs labeled speakers, AssemblyAI or a similar audio-intelligence API is the right tool.
**Q: Is Frenchie cheaper than AssemblyAI?**
At the base transcription rate, Frenchie is cheaper per minute. Once you turn on audio-intelligence add-ons (sentiment, entities, moderation), AssemblyAI adds per-feature fees. Compare apples to apples on the features you actually use.

**Q: Can I get summaries from Frenchie?**
Not directly — Frenchie returns Markdown, your agent summarizes. Since you're already paying an LLM for the agent, summarization is a prompt away.

**Q: Why not just use AssemblyAI?**
Use AssemblyAI when audio intelligence is the product. Use Frenchie when clean transcription is a step inside a larger agent conversation.

### Frenchie vs Deepgram

Deepgram is a transcription API engineered for realtime streaming and high-concurrency enterprise workloads — low latency, long-running connections, call-center scale. Frenchie is engineered for batch async jobs inside agent conversations — send a file, collect the Markdown when it's ready. Different shapes for different problems.

**Side-by-side:**

- **Transcription mode** — Deepgram: Realtime streaming plus batch / Frenchie: Batch async only
- **Integration surface** — Deepgram: WebSocket streaming, REST API, SDKs / Frenchie: MCP tool native in every MCP client
- **Target buyer** — Deepgram: Enterprise voice infrastructure, contact centers, media platforms / Frenchie: Developers and agents that need transcripts inline
- **Pricing model** — Deepgram: Per-minute plus feature add-ons; volume commits for enterprise / Frenchie: Flat 2 credits per minute = $0.02/min, no commits

**When Deepgram is the right call:**

- You need live transcription — a call in progress, a meeting being captured, a voice interface responding in under a second.
- You're an enterprise buyer with volume that justifies negotiated commits and custom SLAs.
- Your workload is call-center scale — hundreds of simultaneous streams — and needs infrastructure designed for that load.
**When Frenchie is the right call:**

- Your agent transcribes files, not live streams — a meeting recording, a voicemail, a podcast.
- You want zero commits, zero sales calls — just an API key and pay-as-you-go.
- You're integrating with an MCP-based workflow and want transcription to feel native.

**Can you use both?**

Deepgram handles the live side — streaming transcripts for an active call, realtime voice agents. Frenchie handles the post-call side — once the recording is saved, an agent picks it up and returns the Markdown your team actually reads. One API for streaming, one tool for agent workflows.

**Common questions:**

**Q: Does Frenchie do realtime transcription?**
No. Frenchie is batch async — you submit a file, the job runs in the background, and the Markdown comes back when it's ready. For live streaming, Deepgram (or a similar streaming API) is the right choice.

**Q: Is Frenchie cheaper than Deepgram?**
At Deepgram's standard per-minute rates, Frenchie is in a similar ballpark — the real difference is commit structure and add-ons. Frenchie is flat with no minimums; Deepgram gets cheaper at enterprise commit tiers.

**Q: Can I use Deepgram for live calls and Frenchie for recordings?**
Yes, and it's a clean split. Stream the live call through Deepgram, save the recording, and hand it to Frenchie for the agent-facing Markdown transcript after the call.

**Q: Why not just use Deepgram?**
Use Deepgram when latency and streaming matter. Use Frenchie when the workflow is batch and your agent is the caller.

### Frenchie vs Replicate

Replicate hosts thousands of open-source models — image generation, upscalers, video, audio, and everything else — behind a REST API. It's a platform: you pick a model, read its schema, write the integration code, and handle the pipeline. Frenchie is a single MCP tool your agent already knows how to call — `generate_image` — backed by one curated image model.
If you want model diversity and are willing to wire it up yourself, Replicate wins. If you want your agent to generate an image mid-conversation with zero integration code, that's us.

**Side-by-side:**

- **Integration surface** — Replicate: REST API, Python/Node SDK, model-specific schemas / Frenchie: MCP tool — `generate_image` appears natively in Claude Code, Cursor, Codex, and every MCP client
- **Agent workflow fit** — Replicate: You write the glue code: auth, polling, image download, file management / Frenchie: Your agent calls one tool; the image lands in your working directory, auto-saved
- **Model choice** — Replicate: Thousands of open-source models — you pick, switch, compare / Frenchie: One curated production image model — we handle upgrades behind the scenes
- **Pricing model** — Replicate: Per-second compute pricing; varies by model, GPU, and runtime / Frenchie: Flat 20 credits per image = $0.20, same rate regardless of prompt

**When Replicate is the right call:**

- You want to test and compare multiple image models, or run niche models (ControlNet, upscalers, domain-tuned finetunes).
- You're building an image-heavy product where model selection is a core feature and you'll invest in the integration.
- Your workload includes video, audio, or other modalities Replicate supports that Frenchie doesn't.

**When Frenchie is the right call:**

- You want your AI agent to generate an image as part of a conversation — mid-chat, next to the work it's already doing — without switching tools.
- You'd rather not write polling loops, download handlers, or file-management code for every image your agent creates.
- You want flat, predictable per-image pricing and zero infrastructure to run.

**Can you use both?**

A reasonable split: Replicate for your production image pipeline where you control the model and squeeze the unit economics.
Frenchie for the ad-hoc image your agent generates in a Claude Code or Cursor session — where speed to the image matters more than per-image cost.

**Common questions:**

**Q: Can my agent call Replicate through Frenchie?**
No. Frenchie is its own MCP tool — call `generate_image` and you get an image back. If you want Replicate-specific models, use Replicate's API directly (or wrap it in your own MCP server).

**Q: Is Frenchie cheaper than Replicate?**
Frenchie is flat: $0.20 per image. Replicate varies by model and GPU-seconds. For a fast model on a cheap GPU, Replicate can be cheaper per image; for a premium model on an H100, Frenchie is often cheaper and simpler.

**Q: Can I pick the image model on Frenchie?**
No. Frenchie ships one production image model and handles upgrades behind the scenes. If model choice is the feature you're building on, Replicate is the better fit.

**Q: Why not just use Replicate?**
Use Replicate when model variety is a feature of your product and you have engineering time to integrate. Use Frenchie when you want your agent to generate an image as a single MCP tool call with zero setup.

### Frenchie vs Fal.ai

Fal.ai is a fast, low-latency image-generation API — purpose-built for real-time products where every millisecond of inference time shows up in the UI. Frenchie is an MCP server for AI agents: one tool call, one image back, auto-saved next to your work. If you're shipping a consumer image product where latency is the pitch, use Fal. If you're building an agent and want image generation as a native tool alongside OCR and transcription, that's us.
**Side-by-side:**

- **Integration surface** — Fal.ai: REST API, WebSocket streaming, JavaScript/Python SDKs / Frenchie: MCP tool, native in every MCP client — no SDK required
- **Latency focus** — Fal.ai: Aggressive — sub-second for some models, optimized inference / Frenchie: Standard — typically a few seconds per image, fine for agent workflows
- **Primary audience** — Fal.ai: Consumer-facing image products, real-time creative tools / Frenchie: AI agents doing image generation inline with other work
- **Pricing model** — Fal.ai: Per-request pricing that varies by model and quality tier / Frenchie: Flat 20 credits per image = $0.20, no model math

**When Fal.ai is the right call:**

- You're building a consumer product where image-generation latency is visible to end users and needs to feel real-time.
- You want to pick from Fal's catalog of optimized models (FLUX variants, SDXL, niche models) and tune per use case.
- You're comfortable writing the integration code — auth, polling or streaming, result handling — because your product depends on it.

**When Frenchie is the right call:**

- Your AI agent should generate images during a conversation — a blog cover, a mockup, a diagram — without stopping to configure a separate service.
- You want one flat rate, one tool name, and the image to land in your working directory automatically.
- You care more about "does it fit in my agent workflow?" than "how many milliseconds is the inference?"

**Can you use both?**

A common split: Fal powers your customer-facing creative tool where latency is the product. Frenchie handles the behind-the-scenes image generation your agent needs while drafting a newsletter, building a mockup, or prepping marketing copy. Two different jobs; two tools shaped for each.

**Common questions:**

**Q: Is Frenchie as fast as Fal.ai?**
No. Fal is latency-optimized; Frenchie isn't.
A Frenchie image typically comes back in a few seconds, which is plenty fast for agent workflows but not competitive with Fal for real-time consumer UX. **Q: Is Frenchie cheaper than Fal.ai?** Frenchie is flat $0.20 per image. Fal varies by model and tier — some models are cheaper, premium models cost more. Compare on your exact model before deciding. **Q: Can I use both Fal and Frenchie?** Yes — they're complementary. Fal for your consumer product's real-time generation, Frenchie for your agent's ad-hoc image calls inside Claude Code, Cursor, etc. **Q: Why not just use Fal.ai?** Use Fal when inference latency is a user-facing feature. Use Frenchie when you want your agent to generate images as part of a broader tool-calling workflow with zero integration code. --- ## Use cases ### Meeting transcription for AI agents Turn meetings into Markdown your agent can read. Drop a recording. Get a clean transcript back. Let your agent summarize, extract decisions, or answer follow-ups directly. **Problem:** A 45-minute meeting recording lands in your agent's context. Native transcription isn't available — the agent sees a file it can't read. If you hand it raw audio bytes, you blow the context window before the first word is transcribed. If you transcribe it yourself with a DIY model, you're now maintaining a GPU pipeline to save 20 minutes of listening time. **Gap:** Generic transcription tools return plain text that loses paragraph structure. You paste it into your agent and the agent has to guess where speaker turns happen, where topics shift, where the decisions landed. That guessing costs tokens and quality. **Workflow:** 1. Drop the recording into your agent — drag a file in, paste a path, or run `/transcribe ./standup.mp3` in a Tier-A client like Claude Code or Codex. The Frenchie skill pack picks up the request and resolves the file path against your project directory, no manual upload step. 2. Your agent calls `transcribe_to_markdown` with the resolved path. 
Frenchie inspects the file, estimates credits (2 per minute), confirms your balance, and routes between the sync and async paths automatically based on duration — meetings under ~10 minutes complete inline, longer ones return a `jobId` immediately. 3. For long audio, Frenchie chunks the file at silence boundaries, transcribes the chunks in parallel, then stitches the result back together. Your agent isn't blocked: the smart-wait loop polls for up to 90 seconds, then hands control back so you can keep working in the same session. 4. When the job completes, the Markdown lands at `.frenchie//result.md` next to the source file (stdio mode) or inline in the tool response (HTTP mode). Paragraph breaks are preserved so your agent can summarize, extract decisions, or answer follow-ups without re-reading raw audio. **Example prompt:** ``` /transcribe ./standups/2026-04-16-standup.mp3 Then pull out every decision someone committed to, grouped by person. ``` **Example agent response:** ``` Transcribed 32-minute standup — clean Markdown saved to .frenchie/2026-04-16-standup/result.md. Commitments: - **Priya**: Push the billing migration to staging by Thursday. - **Kai**: Own the spec for the Antigravity install flow, ready for review Monday. - **Lee**: Review the PR for changelog automation before EOW. - **Mia**: Draft the Q2 OKR rollup by Wednesday standup. ``` **Tips:** - Jobs run async — your agent doesn't block. For a 30-minute meeting, expect roughly 3 minutes of processing time. - Transcription is 2 credits per minute. A one-hour weekly standup all year is about 52 × $1.20 = $62.40. - The result expires 30 minutes after first delivery. If your agent needs to revisit the Markdown later, ask for a fresh transcription or save the result to your own workspace. **FAQ:** **Q: What audio formats work?** MP3, M4A, WAV, MP4, MOV, WEBM. If it plays, Frenchie can transcribe it. Max 2 GB per file. 
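The smart-wait pattern from the workflow above — hold the call open for up to 90 seconds, then return a `jobId` the agent polls with `get_job_result` — reduces to a small loop. A sketch, where `poll` stands in for whatever MCP client call retrieves job status (the polling interval and status names are assumptions):

```typescript
// Sketch of the smart-wait loop: poll until the job completes or the
// time budget runs out, then hand the jobId back to the session.
type JobStatus = { status: "pending" | "done"; result?: unknown };
type WaitOutcome = JobStatus | { status: "deferred"; jobId: string };

async function smartWait(
  poll: (jobId: string) => Promise<JobStatus>, // stand-in for get_job_result
  jobId: string,
  budgetMs = 90_000, // "holds the call open for the first 90 seconds"
  intervalMs = 5_000, // assumed polling interval
): Promise<WaitOutcome> {
  const deadline = Date.now() + budgetMs;
  while (Date.now() < deadline) {
    const job = await poll(jobId);
    if (job.status === "done") return job;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  // Budget exhausted: return control so the agent can keep working
  // and check back with get_job_result later.
  return { status: "deferred", jobId };
}
```

The `deferred` branch is what keeps your agent unblocked — it carries on in the same session and comes back for the transcript once the job completes.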
**Q: Does it handle speaker labels?** No — Frenchie returns a single Markdown transcript. Your agent can infer speakers from context, or you can ask it to guess from names mentioned. For enforced speaker diarization, use a dedicated transcription API. **Q: What about non-English meetings?** Thai, Japanese, Chinese, Arabic, French, Spanish, and 50+ other languages work out of the box. Mixed-language meetings work too — no flags needed. **Q: Can I transcribe Zoom / Google Meet recordings?** Yes. Download the recording, drop it in your agent. The audio format won't matter. ### Sales call transcription inside your CRM workflow Send a recording. Get notes your CRM can actually index. Frenchie transcribes sales calls into clean Markdown so your agent can extract next steps, objections, and commitments without relying on your memory or your rep's notes. **Problem:** A sales rep finishes a 40-minute discovery call. They have to summarize it, log next steps in the CRM, flag any risks for the deal desk, and move on. The recording sits in a folder no one reads. Tools like Gong are great but expensive, and they're designed for managers reviewing reps, not for the rep's own agent to read during prep for the next call. **Gap:** Most transcription APIs dump you into their UI. You want the transcript inside your agent workflow — attached to the opportunity record, processed by the same agent that drafts your follow-up email, compared against previous calls for the same account. **Workflow:** 1. Save the call recording from your meeting tool — Zoom's local-recording folder, Google Meet's auto-export, or your softphone's call-archive. Most platforms produce an MP4 or M4A; Frenchie handles all of those plus MP3, WAV, MOV, and WebM. 2. Drop the file into your agent with a prompt that names the artifacts you want — for example, `transcribe ./calls/acme-discovery.m4a, extract decision criteria, stakeholders, budget, and draft a follow-up email`. 
The skill pack maps that to a `transcribe_to_markdown` call. 3. Frenchie processes the audio asynchronously — chunked, transcribed in parallel, merged. While the job runs (typically a tenth of audio length, ~4 minutes for a 40-minute call), your agent stays unblocked and can prep the email scaffold, look up the account in your CRM, or pull last week's call notes. 4. Clean Markdown lands back in the agent's context. The agent extracts the fields you asked for (objections, next steps, quoted budget), uses your CRM's API to attach the transcript to the opportunity record, and drafts the follow-up email referencing specific quotes from the call — not generic templates. **Example prompt:** ``` /transcribe ./calls/2026-04-14-acme-discovery.m4a Pull out the decision criteria they mentioned, who the other stakeholders are, and the quoted budget range. Draft a follow-up email with a proposed next step. ``` **Example agent response:** ``` Call transcribed (38 min). Key points extracted: **Decision criteria**: SOC 2, single-sign-on, ability to deploy in their VPC. **Stakeholders mentioned**: Maya (VP Eng, our main contact), Ramesh (CTO — needs to approve infra), Priya (procurement). **Budget**: $40-60K annual, flexible if ROI is clear within 2 quarters. **Draft follow-up**: > Hi Maya — thanks for walking me through the infra setup today. To the three criteria you flagged (SOC 2, SSO, VPC deploy), I've attached our compliance pack and a diagram of how the VPC deployment works in practice. Happy to set up a 30-minute call with Ramesh next week to get ahead of his review. Thursday 2pm or Friday morning — does either work? ``` **Tips:** - Transcription runs around a tenth of the audio length. A 45-minute call is roughly 4-5 minutes of processing. - 2 credits per minute. A team running 20 calls a week year-round is roughly $1,000/year — compare to per-seat CRM intelligence tools that run 10-20x that. 
- Don't send PII you don't need to — brief customer names, deal IDs, and internal context are fine; Social Security numbers and payment details should stay out of any AI pipeline. **FAQ:** **Q: How is this different from using Gong or Chorus?** Gong and Chorus are full sales-intelligence platforms with recording, scoring, and management dashboards. Frenchie is just the transcription layer — you plug the Markdown into whatever your agent already does (prep, follow-up, CRM updates). Complementary for some teams, replacement for others. **Q: Can I pipe transcripts straight into Salesforce / HubSpot?** Yes — your agent does the piping. Frenchie returns Markdown, your agent uses your CRM's API to attach the transcript to the opportunity, extract fields into custom properties, whatever your workflow needs. **Q: Is the recording stored anywhere?** No. Frenchie processes the file and deletes it. The transcript result expires 30 minutes after first delivery. Save the Markdown where you need it long-term — we don't keep copies. **Q: Does it work for international calls?** Yes. 50+ languages, mixed-language calls work too. If your team sells globally, you don't need separate setups per region. ### Podcast transcription for show notes, SEO, and accessibility Ship show notes without the transcription slog. Drop an episode. Get Markdown your agent can turn into chapter markers, pull-quote cards, SEO-ready show notes, and full accessibility transcripts. **Problem:** You just finished recording a 72-minute episode. Your cohost already closed the laptop. You still need show notes, a title that doesn't feel generic, three pull-quote images for social, timestamped chapters for YouTube, and a plain-text transcript for accessibility. That's four hours of work on top of editing, and it's work you hate. **Gap:** Transcription services give you text. They don't give you the downstream artifacts — the pull quotes, the chapter breakdown, the show-note bullets tuned for your audience. 
Your agent can do all of that, if it has the transcript in a shape it can work with. **Workflow:** 1. Drop the exported episode audio into your agent — the master mixdown is fine, or a near-final cut if you want to draft show notes before the final master is locked. Frenchie handles MP3, M4A, WAV, MP4, and MOV up to 2 GB, so a 3-hour interview master fits comfortably. 2. Your agent calls `transcribe_to_markdown` via Frenchie. Long-form audio chunks at silence boundaries and transcribes in parallel — a 90-minute episode finishes in 8–10 minutes of wall time. The smart-wait loop holds the call open for the first 90 seconds, then returns a `jobId` your agent polls with `get_job_result`. 3. Once the Markdown lands in `.frenchie//result.md`, ask your agent for the downstream artifacts in one prompt — eight chapter markers with timestamps, three pull quotes, five candidate episode titles, a 3-sentence hook for the RSS description, and a plain-text accessibility transcript. Your agent works off the structured Markdown, not the raw audio. 4. Publish. The transcript stays in your project as an `.md` file — your next episode's agent can grep across previous episodes for running threads, recurring guests, or callbacks. Frenchie deletes the source audio and expires the result payload after 30 minutes; the local file is yours forever. **Example prompt:** ``` /transcribe ./episodes/ep-041-final-mix.mp3 Give me: (1) a 3-sentence episode hook, (2) 8 chapter markers with timestamps, (3) 3 pull quotes from the guest, (4) five candidate episode titles. ``` **Example agent response:** ``` Transcribed 72-minute episode. Outputs below. **Hook**: Why platform engineers should stop building internal dev portals from scratch and start treating their platform as a product with real users. Our guest has built platforms at three unicorns and has opinions. 
**Chapters**: - 0:00 — Cold open - 2:14 — How the guest got into platform work - 9:40 — The internal-dev-portal mistake - 18:22 — Treating platform as a product - 31:05 — Metrics that matter vs vanity metrics - 44:18 — Hiring for platform teams - 58:30 — Common failure modes - 1:06:12 — Book and tool recommendations **Pull quotes**: 1. "Nobody uses your platform because they love you. They use it because their deadline is tomorrow." 2. "A platform team without product management is just an infra team with more meetings." 3. "The best measure of a platform is how much of it your users forgot is even there." **Title candidates**: - Your Platform Is a Product (Whether You Like It or Not) - Stop Building Developer Portals From Scratch - The Invisible Platform - Platform Engineering, Minus the Myth - What Your Platform Team Isn't Telling You ``` **Tips:** - A 60-minute episode is 120 credits = $1.20. An episode a week is roughly $62 a year — less than a single month of most podcast-production SaaS. - Frenchie doesn't detect chapter breaks on its own. Your agent does that from the transcript — it works because agents are good at narrative boundaries once they can read the words. - For podcasts with heavy cross-talk, your agent's pull-quote quality improves if you prompt for quotes from a specific speaker by context clues ("the guest", "the second voice"). **FAQ:** **Q: Can I transcribe episodes with multiple hosts?** Yes. Frenchie returns a single transcript — your agent can usually guess speaker boundaries from content. For strict speaker labels, use a dedicated transcription API and feed its output into your agent instead. **Q: What about episodes with music and sound effects?** Transcribes the spoken content and ignores non-speech audio. If you have a stinger or sound effect mid-dialogue, it won't show up in the transcript, which is usually what you want. **Q: How long does a 90-minute episode take to transcribe?** Roughly 8-10 minutes of processing. 
The job runs async — your agent can keep generating show notes for earlier episodes while the new one processes. **Q: Can I automate this with a script?** Yes — either via the MCP tool (if your scheduler runs in an MCP-compatible agent) or via the HTTP endpoint at mcp.getfrenchie.dev. Cron the export, drop the file, your agent does the rest. ### Research paper parsing — text, tables, and figures Extract text and figures from papers your agent should actually read. Frenchie parses PDF research papers into clean Markdown, pulling figures out as separate PNG files so your agent can cite specific content instead of summarizing the whole thing. **Problem:** You point your agent at a 20-page paper. It reads the PDF via native attachments, burns 30K tokens on the raw bytes, and still misses half the figures because they come back as low-res image data. You wanted to ask a specific question about Table 3 and Figure 5. The agent gives you a vague paragraph instead. **Gap:** Native PDF reading in most agents handles text roughly and figures badly. Figures show up as inline image attachments, not standalone assets your agent can point at. Tables lose their structure. Equations often turn into garbled characters. You end up downloading the paper yourself and reading it the old way. **Workflow:** 1. Drop the paper into your agent — drag the PDF in, or run `/ocr ./papers/attention-is-all-you-need.pdf`. Frenchie accepts native PDFs from arXiv as easily as scans of conference proceedings; the same pipeline handles both. 2. Your agent calls `ocr_to_markdown` via Frenchie. The extraction pipeline runs text recognition, preserves Markdown table structure for results tables, transcribes equations as LaTeX where the source is typeset cleanly, and exports figures as separate PNG files (one per figure) so they're addressable later. 1 credit per page — a 15-page conference paper costs $0.15. 3. 
The Markdown lands at `.frenchie//result.md` with figure references like `![Figure 3](./fig-3.png)` pointing at sibling files in the same folder. Tables come through with full cell structure intact instead of being flattened to plain text. 4. Your agent can now answer specific questions — "compare the per-layer complexity in Table 1", "summarize the architecture in Figure 1", "list the citations referenced in Section 3". For figures, your agent can pass the PNG to a vision model for deeper analysis instead of guessing from a low-res inline thumbnail. **Example prompt:** ``` /ocr ./papers/attention-is-all-you-need.pdf What's the architecture described in Figure 1, and how does the per-layer complexity compare to recurrent networks according to Table 1? ``` **Example agent response:** ``` Paper parsed — 16 pages, 3 figures extracted as PNGs, Markdown saved to .frenchie/attention-is-all-you-need/result.md. **Figure 1**: The Transformer architecture — encoder on the left, decoder on the right. Each layer uses multi-head self-attention plus a position-wise feed-forward network. No recurrence, no convolution. Positional encodings are added to the input embeddings. **Per-layer complexity (Table 1)**: - Self-attention: O(n² · d) — quadratic in sequence length, linear in dimension. - Recurrent: O(n · d²) — linear in sequence length, quadratic in dimension. - For typical model dimensions (d > n), self-attention is actually cheaper per layer. The paper argues this flips when sequences get long enough, which is why later work explored sparse attention variants. ``` **Tips:** - Figures come back as PNG files, one per figure — not inline images in the Markdown. Your agent references them by filename. - Tables preserve cell structure as Markdown tables. Equations are preserved as LaTeX where detectable. - Cost is predictable: 1 credit per page. A typical 15-page conference paper runs $0.15. A 100-page thesis runs $1. **FAQ:** **Q: Does it work on scanned/older papers?** Yes. 
Scanned PDFs that native agent readers return empty on usually work well through Frenchie — the OCR pipeline handles scanned text, old layouts, and weird column structures. **Q: What about papers with equations?** Equations come through as LaTeX where the source was typeset cleanly. For scanned equations, expect Markdown-style approximations with some symbol loss — still readable, not publication-grade. **Q: Can I batch process a whole folder of papers?** Yes. Your agent scripts the loop — it calls ocr_to_markdown once per file, tracks job IDs, collects results. Frenchie handles parallelism on the server side. **Q: How big can a paper be?** 2 GB per file. For research papers, that's effectively unlimited — a 2 GB PDF is usually thousands of pages. ### Contract and legal document OCR for agent review Read contracts without the copy-paste tax. Frenchie parses legal PDFs — scanned or native — into clean Markdown so your agent can flag clauses, extract dates, and diff redlines without you scrubbing formatting by hand. **Problem:** You get a 38-page master services agreement as a scanned PDF. Your agent can read maybe half of it through native attachment. You need to flag unusual liability language, extract payment terms, and compare it against your standard template. Doing that on malformed PDF text means your agent misses clauses, confuses sections, and gives you back a summary you can't trust. **Gap:** Most contracts arrive as scanned PDFs, photocopies of signed originals, or exports from systems that flatten the structure. Native PDF reading returns empty text or garbled tables. You can't review what your agent can't read. **Workflow:** 1. Drop the contract PDF into your agent — scanned, photographed, or native PDF all work equally well. For redlines (two versions of the same contract), drop both files in the same prompt so your agent can diff them clause by clause. 2. Your agent calls `ocr_to_markdown` via Frenchie. 
The pipeline handles scanned documents at 200+ DPI, preserves section numbering (`1.2.3.a`, `Exhibit A`, `Schedule 2`) as Markdown heading hierarchy, and keeps table structure intact for fee schedules, payment grids, and warranty matrices in the exhibits. 3. Clean Markdown lands at `.frenchie//result.md` — every section navigable by `#` heading, every clause addressable by section number, every exhibit as a separate H2 block. Your agent can grep for `liability`, `indemnif`, `term`, or any specific defined term and get exact section references back. 4. Your agent reviews clauses against your standard template, extracts payment terms / renewal mechanics / cure periods into structured fields, and flags material deviations clause-by-clause. For sensitive matters, save the Markdown to your workspace before the result expires (30 minutes) — Frenchie deletes the source file and result payload automatically. **Example prompt:** ``` /ocr ./contracts/acme-msa-draft-v3.pdf Flag any liability caps, indemnification language, or IP assignment clauses that differ from our standard template. Extract the payment terms and renewal mechanics. ``` **Example agent response:** ``` Contract parsed — 38 pages, 4 exhibits, saved to .frenchie/acme-msa-draft-v3/result.md. **Flags vs template**: - **Liability cap (Section 9.2)**: Capped at 12 months of fees. Our standard is 24 months. Worth pushing back. - **IP assignment (Section 11.1)**: Assigns foreground IP to customer. Our standard reserves foreground IP for us and grants a license. Material difference. - **Indemnification (Section 10.4)**: Customer-favorable — we indemnify them for IP infringement and confidentiality breaches. Standard structure otherwise. **Payment terms**: Net 60, annual commit, auto-renewal with 90-day notice to cancel, price increase capped at CPI + 3% per year. **Risks**: 12-month liability cap is below standard. IP assignment in Section 11.1 is a real concern — flag to legal before counter-signing. 
``` **Tips:** - Scanned contracts often have lower OCR confidence on handwritten initials, signatures, and margin notes. Don't rely on those fields without human review. - For redlines (two versions of the same contract), parse both through Frenchie and let your agent diff them. Markdown diffs cleanly. - Privacy: files are processed and deleted — not stored. Results expire 30 minutes after first delivery. For highly sensitive contracts, save the Markdown to your workspace and let the result expire. **FAQ:** **Q: Does it preserve section numbering?** Yes. Section hierarchy (1.2.3, Exhibit A, Schedule 2) comes through as Markdown headings your agent can navigate by reference. **Q: What about handwritten signatures or initials?** Signature blocks are transcribed as best-effort. For signature validation, use a dedicated e-signature tool. Frenchie is for reading the text, not verifying execution. **Q: Can I compare two contract versions?** Yes. Parse both, have your agent do a clause-level diff. Markdown diffs cleanly and your agent can summarize material changes. **Q: Is this acceptable for privileged documents?** Files are processed and deleted — not stored, not used for training. Results expire 30 minutes after first delivery. Consult your firm's data handling policy for privileged documents; Frenchie's posture is no-retention, but only your counsel can confirm that's sufficient for your matter. ### Invoice OCR and receipt extraction for bookkeeping agents Pull line items out of invoices without building a data pipeline. Frenchie parses invoices and receipts into structured Markdown so your agent can extract line items, totals, tax, and dates straight into your accounting tool. **Problem:** A vendor sends you an invoice as a PDF scan. Your accounting agent has to extract the line items, the totals, the VAT, the due date, and post it to Xero. Generic OCR gives you a blob of text with misaligned columns. 
You spend more time cleaning up the extraction than the invoice saves you. **Gap:** Invoices come in every template under the sun. Most OCR tools give you text; few preserve the table structure that makes line items addressable. Tools that do structured extraction lock you into their schema — you can't point them at a slightly-weird invoice without retraining. **Workflow:** 1. Drop the invoice into your agent — PDF email attachment, scanned paper invoice, or a phone photo of a paper receipt. Frenchie accepts PDF, PNG, JPG, and WebP. Batch processing works too: pass a folder of weekly receipts and your agent can loop the OCR call. 2. Your agent calls `ocr_to_markdown` via Frenchie. Tables are preserved as Markdown tables — line items stay in their columns with currency symbols intact, dates in their original format. 1 credit per page, so a year of receipts for a small team runs under $20. 3. Your agent reads the Markdown, extracts the fields your accounting tool cares about (vendor, invoice number, date, due date, line items array, subtotal, tax breakdown, total), normalizes currencies if needed, and posts them via your accounting API — Xero, QuickBooks, FreshBooks, Wave all expose draft-bill endpoints. 4. If anything looks off — missing VAT number, line item with a different tax rate from siblings, subtotal+tax that doesn't equal the total, vendor name that doesn't match a known supplier — your agent flags it for review instead of silently posting wrong data. Frenchie returns text; the validation logic stays in your agent. **Example prompt:** ``` /ocr ./invoices/acme-services-2026-04-15.pdf Extract vendor, invoice number, date, line items, subtotal, VAT, total. Post it to Xero as a draft bill. ``` **Example agent response:** ``` Invoice parsed. Fields extracted: - **Vendor**: Acme Services Ltd - **Invoice number**: INV-2026-0417 - **Date**: 2026-04-15 - **Due date**: 2026-05-15 (Net 30) - **Line items**: 1. Platform fee — April 2026 — $2,000.00 2. 
Support retainer — 10 hours @ $150/hr — $1,500.00 3. One-time onboarding setup — $500.00 - **Subtotal**: $4,000.00 - **VAT (7%)**: $280.00 - **Total**: $4,280.00 Posted to Xero as draft bill #DRAFT-8821. VAT rate on line 3 differs from the other two lines in the source doc — flagged for your review before approving. ``` **Tips:** - A scanned receipt is usually 1-2 pages. At 1 credit per page, OCR on a year of receipts for a small team runs under $20. - The quality of extraction depends on your agent's prompt — ask for specific fields by name and it returns structured data you can pipe straight into your accounting tool. - For weird invoices (handwritten, photo-of-phone-screen), expect slightly lower confidence on totals. Always pass through a review step before auto-posting. **FAQ:** **Q: Does it work on photos of receipts?** Yes. JPG, PNG, WebP all work. Skewed, wrinkled, or low-light photos still transcribe in most cases — accuracy drops for extreme cases. **Q: Can I extract to a specific accounting schema?** Your agent does the schema mapping. Frenchie returns clean text; the agent pulls named fields via prompt. Works with any accounting tool that has a documented API — Xero, QuickBooks, FreshBooks, Wave. **Q: What about multi-currency invoices?** Currencies come through as they appear in the invoice. Your agent handles conversion if your accounting tool needs it. **Q: How does this compare to dedicated invoice-capture tools?** Dedicated tools like Dext or AutoEntry preprocess on their schema and are great if they support your vendor templates. Frenchie is more flexible — any invoice shape works, but your agent does the extraction logic. Use Frenchie if your vendors are weird or your schema changes often. ### Handwritten notes to searchable Markdown Digitize your notebook without typing it up. Frenchie reads handwritten pages — notebook scans, whiteboard photos, meeting jottings — and returns Markdown your agent can search, summarize, and reference. 
**Problem:** You take notes in a paper notebook. At the end of a week, you have 30 pages of handwritten jottings and no way to search them. Your agent can read the scan as an image but loses half the content. Typing them up is a tax you keep meaning to pay. **Gap:** Handwriting OCR has gotten much better, but most general-purpose OCR tools fall back to print recognition and mangle cursive. Note-taking apps with OCR lock you into their ecosystem. You wanted plain Markdown in your own workspace. **Workflow:** 1. Photograph or scan your notebook pages. Use your phone's built-in document scanner (iOS Notes' Scan Documents, Google Drive's scan tool) — they auto-correct perspective and contrast, which materially improves handwriting OCR accuracy. Plain phone photos work too if the lighting is even. 2. Drop the images into your agent. Batch a whole week of pages at once with a glob like `/ocr ./notebook-scans/week-of-2026-04-13/*.jpg` — your agent loops the OCR call once per page, tracks job IDs, and collects results as they arrive. 1 credit per page, so a 200-page notebook runs $2. 3. Frenchie returns Markdown for each page. Clean print transcribes at ~95% accuracy; cursive or abbreviated scrawl drops to 70–80%. Diagrams and sketches come back as PNG attachments your agent can describe via its vision capability or surface inline alongside the transcribed text. 4. Your agent indexes the notes into whatever search system you use — Obsidian, Notion, a local `ripgrep` workflow. Weekly review prompts ("what did I commit to this week?", "what's still unactioned?") work because the agent now has structured Markdown to reason over instead of opaque image data. **Example prompt:** ``` /ocr ./notebook-scans/week-of-2026-04-13/*.jpg Summarize what I committed to this week, and flag anything I wrote down but haven't acted on yet. ``` **Example agent response:** ``` 12 pages transcribed. Markdown saved per page under .frenchie/week-of-2026-04-13/. 
**Commitments captured**: - Reply to Maya's email about the Q2 roadmap (noted Monday, status: ✓ done Wednesday) - Draft the proposal for the infra refactor (noted Tuesday, status: not started) - Call Dad about the trip (noted Wednesday, status: ✓ done Thursday evening) - Follow up with the Antigravity team on the install flow (noted Friday, status: not started) **Unactioned items from your notes**: Infra refactor proposal and the Antigravity follow-up. Both flagged for Monday. ``` **Tips:** - Handwriting OCR quality tracks how legible your writing is. Clean block letters come through ~95% accurate; frantic cursive drops to 70-80%. - Shadow and glare are the biggest issues for phone scans. Use your phone's built-in document scanner (iOS Notes, Google Drive scan) — they auto-correct perspective and contrast. - Cost is 1 credit per page. A 200-page notebook runs $2. Cheaper than a replacement notebook. **FAQ:** **Q: How accurate is handwriting recognition?** For neat print, very accurate. For cursive or abbreviated scrawl, usable but not perfect. Treat the output as a first-pass digital copy, not a verbatim transcript. **Q: Can I OCR diagrams and sketches?** Frenchie extracts handwritten text. Sketches and diagrams come back as extracted figures (PNG files). Your agent can describe them via its vision capability or you can view them directly. **Q: Does it handle multiple languages?** Yes — Thai, Japanese, Chinese, Arabic, French, Spanish, and many others. Mixed-language notes work. **Q: Can I make my whole notebook searchable?** Yes — your agent takes the Markdown and drops it into whatever search system you use (Obsidian, Notion, a local ripgrep workflow). Frenchie is the digitization step; indexing is downstream. ### Product mockups for AI agents Let your agent generate product shots from a prompt. Describe a product image. Your agent generates it, saves it next to your work, and moves on. 
**Problem:** You're drafting a landing page, a launch deck, or a product brief, and you need a mockup. Opening a separate image-gen playground, logging in, generating, downloading, dropping into your folder — that's a 10-minute context switch for a 30-second image. **Gap:** Most image-generation platforms are web apps, not agent tools. Your AI agent can't call them without you gluing together SDKs, API keys, and download logic. By the time you've wired that up, you could have made the mockup yourself. **Workflow:** 1. Ask your agent to generate an image directly with a concrete prompt — `Generate a minimalist product mockup: matte black wireless earbuds on a concrete surface, soft morning light, shallow depth of field`. Specific subject + style + lighting + composition outperforms vague aesthetic adjectives. 2. Your agent calls `generate_image` with the prompt — no setup beyond the one-time Frenchie install. Optional parameters: `size` (1024×1024 default, 1536×1024 landscape, 1024×1536 portrait), `quality` (low / medium / high / auto), `format` (PNG / JPEG / WebP), `background` (transparent / opaque / auto). 3. Frenchie generates the image (typically 20–60 seconds, 20 credits = $0.20) and saves it at `.frenchie//generated.png` next to your work in stdio mode. HTTP mode returns a 30-minute presigned URL your agent can fetch. 4. Your agent references the image path directly — drop it into a Markdown doc, a deck slide, a design tool, or a CMS upload. For variations, call `generate_image` again with a refined prompt; each call is a fresh image (v1 generates exactly one per call). **Example prompt:** ``` Generate a minimalist product mockup: matte black wireless earbuds on a concrete surface, soft morning light, shallow depth of field. ``` **Example agent response:** ``` Generated image saved to .frenchie/earbuds-mockup/generated.png (1024×1024, 20 credits). Ready to drop into your landing-page hero. Want a variation, a different angle, or a darker background? 
``` **Tips:** - Image generation is 20 credits per image — $0.20. A launch page with 10 shots runs $2. - Prompts work like any other image-gen service: specific subject + style + lighting + composition. Be concrete. - Images auto-save to the agent workspace. No separate download step, no playground hunt. **FAQ:** **Q: What image sizes and formats do you support?** Standard 1024×1024 PNG by default. Other sizes and aspect ratios are supported — your agent passes them as parameters on the generate_image call. **Q: Can I iterate on a prompt?** Yes — your agent can call generate_image repeatedly with refined prompts. Each call is 20 credits. Frenchie doesn't store prompts or images beyond 30 minutes after delivery. **Q: Is it safe for client work?** Generated images are deleted from our storage 30 minutes after your agent retrieves them. We don't train on your prompts. Check your downstream licensing for the specific provider behind the generation. ### Blog covers and social images for AI agents Cover images for every post your agent writes. Your agent drafts a blog post, then generates a matching cover image in the same flow. No separate design pass. **Problem:** Every blog post needs a cover. Every social share needs an image. You're not a designer, and stock photos look like stock photos. Hand-prompting an image-gen playground for every post breaks the writing flow and costs more time than the writing itself. **Gap:** AI agents can draft the post, summarize the key idea, and describe a matching image — but they can't actually produce the image without an external tool you've glued in yourself. So the image step stays manual, and posts ship with no cover or with a hasty stock photo. **Workflow:** 1. Ask your agent to draft the post and generate a matching cover in one prompt — `Write a 600-word post on async transcription pricing for dev tools, then generate an abstract technical cover image, warm amber and charcoal palette`. 
The same agent owns both halves of the deliverable. 2. Your agent drafts the post, extracts a visual concept from the content ("abstract technical illustration capturing the idea of asynchronous handoff"), and calls `generate_image` with a prompt it derives from the post body — concrete subject, palette, mood, composition. 3. Frenchie returns the image at `.frenchie//cover.png` next to the post draft (20 credits = $0.20). For social variants — landscape OG, square Instagram, vertical story — your agent calls `generate_image` again with a different `size` parameter on the same prompt. 4. Your agent references the image in the post's frontmatter (`cover: ./cover.png`) or hands the path back to you for review. For weekly publishing cadence, the cost works out to about $10/year for 52 covers — less than a single month of stock-photo subscriptions. **Example prompt:** ``` Draft a 600-word post on pay-per-use vs subscription pricing for dev tools. Then generate a cover image — something abstract and technical, warm colors. ``` **Example agent response:** ``` Post drafted — .frenchie/pay-per-use-post/draft.md (612 words). Cover image generated — .frenchie/pay-per-use-post/cover.png (1024×1024, 20 credits). Prompt used: 'Abstract technical illustration, warm amber and charcoal palette, geometric tokens flowing across a grid, subtle halftone texture.' ``` **Tips:** - 20 credits per cover. A post per week for a year is 52 × $0.20 = $10.40. - Your agent can describe the image in words first — review the prompt before generating to avoid wasted credits. - Pair with the ocr_to_markdown or transcribe_to_markdown tools in the same session if the post references a document or recording. **FAQ:** **Q: Can I match my brand style?** Yes — include brand colors, typography cues, and composition in the prompt. Image generation responds to specific style direction better than vague aesthetic adjectives. 
**Q: Can my agent generate social variants (landscape, square, vertical)?** Yes — pass aspect ratio as a parameter on generate_image. Your agent can loop through a set of sizes for one post. **Q: Does this work for OpenGraph and blog platforms?** Yes — the image saves as a standard PNG file. Your agent can drop the path into your frontmatter, upload it to your CMS, or reference it inline in Markdown. --- ## Blog posts ### Read, listen, create: why your AI agent needs all three *Published 2026-04-23 · 5 min read · Category: Product* Your agent already reasons. Give it three hands: read files, listen to recordings, create images. How image generation completes the Frenchie toolkit. Your AI agent already has a brain. It can reason, plan, write code, argue with you about architecture decisions, and generate a plausible roadmap in under a minute. What it's missing is hands.
Hand-drawn line illustration: a friendly humanoid AI agent silhouette in the center extending three slender hands — one holding a folded document (read), one holding a sound waveform (listen), one holding a small framed picture with a sparkle (create)
A brain without hands can describe what a bookshelf should look like, but it can't build one. For an agent to actually *do work* — the real work, the kind that ends in a shippable artifact — it needs capabilities that extend past pure reasoning. It needs to be able to read files it didn't write, listen to recordings it wasn't in, and create assets it can use downstream. Those three verbs — read, listen, create — are what Frenchie gives your agent. ## The shape of agent work Watch an AI agent try to complete a real task end-to-end and you start to notice a pattern. The reasoning work — figuring out what to do, drafting text, writing code, proposing structure — is usually the easy part. Modern models are absurdly good at it. The hard part is everything that isn't reasoning. The part where the agent has to: - Read a scanned contract the user dropped into the chat. - Listen to a thirty-minute client call that was recorded yesterday. - Create a cover image for the blog post it just drafted. Each of those is a *capability gap*. The agent understands the task. It just can't perform the underlying action — reading an image, decoding an audio stream, producing pixel data — because those aren't things an LLM does natively. You can patch each gap individually. Wire in an OCR API. Host Whisper. Call out to an image-generation service. Handle the upload, the polling, the retries, the file management, the credit accounting. Each patch takes a few hours to build and a few more to maintain. Do that three times and you've built a tools layer. Do it right and you've built Frenchie. ## Read (OCR) Your agent can read any PDF or image. Scanned contracts, handwritten notes, invoices from 2003, research papers with tables and figures. One MCP call, clean Markdown back — with figures pulled out as separate PNG files so your agent can reference them directly instead of guessing at what they contain. This isn't new to Frenchie — OCR was the first capability we shipped. 
But it set the shape for everything after: one narrow tool, one clean output, zero infrastructure for you to run. ## Listen (transcription) Your agent can listen to audio and video. Meeting recordings, sales calls, podcast episodes, voicemails. Long files get chunked automatically and reassembled into one continuous transcript. Async by default — your agent kicks off the job, keeps working, and collects the Markdown when it's ready. No blocking, no five-minute pauses in the conversation while a model chews through a long recording. Most agents can't process audio at all without a tool like this. The model sees a file extension and politely declines. Frenchie turns that into a single `transcribe_to_markdown` call. ## Create (image generation) And now: your agent can create images. A product mockup to attach to a spec. A cover image for the blog post it just drafted. A concept sketch to include in a design review. Describe what you need in a prompt and your agent generates the image, saves it next to the work it's doing, and moves on. This is the piece that was missing. An agent that can read a scanned contract and transcribe the related client call but still has to *ask you* to generate the accompanying visual is an agent that's stuck halfway through the work. Image generation closes that loop. In Frenchie, it's the `generate_image` MCP tool. Same install flow, same flat pricing — 20 credits per image, $0.20. The generated file lands in your agent's workspace automatically. No new dashboard. No separate playground to manage. No new billing relationship. ## Why three capabilities, not ten A question we get a lot: why stop at three? Why not add structured data extraction, video generation, music generation, RAG, translation, summarization, the whole menu? Because an agent's LLM already does most of that. Summarization is a prompt. Translation is a prompt. 
Structured extraction is a prompt over clean text — and Frenchie's job is to get the clean text to the agent in the first place. The reasoning layer is already solved. What wasn't solved, before Frenchie, was the *inputs and outputs* that the reasoning layer needs. Reading arbitrary files. Listening to arbitrary recordings. Producing arbitrary images. Those are the three capability gaps that show up in almost every real agent workflow, and those are the three things Frenchie does. Three verbs. Three MCP tools. One flat pricing scheme. That's the whole product. ## What this looks like in your day The practical version: you're in Claude Code, Cursor, Codex — whatever your agent lives in — and you're building something real. A blog post. A technical spec. A weekly update. A pitch deck. You drop in a scanned PDF. Your agent reads it. You attach a recording from yesterday's customer call. Your agent transcribes it. You ask for a cover image to match the piece. Your agent generates it. None of those steps require you to switch tools, open a second dashboard, or paste a URL from a different product. Your agent already has the capability. It just picks the right tool and runs. That's the shape of agent work we've been trying to make feel normal. Read, listen, create — all through one MCP interface, all behind one API key, all priced in the same credit pool. One hundred free credits on signup, once per email, to try all three. --- Ready to see it? Install Frenchie in your agent with `npx @lab94/frenchie install --api-key fr_your_key`, and your agent gets OCR, transcription, and image generation as native MCP tools the moment you restart. [Read the docs](/docs) for the full capability reference, or [skip straight to the free tier](/register). ### Why we built Frenchie: the MCP tools gap *Published 2026-04-21 · 4 min read · Category: Origin* Agents write code and reason about systems. Drop a scanned PDF or a 30-minute recording in front of one and most hit a wall. 
That wall is why Frenchie exists. A modern coding agent can do impressive things. It can write a week's worth of code in an hour. It can reason about system design, refactor a gnarly function, and negotiate with a package manager. Hand it a business problem and it'll come back with a solution, a test suite, and usually an unsolicited architectural opinion. Drop a scanned contract in front of that same agent, and it falls apart.
Hand-drawn line illustration: a small AI agent silhouette facing a tall wall built of three stacked blocks (folded document, sound waveform, empty picture frame) — three shapes of work it can't finish on its own
We've watched it happen in every MCP client we use. Claude Code tries to read a PDF and either succeeds on clean digital text or returns a confused paragraph about how the document appears to be image-based and therefore unreadable. Cursor hands the file to the model as raw bytes, which is worse — 30,000 tokens later the agent has "read" the document but missed every table. Codex just declines the task politely. None of these tools are broken. They're just pointed at the wrong problem. They're reasoning engines. The file extraction problem is a different shape. ## The tools gap MCP — the Model Context Protocol — is the most interesting thing that's happened to agent tooling in the last two years. It finally gives agents a clean way to call external tools without every product reinventing the wiring. Install an MCP server, and any MCP client (Claude Code, Cursor, Codex, Antigravity, Windsurf, VS Code, Gemini CLI, Zed, Claude Desktop) can call its tools natively. No custom adapter per client. No glue code per workflow. The ecosystem has filled in fast. There are MCP servers for databases, for APIs, for file systems, for project management, for browser automation, for everything you'd expect. But when we looked for OCR, transcription, and image generation — the three most common capability gaps in agent workflows — the answer was the same everywhere: build it yourself, or stitch together a commercial API with your own chunking, retry, and storage logic. That's a gap worth filling. ## What Frenchie is Frenchie is a narrow, purpose-built MCP server that gives your AI agent three abilities: 1. **Read** (OCR) — PDFs and images in, clean Markdown out. Scanned documents, handwritten notes, research papers with tables and figures, invoices, contracts. Whatever shape the file arrives in, your agent gets back prose it can actually read. 2. **Listen** (transcription) — audio and video in, clean Markdown out. Meeting recordings, sales calls, podcast episodes, voicemails. 
Async by default — your agent submits the job and keeps working instead of blocking on long files. 3. **Create** (image generation) — text prompt in, image file out. Product mockups, blog covers, concept art. Auto-saved next to your agent's work, no separate playground to manage. That's the whole product. No dashboards. No workflow builder. No platform. A tool you install once, use through your agent's native tool call syntax, and mostly forget is there. ## Why narrow is a feature We had the conversation early about expanding scope. We could add structured extraction. We could add summarization. We could add a semantic chunker. Every one of those features sounds reasonable in isolation. Together they turn a narrow tool into a platform, and platforms are harder to use, harder to trust, and harder to replace if we get any decision wrong. Narrow means we can ship a change without triggering a cross-product coordination meeting. Narrow means pricing stays predictable — $1 for 100 credits, one credit per OCR page, two credits per transcription minute, twenty credits per generated image, full stop. Narrow means when someone asks "what does Frenchie do?" there's a one-sentence answer. Your agent already handles summarization. Your agent already does structured extraction. Your agent has an LLM bolted onto it — let that LLM do the reasoning work, and let Frenchie do the boring file-reading work it was never going to be good at. ## What we picked We made three choices that we think will still be the right ones a year from now: - **MCP-first, not REST-first.** The primary integration path is stdio MCP. Install Frenchie with one command (`npx @lab94/frenchie install --api-key fr_…`), restart your agent, and the OCR, transcription, and image generation tools show up as native function calls. HTTP at `mcp.getfrenchie.dev` exists as a fallback for hosted agents that can't spawn local binaries, but it's deliberately not the main path. 
- **Pay-as-you-go, not subscription.** Subscriptions make small use cases uneconomic for the people we most want to help — indie devs, small teams, anyone whose use case doesn't fit a monthly quota. A year of casual use shouldn't cost more than a lunch. - **Files processed and deleted.** We don't store your files. We don't train on your files. Results expire 30 minutes after delivery. This is an engineering choice, not a marketing promise — we don't have infrastructure for retention even if we wanted it. ## What we're not We're not a transcription platform trying to be everything. We're not a sales-intelligence product. We're not competing with Whisper on batch throughput or with AssemblyAI on audio intelligence. If your workflow lives in those shapes, use those tools — we have [honest comparisons](/compare) for every major alternative. We're the narrow thing that plugs the gap in the MCP ecosystem. That's it. ## What's next We ship incrementally. The [changelog](/changelog) is the honest record of what's landed. If you want to know what we're thinking about next, watch the changelog — we don't pre-announce features we haven't committed to building. If Frenchie sounds like it fits somewhere in your agent workflow, [sign up for 100 free credits](/register) and try it against a real file from your own work. Worst case you spent nothing and learned something. ### How Frenchie handles 30-minute audio without blocking your agent *Published 2026-04-20 · 5 min read · Category: How it works* Sync transcription freezes your agent for five minutes. Async job handling is the detail that makes transcription usable inside a live agent workflow. Sync APIs feel right at first. You call a function, you wait, you get a result. Simple to code, simple to reason about. Then you try to transcribe a 30-minute meeting recording inside an agent conversation. The model submits the job. The HTTP connection opens. Five minutes pass while the audio gets chunked, transcribed, and merged. 
The agent sits frozen, the user sits frozen, the terminal sits frozen. At minute six someone hits Ctrl-C out of frustration. You just burned the agent's context on a task that was never going to work synchronously. This is why Frenchie's transcription pipeline is async by default. ## What "async" actually means in practice
Hand-drawn sequence diagram: two parallel vertical lifelines with an AI agent silhouette on the left and a small server box on the right, connected by horizontal arrows alternating direction to show async request and response over time
When your agent calls `transcribe_to_markdown` with a file path, three things happen in sequence, fast: 1. Frenchie accepts the file (or reads it from disk), validates it, and estimates credits. 2. Frenchie creates a job, queues it, and returns a job ID to your agent in under a second. 3. Your agent gets control back. The conversation continues. The actual transcription runs in the background. Long audio gets chunked automatically — a 30-minute file splits into segments that process in parallel, then merge back into a single Markdown transcript. Your agent can poll for the result when it's ready, or the stdio transport can deliver the result via a second tool call (`get_job_result`) once the job completes. For a 30-minute recording, the total wall-clock time is usually 2-4 minutes. But your agent doesn't sit blocked for 2-4 minutes. It does other work — drafts the meeting summary prompt, looks up attendees in your CRM, pulls last week's decisions from the transcript archive — and collects the Markdown when it lands. ## The concrete flow Here's what a realistic invocation looks like from the agent's side: ``` > /transcribe ./meetings/2026-04-16-standup.mp3 Tool call: transcribe_to_markdown(file: "./meetings/2026-04-16-standup.mp3") Response: { jobId: "fr_job_8821", status: "processing", etaSeconds: 180 } > [agent continues with other work — drafts email, checks calendar] > [2 minutes later, agent checks job status] Tool call: get_job_result(jobId: "fr_job_8821") Response: { status: "completed", jobId: "fr_job_8821", creditsUsed: 60, result: { kind: "markdown", savedTo: ".frenchie/2026-04-16-standup/result.md", wordCount: 4821 } } > [agent reads the Markdown file and proceeds] ``` The agent isn't waiting on transcription. It's doing its regular job while a different piece of infrastructure does the audio work. The experience, from the user's perspective, is "I dropped in a recording and my agent kept moving." ## Why we didn't make it sync We did experiment with sync early on.
The simpler code path is attractive — no job IDs, no polling, no second tool call. For files under about five minutes, sync is fine. Everything else falls over. The failure modes are ugly: - **Request timeouts.** Most HTTP clients default to 30- or 60-second timeouts. Long sync jobs die partway through. The agent gets nothing and has to retry the whole thing. - **Context bloat.** If a sync job succeeds, the full Markdown payload comes back as the tool response. A 30-minute transcription is easily 4,000 words. That's 5,000+ tokens dumped into the agent's context on a single tool call. - **Retries are destructive.** If the sync call fails, retrying starts the whole transcription over. With async, a failed polling request is cheap — you just poll again. - **Parallelism is impossible.** If sync blocks the whole tool call, your agent can't do anything else. You've effectively single-threaded the entire conversation on a background job. Async fixes every one of those. The cost is that tool authors (us) have to build a job queue, retry logic, and a second tool call for result retrieval. One-time cost on our side, recurring benefit on every agent that uses Frenchie. ## What this means for you If you're building an agent workflow that touches transcription, design around async from the start. The right shape is "submit, keep working, pick up the result." Not "wait five minutes staring at a blinking cursor." Frenchie's stdio MCP contract explicitly returns metadata (`creditsUsed` at the top level, plus `savedTo`, `wordCount`, and `imageCount` on the `result` envelope) rather than the full transcript in the tool response. Your agent reads the Markdown file if and when it needs the content. That keeps the tool-call response small and the context clean. This is also why our max file size is 2 GB, not 200 MB. At 2 GB you can hand us an entire conference day's recording. You'd hate the wall-clock time on a sync API — you don't even notice it on an async one.
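The submit-then-poll shape is small enough to sketch. Here is a hypothetical TypeScript client: only the envelope fields (`status`, `jobId`, `creditsUsed`, `result`) follow the documented contract, while the in-memory fake server, function names, and poll cadence are illustrative assumptions, not the real Frenchie SDK.

```typescript
// Hypothetical sketch of submit / keep-working / retrieve.
// The fake server stands in for Frenchie's job queue; only the
// envelope shape ({ status, jobId, creditsUsed, result }) is real.

type Envelope = {
  status: "processing" | "completed" | "failed";
  jobId: string;
  creditsUsed?: number;
  result?: { kind: "markdown"; savedTo: string; wordCount: number };
};

// Fake backend: the job "completes" on the second poll.
function fakeServer() {
  let polls = 0;
  return {
    submit(_file: string): Envelope {
      // Real submission returns a job ID in under a second.
      return { status: "processing", jobId: "fr_job_demo" };
    },
    poll(jobId: string): Envelope {
      polls += 1;
      if (polls < 2) return { status: "processing", jobId };
      return {
        status: "completed",
        jobId,
        creditsUsed: 60, // 30 min x 2 credits/min
        result: {
          kind: "markdown",
          savedTo: ".frenchie/demo/result.md",
          wordCount: 4821,
        },
      };
    },
  };
}

// Submit, do other work between polls, collect the result.
function transcribeWithoutBlocking(
  file: string,
  doOtherWork: () => void
): Envelope {
  const server = fakeServer();
  const job = server.submit(file); // agent gets control back here
  let res = server.poll(job.jobId);
  while (res.status === "processing") {
    doOtherWork(); // draft the summary, check the calendar, etc.
    res = server.poll(job.jobId);
  }
  return res;
}
```

Note what the loop never does: hold an open connection. A failed poll is retried for free, and the tool-call response stays small because the transcript itself lives at `result.savedTo`, not in the response body.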
## The small-engineering bet Async job handling is the kind of engineering detail that doesn't show up in a demo. The demo looks identical — user drops a file, Markdown comes back. But the system behavior under real use is completely different. Real use means long files, slow networks, agent conversations that can't afford to block, and workflows where throughput matters more than latency. We bet that getting this one engineering detail right would make Frenchie feel qualitatively better than a simpler tool. Six months in, we think that bet was correct. If you've been around other transcription APIs that lock up your workflow on long files, give Frenchie a try with 100 free credits — the difference is noticeable on the second or third long file, not the first. ## The same shape for OCR and image generation Transcription is the most obvious case for async — a 30-minute recording forces the issue. But the same queue-and-retrieve pattern shows up wherever your agent does non-trivial work: - **OCR** uses the same async pipeline for large PDFs. A 500-page scan chunks out to `ocr_batch` jobs, merges through the same `MERGE` queue, and hands your agent a single Markdown result when it's ready. - **Image generation** submits each `generate_image` call through the same BullMQ queue with the same job-id retrieval path — so your agent fires a prompt, keeps drafting, and collects the image file when it lands. Three capabilities, one async contract. If you understood `transcribe_to_markdown` above, you already understand how `ocr_to_markdown` and `generate_image` behave on long jobs — only the inputs change. ### Inside Frenchie: from scanned PDF to clean Markdown in 3 seconds *Published 2026-04-19 · 5 min read · Category: How it works* What happens between dropping a scanned contract into your agent and getting back searchable Markdown with tables and figures intact. The OCR pipeline. Drop a scanned PDF into your agent. 
Three seconds later you get back clean Markdown with table structure preserved and figures extracted as separate files. It feels a little like magic the first time. It's not magic. It's five stages, each one solving a specific problem that the previous one leaves behind. Here's what actually happens between the file landing and the Markdown coming back.
Hand-drawn 5-stage pipeline diagram: five rounded boxes connected left to right, each containing a small pictogram (paper sheet, magnifying glass, paper broken into pieces, eye reading text, pieces joining together)
## Stage 1 — Intake and validation The first thing Frenchie does when your agent calls `ocr_to_markdown` is check the file. That sounds boring, but validation is where most OCR pipelines quietly fail. A "PDF" might actually be a password-protected archive, a corrupted file someone exported from a broken system in 2011, a PDF/A-3 with embedded binary attachments, or just a JPEG someone renamed with `.pdf`. Every one of these breaks the stages that come later if you don't catch them up front. Frenchie inspects the actual file bytes, not the extension. It checks MIME type, validates the PDF header if it claims to be PDF, rejects files over 2 GB, and estimates credit cost before any processing starts. If the file is clearly broken, the job fails fast with a useful error — not a vague "extraction failed" three minutes later. This stage takes milliseconds. Most files pass. The ones that don't save you from a failed job downstream. ## Stage 2 — Page separation and routing PDFs are weird. A single PDF can contain pages that are native text (crisp, selectable, already Unicode-encoded) and pages that are scanned images (pixel data that needs OCR). You can't treat them identically — running OCR on native text pages throws away perfect data, and trying to extract text from image-only pages returns empty strings. Frenchie separates pages by type. Native text pages go through a fast path: extract the text, preserve the layout hints PDF already provides, ship the result. Image-based pages go through the heavier OCR path in the next stage. For a typical 30-page legal contract — scanned at some point, re-saved a dozen times — every page ends up on the OCR path. For a conference paper exported clean from LaTeX, every page ends up on the text path and the whole job finishes in under a second. Most real documents are mixed, and the router handles that transparently. ## Stage 3 — Optical character recognition The pages that need OCR go through the actual recognition step. 
This is where the pixel data becomes characters. Frenchie handles the stuff that makes OCR hard: - **Multiple languages on the same page.** Thai in a header, English in the body, a code block in monospace. No language flag, no manual mode selection — the pipeline detects script boundaries and handles each region with the right recognizer. - **Mixed orientation.** Pages scanned sideways, upside down, or at a slight angle. Rotation and skew correction happen before recognition. - **Structured layouts.** Two-column papers, tabular invoices, forms with labeled fields. The recognizer keeps track of where text belongs spatially, not just what the characters are. - **Handwriting.** Not perfect on cursive, but usable on neat block letters. Anything above about 75% confidence comes through; anything lower gets flagged rather than silently mistranscribed. The output of this stage is text plus positional metadata — which characters came from which region of which page. That metadata matters for the next stage. ## Stage 4 — Structure reconstruction OCR gives you text, but it doesn't give you a document. A table full of numbers that came through as an OCR blob is less useful than a Markdown table. A figure caption that got stitched into the paragraph above it is misleading. This is where Frenchie turns recognized text into a document structure. Three things happen here: 1. **Tables.** Cells get matched to rows and columns using the positional metadata from OCR. Column alignment, cell content, merged cells — all preserved as Markdown tables your agent can read row-by-row. 2. **Figures.** Images embedded in the document (charts, diagrams, photos) get extracted as separate PNG files and referenced inline in the Markdown: `![Figure 3](./figures/fig-3.png)`. Your agent gets text it can read and figures it can cite by filename. 3. **Hierarchy.** Headings, section numbering, paragraph breaks, footnotes — all reconstructed from font-size hints and spatial layout. 
The result reads like a document, not a wall of plain text. The Markdown that comes out of this stage is what your agent actually sees. Everything before was preparation. ## Stage 5 — Delivery The final stage is how the output reaches your agent. Over stdio MCP, the tool response is metadata only — the file path where the Markdown was saved, word count, image count, credits used. Your agent reads the file when it needs the content. Over HTTP MCP, the full Markdown can be inlined in the response, which works for smaller documents but we recommend the file-reference pattern for anything substantial. Results expire 30 minutes after first delivery. If your agent needs the Markdown permanently, it saves it to your workspace before the TTL hits. We don't archive your files, we don't keep your results. ## Why three seconds The "3 seconds" number isn't universal — a 400-page scanned thesis takes longer. But for the document shape that matters most (10-30 pages, mixed scan and native text), the pipeline targets sub-5-second end-to-end. That's the latency budget that keeps the tool feeling live inside an agent conversation. Most of the speed comes from three places: parallelizing page-level work across the whole document, skipping OCR entirely on native-text pages, and returning metadata instead of dumping full Markdown into the tool-call response. None of them are exotic. They're the obvious engineering choices once you decide latency matters. ## What this looks like from your side None of the five stages are visible when you're using Frenchie. Your agent calls one tool (`ocr_to_markdown`), waits a few seconds, and picks up the result. The pipeline details only matter when something goes wrong — which is why the validation stage exists, and why our [troubleshooting page](/docs/troubleshooting) leads with symptom-first fixes. If you want to see the whole pipeline in action on a real document of yours, [sign up for 100 free credits](/register) and throw a scanned PDF at it. 
The first one is on us. ### MCP for beginners: what it is and why Frenchie picks it *Published 2026-04-18 · 6 min read · Category: Primer* Model Context Protocol is the quiet standard reshaping how AI agents get tools. What it actually is, why it matters, why Frenchie bet the whole product on it. Ask an engineer what MCP is, and you'll get a confident three-sentence answer that happens to be mostly wrong. Ask a product person, and you'll get a vague "it's how agents talk to tools." Ask an AI researcher, and you'll get a nuanced paragraph that correctly answers the question nobody asked. The Model Context Protocol is actually simple. But it's one of those simple things that matters more than it looks like it should. This is a primer for the person who's heard the acronym in a few too many meetings and would like a clean mental model. ## What MCP is MCP is a protocol for AI agents to call external tools.
Hand-drawn protocol diagram: a small AI agent silhouette on the left, three tool icons stacked vertically on the right (document, sound waveform, framed sparkle), connected by a horizontal channel with bidirectional arrows showing request and response
That's it. No magic, no orchestration layer, no hidden reasoning component. It's the set of conventions a tool provider follows so that any MCP-compatible agent can call that tool without custom integration code. The protocol defines: - **Tool discovery.** How an agent asks "what tools are available?" and gets back a machine-readable list. - **Tool invocation.** How an agent says "run tool X with arguments Y" and gets back results. - **Tool schemas.** How a tool describes its inputs, outputs, and side effects. - **Transport.** How the messages get from agent to tool — stdio for local tools, HTTP for hosted tools. If you've seen OpenAI's function calling, or Anthropic's tool use, MCP is the open-standard version of the same idea. It's what function calling looks like when you peel it off any one LLM provider's API and make it portable. ## Why MCP matters Before MCP, every AI-powered product had to reinvent the tool-calling layer. Claude Code had its own tool system. Cursor had a different one. Codex had a third. If you wanted your tool to work in all of them, you wrote three adapters. That sucked for tool builders. It sucked more for users, because it meant the tool ecosystem was fragmented along client lines — tools that worked in Claude Code didn't work in Cursor, and vice versa. You had to pick your client before you picked your tools. MCP undoes that fragmentation. A tool built as an MCP server works in every MCP client. Today that's Claude Code, Cursor, Codex, Antigravity, Windsurf, VS Code (GitHub Copilot), Gemini CLI, Zed, Claude Desktop, and an expanding list of others. One protocol, one implementation per tool, every client. That's a compounding advantage. Every new MCP client that ships adopts the whole existing tool ecosystem by default. Every new MCP tool that ships works in every client from day one. The ecosystem grows faster because the integration tax is zero. 
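Concretely, the discovery and invocation conventions above are JSON-RPC 2.0 messages. A minimal sketch in TypeScript follows — the message framing matches the MCP spec, but the Frenchie tool name and arguments are illustrative, not a wire capture:

```typescript
// MCP requests are JSON-RPC 2.0. Discovery asks for the tool list;
// invocation names a tool and passes schema-validated arguments.
// (The Frenchie-specific fields here are illustrative.)

const discover = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/list", // "what tools are available?"
};

const invoke = {
  jsonrpc: "2.0",
  id: 2,
  method: "tools/call", // "run tool X with arguments Y"
  params: {
    name: "ocr_to_markdown",
    arguments: { file: "./contracts/scanned-msa.pdf" },
  },
};

// Over stdio, these objects are written to the server process's
// stdin as newline-delimited JSON; over HTTP, they are POSTed to
// the server's endpoint. The agent experience is identical.
console.log(JSON.stringify(invoke));
```

Everything client-specific (which model decides to call the tool, how results render in the UI) sits above this layer; the protocol itself is just these envelopes going back and forth.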
## The two transports

MCP has two ways for an agent to talk to a tool:

- **stdio** — the tool runs as a local process; the agent spawns it and communicates over standard input/output. Fast, no network, works offline. The default for most developer tools.
- **HTTP** — the tool runs as a remote service; the agent calls it over HTTP. Slower, but it works from anywhere and doesn't require installing anything locally. Good for hosted agents or SaaS integrations.

You'd think this is an implementation detail, and mostly it is. But it has one user-visible consequence: stdio tools are installed locally (usually via `npx`), while HTTP tools are configured with a URL. The install steps look different, but the resulting agent experience is identical — a tool call is a tool call.

Frenchie ships both. The primary path is stdio (`npx @lab94/frenchie install --api-key fr_…`) because that's what works in every local coding agent. The HTTP fallback at `mcp.getfrenchie.dev` exists for hosted and web agents (Lovable, Manus, Claude.ai, ChatGPT.com, Le Chat) that can't spawn local binaries.

## What MCP isn't

A lot of the confusion around MCP comes from things it's not. For the record:

- **MCP is not an agent.** It's the wire between an agent and a tool. The agent itself — the LLM doing the reasoning, the conversation handler, the UI — is a separate thing.
- **MCP is not an orchestrator.** It doesn't decide which tool to call when. That's the agent's job. MCP just delivers the call.
- **MCP is not a runtime.** It doesn't host your code. Your tool runs wherever you run it (locally via stdio, remotely via HTTP); MCP just standardizes how the agent talks to it.
- **MCP is not an alternative to RAG or vector databases.** Those are different layers. An MCP tool can query a vector database if that's what it needs to do, but MCP itself is just the call interface.
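The "installed locally vs configured with a URL" difference shows up in the client config. Below is a hypothetical pair of entries in the common `mcpServers` shape used by Claude Desktop and Cursor — the exact keys your client expects, and the exact args Frenchie's server takes, are assumptions here; the Tier-A installer writes the real entry for you.

```typescript
// stdio: the agent spawns a local process and talks over stdin/stdout.
const stdioEntry = {
  mcpServers: {
    frenchie: {
      command: "npx",             // launcher for the local binary
      args: ["@lab94/frenchie"],  // illustrative; installer sets real args
    },
  },
};

// HTTP: the agent calls a remote service; nothing to install locally.
const httpEntry = {
  mcpServers: {
    frenchie: {
      url: "https://mcp.getfrenchie.dev",
    },
  },
};
```

Either way, once the client loads the entry, tool calls look identical from the agent's side.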
If you find yourself confused about whether something is "MCP" or not, the test is: does it describe how an agent invokes external functions, or does it describe what those functions do? MCP is the first; everything else is the second.

## Why we bet on MCP for Frenchie

When we started building Frenchie, we had two choices for distribution: build a REST API that every agent has to integrate separately, or build an MCP server and get distribution across every MCP client for free.

REST would have been slightly faster to prototype, because we wouldn't have had to learn the protocol. Every other trade-off pointed the other way:

- **Distribution.** MCP gets us into nine local coding agents and a handful of hosted ones on day one. REST would have required a per-client adapter or a wrapper library per ecosystem.
- **User experience.** An MCP tool shows up as a native function call in the agent's interface. A REST API shows up as "paste this curl command into a helper script." Guess which one users actually use.
- **Durability.** MCP is an open standard maintained by a large and growing community. It'll outlive any one vendor's tool-calling format. Betting on MCP means not having to rewrite the integration layer every time OpenAI or Anthropic shifts their tool-use API.
- **Narrow scope.** MCP is designed for exactly the kind of narrow, focused tool Frenchie is — three capabilities your agent calls directly (OCR for PDFs and images, transcription for audio and video, image generation from text prompts), delivered as three MCP tools through one server. One protocol, one interface, many clients. A good fit for a product that doesn't want to be a platform.

We kept the HTTP endpoint as a secondary option because some users genuinely can't use stdio (hosted agents, web-based chat interfaces). But everything about the product is designed around the stdio MCP contract being the primary path.
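Part of that contract is the shared result envelope (spec section 6.1): every capability wraps its typed payload in `result`, and agents branch on `result.kind` rather than on which tool was called. A minimal sketch — the optional payload fields and the helper name are illustrative, not the server's exact types:

```typescript
// Shared envelope: status, jobId, creditsUsed, resultExpiresAt, result.
// Payload fields live inside `result`, never at the top level.
type FrenchieResponse = {
  status: string;
  jobId: string;
  creditsUsed: number;
  resultExpiresAt: string;
  result:
    | { kind: "markdown"; markdown?: string; savedTo?: string }
    | { kind: "image"; imageUrl?: string; savedTo?: string };
};

// Hypothetical helper: resolve where the output ended up.
function outputLocation(res: FrenchieResponse): string {
  switch (res.result.kind) {
    case "markdown": // OCR and transcription results
      return res.result.savedTo ?? res.result.markdown ?? "";
    case "image":    // generated images
      return res.result.savedTo ?? res.result.imageUrl ?? "";
  }
}
```

In stdio mode `savedTo` points at the auto-saved file; in HTTP mode markdown comes back inline and images come back as a presigned URL.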
## How to try it

If MCP is new to you and you want to feel what an MCP tool is like, Frenchie is a pretty gentle way in:

1. Open a local coding agent — Claude Code, Cursor, Codex, or any other MCP client.
2. Run `npx @lab94/frenchie install --api-key fr_your_key_here` (you can sign up for 100 free credits first if you want a key).
3. Restart your agent.
4. Drop any PDF into your agent and ask it to OCR it, any audio or video file and ask it to transcribe, or any prompt and ask it to generate an image.

You'll see the tool call appear as a native function invocation in the agent's transcript — `ocr_to_markdown(file: "…")`, `transcribe_to_markdown(file: "…")`, or `generate_image(prompt: "…")` — with the result coming back inline or saved next to your work. That's MCP doing its job, quietly, the way it's supposed to work. No dashboards, no adapters, no integration code. One protocol, one install, every agent.

## Further reading

If you want to go deeper on MCP itself, the canonical source is [modelcontextprotocol.io](https://modelcontextprotocol.io). The spec is surprisingly readable for a protocol spec.

If you want to see how MCP fits into a specific agent's workflow, our [per-agent install guides](/docs/tools) cover the Tier-A clients we support — each one has its own quirks (Cursor toggles, Codex TOML, Claude Desktop's config location) that are worth knowing going in.

If you want to see how we think about MCP-vs-alternatives in practice, our [comparison pages](/compare) lay out when Frenchie's MCP-first approach is the right call and when a different tool fits better. We mean it when we say "honest" — there are workflows where Marker, LlamaParse, Whisper, AssemblyAI, or Deepgram will serve you better. The goal of this blog isn't conversion, it's clarity.

### Pay-per-use vs subscription: why Frenchie ships flat $1 = 100 credits

*Published 2026-04-17 · 6 min read · Category: Pricing*

Every SaaS eventually debates usage-based vs subscription pricing.
For a narrow tool like Frenchie, the answer was clear before the first line of billing code. Your AI agent's work doesn't fit a subscription shape. Some days it parses three invoices, some weeks it transcribes ten hours of calls, some months it generates a hundred blog covers. A monthly fee either overcharges the quiet days or underprices the busy ones. Pay-per-use is just honest about what your agent actually does.

Pricing is one of those choices where the decision shape matters more than the specific number you land on. A $9/month plan and a $19/month plan are different numbers on the same structure. A $9/month plan and a pay-as-you-go model are different structures entirely.

We picked pay-as-you-go for Frenchie early, and we think the reasoning is worth writing down. Not because $1 for 100 credits is the universally correct answer — it isn't — but because the logic that gets you there is portable to other narrow tools facing the same decision.

## The shapes of pricing
*Hand-drawn cost-over-time chart: three distinct line shapes — a flat horizontal line (flat subscription), a stepped staircase (tiered), and a jagged step function with peaks and valleys (pay-per-use).*
There are really only three shapes available to a software product in 2026:

- **Flat subscription.** $29/month gets you the product. Predictable revenue for you, predictable cost for the customer. The default in SaaS.
- **Tiered subscription.** $29/month gets you X, $99/month gets you 3X, $299/month gets you enterprise features. Lets you capture more value from high-usage customers.
- **Pay-as-you-go (usage-based).** You pay for what you consume, period. Predictable cost per unit, unpredictable total spend.

There are hybrids — subscription with usage caps, usage with volume commits — but they're all variants of these three. Most SaaS companies default to tiered subscription because it's what the previous SaaS company did. That's not a reason, it's an echo.

## What makes a product a good fit for pay-as-you-go

Usage-based pricing works best when three conditions hold:

1. **Usage varies wildly across customers.** Some people use it once a month, others use it hundreds of times a day. A monthly fee either overcharges the low users (who leave) or undercharges the high users (who cost you money).
2. **The value scales with usage.** Each unit consumed produces roughly the same amount of value for the customer. Not always true — some products have diminishing returns on the 100th use — but for most file-processing workflows, unit value is pretty flat.
3. **Unit cost is low and predictable.** The customer can reason about "how much will 50 OCR pages cost me?" without a calculator, a sales call, or a consultation.

Frenchie hits all three. Some customers OCR five invoices a week. Others transcribe 40 hours of calls a month. Others have their agent generate 200 product mockups before a launch. The value per page, per minute, or per image is basically constant. And $0.01 per page / $0.02 per minute / $0.20 per image is a unit cost anyone can estimate in their head.
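That head math is short enough to write down in full. A minimal sketch using the published rates ($1 = 100 credits; 1 credit per OCR page, 2 per transcription minute, 20 per image) — the helper name is illustrative, not part of any Frenchie API:

```typescript
// Published unit costs, expressed in credits.
const CREDITS_PER_DOLLAR = 100;
const RATES = { ocrPage: 1, transcriptionMinute: 2, image: 20 } as const;

// Hypothetical estimator: dollars for N units of a given kind.
function costInDollars(units: number, kind: keyof typeof RATES): number {
  return (units * RATES[kind]) / CREDITS_PER_DOLLAR;
}

costInDollars(400, "ocrPage");            // scanned book: $4.00
costInDollars(60, "transcriptionMinute"); // podcast episode: $1.20
costInDollars(5, "image");                // five blog covers: $1.00
```

The three example calls reproduce the usage examples on the pricing page, which is the point: the whole pricing model fits in a ten-line function.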
Run the same checklist on a collaboration tool like Slack, and usage-based pricing makes less sense — all three conditions break. Subscription is the right shape there.

## What breaks with subscription for a product like Frenchie

We ran the numbers on subscription pricing early. The shape kept breaking in one of two ways:

- **Too low a floor.** A $10/month plan covers a thousand OCR pages. Most casual users won't hit that. They pay $120/year for something they use a few times and mostly forget about. They leave after six months because they can't justify the line item. You never build a long-term relationship with the exact people you most want to help — indie devs, small teams, someone experimenting with an agent workflow.
- **Too high a floor.** A $49/month plan prices out everyone who isn't already sure they need the product. You never capture the long tail of "I'd try it if it were cheap." A product that only works for enterprise customers isn't a product we want to build.

Tiered subscription combines the downsides. Now you have three plans your customers have to compare, a feature matrix that turns into a sales surface, and price anchoring that pushes everyone to the plan you want them on instead of the one they need. We didn't want to build a product that requires a pricing-page explainer.

## What pay-per-use actually looks like

Frenchie's pricing is five lines:

- $1 = 100 credits.
- 1 credit per OCR page.
- 2 credits per transcription minute.
- 20 credits per generated image.
- 100 free credits on signup, no card required.

That's the whole pricing sheet. Every job costs what it costs. No plans to compare, no feature matrix, no commit, no renewal. If you use Frenchie once a quarter for a one-off PDF, it costs you cents. If you use it every day for a bookkeeping workflow, it costs you dollars.

The other piece that matters: **credits don't expire.** You can top up once, let the balance sit for a year, use it when you need to.
We don't play the "unused credits reset monthly" game — that's a subscription in a trench coat, and we don't want to be that.

## The downsides we chose

Pay-per-use has real downsides. We took them on knowingly.

- **Revenue is harder to predict.** Customers don't commit to a monthly fee. Cash flow is bumpier. We have less visibility into next quarter's revenue. This is a real business problem — we just decided the customer-experience benefit was worth more than the operational convenience.
- **Unit economics have to actually work at the low end.** We can't rely on a subscription averaging out heavy and light users. Each unit has to be priced high enough to cover its cost, including the capacity we can't perfectly utilize. We've been ruthless about keeping our infrastructure cost per unit down so the price can stay low.
- **No floor means no guaranteed revenue per user.** A customer who tries Frenchie and churns costs us the same acquisition effort as a customer who stays for years. We need conversion and retention to pay for themselves, without the cushion of a minimum monthly fee.
- **Enterprise conversations are weirder.** When someone asks "do you have an annual contract?" the honest answer is "not really, but email us if volume matters." That's fine for the customers we want; it's a hurdle for the ones who need procurement-shaped pricing. We'll get there when it matters; we're not rushing it.

## When this is the wrong answer

Pay-per-use isn't universally correct. If your product is a platform that takes weeks to onboard, subscription is probably right — the switching cost is high enough that customers want the predictability of a fixed fee. If your product has heavy marginal cost per call (like a vision model or a reasoning-heavy workflow), you need to be careful that per-unit pricing actually covers your costs before optimizing for experience.
For a narrow utility that users can try in five minutes and integrate in one command, pay-per-use removes the biggest objection to trying it. That's the game we're playing.

## What you get from this

A product you can try for free, pay for when you use it, stop paying for when you don't. No meetings, no plans, no gotchas.

If that sounds like the kind of tool you'd rather have in your stack, [sign up for 100 free credits](/register) and see if Frenchie fits. The 100 is on us; everything after is cheaper than you'd expect.

We've also put [usage examples on the pricing page](/pricing) — a scanned book, a podcast, a month of daily standups, a year of invoices — so the abstract credit math becomes concrete dollar amounts before you commit anything.

---

## About LAB94

LAB94 is a small AI studio. We build narrow, well-shaped tools for agent workflows — the kind of tools you install once, forget are there, and notice only when they save you an afternoon.

Frenchie is our first product. More are coming. None of them will try to be everything.

### What we believe

- **Impact over features.** We don't chase feature parity with bigger tools. We look at moments where agent workflows actually break, and we build the smallest thing that fixes them.
- **Narrow scope.** Frenchie does OCR, transcription, and image generation. That's it. No dashboards, no plans to compare, no fifth product line to distract the roadmap.
- **Honest by default.** When a tool isn't right for you, we say so. When a competitor fits better, we recommend them. Trust compounds — it's the only moat we actually want to build.

### Contact

Email support@getfrenchie.dev — a real person reads it.
Company site: https://lab94.io

---

## Canonical links

- Product: https://www.getfrenchie.dev
- Pricing: https://www.getfrenchie.dev/pricing
- Docs: https://www.getfrenchie.dev/docs
- Comparisons: https://www.getfrenchie.dev/compare
- Use cases: https://www.getfrenchie.dev/use-cases
- Blog: https://www.getfrenchie.dev/blog
- About: https://www.getfrenchie.dev/about
- Changelog: https://www.getfrenchie.dev/changelog
- Terms & Privacy: https://www.getfrenchie.dev/terms
- RSS: https://www.getfrenchie.dev/blog/feed.xml
- Sitemap: https://www.getfrenchie.dev/sitemap.xml