Pandoc, Unstructured, or Frenchie?

A practical guide to three different extraction shapes: local conversion, document processing infrastructure, and MCP-native file extraction for agents.

Three tools can all sit near the phrase "document extraction" and still be shaped for different jobs.

Pandoc is a document converter. Unstructured is document processing infrastructure. Frenchie is an MCP file utility for agents.

That distinction matters more than a feature checklist.

[Figure: hand-drawn comparison diagram splitting three extraction choices into converter, infrastructure, and agent utility paths]

Pick Pandoc when you want a converter

Pandoc is the classic answer when you need to convert documents locally. It is broad, scriptable, and excellent in publishing workflows where you control the environment.

Use Pandoc when:

  • You want local conversion.
  • You already have scripts around it.
  • Your target is a publishing pipeline, not a live agent conversation.
  • You need a broad converter more than an agent tool.

Pandoc is free software and your machine does the work. That is exactly right for many teams.
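As a rough illustration of the scripted, local shape described above, here is a minimal Python sketch that wraps one Pandoc conversion. It assumes `pandoc` is installed and on your PATH. The function names and the default `gfm` (GitHub-flavored Markdown) output format are choices made for this example, not something Pandoc requires.

```python
import shutil
import subprocess
from pathlib import Path


def build_pandoc_command(src: Path, dest: Path, to_format: str = "gfm") -> list[str]:
    """Build the argument list for one Pandoc conversion.

    Pandoc infers the input format from the source file extension,
    so only the output format is passed explicitly.
    """
    return ["pandoc", str(src), "--to", to_format, "--output", str(dest)]


def convert(src: Path, dest: Path, to_format: str = "gfm") -> Path:
    """Run Pandoc locally and return the path of the written file."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    # check=True raises CalledProcessError if Pandoc exits with an error.
    subprocess.run(build_pandoc_command(src, dest, to_format), check=True)
    return dest
```

A call such as `convert(Path("chapter.docx"), Path("chapter.md"))` (hypothetical file names) would run entirely on your machine, which is the point of this shape.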

It is a weaker fit when the user is working inside an agent and drops in a workbook, CSV export, Word doc, or deck. Pandoc covers only the conversion step, so you still have to build tool invocation, path handling, an HTTP fallback for hosted agents, result saving, and instructions that teach the agent when to use the converter.

Pick Unstructured when you want ingestion infrastructure

Unstructured is closer to document processing infrastructure. It is useful when you are building ingestion pipelines, preparing documents for downstream systems, or working with partitioned document elements.

Use Unstructured when:

  • Document processing is part of your backend architecture.
  • You need partitioning and preprocessing knobs.
  • You are building a durable ingestion path for many document types.
  • The output feeds a larger pipeline you own.

That is a bigger shape than most agent workflows need.
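To make "partitioned document elements" concrete, here is a small Python sketch of the kind of code that sits inside an ingestion pipeline. It assumes the `unstructured` package is installed and that each element returned by its `partition` function exposes `category` and `text` attributes. The helper names are made up for this example, and a real pipeline would add chunking, metadata handling, and storage.

```python
from collections import Counter
from typing import Any, Iterable


def load_elements(path: str) -> list[Any]:
    """Partition a document into elements with Unstructured.

    The import is deferred so this module loads even where the
    package is not installed.
    """
    from unstructured.partition.auto import partition

    return partition(filename=path)


def count_by_category(elements: Iterable[Any]) -> Counter:
    """Count elements per category (for example "Title" or "NarrativeText")."""
    return Counter(el.category for el in elements)


def narrative_text(elements: Iterable[Any]) -> str:
    """Join the text of narrative elements, skipping titles, tables, and so on."""
    return "\n\n".join(el.text for el in elements if el.category == "NarrativeText")
```

The output here is a list of typed elements for a pipeline you own, not a single Markdown string handed back to an agent.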

Pick Frenchie when the file is inside the agent workflow

Frenchie is smaller on purpose.

The user gives an agent a file. The agent calls an MCP tool. Frenchie returns Markdown. The agent does the reasoning.
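For readers who have not looked under the hood of MCP, the "agent calls an MCP tool" step is a JSON-RPC 2.0 `tools/call` request, and the reply carries content blocks. The Python sketch below shows that message shape only. The tool name `extract_file` and its `path` argument are hypothetical placeholders; Frenchie's actual tool names and schema may differ, and in practice the agent's MCP client builds these messages for you.

```python
import json
from typing import Any


def build_tool_call(request_id: int, tool_name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def text_from_result(response: dict[str, Any]) -> str:
    """Concatenate the text content blocks of a `tools/call` response."""
    result = response["result"]
    if result.get("isError"):
        raise RuntimeError("tool reported an error")
    return "\n".join(
        block["text"] for block in result["content"] if block.get("type") == "text"
    )


# Hypothetical tool name and argument, for illustration only.
request = build_tool_call(1, "extract_file", {"path": "./handoff/forecast.xlsx"})
print(json.dumps(request, indent=2))
```

The agent never sees the spreadsheet bytes in this flow. It sees the Markdown text that comes back in the response and reasons over that.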

Use Frenchie when:

  • Your agent needs to read DOCX, XLSX, CSV, TSV, or PPTX files.
  • You want skills installed alongside MCP tools.
  • You want stdio for local agents and HTTP for hosted agents.
  • You do not want to build upload, polling, result saving, or retry behavior yourself.

Install:

npx @lab94/frenchie install --api-key fr_...

Ask:

Extract ./handoff/forecast.xlsx with Frenchie and write the variance summary.

Frenchie handles the file-reading layer. Your agent handles the variance summary.

The practical split

Use Pandoc for local conversion and publishing.

Use Unstructured for document ingestion infrastructure.

Use Frenchie when an agent needs Markdown from a file in the middle of work.

Those are different jobs. Picking the tool that matches the job is less glamorous than a benchmark table, but it saves more time.