use case
Extract text and figures from papers your agent should actually read.
Frenchie parses PDF research papers into clean Markdown, pulling figures out as separate PNG files so your agent can cite specific content instead of summarizing the whole thing.
the problem
Why this is a pain.
You point your agent at a 20-page paper. It reads the PDF via native attachments, burns 30K tokens on the raw bytes, and still misses half the figures because they come back as low-res image data. You wanted to ask a specific question about Table 3 and Figure 5. The agent gives you a vague paragraph instead.
Native PDF reading in most agents handles text roughly and figures badly. Figures show up as inline image attachments, not standalone assets your agent can point at. Tables lose their structure. Equations often turn into garbled characters. You end up downloading the paper yourself and reading it the old way.
the workflow
How Frenchie handles it.
1. Drop the paper into your agent: drag the PDF in, or run `/ocr ./papers/attention-is-all-you-need.pdf`. Frenchie accepts native PDFs from arXiv as easily as scans of conference proceedings; the same pipeline handles both.
2. Your agent calls `ocr_to_markdown` via Frenchie. The extraction pipeline runs text recognition, preserves Markdown table structure for results tables, transcribes equations as LaTeX where the source is typeset cleanly, and exports figures as separate PNG files (one per figure) so they're addressable later. Cost is 1 credit per page, so a 15-page conference paper costs $0.15.
3. The Markdown lands at `.frenchie/<paper-slug>/result.md`, with figure references pointing at sibling PNG files in the same folder. Tables come through with full cell structure intact instead of being flattened to plain text.
4. Your agent can now answer specific questions: "compare the per-layer complexity in Table 1", "summarize the architecture in Figure 1", "list the citations referenced in Section 3". For figures, your agent can pass the PNG to a vision model for deeper analysis instead of guessing from a low-res inline thumbnail.
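Resolving the figure references from the workflow above takes only a few lines of Python. This is a minimal sketch, not Frenchie's own tooling: it assumes the references use standard Markdown image syntax (`![alt](file.png)`) with paths relative to `result.md`. The helper name `list_figures` is hypothetical.

```python
import re
from pathlib import Path

# Standard Markdown image syntax: ![alt text](target).
# Assumption: Frenchie's figure references follow this form.
IMAGE_REF = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")


def list_figures(result_md: Path) -> list[tuple[str, Path]]:
    """Return (alt text, resolved path) for each PNG referenced in result_md."""
    text = result_md.read_text(encoding="utf-8")
    figures = []
    for alt, target in IMAGE_REF.findall(text):
        if target.lower().endswith(".png"):
            # References are assumed to be relative to the folder holding result.md.
            figures.append((alt, result_md.parent / target))
    return figures
```

An agent could call `list_figures(Path(".frenchie/<paper-slug>/result.md"))` with the real slug filled in, then hand the matching PNG path to a vision model.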
example
What it looks like in practice.
Your prompt
/ocr ./papers/attention-is-all-you-need.pdf What's the architecture described in Figure 1, and how does the per-layer complexity compare to recurrent networks according to Table 1?
Agent response
Paper parsed: 16 pages, 3 figures extracted as PNGs, Markdown saved to .frenchie/attention-is-all-you-need/result.md.

**Figure 1**: The Transformer architecture, with the encoder on the left and the decoder on the right. Each layer uses multi-head self-attention plus a position-wise feed-forward network. No recurrence, no convolution. Positional encodings are added to the input embeddings.

**Per-layer complexity (Table 1)**:
- Self-attention: O(n² · d), quadratic in sequence length, linear in dimension.
- Recurrent: O(n · d²), linear in sequence length, quadratic in dimension.
- For typical model dimensions (d > n), self-attention is actually cheaper per layer. The paper argues this flips when sequences get long enough, which is why later work explored sparse attention variants.
What lands in your workspace
$ frenchie ocr methodology.pdf

# Section 4: Results

The collected dataset confirms the hypothesis across all three runs.

| Run | Accuracy | Latency |
| --- | -------- | ------- |
| 1   | 94.2%    | 118ms   |
| 2   | 95.0%    | 121ms   |
| 3   | 94.7%    | 119ms   |

*Figure 3.* Distribution shifts across the validation cohort.
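Because the table arrives as a Markdown pipe table, an agent or script can read it back into rows without guessing at column boundaries. The sketch below is illustrative only: `parse_markdown_table` is a hypothetical helper, and it assumes a simple table with a header row, a separator row, and no escaped pipes inside cells.

```python
def parse_markdown_table(lines: list[str]) -> list[dict[str, str]]:
    """Turn pipe-table lines (header, separator, rows) into a list of dicts."""
    def cells(line: str) -> list[str]:
        # Drop the outer pipes, split on the inner ones, trim whitespace.
        return [c.strip() for c in line.strip().strip("|").split("|")]

    header = cells(lines[0])
    # lines[1] is the |---|---| separator row, so data starts at index 2.
    return [dict(zip(header, cells(row))) for row in lines[2:]]


table = """| Run | Accuracy | Latency |
| --- | -------- | ------- |
| 1 | 94.2% | 118ms |
| 2 | 95.0% | 121ms |
| 3 | 94.7% | 119ms |""".splitlines()

rows = parse_markdown_table(table)
# Pick the run with the highest accuracy by comparing the numeric part.
best = max(rows, key=lambda r: float(r["Accuracy"].rstrip("%")))
print(best["Run"], best["Accuracy"])  # prints: 2 95.0%
```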
tips
Things worth knowing.
- Figures come back as PNG files, one per figure — not inline images in the Markdown. Your agent references them by filename.
- Tables keep their cell structure as Markdown tables. Equations are preserved as LaTeX where they can be detected.
- Cost is predictable: 1 credit per page. A typical 15-page conference paper runs $0.15. A 100-page thesis runs $1.
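The pricing above works out to a one-line calculation. The per-credit price is not stated directly on this page; the `$0.01` constant below is an assumption inferred from the two quoted figures (15 pages for $0.15, 100 pages for $1).

```python
from decimal import Decimal

# Assumption: inferred from "15 pages -> $0.15" and "100 pages -> $1".
PRICE_PER_CREDIT = Decimal("0.01")


def estimate_cost(pages: int) -> Decimal:
    """Estimated cost in dollars at 1 credit per page."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * PRICE_PER_CREDIT


print(estimate_cost(15))   # 0.15
print(estimate_cost(100))  # 1.00
```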
questions
Common questions.
Does it work on scanned/older papers?
Yes. Scanned PDFs that come back empty from native agent readers usually work well through Frenchie: the OCR pipeline handles scanned text, old layouts, and unusual column structures.
What about papers with equations?
Equations come through as LaTeX where the source was typeset cleanly. For scanned equations, expect Markdown-style approximations with some symbol loss — still readable, not publication-grade.
Can I batch process a whole folder of papers?
Yes. Your agent scripts the loop: it calls `ocr_to_markdown` once per file, tracks job IDs, and collects results. Frenchie handles parallelism on the server side.
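The loop described above might look like the sketch below. `submit_ocr` is a hypothetical stand-in for however your agent invokes the `ocr_to_markdown` tool; the tool's actual request and response shapes are not documented on this page, so the sketch only assumes the call takes a file path and returns a job ID.

```python
from pathlib import Path
from typing import Callable


def batch_ocr(folder: Path, submit_ocr: Callable[[Path], str]) -> dict[str, str]:
    """Submit every PDF in folder once and map file name -> job ID.

    submit_ocr is a hypothetical wrapper around the ocr_to_markdown tool call.
    """
    jobs: dict[str, str] = {}
    for pdf in sorted(folder.glob("*.pdf")):
        try:
            jobs[pdf.name] = submit_ocr(pdf)
        except Exception as err:
            # Keep going if one paper fails so the rest of the folder still runs.
            print(f"skipped {pdf.name}: {err}")
    return jobs
```

Collecting results is left out on purpose, since how a finished job is reported is not specified here.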
How big can a paper be?
Up to 2 GB per file. For research papers, that's effectively unlimited: a 2 GB PDF is usually thousands of pages.
Try it with a real file of yours.
100 free credits on signup. No card. Drop a PDF or image from your own workflow and see the Markdown your agent gets back.