AI & ML · AI infrastructure / MLOps / evals

Extend

Production-ready document processing

Building @extendHQ

San Francisco, CA · 6.9K followers

About

Extend is a document processing platform for AI teams that need to turn messy PDFs into reliable structured context for agents and pipelines. With this launch, the New York-based company is releasing Parse 2.0, its document parsing API built for workloads where accuracy, cost, and latency are critical, alongside RealDoc-Bench, a companion benchmark that measures parsing performance on real-world documents from logistics, healthcare, financial services, and real estate. The launch matters because document parsing is still a bottleneck for production agents working with the hospital intake forms, mortgage packets, tax filings, and bills of lading that businesses actually run on.

Rather than relying on a single large model to read a page end to end, Parse 2.0 uses a layout-first approach: it segments each page into regions, then routes each region downstream to a suite of fine-tuned OCR and vision-language models. On RealDoc-Bench, Extend reports an adjusted F1 of 0.847 on layout accuracy across 1,500 samples, ahead of Reducto at 0.759, AWS Textract at 0.709, Azure DI at 0.687, and PaddleOCR-VL at 0.684. On document Q&A accuracy across 1,359 prompts and 581 documents, Parse 2.0 scored 95.7%, ahead of LlamaParse (Agentic) at 92.1%, Reducto (Agentic) at 91.1%, and Extend Parse 1.0 at 90.4%.
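To make the layout-first idea concrete, here is a minimal Python sketch of segment-then-route. It is an illustration under stated assumptions, not Extend's implementation or API: the `Region` type, the region labels, and the handler functions are all hypothetical, and the handlers operate on plain strings as a stand-in for real OCR or vision-language model calls.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Region:
    """One segmented area of a page (hypothetical type, for illustration only)."""
    kind: str                         # assumed labels: "text", "table"
    bbox: Tuple[int, int, int, int]   # (x0, y0, x1, y1) in page coordinates
    content: str                      # stand-in for the cropped region pixels


def read_text(region: Region) -> str:
    # Placeholder for a text OCR model: just normalizes whitespace.
    return region.content.strip()


def read_table(region: Region) -> List[List[str]]:
    # Placeholder for a table model: splits rows on newlines, cells on "|".
    return [row.split("|") for row in region.content.splitlines()]


# Routing table: region kind -> specialized handler (all names are assumptions).
HANDLERS: Dict[str, Callable[[Region], Any]] = {
    "text": read_text,
    "table": read_table,
}


def parse_page(regions: List[Region]) -> List[Dict[str, Any]]:
    """Route each region to its handler, emitting results in reading order."""
    # Approximate reading order: top to bottom, then left to right.
    ordered = sorted(regions, key=lambda r: (r.bbox[1], r.bbox[0]))
    results = []
    for region in ordered:
        # Unknown region kinds fall back to the plain-text handler.
        handler = HANDLERS.get(region.kind, read_text)
        results.append(
            {"kind": region.kind, "bbox": region.bbox, "value": handler(region)}
        )
    return results


if __name__ == "__main__":
    page = [
        Region("table", (0, 100, 200, 200), "qty|price\n2|9.50"),
        Region("text", (0, 0, 200, 50), "  Invoice 1042  "),
    ]
    for item in parse_page(page):
        print(item["kind"], item["value"])
```

The point of the design is that each region type gets a model suited to it, and a new region type only needs a new entry in the routing table rather than a change to the page-level loop.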

Founded in 2023 by Eli Badgio and Kushal Byatnal, Extend is based in New York and counts Brex, Mercury, and Opendoor among the teams running documents through its API. For founders and operators building agents that touch regulated or systems-of-record documents, Parse 2.0 and the public RealDoc-Bench results give a concrete way to compare parsing options on workloads that look like their own, instead of clean academic PDFs.

Tags

Product launch · 500K-1M · Explainer · Series A · B2B · Global · US · Vertical AI · Founder-led

Comments (9)

Nadia Oloruntoba · 5/27/2026

Parse 2.0 is wild branding for a PDF parser, makes it sound like a JS framework that ruins your weekend.

Tomasz K. · 5/27/2026

Every doc parser claims 'most accurate in the world' until you feed it a scanned fax of a handwritten invoice from 1997.

Ravi Subramaniam · 5/27/2026

Genuinely curious what your eval set looks like, are you benchmarking against DocVQA or something internal that nobody else can reproduce?

Leigh M. · 5/27/2026

Tagline rewrite, free of charge: 'PDFs in, structured data out, no excuses.' You can venmo me.

kenjiships · 5/27/2026

How big is the team behind this? If it's more than 12 people I'm going to be mildly disappointed in all of us.

Amara Diallo · 5/27/2026

One of my portcos swapped their stitched-together OCR pipeline for Extend and the engineer who maintained it cried tears of joy. True story.

Fernanda Q. · 5/27/2026

What's the rate limit ceiling on Parse 2.0 and do you fire a webhook on async completion or do I get to invent my own polling nightmare?

Yusef Haddad · 5/27/2026

The launch tweet buried the lede by leading with the 1 billion PDFs stat, the Brex and Mercury logo drop should've been the hook in the first 10 seconds.

mira · 5/27/2026

A document is just a stubborn opinion in PDF form. Tools that translate stubbornness into JSON are quietly the most important infra of this decade.