AI & ML · AI infrastructure / MLOps / evals

Extend

Production-ready document processing

Building @extendHQ
San Francisco, CA · 6.4K followers

About

Extend is a document processing platform for AI teams that need to turn messy PDFs into reliable structured context for agents and pipelines. With this launch, the New York-based company is releasing Parse 2.0, a document parsing API built for workloads where accuracy, cost, and latency are critical, alongside RealDoc-Bench, a companion benchmark that measures parsing performance on real-world documents from logistics, healthcare, financial services, and real estate.

The launch matters because document parsing is still a bottleneck for production agents working with the hospital intake forms, mortgage packets, tax filings, and bills of lading that businesses actually run on. Rather than relying on a single large model to read a page end to end, Parse 2.0 takes a layout-first approach: it segments each page into regions and routes those regions to a suite of fine-tuned OCR and vision language models downstream.

On RealDoc-Bench, Extend reports an adjusted F1 of 0.847 on layout accuracy across 1,500 samples, ahead of Reducto at 0.759, AWS Textract at 0.709, Azure DI at 0.687, and PaddleOCR-VL at 0.684. On document Q&A accuracy across 1,359 prompts and 581 documents, Parse 2.0 scored 95.7%, ahead of LlamaParse (Agentic) at 92.1%, Reducto (Agentic) at 91.1%, and Extend Parse 1.0 at 90.4%.

Founded in 2023 by Eli Badgio and Kushal Byatnal, Extend counts Brex, Mercury, and Opendoor among the teams running documents through its API. For founders and operators building agents that touch regulated or systems-of-record documents, Parse 2.0 and the public RealDoc-Bench results give a concrete way to compare parsing options on workloads that look like their own, instead of on clean academic PDFs.
Tags
Product launch · 500K-1M · Explainer · Series A · B2B · Global · US · Vertical AI · Founder-led
Comments (9)
Nadia Oloruntoba · 6d ago

Parse 2.0 is wild branding for a PDF parser, makes it sound like a JS framework that ruins your weekend.

Tomasz K. · 6d ago

Every doc parser claims 'most accurate in the world' until you feed it a scanned fax of a handwritten invoice from 1997.

Ravi Subramaniam · 6d ago

Genuinely curious what your eval set looks like. Are you benchmarking against DocVQA or something internal that nobody else can reproduce?

Leigh M. · 6d ago

Tagline rewrite, free of charge: 'PDFs in, structured data out, no excuses.' You can venmo me.

kenjiships · 6d ago

How big is the team behind this? If it's more than 12 people I'm going to be mildly disappointed in all of us.

Amara Diallo · 6d ago

One of my portcos swapped their stitched-together OCR pipeline for Extend and the engineer who maintained it cried tears of joy. True story.

Fernanda Q. · 6d ago

What's the rate limit ceiling on Parse 2.0 and do you fire a webhook on async completion or do I get to invent my own polling nightmare?

Yusef Haddad · 6d ago

The launch tweet buried the lede by leading with the 1 billion PDFs stat; the Brex and Mercury logo drop should've been the hook in the first 10 seconds.

mira · 6d ago

A document is just a stubborn opinion in PDF form. Tools that translate stubbornness into JSON are quietly the most important infra of this decade.