Judgment Labs
The continuous-improvement stack for agents.
Great launch, got really good traction. Animations + explanation was on point.
About
Production data as the training signal makes way more sense than synthetic eval sets that nobody actually trusts. Curious how you handle PII redaction at ingest.
Every agent infra startup says 'continuous improvement' but the loop is usually one engineer staring at logs at 2am. What's actually automated here?
ok wait, agents generating their own training data from prod is the actual unlock. Eval datasets have been the bottleneck for two years.
The tweet thread structure is doing a lot of heavy lifting here, but the hook sentence is buried under the funding number. Lead with the agent behavior insight next time.
Procurement question: SOC2 Type II in place? Most of our agent pilots are stuck in legal because vendors don't have the paperwork.
Per-trace pricing or seat-based? Agent volume is wildly unpredictable and finance teams hate metered surprises after the first quarter.
Building in the agent observability lane too and genuinely glad someone is making 'evals from prod' the headline. We've been screaming about it in customer calls all year.
Retention is the only number that matters for infra at this stage. Devs install ten of these in a sprint and uninstall nine by Friday.
Interesting. What does net dollar retention look like when an agent customer churns their underlying LLM provider?
How small is the team that shipped the full stack? Asking because I want to know if I should feel bad about my weekend project.
We have a strong infra eng candidate currently at a foundation model lab looking at agent-adjacent roles, are you hiring beyond the founding team?
First 10 seconds of the launch video do not tell me what Judgment actually does, I had to scroll to the pinned demo. Bury the funding, lead with the product moment.
Hot take: 'production data' is the new 'data flywheel' and in 18 months half these companies will pivot to being a logging SDK with a dashboard.
Saw three eval startups launch this month and yours has the cleanest positioning by a wide margin. The 'continuous improvement' framing actually means something instead of just 'we have a dashboard'.