London, England
It's simple. We sell data. Clean. Rare. Quiet. Annotated. Validated. Curated. And we sell lots of it. For a fraction of the price.
What we do
We generate and collate real-world and mechanistic data to train frontier AI. We form exclusive partnerships with clients in ML to meet their extensive data needs at a fraction of the regular cost.
Scientists, experts, experimentalists. These are the people we work with to scour the edge of what is known.
01
Hand-verified, domain-specific datasets built to your exact specification. Covering scientific, industrial, linguistic, and multimodal data at scale.
02
Expert-led annotation pipelines using domain specialists — not crowdsourced. Every label traceable, auditable, and validated to your quality bar.
03
Access to low-resource, hard-to-acquire datasets. We specialise in edge cases, underrepresented domains, and data types the major providers don't carry.
04
Physics-grounded and model-generated synthetic data paired with real-world validation sets. Expands your training distribution without compromising ground truth.
05
Independent validation of your existing training data. We surface bias, coverage gaps, and label noise before they compound in your model.
06
Exclusive, long-term data supply agreements for ML companies and research labs. Custom collection pipelines designed around your roadmap, not ours.
Our Domain Expertise
Meet the Team
A small, senior team. No juniors on client work. Everyone here has built and operated production ML systems—not just published papers or given conference talks.
Tamás Józsa
Founder, ML Lead
Former DeepMind research engineer. Specialises in large-scale pretraining and RLHF. PhD, Cambridge. Built inference systems serving 10M+ daily queries.
Tianyu Ma
Infrastructure Lead
Ex-Google Brain SWE. Deep expertise in distributed training, GPU orchestration, and Kubernetes-native ML serving at petabyte scale.
Adam Brierley
Data & Systems
Previously Palantir. Designed data pipelines for regulated industries. Expert in streaming architecture, feature engineering, and compliance-first ML.
Kilyan Campo
LLM & Agents
Research background in NLP and reasoning. Built production RAG and agentic systems for legal, finance, and healthcare verticals. 4 papers at NeurIPS.
Muhammad Gillani
LLM & Agents
Research background in NLP and reasoning. Built production RAG and agentic systems for legal, finance, and healthcare verticals. 4 papers at NeurIPS.
Pricing
Three engagement models designed for different stages and needs. All include direct access to senior engineers—no account managers in the way.
Sprint
£18K/engagement
A focused 4-week engagement. Ideal for proofs of concept, model evaluation, or auditing an existing ML system.
Build
£12K/month
Ongoing embedded engineering. A dedicated pod ships production-grade ML systems alongside your team, month by month.
Partner
Custom
Long-term strategic partnership. Full team embed, hiring support, and ML org design for companies scaling AI as a core capability.
Contact
Location
167-169 Great Portland Street, London
W1W 5PF
Response time
We respond to all enquiries within one business day.