London, England

AI-ready Ground Truth Data for the next era.

It's simple. We sell data. Clean. Rare. Quiet. Annotated. Validated. Curated. And we sell lots of it. For a fraction of the price.

40+ Bytes bitten
£180M Capital raised
98% Accuracy
7 Years of data

Our Bespoke Partnerships

Google DeepMind
Microsoft
OpenAI
Anthropic
Meta AI
Hugging Face
Scale AI

What we do

Real-World, Mechanistic data on your doorstep

We generate and collate real-world and mechanistic data to train frontier AI. We form exclusive partnerships with clients in ML to meet their extensive data needs at a fraction of the regular cost.

Scientists, experts, experimentalists. These are the people we work with to scour the edge of what is known.

01

Curated Datasets

Hand-verified, domain-specific datasets built to your exact specification. Covering scientific, industrial, linguistic, and multimodal data at scale.

02

Annotation & Labelling

Expert-led annotation pipelines run by domain specialists, not crowdsourced. Every label is traceable, auditable, and validated to your quality bar.

03

Rare & Niche Data

Access to low-resource, hard-to-acquire datasets. We specialise in edge cases, underrepresented domains, and data types the major providers don't carry.

04

Synthetic Data Generation

Physics-grounded and model-generated synthetic data paired with real-world validation sets. Expands your training distribution without compromising ground truth.

05

Data Validation

Independent validation of your existing training data. We surface bias, coverage gaps, and label noise before they compound in your model.

06

Bespoke Partnerships

Exclusive, long-term data supply agreements for ML companies and research labs. Custom collection pipelines designed around your roadmap, not ours.

Our Domain Expertise

Healthcare & Life Sciences

Aerospace & Defence

Meet the team

Researchers who ship.

A small, senior team. No juniors on client work. Everyone here has built and operated production ML systems, not just published papers or given conference talks.


Tamás Józsa

Founder, ML Lead

Former DeepMind research engineer. Specialises in large-scale pretraining and RLHF. PhD, Cambridge. Built inference systems serving 10M+ daily queries.


Tianyu Ma

Infrastructure Lead

Ex-Google Brain SWE. Deep expertise in distributed training, GPU orchestration, and Kubernetes-native ML serving at petabyte scale.


Adam Brierley

Data & Systems

Previously at Palantir. Designed data pipelines for regulated industries. Expert in streaming architecture, feature engineering, and compliance-first ML.


Kilyan Campo

LLM & Agents

Research background in NLP and reasoning. Built production RAG and agentic systems for legal, finance, and healthcare verticals. 4 papers at NeurIPS.


Muhammad Gillani

LLM & Agents

Research background in NLP and reasoning. Built production RAG and agentic systems for legal, finance, and healthcare verticals. 4 papers at NeurIPS.

Pricing

Transparent pricing.

Engagement models designed for different stages and needs. All include direct access to senior engineers, with no account managers in the way.

Foundation

Sprint

£18K per engagement

A focused 4-week engagement. Ideal for proofs of concept, model evaluation, or auditing an existing ML system.

  • 4-week fixed scope
  • 1 senior engineer
  • Technical report & recommendations
  • 2 workshops included
  • IP fully yours

Enterprise

Partner

Custom

Long-term strategic partnership. Full team embed, hiring support, and ML org design for companies scaling AI as a core capability.

  • 6–12 month engagements
  • Full team (4–8 engineers)
  • CTO-level advisory included
  • Hiring & team build support
  • Dedicated Slack channel & on-site presence
  • Custom SLA

Contact

Location

167-169 Great Portland Street, London
W1W 5PF

Response time

We respond to all enquiries within one business day.