Anatomy of a $643M Team

Twenty people,
ten months.

Subject

Eigen AI

Outcome

Acquired by Nebius · $643M

Headcount at exit

~20

Time from inc.

~10 months

A ~20-person startup, ten months out of incorporation, sold to Nebius for six hundred and forty‑three million dollars.

~20 people

Total headcount

Founders, research, kernels, GTM, ops. 13 of ~20 with public LinkedIn profiles.

10 months

Incorporation to acquisition

From day-zero to definitive agreement.

$643M

Acquisition value

~$98M cash plus 3.8M Nebius shares. Roughly $32M per employee.
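
The headline figures can be cross-checked with a few lines of arithmetic. This is a rough sketch, not deal documentation: it assumes the $643M headline is simply the cash plus the shares valued at one reference price, and it uses the approximate headcount of 20. The implied share price is derived here and is not a figure quoted in the report.

```python
# Figures quoted in the report (all approximate).
deal_value_usd = 643e6   # headline acquisition value
cash_usd = 98e6          # cash component
shares = 3.8e6           # Nebius shares in the consideration
headcount = 20           # approximate team size at exit

# Assumption: headline value = cash + shares at a single reference price.
stock_value_usd = deal_value_usd - cash_usd
implied_share_price = stock_value_usd / shares
stock_fraction = stock_value_usd / deal_value_usd
per_employee_usd = deal_value_usd / headcount

print(f"Stock component: ${stock_value_usd / 1e6:.0f}M ({stock_fraction:.0%} of the deal)")
print(f"Implied price per share: ${implied_share_price:.2f}")
print(f"Value per employee: ${per_employee_usd / 1e6:.2f}M")
```

Under that assumption, most of the consideration is stock, so the realized value per employee moves with the Nebius share price.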

0.1 — The claim

This was, first and foremost, a talent acquisition. Nebius didn't buy a product — they bought the people.

They were a tiny team that had already compressed a decade of trusted research networks, codebase ownership, and full-stack inference expertise into one roster. The rest of this report therefore reads the deal as a talent-flow story: who they are, how the team was built, and where that talent comes from.

3.1 — Composition

A research‑heavy core with deliberately thin operations.

Eight of thirteen hold doctorates. Three PhDs sit on the founding line. The rest of the team is sized to ship — not to manage.

Segment	Count	Representative members
Cofounders (PhD)	3	Ryan Wang · Di Jin · Wei‑Chen Wang
PhD engineers	4	Jinglei Cheng · Jiaao Chen · Jiacheng Yang · Zilin Shen
PhD (cross-domain)	1	Mingye Gao
MS / BS engineers	3	Yilian Liu · Samir Khaki · Zerui Xu
GTM / sales	1	Alexandra Yang
Business ops	1	Rachel Liu
Documented	13	of ~20 total · 62% PhDs · 85% technical
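
The percentages in the last row follow directly from the segment counts. The sketch below recomputes them; the `holds_phd` and `is_technical` flags are this example's reading of the segment names (every research and engineering segment counted as technical), not labels taken from the report.

```python
# Composition table from the report: segment -> (count, holds_phd, is_technical).
# The two boolean flags are assumptions inferred from the segment names.
SEGMENTS = {
    "Cofounders (PhD)": (3, True, True),
    "PhD engineers": (4, True, True),
    "PhD (cross-domain)": (1, True, True),
    "MS / BS engineers": (3, False, True),
    "GTM / sales": (1, False, False),
    "Business ops": (1, False, False),
}
APPROX_TOTAL_HEADCOUNT = 20  # the report's "~20"

documented = sum(count for count, _, _ in SEGMENTS.values())
phds = sum(count for count, phd, _ in SEGMENTS.values() if phd)
technical = sum(count for count, _, tech in SEGMENTS.values() if tech)

phd_pct = round(100 * phds / documented)
technical_pct = round(100 * technical / documented)
coverage_pct = round(100 * documented / APPROX_TOTAL_HEADCOUNT)

print(f"{documented} documented: {phd_pct}% PhDs, {technical_pct}% technical")
print(f"Roster coverage: about {coverage_pct}% of the ~{APPROX_TOTAL_HEADCOUNT} total")
```

Both shares are computed over the 13 documented members, so they describe the visible roster rather than the full team of roughly 20.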

3.2 — Research firepower

A research footprint extraordinary for the team size.

10,000+ citations

Combined across founding team. Di Jin and Jiaao Chen each clear 5,000; combined h-index reaches 24.

8 / 13 doctorates

From MIT (3), Purdue (2), Georgia Tech / Stanford, Toronto, NTU — concentrated, not scattered.

34 patents filed

Wei‑Chen Wang alone — 23 granted — on quantization, memory, and inference techniques.

5+ named systems

AWQ · MAgent · SpAtten · TextFooler · TinyChatEngine — each with thousands of stars or industry adoption.

6 venues

Hardware (ISSCC, HPCA, MICRO) · Systems (MLSys, ASPLOS, EuroSys) · ML (NeurIPS, ICML, ICLR, CVPR) · NLP · Security.

3.3 — Product surface

Three planes, one vision.

The team called it Artificial Efficient Intelligence — full-stack optimization from data, through training, to the GPU kernel. Each plane is owned by people from the network.

Headline claim: 10x faster inference, 10x lower cost vs. baselines on open frontier models.

EigenData

Di Jin · Jiaao Chen

A self-evolving multi-agent platform for function-calling data. Audited the Berkeley Function-Calling Leaderboard and flagged 71.5% of samples as containing critical errors.

EigenTrain

Jiaao Chen · Di Jin

Controlled fine-tuning and RL post-training workflows. Function-calling on LLAMA 4 Maverick: BFCL 50 → 72.

EigenInference

Wei-Chen · Ryan · Zilin · Jiacheng

AWQ quantization, KV-cache, custom CUDA kernels, speculative decoding. Topped Artificial Analysis leaderboards on Kimi K2.6 with Nebius.

3.4 — The acquisition thesis

Nebius paid $643M for a team they had partnered with six weeks earlier — ~$32M per employee, to compress two-thirds of all future AI compute spend onto their GPU fleet.

3.5 — Buyer rationale

Right capability, right buyer.

Nebius runs a GPU “token factory” — a neocloud where margins are thin and efficiency is the product. Every gain in inference cost and speed compounds straight into capacity and margin.

And this caliber of inference talent is rare — most of it sits inside Google, Anthropic, OpenAI, and Fireworks. Right team, right buyer, right moment.

Capability acquired	Why it mattered to Nebius
Quantization · AWQ	Lower serving cost on every token.
GPU kernels	Higher utilization of the existing fleet.
Post-training	Better model behavior and accuracy.
Function-calling data	Enterprise-grade agent usability.
Patents · 23 granted	Track record of fileable innovation.
Dense ~20-person team	Fast integration, minimal overhead.

§ 02 — Section two of five

The inference stack, covered end to end by one small team.

Most teams own one layer. This team owns the entire vertical — quantization, sparsity, kernels, post-training, serving, on-device.

6.1 — Vertical integration

Every layer has a name — and that name built the tool.

On-device / TinyML

Wei-Chen Wang

TinyChatEngine, TinyEngine — pure C++ runtimes on ARM, x86, CUDA.

Post-training

Jiaao Chen · Di Jin

LLAMA 4 Maverick function-calling (BFCL 50 → 72), EigenData, EigenLoop.

Distributed inference

Jiacheng Yang

ScaleFusion, StreamFusion — sequence parallelism for diffusion transformers.

Model compression

Wei-Chen Wang

AWQ — 3.5k★, MLSys Best Paper. The de-facto LLM weight-quantization standard.

Sparse inference

Ryan Wang · Samir Khaki

SpAtten (most-cited HPCA since 2020), SparseLoRA, SparseRefine.

GPU kernels & compilers

Zilin Shen · Jiacheng Yang

FP4 Attention on Blackwell B200, Minuet sparse-convolution CUDA kernels.

6.2 — Proof of capability

AWQ — the standard.

AWQ is already the production standard for LLM weight quantization. Having one of its co-creators on staff means the team can push the technique further than anyone who uses it as a black box.

It's proof of capability — a team that turns research into systems the industry runs on, and keeps innovating as architectures shift.

GitHub stars

3.5k

AWQ repo · mit-han-lab/llm-awq

Recognition

MLSys
Best Paper

Adopted across production LLM serving stacks.

Patents filed

34

By Wei-Chen Wang: quantization, memory, inference.

Patents granted

23

Of 34 filed: a track record of fileable innovation.

§ 03 — Section three of five

How Ryan turned a lab into a company.

Seven of ten technical hires trace back to a single advisor's research group. The remaining three were deliberate hires to fill specific gaps.

4.1 — Hiring topology

One node. Four rings.

Ryan Wang did not post job listings. He hired people he had already worked with — or whose code he had already read.

i.

Lab alumni

Direct collaborators inside MIT HAN Lab

ii.

Lab collaborators

One hop out: joint projects, shared GitHub orgs

iii.

Domain specialists

Recruited for specific gaps in the stack

iv.

GTM

A single sales veteran, hired late

Ryan Wang

Founder · CEO

Wei-Chen Wang · Yilian Liu · Jinglei Cheng · Jiacheng Yang · Samir Khaki · Mingye Gao · Di Jin · Jiaao Chen · Zilin Shen · Alexandra Yang · Rachel Liu

4.2 — Circle one

Lab alumni — people Ryan had already shipped with.

i.01

Wei-Chen Wang

MIT HAN Lab postdoc

Co-creator of AWQ and TinyEngine alongside Ryan. The quantization stack itself walked over with him.

i.02

Yilian Liu

HAN Lab contributor

Contributor to TorchQuantum — already inside the codebase before the company existed.

i.03

Jinglei Cheng

Purdue · quantum overlap

Quantum computing collaborator on Ryan's QuantumNAS work — adjacent research lineage.

These three hires required no ramp-up. They had read the papers, used the codebases, and worked alongside the founder before day one.

4.3 — Circle two

Lab collaborators — one hop out from the node.

ii.01

Jiacheng Yang

UofT PhD · prior MIT stint

A 2018 research stint at MIT HAN Lab preceded his Toronto doctorate. Compilers, distributed inference.

ii.02

Samir Khaki

UofT MASc · Google / IBM

SparseRefine was published under the mit-han-lab GitHub org — a code-level collaborator.

ii.03

Mingye Gao

MIT EECS PhD · cross-domain

Same department, adjacent labs — institutional proximity without direct lab overlap.

Shared context, shared tooling, shared advisors. Three to six months of alignment work, eliminated.

4.4 — Circle three

Domain specialists — the deliberate gaps.

iii.01

Di Jin

MIT (Szolovits) · Meta LLAMA

NLP and LLM alignment — frontier post-training experience walked in the door.

iii.02

Jiaao Chen

Georgia Tech / Stanford · Meta

LLAMA 4 post-training and function-calling: BFCL 50 → 72 under his ownership.

iii.03

Zilin Shen

Purdue PhD · security

Security background plus FP4 Attention kernels targeting Blackwell B200.

These are not network hires — they are surgical fills for layers the lab itself didn't already cover.

4.5 — Circle four

GTM, hired last.

A single enterprise sales veteran joined three months before the acquisition closed. The role was likely less about building a broad pipeline — and more about enterprise credibility, buyer navigation, and closing motion.

Alexandra Yang · GTM

20yrs enterprise sales

Prior exit 01

Chef → Progress

Prior exit 02

CliQr → Cisco

She had run this exact playbook — startup to acquisition — twice before.
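
The four rings described in 4.2 to 4.5 can be written down as a small data structure, which makes the hiring order and ring sizes easy to query. This is only an illustrative sketch: the ring membership is copied from the sections above, while the names `RINGS` and `ring_of` are hypothetical. Zerui Xu and Rachel Liu appear on the roster but are not assigned a ring in this report, so they are left out.

```python
# Hiring rings in the order the report says they were filled.
# Ring number -> (label, members named in sections 4.2 to 4.5).
RINGS = {
    1: ("Lab alumni", ["Wei-Chen Wang", "Yilian Liu", "Jinglei Cheng"]),
    2: ("Lab collaborators", ["Jiacheng Yang", "Samir Khaki", "Mingye Gao"]),
    3: ("Domain specialists", ["Di Jin", "Jiaao Chen", "Zilin Shen"]),
    4: ("GTM", ["Alexandra Yang"]),
}


def ring_of(name):
    """Return the ring number for a hire, or None if the report assigns none."""
    for number, (_, members) in RINGS.items():
        if name in members:
            return number
    return None


ring_sizes = {label: len(members) for label, members in RINGS.values()}

print(ring_sizes)
print(ring_of("Di Jin"), ring_of("Rachel Liu"))
```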

4.6 — Counter-pattern

What Ryan did not optimize for.

— Absent from the roster

No VP of Engineering.
No staff-level FAANG infra veterans.
No general-purpose ML engineers.
No marketing, no growth, no recruiting function.

— Present instead

~6 years average technical experience.
Researchers who built the tools the industry depends on — not people who used them.
Specialists per layer, zero redundancy.
Founder-led recruiting from a single network.

§ 04 — Section four of five

The talent map — four schools, three industry pipes.

Eleven of thirteen hires trace to four institutions — but the real map is the trust graph beneath them: advisors, co-authors, and shared codebases.

5.1 — Feeder institutions

Concentration, not breadth.

Top feeder · graduate

4	MIT	Ryan · Wei-Chen · Mingye · Yilian (via research)
2	Purdue	Jinglei Cheng · Zilin Shen
2	Toronto	Jiacheng Yang · Samir Khaki

Undergrad feeders

3	Tsinghua	Jinglei · Zilin · Di Jin
2	Shanghai Jiao Tong / Fudan	ACM Class lineage
1	Nanyang Technological	Wei-Chen Wang

Combined paths

11/13	from 4 schools	MIT, Tsinghua, Purdue, Toronto
7/10	tech hires to HAN Lab	direct or one-hop
~6 yr	avg. experience	researchers, not industry vets

The signal isn't the logo on the diploma — it's the collaboration graph underneath it: shared advisors, shared codebases, prior co-authorship. Pedigree matters only when it points to that.

5.2 — Industry pipelines

Three industry pipes carried post-training know-how through the door.

Meta GenAI

LLAMA post-training → Di Jin · Jiaao Chen → Eigen AI

Google

ML infra · tooling → Samir Khaki · Mingye Gao → Eigen AI

Amazon

applied science · intern + FT → Di Jin · Jiaao Chen · Wei-Chen Wang → Eigen AI

Two people who worked on actual LLAMA post-training left Meta for a ten-month-old startup. That is a leading indicator — value creation moved from training to serving.

7.0 — Takeaways

Six lessons for other founders.

01. Hire tool builders, not tool users.
Every key hire built a system others depend on. Look for downstream adoption (stars and integrations), not just citation counts.

02. One advisor network can seed a company.
Seven of ten technical hires from one lab is not nepotism; it is three to six months of alignment work eliminated.

03. Cover the full stack with specialists.
One quantization expert, one kernel engineer, one post-trainer. Zero redundancy, total coverage: that is how a tiny team out-ships a hundred.

04. Frontier-lab departures are a talent signal.
Two LLAMA post-training people left Meta for a ten-month-old startup. Track where frontier talent goes next; it leads the wave.

05. Hire GTM late, but pick someone with exits.
At ten people you don't need a sales team. When you do hire, hire someone who has closed the acquisition before.

06. Pair papers with patents.
Open papers built AWQ's adoption; 23 granted patents turned it into a balance-sheet asset a buyer can underwrite. Most research teams file none.

7.1 — The playbook

For founders building frontier AI infra.

Don't start with “we need five senior engineers.” Start with the bottleneck — then map the people who already own it.

1. Start with the technical bottleneck, not the org chart.

2. Map the research network around that bottleneck.

3. Hire people who built adopted tools, not just wrote papers.

4. Hire for layer coverage, not duplicate seniority.

5. Add GTM only when there's buyer pull or strategic value.

6. Treat patents, OSS, benchmarks, and deployments as distinct proof.

8.0 — In one frame

The pattern, end to end.

Input

One advisor's
research lab.

MIT HAN Lab. A decade of shared codebases and citations.

Method

Concentric
hiring.

Alumni, then collaborators, then specialists, then GTM — in that order.

Coverage

Full inference
vertical.

Six layers, named owners, named tools at every layer.

Outcome

$643M
in 10 months.

Acquired by Nebius. ~$32M per employee. The team was the asset.

9.0 — Methodology & limitations

How this read was assembled.

— Sources

Public LinkedIn profiles and work history.
Publications, citations, and conference records.
GitHub organizations and open-source adoption.
Personal websites and academic homepages.
Patent filings and grants.
Company posts and public acquisition materials.

— Limitations

Some team members or internal contributions may be missing.
This is not a complete org chart.
The goal is a pattern-level read on how the team was formed — not a roster.

Base to Base · Recruiting

The best early AI teams aren't built by broad sourcing. They're built by reading where technical trust already exists — labs, codebases, paper authorship, OSS adoption, and frontier-lab departures.

That graph often shows where company value is forming before the market narrative catches up.

— Closing

Twenty people,
one network,
full-stack coverage —
and six hundred and forty-three million dollars.

Report

Eigen AI · Talent Analysis

Prepared by

Base to Base · Recruiting
