Anatomy of a $643M Team

Twenty people,
ten months.

Subject: Eigen AI
Outcome: Acquired by Nebius · $643M
Headcount at exit: ~20
Time from incorporation: ~10 months

A ~20-person startup, ten months out of incorporation, sold to Nebius for six hundred and forty‑three million dollars.

~20 people
Total headcount
Founders, research, kernels, GTM, ops. 13 of ~20 with public LinkedIn profiles.
10 months
Incorporation to acquisition
From day-zero to definitive agreement.
$643M
Acquisition value
~$98M cash plus 3.8M Nebius shares. Roughly $32M per employee.
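
The headline arithmetic can be checked with a minimal Python sketch. The inputs are the rounded figures quoted above. The implied value per Nebius share is derived here from those figures and is not stated in the source, so treat it as an approximation.

```python
# Rounded figures as quoted in this report (all approximate).
DEAL_VALUE_USD = 643e6   # total acquisition value
CASH_USD = 98e6          # cash component (~$98M)
SHARES_ISSUED = 3.8e6    # Nebius shares included in the deal
HEADCOUNT = 20           # ~20 people at exit


def per_employee(deal_value: float, headcount: int) -> float:
    """Deal value divided evenly across the team."""
    return deal_value / headcount


def implied_share_value(deal_value: float, cash: float, shares: float) -> float:
    """Value per share implied by the non-cash part of the deal (derived, not sourced)."""
    return (deal_value - cash) / shares


per_head = per_employee(DEAL_VALUE_USD, HEADCOUNT)
stock_component = DEAL_VALUE_USD - CASH_USD
share_value = implied_share_value(DEAL_VALUE_USD, CASH_USD, SHARES_ISSUED)

print(f"Per employee:        ${per_head / 1e6:.2f}M")
print(f"Stock component:     ${stock_component / 1e6:.0f}M")
print(f"Implied share value: ${share_value:.2f}")
```

With these inputs the per-employee figure lands near the report's "roughly $32M", and about $545M of the total is carried by stock rather than cash.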

0.1 — The claim

This was, first and foremost, a talent acquisition. Nebius didn't buy a product — they bought the people.

They bought a tiny team that had already compressed a decade of trusted research networks, codebase ownership, and full‑stack inference expertise. The rest of this report therefore reads the deal as a talent‑flow story: who they are, how the team was built, and where that talent comes from.

3.1 — Composition

A research‑heavy core with deliberately thin operations.

Eight of thirteen hold doctorates. Three PhDs sit on the founding line. The rest of the team is sized to ship — not to manage.

Segment | Count | Representative members
Cofounders (PhD) | 3 | Ryan Wang · Di Jin · Wei‑Chen Wang
PhD engineers | 4 | Jinglei Cheng · Jiaao Chen · Jiacheng Yang · Zilin Shen
PhD (cross-domain) | 1 | Mingye Gao
MS / BS engineers | 3 | Yilian Liu · Samir Khaki · Zerui Xu
GTM / sales | 1 | Alexandra Yang
Business ops | 1 | Rachel Liu
Documented | 13 | of ~20 total · 62% PhDs · 85% technical
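
The two percentages in the last row follow from the segment counts. The sketch below recomputes them in Python. The `has_phd` and `technical` flags are this sketch's own reading of the segment labels (for example, it assumes the cross-domain PhD counts as technical), not a classification given in the source.

```python
# Segment -> (count, has_phd, technical). Counts come from the table above;
# the two boolean flags are assumptions inferred from the segment names.
roster = {
    "Cofounders (PhD)":   (3, True,  True),
    "PhD engineers":      (4, True,  True),
    "PhD (cross-domain)": (1, True,  True),
    "MS / BS engineers":  (3, False, True),
    "GTM / sales":        (1, False, False),
    "Business ops":       (1, False, False),
}

total = sum(count for count, _, _ in roster.values())
phd_count = sum(count for count, has_phd, _ in roster.values() if has_phd)
technical_count = sum(count for count, _, technical in roster.values() if technical)

phd_share = phd_count / total
technical_share = technical_count / total

print(f"Documented: {total}")
print(f"PhDs:       {phd_count} ({phd_share:.0%})")
print(f"Technical:  {technical_count} ({technical_share:.0%})")
```

Under those assumptions, 8 of 13 documented members hold doctorates and 11 of 13 are technical, which rounds to the 62% and 85% shown.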

3.2 — Research firepower

A research footprint extraordinary for the team size.

10,000+ citations
Combined across founding team. Di Jin and Jiaao Chen each clear 5,000; combined h-index reaches 24.
8 / 13 doctorates
From MIT (3), Purdue (2), Georgia Tech / Stanford, Toronto, NTU — concentrated, not scattered.
34 patents filed
Wei‑Chen Wang alone — 23 granted — on quantization, memory, and inference techniques.
5+ named systems
AWQ · MAgent · SpAtten · TextFooler · TinyChatEngine — each with thousands of stars or industry adoption.
6 venues
Hardware (ISSCC, HPCA, MICRO) · Systems (MLSys, ASPLOS, EuroSys) · ML (NeurIPS, ICML, ICLR, CVPR) · NLP · Security.

3.3 — Product surface

Three planes, one vision.

The team called it Artificial Efficient Intelligence — full-stack optimization from data, through training, to the GPU kernel. Each plane is owned by people from the network.

Headline claim: 10x faster inference, 10x lower cost vs. baselines on open frontier models.

P1
EigenData
Di Jin · Jiaao Chen
A self-evolving multi-agent platform for function-calling data. Audited the Berkeley Function-Calling Leaderboard and flagged 71.5% of samples as containing critical errors.
P2
EigenTrain
Jiaao Chen · Di Jin
Controlled fine-tuning and RL post-training workflows. Function-calling on LLAMA 4 Maverick: BFCL 50 → 72.
P3
EigenInference
Wei-Chen · Ryan · Zilin · Jiacheng
AWQ quantization, KV-cache, custom CUDA kernels, speculative decoding. Topped Artificial Analysis leaderboards on Kimi K2.6 with Nebius.

3.4 — The acquisition thesis

Nebius paid $643M for a team they had partnered with six weeks earlier — ~$32M per employee, to compress two-thirds of all future AI compute spend onto their GPU fleet.

3.5 — Buyer rationale

Right capability, right buyer.

Nebius runs a GPU “token factory” — a neocloud where margins are thin and efficiency is the product. Every gain in inference cost and speed compounds straight into capacity and margin.

And this caliber of inference talent is rare — most of it sits inside Google, Anthropic, OpenAI, and Fireworks. Right team, right buyer, right moment.

Capability acquired | Why it mattered to Nebius
Quantization · AWQ | Lower serving cost on every token.
GPU kernels | Higher utilization of the existing fleet.
Post-training | Better model behavior and accuracy.
Function-calling data | Enterprise-grade agent usability.
Patents · 23 granted | Track record of fileable innovation.
Dense ~20-person team | Fast integration, minimal overhead.

§ 02 — Section two of five

The inference stack, covered end to end by one small team.

Most teams own one layer. This team owns the entire vertical — quantization, sparsity, kernels, post-training, serving, on-device.

6.1 — Vertical integration

Every layer has a name — and that name built the tool.

L7
On-device / TinyML
Wei-Chen Wang
TinyChatEngine, TinyEngine — pure C++ runtimes on ARM, x86, CUDA.
L6
Post-training
Jiaao Chen · Di Jin
LLAMA 4 Maverick function-calling (BFCL 50 → 72), EigenData, EigenLoop.
L5
Distributed inference
Jiacheng Yang
ScaleFusion, StreamFusion — sequence parallelism for diffusion transformers.
L4
Model compression
Wei-Chen Wang
AWQ — 3.5k★, MLSys Best Paper. The de-facto LLM weight-quantization standard.
L3
Sparse inference
Ryan Wang · Samir Khaki
SpAtten (most-cited HPCA paper since 2020), SparseLoRA, SparseRefine.
L2
GPU kernels & compilers
Zilin Shen · Jiacheng Yang
FP4 Attention on Blackwell B200, Minuet sparse-convolution CUDA kernels.

6.2 — Proof of capability

AWQ — the standard.

AWQ is already the production standard for LLM weight quantization. Having one of its co-creators on staff means the team can push the technique further than anyone using it as a black box.

It's proof of capability — a team that turns research into systems the industry runs on, and keeps innovating as architectures shift.

GitHub stars
3.5k
AWQ repo · mit-han-lab/llm-awq
Recognition
MLSys
Best Paper
Adopted across production LLM serving stacks.
Patents filed
34
By Wei-Chen Wang — quantization, memory, inference.
Patents granted
23
Of 34 filed — a track record of fileable innovation.

§ 03 — Section three of five

How Ryan turned a lab into a company.

Seven of ten technical hires trace back to a single advisor's research group. The remaining three were deliberate hires for gaps the lab did not already cover.

4.1 — Hiring topology

One node. Four rings.

Ryan Wang did not post job listings. He hired people he had already worked with — or whose code he had already read.

i. Lab alumni: direct collaborators inside MIT HAN Lab
ii. Lab collaborators: one hop out (joint projects, shared GitHub orgs)
iii. Domain specialists: recruited for specific gaps in the stack
iv. GTM: a single sales veteran, hired late
Ryan Wang · Founder · CEO
Wei-Chen Wang · Yilian Liu · Jinglei Cheng · Jiacheng Yang · Samir Khaki · Mingye Gao · Di Jin · Jiaao Chen · Zilin Shen · Alexandra Yang · Rachel Liu

4.2 — Circle one

Lab alumni — people Ryan had already shipped with.

i.01
Wei-Chen Wang
MIT HAN Lab postdoc
Co-creator of AWQ and TinyEngine alongside Ryan. The quantization stack itself walked over with him.
i.02
Yilian Liu
HAN Lab contributor
Contributor to TorchQuantum — already inside the codebase before the company existed.
i.03
Jinglei Cheng
Purdue · quantum overlap
Quantum computing collaborator on Ryan's QuantumNAS work — adjacent research lineage.

These three hires required no ramp-up. They had read the papers, used the codebases, and worked alongside the founder before day one.

4.3 — Circle two

Lab collaborators — one hop out from the node.

ii.01
Jiacheng Yang
UofT PhD · prior MIT stint
A 2018 research stint at MIT HAN Lab preceded his Toronto doctorate. Compilers, distributed inference.
ii.02
Samir Khaki
UofT MASc · Google / IBM
SparseRefine was published under the mit-han-lab GitHub org — a code-level collaborator.
ii.03
Mingye Gao
MIT EECS PhD · cross-domain
Same department, adjacent labs — institutional proximity without direct lab overlap.

Shared context, shared tooling, shared advisors. Three to six months of alignment work, eliminated.

4.4 — Circle three

Domain specialists — the deliberate gaps.

iii.01
Di Jin
MIT (Szolovits) · Meta LLAMA
NLP and LLM alignment — frontier post-training experience walked in the door.
iii.02
Jiaao Chen
Georgia Tech / Stanford · Meta
LLAMA 4 post-training and function-calling: BFCL 50 → 72 under his ownership.
iii.03
Zilin Shen
Purdue PhD · security
Security background plus FP4 Attention kernels targeting Blackwell B200.

These are not network hires — they are surgical fills for layers the lab itself didn't already cover.

4.5 — Circle four

GTM, hired last.

A single enterprise sales veteran joined three months before the acquisition closed. The role was likely less about building a broad pipeline — and more about enterprise credibility, buyer navigation, and closing motion.

Alexandra Yang · GTM
20 yrs enterprise sales
Prior exit 01
Chef → Progress
Prior exit 02
CliQr → Cisco

She had run this exact playbook — startup to acquisition — twice before.

4.6 — Counter-pattern

What Ryan did not optimize for.

— Absent from the roster

  • No VP of Engineering.
  • No staff-level FAANG infra veterans.
  • No general-purpose ML engineers.
  • No marketing, no growth, no recruiting function.

— Present instead

  • ~6 years average technical experience.
  • Researchers who built the tools the industry depends on — not people who used them.
  • Specialists per layer, zero redundancy.
  • Founder-led recruiting from a single network.

§ 04 — Section four of five

The talent map — four schools, three industry pipes.

Eleven of thirteen hires trace to four institutions — but the real map is the trust graph beneath them: advisors, co-authors, and shared codebases.

5.1 — Feeder institutions

Concentration, not breadth.

Top feeder · graduate

4 | MIT | Ryan · Wei-Chen · Mingye · Yilian (via research)
2 | Purdue | Jinglei Cheng · Zilin Shen
2 | Toronto | Jiacheng Yang · Samir Khaki

Undergrad feeders

3 | Tsinghua | Jinglei · Zilin · Di Jin
2 | Shanghai Jiao Tong / Fudan | ACM Class lineage
1 | Nanyang Technological | Wei-Chen Wang

Combined paths

11/13 | from 4 schools | MIT, Tsinghua, Purdue, Toronto
7/10 | tech hires to HAN Lab | direct or one-hop
~6 yr | avg. experience | researchers, not industry vets

The signal isn't the logo on the diploma — it's the collaboration graph underneath it: shared advisors, shared codebases, prior co-authorship. Pedigree matters only when it points to that.

5.2 — Industry pipelines

Three industry pipes carried post-training know-how through the door.

Meta GenAI
LLAMA post-training · Di Jin · Jiaao Chen → Eigen AI
Google
ML infra · tooling · Samir Khaki · Mingye Gao → Eigen AI
Amazon
applied science · intern + FT · Di Jin · Jiaao Chen · Wei-Chen Wang → Eigen AI

Two people who worked on actual LLAMA post-training left Meta for a ten-month-old startup. That is a leading indicator — value creation moved from training to serving.

7.0 — Takeaways

Six lessons for other founders.

01. Hire tool builders, not tool users. Every key hire built a system others depend on. Look for downstream adoption (stars and integrations), not just citation counts.
02. One advisor network can seed a company. Seven of ten technical hires from one lab is not nepotism; it's three to six months of alignment work eliminated.
03. Cover the full stack with specialists. One quantization expert, one kernel engineer, one post-trainer. Zero redundancy, total coverage: that is how a tiny team out-ships a hundred.
04. Frontier-lab departures are a talent signal. Two LLAMA post-training people left Meta for a ten-month-old startup. Track where frontier talent goes next; it leads the wave.
05. Hire GTM late, but pick someone with exits. At ten people you don't need a sales team. When you do hire, hire someone who has closed the acquisition before.
06. Pair papers with patents. Open papers built AWQ's adoption; 23 granted patents turned it into a balance-sheet asset a buyer can underwrite. Most research teams file none.

7.1 — The playbook

For founders building frontier AI infra.

Don't start with “we need five senior engineers.” Start with the bottleneck — then map the people who already own it.

1. Start with the technical bottleneck, not the org chart.
2. Map the research network around that bottleneck.
3. Hire people who built adopted tools, not just wrote papers.
4. Hire for layer coverage, not duplicate seniority.
5. Add GTM only when there's buyer pull or strategic value.
6. Treat patents, OSS, benchmarks, and deployments as distinct proof.

8.0 — In one frame

The pattern, end to end.

Input
One advisor's
research lab.
MIT HAN Lab. A decade of shared codebases and citations.
Method
Concentric
hiring.
Alumni, then collaborators, then specialists, then GTM — in that order.
Coverage
Full inference
vertical.
Six layers, named owners, named tools at every layer.
Outcome
$643M
in 10 months.
Acquired by Nebius. ~$32M per employee. The team was the asset.

9.0 — Methodology & limitations

How this read was assembled.

— Sources

  • Public LinkedIn profiles and work history.
  • Publications, citations, and conference records.
  • GitHub organizations and open-source adoption.
  • Personal websites and academic homepages.
  • Patent filings and grants.
  • Company posts and public acquisition materials.

— Limitations

  • Some team members or internal contributions may be missing.
  • This is not a complete org chart.
  • The goal is a pattern-level read on how the team was formed — not a roster.

Base to Base · Recruiting

The best early AI teams aren't built by broad sourcing. They're built by reading where technical trust already exists — labs, codebases, paper authorship, OSS adoption, and frontier-lab departures.

That graph often shows where company value is forming before the market narrative catches up.

— Closing

Twenty people,
one network,
full-stack coverage
and six hundred and forty-three million dollars.

Report
Eigen AI · Talent Analysis
Prepared by
Base to Base · Recruiting