Embedding Platform Engineers

Embedding platform engineers doesn’t fix your platform team bottleneck. It just makes the dysfunction harder to see.

The logic is appealing: attach an infra engineer to each squad. Guaranteed bandwidth, tight feedback loops, problem solved? But now you have duplicated your platform problems, once for each squad.

What embedding actually creates

By default, each embedded engineer builds what their squad needs, in the way that makes sense to them. Without shared patterns and without platform stewardship. Over months you get as many infrastructure dialects as you have squads. Nobody owns the whole. Nobody has incentives to.

The embedded engineer becomes a single point of failure. When they leave, nobody understands what they built.

Skills atrophy in isolation. Platform engineering rewards specialists who sharpen their tools across a wide surface area. That work — incident patterns, tooling investment, cross-functional coordination — simply isn’t available when you’re scoped to one squad’s needs.

And the platform never gets healthier. Nobody’s migrating off legacy systems. Nobody’s uplifting the security baseline. Nobody’s thinking about where this is all going in two years. That work isn’t anyone’s job anymore.

Central infra teams can be worse

Pull the embedded engineers back and you might land somewhere worse: a central infra team running on a ticket queue. Requests in, infrastructure out. No design input. No pushback. No ownership of outcomes.

These teams build what they’re asked to build. They implement designs handed to them by squads who know just enough about infrastructure to be dangerous. They ship components, not systems. A deployment touches three repos owned by three different teams and nobody can release without complex run-sheets.

Developers can’t move. Every feature that needs new infrastructure waits in a queue. While waiting, developers pick up the next thing — work-in-progress balloons, context switches multiply, quality goes down. The organisation moves slower than it did with embedded engineers.

This is what Lean thinkers describe as resource-optimisation instead of flow-optimisation — a concept the DevOps Handbook draws on heavily. The team’s utilisation looks great. But you have queues, long lead times and blocked work. Delivery is broken. This is what happens when you solve coordination problems with process instead of design.
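
The utilisation trap can be made concrete with a little queueing arithmetic. The sketch below is only an illustration. It assumes the ticket queue behaves like a textbook single-server M/M/1 queue (random arrivals, one team working tickets one at a time). That model, the function name `mean_queue_wait` and the one-ticket-per-day service rate are assumptions made for the example, not something the sources above prescribe.

```python
def mean_queue_wait(arrival_rate: float, service_rate: float) -> float:
    """Mean time a ticket waits before work starts, in an M/M/1 queue.

    Both rates are in tickets per day. Assumes a single-server queue with
    random arrivals and service times; real teams are messier than this.
    """
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("arrival_rate must be >= 0 and service_rate must be > 0")
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrivals outpace capacity")
    utilisation = arrival_rate / service_rate
    # Standard M/M/1 result: Wq = rho / (mu - lambda)
    return utilisation / (service_rate - arrival_rate)


if __name__ == "__main__":
    service_rate = 1.0  # hypothetical: the team closes one ticket per day
    for utilisation in (0.5, 0.8, 0.9, 0.95):
        wait = mean_queue_wait(utilisation * service_rate, service_rate)
        print(f"utilisation {utilisation:.0%}: mean wait {wait:.1f} days")
```

Under those assumptions, pushing the team from 50% to 95% utilisation moves the mean wait from about one day to about nineteen. The team looks busier and every squad waits longer.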

How platform-product teams solve it

A real platform team treats developers as customers and infrastructure as a product. Evan Bottcher put it plainly: a platform is “a foundation of self-service APIs, tools, services, knowledge and support which are arranged as a compelling internal product”. Built that way, it avoids the “backlog coupling” described above, where every squad’s delivery waits on the infra team’s queue.

That means building repeatable patterns that squads can deploy themselves — not doing it for them. Skelton and Pais call this X-as-a-Service: the platform team owns the path, the squad owns the journey. When the platform team builds infrastructure on a squad’s behalf, it owns the outcome. When squads deploy a golden path the platform team built, the squad owns the outcome — and the platform team improves the golden path given customer and security feedback.

Security, compliance, cost guardrails and observability aren’t negotiated per ticket. They’re baked into the patterns. As Charity Majors puts it: “MAKE IT EASY TO DO THE RIGHT THING AND HARD TO DO THE WRONG THING.” Template repos, pipelines, and reusable libraries are how that principle becomes architecture as code.
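
As a rough sketch of what “baked into the patterns” can look like, the snippet below shows a hypothetical platform library function that expands a squad’s short request into a full service definition. Every name here (`ServiceRequest`, `render_service`, the config keys) is invented for illustration and does not come from any real tool. The point is only that the guardrails are set by the pattern and the squad supplies just what is specific to its service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceRequest:
    """What a squad supplies. Everything else comes from the platform."""
    name: str
    owner_squad: str
    cost_centre: str
    public: bool = False  # public exposure is opt-in, never the default


def render_service(request: ServiceRequest) -> dict:
    """Expand a squad's request into a full (hypothetical) service definition."""
    if not request.name.replace("-", "").isalnum():
        raise ValueError("service name must be alphanumeric with hyphens")
    if not request.cost_centre:
        raise ValueError("cost_centre is required so spend is attributable")
    return {
        "name": request.name,
        "tags": {
            "owner": request.owner_squad,
            "cost_centre": request.cost_centre,
        },
        # The guardrails below are fixed by the pattern, not chosen per request.
        "encryption_at_rest": True,
        "structured_logging": True,
        "metrics_endpoint": "/metrics",
        "ingress": "public" if request.public else "internal",
    }
```

A squad would call something like `render_service(ServiceRequest("payments-api", "payments", "cc-123"))` and get encryption, logging and cost tags without asking for them. Doing the wrong thing, such as skipping the cost centre, fails fast instead of becoming a ticket.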

The operating model needs to match. That means:

The team swarms on incidents and prioritises learnings
A support channel with SLAs that unblock small issues fast
Cross-team operations reviews to maintain platform health
Cross-team forums and communities of practice for design decisions that affect multiple teams
Cross-team collaboration for proofs of concept, new designs and early adoption of new platform products
Architecture reviews for new vendors and anything that could impact the business and its integrations
A backlog informed by developer surveys and engineering lead input, not just the “highest paid person”
Protected capacity for platform-wide uplift

A good platform team explicitly divides its capacity. A meaningful, protected share of the team is dedicated to platform product health and continuous improvement: migrations, adoption, tooling, tech debt and security uplift. Minor products and features for squads are prioritised to maximise platform utility. Major new capabilities are roadmapped and funded as cross-functional projects, augmented with additional capacity as needed.

A good platform team can also manage their engineers’ capacity for long-term growth and sustainability. A cohesive team can pair on hard problems, rotate engineers across different areas, and choose initiatives that develop the skills they’ll need next. Someone junior can shadow a complex migration. Someone senior can step back from delivery to sharpen the tooling. None of this is available when engineers are scattered across squads and their growth is driven by whatever their squad needs that quarter.

For genuinely novel work — when a squad is moving fast and the scope isn’t clear — timeboxed collaboration makes sense. Skelton and Pais are explicit: collaboration mode with squads is for discovering unknowns, not for permanent attachment. Platform teams pair closely with squads for timeboxed proofs of concept and solution design. The goal is to learn enough to build a pattern, then step back.

Can embedding ever work?

Theoretically, yes — if the team has strong platform-craft and treats standards as non-negotiable. In practice, I haven’t seen it work.

The “King Solomon’s Baby” problem: the embedded engineer gets torn between two masters, platform obligations and squad obligations. Two standups. Two Slack channels. Two leads, neither of whom can fully manage the engineer’s capacity or fully hold them accountable. The engineer belongs to everyone and no one.

The result is usually burnout, resentment, or quiet drift. The engineer slowly becomes a squad engineer who happens to build some bespoke infrastructure.

One mitigation is to keep engineers rotating between squads rather than committing long-term to any one. That reduces the capture problem but introduces its own costs: just as the engineer builds context, they move on, and the squad starts over.

The binary is false

Embedded or central isn’t the real choice. The real choice is whether your platform engineers have a product mindset.

A platform team with that mindset delivers most of what embedding promises (fast feedback, relevant patterns, reserved capacity for complex work) without the fragmentation. Squads stay autonomous. The platform stays coherent. Skills compound instead of diverging.

The test is simple: can a new engineer safely deploy a new service within four hours of onboarding? Can a security hotfix be rolled out to all services in a day? If not, you have a platform product problem. Platform engineers have been building bespoke infrastructure instead of building reusable patterns that optimise production deployments. Embedding engineers only fragments this problem across every squad and dilutes focus.
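
The two questions above can be turned into a check you run against your own measurements. This is a minimal sketch: the four-hour and one-day thresholds come from the test as stated, while the names `PlatformMeasurements` and `failed_checks`, and the way the two durations are measured, are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the test in the text above.
MAX_FIRST_DEPLOY_HOURS = 4
MAX_HOTFIX_ROLLOUT_HOURS = 24


@dataclass(frozen=True)
class PlatformMeasurements:
    # Hours from a new engineer's onboarding to their first safe deploy
    first_deploy_hours: float
    # Hours from a security fix being ready to it running in every service
    hotfix_rollout_hours: float


def failed_checks(m: PlatformMeasurements) -> List[str]:
    """Return the parts of the test this platform fails (empty means pass)."""
    failures = []
    if m.first_deploy_hours > MAX_FIRST_DEPLOY_HOURS:
        failures.append("new engineer's first deploy took longer than four hours")
    if m.hotfix_rollout_hours > MAX_HOTFIX_ROLLOUT_HOURS:
        failures.append("security hotfix took longer than a day to reach all services")
    return failures
```

For example, `failed_checks(PlatformMeasurements(first_deploy_hours=30, hotfix_rollout_hours=12))` reports only the onboarding failure. Any non-empty result points at the platform product, not at the squads.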