How MIT’s ScienceClaw Runs Hundreds of AI Agents Without a Central Planner
MIT’s open-source agent swarm replaces the orchestrator with an artifact reactor. The architecture is worth studying even if you’ll never build a science swarm.
TL;DR - On March 15, 2026, a team led by MIT’s Markus Buehler released ScienceClaw + Infinite, an open-source framework where autonomous AI agents conduct scientific research across a registry of more than 300 interoperable tools. The system is Apache 2.0-licensed and built around a coordination pattern most production multi-agent systems don’t use: there is no central planner. Agents broadcast unsatisfied research needs into a shared index, peer agents pick those needs up via schema-overlap matching, and a component called the ArtifactReactor uses pressure-based scoring to bias the swarm toward high-impact directions. Every computation produces an immutable, content-hashed artifact with explicit parent lineage, accumulating in a directed acyclic graph. The repository is research-grade — five GitHub stars, four contributors, fifty-five commits as of early May 2026 — so this is not a drop-in production system. But the coordination pattern is what to take from it. If you are building multi-agent systems where the planner has become a brittle bottleneck, ScienceClaw shows what plannerless coordination via a typed-artifact substrate looks like in practice. Read the paper, skim the repo, port the patterns.
What ScienceClaw actually is
ScienceClaw + Infinite is an open-source multi-agent framework, released by MIT’s Laboratory for Atomistic and Molecular Mechanics in March 2026, where autonomous AI agents conduct scientific investigations across a catalog of more than 300 tools. Agents coordinate without a central scheduler: they broadcast unmet research needs and peer agents fulfill them through schema-matching on artifact types.
The system has three named components: an extensible registry of scientific skills, an artifact layer that preserves full computational lineage as a directed acyclic graph (DAG), and the Infinite platform — a structured space for agent-based scientific discourse with provenance-aware governance. The stack runs on top of OpenClaw, requires Node.js ≥ 22 and Python ≥ 3.8, and supports multiple LLM backends including Anthropic, OpenAI, and Hugging Face models alongside the default OpenClaw runtime. Once installed, each agent runs under a 4-hour heartbeat daemon — scienceclaw-heartbeat.service — that periodically scans for sessions to join, needs to fulfill, and findings to validate.
The paper presents four autonomous investigations: peptide design for the somatostatin receptor SSTR2; lightweight impact-resistant ceramic screening; cross-domain resonance bridging biology, materials, and music; and formal analogy construction between urban morphology and grain-boundary evolution. The third of those produced a concrete output: a de novo Hierarchical Ribbed Membrane Lattice that, when validated with 3D finite-element analysis, resonates at 2.116 kHz and exhibits nine elastic modes in the 2–8 kHz band — relevant to acoustic filtering and bio-inspired sensing. Buehler reports that no human directed the cross-domain mapping, the gap identification, or the design generation.
The plannerless coordination loop
Most production multi-agent frameworks are orchestrator-based. A planner LLM decomposes the user’s request into subtasks, assigns them to agents, and either supervises execution or rewires the plan as new information arrives. AutoGen, CrewAI, and most LangGraph patterns sit in this family. The orchestrator is the throat through which all coordination flows.
ScienceClaw inverts this. There is no planner. Coordination emerges from three primitives: typed artifacts produced by every computation, a global index where agents broadcast unsatisfied information needs, and pressure-based scoring that biases attention toward high-impact directions.
The mechanic is straightforward. When an agent produces an artifact — say, a list of candidate peptide sequences — it is wrapped as an immutable, content-addressed record with typed metadata and parent lineage, then dropped into a shared store. When that agent hits a question it cannot answer with its own skills — say, ADMET prediction — it broadcasts the unmet need into the global index. Peer agents, scanning the index during their own heartbeat cycles, pick up matching needs via the ArtifactReactor, run the fulfilling skill, and post the result as another comment on the same Infinite thread, creating a growing, traceable conversation between agents that never explicitly assigned each other tasks. Schema-overlap matching does the routing: when one agent posts an artifact whose schema is a downstream input for another agent's skill, the second agent detects the match implicitly.
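To make that loop concrete, here is a minimal sketch of the two primitives that carry the routing: a needs index and schema-overlap matching. The class and field names (Need, Skill, NeedsIndex) are invented for illustration; ScienceClaw's actual data model lives in its artifact layer and will differ.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass(frozen=True)
class Need:
    """An unmet information need broadcast into the shared index."""
    requester: str
    schema: str       # artifact type that would satisfy this need
    thread_id: str    # the Infinite thread where the result gets posted

@dataclass(frozen=True)
class Skill:
    name: str
    inputs: frozenset[str]  # artifact schemas this skill consumes
    output: str             # artifact schema this skill produces

class NeedsIndex:
    """The shared global index: agents post needs, peers poll for matches."""

    def __init__(self) -> None:
        self.open_needs: list[Need] = []

    def broadcast(self, need: Need) -> None:
        self.open_needs.append(need)

    def matches_for(self, skills: list[Skill]) -> list[tuple[Need, Skill]]:
        # Schema-overlap matching: a peer can fulfill a need when one of
        # its skills produces exactly the schema the need asks for.
        return [(need, skill)
                for need in self.open_needs
                for skill in skills
                if skill.output == need.schema]

# A peptide-design agent broadcasts an ADMET need; a peer whose skill
# outputs "admet_predictions" discovers it on its next heartbeat.
index = NeedsIndex()
index.broadcast(Need("peptide-agent", "admet_predictions", "thread-42"))
admet = Skill("tdc_admet", frozenset({"peptide_sequences"}), "admet_predictions")
assert index.matches_for([admet]) == [(index.open_needs[0], admet)]
```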
If the pattern feels familiar, that is because it is. This is a modern blackboard architecture — the 1970s-era pattern where multiple knowledge sources read from and write to a shared substrate — re-implemented for typed LLM agents. Buehler frames it in category-theoretic terms as a pullback: distinct domains (biology, metamaterials, music) become categories of objects, the shared feature space is a functor, and the ArtifactReactor's schema-overlap matching behaves like the universal object connecting them. That is a fancier way to say agents see each other through types, not orchestration.
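For readers who want that claim unpacked, here is one way to read it. This is a gloss, not notation from the paper:

```latex
% Gloss of the pullback claim, not the paper's own notation. Let Bio and
% Mat be categories of domain objects and Feat a category of shared
% feature schemas, with feature-extraction functors
%
%   F : Bio -> Feat,    G : Mat -> Feat.
%
% The pullback is the category of pairs that agree on features:
\[
  \mathrm{Bio} \times_{\mathrm{Feat}} \mathrm{Mat}
  \;=\; \{\, (b, m) \mid F(b) \cong G(m) \,\}
\]
% Schema-overlap matching finds exactly these pairs: a biological
% structure and a material structure that project onto the same artifact
% schema, which is why the matcher behaves like the universal object
% connecting the two domains.
```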
Why this matters: where orchestrators break
Orchestrator-based multi-agent systems work well when the work is well-specified, the agent set is small and stable, and the planning context fits. They fall apart in the opposite regime.
As agent counts grow, the planner’s context bloats with state about every agent’s capabilities, current task, intermediate outputs, and dependencies. Plans get longer, the planner’s reasoning gets shallower per step, and small misroutings compound. Adding a new agent means changing the planner’s prompts or fine-tuning. Removing one means dependency repair. The planner becomes the channel through which all coordination passes — and the single point of contention.
Plannerless coordination relocates that load. Instead of encoding routing in a planner's prompts, ScienceClaw encodes it in the substrate: typed artifacts, schema matches, and pressure scores. Agents see each other through what they produce and what they need, not through a central agenda. An autonomous mutation layer prunes the expanding artifact DAG to resolve conflicting or redundant workflows, and persistent memory lets agents build on prior epistemic states across cycles. The result is an architecture that scales by addition: contribute an agent or a skill, and the swarm reorganizes around it without rewiring.
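The paper does not publish the pressure-scoring function itself, so treat the following as a guess at its shape rather than a reimplementation: pressure plausibly rises with demand for an artifact's schema and decays with age. Every constant and attribute name here is an assumption.

```python
from __future__ import annotations
import math
import time

def pressure_score(artifact, open_needs, now: float | None = None) -> float:
    """Hypothetical pressure score; ScienceClaw's actual ArtifactReactor
    scoring is not documented at this level of detail. The sketch only
    captures the shape of the idea: demand raises pressure, age lowers it.
    """
    now = now if now is not None else time.time()
    # Demand: how many open needs this artifact's schema could satisfy.
    demand = sum(1 for need in open_needs if need.schema == artifact.schema)
    # Recency: exponential decay with a 4-hour half-life, chosen here to
    # match the heartbeat cadence (an assumption, not a documented value).
    age_hours = (now - artifact.created_at) / 3600.0
    recency = math.exp(-age_hours * math.log(2) / 4.0)
    return demand * recency
```

Agents, or the reactor on their behalf, would then rank candidate directions by this score on each heartbeat, biasing the swarm toward the schemas the most peers are asking for.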
There is a second consequence worth pulling out. Every computation in ScienceClaw produces an immutable artifact with explicit parent lineage, accumulating in a directed acyclic graph that preserves the full provenance of every discovery. Provenance is what production AI teams typically bolt on as observability — a tracing layer wrapped around an existing system. Here it is the substrate. The DAG is the coordination medium and the audit log. You cannot have one without the other.
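The artifact primitive itself is simple enough to sketch in a few lines. This shows the pattern, assuming a JSON payload and SHA-256 content addressing; it is not ScienceClaw's on-disk format.

```python
from __future__ import annotations
import hashlib
import json

def make_artifact(schema: str, payload: dict, parents: list[str]) -> dict:
    """Wrap a computation's output as an immutable, content-addressed
    record with typed metadata and explicit parent lineage."""
    body = {
        "schema": schema,    # the artifact's type, which is also the routing signal
        "payload": payload,  # the computation's actual output
        "parents": parents,  # content hashes of the input artifacts
    }
    # Content address: hash a canonical JSON encoding of the record.
    digest = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    return {"id": digest, **body}

# Lineage accumulates as a DAG: each record points at the hashes of the
# artifacts it was derived from, so any finding can be replayed back to
# the raw computation that produced it.
seqs = make_artifact(
    "peptide_sequences",
    {"candidates": ["AGCKNFFWKTFTSC"]},  # somatostatin-14, illustrative
    parents=[],
)
admet = make_artifact("admet_predictions", {"scores": {}}, parents=[seqs["id"]])
```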
How agents actually select tools
The headline question for engineers reading this: how do agents decide which tools to call?
ScienceClaw’s answer is that there is no domain-to-tool routing table. The LLM analyzes the topic and selects three to five skills from the full catalog, with skills auto-discovered from the skills/ directory. The README is explicit: “No hardcoded domain → tool mapping — selection adapts to any research question.” Add a skill folder with a SKILL.md and the catalog picks it up.
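Auto-discovery of that kind is cheap to replicate. A sketch, assuming skills live one folder deep under skills/ and that the SKILL.md body is what gets surfaced to the LLM; the real parser presumably also extracts declared input and output schemas:

```python
from __future__ import annotations
from pathlib import Path

def discover_skills(root: str = "skills") -> dict[str, str]:
    """Treat any folder under skills/ containing a SKILL.md as an
    available skill; the folder name is the skill name."""
    return {
        skill_md.parent.name: skill_md.read_text()
        for skill_md in sorted(Path(root).glob("*/SKILL.md"))
    }

# The resulting catalog (or a condensed summary of it) goes into the
# LLM's context, and the model picks three to five relevant skills per
# research topic. No routing table is consulted at any point.
```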
The catalog spans roughly fifteen tool families covering the working set of a modern computational research lab. Sequence and structural biology are represented by BLAST, UniProt, and PDB; literature by PubMed and ArXiv; cheminformatics by PubChem, ChEMBL, RDKit, and TDC; materials by the Materials Project and NIST WebBook; plus general-purpose web search and data visualization. Each is a thin Python wrapper that exposes a uniform invocation surface. Agents reason about which skills apply, chain them, and produce artifacts at every step.
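Here is what a thin wrapper with a uniform invocation surface looks like in practice: every skill takes a dict of typed inputs and returns a typed artifact, so agents can chain any two skills whose schemas line up. The run signature below is my assumption, though the PubChem PUG REST endpoint it calls is real:

```python
import requests

def run(inputs: dict) -> dict:
    """PubChem lookup skill: compound name in, typed artifact out. The
    uniform dict-in, artifact-out surface is the point; ScienceClaw's
    exact wrapper signature may differ."""
    name = inputs["compound_name"]
    resp = requests.get(
        "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/"
        f"{name}/property/MolecularFormula,MolecularWeight/JSON",
        timeout=30,
    )
    resp.raise_for_status()
    props = resp.json()["PropertyTable"]["Properties"][0]
    return {"schema": "compound_properties", "payload": props, "parents": []}

# run({"compound_name": "aspirin"}) returns an artifact whose payload
# carries CID 2244, MolecularFormula "C9H8O4", and the molecular weight.
```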
There is a separate, smaller decision the system makes at the social layer: role assignment. ScienceClaw exposes five roles — investigator, validator, critic, synthesizer, and screener — assigned at session join time based on each agent's skills and personality. Investigators explore. Validators independently re-verify findings using different tools. Critics challenge logic and propose alternatives. Synthesizers integrate disagreements. Screeners parallelize high-throughput work. Upvotes and downvotes require structured reasoning and citations; they are evidence-backed, not sentiment. Disagreement is preserved as validated, challenged, under review, or disputed rather than forced into unanimity.
This matters for engineers because role-plus-interaction-type is a different shape of coordination than control flow. You are not writing the workflow. You are writing the vocabulary the workflow uses to assemble itself.
The coordination loop, end to end
The eight-step coordination loop runs without a central planner. Skill-based discovery, role assignment, and schema matching happen as side effects of the heartbeat — not as orchestrated control flow. The full loop and its four-layer implementation are documented in the README and the paper.
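Compressed to the three activities the daemon is documented to perform (join sessions, fulfill needs, validate findings), the heartbeat reduces to something like the sketch below. Every method name on agent is invented for illustration; only the cadence and the three activities come from the docs.

```python
import time

HEARTBEAT_SECONDS = 4 * 3600  # the documented 4-hour cadence

def heartbeat(agent) -> None:
    """One cycle of the loop that scienceclaw-heartbeat.service drives.
    The real loop has eight documented steps; this collapses them into
    the three activities named in the README."""
    agent.join_open_sessions()                  # scan for sessions to join
    for need, skill in agent.matching_needs():  # schema-overlap matches
        artifact = skill.run(need)              # fulfill a broadcast need
        agent.post_result(need.thread_id, artifact)
    for finding in agent.findings_to_validate():
        agent.validate(finding)                 # re-verify with different tools

def run_daemon(agent) -> None:
    while True:
        heartbeat(agent)
        time.sleep(HEARTBEAT_SECONDS)
```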
What’s actually shipped — and what to be careful about
The four investigations in the paper are real and worth reading, but the framing matters.
The peptide design investigation targeted SSTR2, a somatostatin receptor with established cancer relevance. The lightweight ceramic work was a screening pipeline. The cross-domain resonance investigation produced the Hierarchical Ribbed Membrane Lattice with the 2.116 kHz primary mode that I mentioned above, and validated the design with finite-element analysis. The urban-morphology-to-grain-boundary work built a formal analogy between two fields with no prior cross-citation. The paper’s core empirical claim is that across these four cases, the framework demonstrates heterogeneous tool chaining, emergent convergence among independently operating agents, and traceable reasoning from raw computation to published finding.
What the paper does not yet show is large-scale cross-institutional coordination. Buehler's announcement describes ScienceClaw + Infinite as a swarm “across institutions, labs and the world”, and the architecture is built for it: anyone can deploy an agent or contribute a skill, and the heartbeat runs 24/7 without a central coordinator. But the four investigations in the paper were produced by Buehler's MIT team. The cross-institutional layer is a design property, not a demonstrated outcome — at least not yet.
The repo state confirms this is early. Five GitHub stars, four contributors, fifty-five commits at the time of writing. Posting to Infinite requires a minimum of 10 karma, which agents must first earn through commenting and voting — a sensible spam guard, but a reminder that the surrounding social layer is also under construction. There are rate limits: one post per 30 minutes, fifty comments per day, two hundred votes per day. This is a research artifact, generously open-sourced, that aligns with the broader DOE Genesis Mission's stated goal of doubling the productivity and impact of American science within a decade, but it is not a production system.
That framing is also the right way to consume it.
The broader OpenClaw scientific ecosystem this sits inside is itself worth knowing about. A bioRxiv paper from late March 2026 catalogued 91 projects and 2,230 skills across 34 scientific categories in the OpenClaw scientific agent ecosystem, and ScienceClaw is one of the more architecturally distinct entries. The pattern across the ecosystem — skill-based agent design where workflows are expressed as structured Markdown files, lowering the barrier to contribution — is what makes the substrate-driven coordination model viable at all. Agents do not need to know about each other in advance because the skill catalog and the artifact types form a shared language.
What production AI engineers should take from this
The patterns transfer even if the framework does not.
Schema-typed artifacts as a routing primitive. The most portable idea in ScienceClaw is that the type of an artifact is the routing signal. If an agent produces a peptide_sequences artifact, any agent whose SKILL.md declares peptide_sequences as an input can pick it up. That removes a layer of planner reasoning. Production multi-agent systems can adopt this without going fully plannerless: type your intermediate artifacts, expose schemas as inputs and outputs, and let the substrate dispatch.
Provenance as substrate, not afterthought. Treat the artifact DAG as the source of truth for both coordination and audit. If your current observability is wrapping logs around an opaque LangGraph state, you are paying twice. ScienceClaw’s pattern — content-hashed, immutable, lineage-preserving artifacts dropped into a shared store — gives you a deterministic replay of any investigation, and the cost is mostly upfront design discipline.
Roles plus interaction types as coordination semantics. The investigator/validator/critic/synthesizer split is a coordination pattern, not a UI metaphor. You can implement it on top of any agent framework: tag each agent’s purpose, define a small interaction-type vocabulary (challenge, validate, extend, synthesize, request_help), and write your prompts to respect those roles. You will find that consensus and disagreement become legible in your traces in a way they typically are not.
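A sketch of that vocabulary as plain, framework-agnostic Python. The role names come from ScienceClaw and the interaction types from the list above; the ALLOWED table is one possible policy, not ScienceClaw's:

```python
from __future__ import annotations
from dataclasses import dataclass
from enum import Enum

class Role(Enum):
    INVESTIGATOR = "investigator"
    VALIDATOR = "validator"
    CRITIC = "critic"
    SYNTHESIZER = "synthesizer"
    SCREENER = "screener"

class Interaction(Enum):
    CHALLENGE = "challenge"
    VALIDATE = "validate"
    EXTEND = "extend"
    SYNTHESIZE = "synthesize"
    REQUEST_HELP = "request_help"

@dataclass
class Message:
    """Evidence-backed interaction: every claim carries structured
    reasoning and citations, so consensus stays legible in traces."""
    sender_role: Role
    kind: Interaction
    claim: str
    reasoning: str
    citations: list[str]

# One possible policy for which roles may emit which interactions.
ALLOWED: dict[Role, set[Interaction]] = {
    Role.INVESTIGATOR: {Interaction.EXTEND, Interaction.REQUEST_HELP},
    Role.VALIDATOR: {Interaction.VALIDATE, Interaction.CHALLENGE},
    Role.CRITIC: {Interaction.CHALLENGE, Interaction.EXTEND},
    Role.SYNTHESIZER: {Interaction.SYNTHESIZE},
    Role.SCREENER: {Interaction.EXTEND, Interaction.REQUEST_HELP},
}

def check(msg: Message) -> None:
    if msg.kind not in ALLOWED[msg.sender_role]:
        raise ValueError(f"{msg.sender_role.value} may not {msg.kind.value}")
    if not msg.citations:
        raise ValueError("votes and claims must cite evidence")
```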
Plannerless is not always the answer. Orchestrator-based architectures still win when the workload is bounded, the agent set is small, and latency matters. Plannerless coordination has overhead — the pressure scoring, the schema matching, the heartbeat cadence — and it works best when the work is open-ended and agents can be added or removed dynamically. Apply it where it fits.
If you want to experiment with these patterns without adopting ScienceClaw wholesale, the cheapest path is to add a needs board to your existing system. Let one agent post what it cannot do; let peer agents pick those needs up on their own schedule. You will learn whether plannerless coordination buys anything for your domain in about a week of work.
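A minimal sketch of such a board, assuming nothing more exotic than SQLite; the table layout and function names are mine, not ScienceClaw's:

```python
from __future__ import annotations
import sqlite3

def init_board(path: str = "needs.db") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS needs (
               id INTEGER PRIMARY KEY,
               requester TEXT NOT NULL,
               schema TEXT NOT NULL,     -- artifact type that would satisfy it
               detail TEXT NOT NULL,
               claimed_by TEXT DEFAULT NULL)"""
    )
    return db

def post_need(db, requester: str, schema: str, detail: str) -> None:
    db.execute(
        "INSERT INTO needs (requester, schema, detail) VALUES (?, ?, ?)",
        (requester, schema, detail),
    )
    db.commit()

def claim_need(db, agent: str, produces: tuple[str, ...]):
    """Claim the oldest unclaimed need this agent can serve. Wrap in a
    BEGIN IMMEDIATE transaction if multiple processes share the file."""
    if not produces:
        return None
    qmarks = ",".join("?" * len(produces))
    row = db.execute(
        f"SELECT id, schema, detail FROM needs "
        f"WHERE claimed_by IS NULL AND schema IN ({qmarks}) "
        f"ORDER BY id LIMIT 1",
        produces,
    ).fetchone()
    if row is not None:
        db.execute("UPDATE needs SET claimed_by = ? WHERE id = ?", (agent, row[0]))
        db.commit()
    return row
```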
FAQ
Is ScienceClaw production-ready? No. Five GitHub stars, four contributors, an academic paper from March 2026, and a Vercel-deployed Infinite platform. Treat it as a reference architecture and a research artifact, not a runtime you deploy this quarter.
How is it different from CrewAI or other frameworks? Most frameworks use orchestrator-based coordination — a central agent decomposes work and assigns it. ScienceClaw uses plannerless coordination via the ArtifactReactor: agents broadcast unsatisfied needs and peers fulfill them via schema-overlap matching, without any planner assigning tasks. The closest analogue is a 1970s blackboard architecture, modernized for typed-artifact LLM agents.
Can I use Claude as the agent backbone? Yes. The repository documents Anthropic, OpenAI, and Hugging Face as supported LLM backends, with OpenClaw as the default runtime. Setup is via LLM_BACKEND=anthropic and the corresponding API key.
Does it actually produce real scientific results? The paper presents four investigations across peptide design, ceramic screening, cross-domain resonance, and urban-morphology analogy, and one of them produced a finite-element-validated metamaterial design with concrete acoustic properties. Whether those count as “real scientific results” depends on your bar: the outputs so far are computationally validated, not wet-lab confirmed. The framework’s contribution is the coordination pattern; the scientific outputs are early demonstrations.
Should I read the paper or the repo first? The paper for the architecture and the experimental results. The repo’s ARCHITECTURE.md and the multi-agent examples in the README for the implementation patterns. Both fit in an afternoon.
Closing
The interesting question is not whether ScienceClaw will become the dominant scientific agent platform. It probably will not, on its own. The interesting question is what production AI engineers should port out of it before someone else does.
Type your artifacts. Make provenance substrate, not observability. Let agents post what they need rather than wait for a planner to figure it out for them. The coordination patterns ScienceClaw demonstrates are old ideas — blackboard architectures, tuple spaces, content-addressable artifacts — applied with discipline to the LLM-agent stack. They were good ideas in 1975 and they remain good ideas now.
If your multi-agent system has a planner that has become the most fragile component in your harness, ScienceClaw is the cleanest open-source reference you can read this month for what the alternative looks like. Read the paper. Skim the repo. Then go look at the planner in your own system and ask what would happen if you replaced it with a needs board, a type system, and a pressure score.


