Cline Breaks Its Agent Free From the IDE: The Open-Source SDK That Lets AI Sessions Outlive Any App
Agent sessions that die when a developer closes a tab are no longer an architectural inevitability. Cline published its @cline/sdk on May 14, 2026 — this AI Universe briefing, dated 2026-05-16, summarizes that release — extracting the internal agent harness that previously lived inside its IDE extension into a standalone, open-source TypeScript SDK. The consequence is direct: long-running agent work no longer dies with a UI restart, and sessions can move across surfaces because the agent loop is stateless by design. That is the real story, and it is an architectural one.
The benchmark numbers sharpen the case. Cline’s CLI, running claude-opus-4.7, scored 74.2% on Terminal Benchmark 2.0 — a standard tracked at tbench.ai — against Anthropic’s own published score of 69.4% for Claude Code on the same model. On claude-opus-4.6, Cline CLI reached 71.9% versus Claude Code’s published 65.4%. The Cline team’s own framing for this effort is two words: “Rebuilding the foundation.”
The SDK is available now via npm install @cline/sdk, requires Node.js 22 or later, and supports Anthropic, OpenAI, Google, AWS Bedrock, Mistral, and any OpenAI-compatible endpoint. Full documentation lives at docs.cline.bot/sdk and the official announcement at https://cline.bot/blog/introducing-cline-sdk-the-upgraded-agent-runtime.
A Four-Layer Stack Built to Separate Concerns — and Survive Restarts
The @cline/sdk is not a thin wrapper. It is a deliberately layered TypeScript stack composed of four discrete packages, each with a bounded responsibility. At the foundation sits @cline/shared, which carries types, schemas, tool helpers, hook contracts, and extension registration utilities, with no dependencies on any higher layer. Above it, @cline/llms owns the provider gateway and model catalogs — covering Anthropic, OpenAI, Google, AWS Bedrock, Mistral, LiteLLM, and OpenAI-compatible endpoints such as vLLM, Together, and Fireworks. Crucially, all provider logic is kept out of the agent loop, so switching providers is a configuration change, not a code change.
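To make the "configuration change, not a code change" point concrete, here is a minimal sketch of a provider gateway. Every name in it (Provider, GatewayConfig, ask, the stub providers) is hypothetical and is not taken from @cline/llms. Calls are synchronous only to keep the sketch short.

```typescript
// Hypothetical sketch: none of these names come from @cline/llms.
type ChatMessage = { role: "user" | "assistant"; content: string };

interface Provider {
  // Real providers would be async and call a vendor HTTP API; this is a stub.
  complete(model: string, messages: ChatMessage[]): string;
}

// Stand-in providers keyed by name. They only echo what they were asked.
const providers: Record<string, Provider> = {
  anthropic: {
    complete: (model, messages) => `[anthropic:${model}] ${messages.length} message(s)`,
  },
  "openai-compatible": {
    complete: (model, messages) => `[openai-compatible:${model}] ${messages.length} message(s)`,
  },
};

interface GatewayConfig {
  provider: string;
  model: string;
}

// The caller depends only on the config value, never on a concrete provider.
function ask(config: GatewayConfig, prompt: string): string {
  const provider = providers[config.provider];
  if (!provider) throw new Error(`Unknown provider: ${config.provider}`);
  return provider.complete(config.model, [{ role: "user", content: prompt }]);
}

// Switching providers means changing this object, not the ask() call site.
const viaAnthropic = ask({ provider: "anthropic", model: "claude-opus-4.7" }, "hello");
const viaCompatible = ask({ provider: "openai-compatible", model: "kimi-k2.6" }, "hello");
```

The agent-facing code path (ask) is identical in both calls; only the configuration differs, which is the property the layering is meant to preserve.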
The third layer, @cline/agents, is the stateless agent execution loop itself: it handles iteration, tool orchestration, and event emission, but deliberately does not own session storage, built-in file or shell tools, or Node-specific orchestration. That separation is what makes it embeddable in browser environments. The top layer, @cline/core, is the Node runtime and orchestration layer, responsible for sessions, storage, built-in tools, hub and remote transports, automation and scheduling, telemetry, and plugin and extension loading. Developers who only need a browser-compatible stateless loop can install just @cline/agents, @cline/shared, and @cline/llms; those who only need an LLM proxy layer can install @cline/llms and @cline/shared alone.
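What "stateless" buys can be sketched in a few lines. The example below is illustrative only: LoopState, step, run, and the toy model and tools are hypothetical names, not the @cline/agents API. The idea is that each iteration is a pure function of a serializable state, so whoever owns storage can persist that state and resume it in another process.

```typescript
// Hypothetical sketch: these types and functions are not the @cline/agents API.
type Message = { role: "user" | "assistant" | "tool"; content: string };

// Everything the loop needs lives in this plain, JSON-serializable value.
interface LoopState {
  messages: Message[];
  done: boolean;
}

type ModelReply =
  | { kind: "tool_call"; tool: string; input: string }
  | { kind: "final"; text: string };
type Model = (messages: Message[]) => ModelReply;
type Tools = Record<string, (input: string) => string>;

// One iteration: a pure function of (state, model, tools) returning the next state.
function step(state: LoopState, model: Model, tools: Tools): LoopState {
  if (state.done) return state;
  const reply = model(state.messages);
  if (reply.kind === "final") {
    return { messages: [...state.messages, { role: "assistant", content: reply.text }], done: true };
  }
  const tool = tools[reply.tool];
  const output = tool ? tool(reply.input) : `unknown tool: ${reply.tool}`;
  return {
    messages: [
      ...state.messages,
      { role: "assistant", content: `call ${reply.tool}(${reply.input})` },
      { role: "tool", content: output },
    ],
    done: false,
  };
}

// Drive the loop until it finishes or the step budget runs out.
function run(state: LoopState, model: Model, tools: Tools, maxSteps = 10): LoopState {
  let current = state;
  for (let i = 0; i < maxSteps && !current.done; i++) current = step(current, model, tools);
  return current;
}

// Toy model: calls the "echo" tool once, then finishes.
const toyModel: Model = (messages) =>
  messages.some((m) => m.role === "tool")
    ? { kind: "final", text: "done" }
    : { kind: "tool_call", tool: "echo", input: "hello" };
const toyTools: Tools = { echo: (input) => input.toUpperCase() };

// Simulate a restart: take one step, persist the state, then resume from the saved copy.
const start: LoopState = { messages: [{ role: "user", content: "say hello" }], done: false };
const saved = JSON.stringify(step(start, toyModel, toyTools));
const resumed = run(JSON.parse(saved) as LoopState, toyModel, toyTools);
```

Because the loop holds no hidden state, the resumed run ends in the same place as an uninterrupted one. In the SDK's layering, the persistence step would belong to the runtime layer rather than to the loop itself.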
The SDK also ships native multi-agent support (agent teams and subagents), alongside a plugin manifest format called cline.plugins and two exported functions, registerProvider and registerModel, for extending its provider and model registry at runtime. Full TypeScript types are included, and plain JavaScript is also supported. Installation works with npm, yarn, pnpm, or bun, and requires an API key from at least one LLM provider.
Benchmark Gaps, Portability Trade-offs, and the Friction of Adoption
The Terminal Benchmark 2.0 results deserve careful reading. On kimi-k2.6, Cline scored 55.1%, compared to Pi-Code’s 45.5% and OpenCode’s 37.1% on the same model. That 18.0-point gap over OpenCode (and 9.6 points over Pi-Code) on a single model is not a rounding error; it suggests the harness itself contributes meaningfully to task completion, independent of the underlying model. The implication for developers is that the runtime layer is no longer a neutral substrate: it is a performance variable.
The critical angle here is worth naming plainly: Cline has chosen depth over simplicity. A monolithic agent design offers faster initial setup; the @cline/sdk’s four-package layered structure introduces dependency management complexity and requires developers to internalize which layer handles which concern. The npx skills add cline/sdk-skill command partially addresses this by allowing Claude Code, Codex, or Cline itself to understand the SDK’s APIs for scaffolding agents and wiring up plugins, but that is a convenience layer on top of an architecture that still demands deliberate adoption. The trade-off is explicit: developer adoption friction in exchange for enhanced architectural robustness and the ability to migrate agent state across diverse environments.
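One way to internalize the layering is to write it down as a dependency map and ask which packages a given entry point pulls in. The sketch below does that. The edges are an assumption inferred from the install subsets described in this article, not read from the published package manifests, and installSet is a hypothetical helper.

```typescript
// Assumed dependency edges, inferred from the install subsets the article describes.
// The real package.json files may differ.
type Pkg = "@cline/shared" | "@cline/llms" | "@cline/agents" | "@cline/core";

const dependsOn: Record<Pkg, Pkg[]> = {
  "@cline/shared": [],
  "@cline/llms": ["@cline/shared"],
  "@cline/agents": ["@cline/llms", "@cline/shared"],
  "@cline/core": ["@cline/agents", "@cline/llms", "@cline/shared"],
};

// Returns the entry package plus everything it transitively needs, sorted for stable output.
function installSet(entry: Pkg): Pkg[] {
  const seen = new Set<Pkg>();
  const visit = (pkg: Pkg): void => {
    if (seen.has(pkg)) return;
    seen.add(pkg);
    dependsOn[pkg].forEach((dep) => visit(dep));
  };
  visit(entry);
  return Array.from(seen).sort();
}

const browserLoop = installSet("@cline/agents"); // stateless loop for browser use
const llmProxy = installSet("@cline/llms"); // provider gateway only
const fullRuntime = installSet("@cline/core"); // Node runtime with sessions and tools
```

Under these assumed edges, the browser and proxy subsets come out as the same package lists the article names, and only the full runtime pulls in all four.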
| Tool | Key Difference | Best For |
|---|---|---|
| Cline CLI (claude-opus-4.7) | 74.2% on Terminal Benchmark 2.0; stateless, portable agent loop | Multi-surface, long-running agent workflows |
| Pi-Code (kimi-k2.6) | 45.5% on Terminal Benchmark 2.0 | Lighter terminal agent use cases |
| OpenCode (kimi-k2.6) | 37.1% on Terminal Benchmark 2.0; lowest of the three on this model | Simpler, lower-complexity terminal tasks |
📊 Key Numbers
- 74.2%: Cline CLI score on Terminal Benchmark 2.0 running claude-opus-4.7, 4.8 points above Anthropic’s published Claude Code score of 69.4% on the same model
- 71.9%: Cline CLI score on Terminal Benchmark 2.0 running claude-opus-4.6, 6.5 points above Claude Code’s published 65.4% on the same model
- 55.1% vs 37.1%: Cline versus OpenCode on kimi-k2.6 on Terminal Benchmark 2.0, an 18-point gap attributable to the runtime harness, not the model
- 45.5%: Pi-Code score on Terminal Benchmark 2.0 with kimi-k2.6, placing it between Cline and OpenCode
- Node.js 22 or later: Minimum runtime requirement for the full @cline/core layer; the browser-compatible subset (@cline/agents) has no Node dependency
- 4 packages: @cline/shared, @cline/llms, @cline/agents, @cline/core, each independently installable for targeted use cases
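The point gaps in this list are plain subtraction, and they are easy to re-derive. A small sketch, using only the scores cited in this article (the Comparison type and gap helper are illustrative names):

```typescript
// Scores as cited in this article (Terminal Benchmark 2.0, percent).
interface Comparison {
  model: string;
  cline: number;
  other: string;
  otherScore: number;
}

const comparisons: Comparison[] = [
  { model: "claude-opus-4.7", cline: 74.2, other: "Claude Code", otherScore: 69.4 },
  { model: "claude-opus-4.6", cline: 71.9, other: "Claude Code", otherScore: 65.4 },
  { model: "kimi-k2.6", cline: 55.1, other: "Pi-Code", otherScore: 45.5 },
  { model: "kimi-k2.6", cline: 55.1, other: "OpenCode", otherScore: 37.1 },
];

// Round to one decimal so floating-point noise does not leak into the result.
const gap = (c: Comparison): number => Math.round((c.cline - c.otherScore) * 10) / 10;

// One line per comparison, e.g. "<model> vs <other>: +<gap>".
const gaps = comparisons.map((c) => `${c.model} vs ${c.other}: +${gap(c).toFixed(1)}`);
gaps.forEach((line) => console.log(line));
```

These are differences in percentage points on a single benchmark, so they say how far apart two harnesses landed on the same model, not how much better one is in relative terms.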
🔍 Context
The Terminal Benchmark 2.0 scores referenced here are drawn from tbench.ai, the benchmark’s tracking site, and compared against Anthropic’s own published results for Claude Code — making the evaluator a third-party standard rather than Cline’s internal testing alone. The specific gap this SDK addresses is the tight coupling between agent logic and its host environment: previously, an agent session running inside a VS Code extension would not survive a restart or transfer to a CLI or Kanban surface. The @cline/sdk resolves this by isolating the stateless agent loop in @cline/agents, which carries no session storage or Node-specific dependencies. This release responds directly to a trend in agentic AI development where developers are building workflows that span terminals, browsers, and CI pipelines simultaneously — environments that a single IDE extension cannot serve. Rather than competing with a named commercial rival, the architectural contrast here is with bespoke, monolithic agent integrations that embed all logic inside a single application layer, making state migration and surface portability structurally impossible without a full rewrite. The release is tied to a concrete product milestone: Cline’s CLI and Kanban are already running on the SDK, with IDE extensions actively being migrated.
💡 AIUniverse Analysis
Our reading: ★ LIGHT — The genuinely new mechanism here is the separation of the agent loop from session ownership. By making @cline/agents stateless and browser-compatible while pushing session storage and transport into @cline/core, Cline has created a runtime where the same agent logic can execute in a terminal, a browser tab, or a serverless function without modification. That is not a marketing claim — it is a direct consequence of the layered architecture, and the benchmark numbers suggest the harness itself adds measurable task-completion capability beyond what the underlying model provides alone.
★ SHADOW — The fine print is the adoption cost. A four-package dependency graph with distinct installation paths for browser, Node, and proxy use cases is not a drop-in replacement for a monolithic agent script. Developers who need the full @cline/core layer are committing to Node.js 22 or later and to Cline’s specific plugin and extension model — including the cline.plugins manifest format and the registerProvider / registerModel registry pattern. The benchmark scores are compelling, but they were produced by Cline’s own CLI, not by third-party developers building on the SDK. Whether independent implementations achieve comparable results on Terminal Benchmark 2.0 remains unverified. A cautious engineering lead would also note that multi-LLM provider support — spanning Anthropic, OpenAI, Google, AWS Bedrock, Mistral, and OpenAI-compatible endpoints — introduces configuration surface area that can silently degrade performance if provider-specific behaviors are not accounted for in the agent loop.
For this to matter in 12 months, independent developers building on @cline/sdk would need to demonstrate that the portability guarantees hold in production multi-surface deployments — not just in Cline’s own CLI and Kanban implementations.
⚖️ AIUniverse Verdict
✅ Promising. The stateless agent loop architecture is a real and verifiable mechanism for cross-surface session portability, and the 4.8-point Terminal Benchmark 2.0 gap over Anthropic’s published Claude Code score on claude-opus-4.7 is a concrete data point — but whether third-party developers can replicate those gains building on the SDK, rather than using Cline’s own CLI, is the open question that determines whether this becomes infrastructure or a footnote.
🎯 What This Means For You
Founders & Startups: Founders can now leverage a robust, open-source agent runtime to accelerate the development and deployment of cross-platform AI agents, reducing initial engineering overhead for complex agentic features.
Developers: Developers gain the ability to embed stateless agent loops in various environments, from browsers to serverless functions, with a pluggable architecture for custom tools and LLM providers — install the full SDK with npm install @cline/sdk, the CLI globally with npm i -g @cline, or add SDK awareness to an existing coding agent with npx skills add cline/sdk-skill.
Enterprise & Mid-Market: Enterprises can integrate a standardized agent runtime to build consistent AI-powered workflows across developer tools and command-line interfaces, with the option to install only @cline/llms and @cline/shared for a lightweight LLM proxy layer that avoids the full Node.js 22 dependency.
General Users: Users may experience more persistent AI assistance as agent sessions can follow them across different applications and devices without interruption — because the session state is no longer tied to a single UI process.
⚡ TL;DR
- What happened: Cline extracted its internal agent harness into @cline/sdk, an open-source TypeScript SDK with a stateless agent loop that lets sessions survive UI restarts and move across surfaces.
- Why it matters: Cline CLI running claude-opus-4.7 scored 74.2% on Terminal Benchmark 2.0 versus Anthropic’s published 69.4% for Claude Code, suggesting the runtime layer, not just the model, drives task-completion performance.
- What to do: Run npm install @cline/sdk to evaluate the full stack, or npx skills add cline/sdk-skill to add SDK awareness to Claude Code, Codex, or Cline. Then test your own agent workflows against Terminal Benchmark 2.0 at tbench.ai to check whether the reported gains hold in your environment.
📖 Key Terms
- @cline/sdk: The open-source TypeScript SDK Cline extracted from its IDE extension, composed of four independently installable packages that together provide a portable, stateless agent runtime.
- Agent loop: In this context, the stateless execution cycle inside @cline/agents that handles iteration, tool orchestration, and event emission without owning session storage. That property is what makes sessions portable across surfaces.
- Terminal Benchmark 2.0: A third-party benchmark tracked at tbench.ai that measures how effectively an AI coding agent completes terminal-based tasks; the scores cited here compare Cline CLI against Anthropic’s own published Claude Code results.
- pass@1: A benchmark scoring convention where a model or agent is credited only if it solves a task correctly on its first attempt, without retries. It is the metric underlying the Terminal Benchmark 2.0 percentages in this article.
📎 Sources
Sources: MarkTechPost
Analysis based on reporting by MarkTechPost.

