Dustin Trainor sits down with Laurent Giraid, a technologist steeped in AI systems, machine learning, and the ethics that keep them safe and useful at scale. With MCP crossing its first year and its registry surging to nearly two thousand servers, the conversation spans the hard edges of taking agentic systems from pilots to production: long-running workflows via Tasks, OS-level support in Windows 11, URL-based registration in IAM, secure browser handoffs for sensitive auth, and the visibility practices that keep sprawling integrations from becoming an attack surface. Expect candid stories about unexpected blockers, the nitty-gritty of refactoring synchronous actions, and what it means to reduce lock-in while raising the security bar.
Key themes in this conversation: how MCP’s growth and native OS support changed enterprise rollout dynamics; why Tasks (SEP-1686) finally made multi-hour workflows robust; how URL-based client registration (SEP-991) and URL Mode Elicitation (SEP-1036) shrink approval bottlenecks while satisfying regulated flows; pairing identity, RBAC, and observability from day one; taming integration sprawl with inventories and tagging; embracing Sampling with Tools (SEP-1577) without losing control; and what vendor support across AWS, Microsoft, and Google Cloud means for real portability.
MCP just marked its first year and the registry grew 407% to nearly two thousand servers. What drove that surge, and how did real teams decide to move from pilots to production? Share metrics from your rollout funnels and one story about an unexpected blocker.
The surge was a signal that MCP crossed the trust threshold—teams stopped building brittle, one-off bridges and leaned into a standard that actually met enterprise guardrails. We saw interest turn into sustained adoption once people realized they could connect agents to real systems without a rewrite, then govern those connections with IAM and observability they already understood. Our internal funnel mirrored the broader registry growth—what clicked was demonstrating that the same connector could serve multiple models with no code changes and that Tasks covered multi-hour work without timeouts. The unexpected blocker was cultural: a team insisted on hand-rolled webhooks “just in case,” and the review board froze the launch until we showed how MCP’s polling and cancellation mapped to their operational norms; once we did, the pilot moved to production almost immediately.
Microsoft added native MCP support to Windows 11. How does OS-level support change deployment steps, security posture, and help desk workflows? Walk me through a before-and-after setup, including time-to-enable, policy mapping, and any user training you needed.
Before Windows 11 support, we had a patchwork of client installs, per-user configuration drift, and a lot of help desk pings when certificates or proxies changed. With OS-level integration, baseline configuration is baked into the image, policy inheritance is consistent, and our onboarding shifted from “install and pray” to “enable and verify.” Help desk tickets dropped because machine trust, cert stores, and enterprise proxies are handled by the same policies we already use for browsers and VPN. Training became lighter: instead of a tool-specific walkthrough, we teach a short “responsible use” module and a simple runbook on how to request scopes, all anchored to the same Windows policies people already know.
The new Tasks feature (SEP-1686) enables long-running workflows with states like working and input_required. How did you design retries, cancellation, and status polling at scale? Share concrete SLAs, timeouts, and one incident where Tasks saved a job from failing.
Tasks gave us a standard lifecycle we could reason about. We structured retries around idempotent operations and used the working state as our backoff gate, and cancellation became a first-class citizen instead of an afterthought. For status, polling is predictable and auditable—Ops loves being able to see progress drift and step in when a Task flips to input_required because a human needs to approve a step. We had a migration where a dependent service flapped; Tasks kept the workflow alive, flipped to input_required for a quick parameter fix, and resumed—without Tasks, that same job would have died in a synchronous timeout and required a messy restart.
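To make that polling pattern concrete, here is a minimal sketch. It assumes a hypothetical get_task_status() helper standing in for whatever client call your MCP SDK exposes, and it uses the state names from SEP-1686 (working, input_required, completed, failed) as plain strings; the real client API and payload fields will differ.

```python
import time

# Hypothetical status fetcher; in practice this would call your MCP client.
def get_task_status(task_id: str) -> dict:
    raise NotImplementedError("wire this to your MCP client")

def notify_approver(task_id: str, status: dict) -> None:
    """Hypothetical hook: page the owning team so a human can approve the step."""
    print(f"[input_required] {task_id}: {status.get('message', '')}")

def poll_task(task_id: str, timeout_s: float = 4 * 3600, base_delay_s: float = 2.0) -> dict:
    """Poll a long-running Task, backing off while it reports 'working'."""
    deadline = time.monotonic() + timeout_s
    delay = base_delay_s
    while time.monotonic() < deadline:
        status = get_task_status(task_id)
        state = status.get("state")
        if state == "completed":
            return status                      # terminal: hand back the result
        if state == "failed":
            raise RuntimeError(f"task {task_id} failed: {status.get('error')}")
        if state == "input_required":
            notify_approver(task_id, status)   # human supplies the missing input
        # 'working' (or waiting on input) -> sleep and poll again, capped backoff
        time.sleep(delay)
        delay = min(delay * 1.5, 60.0)
    raise TimeoutError(f"task {task_id} exceeded {timeout_s}s")
```

The cap on backoff keeps status visible to Ops dashboards even on multi-hour runs, while the explicit timeout makes "never finishes" a detectable condition rather than a hung socket.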
Until now, most database actions were synchronous. How did you refactor a multi-hour process—say a codebase migration or medical records analysis—using Tasks? Outline the step-by-step orchestration, monitoring hooks, and the metrics that proved reliability improved.
We split the workflow into explicit phases: extract, transform, validate, and write-back—each a Task with clean inputs/outputs. Orchestration moved from a single long RPC into chained Tasks with checkpoints and a recovery path per phase. Monitoring hooks captured state transitions and artifacts so we could replay or branch on failure; that visibility let us spot hot spots early without trawling logs for lost context. Reliability improved because failures became contained: we could retry a transform without redoing extraction, and the end-to-end run no longer depended on one fragile socket staying open for hours.
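As an illustration of that phase-per-Task shape, here is a sketch under the assumption of hypothetical run_task() and checkpoint helpers; it shows the pattern, not the actual migration code.

```python
import json
from pathlib import Path

PHASES = ["extract", "transform", "validate", "write_back"]
CHECKPOINT = Path("migration_checkpoint.json")

def run_task(phase: str, inputs: dict) -> dict:
    """Hypothetical: submit one phase as an MCP Task and block on its result."""
    raise NotImplementedError

def load_checkpoint() -> dict:
    return json.loads(CHECKPOINT.read_text()) if CHECKPOINT.exists() else {}

def save_checkpoint(state: dict) -> None:
    CHECKPOINT.write_text(json.dumps(state, indent=2))

def run_migration(initial_inputs: dict) -> dict:
    """Chain phases as separate Tasks; a failure resumes at the failed phase."""
    state = load_checkpoint()
    inputs = state.get("last_output", initial_inputs)
    for phase in PHASES:
        if phase in state.get("done", []):
            continue                       # already completed on a previous run
        output = run_task(phase, inputs)   # retrying here redoes only this phase
        state.setdefault("done", []).append(phase)
        state["last_output"] = output
        save_checkpoint(state)             # artifact for replay or audit
        inputs = output
    return inputs
```

The checkpoint per phase is what makes "retry a transform without redoing extraction" possible: the orchestrator resumes from the last saved output instead of replaying the whole pipeline.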
URL-based client registration (SEP-991) replaces heavy Dynamic Client Registration. How did this cut approval bottlenecks in your IAM flow? Describe the metadata document you serve, the review cadence with security, and the measurable reduction in onboarding time.
DCR was bureaucratic by design; SEP-991 turned it into a transparent, self-serve handshake with a stable URL that security can cache and verify. Our metadata document declares the client ID, redirect URIs, supported grant types, and the scopes it may request—plain, reviewable, and diff-friendly. Security reviews moved from ad hoc forms to a scheduled cadence, because the document is the single source of truth and changes are obvious. The best part is predictability: we went from waiting on forms to shipping connectors in the same window we finalize scope reviews, aligned with the registry’s 407% growth momentum rather than fighting it.
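For flavor, here is a sketch of what such a client metadata document might look like, serialized from Python. The field names follow common OAuth client-metadata conventions; the authoritative list lives in SEP-991, so treat this shape as illustrative rather than normative.

```python
import json

# Illustrative client metadata served at a stable, cacheable URL.
client_metadata = {
    "client_name": "orders-connector",
    "client_uri": "https://mcp.example.com/clients/orders-connector",
    "redirect_uris": ["https://mcp.example.com/clients/orders-connector/callback"],
    "grant_types": ["authorization_code"],
    "token_endpoint_auth_method": "private_key_jwt",
    "scope": "orders:read customers:read",   # the most it may ever request
}

if __name__ == "__main__":
    # Diff-friendly output: security reviews compare successive versions of this file.
    print(json.dumps(client_metadata, indent=2, sort_keys=True))
```

Because the document lives at one URL and sorts deterministically, a scope change shows up as a one-line diff in review rather than a new intake form.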
URL Mode Elicitation (SEP-1036) lets servers hand off auth to a secure browser window. How did you implement this for sensitive flows, like payments or HR data, without exposing passwords to agents? Share your token scopes, session lifetimes, and PCI audit evidence.
Handing off to the browser gave us the right separation of concerns: credentials live in a hardened surface we already trust, the agent only ever touches tokens, and those tokens carry least-privilege scopes. We map scope requests to our RBAC catalog and use short-lived sessions that reflect the sensitivity of flows like payroll adjustments or payment approvals. For PCI, the clean boundary—passwords never traverse the agent—allowed us to show clear isolation and review trails, with the browser session and token issuance audited independently. It felt like flipping a lock: once in place, we could expand capability without dragging secrets through the agent’s memory.
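A minimal sketch of that scope-to-RBAC mapping and the short session lifetimes follows; the role names, scopes, and TTLs are hypothetical, chosen to show sensitivity driving the policy, and real values would come from your RBAC catalog and compliance requirements.

```python
from dataclasses import dataclass
from datetime import timedelta

@dataclass(frozen=True)
class FlowPolicy:
    scopes: tuple[str, ...]       # least-privilege scopes the token may carry
    session_ttl: timedelta        # browser session lifetime for this flow
    requires_step_up: bool        # re-authenticate in the browser before issuing

# Hypothetical catalog: sensitivity drives scope width and session lifetime.
FLOW_POLICIES = {
    "payroll_adjustment": FlowPolicy(("hr:payroll.write",), timedelta(minutes=15), True),
    "payment_approval":   FlowPolicy(("payments:approve",), timedelta(minutes=10), True),
    "hr_record_lookup":   FlowPolicy(("hr:records.read",),  timedelta(hours=1),    False),
}

def authorize_elicitation(flow: str, requested_scopes: set[str]) -> FlowPolicy:
    """Reject any token request broader than the catalog allows for this flow."""
    policy = FLOW_POLICIES[flow]
    excess = requested_scopes - set(policy.scopes)
    if excess:
        raise PermissionError(f"scopes {sorted(excess)} exceed policy for {flow}")
    return policy
```

The agent only ever sees the token minted after this check; the credential entry itself stays in the browser surface the policy hands off to.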
Security researchers found ~1,800 MCP servers exposed publicly by mid-2025. How did you audit your estate for exposure, and what controls—network, identity, and server hardening—actually reduced risk? Give specifics on discovery tools, false positive rates, and fixes.
That finding was a wake-up call; we treated it like a perimeter breach drill and started with discovery—enumerating servers against our inventory and the registry to spot anything drifting toward public exposure. Network controls tightened first—deny-by-default ingress and environment-based allowlists, with health checks moved behind private endpoints. Identity did the rest: mutual TLS and enforced registration via SEP-991 so “mystery clients” simply can’t light up. The practical win was operational discipline: once we had a clean list and hardened defaults, the noise fell away and exposed assets stopped being a surprise.
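The discovery step can be as plain as diffing what scanners see against what the inventory says should exist. The sketch below uses made-up hostnames and data shapes; real inputs would come from your scanner and CMDB.

```python
# Flag MCP endpoints visible to external scanning that are not in the approved
# inventory, or that the inventory marks as internal-only.
inventory = {
    "mcp-orders.internal.example.com": {"owner": "payments", "exposure": "internal"},
    "mcp-docs.example.com":            {"owner": "platform", "exposure": "public"},
}

scanned_public_endpoints = [
    "mcp-docs.example.com",
    "mcp-orders.internal.example.com",   # should never be reachable publicly
    "mcp-legacy.example.com",            # not in the inventory at all
]

def audit(scan: list[str], known: dict[str, dict]) -> list[str]:
    findings = []
    for host in scan:
        record = known.get(host)
        if record is None:
            findings.append(f"{host}: unknown server, quarantine and investigate")
        elif record["exposure"] != "public":
            findings.append(f"{host}: exposed but marked {record['exposure']}, close ingress")
    return findings

for finding in audit(scanned_public_endpoints, inventory):
    print(finding)
```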
“AI is only as good as the data it can reach safely.” How did you pair MCP with identity, RBAC, and observability from day one? Walk through a real permissioning model, the dashboards you monitor, and one access review that changed your defaults.
We anchored everything to identity: service principals for agents, human identities for approvals, and scopes mapped directly to RBAC roles on the data plane. Our dashboards focus on three lenses—auth success/error rates, Task lifecycle health, and data access by scope—so we can tell within minutes whether a change breaks intent or authorization. One access review stood out: we discovered a connector requesting broader read scope than necessary; tightening that role didn’t break functionality, but it dramatically reduced the blast radius. It reinforced a habit—start narrow, expand only with evidence that the workflow truly needs it.
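One way to run that kind of access review is to compare granted scopes against scopes actually exercised over a window. The sketch below assumes hypothetical IAM grants and audit-log records; it shows the shape of the check, not the production tooling.

```python
from collections import defaultdict

# Hypothetical inputs: grants from IAM, usage from access logs over, say, 30 days.
granted = {
    "kb-connector":     {"kb:read", "kb:write"},
    "orders-connector": {"orders:read"},
}

access_log = [
    {"client": "kb-connector", "scope": "kb:read"},
    {"client": "orders-connector", "scope": "orders:read"},
    # note: nothing ever used kb:write
]

def unused_scopes(granted: dict[str, set[str]], log: list[dict]) -> dict[str, set[str]]:
    """Grants with no observed use are candidates for tightening."""
    used = defaultdict(set)
    for entry in log:
        used[entry["client"]].add(entry["scope"])
    return {client: scopes - used[client]
            for client, scopes in granted.items()
            if scopes - used[client]}

print(unused_scopes(granted, access_log))   # {'kb-connector': {'kb:write'}}
```

Starting narrow and expanding only with evidence is exactly this loop run in reverse: the review surfaces what was never needed in the first place.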
Sampling with Tools (SEP-1577) lets servers run their own loops using client tokens. What’s a concrete “research server” use case you shipped, and how did you constrain it? Detail token scoping, rate limits, logs you review, and the quality gains you measured.
Our research server takes a question, fans out to approved corpora, and synthesizes a brief—no custom client code, just an internal loop close to where the data lives. We constrained it by scoping tokens to read-only, limiting the sources it can touch, and gating long runs behind Tasks so we can intervene if needed. Logs focus on document provenance and tool-call traces so reviewers can validate sources—if a brief cites a repository or knowledge base, we can prove the trail. Quality jumped because the reasoning happens near the data; the synthesis feels grounded rather than stitched together from shallow snippets.
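A sketch of the source allowlist and provenance log around such a loop, with a hypothetical fetch_document() helper standing in for the actual sampling and retrieval calls:

```python
import json
import time

ALLOWED_SOURCES = {"wiki.internal.example.com", "repo.internal.example.com"}

def fetch_document(source: str, query: str) -> dict:
    """Hypothetical retrieval call into an approved, read-only corpus."""
    raise NotImplementedError

def research(question: str, sources: list[str]) -> dict:
    """Fan out only to approved sources and record provenance for reviewers."""
    provenance = []
    findings = []
    for source in sources:
        if source not in ALLOWED_SOURCES:
            raise PermissionError(f"{source} is not an approved corpus")
        doc = fetch_document(source, question)
        findings.append(doc)
        provenance.append({"source": source, "doc_id": doc.get("id"),
                           "fetched_at": time.time()})
    # Append-only provenance trail: the brief can be traced back to its sources.
    with open("provenance.jsonl", "a") as log:
        for record in provenance:
            log.write(json.dumps(record) + "\n")
    return {"question": question, "findings": findings, "provenance": provenance}
```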
Vendors like AWS, Microsoft, and Google Cloud now back MCP. How has that reduced lock-in for you in practice? Share a case where the same MCP Postgres connector worked across two model providers, including any subtle behavior differences you had to tune.
The promise of portability finally feels real; we used the same MCP Postgres connector across two providers without touching the integration layer. What changed was policy and inference nuance, not plumbing—the connector described itself once, and the clients could reason over it consistently. We saw subtle behavioral differences in how the models asked for clarifications during schema exploration; we tuned prompt scaffolding and let Tasks absorb longer paths when one model preferred more incremental steps. The big picture is freedom of choice: changing models didn’t mean rebuilding the entire bridge to our data.
Mundakkal mentions the “unprecedented infrastructure build-out,” citing OpenAI’s multi-gigawatt Stargate program. How do you plan capacity and data locality for MCP-driven agents against that backdrop? Offer your cost model, cache strategy, and a painful lesson learned.
When compute scales like that, the hidden costs become orchestration and data movement. We bias for locality—put the reasoning near the data and keep intermediate artifacts short-lived—to avoid shuttling huge payloads through distant regions. Caching happens at the boundary of trust: pre-authorized summaries and schema maps live close to the agents, and Tasks keep long work contained. The painful lesson was letting a synchronous job drag data across zones; switching to Tasks with local staging cut the churn and spared us the week of noisy retries.
Upadhyaya argues the next wave is visibility—monitoring MCP uptime and auth flows like APIs. What SLIs/SLOs, traces, and synthetic checks do you run today? Show sample thresholds, alert routes, and the playbook steps you use when auth failures spike.
We treat MCP like an API family: SLIs on auth success, Task completion health, and connector responsiveness, with traces attached to each Task so we can follow the bouncing ball. Synthetic checks simulate both happy-path and failure-path auth to catch regressions before users do. Alerts route to the teams that own identity and the connectors—because fixes live on both sides—and we kick off a playbook that rolls back risky scope changes and pauses nonessential Tasks while we verify. It’s dull, and that’s the point; visible systems stay predictable.
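A sketch of how those SLIs, thresholds, and alert routes might be expressed, with illustrative numbers rather than the team's real SLOs:

```python
from dataclasses import dataclass

@dataclass
class Slo:
    sli: str            # what we measure
    threshold: float    # alert boundary (illustrative, not production values)
    window_min: int     # evaluation window in minutes
    route: str          # who gets paged

SLOS = [
    Slo("auth_success_rate",       0.995, 15, "identity-oncall"),
    Slo("task_completion_rate",    0.98,  60, "connector-owners"),
    Slo("connector_p95_latency_s", 2.0,   15, "connector-owners"),
]

def evaluate(measurements: dict[str, float]) -> list[str]:
    """Return alert routes for any SLI breaching its threshold."""
    pages = []
    for slo in SLOS:
        value = measurements.get(slo.sli)
        if value is None:
            continue
        breached = (value < slo.threshold if "rate" in slo.sli
                    else value > slo.threshold)   # latency SLIs alert on high values
        if breached:
            pages.append(f"page {slo.route}: {slo.sli}={value} over {slo.window_min}m")
    return pages

print(evaluate({"auth_success_rate": 0.97, "connector_p95_latency_s": 1.2}))
```

Routing auth-rate breaches to identity and connector breaches to connector owners mirrors the point above: the fix usually lives on both sides of the handshake, so the page should too.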
The update is backward compatible, but new features unlock regulated workflows. How did you stage the upgrade without downtime, and what validations satisfied auditors? List migration steps, test matrices, rollback criteria, and the evidence package you submitted.
We ran old and new side by side, registering clients via SEP-991 while keeping legacy paths intact, then flipping flows that benefitted from Tasks and URL Mode Elicitation first. Our test matrix covered auth paths, Task state transitions, and data access by scope, with canaries in representative workflows. Rollback was explicit: per-connector flags and a return-to-synchronous plan for a subset of flows if Tasks showed instability. For auditors, the evidence package paired design docs with trace samples and access logs—demonstrating that passwords never hit the agent, scopes were minimal, and long-running work had checkpoints and cancellation.
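Conceptually, the per-connector flags and the rollback criterion were as simple as the sketch below; connector names, flag keys, and the failure-rate margin are illustrative.

```python
# Illustrative per-connector flags: each connector can be flipped back to its
# legacy synchronous path independently if Tasks show instability.
CONNECTOR_FLAGS = {
    "postgres-connector": {"use_tasks": True,  "use_url_elicitation": True},
    "hr-connector":       {"use_tasks": True,  "use_url_elicitation": False},
    "legacy-reports":     {"use_tasks": False, "use_url_elicitation": False},
}

# Example rollback criterion (illustrative): revert a connector if its Task
# failure rate over the canary window exceeds the legacy baseline by a margin.
def should_roll_back(task_failure_rate: float, legacy_failure_rate: float,
                     margin: float = 0.01) -> bool:
    return task_failure_rate > legacy_failure_rate + margin

if should_roll_back(task_failure_rate=0.04, legacy_failure_rate=0.02):
    CONNECTOR_FLAGS["postgres-connector"]["use_tasks"] = False   # back to synchronous path
```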
You advise “exposure over rewrites” for internal APIs. How do you pick which APIs to expose via MCP first, and what’s your hardening checklist? Give a prioritization rubric, one end-to-end example, and the metrics that prove faster time-to-value.
We prioritize by two axes: business impact and ease of safe exposure—start with high-value endpoints where read-mostly access unlocks the most help. The hardening checklist is boring but essential: stable schemas, RBAC alignment, narrow scopes, and observability hooks before go-live. A simple end-to-end example was exposing a read path to our knowledge base; the agent could find and summarize content using Sampling with Tools, while write paths stayed off-limits. Time-to-value tracked the broader momentum—like the registry’s 407% growth—because not rewriting meant we shipped in weeks, not quarters.
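That two-axis rubric can be written down as a simple scoring function; the weights and the example APIs below are made up to show the shape, not the actual backlog.

```python
from dataclasses import dataclass

@dataclass
class ApiCandidate:
    name: str
    business_impact: int       # 1-5: how much agent access would help
    exposure_ease: int         # 1-5: read-mostly, stable schema, clear RBAC fit
    read_only: bool

def priority(api: ApiCandidate) -> float:
    """Higher is sooner; read-only paths get a head start (weights are illustrative)."""
    score = api.business_impact * api.exposure_ease
    return score * 1.25 if api.read_only else score

candidates = [
    ApiCandidate("knowledge-base-search", 5, 5, True),
    ApiCandidate("order-write-api",       5, 2, False),
    ApiCandidate("hr-directory-lookup",   3, 4, True),
]

for api in sorted(candidates, key=priority, reverse=True):
    print(f"{api.name}: {priority(api):.1f}")
```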
When agents go from pilot to production, small “integration sprawl” can turn into a big attack surface. How do you prevent that? Describe your server inventory, naming and tagging standards, dependency maps, and the process you use to retire stale connectors.
Sprawl starts when you can’t name what you run, so we built an inventory that treats MCP servers like first-class citizens—unique names, environment tags, owners, and data classifications. Dependency maps visualize which agents touch which data, and Tasks give us usage signals so we can see what’s alive versus what’s just lurking. Retiring is deliberate: deprecate, observe for residual calls, then yank scopes and registration; if a connector doesn’t phone home for a while, it’s a candidate for sunset. The happy outcome is smaller blast radius and less cognitive load—teams know where to look when something goes sideways.
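Sunsetting can be driven by the inventory itself: flag connectors with no observed traffic inside a grace window, then start the deprecate-observe-revoke sequence. The entries and the 60-day threshold below are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=60)   # illustrative grace window before sunset review

# Hypothetical inventory rows: name, owner, environment, and last observed call.
inventory = [
    {"name": "mcp-kb-prod",      "owner": "platform", "env": "prod",
     "last_seen": datetime.now(timezone.utc) - timedelta(days=3)},
    {"name": "mcp-legacy-stage", "owner": "payments", "env": "stage",
     "last_seen": datetime.now(timezone.utc) - timedelta(days=120)},
]

def sunset_candidates(rows: list[dict]) -> list[dict]:
    cutoff = datetime.now(timezone.utc) - STALE_AFTER
    return [row for row in rows if row["last_seen"] < cutoff]

for row in sunset_candidates(inventory):
    # Deprecate first, watch for residual calls, then revoke scopes and registration.
    print(f"sunset review: {row['name']} (owner: {row['owner']}, env: {row['env']})")
```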
What is your forecast for MCP?
The first year proved open connectivity can be both practical and safe, and the next year will be all about visibility and scale—treating MCP servers like APIs with real SLOs and rich traces. Tasks will keep pushing multi-hour, regulated workflows into the mainstream, and URL-based registration will make IAM feel like a product, not a gauntlet. With AWS, Microsoft, and Google Cloud backing the standard, lock-in pressure will keep easing; the same Postgres connector working across models will go from novelty to expectation. Most importantly, “AI is only as good as the data it can reach safely” will stop being a slogan and become a checklist—identity first, scopes tight, observability always on.
