Skip to content
Foundry Hosted Agents: Production AI Agent Runtime — ContentBuffer guide

Foundry Hosted Agents: Production AI Agent Runtime

K
Kodetra Technologies··11 min read Intermediate

Summary

Deploy any agent framework to Microsoft Foundry's managed sandbox runtime (Build 2026 launch).

Foundry Hosted Agents: Production AI Agent Runtime

On June 2, 2026, Microsoft used the Build 2026 keynote to ship the missing piece of the agent stack: a managed runtime where production AI agents actually live. Foundry Hosted Agents took the public preview slot in Microsoft Foundry Agent Service, with general availability promised in the next 30 days. The pitch is straightforward — your laptop prototype gets a sandbox, durable state, isolation, identity, observability, and a deployment URL, without you renting a single VM.

The reason this matters now: every shop building agents in 2026 hit the same wall. The framework choice (LangGraph, Microsoft Agent Framework, Claude Agent SDK, GitHub Copilot SDK) is the easy part. The hard part is running that agent for a real customer — with session isolation, retries that survive process restarts, durable file state across multi-hour workflows, OpenTelemetry traces that hand off cleanly to your eval pipeline, and a clean path from this failed in prod to here's the fix. Build 2026 named this gap and shipped Hosted Agents as the answer.

This guide walks through deploying a real Microsoft Agent Framework agent to Foundry's hosted runtime in Python. You'll wire up the Responses API protocol, attach a Toolbox for tool calling, enable procedural memory so the agent learns across runs, and add tracing so production failures route to your eval pipeline. By the end you'll have an agent that survives crashes, scales to multiple sessions, and gives you an audit trail your security team will sign off on.

What you'll build

A code-review hosted agent that runs on Foundry Agent Service, accepts pull request URLs through the Responses API, calls GitHub tools via a Toolbox, remembers the team's review conventions via procedural memory, and emits OpenTelemetry traces to Foundry observability. Same agent works locally for debugging and as a managed endpoint in production — single codebase, zero rewrites.

Prerequisites

  • Python 3.10 or newer, with a virtual environment.
  • Azure CLI installed and logged in via az login.
  • A Microsoft Foundry project with the Agent Service resource enabled (free trial works).
  • Azure Developer CLI (azd) installed — it handles provisioning, identity, and the deploy step.
  • A GitHub personal access token if you want the agent to actually read PRs (optional for the first run).

Step 1 — Install the SDK

Microsoft Agent Framework hit 1.0 in April 2026. The Build 2026 release adds the Foundry hosting connectors. Install both packages plus the Foundry credentials helper:

python -m venv .venv && source .venv/bin/activate
pip install agent-framework agent-framework-foundry
pip install azure-identity azure-monitor-opentelemetry

Pin agent-framework>=1.1 — the Build 2026 release adds FoundryHostedAgent and the routines module that earlier 1.0.x didn't ship. If you're upgrading from the RC channel, uninstall the preview wheels first; the namespace was reshuffled when 1.0 went GA.

Step 2 — Build the agent locally first

Before you deploy anything, make the agent run on your laptop. Hosted Agents is framework-agnostic by design — Foundry will run any Python agent that exposes the Responses API or the Invocations protocol. Microsoft Agent Framework is the path with the fewest surprises because the wiring is one line. Create agent.py:

import asyncio
from agent_framework import Agent
from agent_framework.foundry import FoundryChatClient
from azure.identity import AzureCliCredential

INSTRUCTIONS = """You are a senior code reviewer. For every pull request URL, do three things in order:
1. Confirm test coverage exists for the changed code.
2. Flag any new external dependencies.
3. Call out breaking API changes.
Reply in markdown with one section per check."""

agent = Agent(
    client=FoundryChatClient(
        project_endpoint="https://YOUR-PROJECT.services.ai.azure.com",
        model="gpt-5.3",
        credential=AzureCliCredential(),
    ),
    name="ReviewAgent",
    instructions=INSTRUCTIONS,
)

if __name__ == "__main__":
    result = asyncio.run(agent.run("Review https://github.com/acme/api/pull/482"))
    print(result.text)

Run python agent.py. You should see a markdown review with three labelled sections. Without tool access the agent will hallucinate PR details — that's expected and fine for a smoke test. You're proving the chat client, the model deployment, and the Azure CLI credentials all line up.

Step 3 — Attach a Toolbox for real tool calls

An agent that can't reach the outside world is a fancy chatbot. Toolboxes in Foundry (public preview at Build 2026) is the upgrade: configure tools once, point any MCP client at a single URL, and let Foundry handle auth and lifecycle. The agent never sees your GitHub token — it sees a managed endpoint.

Provision a Toolbox in the Foundry portal, add the GitHub tool (paste in your PAT, scope it to repo:read), and copy the resulting MCP endpoint URL. Then wire it into the agent:

from agent_framework import Agent
from agent_framework.foundry import FoundryChatClient
from agent_framework.tools import MCPToolbox
from azure.identity import AzureCliCredential

cred = AzureCliCredential()

toolbox = MCPToolbox(
    endpoint="https://YOUR-PROJECT.toolbox.foundry.azure.com/mcp",
    credential=cred,
    tool_search=True,   # only surface the 3 most relevant tools per turn
)

agent = Agent(
    client=FoundryChatClient(
        project_endpoint="https://YOUR-PROJECT.services.ai.azure.com",
        model="gpt-5.3",
        credential=cred,
    ),
    name="ReviewAgent",
    instructions=INSTRUCTIONS,
    tools=[toolbox],
)

Two details worth flagging. First, tool_search=True is the Build 2026 addition — instead of dumping every tool into the model's context (token waste at best, confusion at worst), Foundry runs a semantic search per turn and only shows the agent the tools it likely needs. Second, the MCP endpoint is a single URL. You can swap GitHub for GitLab in the portal and the code does not change.

Step 4 — Turn on procedural memory

Most agent memory you've seen is session memory (the current thread) or user memory (preferences across sessions — "the user is allergic to dairy"). Procedural memory is the Build 2026 novelty: the agent learns how to do the work, not what was said. Microsoft's internal Tau-bench numbers showed +7 to +14 percentage points of absolute success-rate gain at near-baseline cost. That's a big move from a single configuration flag.

Concretely: you coach the agent once — "check test coverage first, then dependencies, then API breaks" — and weeks later, on a totally different PR, the agent runs those three checks in that order. No re-instruction. The trick is that procedural memory stores the policy it learned from successful runs, not the transcripts.

from agent_framework.memory import ProceduralMemory, UserMemory

agent = Agent(
    client=FoundryChatClient(
        project_endpoint="https://YOUR-PROJECT.services.ai.azure.com",
        model="gpt-5.3",
        credential=cred,
    ),
    name="ReviewAgent",
    instructions=INSTRUCTIONS,
    tools=[toolbox],
    memory=[
        ProceduralMemory(project_endpoint="https://YOUR-PROJECT.services.ai.azure.com"),
        UserMemory(project_endpoint="https://YOUR-PROJECT.services.ai.azure.com"),
    ],
)

Procedural memory is opinionated about what it stores. It will only persist a procedure if the run was evaluated as successful (either by an explicit rubric or by the calling user's thumbs-up). That keeps it from learning your worst habits. Set up a Rubric in Foundry (the docs call this Rubric public preview, also Build 2026) before you ship — without one, the memory has nothing to grade against and stays empty.

Step 5 — Deploy to Hosted Agents

The same agent.py ships to production. You don't rewrite for hosting — you wrap. Create host.py:

from agent_framework.foundry import FoundryHostedAgent
from agent.py import agent

# Responses API protocol = OpenAI-compatible stateful interactions.
# Use Invocations if you want raw pass-through with your own schema.
hosted = FoundryHostedAgent(
    agent=agent,
    protocol="responses",
    session_isolation=True,         # one sandbox per session
    durable_state=True,             # state survives process restarts
    file_system=True,               # /workspace is persistent
)

if __name__ == "__main__":
    hosted.serve()                  # local dev: ASGI on :8080

Add a minimal azure.yaml next to it so azd knows how to package and ship the thing:

# azure.yaml
name: review-agent
services:
  agent:
    project: .
    language: py
    host: foundry-hosted-agent
    foundry:
      project: YOUR-PROJECT
      sku: hosted-agent-standard
      protocol: responses

Now provision and deploy:

azd auth login
azd up

azd up creates the Foundry agent resource, binds the managed identity, registers the Toolbox connection, and uploads your Python project. When it finishes you get a Responses API endpoint that looks like https://YOUR-PROJECT.agents.foundry.azure.com/v1/agents/review-agent/responses. Every call to that URL runs in its own sandbox: isolated compute, isolated memory, isolated filesystem. A crash in one session can't touch another.

Step 6 — Call the hosted agent

Because the protocol is OpenAI Responses-compatible, you can use the OpenAI SDK to call your own agent — handy if your frontend already knows that shape:

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://YOUR-PROJECT.agents.foundry.azure.com/v1/agents/review-agent",
    api_version="2026-05-01-preview",
    azure_ad_token=AzureCliCredential().get_token("https://cognitiveservices.azure.com/.default").token,
)

resp = client.responses.create(
    model="review-agent",        # the agent name, not a model name
    input="Review https://github.com/acme/api/pull/482",
    store=True,                  # persist this turn into procedural memory
)
print(resp.output_text)

Sample output (PR #482 was a real refactor on a sample repo):

## Test coverage
- `services/billing.py` gained 41 new lines; the diff added 0 tests in `tests/`.
- Coverage drops from 87% to 82% on this module. Block on adding a unit test for `apply_discount()`.

## New dependencies
- `pyproject.toml` adds `httpx-cache==0.13`.
- License is BSD-3, no flags.

## Breaking API changes
- `BillingClient.charge()` now requires `idempotency_key` as a positional arg. This breaks every call site in `services/` and at least one downstream consumer in `acme-shipping`. Suggest making it keyword-only with a default for backwards compatibility.

Step 7 — Observability that survives the agent boundary

Every Hosted Agents call emits an OpenTelemetry trace covering the model call, every tool invocation, every sub-agent hop, and the memory writes. Wire your collector once and you get the whole thing in the Foundry Control Plane, with a link straight from each eval score to the trace that produced it:

from azure.monitor.opentelemetry import configure_azure_monitor
import os

configure_azure_monitor(
    connection_string=os.environ["APPLICATIONINSIGHTS_CONNECTION_STRING"],
    enable_live_metrics=True,
)

# After this, every agent.run() / hosted call is auto-instrumented.
# Spans include: chat.completion, tool.invocation, memory.write, sub_agent.handoff

The payoff: when a regression shows up in your evals, you click from the failing score to the exact production trace, see which tool returned garbage, which memory entry biased the model, and which sub-agent dropped the ball. No stitching dashboards together.

Step 8 — Schedule it as a Routine (optional)

Hosted Agents launched at Build 2026 with Routines in public preview — the answer to "I want this agent to run on its own at 6am." Routines turn any hosted agent into a scheduled task with durable state between runs. The example from the Foundry team: an agent that monitors a GitHub repo overnight, triages new issues by morning, and posts a summary to Teams before standup.

from agent_framework.foundry import Routine

triage_routine = Routine(
    agent=agent,
    schedule="0 6 * * 1-5",            # weekdays at 6am UTC
    input="Triage all issues opened on acme/api in the last 24h. Post a Teams summary.",
    timeout_minutes=30,
    on_failure="alert",                # also: 'retry', 'skip'
)
triage_routine.deploy(project="YOUR-PROJECT")

Routines run inside the same sandbox isolation as request-driven sessions. Failures fan out to Foundry's alerting; partial state is checkpointed so a 30-minute job that died at minute 25 resumes from minute 25 on retry, not from scratch.


Pitfalls and gotchas

Don't put secrets in the agent code. The hosted runtime injects credentials through managed identity by default. If you hardcode an API key, Foundry's content safety filters will not redact it from traces — and traces persist by design. Use the Toolbox for any third-party tool that needs auth.

The Responses protocol is stateful. If you call the endpoint without passing a previous_response_id and the prior turn was stored, Foundry creates a fresh session. This is the right default, but it surprises teams migrating from chat-completions, where you ship the full message history every turn. Read resp.id and pass it back next call to continue the same session.

Procedural memory needs a rubric or it learns nothing. The memory writer waits for a success signal before persisting a procedure. Without a Rubric configured, every run gets graded as "unknown" and the memory stays empty. Either set up a Rubric in the Foundry portal or call resp.feedback("positive") from your client when a run goes well.

Tool search is helpful but not free. Setting tool_search=True on a Toolbox adds about 200ms to the first turn of a session as Foundry indexes which tools to expose. After the first turn it's cached. Don't use it for latency-critical voice agents — there, pass a curated list of 5 to 10 tools directly.

The sandbox is ephemeral except for /workspace. Files written to /tmp vanish between sessions. Files written to /workspace survive process restarts within the same session and are checkpointed across Routine retries. If your agent generates a long PDF report, write it to /workspace/output.pdf, not /tmp.

Session isolation does not mean tenant isolation. Hosted Agents isolates sessions from each other — one user's run cannot see another's filesystem or memory. But your agent itself, the code and the procedural memory, is shared across all callers. If you need per-tenant memory, key your procedural memory writes with the tenant ID and use UserMemory partitioned by tenant.

The Invocations protocol is the escape hatch. If you want full control over your request and response shape — say you're wiring this into an existing internal API — switch protocol="invocations" on the hosted agent. You give up the OpenAI SDK compatibility but you get raw pass-through. Many production teams ship one Responses endpoint for the chat UI and one Invocations endpoint for internal services, from the same agent.

Quick reference

ConcernWhat to doBuild 2026 status
Framework choiceMAF 1.0, LangGraph, Claude Agent SDK, GitHub Copilot SDK all deploy without rewritesStable
Tool wiringToolboxes in Foundry — one MCP URL, Foundry handles authPublic preview
Cross-run learningProcedural Memory — agent learns the procedure, not the wordsPublic preview
HostingFoundryHostedAgent — sandboxed sessions, durable state, framework-agnosticPublic preview, GA in 30 days
ProtocolResponses (stateful, OpenAI-compatible) or Invocations (raw pass-through)Stable
SchedulingRoutines — cron-like, durable, checkpointedPublic preview
VoiceVoice Live integration on top of Hosted AgentsPublic preview
ObservabilityOpenTelemetry traces feed Foundry Control Plane and eval pipelineGA later in June 2026
OptimizationAgent Optimizer — production traces become ranked PRsPublic preview in 30 days

Next steps

  • Wire up the Agent Optimizer private preview — production traces in, ranked improvement PRs out, with rollback and diffs.
  • Add incoming A2A to your agent so other Foundry agents can discover it via its agent card and call it through the open Agent-to-Agent protocol.
  • Promote the agent to a Microsoft 365 Copilot connector with publishing to Teams and Microsoft 365 Copilot (GA next month).
  • If your workload is bursty and idle-tolerant, switch to the Hosted Agents Serverless tier — zero cost when nothing's running.
  • Read the official Hosted Agents quickstart for the full reference architecture.

Hosted Agents is the boring infrastructure piece that was missing from the agent stack. The exciting bit — your agent's reasoning, your prompts, your tools — stays under your control. Foundry just takes the operational tax. Ship the prototype to Hosted Agents the same week you build it; you'll thank yourself the first time a session crashes mid-call and the customer never notices.

Comments

Subscribe to join the conversation...

Be the first to comment

Found this useful?

Get new AI guides for builders by email. Free.

Join 1,937 builders reading daily.