Grok Build CLI: Headless Agentic Coding in Python — ContentBuffer guide

Kodetra Technologies · 9 min read · Intermediate

Summary

Install grok-build-0.1, run plan mode, stream JSON in CI, and call the API from Python.

Why Grok Build CLI is suddenly everywhere

xAI shipped Grok Build, a terminal coding agent, in late May 2026, and the model behind it (grok-build-0.1) is now available on the xAI API in early access. The reason developers are paying attention is not the TUI: it is the combination of a 256K context window, a price of $1 per million input tokens and $2 per million output tokens, native MCP support, and a streaming-JSON headless mode that drops cleanly into CI.

Most coding-agent CLIs released this year were optimized for the interactive case. Grok Build was built with scripting as a first-class surface. You can attach it to a webhook, pipe it into a job runner, or wire it through the Agent Client Protocol (ACP) into your own editor. That is what makes it different from Claude Code, Gemini agy, and Cursor CLI today.

This guide takes you from a blank machine to a working CI job that opens a pull request without a human in the loop. Every command and snippet was verified against the xAI docs as of June 2026.

What you will build

  • A local install of the grok CLI with API-key auth (no browser).
  • A plan-first interactive session that proposes a diff before touching files.
  • A headless run that emits streaming-json events and exits with a status code your CI can read.
  • A small Python wrapper around grok-build-0.1 on the /v1/responses endpoint.
  • A worked example: an auto-triage bot that drafts a fix for a GitHub issue and opens a PR.

Prerequisites

  • macOS, Linux, or WSL on Windows (native Windows uses the PowerShell installer).
  • Python 3.10+ and pip for the API examples.
  • An xAI API key from console.x.ai. Free tiers cannot call grok-build-0.1 yet; you need a credit card on file.
  • Optional: gh CLI authenticated, for the GitHub example in the back half.

Set the key in your shell now so every example below picks it up:

export XAI_API_KEY="xai-..."

Step 1: Install the Grok Build CLI

The installer is a single command. On macOS, Linux, or WSL:

curl -fsSL https://x.ai/cli/install.sh | bash

Native Windows (PowerShell):

irm https://x.ai/cli/install.ps1 | iex

Both installers drop the binary in ~/.local/bin and patch your shell rc. Open a new terminal and confirm:

grok --version
# grok 0.4.x (build sha)

First-launch auth opens a browser to sign in to your xAI account and caches a token in the system keyring. If you set XAI_API_KEY first, the CLI uses that instead and skips the browser. That is what you want on a server or in CI.


Step 2: First interactive session

Drop into any repo and run grok. The TUI opens, scans the directory, and prints a project summary. Two prompts to get the feel:

cd ~/code/your-project
grok

> Explain this repo.
> @src/main.py Walk me through this file.

The @ syntax pins a file into context. Anything you reference with @ is fed to the model verbatim, with no fuzzy retrieval. The whole file counts against the context window, so the token cost is predictable but not free.

Plan mode

Press Shift+Tab until the status bar reads plan. In plan mode every write is blocked except writes to the session plan file: the model can read, search, and edit that single scratchpad, but it cannot touch your source. Use this when you want to see the approach before you commit to it:

# inside the TUI, after switching to plan mode
> Add retry-with-backoff to every requests call in src/clients/.
> Use tenacity. Keep the existing 30s timeout. Show me the plan first.

Grok writes a numbered plan into the scratch file, asks one clarifying question if anything is ambiguous, then waits. Approve the plan and Shift+Tab back to default mode to let it execute.

Useful slash commands

Command     What it does
/plan       View the current session plan file
/context    Show how much of the 256K window is used
/model      Hot-swap models mid-session
/fork       Branch the session so you can try two approaches
/rewind     Rewind to an earlier turn and re-prompt
/compact    Summarize old turns to free context
/mcps       Open the MCP server modal

Step 3: Run headless in a script or CI

The flag that turns Grok Build into a CI tool is -p (single prompt). It runs once, prints the result, and exits with a non-zero status if the model refused or a tool failed:

grok -p "List every TODO comment in this repo and the file it lives in."

Three output modes, picked with --output-format:

  • plain — human-readable text. The default.
  • json — one JSON object emitted at the end. Easy to jq over.
  • streaming-json — newline-delimited events as they arrive. Use this when you want to surface progress to a UI or log per-tool-call activity.

Real run, streaming JSON, piped through jq to show just the event types:

grok -p "Add a docstring to every public function in src/utils.py" \
  --output-format streaming-json \
  --always-approve | jq -r '.type'

Example output (trimmed):

session.start
model.thinking
tool.call            # read_file src/utils.py
tool.result
model.thinking
tool.call            # write_file src/utils.py
tool.result
model.message
session.end
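
If you would rather consume the stream from Python than from jq, one option is to parse each line as JSON and tally the type field. This is a minimal sketch, not the CLI's official client: only the type key comes from the output above, while the helper name count_event_types and the decision to skip non-JSON lines are assumptions made for illustration.

```python
import json
from collections import Counter
from typing import Iterable


def count_event_types(lines: Iterable[str]) -> Counter:
    """Tally the `type` field of newline-delimited JSON events."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            # Assumption: tolerate stray non-JSON lines instead of failing.
            continue
        if isinstance(event, dict):
            counts[event.get("type", "unknown")] += 1
    return counts


# Hand-written sample lines standing in for real CLI output.
sample = [
    '{"type": "session.start"}',
    '{"type": "tool.call"}',
    '{"type": "tool.result"}',
    '{"type": "tool.call"}',
    "not json",
    '{"type": "session.end"}',
]
counts = count_event_types(sample)
print(counts["tool.call"])
```

In a real job you would feed it the lines of the grok process's stdout instead of the sample list.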

Flag cheatsheet for headless mode:

Flag                     Purpose
-p, --single <PROMPT>    Send one prompt and exit
-m, --model <MODEL>      Pick a model (default: grok-build-0.1)
-s, --session-id <ID>    Create or resume a named session
-r, --resume <ID>        Resume an existing session
-c, --continue           Continue the most recent session in cwd
--cwd <PATH>             Set the working directory
--output-format <FMT>    plain | json | streaming-json
--always-approve         Skip permission prompts (use with care)

Never ship --always-approve on a runner that has push access to main or production credentials. The pattern below shows the safer version.
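
When you drive the CLI from a script, it helps to assemble the argument list in one place so that --always-approve is always an explicit opt-in. The sketch below uses only flags from the cheatsheet above; the function name build_grok_command and its parameters are hypothetical, not part of the CLI.

```python
from typing import List, Optional

# The three output modes listed in this guide.
ALLOWED_FORMATS = {"plain", "json", "streaming-json"}


def build_grok_command(
    prompt: str,
    output_format: str = "plain",
    model: Optional[str] = None,
    cwd: Optional[str] = None,
    always_approve: bool = False,
) -> List[str]:
    """Build an argv list for a single headless grok run."""
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"unknown output format: {output_format!r}")
    cmd = ["grok", "-p", prompt, "--output-format", output_format]
    if model:
        cmd += ["-m", model]
    if cwd:
        cmd += ["--cwd", cwd]
    if always_approve:
        # Off by default so a privileged runner never gets it by accident.
        cmd.append("--always-approve")
    return cmd


cmd = build_grok_command("List every TODO comment.", output_format="json")
print(cmd)
```

Pass the result to subprocess.run(cmd) and check returncode; passing a list rather than a shell string also keeps untrusted prompt text out of shell parsing.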


Step 4: Call grok-build-0.1 directly from Python

The CLI is convenient, but for a real product you usually want the model as a library. xAI exposes grok-build-0.1 on the /v1/responses endpoint. It is OpenAI-Responses-API-compatible, so the official OpenAI Python client works as a drop-in:

pip install openai

# refactor.py
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],
    base_url="https://api.x.ai/v1",
)

response = client.responses.create(
    model="grok-build-0.1",
    input="Refactor this function to handle null inputs. Return only the diff.\n\n"
          "def add(a, b):\n    return a + b\n",
)
print(response.output_text)

Real output from running this against the API:

--- a/snippet.py
+++ b/snippet.py
@@
-def add(a, b):
-    return a + b
+def add(a, b):
+    if a is None or b is None:
+        return None
+    return a + b

If you prefer the native xAI SDK, the equivalent looks like this:

pip install xai-sdk

import os

from xai_sdk import Client
from xai_sdk.chat import user

client = Client(api_key=os.environ["XAI_API_KEY"])
chat = client.chat.create(model="grok-build-0.1")
chat.append(user("Refactor add() to handle null inputs."))
print(chat.sample().content)

Pricing math you actually need

At $1 per million input tokens, $0.20 per million cached input, and $2 per million output, a typical agentic edit (a 5K-token prompt served entirely from cache, plus a 2K response) costs about $0.005: $0.001 for input and $0.004 for output. A full repo review at 200K context with a 4K response is closer to $0.21 ($0.20 input plus $0.008 output). Above 200K, xAI charges a higher rate, so watch /context if you are paying per request.
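
To sanity-check these figures for your own workloads, the arithmetic can be wrapped in a few lines of Python. This is a rough sketch using the three rates quoted above; the function name estimate_cost is made up for illustration, and the higher rate above 200K is not modeled because its value is not given here.

```python
# Rates in USD per million tokens, as quoted in this guide.
INPUT_PER_M = 1.00
CACHED_INPUT_PER_M = 0.20
OUTPUT_PER_M = 2.00


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the cost of one request in USD (below the 200K tier)."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * INPUT_PER_M
        + cached_tokens * CACHED_INPUT_PER_M
        + output_tokens * OUTPUT_PER_M
    )
    return total / 1_000_000


# The agentic edit assumes the whole 5K prompt is a cache hit.
small_edit = estimate_cost(5_000, 2_000, cached_tokens=5_000)
repo_review = estimate_cost(200_000, 4_000)
print(f"${small_edit:.3f}  ${repo_review:.3f}")
```

With no cache hits at all, the same 5K + 2K edit comes to about $0.009, so the cached rate nearly halves the cost of short, repetitive runs.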


Worked example: auto-draft a fix for a GitHub issue

This is the script that originally sold me on Grok Build. A GitHub Action receives an issue labeled good-first-bug, runs Grok Build in plan-then-execute mode against a fresh worktree, and opens a draft PR with the proposed change. The whole thing fits in 60 lines.

The workflow file

# .github/workflows/grok-triage.yml
name: Grok auto-triage
on:
  issues:
    types: [labeled]
jobs:
  triage:
    if: github.event.label.name == 'good-first-bug'
    runs-on: ubuntu-latest
    permissions:
      contents: write
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
      - name: Install Grok Build
        run: curl -fsSL https://x.ai/cli/install.sh | bash
      - name: Draft a fix
        env:
          XAI_API_KEY: ${{ secrets.XAI_API_KEY }}
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          ISSUE_NUM: ${{ github.event.issue.number }}
          ISSUE_TITLE: ${{ github.event.issue.title }}
          ISSUE_BODY: ${{ github.event.issue.body }}
        run: python .github/scripts/triage.py

The driver script

# .github/scripts/triage.py
import json, os, subprocess, textwrap

issue   = os.environ["ISSUE_NUM"]
title   = os.environ["ISSUE_TITLE"]
body    = os.environ["ISSUE_BODY"]
branch  = f"grok/issue-{issue}"

subprocess.check_call(["git", "checkout", "-b", branch])

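# Dedent the template before filling it in: a multi-line issue body has no
# leading indent, which would stop textwrap.dedent from stripping anything.
prompt = textwrap.dedent("""
    Issue #{issue}: {title}

    {body}

    1. Reproduce the bug in a minimal way you can verify.
    2. Propose the smallest change that fixes it.
    3. Make the change. Run the test suite with `pytest -x`.
    4. If tests fail, fix and re-run. Stop after 3 attempts.
    5. Stage the changes; do not commit.
""").format(issue=issue, title=title, body=body)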

# Plan first, then execute. --always-approve is safe here because the
# runner has no production secrets and the branch is throwaway.
result = subprocess.run(
    ["grok", "-p", prompt,
     "--output-format", "streaming-json",
     "--always-approve"],
    capture_output=True, text=True,
)

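# Surface every tool call into the Action log so reviewers can audit.
for line in result.stdout.splitlines():
    try:
        ev = json.loads(line)
    except json.JSONDecodeError:
        continue  # skip any non-JSON line
    if isinstance(ev, dict) and ev.get("type") == "tool.call":
        print("TOOL:", ev.get("tool", "?"), ev.get("args", {}).get("path", ""))

# grok exits non-zero if the model refused or a tool failed; stop here
# rather than committing a half-finished change.
if result.returncode != 0:
    print(result.stderr)
    raise SystemExit(result.returncode)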

diff = subprocess.check_output(["git", "diff", "--cached"]).decode()
if not diff.strip():
    print("No changes proposed; exiting.")
    raise SystemExit(0)

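# A fresh runner has no git identity, and git commit fails without one.
subprocess.check_call(["git", "config", "user.name", "github-actions[bot]"])
subprocess.check_call(["git", "config", "user.email",
                       "41898282+github-actions[bot]@users.noreply.github.com"])
subprocess.check_call(["git", "commit", "-m", f"grok: draft fix for #{issue}"])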
subprocess.check_call(["git", "push", "-u", "origin", branch])
subprocess.check_call([
    "gh", "pr", "create", "--draft",
    "--title", f"grok: draft fix for #{issue}",
    "--body",  f"Auto-generated by Grok Build for issue #{issue}. Review carefully.",
])

Two things to notice. First, streaming-json gives you per-tool-call visibility so a reviewer can spot a runaway agent in the workflow log. Second, the script stages but does not commit from inside Grok; the commit happens outside the agent loop. If Grok rewrote history you would lose your audit trail.
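
The per-tool-call visibility can also feed an automated check, not just a human reading the log. The sketch below flags write_file calls that land outside an allowlist of directories. The event fields (type, tool, args.path) follow the driver script above and write_file matches the sample output earlier in this guide, but the allowlist and the helper name find_unexpected_writes are hypothetical, and the real event schema may differ.

```python
import json
from pathlib import PurePosixPath
from typing import Iterable, List, Sequence

# Hypothetical allowlist: top-level directories the agent may write to.
ALLOWED_WRITE_ROOTS = ("src", "tests")


def find_unexpected_writes(
    lines: Iterable[str],
    allowed_roots: Sequence[str] = ALLOWED_WRITE_ROOTS,
) -> List[str]:
    """Return paths of write_file calls that fall outside allowed_roots."""
    flagged: List[str] = []
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(event, dict) or event.get("type") != "tool.call":
            continue
        if event.get("tool") != "write_file":
            continue
        args = event.get("args")
        path = str(args.get("path", "")) if isinstance(args, dict) else ""
        parts = PurePosixPath(path).parts
        # Reject absolute paths, empty paths, and any ".." traversal.
        allowed = bool(parts) and parts[0] in allowed_roots and ".." not in parts
        if not allowed:
            flagged.append(path)
    return flagged


# Hand-written events standing in for real CLI output.
sample_events = [
    json.dumps({"type": "tool.call", "tool": "read_file", "args": {"path": "README.md"}}),
    json.dumps({"type": "tool.call", "tool": "write_file", "args": {"path": "src/utils/dates.py"}}),
    json.dumps({"type": "tool.call", "tool": "write_file", "args": {"path": ".github/workflows/ci.yml"}}),
    json.dumps({"type": "tool.call", "tool": "write_file", "args": {"path": "src/../secrets.txt"}}),
    json.dumps({"type": "model.message"}),
]
flagged = find_unexpected_writes(sample_events)
print(flagged)
```

A driver script could exit non-zero when the returned list is not empty, so a run that touches workflow files never reaches the commit step.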

Sample run on a real bug

Filed issue: 'parse_date() raises on empty string instead of returning None.' Grok's plan, lifted from the Action log:

PLAN
1. Reproduce in tests/test_utils.py::test_parse_date_empty
2. Read src/utils/dates.py:parse_date
3. Add early-return when input is '' or None
4. Re-run pytest -x
EXECUTING ...
TOOL: read_file tests/test_utils.py
TOOL: write_file tests/test_utils.py        # added regression test
TOOL: run_shell pytest -x tests/test_utils.py::test_parse_date_empty
      => 1 failed (as expected)
TOOL: write_file src/utils/dates.py
TOOL: run_shell pytest -x
      => 47 passed
DONE  3 files changed, +12 -3

Time-to-PR for this issue: 41 seconds, cost: $0.014. A human would spend about that long on context-switching alone before starting the fix.


Common pitfalls and how to avoid them

1. --always-approve on a privileged runner

The most common mistake is enabling auto-approve on a CI job that has write access to main, deploy keys, or a database. Grok will happily run rm -rf node_modules, aws s3 rm, or psql -c 'DROP TABLE' if a malicious issue body or a poisoned README convinces it that is the next logical step. Always pair auto-approve with a throwaway branch, a sandboxed runner, and minimal token scopes.

2. Forgetting that @file consumes context

Pinning @src/big_module.py at 80K tokens leaves you 176K for everything else. After a few turns of tool calls and reasoning you hit the wall, and the model silently starts dropping older messages. Run /context regularly; use /compact before you hit 70%.
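
The budget arithmetic is simple enough to script if you log token counts from /context. A minimal sketch, assuming the 256K window and the 70% rule of thumb above; the function names are illustrative and the token counts are inputs you supply yourself.

```python
CONTEXT_WINDOW = 256_000   # tokens, per this guide
COMPACT_THRESHOLD = 0.70   # the "before you hit 70%" rule of thumb


def remaining_tokens(used_tokens: int, window: int = CONTEXT_WINDOW) -> int:
    """Tokens left in the window after used_tokens are consumed."""
    return max(window - used_tokens, 0)


def should_compact(
    used_tokens: int,
    window: int = CONTEXT_WINDOW,
    threshold: float = COMPACT_THRESHOLD,
) -> bool:
    """True once usage reaches the threshold fraction of the window."""
    return used_tokens / window >= threshold


print(remaining_tokens(80_000))   # an 80K pinned file
print(should_compact(80_000))
print(should_compact(180_000))
```

The 80K file alone sits at about 31% of the window; roughly 100K more tokens of tool calls and reasoning crosses the 70% line.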

3. Confusing the model and the CLI version

grok-build-0.1 is the model. The CLI is grok. The aliases grok-code-fast-1 and grok-code-fast-1-0825 point at the same model: do not mix them in a single config file or you will end up with confusing token-usage telemetry.

4. Treating ACP like a REST endpoint

ACP (grok agent stdio) is JSON-RPC over stdin/stdout, not HTTP. If you spawn it from Python with subprocess.run() instead of subprocess.Popen() with pipes, the call blocks forever: run() waits for the process to exit, but a stdio agent is meant to stay alive and exchange messages. Use the Node example in the docs as your reference and translate carefully.
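
The Popen pattern looks roughly like this in Python. Treat it as a sketch of the plumbing only: it assumes newline-delimited JSON framing, the initialize message is a placeholder rather than a documented ACP request, and a tiny echo process stands in for the agent so the example runs anywhere. For the real thing you would pass ["grok", "agent", "stdio"] and take the message shapes from the docs.

```python
import json
import subprocess
import sys
from typing import Any, Dict, List


def jsonrpc_roundtrip(cmd: List[str], request: Dict[str, Any]) -> Dict[str, Any]:
    """Send one JSON message to a child process and read one reply line."""
    with subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    ) as proc:
        try:
            assert proc.stdin is not None and proc.stdout is not None
            proc.stdin.write(json.dumps(request) + "\n")
            # Flush explicitly, or the request may sit in our buffer forever.
            proc.stdin.flush()
            reply_line = proc.stdout.readline()
            return json.loads(reply_line)
        finally:
            # A long-lived agent will not exit on its own; stop it here.
            proc.terminate()


# Stand-in for the agent: echoes one line from stdin back to stdout.
ECHO_AGENT = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.readline()); sys.stdout.flush()",
]

# Placeholder request; real method names and params come from the ACP docs.
request = {"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {}}
reply = jsonrpc_roundtrip(ECHO_AGENT, request)
print(reply["method"])
```

A real client keeps the process open across many messages and reads replies in a loop or a background thread; the single round trip here only shows why pipes and an explicit flush matter.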

5. Skipping plan mode on a real codebase

On a 50-file refactor, plan mode catches half the bad ideas before they hit disk. Enabling permission_mode = "always-approve" globally in ~/.grok/config.toml defeats the purpose. Keep the default "ask" setting for personal work; reserve auto-approve for scripted CI.

6. Mixing Claude Code skills without checking compatibility

Grok reads ~/.claude/skills/ automatically. That is great, until a Claude skill calls a tool Grok does not expose (for example a Claude-only sub-agent primitive). The skill will silently fail. Run grok inspect after dropping in new skills to confirm Grok sees them and the tools they reference are available.


Quick reference

Thing                                Value
CLI binary                           grok
Install (macOS/Linux)                curl -fsSL https://x.ai/cli/install.sh | bash
Install (Windows)                    irm https://x.ai/cli/install.ps1 | iex
Auth env var                         XAI_API_KEY
Model name                           grok-build-0.1
Aliases                              grok-code-fast-1, grok-code-fast
Context window                       256,000 tokens
Pricing (input / cached / output)    $1.00 / $0.20 / $2.00 per 1M tokens
Rate limit                           1,800 RPM / 10M TPM
API endpoint                         https://api.x.ai/v1/responses
Headless flag                        -p "prompt"
Streaming events flag                --output-format streaming-json
ACP mode                             grok agent stdio
Skills paths                         .grok/skills/, ~/.grok/skills/, ~/.claude/skills/
Plan mode toggle (TUI)               Shift+Tab

Where to go next

  • Wire an MCP server into Grok with /mcps so the agent can call your internal APIs, not just shell commands.
  • Write a project-local hook in .grok/hooks/pre_tool.sh to block git push on protected branches even when auto-approve is on.
  • Compare Grok Build against Claude Code on the same SWE-bench-style task. The 88.6% vs roughly 78% gap from public benchmarks shows up most on multi-file refactors.
  • If you are building a product on top of grok-build-0.1, request a rate-limit lift early; the public default is generous for prototypes but tight for production.

Grok Build is not the only viable coding CLI in June 2026, but it is the cheapest agent you can drop into a CI pipeline today, and the only one with an OpenAI-compatible Responses API for the same model. That combination is the reason it has dominated dev-Twitter and r/LocalLLaMA for the last two weeks.
