
ArgoCD ApplicationSets: Multi-Cluster GitOps at Scale

Kodetra Technologies · 8 min read · Intermediate

Summary

Generate hundreds of ArgoCD apps from one CRD using cluster, git, and matrix generators.

If you run more than three Kubernetes clusters or more than ten ArgoCD Applications, you have already felt the pain. Every new tenant, region, or feature flag means another Application YAML, another pull request, another copy-paste mistake waiting to happen. ArgoCD ApplicationSets are the controller-driven answer: one CRD describes a template and a set of generators, and the controller materializes the cartesian product as real ArgoCD Applications — reconciled, drift-corrected, and torn down when the inputs go away.

This deep dive walks through the generators that matter in production (List, Cluster, Git, Matrix, Pull Request), shows three real patterns we lean on at scale, and ends with the gotchas that bite teams adopting ApplicationSets in 2026 with ArgoCD 3.3.

Prerequisites

  • ArgoCD 2.10+ (3.x recommended for the new templatePatch and improved deletion safety)
  • kubectl and argocd CLI 2.10+
  • At least two registered clusters (a hub cluster running ArgoCD plus one or more spoke clusters), or one cluster if you only want to test directory generators
  • A Git repository you can write to — ApplicationSets are pull-based; nothing happens without manifests in Git

Why ApplicationSets replace App-of-Apps

The classic App-of-Apps pattern uses one parent Application that points at a directory of child Application manifests. It works, but it is static: someone has to write the child YAML, commit it, and remember to delete it when the workload moves. App-of-Apps also has weak failure semantics — a broken child Application can leave the parent stuck in OutOfSync with no clean rollback.

ApplicationSets fix three things at once. Generators turn a list of inputs into Applications, the template is a single source of truth so a fix lands everywhere, and the controller owns the lifecycle — remove an input and the Application is removed (with a configurable preservation policy if you need it). The same CRD can target dozens of clusters, so multi-cluster fanout stops being a YAML-generation problem.

The five generators you actually use

ArgoCD ships nine generators. In real teams, five do 95 percent of the work. Here is how they map to common needs.

Generator       | What it iterates                      | Use it for
List            | An inline array of objects            | Hand-curated tenants, demo environments
Cluster         | Clusters registered in ArgoCD         | Fan out a workload to every cluster
Git (directory) | Subdirectories under a path           | Per-service or per-tenant manifest folders
Git (file)      | JSON/YAML files in a repo             | Per-app config files with rich parameters
Matrix          | Cartesian product of two generators   | App x Cluster, Tenant x Region
Pull Request    | Open PRs in GitHub/GitLab/Bitbucket   | Ephemeral preview environments

Pattern 1: List generator for a curated tenant rollout

Start with the simplest generator. The List generator is an inline array; you control it directly in the ApplicationSet YAML. It is the right tool when the set of inputs is small, human-curated, and changes through pull requests rather than discovery.

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: tenant-portal
  namespace: argocd
spec:
  goTemplate: true
  goTemplateOptions: ["missingkey=error"]
  generators:
    - list:
        elements:
          - tenant: acme
            cluster: https://kubernetes.default.svc
            domain: acme.example.com
          - tenant: globex
            cluster: https://kubernetes.default.svc
            domain: globex.example.com
  template:
    metadata:
      name: 'portal-{{.tenant}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/acme/portal-manifests
        targetRevision: main
        path: charts/portal
        helm:
          valueFiles:
            - values.yaml
          parameters:
            - name: tenant
              value: '{{.tenant}}'
            - name: ingress.host
              value: '{{.domain}}'
      destination:
        server: '{{.cluster}}'
        namespace: 'portal-{{.tenant}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true

Apply it with kubectl apply -f tenant-portal.yaml. Within seconds the controller creates two Applications: portal-acme and portal-globex. Adding a third tenant is a one-line change. Deleting a tenant from the list deletes the Application and (if prune: true) the workload.

Note goTemplate: true — this opts you into Go templating, which is the only template engine that gets active development in ArgoCD 3.x. The legacy fasttemplate syntax ({{tenant}} without the dot) still works but does not support functions like upper, quote, or default. Always set missingkey=error — silent empty-string substitution is the source of half the production incidents we see.
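As a small illustration of what Go templating buys you (all values here are hypothetical), the sprig-style functions available under goTemplate: true let you normalize and bound generated names in ways fasttemplate cannot:

```yaml
# Sketch only — assumes goTemplate: true; tenant and domain values are illustrative.
template:
  metadata:
    # lower-case the tenant and stay under Kubernetes' 63-character name limit
    name: '{{ printf "portal-%s" .tenant | lower | trunc 63 }}'
  spec:
    source:
      helm:
        parameters:
          - name: ingress.host
            # printf/lower/trunc come from the sprig subset ArgoCD exposes
            value: '{{ .domain | lower }}'
```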

Pattern 2: Cluster generator for fleet-wide platform services

Once you operate more than two clusters, you have platform components that must run on every one of them — cert-manager, external-dns, Prometheus node exporter, network policies, OPA Gatekeeper bundles. The Cluster generator iterates the clusters ArgoCD already knows about.

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: platform-cert-manager
  namespace: argocd
spec:
  goTemplate: true
  generators:
    - clusters:
        selector:
          matchLabels:
            env: prod
  template:
    metadata:
      name: 'cert-manager-{{.name}}'
    spec:
      project: platform
      source:
        repoURL: https://charts.jetstack.io
        chart: cert-manager
        targetRevision: v1.16.2
        helm:
          parameters:
            - name: installCRDs
              value: 'true'
            - name: extraArgs[0]
              value: '--cluster-resource-namespace={{.metadata.labels.region}}'
      destination:
        server: '{{.server}}'
        namespace: cert-manager
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
          - ServerSideApply=true

The selector matches clusters by the labels you set when you ran argocd cluster add <context> --label env=prod --label region=us-east-1. Add a new prod cluster with the right label and cert-manager installs itself within the next reconcile interval (3 minutes by default). Decommission a cluster — remove the ArgoCD cluster secret — and the Application disappears with it.
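If you register clusters declaratively rather than through the CLI, the same labels live on the ArgoCD cluster Secret. A sketch — the cluster name, server URL, and credentials are placeholders:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: prod-us-east-1                        # placeholder cluster name
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster   # marks this Secret as an ArgoCD cluster
    env: prod                                 # matched by the generator's selector
    region: us-east-1
type: Opaque
stringData:
  name: prod-us-east-1
  server: https://prod-us-east-1.example.com:6443
  config: |
    { "bearerToken": "<token>", "tlsClientConfig": { "caData": "<base64-ca>" } }
```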

Two operational tips. First, always include ServerSideApply=true for platform components: server-side apply is the only way to coexist with operators that mutate their own resources, and it gives you a clean conflict mode rather than perpetual OutOfSync. Second, scope the generated Applications to an ArgoCD project (here, platform) that restricts the allowed destination clusters and namespaces — without a project boundary, a templating bug can rewrite resources on the wrong cluster.

Pattern 3: Matrix generator for app x cluster fanout

The Matrix generator is the unlock for fleet-scale GitOps. It takes two child generators and produces the cartesian product. The most common shape is Git directory x Cluster: every service in a monorepo deployed to every matching cluster, with no per-service-per-cluster Application manifests.

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: services-fanout
  namespace: argocd
spec:
  goTemplate: true
  generators:
    - matrix:
        generators:
          - git:
              repoURL: https://github.com/acme/services-monorepo
              revision: main
              directories:
                - path: services/*
                - path: services/legacy-*
                  exclude: true
          - clusters:
              selector:
                matchLabels:
                  workload: app
  template:
    metadata:
      name: '{{.path.basename}}-{{.name}}'
    spec:
      project: apps
      source:
        repoURL: https://github.com/acme/services-monorepo
        targetRevision: main
        path: '{{.path.path}}/overlays/{{.metadata.labels.env}}'
      destination:
        server: '{{.server}}'
        namespace: '{{.path.basename}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
          - ApplyOutOfSyncOnly=true

If your monorepo has 12 services and your fleet has 4 application clusters, this single ApplicationSet creates 48 Applications. Crucially, when a developer adds services/billing-v2/ on a new branch and merges, the directory generator picks it up on the next poll and the Matrix produces 4 new Applications — no platform team intervention required.
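To make the fanout concrete, here is the parameter set a single matrix element hands to the template (values are illustrative):

```yaml
# From the git directory generator:
#   path.path:     services/billing-v2
#   path.basename: billing-v2
# From the cluster generator:
#   name:                prod-us-east-1
#   server:              https://prod-us-east-1.example.com:6443
#   metadata.labels.env: prod
# Rendered Application name ('{{.path.basename}}-{{.name}}'): billing-v2-prod-us-east-1
```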

ApplyOutOfSyncOnly=true is a quiet productivity win at this scale: ArgoCD only applies resources that actually drifted, which keeps API server load proportional to change rate rather than fleet size.

Pattern 4: Pull Request generator for preview environments

The Pull Request generator turns every open PR into its own Application. Combined with a Git directory generator that points at a values overlay, you get isolated preview environments for every PR — created on open, updated on push, deleted on merge or close.

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: previews
  namespace: argocd
spec:
  goTemplate: true
  generators:
    - pullRequest:
        github:
          owner: acme
          repo: portal
          tokenRef:
            secretName: github-token
            key: token
          labels:
            - preview
        requeueAfterSeconds: 60
  template:
    metadata:
      name: 'preview-pr-{{.number}}'
      labels:
        preview: 'true'
    spec:
      project: previews
      source:
        repoURL: https://github.com/acme/portal
        targetRevision: '{{.head_sha}}'
        path: deploy
        helm:
          parameters:
            - name: image.tag
              value: '{{.head_sha}}'
            - name: ingress.host
              value: 'pr-{{.number}}.preview.example.com'
      destination:
        server: https://kubernetes.default.svc
        namespace: 'preview-pr-{{.number}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true

Two things make this safe. The labels filter means only PRs tagged preview get an environment — you do not accidentally spin up a cluster's worth of bot PRs. And tying targetRevision to head_sha rather than the branch name means a force-push reconciles to the new commit immediately, instead of ArgoCD chasing a moving branch tip.

Templating tips that survive contact with reality

  • Use Go templates (goTemplate: true). The fasttemplate engine is on the way out and lacks the conditionals you will eventually need.
  • Set goTemplateOptions: ["missingkey=error"] — silent missing-key substitution is the most common ApplicationSet bug.
  • Prefer spec.template.metadata.name values that are unique and deterministic. Including the cluster name plus the path basename is almost always enough.
  • When you need conditional fields, use templatePatch (ArgoCD 2.10+) instead of duplicating the whole template — it strategic-merges over the base.
  • Validate templates locally with argocd appset generate <file>. The CLI prints exactly what the controller would create, before you apply.

Gotchas we have hit in production

1. Deletion safety is opt-in for a reason

By default, deleting an ApplicationSet deletes every Application it produced — and if your sync policy includes prune: true, that means deleting workloads on every cluster. Set spec.syncPolicy.preserveResourcesOnDeletion: true on the ApplicationSet during migrations and on anything mission-critical. ArgoCD 3.3 added a cascading-delete confirmation flag — turn it on at the controller level.
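Note that the knob lives on the ApplicationSet's own top-level syncPolicy, not on the Application template's — an easy place to get wrong:

```yaml
spec:
  # Top-level ApplicationSet policy: when the ApplicationSet (or one of its
  # inputs) goes away, delete the generated Application objects but leave
  # their deployed resources in place instead of cascading the delete.
  syncPolicy:
    preserveResourcesOnDeletion: true
```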

2. Template loops bite when generators emit the same name

Two generators producing parameters that template to the same Application name will fight each other forever. The controller does not error; it just thrashes. Audit your generator outputs with argocd appset generate and ensure the resulting names are unique across the entire ApplicationSet output, not just within one generator.

3. Large Matrix outputs need rate limiting

A Matrix that produces 500 Applications and a sync wave that fires them all at once will DDoS your own Git provider. Use spec.strategy with the RollingSync type to roll out in batches by label, and configure the repo-server's manifest cache generously (for example, raise --repo-cache-expiration on argocd-repo-server) so manifest generation does not hammer Git.
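Progressive syncs are the built-in mechanism for this. A sketch of spec.strategy using label-keyed waves — note the feature is still gated behind the ApplicationSet controller's --enable-progressive-syncs flag in current releases, and the labels here are illustrative:

```yaml
spec:
  strategy:
    type: RollingSync
    rollingSync:
      steps:
        # wave 1: everything labeled env=staging syncs first
        - matchExpressions:
            - key: env
              operator: In
              values: [staging]
        # wave 2: prod follows, at most a quarter of it at a time
        - matchExpressions:
            - key: env
              operator: In
              values: [prod]
          maxUpdate: 25%
```

The steps match labels on the generated Applications, so the template has to stamp an env label onto each one for the waves to have anything to select.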

4. Cluster generator does not know about cluster health

If a spoke cluster's API server is down, the Cluster generator still includes it — ArgoCD does not currently filter clusters by reachability. Pair the Cluster generator with a label like health: ready that you toggle from your cluster lifecycle automation, or accept that some Applications will sit in Unknown until the cluster comes back.
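The label workaround from the paragraph above, sketched as a selector — the health label is your own convention maintained by lifecycle automation, not an ArgoCD built-in:

```yaml
generators:
  - clusters:
      selector:
        matchLabels:
          env: prod
        matchExpressions:
          # only clusters your lifecycle automation has marked ready
          - key: health
            operator: In
            values: [ready]
```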

Quick reference

Concept     | Where it lives                 | Typical pitfall
Generators  | spec.generators                | Producing duplicate names
Template    | spec.template                  | Forgetting missingkey=error
Sync policy | spec.template.spec.syncPolicy  | prune: true in non-isolated namespaces
Lifecycle   | spec.syncPolicy (top-level)    | Default deletes children
Validation  | argocd appset generate         | Skipping it before applying

Next steps

  1. Pick one of your existing App-of-Apps trees and replace the parent with a Git directory ApplicationSet. Keep the child Application manifests in place — the controller will adopt them.
  2. Add the Cluster generator for one platform component (cert-manager is a good starter). Confirm a label change adds and removes the Application cleanly.
  3. Pilot the Pull Request generator on one frontend repo. Set TTL labels on namespaces so abandoned previews self-clean.
  4. Once you trust the controller, introduce the Matrix generator for a small subset of services. Measure repo-server CPU before and after — it is the canary for fleet scaling.

ApplicationSets are not magic; they are a controller that turns inputs into Applications. But once you frame your platform that way, the daily question shifts from "which YAML do I write?" to "what is the right input?", and that is the question worth answering at scale.
