
Terraform Development Pipeline

The purpose of a development pipeline is to deploy with confidence and therefore at high frequencies.

I have never been a software developer and never will be, but for a decade now code has been my natural way to deploy and change infrastructure. Infrastructure automation spans a whole spectrum, from a simple PowerShell script up to IaC manifests written in Terraform (or Ansible, Pulumi, etc.). At some point on this spectrum, pipelines become an important topic. I have boiled down the value of pipelines into one sentence: The purpose of a development pipeline is to deploy with confidence and therefore at high frequencies. To deliver this value, a pipeline needs to cover several aspects. This blog post covers some of the components I use for my private Terraform projects, along with an important warning.

# Warning: Secure Repository and Pipelines

Keep in mind that typically the credentials used in your pipeline are very powerful. You should think about possible attack vectors, configure permissions carefully, and protect sensitive secrets.
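
As a minimal sketch of "configure permissions carefully", assuming GitHub Actions is the pipeline: a workflow can drop all default token permissions at the top level and grant each job only what it needs. The workflow and job names below are hypothetical.

```yaml
# Hypothetical workflow; the names are placeholders.
name: Least Privilege Example
on:
  pull_request:

# Start from no GITHUB_TOKEN permissions at all ...
permissions: {}

jobs:
  validate:
    runs-on: ubuntu-latest
    # ... and grant only what this job needs.
    permissions:
      contents: read # needed by actions/checkout
    steps:
      - uses: actions/checkout@v4
```

This only limits the built-in GITHUB_TOKEN. Cloud credentials stored as secrets need their own least-privilege role assignment on the provider side.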

# Protect Main Branch

Require all commits to be made to a non-target branch (in my case, main is the target of the ruleset) and submitted via a pull request before they can be merged. When possible, require at least one approving review before a pull request can be merged.

Protect Main Branch

See also: GitHub Docs - Managing a branch protection rule

The enforced Pull Request is also the event that triggers further code validation.

# Workflow run approval for outside collaborators

A critical attack vector is a malicious change to the existing Workflow. Think about this case:

  1. You have a Workflow that executes a terraform plan on all Pull Requests, which is typically harmless
  2. An outside collaborator changes the Workflow into a terraform destroy, which is extremely critical
  3. The outside collaborator now opens a new pull request and the configured Workflow is executed

To mitigate this attack vector, make sure you require approval for all outside collaborators to run workflows on their pull requests, and always carefully review changes to the workflow files before approving a workflow run.

Workflow run approval for outside collaborators

See also: GitHub Docs - Approving workflow runs from public forks
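
One additional guard, sketched here under the assumption that Terraform jobs only ever need to run for branches inside the repository itself: skip the job when the pull request comes from a fork. The job name is hypothetical; the two context values are standard GitHub Actions contexts.

```yaml
jobs:
  plan:
    # Run only when the PR branch lives in this repository, not in a fork.
    if: github.event.pull_request.head.repo.full_name == github.repository
    runs-on: ubuntu-latest
    permissions:
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init -backend=false
      - run: terraform validate -no-color
```

This does not replace the approval setting. It only narrows what runs automatically.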

# Artifact and log retention

Consider limiting the artifact and log retention to a minimum.

By default, the artifacts and log files generated by workflows are retained for 90 days before they are automatically deleted. You can adjust the retention period, depending on the type of repository:

  • For public repositories: you can change this retention period to anywhere between 1 day and 90 days.
  • For private repositories: you can change this retention period to anywhere between 1 day and 400 days.

See also: GitHub Docs - Artifact and log retention
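
Retention can also be shortened per artifact. A minimal sketch of a single upload step, assuming an earlier step wrote a file called plan.txt (both the file and the artifact name are hypothetical):

```yaml
      - name: Upload plan output
        uses: actions/upload-artifact@v4
        with:
          name: plan-output # hypothetical artifact name
          path: plan.txt    # hypothetical file written by an earlier step
          retention-days: 1 # shorter than the repository default
```

The step-level value cannot extend retention beyond the limit configured for the repository or organization.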

# Development and Deployment Stages

Development and Deployment Stages

You may notice that QA and staging are missing from this diagram, and that is intentional. For my private projects, QA and staging don't make sense, but for production use these additional stages are highly recommended. Possible QA and staging strategies are not covered in this blog post but are at the top of my list of upcoming posts.

# Development

# Coding (VSCode Extensions)

VSCode is my preferred code editor and I use way too many extensions. When it comes to a terraform project, there are two important extensions:

# Commit (Pre-Commit Hooks)

So far, I have never used pre-commit hooks for my Terraform projects, but I know a lot of fans of this additional validation. If you are interested in Terraform-related Git hook scripts, have a look at this collection of git hooks to be used with the pre-commit framework.

Git hook scripts are useful for identifying simple issues before submission to code review. We run our hooks on every commit to automatically point out issues in code such as missing semicolons, trailing whitespace, and debug statements. By pointing these issues out before code review, this allows a code reviewer to focus on the architecture of a change while not wasting time with trivial style nitpicks.

Source: https://pre-commit.com/index.html

To give myself an impression of the functionality, I have created a simple configuration:

repos:
- repo: https://github.com/antonbabenko/pre-commit-terraform
  rev: v1.90.0 # Get the latest from: https://github.com/antonbabenko/pre-commit-terraform/releases
  hooks:
    - id: terraform_validate
      args:
        - --args=-json
        - --args=-no-color
        - --tf-init-args=-upgrade
        - --tf-init-args=-lockfile=readonly
        - --hook-config=--parallelism-ci-cpu-cores=1

A validation failure on git commit looks like this:

Pre-Commit Hooks - Issue
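
The same hook repository ships further hooks. The following is a sketch rather than my own setup, assuming terraform and tflint are installed locally; it adds formatting and linting checks in the same way as the validate hook above.

```yaml
repos:
- repo: https://github.com/antonbabenko/pre-commit-terraform
  rev: v1.90.0 # same release as above; check the releases page for newer ones
  hooks:
    - id: terraform_fmt    # runs terraform fmt on changed files
    - id: terraform_tflint # runs tflint on changed directories
```

After saving the file as .pre-commit-config.yaml, `pre-commit install` registers the hooks for the repository, and `pre-commit run --all-files` runs them once against everything.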

# CI/CD

I mostly use GitHub Actions for my pipelines, whether for Terraform or other things (even this blog post is published from one). GitHub Actions are well adopted by tool vendors and can be developed side by side with the code itself.

# Push (Test)

Fast Workflow with some basic tests.

name: Test on Push
on:
  push:
    branches-ignore:
      - main
      - master

jobs:
  terraform:
    name: Terraform Validation
    runs-on: ubuntu-latest
    permissions:
      actions: read
      contents: read
    steps:
      - name: Checkout Code
        uses: actions/checkout@v4

      - name: Terraform Setup
        uses: hashicorp/setup-terraform@v3

      - name: Terraform fmt
        id: fmt
        run: terraform fmt -check
        continue-on-error: true

      - name: Terraform Init without backend
        id: init
        run: terraform init -backend=false

      - name: Terraform Validate
        id: validate
        run: terraform validate -no-color

Fast Workflow with some basic tests
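
To keep this workflow fast on repositories that also hold non-Terraform files, the trigger can be narrowed with a path filter. A sketch of the trigger section only; the file patterns are assumptions about the repository layout.

```yaml
on:
  push:
    branches-ignore:
      - main
      - master
    # Assumed file patterns; adjust to the repository layout.
    paths:
      - '**.tf'
      - '**.tfvars'
      - '.github/workflows/**'
```

Both filters must match, so the workflow runs only for pushes to non-main branches that touch at least one of the listed paths.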

# Pull Request (Validate)

# Infracost report

I use the Infracost workflow published on the GitHub Marketplace. If you only use the free tier of Infracost, you can strip the workflow down to PR comments only, without the Infracost Cloud uploads.

# Infracost runs on pull requests (PR) and posts PR comments.
# If you use Infracost Cloud, Infracost also runs on main/master branch pushes so the dashboard is updated.
# The GitHub Action docs (https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows) describe other trigger options.
name: InfraCost on PR
on:
  pull_request:
    types: [opened, synchronize, closed]
  push:
    branches:
      - main
      - master

env:
  # If you use private modules you'll need this env variable to use
  # the same ssh-agent socket value across all jobs & steps.
  SSH_AUTH_SOCK: /tmp/ssh_agent.sock
jobs:
  # This stage runs the Infracost CLI and posts PR comments.
  # It also updates PR comments when the PR is updated (synchronize event).
  infracost-pull-request-checks:
    name: Infracost Pull Request Checks
    if: github.event_name == 'pull_request' && (github.event.action == 'opened' || github.event.action == 'synchronize')
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write # Required to post comments
    # env:
      # If you store Terraform variables or modules in a 3rd party such as TFC or Spacelift,
      # specify the following so Infracost can automatically retrieve them.
      # See https://www.infracost.io/docs/features/terraform_modules/#registry-modules for details.
      #   INFRACOST_TERRAFORM_CLOUD_TOKEN: ${{ secrets.TFC_TOKEN }}
      #   INFRACOST_TERRAFORM_CLOUD_HOST: app.terraform.io
    steps:
      # If you use private modules, add an environment variable or secret
      # called GIT_SSH_KEY with your private key, so Infracost CLI can access
      # private repositories (similar to how Terraform/Terragrunt does).
      # - name: add GIT_SSH_KEY
      #   run: |
      #     ssh-agent -a $SSH_AUTH_SOCK
      #     mkdir -p ~/.ssh
      #     echo "${{ secrets.GIT_SSH_KEY }}" | tr -d '\r' | ssh-add -
      #     ssh-keyscan github.com >> ~/.ssh/known_hosts

      - name: Setup Infracost
        uses: infracost/actions/setup@v3
        # See https://github.com/infracost/actions/tree/master/setup for other inputs
        # If you can't use this action, use Docker image infracost/infracost:ci-0.10
        with:
          api-key: ${{ secrets.INFRACOST_API_KEY }}

      # Checkout the base branch of the pull request (e.g. main/master).
      - name: Checkout base branch
        uses: actions/checkout@v4
        with:
          ref: '${{ github.event.pull_request.base.ref }}'

      # Generate Infracost JSON file as the baseline.
      - name: Generate Infracost cost estimate baseline
        run: |
          infracost breakdown --path=. \
                              --format=json \
                              --out-file=/tmp/infracost-base.json          

      # Checkout the current PR branch so we can create a diff.
      - name: Checkout PR branch
        uses: actions/checkout@v4

      # Generate an Infracost diff and save it to a JSON file.
      - name: Generate Infracost diff
        run: |
          infracost diff --path=. \
                          --format=json \
                          --compare-to=/tmp/infracost-base.json \
                          --out-file=/tmp/infracost.json          

      # Posts a comment to the PR using the 'update' behavior.
      # This creates a single comment and updates it. The "quietest" option.
      # The other valid behaviors are:
      #   delete-and-new - Delete previous comments and create a new one.
      #   hide-and-new - Minimize previous comments and create a new one.
      #   new - Create a new cost estimate comment on every push.
      # See https://www.infracost.io/docs/features/cli_commands/#comment-on-pull-requests for other options.
      - name: Post Infracost comment
        run: |
            infracost comment github --path=/tmp/infracost.json \
                                     --repo=$GITHUB_REPOSITORY \
                                     --github-token=${{ github.token }} \
                                     --pull-request=${{ github.event.pull_request.number }} \
                                     --behavior=update            

  # Run Infracost on default branch and update Infracost Cloud
  infracost-default-branch-update:
    # If you use private modules, or store Terraform variables or modules in a 3rd party
    # such as TFC or Spacelift, include the same steps/variables as the infracost-pull-request-checks job
    name: Infracost Default Branch Update
    if: github.event_name == 'push' && (github.ref_name == 'main' || github.ref_name == 'master')
    runs-on: ubuntu-latest
    steps:
      - name: Setup Infracost
        uses: infracost/actions/setup@v3
        with:
          api-key: ${{ secrets.INFRACOST_API_KEY }}

      - name: Checkout main/master branch
        uses: actions/checkout@v4

      - name: Run Infracost on default branch and update Infracost Cloud
        run: |
          infracost breakdown --path=. \
                    --format=json \
                    --out-file=infracost.json

          infracost upload --path=infracost.json || echo "Always pass main branch runs even if there are policy failures"          

  # Update PR status in Infracost Cloud
  infracost-pull-request-status-update:
    name: Infracost PR Status Update
    if: github.event_name == 'pull_request' && github.event.action == 'closed'
    runs-on: ubuntu-latest
    steps:
    - name: Infracost PR Status Update
      run: |
        PR_STATUS="MERGED"
        if [[ ${{ github.event.pull_request.merged }} = false ]]; then PR_STATUS="CLOSED"; fi

        echo "Updating status of ${{ github.event.pull_request.html_url }} to $PR_STATUS"
        curl -i \
          --request POST \
          --header "Content-Type: application/json" \
          --header "X-API-Key: $INFRACOST_API_KEY" \
          --data "{ \"query\": \"mutation {updatePullRequestStatus( url: \\\"${{ github.event.pull_request.html_url }}\\\", status: $PR_STATUS )}\" }" \
          "https://dashboard.api.infracost.io/graphql";        
      env:
        INFRACOST_API_KEY: ${{ secrets.INFRACOST_API_KEY }}

GitHub Action - Infracost report

# Vulnerability scan

Vulnerability scanning is done by Trivy. The results are written to the workflow log and the PR is blocked on findings.

name: Trivy on PR

on:
  pull_request:
    types: [opened, synchronize]

permissions:
  contents: read

jobs:
  trivy:
    permissions:
      contents: read
      security-events: write
      actions: read 
    name: Trivy
    runs-on: ubuntu-latest # the ubuntu-20.04 hosted runner image has been retired
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Run Trivy vulnerability scanner in IaC mode
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'config'
          hide-progress: true
          format: 'table'
          exit-code: '1'
          severity: 'CRITICAL,HIGH'

To accept individual Trivy findings, I prefer an inline ignore comment directly above the resource, such as #trivy:ignore:AVD-AZU-0039:

# Create virtual machine
#trivy:ignore:AVD-AZU-0039
resource "azurerm_virtual_machine" "spokevm" {
  name                             = var.vmname
  location                         = var.location
  resource_group_name              = var.rgname
  network_interface_ids            = [azurerm_network_interface.spokevmnic.id]
  vm_size                          = var.vmsize
  delete_os_disk_on_termination    = true
  delete_data_disks_on_termination = true

  storage_os_disk {
    name              = "${var.vmname}-myOsDisk"
    caching           = "ReadWrite"
    create_option     = "FromImage"
    managed_disk_type = "Standard_LRS"
  }

  storage_image_reference {
    publisher = "canonical"
    offer     = "0001-com-ubuntu-server-jammy"
    sku       = "22_04-lts-gen2"
    version   = "latest"
  }

  os_profile {
    computer_name  = var.vmname
    admin_username = var.adminusername
    admin_password = var.vmpassword
    custom_data    = data.template_cloudinit_config.config_linux.rendered
  }

  os_profile_linux_config {
    disable_password_authentication = false
  }

  boot_diagnostics {
    enabled     = true
    storage_uri = azurerm_storage_account.spokestorageaccount.primary_blob_endpoint
  }

  tags = var.tags
}

GitHub Action - Trivy Log
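
The workflow above already grants security-events: write, which is only needed when findings are uploaded to GitHub code scanning. A sketch of that variant, replacing the table output with SARIF; it assumes code scanning is available for the repository, and the output file name is arbitrary.

```yaml
      - name: Run Trivy in IaC mode (SARIF output)
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'config'
          format: 'sarif'
          output: 'trivy-results.sarif' # arbitrary file name
          severity: 'CRITICAL,HIGH'

      - name: Upload Trivy results to the GitHub Security tab
        uses: github/codeql-action/upload-sarif@v3
        if: always() # upload even when an earlier step failed
        with:
          sarif_file: 'trivy-results.sarif'
```

With this variant the findings appear under the repository's Security tab instead of only in the workflow log.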

# Linting

Linting is done by TFlint. The results are written to the workflow log and the PR is blocked on findings.

name: TFlint on PR

on:
  pull_request:
    types: [opened, synchronize]

permissions:
  contents: read

jobs:
  tflint:
    name: TFlint linting
    runs-on: ubuntu-latest
    permissions:
      actions: read
      contents: read
    steps:
    - uses: actions/checkout@v4
      name: Checkout source code

    - uses: terraform-linters/setup-tflint@v3
      name: Setup TFLint
      with:
        tflint_wrapper: false
        tflint_version: latest

    - name: Show version
      run: tflint --version

    - name: Init TFLint
      run: tflint --init
      env:
        GITHUB_TOKEN: ${{ github.token }}

    - name: Run TFLint
      id: tflint
      run: tflint -f compact

My workspace contains a customized .tflint.hcl which includes an additional rule to enforce a minimum set of tags on supported resources.

plugin "azurerm" {
  enabled = true
  version = "0.26.0"
  source  = "github.com/terraform-linters/tflint-ruleset-azurerm"
}

rule "azurerm_resource_missing_tags" {
  enabled = true
  tags    = ["supportgroup", "applicationname", "environment"]
  exclude = [] # (Optional) Exclude some resource types from tag checks
}

A linting error looks similar to this and the PR is blocked:

GitHub Action - TFLint Log

# Validation and Plan

My “Validation and Plan” workflow uses the same variables as the deployment workflow. This is required to execute the terraform plan step with the backend enabled.

Steps:

  • Terraform Format and Style
  • Terraform Initialization
  • Terraform Validation
  • Terraform Plan

The workflow:
name: Validate and Plan on PR
on:
  pull_request:
    types: [opened, synchronize]

jobs:
  terraform:
    name: Terraform Validation and Plan
    runs-on: ubuntu-latest
    permissions:
      actions: read
      contents: read
      pull-requests: write
    env:
      ARM_CLIENT_ID: ${{ secrets.AZURE_AD_CLIENT_ID }}
      ARM_CLIENT_SECRET: ${{ secrets.AZURE_AD_CLIENT_SECRET }}
      ARM_SUBSCRIPTION_ID: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      ARM_TENANT_ID: ${{ secrets.AZURE_AD_TENANT_ID }}
      VAR_PASSWORD: ${{ secrets.VAR_PASSWORD }}
    steps:
      - name: Checkout Code
        uses: actions/checkout@v4

      - name: Terraform Setup
        uses: hashicorp/setup-terraform@v3

      - name: Terraform fmt
        id: fmt
        run: terraform fmt -check
        continue-on-error: true

      - name: Terraform Init
        id: init
        run: terraform init -backend-config="resource_group_name=tfstate" -backend-config="storage_account_name=tfstate1910602351" -backend-config="container_name=hubspoke" -backend-config="key=hubandspokemin.tfstate"

      - name: Terraform Validate
        id: validate
        run: terraform validate -no-color

      - name: Terraform Plan
        id: plan
        run: terraform plan -no-color -var="vm_admin_pwd=$VAR_PASSWORD"

      - uses: actions/github-script@v6
        if: github.event_name == 'pull_request'
        env:
          PLAN: "terraform\n${{ steps.plan.outputs.stdout }}"
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          script: |
            const output = `#### Terraform Format and Style 🖌\`${{ steps.fmt.outcome }}\`
            #### Terraform Initialization ⚙️\`${{ steps.init.outcome }}\`
            #### Terraform Validation 🤖\`${{ steps.validate.outcome }}\`
            <details><summary>Validation Output</summary>

            \`\`\`\n
            ${{ steps.validate.outputs.stdout }}
            \`\`\`

            </details>

            #### Terraform Plan 📖\`${{ steps.plan.outcome }}\`

            <details><summary>Show Plan</summary>

            \`\`\`\n
            ${process.env.PLAN}
            \`\`\`

            </details>

            *Pusher: @${{ github.actor }}, Action: \`${{ github.event_name }}\`, Working Directory: \`${{ env.tf_actions_working_dir }}\`, Workflow: \`${{ github.workflow }}\`*`;

            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: output
            })            
          

The output of the validation and plan steps is added as a comment to the PR.

GitHub Action - Validation and Plan
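
An alternative way to hand the password to Terraform, sketched here as a replacement for the plan step: Terraform reads environment variables named TF_VAR_<name> as values for the input variable <name>, so the secret never has to appear on the command line. The variable name vm_admin_pwd is taken from the workflow above.

```yaml
      - name: Terraform Plan
        id: plan
        run: terraform plan -no-color -input=false
        env:
          # Terraform maps TF_VAR_vm_admin_pwd to the input variable vm_admin_pwd.
          TF_VAR_vm_admin_pwd: ${{ secrets.VAR_PASSWORD }}
```

This avoids shell quoting problems with special characters in the password, and -input=false makes the step fail instead of waiting for input when a variable is missing.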

# Merge (Deploy)

My deployment workflow executes a minimum set of validations prior to terraform apply.

name: Validate and Deploy on Merge
on:
  push:
    branches:
      - main
      - master

jobs:
  deploy:
    name: Validate and Deploy
    runs-on: ubuntu-latest
    permissions:
      actions: read
      contents: read
    env:
      ARM_CLIENT_ID: ${{ secrets.AZURE_AD_CLIENT_ID }}
      ARM_CLIENT_SECRET: ${{ secrets.AZURE_AD_CLIENT_SECRET }}
      ARM_SUBSCRIPTION_ID: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      ARM_TENANT_ID: ${{ secrets.AZURE_AD_TENANT_ID }}
      VAR_PASSWORD: ${{ secrets.VAR_PASSWORD }}
    steps:
    - uses: actions/checkout@v4
      name: Checkout source code

    - uses: terraform-linters/setup-tflint@v3
      name: Setup TFLint
      with:
        tflint_version: latest

    - name: Init TFLint
      run: tflint --init
      env:
        GITHUB_TOKEN: ${{ github.token }}

    - name: Run TFLint
      id: tflint
      run: tflint -f compact

    - name: Terraform Setup
      uses: hashicorp/setup-terraform@v3

    - name: Terraform Init
      id: init
      run: terraform init -backend-config="resource_group_name=tfstate" -backend-config="storage_account_name=tfstate1910602351" -backend-config="container_name=hubspoke" -backend-config="key=hubandspokemin.tfstate"

    - name: Terraform Validate
      id: validate
      run: terraform validate -no-color

    - name: Terraform Apply
      id: apply
      run: terraform apply -no-color -auto-approve -var="vm_admin_pwd=$VAR_PASSWORD"
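
Two merges in quick succession would start two apply runs against the same state. A sketch of a top-level concurrency setting for the deployment workflow that serializes them; the group name is hypothetical.

```yaml
concurrency:
  group: terraform-deploy   # hypothetical group name
  cancel-in-progress: false # never cancel an apply that is already running
```

With this setting a second run waits until the first one has finished. GitHub keeps at most one run waiting per group, so an older waiting run is cancelled when a newer one arrives.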

# Summary

GitHub Actions are pretty flexible and easy to implement as a Terraform deployment pipeline, but as usual, there are some limitations:

  • No “Approval” within a workflow execution (e.g. based on plan results or cost changes)
  • The results of Plan and Apply are not easy to read
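
A partial workaround for the first limitation is a manual gate between two jobs: GitHub environments can require a reviewer before a job starts. This is a sketch only, not part of my repository. The environment name is hypothetical, credentials and backend configuration are omitted, and whether required reviewers are available depends on the GitHub plan and repository visibility. The gate is a human decision, not an automatic one based on plan or cost output.

```yaml
jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
      - run: terraform plan -no-color

  apply:
    needs: plan
    runs-on: ubuntu-latest
    # "production" is a hypothetical environment. Required reviewers are
    # configured in the repository settings, not in this file.
    environment: production
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
      - run: terraform apply -no-color -auto-approve
```

The apply job pauses until a configured reviewer approves it, which gives the reviewer time to read the plan output of the first job.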

Take a look at my “Azure Hub & Spoke Minimum” Repo to see the full configuration in action.

If GitHub Actions are not sufficient, take a look at Terraform Cloud or other IaC management platforms like Scalr or Spacelift. Besides addressing the limitations mentioned above, these platforms offer many more collaboration features.

Licensed under CC BY-NC-SA 4.0