I have never been a software developer and never will be, but for a decade now code has been my natural way to deploy and change infrastructure. Infrastructure automation spans the whole spectrum, from a simple PowerShell script up to IaC manifests written in Terraform (or Ansible, Pulumi, etc.). At some point in this spectrum, pipelines become an important topic. I have boiled down the value of pipelines for me to one sentence: The purpose of a development pipeline is to deploy with confidence and therefore at high frequency. To deliver this value, a pipeline needs to cover several aspects. This blog post covers some of the components I use for my private Terraform projects, in addition to an important warning.
Warning: Secure Repository and Pipelines
Keep in mind that typically the credentials used in your pipeline are very powerful. You should think about possible attack vectors, configure permissions carefully, and protect sensitive secrets.
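One way to configure permissions carefully is to start a workflow with an empty permission set for the automatically generated GITHUB_TOKEN and grant scopes per job. The following is a minimal sketch; the workflow and job names are placeholders and not taken from any of the workflows in this post.

```yaml
# Hypothetical workflow: the names are placeholders.
name: Least Privilege Example
on:
  pull_request:

# Start with no permissions for the GITHUB_TOKEN at workflow level ...
permissions: {}

jobs:
  validate:
    runs-on: ubuntu-latest
    # ... and grant only what this job actually needs.
    permissions:
      contents: read # needed by actions/checkout
    steps:
      - uses: actions/checkout@v4
```

Cloud credentials stored as repository secrets are not covered by this setting, so they still need their own tightly scoped role on the provider side.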
Protect Main Branch
Require all commits to be made to a non-target branch (in my case main is the target of the ruleset) and submitted via a pull request before they can be merged. When possible, require at least one approving review before a pull request can be merged.
See also: GitHub Docs - Managing a branch protection rule
The enforced Pull Request is also the event that triggers further code validation.
Workflow run approval for outside collaborators
A critical attack vector is a malicious change to the existing Workflow. Think about this case:
- You have a Workflow that executes a terraform plan on all Pull Requests, which is typically harmless
- An outside collaborator changes the Workflow into a terraform destroy, which is extremely critical
- The outside collaborator now opens a new pull request and the configured Workflow is executed
To mitigate this attack vector, make sure you require approval for all outside collaborators to run workflows on their pull requests, and always carefully review potential changes to the workflow files before approving a workflow run.
See also: GitHub Docs - Approving workflow runs from public forks
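As an additional safeguard on top of the approval setting, a job can be skipped entirely when the pull request comes from a fork. This is a hedged sketch: the workflow and job names are hypothetical, and the condition compares the repository of the pull request's head branch with the repository the workflow runs in.

```yaml
# Hypothetical workflow: names are placeholders.
name: Plan on internal PRs only
on:
  pull_request:
    types: [opened, synchronize]

jobs:
  plan:
    # Run only when the PR branch lives in this repository, not in a fork.
    if: github.event.pull_request.head.repo.full_name == github.repository
    runs-on: ubuntu-latest
    permissions:
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init -backend=false
      - run: terraform validate -no-color
```

This does not replace reviewing workflow changes, because a collaborator with write access can still edit the condition in a branch of the same repository.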
Artifact and log retention
Consider limiting the artifact and log retention to a minimum.
By default, the artifacts and log files generated by workflows are retained for 90 days before they are automatically deleted. You can adjust the retention period, depending on the type of repository:
- For public repositories: you can change this retention period to anywhere between 1 day and 90 days.
- For private repositories: you can change this retention period to anywhere between 1 day and 400 days.
See also: GitHub Docs - Artifact and log retention
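Besides the repository-wide setting, retention can also be shortened per artifact at upload time. The sketch below assumes a hypothetical job that stores the validation output as an artifact; the file and artifact names are placeholders.

```yaml
# Hypothetical workflow: file and artifact names are placeholders.
name: Short-lived Artifact Example
on:
  pull_request:

jobs:
  validate:
    runs-on: ubuntu-latest
    permissions:
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init -backend=false
      - run: terraform validate -no-color > validate.log
      - uses: actions/upload-artifact@v4
        if: always() # keep the log even when validation fails
        with:
          name: validate-log
          path: validate.log
          retention-days: 1 # cannot exceed the limit configured for the repository
```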
Development and Deployment Stages
Perhaps you are missing QA and staging in this diagram, and you are right. For my private projects, QA and staging don't make sense, but for production use these additional stages are highly recommended. Possible QA and staging strategies are not covered in this blog post but are at the top of my list of upcoming posts.
Development
Coding (VSCode Extensions)
VSCode is my preferred code editor, and I use way too many extensions. When it comes to a Terraform project, there are two important extensions:
Commit (Pre-Commit Hooks)
So far, I have never used Pre-Commit Hooks for my Terraform projects, but I do know a lot of fans of this additional validation. If you are interested in Terraform-related Git hook scripts, have a look at this collection of Git hooks to be used with the pre-commit framework.
Git hook scripts are useful for identifying simple issues before submission to code review. We run our hooks on every commit to automatically point out issues in code such as missing semicolons, trailing whitespace, and debug statements. By pointing these issues out before code review, this allows a code reviewer to focus on the architecture of a change while not wasting time with trivial style nitpicks.
Source: https://pre-commit.com/index.html
To give myself an impression of the functionality, I have created a simple configuration:
repos:
  - repo: https://github.com/antonbabenko/pre-commit-terraform
    rev: v1.90.0 # Get the latest from: https://github.com/antonbabenko/pre-commit-terraform/releases
    hooks:
      - id: terraform_validate
        args:
          - --args=-json
          - --args=-no-color
          - --tf-init-args=-upgrade
          - --tf-init-args=-lockfile=readonly
          - --hook-config=--parallelism-ci-cpu-cores=1
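The same collection ships more hooks than terraform_validate. As a hedged sketch, a configuration that mirrors the checks used later in the CI workflows (formatting, validation, linting, vulnerability scan) could look like this. The hook IDs are the ones I know from that repository's README; please verify them against the release you pin, and note that TFLint and Trivy must be installed locally for their hooks to run.

```yaml
repos:
  - repo: https://github.com/antonbabenko/pre-commit-terraform
    rev: v1.90.0 # pin to a release and check its README for the available hooks
    hooks:
      - id: terraform_fmt      # runs terraform fmt
      - id: terraform_validate # runs terraform validate
      - id: terraform_tflint   # requires a local tflint installation
      - id: terraform_trivy    # requires a local trivy installation
```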
A validation failure on git commit looks like this:
CI/CD
I mostly use GitHub Actions for my pipelines, whether for Terraform or other things (even this blog post is published from one). GitHub Actions is well adopted by tool vendors, and workflows can be developed side by side with the code itself.
Push (Test)
Fast Workflow with some basic tests.
name: Test on Push
on:
  push:
    branches-ignore:
      - main
      - master

jobs:
  terraform:
    name: Terraform Validation
    runs-on: ubuntu-latest
    permissions:
      actions: read
      contents: read
    steps:
      - name: Checkout Code
        uses: actions/checkout@v4

      - name: Terraform Setup
        uses: hashicorp/setup-terraform@v3

      - name: Terraform fmt
        id: fmt
        run: terraform fmt -check
        continue-on-error: true

      - name: Terraform Init without backend
        id: init
        run: terraform init -backend=false

      - name: Terraform Validate
        id: validate
        run: terraform validate -no-color
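To keep a push workflow like this fast when several commits are pushed in a row, a concurrency group can cancel runs that have been superseded on the same branch. This is an optional addition and only a sketch; it is a top-level fragment to place next to the on: and jobs: keys.

```yaml
# Top-level workflow fragment: one active run per workflow and branch.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true # a newer push cancels the still-running older one
```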
Pull Request (Validate)
Infracost report
I use the Infracost Workflow published on the GitHub Marketplace. If you only use the free tier of Infracost, you can strip the workflow down to the PR comments and leave out the Infracost Cloud uploads.
# Infracost runs on pull requests (PR) and posts PR comments.
# If you use Infracost Cloud, Infracost also runs on main/master branch pushes so the dashboard is updated.
# The GitHub Action docs (https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows) describe other trigger options.
name: InfraCost on PR
on:
  pull_request:
    types: [opened, synchronize, closed]
  push:
    branches:
      - main
      - master

env:
  # If you use private modules you'll need this env variable to use
  # the same ssh-agent socket value across all jobs & steps.
  SSH_AUTH_SOCK: /tmp/ssh_agent.sock

jobs:
  # This stage runs the Infracost CLI and posts PR comments.
  # It also updates PR comments when the PR is updated (synchronize event).
  infracost-pull-request-checks:
    name: Infracost Pull Request Checks
    if: github.event_name == 'pull_request' && (github.event.action == 'opened' || github.event.action == 'synchronize')
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write # Required to post comments
    # env:
    #   If you store Terraform variables or modules in a 3rd party such as TFC or Spacelift,
    #   specify the following so Infracost can automatically retrieve them.
    #   See https://www.infracost.io/docs/features/terraform_modules/#registry-modules for details.
    #   INFRACOST_TERRAFORM_CLOUD_TOKEN: ${{ secrets.TFC_TOKEN }}
    #   INFRACOST_TERRAFORM_CLOUD_HOST: app.terraform.io
    steps:
      # If you use private modules, add an environment variable or secret
      # called GIT_SSH_KEY with your private key, so Infracost CLI can access
      # private repositories (similar to how Terraform/Terragrunt does).
      # - name: add GIT_SSH_KEY
      #   run: |
      #     ssh-agent -a $SSH_AUTH_SOCK
      #     mkdir -p ~/.ssh
      #     echo "${{ secrets.GIT_SSH_KEY }}" | tr -d '\r' | ssh-add -
      #     ssh-keyscan github.com >> ~/.ssh/known_hosts

      - name: Setup Infracost
        uses: infracost/actions/setup@v3
        # See https://github.com/infracost/actions/tree/master/setup for other inputs
        # If you can't use this action, use Docker image infracost/infracost:ci-0.10
        with:
          api-key: ${{ secrets.INFRACOST_API_KEY }}

      # Checkout the base branch of the pull request (e.g. main/master).
      - name: Checkout base branch
        uses: actions/checkout@v4
        with:
          ref: '${{ github.event.pull_request.base.ref }}'

      # Generate Infracost JSON file as the baseline.
      - name: Generate Infracost cost estimate baseline
        run: |
          infracost breakdown --path=. \
                              --format=json \
                              --out-file=/tmp/infracost-base.json

      # Checkout the current PR branch so we can create a diff.
      - name: Checkout PR branch
        uses: actions/checkout@v4

      # Generate an Infracost diff and save it to a JSON file.
      - name: Generate Infracost diff
        run: |
          infracost diff --path=. \
                         --format=json \
                         --compare-to=/tmp/infracost-base.json \
                         --out-file=/tmp/infracost.json

      # Posts a comment to the PR using the 'update' behavior.
      # This creates a single comment and updates it. The "quietest" option.
      # The other valid behaviors are:
      #   delete-and-new - Delete previous comments and create a new one.
      #   hide-and-new - Minimize previous comments and create a new one.
      #   new - Create a new cost estimate comment on every push.
      # See https://www.infracost.io/docs/features/cli_commands/#comment-on-pull-requests for other options.
      - name: Post Infracost comment
        run: |
          infracost comment github --path=/tmp/infracost.json \
                                   --repo=$GITHUB_REPOSITORY \
                                   --github-token=${{ github.token }} \
                                   --pull-request=${{ github.event.pull_request.number }} \
                                   --behavior=update

  # Run Infracost on default branch and update Infracost Cloud
  infracost-default-branch-update:
    # If you use private modules, or store Terraform variables or modules in a 3rd party
    # such as TFC or Spacelift, include the same steps/variables as the infracost-pull-request-checks job
    name: Infracost Default Branch Update
    if: github.event_name == 'push' && (github.ref_name == 'main' || github.ref_name == 'master')
    runs-on: ubuntu-latest
    steps:
      - name: Setup Infracost
        uses: infracost/actions/setup@v3
        with:
          api-key: ${{ secrets.INFRACOST_API_KEY }}

      - name: Checkout main/master branch
        uses: actions/checkout@v4

      - name: Run Infracost on default branch and update Infracost Cloud
        run: |
          infracost breakdown --path=. \
                              --format=json \
                              --out-file=infracost.json

          infracost upload --path=infracost.json || echo "Always pass main branch runs even if there are policy failures"

  # Update PR status in Infracost Cloud
  infracost-pull-request-status-update:
    name: Infracost PR Status Update
    if: github.event_name == 'pull_request' && github.event.action == 'closed'
    runs-on: ubuntu-latest
    steps:
      - name: Infracost PR Status Update
        run: |
          PR_STATUS="MERGED"
          if [[ ${{ github.event.pull_request.merged }} = false ]]; then PR_STATUS="CLOSED"; fi

          echo "Updating status of ${{ github.event.pull_request.html_url }} to $PR_STATUS"
          curl -i \
            --request POST \
            --header "Content-Type: application/json" \
            --header "X-API-Key: $INFRACOST_API_KEY" \
            --data "{ \"query\": \"mutation {updatePullRequestStatus( url: \\\"${{ github.event.pull_request.html_url }}\\\", status: $PR_STATUS )}\" }" \
            "https://dashboard.api.infracost.io/graphql";
        env:
          INFRACOST_API_KEY: ${{ secrets.INFRACOST_API_KEY }}
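For the free tier, the stripped-down variant could look like the following sketch. It is derived from the workflow above by dropping the push and closed triggers together with the two Infracost Cloud jobs; the workflow name is a placeholder, and private-module handling is left out on the assumption that only public modules are used.

```yaml
# Sketch of a PR-comment-only variant; the workflow name is a placeholder.
name: InfraCost on PR (comments only)
on:
  pull_request:
    types: [opened, synchronize]

jobs:
  infracost-pull-request-checks:
    name: Infracost Pull Request Checks
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write # Required to post comments
    steps:
      - name: Setup Infracost
        uses: infracost/actions/setup@v3
        with:
          api-key: ${{ secrets.INFRACOST_API_KEY }}

      # Baseline from the target branch of the pull request.
      - name: Checkout base branch
        uses: actions/checkout@v4
        with:
          ref: '${{ github.event.pull_request.base.ref }}'

      - name: Generate Infracost cost estimate baseline
        run: |
          infracost breakdown --path=. \
                              --format=json \
                              --out-file=/tmp/infracost-base.json

      # Diff of the PR branch against the baseline.
      - name: Checkout PR branch
        uses: actions/checkout@v4

      - name: Generate Infracost diff
        run: |
          infracost diff --path=. \
                         --format=json \
                         --compare-to=/tmp/infracost-base.json \
                         --out-file=/tmp/infracost.json

      - name: Post Infracost comment
        run: |
          infracost comment github --path=/tmp/infracost.json \
                                   --repo=$GITHUB_REPOSITORY \
                                   --github-token=${{ github.token }} \
                                   --pull-request=${{ github.event.pull_request.number }} \
                                   --behavior=update
```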
Vulnerability scan
Vulnerability scanning is done by Trivy. The results are written to the workflow log and the PR is blocked on findings.
name: Trivy on PR
on:
  pull_request:
    types: [opened, synchronize]

permissions:
  contents: read

jobs:
  trivy:
    permissions:
      contents: read
      security-events: write
      actions: read
    name: Trivy
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Run Trivy vulnerability scanner in IaC mode
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'config'
          hide-progress: true
          format: 'table'
          exit-code: '1'
          severity: 'CRITICAL,HIGH'
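The job above already holds the security-events: write permission, which is what a SARIF upload to GitHub code scanning needs. As a hedged alternative to the table output in the log, the scan could write a SARIF file and upload it. The sketch follows the pattern from the trivy-action README; the workflow name and the file name trivy-results.sarif are placeholders, and code scanning availability depends on the repository type and plan.

```yaml
# Sketch: same scan, but results go to the GitHub Security tab.
name: Trivy SARIF on PR
on:
  pull_request:
    types: [opened, synchronize]

jobs:
  trivy:
    name: Trivy
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write # required by upload-sarif
      actions: read
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Run Trivy in IaC mode and write SARIF
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'config'
          format: 'sarif'
          output: 'trivy-results.sarif'
          severity: 'CRITICAL,HIGH'

      - name: Upload Trivy results to code scanning
        uses: github/codeql-action/upload-sarif@v3
        if: always() # upload even if an earlier step failed
        with:
          sarif_file: 'trivy-results.sarif'
```

Without exit-code: '1' this variant reports findings but does not block the PR by itself.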
To accept Trivy findings, I prefer the inline method #trivy:ignore:AVD-AZU-0039:
# Create virtual machine
#trivy:ignore:AVD-AZU-0039
resource "azurerm_virtual_machine" "spokevm" {
  name                             = var.vmname
  location                         = var.location
  resource_group_name              = var.rgname
  network_interface_ids            = [azurerm_network_interface.spokevmnic.id]
  vm_size                          = var.vmsize
  delete_os_disk_on_termination    = true
  delete_data_disks_on_termination = true

  storage_os_disk {
    name              = "${var.vmname}-myOsDisk"
    caching           = "ReadWrite"
    create_option     = "FromImage"
    managed_disk_type = "Standard_LRS"
  }

  storage_image_reference {
    publisher = "canonical"
    offer     = "0001-com-ubuntu-server-jammy"
    sku       = "22_04-lts-gen2"
    version   = "latest"
  }

  os_profile {
    computer_name  = var.vmname
    admin_username = var.adminusername
    admin_password = var.vmpassword
    custom_data    = data.template_cloudinit_config.config_linux.rendered
  }

  os_profile_linux_config {
    disable_password_authentication = false
  }

  boot_diagnostics {
    enabled     = "true"
    storage_uri = azurerm_storage_account.spokestorageaccount.primary_blob_endpoint
  }

  tags = var.tags
}
Linting
Linting is done by TFLint. The results are written to the workflow log and the PR is blocked on findings.
name: TFlint on PR
on:
  pull_request:
    types: [opened, synchronize]

permissions:
  contents: read

jobs:
  tflint:
    name: TFlint linting
    runs-on: ubuntu-latest
    permissions:
      actions: read
      contents: read
    steps:
      - uses: actions/checkout@v4
        name: Checkout source code

      - uses: terraform-linters/setup-tflint@v3
        name: Setup TFLint
        with:
          tflint_wrapper: false
          tflint_version: latest

      - name: Show version
        run: tflint --version

      - name: Init TFLint
        run: tflint --init
        env:
          GITHUB_TOKEN: ${{ github.token }}

      - name: Run TFLint
        id: tflint
        run: tflint -f compact
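Because tflint --init downloads the ruleset plugins on every run, caching the plugin directory can save a little time. This is an optional sketch of a step to place before the Init TFLint step; it assumes the default plugin directory ~/.tflint.d/plugins and a .tflint.hcl in the repository root.

```yaml
# Step fragment for the steps: list, placed before "Init TFLint".
- uses: actions/cache@v4
  name: Cache TFLint plugin directory
  with:
    path: ~/.tflint.d/plugins
    # A change to .tflint.hcl (e.g. a new plugin version) creates a new cache entry.
    key: ${{ runner.os }}-tflint-${{ hashFiles('.tflint.hcl') }}
```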
My workspace contains a customized .tflint.hcl which includes an additional rule to enforce a minimum set of tags on supported resources.
plugin "azurerm" {
  enabled = true
  version = "0.26.0"
  source  = "github.com/terraform-linters/tflint-ruleset-azurerm"
}

rule "azurerm_resource_missing_tags" {
  enabled = true
  tags    = ["supportgroup", "applicationname", "environment"]
  exclude = [] # (Optional) Exclude some resource types from tag checks
}
A linting error looks similar to this and the PR is blocked:
Validation and Plan
My “Validation and Plan” workflow uses the same variables as the deployment workflow. This is required to execute the terraform plan step with the backend enabled.
Steps:
- Terraform Format and Style
- Terraform Initialization
- Terraform Validation
- Terraform Plan
name: Validate and Plan on PR
on:
  pull_request:
    types: [opened, synchronize]

jobs:
  terraform:
    name: Terraform Validation and Plan
    runs-on: ubuntu-latest
    permissions:
      actions: read
      contents: read
      pull-requests: write
    env:
      ARM_CLIENT_ID: ${{ secrets.AZURE_AD_CLIENT_ID }}
      ARM_CLIENT_SECRET: ${{ secrets.AZURE_AD_CLIENT_SECRET }}
      ARM_SUBSCRIPTION_ID: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      ARM_TENANT_ID: ${{ secrets.AZURE_AD_TENANT_ID }}
      VAR_PASSWORD: ${{ secrets.VAR_PASSWORD }}
    steps:
      - name: Checkout Code
        uses: actions/checkout@v4

      - name: Terraform Setup
        uses: hashicorp/setup-terraform@v3

      - name: Terraform fmt
        id: fmt
        run: terraform fmt -check
        continue-on-error: true

      - name: Terraform Init
        id: init
        run: terraform init -backend-config="resource_group_name=tfstate" -backend-config="storage_account_name=tfstate1910602351" -backend-config="container_name=hubspoke" -backend-config="key=hubandspokemin.tfstate"

      - name: Terraform Validate
        id: validate
        run: terraform validate -no-color

      - name: Terraform Plan
        id: plan
        run: terraform plan -no-color -var=vm_admin_pwd=$VAR_PASSWORD

      - uses: actions/github-script@v6
        if: github.event_name == 'pull_request'
        env:
          PLAN: "terraform\n${{ steps.plan.outputs.stdout }}"
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          script: |
            const output = `#### Terraform Format and Style 🖌\`${{ steps.fmt.outcome }}\`
            #### Terraform Initialization ⚙️\`${{ steps.init.outcome }}\`
            #### Terraform Validation 🤖\`${{ steps.validate.outcome }}\`
            <details><summary>Validation Output</summary>

            \`\`\`\n
            ${{ steps.validate.outputs.stdout }}
            \`\`\`

            </details>

            #### Terraform Plan 📖\`${{ steps.plan.outcome }}\`

            <details><summary>Show Plan</summary>

            \`\`\`\n
            ${process.env.PLAN}
            \`\`\`

            </details>

            *Pusher: @${{ github.actor }}, Action: \`${{ github.event_name }}\`, Working Directory: \`${{ env.tf_actions_working_dir }}\`, Workflow: \`${{ github.workflow }}\`*`;

            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: output
            })
The output of the validation and plan steps is added as a comment to the PR.
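The plan step passes the VM password on the command line with -var. Terraform also reads input variables from environment variables named TF_VAR_ followed by the variable name, which avoids shell quoting issues with special characters in the secret. The fragment below is a sketch of that alternative for the job above, not the configuration from my repository; -input=false is added so that the step fails instead of waiting for input when a variable is missing.

```yaml
# Job-level fragment: replaces VAR_PASSWORD and the -var argument.
env:
  # Terraform maps TF_VAR_vm_admin_pwd to the input variable vm_admin_pwd.
  TF_VAR_vm_admin_pwd: ${{ secrets.VAR_PASSWORD }}
steps:
  - name: Terraform Plan
    id: plan
    run: terraform plan -no-color -input=false
```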
Merge (Deploy)
My deployment workflow runs a minimal set of validations prior to terraform apply.
name: Validate and Deploy on Merge
on:
  push:
    branches:
      - main
      - master

jobs:
  deploy:
    name: Validate and Deploy
    runs-on: ubuntu-latest
    permissions:
      actions: read
      contents: read
    env:
      ARM_CLIENT_ID: ${{ secrets.AZURE_AD_CLIENT_ID }}
      ARM_CLIENT_SECRET: ${{ secrets.AZURE_AD_CLIENT_SECRET }}
      ARM_SUBSCRIPTION_ID: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      ARM_TENANT_ID: ${{ secrets.AZURE_AD_TENANT_ID }}
      VAR_PASSWORD: ${{ secrets.VAR_PASSWORD }}
    steps:
      - uses: actions/checkout@v4
        name: Checkout source code

      - uses: terraform-linters/setup-tflint@v3
        name: Setup TFLint
        with:
          tflint_version: latest

      - name: Init TFLint
        run: tflint --init
        env:
          GITHUB_TOKEN: ${{ github.token }}

      - name: Run TFLint
        id: tflint
        run: tflint -f compact

      - name: Terraform Setup
        uses: hashicorp/setup-terraform@v3

      - name: Terraform Init
        id: init
        run: terraform init -backend-config="resource_group_name=tfstate" -backend-config="storage_account_name=tfstate1910602351" -backend-config="container_name=hubspoke" -backend-config="key=hubandspokemin.tfstate"

      - name: Terraform Validate
        id: validate
        run: terraform validate -no-color

      - name: Terraform Apply
        id: plan
        run: terraform apply -no-color -auto-approve -var=vm_admin_pwd=$VAR_PASSWORD
Summary
GitHub Actions as a Terraform deployment pipeline are pretty flexible and easy to implement, but as usual, there are some limitations:
- No approval step within a workflow run (e.g., based on plan results or cost changes)
- The results of Plan and Apply are not easy to read
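A partial workaround for the first limitation is a deployment environment with required reviewers: the job that references the environment waits until a reviewer approves it. The sketch below is an assumption-laden outline, not my actual workflow. The environment name production is hypothetical and has to be created in the repository settings, required reviewers depend on your GitHub plan and repository visibility, and backend configuration and credentials are left out for brevity. The approval is still a manual decision and is not derived from the plan result or the cost change.

```yaml
# Hypothetical workflow: environment name and job layout are assumptions.
name: Plan, Approve, Apply
on:
  push:
    branches:
      - main

# Never run two deployments against the same state in parallel.
concurrency:
  group: terraform-deploy
  cancel-in-progress: false

jobs:
  plan:
    runs-on: ubuntu-latest
    permissions:
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
      - run: terraform plan -no-color -input=false

  apply:
    needs: plan
    runs-on: ubuntu-latest
    # The job pauses here until a required reviewer of this environment approves.
    environment: production
    permissions:
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
      - run: terraform apply -no-color -input=false -auto-approve
```

Note that the apply job creates its own plan, so what gets applied can differ from the reviewed plan if the infrastructure changed in between.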
Take a look at my “Azure Hub & Spoke Minimum” Repo to see the full configuration in action.
If GitHub Actions is not sufficient, you should take a look at Terraform Cloud or other IaC management platforms like Scalr or Spacelift. In addition to addressing the limitations mentioned above, these platforms offer many more features in terms of collaboration.