Hubble Homelab

Documentation

This repository contains infrastructure-as-code for the automated deployment, configuration and management of a Hashicorp (Nomad + Consul + Vault) cluster on Proxmox.

Disclaimer

This project is in alpha status and subject to bugs and breaking changes.

Please do not run any code on your machine without first understanding the provisioning flow, as there is a risk of data loss. Some playbooks perform destructive actions that are irreversible!

Overview

This project aims to provision a full Hashicorp cluster in a semi-automated manner. It utilizes Packer, Terraform and Ansible:

  1. Packer creates base Proxmox VM templates from cloud images and ISOs
  2. Terraform provisions cluster nodes by cloning existing VM templates
  3. Ansible installs and configures Vault, Consul, Nomad on cluster nodes

The cluster minimally comprises one server node and one client node, with no high availability (HA). The nodes run Vault, Consul and Nomad as a cluster.

To support HA, the setup can be further expanded to at least three server nodes and multiple client nodes hosted on a Proxmox cluster, spanning multiple physical machines.

Features

  • Golden image creation with Packer
  • Declarative configuration of Proxmox VMs and Vault with Terraform
  • Automated post-provisioning with Ansible
  • Nomad container scheduling and orchestration
  • Consul service discovery
  • Secure node communication via mTLS
  • Personal Certificate Authority hosted on Vault
  • Secrets management, retrieval and rotation with Vault
  • Automated certificate management with Vault and consul-template
  • Let's Encrypt certificates on Traefik reverse proxy

Getting Started

See the documentation for more information on the concrete steps to configure and provision the cluster.

Folder Structure

.
├── ansible/
│   ├── roles
│   ├── playbooks
│   ├── inventory    # inventory files
│   └── goss         # goss config
├── bin              # custom scripts
├── packer/
│   ├── base         # VM template from ISO
│   └── base-clone   # VM template from existing template
└── terraform/
    ├── cluster      # config for cluster
    ├── dev          # config where I test changes
    ├── minio        # config for Minio buckets
    ├── modules      # tf modules
    ├── nomad        # nomad jobs
    ├── postgres     # config for Postgres DB users
    ├── proxmox      # config for Proxmox accounts
    └── vault        # config for Vault

Limitations

  • Manual Vault unseal on reboot
  • Inter-job dependencies are not supported in Nomad
  • Vault agent is run as root

See issues for more information.

Prerequisites

Hardware Requirements

This project can be run on any modern x86_64 system that meets the recommended system requirements of Proxmox. I recommend mini-SFF workstations such as those from Project TinyMiniMicro. Alternatively, you may choose to run the cluster on a different hypervisor, on ARM64 systems or entirely on bare metal but YMMV.

My own setup comprises:

  • 1x Intel HP Elitedesk 800 G2 Mini
    • CPU: Intel Core i5-6500T
    • RAM: 16GB DDR4
    • Storage: 256GB SSD (OS), 3TB HDD
  • 1x Raspberry Pi 4B+
  • TP-Link 5 Port Gigabit Switch

While a separate router and NAS are recommended, I run virtualized instances of both within Proxmox itself.

Networking

The LAN is not restricted to any specific network architecture, but all cluster nodes should be reachable by one another and by the controller host via SSH.

The following are optional, but highly recommended:

  • A local DNS server that forwards service.consul queries to Consul for DNS lookups. This project uses CoreDNS (see the example after this list).
  • A custom domain from any domain registrar, added to Cloudflare as a zone.
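
For the first item, a minimal CoreDNS Corefile that forwards the consul domain to a Consul agent's DNS interface might look like the following sketch (the Consul address 10.10.10.110 and the upstream resolver are assumptions; adjust them to your network):

consul:53 {
    forward . 10.10.10.110:8600
    cache 30
}

.:53 {
    forward . 1.1.1.1
    cache 300
}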

Controller Node

A workstation, controller node or separate host system will be used to run the required provisioning tools. This system will need to have the following tools installed:

  • Packer
  • Terraform
  • Ansible
  • Python 3 for various scripts (optional)

Alternatively, you are free to install the above tools on the same Proxmox server on which you are provisioning the cluster.

Cluster Requirements

  • An existing Proxmox server that is reachable by the controller node
  • (Optional) An offline, private root and intermediate CA.
  • A self-signed certificate and private key for TLS encryption of Vault. A default key pair is generated on installation of Vault.

Note: While Vault can use certificates generated from its own PKI secrets engine, a temporary key pair is still required to start up Vault.
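
For example, a temporary self-signed key pair can be generated with OpenSSL (the common name and SANs below are placeholders and should match the address Vault will be reached at):

$ openssl req -x509 -newkey rsa:4096 -sha256 -days 365 -nodes \
    -keyout vault.key -out vault.crt \
    -subj "/CN=vault.service.consul" \
    -addext "subjectAltName=DNS:vault.service.consul,IP:127.0.0.1"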

Getting Started

Our goal is to provision a Nomad, Consul and Vault cluster with one server node and one client node. The basic provisioning flow is as follows:

  1. Packer creates base Proxmox VM templates from cloud images and ISOs
  2. Terraform provisions cluster nodes by cloning existing VM templates
  3. Ansible installs and configures Vault, Consul, Nomad on cluster nodes

Assumptions

The following assumptions are made in this guide:

  • All prerequisites are fulfilled
  • The cluster is provisioned on a Proxmox server
  • All nodes are running Debian 11 virtual machines (not LXCs)

Please make the necessary changes if there are any deviations from the above.

Creating a VM template

The Proxmox builder plugin is used to create a new VM template. It supports two different builders: proxmox-clone, which builds a template from an existing VM template or cloud image, and proxmox-iso, which builds a template from an ISO file.

We will be using the proxmox-clone builder. If you have an existing template to provision, you may skip to the next section. Otherwise, we will import a cloud image and turn it into a new, clean VM template.

Note: The existing template to be cloned must have:

  • An attached cloud-init drive for the builder to add the SSH communicator configuration
  • cloud-init installed
  • qemu-guest-agent installed

  1. (Optional) Run the bin/import-cloud-image script to import a new cloud image:
$ import-cloud-image [URL]
  2. Navigate to packer/base-clone

Tip: Use the bin/generate-vars script to quickly generate variable files in packer and terraform subdirectories.

  3. Populate the necessary variables in auto.pkrvars.hcl:
proxmox_url      = "https://<PVE_IP>:8006/api2/json"
proxmox_username = "<user>@pam"
proxmox_password = "<password>"

clone_vm = "<cloud-image-name>"
vm_name  = "<new-template-name>"
vm_id    = 5000

ssh_username = "debian"
ssh_public_key_path = "/path/to/public/key"
ssh_private_key_path = "/path/to/private/key"
  4. Build the image:
$ packer validate -var-file="auto.pkrvars.hcl" .
$ packer build -var-file="auto.pkrvars.hcl" .

Packer will create a new base image and use the Ansible post-provisioner to install and configure software (e.g. Docker, Nomad, Consul and Vault). For more details, see Packer.

Provisioning with Terraform

We are using the bpg/proxmox provider to provision virtual machines from our Packer templates.

  1. Navigate to terraform/cluster
  2. Populate the necessary variables in terraform.tfvars:
proxmox_ip        = "https://<PVE_IP>:8006/api2/json"
proxmox_api_token = "<API_TOKEN>"

template_id = 5000
ip_gateway  = "10.10.10.1"

servers = [
  {
    name       = "server"
    id         = 110
    cores      = 2
    sockets    = 2
    memory     = 4096
    disk_size  = 10
    ip_address = "10.10.10.110/24"
  }
]

clients = [
  {
    name       = "client"
    id         = 111
    cores      = 2
    sockets    = 2
    memory     = 10240
    disk_size  = 15
    ip_address = "10.10.10.111/24"
  }
]

ssh_user             = "debian"
ssh_private_key_file = "/path/to/ssh/private/key"
ssh_public_key_file  = "/path/to/ssh/public/key"
  3. Provision the cluster:
$ terraform init
$ terraform plan
$ terraform apply

The above configuration will provision two VM nodes in Proxmox:

  • Server node: VMID 110 at 10.10.10.110
  • Client node: VMID 111 at 10.10.10.111

An Ansible inventory file tf_ansible_inventory should be generated in the same directory with the given VM IPs in the server and client groups.
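
For the example above, the generated inventory is roughly of the following shape (illustrative only; the exact contents are produced by the Terraform configuration):

[server]
10.10.10.110

[client]
10.10.10.111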

For more details, refer to the Terraform configuration for Proxmox.

Configuration with Ansible

At this stage, there should be one server node and one client node running on Proxmox, both reachable via SSH. These nodes should have Nomad, Consul and Vault installed. We will proceed to use Ansible (and Terraform) to configure Vault, Consul and Nomad (in that order) into a working cluster.

  1. Navigate to ansible
  2. Ensure that the Terraform-generated Ansible inventory file is being read:
$ ansible-inventory --graph
  3. Populate and check the group_vars files in inventory/group_vars/{prod,server,client}.yml:
$ ansible-inventory --graph --vars

Note: The nfs_share_mounts variable in inventory/group_vars/client.yml should be modified or removed if not required.

  4. Run the playbook:
$ ansible-playbook main.yml

The playbook will perform the following:

  1. Create a root and intermediate CA for Vault
  2. Configure Vault to use the new CA
  3. Initialize Vault roles, authentication and PKI with Terraform, using the configuration in terraform/vault
  4. Configure Vault-agent and consul-template on the server node
  5. Configure Consul and Nomad on the server node. These roles depend on Vault being successfully configured and started, as they require Vault to generate a gossip key and TLS certificates
  6. Repeat steps 4-5 for the client node

Note on Data Loss

When re-running the playbook on the same server, Vault will not be re-initialized. However, if the playbook is run on a separate server (e.g. for testing on a dev cluster), the Vault role will permanently delete any existing state in the terraform/vault subdirectory unless a different vault_terraform_workspace is provided. This WILL result in permanent data loss, so take care when running the role (and playbook) on multiple clusters or servers.

Post Setup

Smoke Tests

Smoke tests are performed with goss as part of the main.yml playbook to ensure all required software is installed and running.

Note: The included goss files are static, with hardcoded information. As such, they will fail if some of the Ansible default variables are changed (e.g. username, NFS mountpoints). See issues for details on a workaround.
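
For reference, goss tests are plain YAML; a minimal file that checks for an installed package and a running service might look like this (a hypothetical sketch, not one of the bundled files):

package:
  nomad:
    installed: true

service:
  consul:
    enabled: true
    running: true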

Running Applications

After verifying that the cluster is up and running, we can begin to run applications on it with Nomad jobs. This project provides a number of Nomad jobspec files in terraform/nomad/apps that are deployed with Terraform and offer the following features:

  • With Vault integration configured, Nomad jobs can fetch application secrets from Vault
  • Traefik as a reverse proxy
  • (Optional) Postgres as a database (with Vault-managed DB credentials)

See Adding a New Application for details on onboarding a new application to Nomad.

Provisioning

Provisioning requires a minimum of one server and one client node with no high availability (HA).

To support HA, the setup can be further expanded to at least three server nodes and multiple client nodes hosted on a Proxmox cluster, spanning multiple physical machines.

Images

Cloud Images

Cloud images are pre-installed disk images that have been customized to run on cloud platforms. They are shipped with cloud-init that simplifies the installation and provisioning of virtual machines.

Unlike ISOs and LXC container images, Proxmox's API does not support uploading cloud images directly from a given URL. Instead, they must be manually downloaded and converted into a VM template to be available to Proxmox.

Warning: When cloning the cloud image template with Terraform, qemu-guest-agent must be installed and agent=1 must be set. Otherwise, Terraform will timeout. As such, it is recommended to create a further bootstrapped template with Packer and Ansible.

Manual Upload

  1. Download any cloud image:
$ wget https://cloud.debian.org/images/cloud/bullseye/20230124-1270/debian11-generic-amd64-20230124-1270.qcow2
  2. Create a Proxmox VM from the downloaded image:
$ qm create 9000 \
    --name "debian-11-amd64" \
    --net0 "virtio,bridge=vmbr0" \
    --serial0 socket \
    --vga serial0 \
    --scsihw virtio-scsi-pci \
    --scsi0 "local:0,import-from=/path/to/image" \
    --bootdisk scsi0 \
    --boot "order=scsi0" \
    --ide1 "local:cloudinit" \
    --ostype l26 \
    --cores 1 \
    --sockets 1 \
    --memory 512 \
    --agent 1
  3. Resize the new VM (if necessary):
$ qm resize 9000 scsi0 5G
  4. Convert the VM into a template:
$ qm template 9000

Script

A full script of the steps above can be found at bin/import-cloud-image.

$ import-cloud-image --help

Usage: import-cloud-image [--debug|--force] [URL] [FILENAME]

Packer

Packer is used to create golden images in Proxmox with the community Proxmox builder plugin.

Two different builders are supported: proxmox-iso and proxmox-clone, which create VM templates from ISO files and from existing cloud-init-enabled templates respectively.

Proxmox-clone

The proxmox-clone builder creates a new VM template from an existing one. If you do not have an existing VM template or want to create a new template, you can upload a new cloud image and convert it into a new VM template.

Note that this existing template must have:

  • An attached cloud-init drive for the builder to add the SSH communicator configuration
  • cloud-init installed

When run, the builder will do the following:

  1. Clone the existing template by the given name
  2. Add an SSH communicator configuration via cloud-init
  3. Connect via SSH and run the shell provisioner scripts to prepare the VM for Ansible
  4. Install and start qemu-guest-agent
  5. Run the Ansible provisioner with the ansible/common.yml playbook
  6. Stop and convert the VM into a template with a new (and empty) cloud-init drive

Variables

Variable | Description | Type | Default
--- | --- | --- | ---
proxmox_url | Proxmox URL endpoint | string |
proxmox_username | Proxmox username | string |
proxmox_password | Proxmox password | string |
proxmox_node | Proxmox node to start VM in | string | pve
clone_vm | Name of existing VM template to clone | string |
vm_id | ID of final VM template | number | 5000
vm_name | Name of final VM template | string |
template_description | Description of final VM template | string |
cores | Number of CPU cores | number | 1
sockets | Number of CPU sockets | number | 1
memory | Memory in MB | number | 1024
ssh_username | User to SSH into during provisioning | string |
ip_address | Temporary IP address of VM template | string | 10.10.10.250
gateway | Gateway of VM template | string | 10.10.10.1
ssh_public_key_path | Custom SSH public key path | string |
ssh_private_key_path | Custom SSH private key path | string |

Proxmox-ISO

This builder configuration is a work-in-progress!!

The proxmox-iso builder creates a VM template from an ISO file.

Variables

Variable | Description | Type | Default
--- | --- | --- | ---
proxmox_url | Proxmox URL endpoint | string |
proxmox_username | Proxmox username | string |
proxmox_password | Proxmox password | string |
proxmox_node | Proxmox node to start VM in | string | pve
iso_url | URL for ISO file to upload to Proxmox | string |
iso_checksum | Checksum for ISO file | string |
vm_id | ID of created VM and final template | number | 9000
cores | Number of CPU cores | number | 1
sockets | Number of CPU sockets | number | 1
memory | Memory in MB | number | 1024
ssh_username | User to SSH into during provisioning | string |

Build Images

  1. Create and populate the auto.pkrvars.hcl variable file.

  2. Run the build:

$ packer validate -var-file="auto.pkrvars.hcl" .
$ packer build -var-file="auto.pkrvars.hcl" .

If a template of the same vm_id already exists, you may force its re-creation with the --force flag:

$ packer build -var-file="auto.pkrvars.hcl" --force .

Note: This is only available from packer-plugin-proxmox v1.1.2.

Notes

  • Currently, only proxmox_username and proxmox_password are supported for authentication.
  • The given ssh_username must already exist in the VM template when using proxmox-clone.

Terraform

Terraform is used to provision Proxmox guest VMs by cloning existing templates.

State

Terraform state can be configured to be stored in a Minio S3 bucket.

terraform {
  backend "s3" {
    region = "main"
    bucket = "terraform-state"
    key    = "path/to/terraform.tfstate"

    skip_credentials_validation = true
    skip_region_validation      = true
    skip_metadata_api_check     = true
    force_path_style            = true
  }
}

Initialize the backend with:

$ terraform init \
    -backend-config="access_key=${TFSTATE_ACCESS_KEY}" \
    -backend-config="secret_key=${TFSTATE_SECRET_KEY}" \
    -backend-config="endpoint=${TFSTATE_ENDPOINT}"

Note: When the Minio credentials are passed with the -backend-config flag, they will still appear in plain text in the .terraform subdirectory and any plan files.

Postgres

This uses the Vault and PostgreSQL providers to declaratively manage roles and databases in a single Postgres instance.

The Vault and Postgres providers must be configured appropriately:

provider "vault" {
  address      = var.vault_address
  token        = var.vault_token
  ca_cert_file = var.vault_ca_cert_file
}

provider "postgresql" {
  host     = var.postgres_host
  port     = var.postgres_port
  database = var.postgres_database
  username = var.postgres_username
  password = var.postgres_password
  sslmode  = "disable"
}

Overview

This Terraform configuration provisions and manages multiple databases in a single instance of Postgres. It uses a custom module (terraform/modules/database) to create a new role and database for a given application. Vault is then used to periodically rotate the database credentials with a static role in the database secrets engine. To access the rotated credentials in Vault from Nomad, a relevant Vault policy is also created.

Prerequisites

  • An existing Vault instance
  • To access the credentials in Nomad, Vault integration must be configured
  • An existing Postgres instance

Minimally, the Postgres instance should have a default user and database (postgres) that has the privileges to create roles and databases. The connection credentials must be passed as variables.
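
For illustration, the connection URL for the Vault database secrets backend (referenced as local.connection_url in the next section) can be assembled from these variables; the {{username}} and {{password}} placeholders are substituted by Vault itself (a sketch, assuming the variables listed in the table below):

locals {
  # Vault replaces {{username}} and {{password}} with the configured root credentials
  connection_url = "postgresql://{{username}}:{{password}}@${var.postgres_host}:${var.postgres_port}/${var.postgres_database}"
}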

Usage

The database module requires two shared resources from Vault:

resource "vault_mount" "db" {
  path = "postgres"
  type = "database"
}

resource "vault_database_secret_backend_connection" "postgres" {
  backend       = vault_mount.db.path
  name          = "postgres"
  allowed_roles = ["*"]

  postgresql {
    connection_url = local.connection_url
  }
}

These resources provide a single shared backend and DB connection that must be passed to each module:

module "role" {
  source   = "../modules/database"
  for_each = local.roles

  postgres_vault_backend = vault_mount.db.path
  postgres_db_name       = vault_database_secret_backend_connection.postgres.name

  postgres_role_name                   = each.key
  postgres_role_password               = each.key
  postgres_static_role_rotation_period = each.value
}

The for_each meta-argument further simplifies use of the module by requiring only a list of role objects as input:

postgres_roles = [
  {
    name = "foo"
    rotation_period = 86400
  },
  {
    name = "bar"
  },
]

  • name is the chosen name of the role
  • rotation_period is the password rotation period of the role in seconds (optional with a default of 86400)

The Nomad job obtains the database credentials with a template and vault block:

vault {
  policies = ["foo"]
}

template {
  data        = <<EOF
{{ with secret "postgres/static-creds/foo" }}
DATABASE_URL = "postgres://foo:{{ .Data.password }}@localhost:5432/foo?sslmode=disable"
{{ end }}
EOF
  destination = "secrets/.env"
  env         = true
}

Variables

Variable | Description | Type | Default
--- | --- | --- | ---
vault_address | Vault address | string | https://localhost:8200
vault_token | (Root) Vault token for provider | string |
vault_ca_cert_file | Local path to Vault CA cert file | string | ./certs/vault_ca.crt
postgres_username | Postgres root username | string | postgres
postgres_password | Postgres root password | string | postgres
postgres_database | Postgres database | string | postgres
postgres_host | Postgres host | string | localhost
postgres_port | Postgres port | string | "5432"
postgres_roles | List of roles to be added | list(object) |

Notes

  • Any new entries must also be added to allowed_policies in the vault_token_auth_backend_role.nomad_cluster resource in Vault to be accessible by Nomad, as sketched below.
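
A sketch of what that change might look like (the role_name and policy names here are illustrative, not the actual values in terraform/vault):

resource "vault_token_auth_backend_role" "nomad_cluster" {
  role_name        = "nomad-cluster"                # assumed role name
  allowed_policies = ["existing-policy", "new-app"] # append the new application's policy
}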

Proxmox

This page describes the Terraform configuration for managing Proxmox. It uses the bpg/proxmox provider to manage three types of Proxmox resources:

  • Access management
  • Cloud images
  • VMs

Upload of Cloud Images

The same Terraform configuration in terraform/proxmox can also be used to upload cloud images to Proxmox from a given source URL. These images must have the .img extension or Proxmox will reject the upload.

However, these cloud images cannot be used directly by Packer or Terraform to create VMs. Instead, a template must be created as described in Cloud Images.

VM Management

The Terraform configuration in terraform/cluster is used to create Proxmox VMs for the deployment of server and client cluster nodes. It utilizes a custom module (terraform/modules/vm) that clones an existing VM template and bootstraps it with cloud-init.

Note: The VM template must have cloud-init installed. See Packer for how to create a compatible template.

While root credentials can be used, this configuration accepts an API token (created previously):

provider "proxmox" {
    endpoint = "https://[ip]:8006/api2/json"
    api_token = "terraform@pam!some_secret=api_token"
    insecure = true

    ssh {
      agent = true
    }
}

The number of VMs provisioned is defined by the length of the servers and clients list variables. The following will deploy two nodes in total: one server node and one client node with the given IP addresses. All nodes will be cloned from the given VM template.

template_id = 5003
ip_gateway  = "10.10.10.1"

servers = [
  {
    name       = "server"
    id         = 110
    cores      = 2
    sockets    = 2
    memory     = 4096
    disk_size  = 10
    ip_address = "10.10.10.110/24"
  }
]

clients = [
  {
    name       = "client"
    id         = 111
    cores      = 2
    sockets    = 2
    memory     = 10240
    disk_size  = 15
    ip_address = "10.10.10.111/24"
  }
]

On success, the provisioned VMs are accessible via the configured SSH username and public key.

Note: The VM template must have qemu-guest-agent installed and agent=1 set. Otherwise, Terraform will timeout.

Ansible Inventory

Terraform will also generate an Ansible inventory file tf_ansible_inventory in the same directory. Ansible can read this inventory file automatically by appending the following in the ansible.cfg:

inventory=../terraform/cluster/tf_ansible_inventory,/path/to/other/inventory/files

Variables

Proxmox

Variable | Description | Type | Default
--- | --- | --- | ---
proxmox_ip | Proxmox IP address | string |
proxmox_user | Proxmox username | string | root@pam
proxmox_password | Proxmox password | string |

VM

Variable | Description | Type | Default
--- | --- | --- | ---
proxmox_ip | Proxmox IP address | string |
proxmox_api_token | Proxmox API token | string |
target_node | Proxmox node to start VM in | string | pve
tags | List of Proxmox VM tags | list(string) | [prod]
template_id | Template ID to clone | number |
onboot | Start VM on boot | bool | false
started | Start VM on creation | bool | true
servers | List of server config (see above) | list(object) | []
clients | List of client config (see above) | list(object) | []
disk_datastore | Datastore on which to store VM disk | string | volumes
control_ip_address | Control IPv4 address in CIDR notation | string |
ip_gateway | IPv4 gateway address | string |
ssh_username | User to SSH into during provisioning | string |
ssh_private_key_file | Filepath of private SSH key | string |
ssh_public_key_file | Filepath of public SSH key | string |

  • The VM template corresponding to template_id must exist
  • The IPv4 addresses must be in CIDR notation with subnet masks (e.g. 10.0.0.2/24)

Notes

Proxmox credentials and LXC bind mounts

Root credentials must be used in place of an API token if you require bind mounts with an LXC. Configuring bind mounts for an LXC is not supported via an API token.

Vault

This uses the Vault provider to declaratively manage secrets and policies in a running Vault instance. The Vault provider must be configured appropriately:

provider "vault" {
  address      = var.vault_address
  token        = var.vault_token
  ca_cert_file = var.vault_ca_cert_file
}

Workspaces

Ansible initializes Vault in the vault role. When doing so, any existing Vault resources in the same workspace are destroyed permanently. As such, care should be taken to ensure the appropriate workspaces are used when running the role on multiple Vault server instances or environments (e.g. dev and prod).
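
If needed, workspaces can be inspected or created manually from the terraform/vault directory, for example:

$ terraform workspace list
$ terraform workspace new dev
$ terraform workspace select dev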

Outputs

The Vault configuration produces the following outputs:

  • Certificate key pair for Ansible certificate authentication to Vault

Variables

Variable | Description | Type | Default
--- | --- | --- | ---
vault_address | Vault address | string | https://localhost:8200
vault_token | (Root) Vault token for provider | string |
vault_ca_cert_file | Local path to Vault CA cert file | string | ./certs/vault_ca.crt
vault_audit_path | Vault audit file path | string | /vault/logs/vault.log
admin_password | Password for admin user | string |
kvuser_password | Password for kv user | string |
allowed_server_domains | List of allowed_domains for PKI server role | list(string) | ["service.consul", "dc1.consul", "dc1.nomad", "global.nomad"]
allowed_client_domains | List of allowed_domains for PKI client role | list(string) | ["service.consul", "dc1.consul", "dc1.nomad", "global.nomad"]
allowed_auth_domains | List of allowed_domains for PKI auth role | list(string) | ["global.vault"]
allowed_vault_domains | List of allowed_domains for PKI vault role | list(string) | ["vault.service.consul", "global.vault"]
ansible_public_key_path | Local path to store Ansible public key for auth | string | ../../certs/ansible.crt
ansible_private_key_path | Local path to store Ansible private key for auth | string | ../../certs/ansible_key.pem

Notes

  • The resources for the Postgres database secrets engine are configured separately in Postgres. This is because the Postgres database might not be up when Vault is being initialized.
  • It is not recommended to change the ansible_*_key_path variables. Changing them will heavily affect the Ansible roles when they attempt to log in to Vault with the auth certs.

Ansible

Ansible playbooks are used to configure the provisioned server and client nodes into a functional cluster. They use modular and customizable roles to set up various software.

Roles

Common

This role installs common packages and performs standard post-provisioning such as:

  • Creation of user
  • Creation of NFS share directories
  • Installation of Hashicorp software
  • Installation of Bitwarden CLI

Note: Security hardening and installation of Docker are performed separately in the common.yml playbook.

Variables

Variable | Description | Type | Default
--- | --- | --- | ---
common_user | User to be created | string | debian
common_timezone | Timezone to set | string | Asia/Singapore
common_keyring_dir | Keyring directory path for external apt repositories | string | /etc/apt/keyrings
common_nfs_dir | NFS share directory path | string | /mnt/storage
common_packages | List of common packages to be installed | list(string) | See defaults.yml for full list
common_nomad_version | Nomad version to install | string | 1.6.1-1
common_consul_version | Consul version to install | string | 1.15.4-1
common_vault_version | Vault version to install | string | 1.14.0-1
common_consul_template_version | consul-template version to install | string | 0.32.0-1
common_reset_nomad | Clear Nomad data directory | boolean | true
common_dotfiles | List of dotfiles to be added, and their destinations | list | []

Tags

  • Skip bw to not install the Bitwarden CLI
  • Skip nfs to not create any NFS share directories
  • Skip dotfiles to not copy any remote dotfiles

Notes

  • This role clears any existing /opt/nomad/data directories to a blank slate. To disable this behaviour, set common_reset_nomad: false.
  • This role only supports Ubuntu/Debian amd64 systems with apt.
  • The Hashicorp apt server only supports amd64 packages. For arm64 systems, download the individual zip files instead.
  • common_dotfiles is used to add dotfiles from a GitHub repository to the host. For example:
common_dotfiles:
  - url: https://raw.githubusercontent.com/foo/repo/master/.vimrc
    dest: /home/foo/.vimrc

Consul

This role deploys a new Consul instance. It can deploy Consul as a server or client, depending on the host's group name.

Prerequisites

  • An existing Vault instance to save gossip key and provision TLS certs
  • An existing consul-template instance to rotate TLS certs
  • Consul installed
  • Ansible auth certificate on localhost to access Vault

Setup

For encryption, the role creates consul-template templates for:

  • Consul's gossip key. A new key is added with consul keygen if it does not already exist
  • Consul TLS certs from Vault PKI
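
For illustration, the second item (TLS certificates from the Vault PKI) is handled by a consul-template stanza along these lines; the PKI mount path, role name, file paths and reload command are assumptions and will differ from the actual templates used by the role:

template {
  contents = <<EOF
{{ with secret "pki_int/issue/server" "common_name=server.dc1.consul" "ip_sans=127.0.0.1" }}
{{ .Data.certificate }}
{{ end }}
EOF
  destination = "/opt/consul/tls/server.crt"
  command     = "systemctl reload consul"
}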

Variables

Variable | Description | Type | Default
--- | --- | --- | ---
consul_config_dir | Configuration directory | string | /etc/consul.d
consul_data_dir | Data directory | string | /opt/consul
consul_tls_dir | TLS files directory | string | ${consul_data_dir}/tls
consul_template_config_dir | consul-template configuration directory | string | /etc/consul-template
consul_upstream_dns_address | List of upstream DNS servers for dnsmasq | | ["1.1.1.1"]
consul_server | Start Consul in server mode | bool | true
consul_bootstrap_expect | (server only) The expected number of servers in a cluster | number | 1
consul_client | Start Consul in client mode | bool | false
consul_server_ip | (client only) Server's IP address | string | -
consul_vault_addr | Vault server API address to use | string | https://localhost:8200
consul_common_name | Consul node certificate common_name | string | See below
consul_alt_names | Consul's TLS certificate alt names | string | consul.service.consul
consul_ip_sans | Consul's TLS certificate IP SANs | string | 127.0.0.1
setup_consul_watches | Set up Consul watches for healthchecks | bool | false
consul_gotify_url | Gotify URL for sending webhook | string | ""
consul_gotify_token | Gotify token for sending webhook | string | ""

Notes

  • consul_server and consul_client are mutually exclusive and cannot both be true.
  • consul_bootstrap_expect must be set to the same value on all Consul servers. If the key is not present on a server, that server instance will not attempt to bootstrap the cluster.
  • An existing Consul server must be running and reachable at consul_server_ip when consul_client is true.
  • The default value of consul_common_name is server.dc1.consul or client.dc1.consul, depending on whether Consul is started in server or client mode.

Consul-template

This role deploys a new Consul-template instance.

Prerequisites

  • consul-template installed
  • Access to any template destination directories

Setup

Vault-agent is used to authenticate to Vault on behalf of consul-template, which only requires access to the vault_agent_token_file. This means consul-template requires access to Vault's directories. It also requires access to any template destination directories (e.g. the Consul and Nomad TLS directories). As such, the role runs consul-template as root. I'm still considering alternatives that would allow consul-template to be run as a non-privileged user.
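
Concretely, the consul-template configuration points at the token file that Vault-agent maintains, along these lines (the address and token file path are assumptions):

vault {
  address                = "https://localhost:8200"
  vault_agent_token_file = "/opt/vault/data/vault-token"
  renew_token            = false
}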

Note: Vault and Vault-agent do not have to be installed for the role to run successfully. However, they must be available for the consul-template service to start without error.

Variables

Variable | Description | Type | Default
--- | --- | --- | ---
consul_template_dir | Configuration directory | string | /opt/consul-template
vault_address | Vault instance IP address | string | ${ansible_default_ipv4.address}

Issue Cert

This role issues a new Vault certificate from the configured pki_int role.

Prerequisites

  • An existing Vault instance
  • (Optional) An existing consul-template instance
  • Ansible auth certificate on localhost

Setup

The role issues a new certificate from Vault and writes it to the host's filesystem at a chosen path. It logs in with an existing Ansible auth certificate that has limited permissions from its configured policies.

The role also optionally adds a consul-template template stanza to automatically renew the certificate key pair.

Variables

Variable | Description | Type | Default
--- | --- | --- | ---
issue_cert_role | Certificate role | string | client
issue_cert_common_name | Certificate common name | string | ""
issue_cert_ttl | Certificate TTL | string | 24h
issue_cert_vault_addr | Vault instance address | string | https://localhost:8200
issue_cert_owner | Certificate key pair owner | string | ""
issue_cert_group | Certificate key pair group | string | ""
issue_cert_path | Certificate path | string | cert.crt
issue_cert_key_path | Private key path | string | key.pem
issue_cert_ca_path | CA path | string | ca.crt
issue_cert_auth_role | Auth role to write certificate to | string | ""
issue_cert_auth_policies | Policies to add to auth role | string | ""
issue_cert_add_template | Add consul-template template | boolean | true
issue_cert_consul_template_config | consul-template config file path | string | /etc/consul-template/consul-template.hcl
issue_cert_consul_template_marker | consul-template template marker | string | # {mark} TLS
issue_cert_service | Service to restart after consul-template renews cert | string | ""

  • issue_cert_auth_* variables are only used when issue_cert_role = "auth"

Nomad

This role deploys a new Nomad instance. It can deploy Nomad as a server or client, depending on the host's group name.

Prerequisites

  • An existing Vault instance to save gossip key and provision TLS certs
  • An existing consul-template instance to rotate TLS certs
  • Nomad installed
  • Ansible auth certificate on localhost to access Vault

Setup

For encryption, the role creates consul-template templates for:

  • Nomad's gossip key. A new key is added with nomad operator gossip keyring generate if it does not already exist
  • Nomad TLS certs from Vault PKI
  • Vault token for Vault integration

Variables

Variable | Description | Type | Default
--- | --- | --- | ---
nomad_config_dir | Configuration directory | string | /etc/nomad.d
nomad_data_dir | Data directory | string | /opt/nomad
nomad_tls_dir | TLS files directory | string | ${nomad_data_dir}/tls
consul_template_config_dir | consul-template configuration directory | string | /etc/consul-template
nomad_register_consul | Register Nomad as a Consul service | bool | true
nomad_vault_integration | Sets up Vault integration in server node | bool | true
nomad_server | Start Nomad in server mode | bool | true
nomad_bootstrap_expect | (server only) The expected number of servers in a cluster | number | 1
nomad_client | Start Nomad in client mode | bool | false
nomad_server_ip | (client only) Server's IP address | string | -
nomad_vault_addr | Vault server API address to use | string | https://localhost:8200
nomad_common_name | Nomad node certificate common_name | string | server.global.nomad
nomad_alt_names | Nomad's TLS certificate alt names | string | nomad.service.consul
nomad_ip_sans | Nomad's TLS certificate IP SANs | string | 127.0.0.1
cni_plugin_version | CNI plugins version | string | 1.3.0

Notes

  • nomad_server and nomad_client are mutually exclusive and cannot both be true.
  • nomad_bootstrap_expect must be set to the same value on all Nomad servers. If the key is not present on a server, that server instance will not attempt to bootstrap the cluster.
  • An existing Nomad server must be running and reachable at nomad_server_ip when nomad_client is true.
  • The default value of nomad_common_name is server.global.nomad or client.global.nomad, depending on whether Nomad is started in server or client mode.

Unseal Vault

Work in Progress: This role is unfinished and untested.

This role unseals an initialized but sealed Vault server. The unseal key shares can be provided as:

  • A variable array of keys
  • A variable array of file paths to the keys on the remote filesystem
  • Secrets from Bitwarden

Variables

Variable | Description | Type | Default
--- | --- | --- | ---
unseal_vault_port | Configured Vault port | int | 8200
unseal_vault_addr | Vault HTTP address | string | http://localhost:8200
unseal_store | Accepts file, bitwarden | string |
unseal_keys_files | Array of files with unseal keys | list |
unseal_keys | Array of key shares | list |
unseal_bw_password | Bitwarden password | string |
unseal_bw_keys_names | List of Bitwarden secrets storing key shares | list |

Vault

This role deploys a new Vault instance and performs the required initialization. If run on a client node, it provisions a Vault agent instance instead.

Prerequisites

  • Vault >1.14.0 installed
  • Terraform installed on Ansible host
  • A private key and signed certificate for TLS encryption. If from a self-signed CA, the certificate chain must be trusted.
  • (Optional) Bitwarden password manager installed

Initialization

Vault is configured and started. If the instance is uninitialized, the role performs first-time initialization and stores the root token and unseal key. Only a single unseal key is supported at the moment. The secrets can be stored on the filesystem or in Bitwarden.
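
For illustration, first-time initialization with a single key share is equivalent to running something like the following against the new server (the role performs this step itself):

$ vault operator init -key-shares=1 -key-threshold=1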

Note: If storing in Bitwarden, the Bitwarden CLI must be installed and configured, and the bw_password variable must be provided.

It then proceeds to log in with the root token and set up the PKI secrets engine and various authentication roles with the Terraform provider. A full list of Terraform resources can be found at homelab/terraform/vault.

Warning: Any existing Vault resources in the same workspace are destroyed permanently. Take care that the appropriate workspaces are used when running the role on multiple Vault server instances.

Vault Agent

If this role is run on a client node, or if vault_setup_agent is true (on a server node), it will also provision a Vault-agent instance. This requires an existing unsealed Vault server and should be run only after the Vault server has been set up.

Vault-agent's method of authentication to Vault is TLS certificate authentication. Ansible will generate these certificates and write them to the agent's auth role.

Note: This means Ansible requires access to Vault, which it obtains by authenticating with its own TLS certificates, created by Terraform during the provisioning of the Vault server. These certificates are also written to homelab/certs/.
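
For reference, the relevant part of a Vault-agent configuration using TLS certificate authentication looks roughly like this; the certificate paths, role name and sink path are assumptions, not the exact values used by the role:

vault {
  address     = "https://vault.service.consul:8200"
  client_cert = "/opt/vault/tls/agent.crt"
  client_key  = "/opt/vault/tls/agent_key.pem"
}

auto_auth {
  method "cert" {
    config = {
      name = "agent"
    }
  }

  sink "file" {
    config = {
      path = "/opt/vault/data/vault-token"
    }
  }
}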

Variables

Variable | Description | Type | Default
--- | --- | --- | ---
vault_config_dir | Configuration directory | string | /etc/vault.d
vault_data_dir | Restricted data directory | string | /opt/vault/data
vault_log_dir | Restricted logs directory | string | /opt/vault/logs
vault_tls_dir | TLS files directory | string | /opt/vault/tls
vault_ca_cert_dir | Vault's CA certificate directory | string | /usr/share/ca-certificates/vault
vault_server | Setup Vault server | bool | true
vault_log_file | Audit log file | string | ${vault_log_dir}/vault.log
vault_store_local | Copy Vault init secrets to local file | bool | true
vault_secrets_file | File path for Vault init secrets | string | vault.txt
vault_store_bw | Store root token in Bitwarden | bool | false
vault_terraform_workspace | Terraform workspace | string | default
vault_admin_password | Password for admin user | string | password
vault_register_consul | Register Vault as a Consul service | bool | true
vault_setup_agent | Setup Vault agent | bool | true
vault_server_fqdn | Existing Vault server's FQDN | string | ${ansible_default_ipv4.address}

Notes

  • vault_server and vault_setup_agent are not mutually exclusive. A host can have both instances running at the same time. However, there must already be an existing server instance if vault_server is false.
  • vault_server_fqdn is used to communicate with an existing Vault server that is listening on port 8200 when setting up Vault agent.

Vault Initialization Secrets

This role offers the following options for storing the secrets (root token and unseal key(s)) generated during the initial Vault initialization:

  • On the Ansible host system
  • In Bitwarden
  • Both

Storing the secrets on the local filesystem is only recommended as a temporary measure (to verify the secrets), or for testing and development. The file should be deleted afterwards or moved to a safer location.

Warning: The Bitwarden storage functionality is not very robust and not recommended at the moment. Use it with caution.

Storing the secrets in Bitwarden requires the following prerequisites:

  • Bitwarden CLI tool must be installed and configured
  • User is logged into Bitwarden
  • bw_password variable must be defined and passed to Ansible safely

The bw_get.sh and bw_store.sh helper scripts are used to create or update the secrets. Note that the scripts will overwrite any existing secrets of the same name.

Applications

Actual

  • On first startup, you will be prompted to secure the new server with a password.

Calibre Web

  • Point the books bind mount to an existing Calibre database containing the books' metadata.

Gotify

  • Populate GOTIFY_DEFAULTUSER_NAME and GOTIFY_DEFAULTUSER_PASS with custom credentials.

Linkding

  • Populate LD_SUPERUSER_NAME and LD_SUPERUSER_PASSWORD with custom credentials.

yarr

  • Populate the AUTH_FILE environment variable with custom credentials in the form username:password.

Adding a New Application

Some notes when adding a new application jobspec to Nomad in terraform/nomad/apps.

Traefik

To place the application behind the Traefik reverse proxy, its jobspec should include the following service tags:

tags = [
    "traefik.enable=true",
    "traefik.http.routers.app-proxy.entrypoints=https",
    "traefik.http.routers.app-proxy.tls=true",
    "traefik.http.routers.app-proxy.rule=Host(`app.example.tld`)",
]

Secrets

This section is relevant if the application requires KV secrets from Vault. It uses the Vault Terraform module.

  1. Firstly, add the relevant KV secrets to Vault.

  2. Next, create and add a Vault policy for read-only access to the relevant KV secrets:

# terraform/vault/policies/nomad_app.hcl
path "kvv2/data/prod/nomad/app" {
    capabilities = ["read"]
}

# terraform/vault/policies.tf
resource "vault_policy" "nomad_app" {
    name   = "nomad_app"
    policy = file("policies/nomad_app.hcl")
}

  3. Include the vault and template blocks in the Nomad jobspec:
vault {
    policies = ["nomad_app"]
}

template {
    data        = <<EOF
{{ with secret "kvv2/data/prod/nomad/app" }}
AUTH="{{ .Data.data.username }}":"{{ .Data.data.password }}"
{{ end }}
EOF
    destination = "secrets/auth.env"
    env         = true
}

This will access the Vault secrets and include them as the AUTH environment variable in the job.

Database

This section is relevant if the application requires access to the Postgres database. It uses the Postgres Terraform module.

  1. Add the application name into the postgres_roles variable in terraform/postgres/:
postgres_roles = [
    {
        name = "app"
        rotation_period = 86400
    }
]

This will create a Postgres role and database in the running Postgres instance, a static role in Vault for rotation of the role's credentials, and a Vault policy to read the role's credentials.

  2. Add a template and vault block to access the database credentials:
vault {
    policies = ["app"]
}

template {
    data        = <<EOF
{{ with secret "postgres/static-creds/app" }}
DATABASE_URL = "postgres://app:{{ .Data.password }}@localhost:5432/app?sslmode=disable"
{{ end }}
EOF
    destination = "secrets/.env"
    env         = true
}

Diun

Diun allows monitoring a Docker image for new updates. To opt in to watching a task's Docker image, include the diun.enable label:

config {
  labels = {
    "diun.enable" = "true"
  }
}

By default, this will only watch the current tag of the image. If the tag is latest, Diun will send a notification when that tag's checksum changes.

To allow Diun to watch other tags, include additional labels:

config {
  labels = {
    "diun.enable"     = "true"
    "diun.watch_repo" = "true"
    "diun.max_tags"   = 3
  }
}

This will let Diun watch all tags in the Docker repo. It is highly recommended to set a maximum number of tags that Diun should watch, otherwise Diun will watch ALL tags, including older ones.

See Diun for more information on configuring Diun.

Diun

Diun is used to monitor Docker images for new updates.

Configuration

watch:
  workers: 10
  schedule: "0 0 * * 5"
  jitter: 30s
  firstCheckNotif: false

providers:
  docker:
    watchByDefault: false

notif:
  telegram:
    # Telegram bot token
    token: aabbccdd:11223344
    # Telegram chat ID
    chatIDs:
      - 123456789
    templateBody: |
      Docker tag {{ .Entry.Image }} which you subscribed to through {{ .Entry.Provider }} provider has been released.

Watch Images

To opt in to watching a Docker image, include the diun.enable Docker label:

config {
  labels = {
    "diun.enable" = "true"
  }
}

By default, this will only watch the current tag of the image. If the tag is latest, Diun will send a notification when that tag's checksum changes.

To allow Diun to watch other tags, include additional labels:

config {
  labels = {
    "diun.enable"     = "true"
    "diun.watch_repo" = "true"
    "diun.max_tags"   = 3
  }
}

This will let Diun watch all tags in the Docker repo. It is highly recommended to set a maximum number of tags that Diun should watch, otherwise Diun will watch ALL tags, including older ones.

Command Line

# manipulate images in database
$ docker exec diun diun image list
$ docker exec diun diun image inspect --image=[image]
$ docker exec diun diun image remove --image=[image]

# send test notification
$ docker exec diun diun notif test

Registry

Basic Auth

Create a password file with htpasswd:

$ docker run \
    --entrypoint htpasswd \
    httpd:2 -Bbn foo password > htpasswd

Usage

Log in to the registry with the username and password given in Basic Auth:

$ docker login foo.example.com

Issues

This page documents known issues that have not yet been fixed.

Manual Vault Unseal Process

The Vault server must be manually unsealed whenever the host is rebooted.
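
The server can be unsealed manually from the Vault host with the unseal key, for example:

$ export VAULT_ADDR="https://127.0.0.1:8200"
$ vault operator unseal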

Unreachable Nomad Jobs on Reboot

On some occasions, restarting the Nomad client results in some running jobs being unreachable. The temporary fix is to restart the job (not the allocation or task), as shown below.
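
For example, with the Nomad CLI the job can be stopped and re-run (newer Nomad versions also provide a nomad job restart command):

$ nomad job stop <job>
$ nomad job run <path/to/jobspec>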

Vault-agent not reloading TLS certs

Vault-agent does not reload its own TLS configuration after the certificate has been renewed. Although this causes the agent to fail to authenticate with Vault, it does not constitute a systemd service failure, and the service must be manually restarted to read the new TLS configuration. Sending a SIGHUP is not supported.

Similar issues: #16266 and #18562. A fix is available in Vault 1.14.

Static Goss Files

The provided goss files in ansible/goss contain hardcoded information that can cause the smoke tests to fail if some Ansible variables are modified:

  • common_user
  • common_nfs_dir
  • common_packages

The temporary workaround is to create your own goss files, edit the given goss files, or simply comment out the smoke test tasks.

To fix this properly, goss supports templating to create dynamic goss files; the ansible_collection.goss role must be modified to add support for dynamic tests.

Roadmap

  • Run consul-template as non-root user
  • Run vault-agent as non-root user
  • Automated gossip key rotation for Nomad and Consul
  • ACLs for Nomad and Consul
  • unseal_vault role
  • Packer base builder
    • preseed.cfg is unreachable by the boot command when the controller host and Proxmox VM are on different subnets.
  • Fix configurable cert TTL by Vault
  • Improve robustness of Bitwarden scripts in Vault role