Hubble Homelab
This repository contains infrastructure-as-code for the automated deployment, configuration and management of a HashiCorp (Nomad + Consul + Vault) cluster on Proxmox.
Disclaimer
This project is in alpha status and subject to bugs and breaking changes.
Please do not run any code on your machine without first understanding the provisioning flow; some playbooks perform destructive, irreversible actions that can result in data loss!
Overview
This project aims to provision a full Hashicorp cluster in a semi-automated manner. It utilizes Packer, Ansible and Terraform:
- Packer creates base Proxmox VM templates from cloud images and ISOs
- Terraform provisions cluster nodes by cloning existing VM templates
- Ansible installs and configures Vault, Consul, Nomad on cluster nodes
At minimum, it comprises one server node and one client node with no high availability (HA). The nodes run Vault, Consul and Nomad as a cluster.
To support HA, the setup can be further expanded to at least three server nodes and multiple client nodes hosted on a Proxmox cluster, spanning multiple physical machines.
Features
- Golden image creation with Packer
- Declarative configuration of Proxmox VMs and Vault with Terraform
- Automated post-provisioning with Ansible
- Nomad container scheduling and orchestration
- Consul service discovery
- Secure node communication via mTLS
- Personal Certificate Authority hosted on Vault
- Secrets management, retrieval and rotation with Vault
- Automated certificate management with Vault and consul-template
- Let's Encrypt certificates on Traefik reverse proxy
Getting Started
See the documentation for more information on the concrete steps to configure and provision the cluster.
Folder Structure
.
├── ansible/
│ ├── roles
│ ├── playbooks
│ ├── inventory # inventory files
│ └── goss # goss config
├── bin # custom scripts
├── packer/
│ ├── base # VM template from ISO
│ └── base-clone # VM template from existing template
└── terraform/
├── cluster # config for cluster
├── dev # config where I test changes
├── minio # config for Minio buckets
├── modules # tf modules
├── nomad # nomad jobs
├── postgres # config for Postgres DB users
├── proxmox # config for Proxmox accounts
└── vault # config for Vault
Limitations
- Manual Vault unseal on reboot
- Inter-job dependencies are not supported in Nomad
- Vault agent is run as root
See issues for more information.
Acknowledgements
Prerequisites
Hardware Requirements
This project can be run on any modern x86_64 system that meets the recommended system requirements of Proxmox. I recommend mini-SFF workstations such as those from Project TinyMiniMicro. Alternatively, you may choose to run the cluster on a different hypervisor, on ARM64 systems or entirely on bare metal but YMMV.
My own setup comprises:
- 1x Intel HP Elitedesk 800 G2 Mini
- CPU: Intel Core i5-6500T
- RAM: 16GB DDR4
- Storage: 256GB SSD (OS), 3TB HDD
- 1x Raspberry Pi 4B+
- TP-Link 5 Port Gigabit Switch
While a separate router and NAS are recommended, I run virtualized instances of both within Proxmox itself.
Networking
The LAN is not restricted to any specific network architecture, but all cluster nodes should be reachable by each other and by the controller host via SSH.
The following are optional, but highly recommended:
- A local DNS server that forwards `service.consul` queries to Consul for DNS lookup. This project uses CoreDNS.
- A custom domain from any domain registrar, added to Cloudflare as a zone.
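To verify the forwarding, you can query the local DNS server for a Consul service record (a quick check; the DNS server address `10.10.10.2` and the `vault` service name are examples only):
$ dig @10.10.10.2 vault.service.consul +short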
Controller Node
A workstation, controller node or separate host system will be used to run the required provisioning tools. This system will need to have the following tools installed:
- Packer
- Terraform
- Ansible
- Python 3 for various scripts (optional)
Alternatively, you are free to install the above tools on the same server on which you are provisioning the cluster.
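To confirm the controller node is ready, check that each tool is available on the PATH:
$ packer version
$ terraform version
$ ansible --version
$ python3 --version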
Cluster Requirements
- An existing Proxmox server that is reachable by the controller node
- (Optional) An offline, private root and intermediate CA.
- A self-signed certificate and private key for TLS encryption of Vault. A default key pair is generated on installation of Vault.
Note: While Vault can use certificates generated from its own PKI secrets engine, a temporary key pair is still required to start up Vault.
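If you prefer to generate your own temporary key pair instead of using the default one, a self-signed certificate can be created with OpenSSL (a sketch; adjust the subject and SANs to your environment, and note that `-addext` requires OpenSSL 1.1.1 or later):
$ openssl req -x509 -newkey rsa:4096 -sha256 -days 365 -nodes \
    -keyout vault.key -out vault.crt \
    -subj "/CN=vault.service.consul" \
    -addext "subjectAltName=DNS:vault.service.consul,IP:127.0.0.1"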
Getting Started
Our goal is to provision a Nomad, Consul and Vault cluster with one server node and one client node. The basic provisioning flow is as follows:
- Packer creates base Proxmox VM templates from cloud images and ISOs
- Terraform provisions cluster nodes by cloning existing VM templates
- Ansible installs and configures Vault, Consul, Nomad on cluster nodes
Assumptions
The following assumptions are made in this guide:
- All prerequisites are fulfilled
- The cluster is provisioned on a Proxmox server
- All nodes are running Debian 11 virtual machines (not LXCs)
Please make the necessary changes if there are any deviations from the above.
Creating a VM template
The Proxmox builder plugin is used to create a new VM template. It supports two different builders:
- `proxmox-clone` - from an existing VM template (recommended)
- `proxmox-iso` - from an ISO file (incomplete)
We will be using the first builder. If you have an existing template to provision, you may skip to the next section. Otherwise, assuming that we are lacking an existing, clean VM template, we will import a cloud image and turn it into a new template.
Note: It is important that the existing template has:
- An attached cloud-init drive for the builder to add the SSH communicator configuration
- cloud-init installed
- qemu-guest-agent installed
- (Optional) Run the `bin/import-cloud-image` script to import a new cloud image:
$ import-cloud-image [URL]
- Navigate to `packer/base-clone`.
  Tip: Use the `bin/generate-vars` script to quickly generate variable files in the `packer` and `terraform` subdirectories.
- Populate the necessary variables in `auto.pkrvars.hcl`:
proxmox_url = "https://<PVE_IP>:8006/api2/json"
proxmox_username = "<user>@pam"
proxmox_password = "<password>"
clone_vm = "<cloud-image-name>"
vm_name = "<new-template-name>"
vm_id = 5000
ssh_username = "debian"
ssh_public_key_path = "/path/to/public/key"
ssh_private_key_path = "/path/to/private/key"
- Build the image:
$ packer validate -var-file="auto.pkrvars.hcl" .
$ packer build -var-file="auto.pkrvars.hcl" .
Packer will create a new base image and use the Ansible post-provisioner to install and configure software (e.g. Docker, Nomad, Consul and Vault). For more details, see Packer.
Provisioning with Terraform
We are using the bpg/proxmox provider to provision virtual machines from our Packer templates.
- Navigate to `terraform/cluster`.
- Populate the necessary variables in `terraform.tfvars`:
proxmox_ip = "https://<PVE_IP>:8006/api2/json"
proxmox_api_token = "<API_TOKEN>"
template_id = 5000
ip_gateway = "10.10.10.1"
servers = [
  {
    name       = "server"
    id         = 110
    cores      = 2
    sockets    = 2
    memory     = 4096
    disk_size  = 10
    ip_address = "10.10.10.110/24"
  }
]
clients = [
  {
    name       = "client"
    id         = 111
    cores      = 2
    sockets    = 2
    memory     = 10240
    disk_size  = 15
    ip_address = "10.10.10.111/24"
  }
]
ssh_user = "debian"
ssh_private_key_file = "/path/to/ssh/private/key"
ssh_public_key_file = "/path/to/ssh/public/key"
- Provision the cluster:
$ terraform init
$ terraform plan
$ terraform apply
The above configuration will provision two VM nodes in Proxmox:
- Server node: VMID 110 at 10.10.10.110
- Client node: VMID 111 at 10.10.10.111
An Ansible inventory file `tf_ansible_inventory` should be generated in the same directory with the given VM IPs in the `server` and `client` groups.
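The exact layout depends on the inventory template used by the Terraform configuration, but the generated file should contain something along these lines (IPs taken from the example above):
$ cat tf_ansible_inventory
[server]
10.10.10.110

[client]
10.10.10.111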
For more details, refer to the Terraform configuration for Proxmox.
Configuration with Ansible
At this stage, there should be one server node and one client node running on Proxmox, both reachable via SSH. These nodes should have Nomad, Consul and Vault installed. We will proceed to use Ansible (and Terraform) to configure Vault, Consul and Nomad (in that order) into a working cluster.
- Navigate to `ansible`.
- Ensure that the Terraform-generated Ansible inventory file is being read:
$ ansible-inventory --graph
- Populate and check the `group_vars` files in `inventory/group_vars/{prod,server,client}.yml`:
$ ansible-inventory --graph --vars
  Note: The `nfs_share_mounts` variable in `inventory/group_vars/client.yml` should be modified or removed if not required.
- Run the playbook:
$ ansible-playbook main.yml
The playbook will perform the following:
1. Create a root and intermediate CA for Vault
2. Configure Vault to use the new CA
3. Initialize Vault roles, authentication and PKI with Terraform, using the configuration in `terraform/vault`
4. Configure Vault-agent and consul-template on the server node
5. Configure Consul and Nomad on the server node. These roles depend on Vault being successfully configured and started, as they require Vault to generate a gossip key and TLS certificates
6. Repeat steps 4-5 for the client node
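After the playbook completes, the cluster can be spot-checked from a node (assuming the CLI environment variables for addresses and TLS are set appropriately):
$ vault status
$ consul members
$ nomad server members
$ nomad node status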
Note on Data Loss
When re-running the playbook on the same server, Vault will not be re-initialized. However, if the playbook is run on a separate server (e.g. for testing on a dev cluster), the Vault role will permanently delete any existing state in the `terraform/vault` subdirectory if a different `vault_terraform_workspace` is not provided. This WILL result in permanent data loss, so take care when running the role (and playbook) on multiple clusters or servers.
Post Setup
Smoke Tests
Smoke tests are performed with goss as part of the `main.yml` playbook to ensure all required software is installed and running.
Note: The included goss files are static with hardcoded information. As such, they will fail if some of the Ansible default variables are changed (e.g. username, NFS mountpoints). See issues for details on a workaround.
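The same checks can also be re-run manually on a node at any time (a sketch; the goss file path depends on where the playbook placed it):
$ goss --gossfile /path/to/goss.yaml validate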
Running Applications
After verifying that the cluster is up and running, we can begin to run applications on it with Nomad jobs. This project provides a number of Nomad jobspec files in `terraform/nomad/apps` to be run with Terraform, with the following features:
- With Vault integration configured, Nomad supports the fetching of application secrets with Vault
- Traefik as a reverse proxy
- (Optional) Postgres as a database (with Vault-managed DB credentials)
See Adding a New Application for details on onboarding a new application to Nomad.
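A minimal deployment flow, assuming the Terraform backend for `terraform/nomad` is configured and the Nomad CLI can reach the cluster, looks like this:
$ cd terraform/nomad
$ terraform init
$ terraform apply
$ nomad job status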
Provisioning
Provisioning requires a minimum of one server and one client node with no high availability (HA).
To support HA, the setup can be further expanded to at least three server nodes and multiple client nodes hosted on a Proxmox cluster, spanning multiple physical machines.
Images
Cloud Images
Cloud images are pre-installed disk images that have been customized to run on cloud platforms. They are shipped with `cloud-init`, which simplifies the installation and provisioning of virtual machines.
Unlike ISOs and LXC container images, Proxmox's API lacks support for uploading cloud images directly from a given URL (see here and here). Instead, they must be manually downloaded and converted into a VM template to be available to Proxmox.
Warning: When cloning the cloud image template with Terraform, `qemu-guest-agent` must be installed and `agent=1` must be set. Otherwise, Terraform will time out. As such, it is recommended to create a further bootstrapped template with Packer and Ansible.
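One quick way to confirm the guest agent is working in a clone is to ping it through the Proxmox CLI on the PVE host (the VMID is an example):
$ qm agent 110 ping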
Manual Upload
- Download any cloud image:
$ wget https://cloud.debian.org/images/cloud/bullseye/20230124-1270/debian11-generic-amd64-20230124-1270.qcow2
- Create a Proxmox VM from the downloaded image:
$ qm create 9000 \
--name "debian-11-amd64" \
--net0 "virtio,bridge=vmbr0" \
--serial0 socket \
--vga serial0 \
--scsihw virtio-scsi-pci \
--scsi0 "local:0,import-from=/path/to/image" \
--bootdisk scsi0 \
--boot "order=scsi0" \
--ide1 "local:cloudinit" \
--ostype l26 \
--cores 1 \
--sockets 1 \
--memory 512 \
--agent 1
- Resize the new VM (if necessary):
$ qm resize 9000 scsi0 5G
- Convert the VM into a template:
$ qm template 9000
Script
A full script of the steps above can be found at bin/import-cloud-image.
$ import-cloud-image --help
Usage: import-cloud-image [--debug|--force] [URL] [FILENAME]
References
Packer
Packer is used to create golden images in Proxmox with the community Proxmox builder plugin.
Two different builders are supported: `proxmox-iso` and `proxmox-clone`, to target both ISO and cloud-init images for virtual machine template creation in Proxmox.
Proxmox-clone
The `proxmox-clone` builder creates a new VM template from an existing one. If you do not have an existing VM template or want to create a new template, you can upload a new cloud image and convert it into a new VM template.
Note that this existing template must have:
- An attached cloud-init drive for the builder to add the SSH communicator configuration
- `cloud-init` installed
After running the builder, it will do the following:
- Clone the existing template by the given name
- Add an SSH communicator configuration via cloud-init
- Connect via SSH and run the shell provisioner scripts to prepare the VM for Ansible
- Install and start `qemu-guest-agent`
- Run the Ansible provisioner with the `ansible/common.yml` playbook
- Stop and convert the VM into a template with a new (and empty) cloud-init drive
Variables
Variable | Description | Type | Default |
---|---|---|---|
proxmox_url | Proxmox URL Endpoint | string | |
proxmox_username | Proxmox username | string | |
proxmox_password | Proxmox password | string | |
proxmox_node | Proxmox node to start VM in | string | pve |
clone_vm | Name of existing VM template to clone | string | |
vm_id | ID of final VM template | number | 5000 |
vm_name | Name of final VM template | string | |
template_description | Description of final VM template | string | |
cores | Number of CPU cores | number | 1 |
sockets | Number of CPU sockets | number | 1 |
memory | Memory in MB | number | 1024 |
ssh_username | User to SSH into during provisioning | string | |
ip_address | Temporary IP address of VM template | string | 10.10.10.250 |
gateway | Gateway of VM template | string | 10.10.10.1 |
ssh_public_key_path | Custom SSH public key path | string | |
ssh_private_key_path | Custom SSH private key path | string |
Proxmox-ISO
This builder configuration is a work in progress!
The `proxmox-iso` builder creates a VM template from an ISO file.
Variables
Variable | Description | Type | Default |
---|---|---|---|
proxmox_url | Proxmox URL Endpoint | string | |
proxmox_username | Proxmox username | string | |
proxmox_password | Proxmox password | string | |
proxmox_node | Proxmox node to start VM in | string | pve |
iso_url | URL for ISO file to upload to Proxmox | string | |
iso_checksum | Checksum for ISO file | string | |
vm_id | ID of created VM and final template | number | 9000 |
cores | Number of CPU cores | number | 1 |
sockets | Number of CPU sockets | number | 1 |
memory | Memory in MB | number | 1024 |
ssh_username | User to SSH into during provisioning | string |
Build Images
- Create and populate the `auto.pkrvars.hcl` variable file.
- Run the build:
$ packer validate -var-file="auto.pkrvars.hcl" .
$ packer build -var-file="auto.pkrvars.hcl" .
If a template of the same `vm_id` already exists, you may force its re-creation with the `--force` flag:
$ packer build -var-file="auto.pkrvars.hcl" --force .
Note: This is only available from `packer-plugin-proxmox` v1.1.2.
Notes
- Currently, only `proxmox_username` and `proxmox_password` are supported for authentication.
- The given `ssh_username` must already exist in the VM template when using `proxmox-clone`.
Terraform
Terraform is used to provision Proxmox guest VMs by cloning existing templates.
State
Terraform state can be configured to be stored in a Minio S3 bucket.
terraform {
  backend "s3" {
    region                      = "main"
    bucket                      = "terraform-state"
    key                         = "path/to/terraform.tfstate"
    skip_credentials_validation = true
    skip_region_validation      = true
    skip_metadata_api_check     = true
    force_path_style            = true
  }
}
Initialize the backend with:
$ terraform init \
-backend-config="access_key=${TFSTATE_ACCESS_KEY}" \
-backend-config="secret_key=${TFSTATE_SECRET_KEY}" \
-backend-config="endpoint=${TFSTATE_ENDPOINT}"
Note: When the Minio credentials are passed with the `-backend-config` flag, they will still appear in plain text in the `.terraform` subdirectory and in any plan files.
Postgres
This uses the Vault and PostgreSQL providers to declaratively manage roles and databases in a single Postgres instance.
Both providers must be configured appropriately:
provider "vault" {
address = var.vault_address
token = var.vault_token
ca_cert_file = var.vault_ca_cert_file
}
provider "postgresql" {
host = var.postgres_host
port = var.postgres_port
database = var.postgres_database
username = var.postgres_username
password = var.postgres_password
sslmode = "disable"
}
Overview
This Terraform configuration provisions and manages multiple databases in a single instance of Postgres. It uses a custom module (`terraform/modules/database`) to create a new role and database for a given application. Vault is then used to periodically rotate the database credentials with a static role in the database secrets engine.
To access the rotated credentials in Vault from Nomad, a relevant Vault policy is also created.
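Once configured, the current (rotated) credentials for a role can be read back from Vault for debugging (`foo` is an example role name; the `postgres` mount path follows this project's configuration):
$ vault read postgres/static-creds/foo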
Prerequisites
- An existing Vault instance
- To access the credentials in Nomad, Vault integration must be configured
- An existing Postgres instance
Minimally, the Postgres instance should have a default user and database (`postgres`) that has the privileges to create roles and databases. The connection credentials must be passed as variables.
Usage
The `database` module requires two shared resources from Vault:
resource "vault_mount" "db" {
path = "postgres"
type = "database"
}
resource "vault_database_secret_backend_connection" "postgres" {
backend = vault_mount.db.path
name = "postgres"
allowed_roles = ["*"]
postgresql {
connection_url = local.connection_url
}
}
These resources provide a single shared backend and DB connection that must be passed to each module:
module "role" {
source = "../modules/database"
for_each = local.roles
postgres_vault_backend = vault_mount.db.path
postgres_db_name = vault_database_secret_backend_connection.postgres.name
postgres_role_name = each.key
postgres_role_password = each.key
postgres_static_role_rotation_period = each.value
}
The `for_each` meta-argument simplifies the use of the module further by simply requiring a list of role objects as input:
postgres_roles = [
  {
    name            = "foo"
    rotation_period = 86400
  },
  {
    name = "bar"
  },
]
- `name` is the chosen name of the role
- `rotation_period` is the password rotation period of the role in seconds (optional, with a default of `86400`)
The Nomad job obtains the database credentials with a `template` and `vault` block:
vault {
  policies = ["foo"]
}

template {
  data        = <<EOF
{{ with secret "postgres/static-creds/foo" }}
DATABASE_URL = "postgres://foo:{{ .Data.password }}@localhost:5432/foo?sslmode=disable"
{{ end }}
EOF
  destination = "secrets/.env"
  env         = true
}
Variables
Variable | Description | Type | Default |
---|---|---|---|
vault_address | Vault address | string | https://localhost:8200 |
vault_token | (Root) Vault token for provider | string | |
vault_ca_cert_file | Local path to Vault CA cert file | string | ./certs/vault_ca.crt |
postgres_username | Postgres root username | string | postgres |
postgres_password | Postgres root password | string | postgres |
postgres_database | Postgres database | string | postgres |
postgres_host | Postgres host | string | localhost |
postgres_port | Postgres port | string | "5432" |
postgres_roles | List of roles to be added | list(object) |
Notes
- Any new entries must also be added to `allowed_policies` in the `vault_token_auth_backend_role.nomad_cluster` resource in Vault to be accessible by Nomad.
Proxmox
This page describes the Terraform configuration for managing Proxmox. It uses the bpg/proxmox provider to manage three types of Proxmox resources:
- Access management
- Cloud images
- VMs
Upload of Cloud Images
The same Terraform configuration in `terraform/proxmox` can also be used to upload cloud images to Proxmox from a given source URL. These images must have the `.img` extension or Proxmox will reject them.
However, these cloud images cannot be used directly by Packer or Terraform to create VMs. Instead, a template must be created as described in Cloud Images.
VM Management
The Terraform configuration in `terraform/cluster` is used to create Proxmox VMs for the deployment of server and client cluster nodes. It utilizes a custom module (`terraform/modules/vm`) that clones an existing VM template and bootstraps it with cloud-init.
Note: The VM template must have cloud-init installed. See Packer for how to create a compatible template.
While root credentials can be used, this configuration accepts an API token (created previously):
provider "proxmox" {
endpoint = "https://[ip]:8006/api2/json"
api_token = "terraform@pam!some_secret=api_token"
insecure = true
ssh {
agent = true
}
}
The number of VMs provisioned is defined by the length of the array variables. The following will deploy two nodes in total: one server and one client node with the given IP addresses. All nodes will be cloned from the given VM template.
template_id = 5003
ip_gateway = "10.10.10.1"
servers = [
  {
    name       = "server"
    id         = 110
    cores      = 2
    sockets    = 2
    memory     = 4096
    disk_size  = 10
    ip_address = "10.10.10.110/24"
  }
]
clients = [
  {
    name       = "client"
    id         = 111
    cores      = 2
    sockets    = 2
    memory     = 10240
    disk_size  = 15
    ip_address = "10.10.10.111/24"
  }
]
On success, the provisioned VMs are accessible via the configured SSH username and public key.
Note: The VM template must have `qemu-guest-agent` installed and `agent=1` set. Otherwise, Terraform will time out.
Ansible Inventory
Terraform will also generate an Ansible inventory file `tf_ansible_inventory` in the same directory. Ansible can read this inventory file automatically by appending the following to `ansible.cfg`:
inventory=../terraform/cluster/tf_ansible_inventory,/path/to/other/inventory/files
Variables
Proxmox
Variable | Description | Type | Default |
---|---|---|---|
proxmox_ip | Proxmox IP address | string | |
proxmox_user | Proxmox username | string | root@pam |
proxmox_password | Proxmox password | string | |
VM
Variable | Description | Type | Default |
---|---|---|---|
proxmox_ip | Proxmox IP address | string | |
proxmox_api_token | Proxmox API token | string | |
target_node | Proxmox node to start VM in | string | pve |
tags | List of Proxmox VM tags | list(string) | [prod] |
template_id | Template ID to clone | number | |
onboot | Start VM on boot | bool | false |
started | Start VM on creation | bool | true |
servers | List of server config (see above) | list(object) | [] |
clients | List of client config (see above) | list(object) | [] |
disk_datastore | Datastore on which to store VM disk | string | volumes |
control_ip_address | Control IPv4 address in CIDR notation | string | |
ip_gateway | IPv4 gateway address | string | |
ssh_username | User to SSH into during provisioning | string | |
ssh_private_key_file | Filepath of private SSH key | string | |
ssh_public_key_file | Filepath of public SSH key | string |
- The VM template corresponding to `template_id` must exist
- The IPv4 addresses must be in CIDR notation with subnet masks (e.g. `10.0.0.2/24`)
Notes
Proxmox credentials and LXC bind mounts
Root credentials must be used in place of an API token if you require bind mounts with an LXC. Mounting bind mounts to an LXC via an API token is not supported.
Vault
This uses the Vault provider to declaratively manage secrets and policies in a running Vault instance. The Vault provider must be configured appropriately:
provider "vault" {
address = var.vault_address
token = var.vault_token
ca_cert_file = var.vault_ca_cert_file
}
Workspaces
Ansible initializes Vault in the `vault` role. When doing so, any existing Vault resources in the same Terraform workspace are destroyed permanently. As such, care should be taken to ensure the appropriate workspaces are used when running the role on multiple Vault server instances or environments (e.g. dev and prod).
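For example, a dedicated workspace for a dev cluster can be created and selected in `terraform/vault` before running the role against it, with its name passed to Ansible via `vault_terraform_workspace`:
$ cd terraform/vault
$ terraform workspace new dev
$ terraform workspace list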
Outputs
Vault produces the following outputs:
- Certificate key pair for Ansible certificate authentication to Vault
Variables
Variable | Description | Type | Default |
---|---|---|---|
vault_address | Vault address | string | https://localhost:8200 |
vault_token | (Root) Vault token for provider | string | |
vault_ca_cert_file | Local path to Vault CA cert file | string | ./certs/vault_ca.crt |
vault_audit_path | Vault audit file path | string | /vault/logs/vault.log |
admin_password | Password for admin user | string | |
kvuser_password | Password for kv user | string | |
allowed_server_domains | List of allowed_domains for PKI server role | list(string) | ["service.consul", "dc1.consul", "dc1.nomad", "global.nomad"] |
allowed_client_domains | List of allowed_domains for PKI client role | list(string) | ["service.consul", "dc1.consul", "dc1.nomad", "global.nomad"] |
allowed_auth_domains | List of allowed_domains for PKI auth role | list(string) | ["global.vault"] |
allowed_vault_domains | List of allowed_domains for PKI vault role | list(string) | ["vault.service.consul", "global.vault"] |
ansible_public_key_path | Local path to store Ansible public key for auth | string | ../../certs/ansible.crt |
ansible_private_key_path | Local path to store Ansible private key for auth | string | ../../certs/ansible_key.pem |
Notes
- The resources for the Postgres database secrets engine are configured separately in Postgres. This is because the Postgres database might not be up when Vault is being initialized.
- It is not recommended to change the `ansible_*_key_path` variables. Changing them will heavily affect the Ansible roles when they attempt to log in to Vault with the auth certs.
Ansible
Ansible playbooks are used to configure provisioned server and client nodes to run a functional cluster. They use modular and customizable roles to set up various software.
Roles
Common
This role installs common packages and performs standard post-provisioning such as:
- Creation of user
- Creation of NFS share directories
- Installation of Hashicorp software
- Installation of Bitwarden CLI
Note: Security hardening and installation of Docker are performed separately in the `common.yml` playbook.
Variables
Variable | Description | Type | Default |
---|---|---|---|
common_user | User to be created | string | debian |
common_timezone | Timezone to set | string | Asia/Singapore |
common_keyring_dir | Keyring directory path for external apt repositories | string | /etc/apt/keyrings |
common_nfs_dir | NFS share directory path | string | /mnt/storage |
common_packages | List of common packages to be installed | list(string) | See defaults.yml for full list |
common_nomad_version | Nomad version to install | string | 1.6.1-1 |
common_consul_version | Consul version to install | string | 1.15.4-1 |
common_vault_version | Vault version to install | string | 1.14.0-1 |
common_consul_template_version | Consul template version to install | string | 0.32.0-1 |
common_reset_nomad | Clear Nomad data directory | boolean | true |
common_dotfiles | List of dotfiles to be added, and their destinations | list | [] |
Tags
- Skip `bw` to not install the Bitwarden CLI
- Skip `nfs` to not create any NFS share directories
- Skip `dotfiles` to not copy any remote dotfiles
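For example, to run the playbook while skipping the Bitwarden CLI and NFS tasks:
$ ansible-playbook main.yml --skip-tags bw,nfs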
Notes
- This role clears any existing `/opt/nomad/data` directories to a blank slate. To disable this behaviour, set `common_reset_nomad: false`.
- This role only supports Ubuntu/Debian amd64 systems with `apt`.
- The Hashicorp apt server only supports amd64 packages. For arm64 systems, download the individual zip files instead.
- `common_dotfiles` is used to add dotfiles from a Github repository to the host. For example:
common_dotfiles:
  - url: https://raw.githubusercontent.com/foo/repo/master/.vimrc
    dest: /home/foo/.vimrc
Consul
This role deploys a new Consul instance. It can deploy Consul as a server or client, depending on the host's group name.
Prerequisites
- An existing Vault instance to save gossip key and provision TLS certs
- An existing consul-template instance to rotate TLS certs
- Consul installed
- Ansible auth certificate on localhost to access Vault
Setup
For encryption, the role creates consul-template templates for:
- Consul's gossip key. A new key is added with `consul keygen` if it does not already exist
- Consul TLS certs from Vault PKI
Variables
Variable | Description | Type | Default |
---|---|---|---|
consul_config_dir | Configuration directory | string | /etc/consul.d |
consul_data_dir | Data directory | string | /opt/consul |
consul_tls_dir | TLS files directory | string | ${consul_data_dir}/tls |
consul_template_config_dir | consul-template configuration file | string | /etc/consul-template |
consul_upstream_dns_address | List of upstream DNS servers for dnsmasq | list | ["1.1.1.1"] |
consul_server | Start Consul in server mode | bool | true |
consul_bootstrap_expect | (server only) The expected number of servers in a cluster | number | 1 |
consul_client | Start Consul in client mode | bool | false |
consul_server_ip | (client only) Server's IP address | string | - |
consul_vault_addr | Vault server API address to use | string | https://localhost:8200 |
consul_common_name | Consul node certificate common_name | string | See below |
consul_alt_names | Consul's TLS certificate alt names | string | consul.service.consul |
consul_ip_sans | Consul's TLS certificate IP SANs | string | 127.0.0.1 |
setup_consul_watches | Set up Consul watches for healthchecks | bool | false |
consul_gotify_url | Gotify URL for sending webhook | string | "" |
consul_gotify_token | Gotify token for sending webhook | string | "" |
Notes
- `consul_server` and `consul_client` are mutually exclusive and cannot both be `true`.
- `consul_bootstrap_expect` must be the same value on all Consul servers. If the key is not present on a server, that server instance will not attempt to bootstrap the cluster.
- An existing Consul server must be running and reachable at `consul_server_ip` when `consul_client` is `true`.
- The default value of `consul_common_name` is `server.dc1.consul` or `client.dc1.consul`, depending on whether Consul is started in server or client mode.
Consul-template
This role deploys a new Consul-template instance.
Prerequisites
- consul-template installed
- Access to any template destination directories
Setup
Vault-agent is used to authenticate to Vault for consul-template, which only requires access to the `vault_agent_token_file`. This means consul-template requires access to Vault directories. It also requires access to any template destination directories (e.g. Consul and Nomad TLS directories). As such, the role runs consul-template as root. I'm still considering alternatives that allow consul-template to be run as a non-privileged user.
Note: Vault and Vault-agent do not have to be installed for the role to run successfully. However, they must be available for the consul-template service to start without error.
Variables
Variable | Description | Type | Default |
---|---|---|---|
consul_template_dir | Configuration directory | string | /opt/consul-template |
vault_address | Vault instance IP address | string | ${ansible_default_ipv4.address} |
Issue Cert
This role issues a new Vault certificate from the configured `pki_int` role.
Prerequisites
- An existing Vault instance
- (Optional) An existing consul-template instance
- Ansible auth certificate on localhost
Setup
The role issues a new certificate from Vault and writes it to the host's filesystem at a chosen path. The role logs in with an existing Ansible auth certificate that has limited permissions from its configured policies.
The role also optionally adds a consul-template template stanza to automatically renew the certificate key pair.
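Under the hood, issuing a certificate from the intermediate CA is roughly equivalent to the following Vault CLI call (a sketch; the `pki_int` mount and `client` role names follow this project's defaults, and the common name is an example):
$ vault write pki_int/issue/client \
    common_name="client.dc1.consul" \
    ttl=24h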
Variables
Variable | Description | Type | Default |
---|---|---|---|
issue_cert_role | Certificate role | string | client |
issue_cert_common_name | Certificate common name | string | "" |
issue_cert_ttl | Certificate TTL | string | 24h |
issue_cert_vault_addr | Vault instance address | string | https://localhost:8200 |
issue_cert_owner | Certificate key pair owner | string | "" |
issue_cert_group | Certificate key pair group | string | "" |
issue_cert_path | Certificate path | string | cert.crt |
issue_cert_key_path | Private key path | string | key.pem |
issue_cert_ca_path | CA path | string | ca.crt |
issue_cert_auth_role | Auth role to write certificate to | string | "" |
issue_cert_auth_policies | Policies to add to auth role | string | "" |
issue_cert_add_template | Add consul-template template | boolean | true |
issue_cert_consul_template_config | consul-template config file path | string | /etc/consul-template/consul-template.hcl |
issue_cert_consul_template_marker | consul-template template marker | string | # {mark} TLS |
issue_cert_service | Service to restart after consul-template renews cert | string | "" |
- `issue_cert_auth_*` variables are only used when `issue_cert_role = "auth"`
Nomad
This role deploys a new Nomad instance. It can deploy Nomad as a server or client, depending on the host's group name.
Prerequisites
- An existing Vault instance to save gossip key and provision TLS certs
- An existing consul-template instance to rotate TLS certs
- Nomad installed
- Ansible auth certificate on localhost to access Vault
Setup
For encryption, the role creates consul-template templates for:
- Nomad's gossip key. A new key is added with `nomad operator gossip keyring generate` if it does not already exist
- Nomad TLS certs from Vault PKI
- A Vault token for Vault integration
Variables
Variable | Description | Type | Default |
---|---|---|---|
nomad_config_dir | Configuration directory | string | /etc/nomad.d |
nomad_data_dir | Data directory | string | /opt/nomad |
nomad_tls_dir | TLS files directory | string | ${nomad_data_dir}/tls |
consul_template_config_dir | consul-template configuration file | string | /etc/consul-template |
nomad_register_consul | Register Nomad as a Consul service | bool | true |
nomad_vault_integration | Sets up Vault integration in server node | bool | true |
nomad_server | Start Nomad in server mode | bool | true |
nomad_bootstrap_expect | (server only) The expected number of servers in a cluster | number | 1 |
nomad_client | Start Nomad in client mode | bool | false |
nomad_server_ip | (client only) Server's IP address | string | - |
nomad_vault_addr | Vault server API address to use | string | https://localhost:8200 |
nomad_common_name | Nomad node certificate common_name | string | server.global.nomad |
nomad_alt_names | Nomad's TLS certificate alt names | string | nomad.service.consul |
nomad_ip_sans | Nomad's TLS certificate IP SANs | string | 127.0.0.1 |
cni_plugin_version | CNI plugins version | string | 1.3.0 |
Notes
- `nomad_server` and `nomad_client` are mutually exclusive and cannot both be `true`.
- `nomad_bootstrap_expect` must be the same value on all Nomad servers. If the key is not present on a server, that server instance will not attempt to bootstrap the cluster.
- An existing Nomad server must be running and reachable at `nomad_server_ip` when `nomad_client` is `true`.
- The default value of `nomad_common_name` is `server.global.nomad` or `client.global.nomad`, depending on whether Nomad is started in server or client mode.
Unseal Vault
Work in Progress: This role is unfinished and untested.
This role unseals an initialized but sealed Vault server. The unseal key shares can be provided as:
- A variable array of keys
- A variable array of file paths to the keys on the remote filesystem
- Secrets from Bitwarden
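For reference, unsealing manually with a key share uses the standard Vault CLI, which is essentially what this role automates:
$ export VAULT_ADDR=http://localhost:8200
$ vault operator unseal <unseal_key_share>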
Variables
Variable | Description | Type | Default |
---|---|---|---|
unseal_vault_port | Configured Vault port | int | 8200 |
unseal_vault_addr | Vault HTTP address | string | http://localhost:8200 |
unseal_store | Accepts file, bitwarden | string | |
unseal_keys_files | Array of files with unseal keys | list | |
unseal_keys | Array of key shares | list | |
unseal_bw_password | Bitwarden password | string | |
unseal_bw_keys_names | List of Bitwarden secrets storing key shares | list |
Vault
This role deploys a new Vault instance and performs the required initialization. If run on a client node, it provisions a Vault-agent instance instead.
Prerequisites
- Vault >1.14.0 installed
- Terraform installed on Ansible host
- A private key and signed certificate for TLS encryption. If from a self-signed CA, the certificate chain must be trusted.
- (Optional) Bitwarden password manager installed
Initialization
Vault is configured and started. If the instance is uninitialized, the role performs first-time initialization and stores the root token and unseal key. Only a single unseal key is supported at the moment. The secrets can be stored in the filesystem or on Bitwarden.
Note: If storing in Bitwarden, the Bitwarden CLI must be installed, configured and the
bw_password
variable must be provided.
It then proceeds to log in with the root token and set up the PKI secrets engine and various authentication roles with the Terraform provider. A full list of Terraform resources can be found at `homelab/terraform/vault`.
Warning: Any existing Vault resources in the same workspace are destroyed permanently. Take care that the appropriate workspaces are used when running the role on multiple Vault server instances.
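For reference, the first-time initialization performed by the role is roughly equivalent to the following, since only a single unseal key is currently supported (a sketch, not the exact commands the role runs):
$ vault operator init -key-shares=1 -key-threshold=1
$ vault operator unseal <unseal_key>
$ vault login <root_token>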
Vault Agent
If this role is run on a client node, or `vault_setup_agent` is `true` (on a server node), it will also provision a Vault-agent instance. It requires an existing unsealed Vault server and should be run only after the Vault server has been set up.
Vault-agent's method of authentication to Vault is TLS certificate authentication. Ansible will generate these certificates and write them to the agent's auth role.
Note: This means Ansible requires access to Vault, which it obtains by authenticating with its own TLS certificates, created by Terraform during the provisioning of the Vault server. These certificates are also written to `homelab/certs/`.
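A rough equivalent of the certificate login that Ansible performs (a sketch; the file paths follow the defaults listed in the Vault Terraform variables):
$ vault login -method=cert \
    -client-cert=certs/ansible.crt \
    -client-key=certs/ansible_key.pem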
Variables
Variable | Description | Type | Default |
---|---|---|---|
vault_config_dir | Configuration directory | string | /etc/vault.d |
vault_data_dir | Restricted data directory | string | /opt/vault/data |
vault_log_dir | Restricted logs directory | string | /opt/vault/logs |
vault_tls_dir | TLS files directory | string | /opt/vault/tls |
vault_ca_cert_dir | Vault's CA certificate directory | string | /usr/share/ca-certificates/vault |
vault_server | Setup Vault server | bool | true |
vault_log_file | Audit log file | string | ${vault_log_dir}/vault.log |
vault_store_local | Copy Vault init secrets to local file | bool | true |
vault_secrets_file | File path for Vault init secrets | string | vault.txt |
vault_store_bw | Store root token in Bitwarden | bool | false |
vault_terraform_workspace | Terraform workspace | string | default |
vault_admin_password | Password for admin user | string | password |
vault_register_consul | Register Vault as a Consul service | bool | true |
vault_setup_agent | Setup Vault agent | bool | true |
vault_server_fqdn | Existing Vault server's FQDN | string | ${ansible_default_ipv4.address} |
Notes
- `vault_server` and `vault_setup_agent` are not mutually exclusive. A host can have both instances running at the same time. However, there must already be an existing server instance if `vault_server` is `false`.
- `vault_server_fqdn` is used to communicate with an existing Vault server that is listening on port 8200 when setting up Vault-agent.
Vault Initialization Secrets
This role offers the following options for storing the secrets (root token and unseal key(s)) generated during the initial Vault initialization:
- On the Ansible host system
- In Bitwarden
- Both
Storing the secrets on the local filesystem is only recommended as a temporary measure (to verify the secrets), or for testing and development. The file should be deleted afterwards or moved to a safer location.
Warning: The Bitwarden storage functionality is not very robust and not recommended at the moment. Use it with caution.
Storing the secrets in Bitwarden requires the following prerequisites:
- The Bitwarden CLI tool must be installed and configured
- The user is logged into Bitwarden
- The `bw_password` variable must be defined and passed to Ansible safely
The `bw_get.sh` and `bw_store.sh` helper scripts are used to create or update the secrets. Take care that the scripts will overwrite any existing secrets (of the same name).
Applications
Actual
- On first startup, you will be prompted to secure the new server with a password.
Calibre Web
- Point the `books` bind mount to an existing Calibre database with the books metadata.
Gotify
- Populate `GOTIFY_DEFAULTUSER_NAME` and `GOTIFY_DEFAULTUSER_PASS` with custom credentials.
Linkding
- Populate `LD_SUPERUSER_NAME` and `LD_SUPERUSER_PASSWORD` with custom credentials.
yarr
- Populate the `AUTH_FILE` environment variable with custom credentials in the form `username:password`.
Adding a New Application
Some notes when adding a new application jobspec to Nomad in `terraform/nomad/apps`.
Traefik
To place the application behind the Traefik reverse proxy, its jobspec should include the following `service.tags`:
tags = [
  "traefik.enable=true",
  "traefik.http.routers.app-proxy.entrypoints=https",
  "traefik.http.routers.app-proxy.tls=true",
  "traefik.http.routers.app-proxy.rule=Host(`app.example.tld`)",
]
Secrets
This section is relevant if the application requires KV secrets from Vault. It uses the Vault Terraform module.
- Firstly, add the relevant KV secrets to Vault.
- Next, create and add a Vault policy for read-only access to the relevant KV secrets:
# terraform/vault/policies/nomad_app.hcl
path "kvv2/data/prod/nomad/app" {
  capabilities = ["read"]
}

# terraform/vault/policies.tf
resource "vault_policy" "nomad_app" {
  name   = "nomad_app"
  policy = file("policies/nomad_app.hcl")
}
- Include the `vault` and `template` blocks in the Nomad jobspec:
vault {
  policies = ["nomad_app"]
}

template {
  data        = <<EOF
{{ with secret "kvv2/data/prod/nomad/app" }}
AUTH="{{ .Data.data.username }}":"{{ .Data.data.password }}"
{{ end }}
EOF
  destination = "secrets/auth.env"
  env         = true
}
This will access the Vault secrets and include them as the `AUTH` environment variable in the job.
Database
This section is relevant if the application requires access to the Postgres database. It uses the Postgres Terraform module.
- Add the application name to the `postgres_roles` variable in `terraform/postgres/`:
postgres_roles = [
  {
    name            = "app"
    rotation_period = 86400
  }
]
This will create a Postgres role and database in the running Postgres instance, a static role in Vault for rotation of the role's credentials, and a Vault policy to read the role's credentials.
- Add a `template` and `vault` block to access the database credentials:
vault {
  policies = ["app"]
}

template {
  data        = <<EOF
{{ with secret "postgres/static-creds/app" }}
DATABASE_URL = "postgres://app:{{ .Data.password }}@localhost:5432/app?sslmode=disable"
{{ end }}
EOF
  destination = "secrets/.env"
  env         = true
}
Diun
Diun allows monitoring a Docker image for new updates. To opt in to watching a task's Docker image, include the `diun.enable` label:
config {
  labels = {
    "diun.enable" = "true"
  }
}
By default, this will only watch the current tag of the image. If the tag is `latest`, Diun will send a notification when that tag's checksum changes.
To allow Diun to watch other tags, include additional labels:
config {
  labels = {
    "diun.enable"     = "true"
    "diun.watch_repo" = "true"
    "diun.max_tags"   = 3
  }
}
This will let Diun watch all tags in the Docker repo. It is highly recommended to set a maximum number of tags that Diun should watch, otherwise Diun will watch ALL tags, including older ones.
See Diun for more information on configuring Diun.
Diun
Diun is used to monitor Docker images for new updates.
Configuration
watch:
  workers: 10
  schedule: "0 0 * * 5"
  jitter: 30s
  firstCheckNotif: false

providers:
  docker:
    watchByDefault: false

notif:
  telegram:
    # Telegram bot token
    token: aabbccdd:11223344
    # Telegram chat ID
    chatIDs:
      - 123456789
    templateBody: |
      Docker tag {{ .Entry.Image }} which you subscribed to through {{ .Entry.Provider }} provider has been released.
Watch Images
To opt in to watching a Docker image, include the `diun.enable` Docker label:
config {
  labels = {
    "diun.enable" = "true"
  }
}
By default, this will only watch the current tag of the image. If the tag is `latest`, Diun will send a notification when that tag's checksum changes.
To allow Diun to watch other tags, include additional labels:
config {
  labels = {
    "diun.enable"     = "true"
    "diun.watch_repo" = "true"
    "diun.max_tags"   = 3
  }
}
This will let Diun watch all tags in the Docker repo. It is highly recommended to set a maximum number of tags that Diun should watch, otherwise Diun will watch ALL tags, including older ones.
Command Line
# manipulate images in database
$ docker exec diun diun image list
$ docker exec diun diun image inspect --image=[image]
$ docker exec diun diun image remove --image=[image]
# send test notification
$ docker exec diun diun notif test
References
Registry
Basic Auth
Create a password file with `htpasswd`:
$ docker run \
--entrypoint htpasswd \
httpd:2 -Bbn foo password > htpasswd
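The registry container then needs to be pointed at this file. A minimal sketch using the standard `registry:2` image's htpasswd settings (paths and port are examples):
$ docker run -d -p 5000:5000 \
    -v "$(pwd)/htpasswd:/auth/htpasswd" \
    -e REGISTRY_AUTH=htpasswd \
    -e REGISTRY_AUTH_HTPASSWD_REALM="Registry Realm" \
    -e REGISTRY_AUTH_HTPASSWD_PATH=/auth/htpasswd \
    registry:2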
Usage
Log in to the registry by providing the username and password used for Basic Auth:
$ docker login foo.example.com
References
Issues
This documents known issues that have not been fixed.
Manual Vault Unseal Process
The Vault server must be manually unsealed whenever the host is rebooted.
Unreachable Nomad Jobs on Reboot
On some occasions, restarting the Nomad client results in some running jobs being unreachable. The temporary fix is to restart the job (not alloc or task).
Vault-agent not reloading TLS certs
Vault-agent does not reload its own TLS configuration after the certificate has been renewed. Although this causes the agent to fail to authenticate with Vault, it does not constitute a systemd service failure, and the service must be manually restarted to read the new TLS configuration. Sending a SIGHUP is not supported.
Similar issues: #16266 and #18562. A fix is available in Vault 1.14.
Static Goss Files
The provided goss files in `ansible/goss` contain hardcoded information that can cause the smoke tests to fail if some Ansible variables are modified:
- common_user
- common_nfs_dir
- common_packages
The temporary workaround is to create your own goss files, edit the given goss files or to simply comment out the smoke test tasks.
To fix this, goss supports templating to create dynamic goss files. The `ansible_collection.goss` role must be modified to add support for dynamic tests.
Roadmap
- Run consul-template as non-root user
- Run vault-agent as non-root user
- Automated gossip key rotation for Nomad and Consul
- ACLs for Nomad and Consul
- `unseal_vault` role
- Packer `base` builder: `preseed.cfg` is unreachable by boot command when the controller host and Proxmox VM are on different subnets.
- Fix configurable cert TTL by Vault
- Improve robustness of Bitwarden scripts in Vault role