Quick Start: KVM libvirt VMs with Terraform and Ansible – Part 1

Managing a virtual infrastructure as code makes the daily work of an administrator much easier, especially when it comes to providing test environments such as cluster servers, which often need to be completely torn down and reinstalled.

Under Linux, KVM (Kernel-based Virtual Machine) and libvirt are the standard for server virtualization, among other things because they are very resource-efficient and fully integrated into practically all Linux distributions. With the tools Virt-Manager (GUI) or virsh (command line), individual VMs can be installed and managed quickly. But what if a project requirement is, for example: a Kafka cluster of 8 VMs with 3 x Broker, 3 x Zookeeper, 1 x Control Center and 1 x KSQL, each with different RAM, CPU and disk configurations?
Phew, that’s a lot of work, I’d rather go to the cloud 😉

This is where the two tools Terraform and Ansible show their strengths in on-premise environments. Terraform from HashiCorp is now THE standard tool for Infrastructure as Code (IaC) because it comes with many plug-ins, so-called providers, for a wide variety of targets such as VMs, active network components, containers and all kinds of cloud environments, including one for libvirt. The admin then describes with just a few lines of code what his environment (servers, network) should look like. For example, he defines the RAM/disk configuration of the VMs and which Linux distribution and network setup he wants, and Terraform takes care of the actual deployment. This way the environment can be quickly torn down and rebuilt.

Ansible (open source / Red Hat) then completes the setup and installs the actual software stack with all configurations, groups, users etc. on the prepared systems. The two tools sometimes overlap in functionality.

Enough of the preface! In the first step, I will show you how to write a simple Terraform module yourself to install a virtual server environment with CentOS (or a Linux distribution of your choice) under KVM/libvirt in an elegant and automated way. In the second part I will cover the further configuration via Ansible.

It is assumed that a working KVM/libvirt environment with a default network (NAT) and at least one disk pool is already available.
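If in doubt, both prerequisites can be checked quickly with virsh (here against the system connection qemu:///system, which we also use later):

 juergen  ~  virsh -c qemu:///system net-list --all
 juergen  ~  virsh -c qemu:///system pool-list --all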

Terraform Installation

Installing Terraform takes only a few seconds.

My installation guide has been tested on Fedora 32, but can be done similarly on other Linux distributions.
When installing Terraform, we follow the installation instructions on the Hashicorp website.

 juergen  ~  sudo dnf install -y dnf-plugins-core
[sudo] password for juergen: 
Last metadata expiration check: 0:43:34 ago on Mon 16 Nov 2020 10:43:23 CET.
Package dnf-plugins-core-4.0.18-1.fc32.noarch is already installed.
Dependencies resolved.
Nothing to do.
Complete!

 juergen  ~  sudo dnf config-manager --add-repo https://rpm.releases.hashicorp.com/fedora/hashicorp.repo
Adding repo from: https://rpm.releases.hashicorp.com/fedora/hashicorp.repo

 juergen  ~  sudo dnf -y install terraform
Hashicorp Stable - x86_64                                                                                             1.6 MB/s | 363 kB     00:00    
Dependencies resolved.
======================================================================================================================================================
 Package                             Architecture                     Version                               Repository                           Size
======================================================================================================================================================
Installing:
 terraform                           x86_64                           0.13.5-1                              hashicorp                            27 M

Transaction Summary
======================================================================================================================================================
Install  1 Package

Total download size: 27 M
Installed size: 82 M
Downloading Packages:
terraform-0.13.5-1.x86_64.rpm                                                                                         3.9 MB/s |  27 MB     00:06    
------------------------------------------------------------------------------------------------------------------------------------------------------
...                                                                                    

Complete!
 
 juergen  ~  terraform -install-autocomplete

Installation of the Terraform libvirt Provider

Afterwards we only have to install the libvirt provider for Terraform. This is the "plugin" that acts as the interface translating Terraform commands into libvirt calls.

The provider can be found here: https://github.com/dmacvicar/terraform-provider-libvirt.

But first we should check the requirements on the system.

juergen  ~  virsh version --daemon
Kompiliert gegen die Bibliothek: libvirt 6.1.0
Verwende Bibliothek: libvirt 6.1.0

Verwende API: QEMU 6.1.0
Laufender Hypervisor: QEMU 4.2.1
Läuft gegen Dämon: 6.1.0

 juergen  ~  terraform --version
Terraform v0.13.5

With Terraform v0.13 the way providers are made available has changed. Terraform now uses a provider registry by default to deliver officially supported providers. The libvirt provider, however, is not available there (details here), so we have to download and install it manually. The precompiled version can be downloaded here: https://github.com/dmacvicar/terraform-provider-libvirt/releases and the installation is based on this article: https://github.com/dmacvicar/terraform-provider-libvirt/blob/master/docs/migration-13.md


 juergen  ~  wget https://github.com/dmacvicar/terraform-provider-libvirt/releases/download/v0.6.3/terraform-provider-libvirt-0.6.3+git.1604843676.67f4f2aa.Fedora_32.x86_64.tar.gz

 juergen  ~  tar zxvf terraform-provider-libvirt-0.6.3+git.1604843676.67f4f2aa.Fedora_32.x86_64.tar.gz 
terraform-provider-libvirt

 juergen  ~  mkdir -p ~/.local/share/terraform/plugins/registry.terraform.io/dmacvicar/libvirt/0.6.3/linux_amd64

 juergen  ~  mv terraform-provider-libvirt ~/.local/share/terraform/plugins/registry.terraform.io/dmacvicar/libvirt/0.6.3/linux_amd64/

 juergen  ~  ll ~/.local/share/terraform/plugins/registry.terraform.io/dmacvicar/libvirt/0.6.3/linux_amd64
total 34440
-rwxr-xr-x. 1 juergen juergen 35265880 10. Nov 19:05 terraform-provider-libvirt
 juergen  ~  

Now the installation is complete.

First steps with Terraform libvirt Provider

For an initial test we first create an empty module directory (test) and create a file libvirt.tf in it (.tf denotes a Terraform file).
Each Terraform module must first declare the required providers, which is done with the required_providers {} block; in our case it contains the libvirt provider. Furthermore, a connection URI to the virtualization host is needed. Finally the actual resource definition follows, which for testing purposes contains only a name.

juergen  ~  Scripts  terraform  mkdir test
juergen  ~  Scripts  terraform  cd test

juergen  ~  Scripts  terraform  test  vi libvirt.tf 

# Declare the libvirt provider (once per module)
terraform {
  required_providers {
    libvirt = {
      source  = "dmacvicar/libvirt"
      version = "0.6.3"
    }
  }
}

# Define the virtual server resource (once per module)
provider "libvirt" {
    uri = "qemu:///system"
}

# The actual resource description (what this module should do)
resource "libvirt_domain" "terraform_test" {
  name = "terraform_test"
}

After that we can initialize the Terraform module and look at the Terraform plan output.
A new Terraform module is always initialized with the command "terraform init". The command "terraform plan" shows the construction plan of our infrastructure, but only "terraform apply" actually executes it. In our example we just want to see whether Terraform and the libvirt provider are installed correctly, so we limit ourselves to displaying the construction plan on the console (a dry run).

 juergen  ~  Scripts  terraform  test  terraform init

Initializing the backend...

Initializing provider plugins...
- Finding dmacvicar/libvirt versions matching "0.6.3"...
- Installing dmacvicar/libvirt v0.6.3...
- Installed dmacvicar/libvirt v0.6.3 (unauthenticated)

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.


 juergen  ~  Scripts  terraform  test  terraform plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.


------------------------------------------------------------------------

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # libvirt_domain.terraform_test will be created
  + resource "libvirt_domain" "terraform_test" {
      + arch        = (known after apply)
      + emulator    = (known after apply)
      + fw_cfg_name = "opt/com.coreos/config"
      + id          = (known after apply)
      + machine     = (known after apply)
      + memory      = 512
      + name        = "terraform_test"
      + qemu_agent  = false
      + running     = true
      + vcpu        = 1
    }

Plan: 1 to add, 0 to change, 0 to destroy.

------------------------------------------------------------------------

Note: You didn't specify an "-out" parameter to save this plan, so Terraform
can't guarantee that exactly these actions will be performed if
"terraform apply" is subsequently run.

First concrete example

Now it’s time to provision a first virtual server via a Terraform script.

Preparation: Linux Cloud Image

To provide an operating system, the libvirt provider uses cloud images specially prepared for cloud deployment, which are initialized and configured with cloud-init. cloud-init is the standard way for cloud providers to make a wide variety of distributions available quickly and in large numbers: the cloud-init agent is installed in the underlying operating system image and, at first startup, prepares the operating system with all necessary adjustments, such as host name, SSH keys, users, etc., to provide individual environments.

In my example I use a CentOS 8 cloud image, which I download from the CentOS download page; we need the "GenericCloud" image there. Alternatively, Terraform could fetch the image at runtime while executing the deployment plan. However, if you deploy frequently and create many VMs from the same base image, this of course makes executing the Terraform module take longer.
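If you download it up front, you can also check the image's virtual size with qemu-img right away; we will need that value later for the disksize parameter:

 juergen  ~  wget https://cloud.centos.org/centos/8/x86_64/images/CentOS-8-GenericCloud-8.2.2004-20200611.2.x86_64.qcow2
 juergen  ~  qemu-img info CentOS-8-GenericCloud-8.2.2004-20200611.2.x86_64.qcow2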

The Terraform Module

Then we create a new module directory, e.g. "newsrv", and change into it. For our example we need three files: a Terraform file (main.tf), a file for global variables (terraform.tfvars) and a file named cloud_init.cfg.
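For example:

 juergen  ~  Scripts  terraform  mkdir newsrv && cd newsrv
 juergen  ~  Scripts  terraform  newsrv  touch main.tf terraform.tfvars cloud_init.cfg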

I will now take a closer look at the individual files and explain their purpose.

terraform.tfvars

This is the central file for the declaration of variables. In my example I have put all parameters into it that I want to configure individually. For example, you can add more host blocks for more VM instances and thus easily describe a whole server farm.

But first we give the project a name. This name will later be used in main.tf to logically group the guest instances of a project and to distinguish them from other projects. Then follows the path or URL to the cloud image, and we define the KVM disk pool for the cloud image. You can use the same disk pool for the virtual VM disks (qcow2) and the cloud OS images, but you can also keep them separate, e.g. an SSD and an HDD disk pool.

# Projectname
projectname = "newsrv"

# OS Image
#sourceimage = "https://cloud.centos.org/centos/8/x86_64/images/CentOS-8-GenericCloud-8.2.2004-20200611.2.x86_64.qcow2"
sourceimage = "/home/juergen/images/CentOS-8-GenericCloud-8.2.2004-20200611.2.x86_64.qcow2"

# The baseimage is the source diskimage for all VMs created from the sourceimage
baseimagediskpool = "default"

The cloud image always represents the base operating system image (base image), and the actual VMs (guests) are logically linked to it. In the libvirt environment this is also called a disk image chain or overlay image: the virtual disk of the VM is an additional virtual layer on top of the base OS image. This has the advantage that the basic framework of the operating system with the files it contains does not have to be duplicated for each VM; it can be understood as a kind of deduplication at the VM level. The libvirt provider always works according to this principle, which is why a baseimagediskpool is always needed. Since we work with the qcow2 format for the virtual disks, the initial size on disk is very small, because only the changes (increments) relative to the base OS image have to be stored there.

Here you can see the file sizes of a new VM after executing the Terraform plan (commoninit_newsrv.iso is also created by cloud-init):

 juergen  ~  sudo ls -alh /var/lib/libvirt/images
total 1.2G
drwx--x--x. 2 root root 4.0K Nov 18 09:29 .
drwxr-xr-x. 9 root root 4.0K Jun  2 19:54 ..
-rw-r--r--. 1 qemu qemu 1.1G Nov 18 09:29 baseosimage_newsrv
-rw-r--r--. 1 qemu qemu 366K Nov 18 09:29 commoninit_newsrv.iso
-rw-r--r--. 1 qemu qemu 111M Nov 18 09:31 newsrv.qcow2
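The image chain itself can be made visible with qemu-img info on the VM disk. The output (shortened and purely illustrative here) shows the base image as the backing file:

 juergen  ~  sudo qemu-img info /var/lib/libvirt/images/newsrv.qcow2
image: /var/lib/libvirt/images/newsrv.qcow2
file format: qcow2
virtual size: 11.2 GiB (12000000000 bytes)
disk size: 111 MiB
backing file: /var/lib/libvirt/images/baseosimage_newsrv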

Now follow the network and domain settings in the file. Here we use the libvirt default network, which is the standard NAT subnet. How to set up a bridged network (same subnet as the host) I explain at the end of this article.

# Domain and network settings
domainname = "mydomain.vm"  
networkname = "default"    # Virtual Networks: default (=NAT)


Then follow the specific parameters for the virtual host or hosts. Within hosts = {}, further server blocks can be added, each describing the configuration of one virtual server instance (see the sketch after the code block). You can also add additional parameters that you want to configure individually, but for my projects the following were sufficient.

One more note: because the virtual disk depends on the base OS image, the capacity of the VM (disksize, in bytes) must be chosen larger than the virtual size of the base image. How to determine that size is shown in the comment in the code below.

# Host specific settings
# RAM size in bytes
# Disksize in bytes (disksize must be bigger than sourceimage virtual size)
# Example:
#    qemu-img info debian-10.3.4-20200429-openstack-amd64.qcow2
#         virtual size: 2 GiB (2147483648 bytes)
hosts = {
   "newsrv" = {
      name     = "newsrv",
      vcpu     = 1,
      memory   = "1024",
      diskpool = "default",
      disksize = 12000000000,
      mac      = "52:54:00:11:11:33",
   },
}
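As a sketch, a variant with a second (hypothetical) server instance would look like this:

hosts = {
   "newsrv" = {
      name     = "newsrv",
      vcpu     = 1,
      memory   = "1024",
      diskpool = "default",
      disksize = 12000000000,
      mac      = "52:54:00:11:11:33",
   },
   "newsrv2" = {
      name     = "newsrv2",
      vcpu     = 2,
      memory   = "2048",
      diskpool = "default",
      disksize = 12000000000,
      mac      = "52:54:00:11:11:34",
   },
}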

main.tf

The main.tf is the actual construction plan or recipe for our new environment.

In main.tf, the minimum required Terraform version is defined first, and the libvirt provider is activated with the following lines of code:

# Declare libvirt provider for this project
terraform {
  required_version = ">= 0.13"
  required_providers {
    libvirt = {
      source  = "dmacvicar/libvirt"
      version = "0.6.3"
    }
  }
}
# Provider URI for libvirt
provider "libvirt" {
  uri = "qemu:///system"
}

Then we declare the variables that we previously defined in terraform.tfvars, together with default values; these are needed to create the virtual machine.

If a value is not set in terraform.tfvars, the fall-back value stored under default, such as the URL for the source image, is used here.

# Use terraform.tfvars to define the settings of your servers
# the variables here are the defaults if no terraform.tfvars setting is found
variable "projectname" {
 type   = string
 default = "myproject"
}
variable "hosts" {
  default = {
    "srv1" = {
       name = "srv1",
       vcpu     = 1,
       memory   = "1536",
       diskpool = "default",
       disksize = "4000000000",
       mac      = "52:54:00:11:11:11",
     },
  }
}
variable "baseimagediskpool" {
  type    = string
  default = "default"
}
variable "domainname" {
  type    = string
  default = "domain.local"
}
variable "networkname" {
  type    = string
  default = "default"
}
variable "sourceimage" {
  type    = string
  default = "https://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2"
}
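Individual values can also be overridden ad hoc on the command line without editing terraform.tfvars, for example:

 juergen  ~  Scripts  terraform  newsrv  terraform plan -var 'domainname=test.vm'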

The next block defines the location of the base image within KVM. A cloud image in qcow2 format and a KVM disk pool are expected here. Each VM within our project is then linked to this base image and uses it as the base layer for its own disk image.

# Base OS image
resource "libvirt_volume" "baseosimage" {
  name   = "baseosimage_${var.projectname}"
  source = var.sourceimage
  pool   = var.baseimagediskpool
}

Next, a virtual disk is created for each virtual host in a for_each loop.

# Create a virtual disk per host based on the Base OS Image
resource "libvirt_volume" "qcow2_volume" {
  for_each = var.hosts
  name           = "${each.value.name}.qcow2"
  base_volume_id = libvirt_volume.baseosimage.id
  pool           = each.value.diskpool
  format         = "qcow2"
  size           = each.value.disksize
}

Then some variables are passed to cloud-init, which are later used in the file cloud_init.cfg. For example, the host name and domain are defined here. You could pass more settings as variables, but currently I still configure some of them "hard-coded" in cloud_init.cfg.

# Use cloudinit config file and forward some variables to cloud_init.cfg
data "template_file" "user_data" {
  template = file("${path.module}/cloud_init.cfg")
  for_each   = var.hosts
  vars     = {
    hostname   = each.value.name
    domainname = var.domainname
  }
}

# Use CloudInit to add the instance
resource "libvirt_cloudinit_disk" "commoninit" {
  for_each   = var.hosts
  name      = "commoninit_${each.value.name}.iso"
  user_data = data.template_file.user_data[each.key].rendered
}

Now follows the creation of the VM (guest/domain). With the help of a for_each loop you can easily create multiple VMs (libvirt_domain), including network interface configuration and virtual disk. Furthermore, the libvirt_cloudinit_disk (commoninit_<name>.iso) with our cloud-init settings is attached via the cloudinit attribute, which initiates the initial configuration of the operating system.

# Define KVM-Guest/Domain
resource "libvirt_domain" "newvm" {
  for_each   = var.hosts
  name   = each.value.name 
  memory = each.value.memory
  vcpu   = each.value.vcpu

  network_interface {
    network_name   = var.networkname
    mac            = each.value.mac
    # If networkname is host-bridge do not wait for a lease
    wait_for_lease = var.networkname == "host-bridge" ? false : true
  }

  disk {
    volume_id = libvirt_volume.qcow2_volume[each.key].id
  }

  cloudinit = libvirt_cloudinit_disk.commoninit[each.key].id

}
## END OF KVM DOMAIN CONFIG

Finally, the result for the newly created virtual machines is printed to the console.


# Output results to console
output "hostnames" {
  value = [for k, v in libvirt_domain.newvm : v.name]
}
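If you also want to see the IP addresses the VMs obtained via DHCP, you can add a second output block. A minimal sketch, using the network_interface attribute exposed by the libvirt provider:

# Output the DHCP addresses of all new VMs (sketch)
output "ips" {
  value = { for k, v in libvirt_domain.newvm : k => v.network_interface[0].addresses }
}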

cloud_init.cfg

As mentioned at the beginning, cloud-init takes care of the initial configuration of the new server image. The CentOS GenericCloud image, which we downloaded in preparation or which Terraform fetches at runtime, is the basis for an operating system installation adapted to our needs.

In this context we also prepare the new server for Ansible by storing the SSH public keys for the Ansible user.

To create a new SSH key pair, enter the following at the command line (the example refers to my user "juergen"; for the user "ansible" the procedure is the same):

juergen  ~  ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/juergen/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/juergen/.ssh/id_rsa
Your public key has been saved in /home/juergen/.ssh/id_rsa.pub
The key fingerprint is:
SHA256: juergen@localhost.localdomain
The key's randomart image is:
+---[RSA 3072]----+
|                 |
+----[SHA256]-----+

 juergen  ~  ll .ssh
total 12
-rw-------. 1 juergen juergen 2622 Nov 17 16:12 id_rsa
-rw-r--r--. 1 juergen juergen  583 Nov 17 16:12 id_rsa.pub
-rw-r--r--. 1 juergen juergen  372 Nov 17 12:52 known_hosts
 juergen  ~  

The public key, which is stored in id_rsa.pub, is entered in cloud_init.cfg.
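To display the key for copying (output truncated here):

 juergen  ~  cat ~/.ssh/id_rsa.pub
ssh-rsa AAAAB3NzaC1yc2E... juergen@localhost.localdomain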

In cloud_init.cfg we also configure all settings for our customized setup, such as hostname, additional users, SSH keys, packages to be installed etc.

The file is in YAML format. It is important to get the form right, especially the correct indentation!
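A quick sanity check before deploying is to parse the file locally, for example with Python (assuming the PyYAML module is installed; the Terraform template placeholders are treated as plain strings here, which is fine for a pure syntax check):

 juergen  ~  Scripts  terraform  newsrv  python3 -c 'import yaml; yaml.safe_load(open("cloud_init.cfg"))' && echo "YAML OK"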

I also activate SSH access via password authentication, since this is deactivated by default in the cloud images, and I find it very convenient to be able to log in on the console via Virt-Manager if necessary. For this we set a new password for the user "root".

#cloud-config
# vim: syntax=yaml
#
# ***********************
# 	---- for more examples look at: ------
# ---> https://cloudinit.readthedocs.io/en/latest/topics/examples.html
# ******************************
#
# This is the configuration syntax that the write_files module
# will know how to understand. encoding can be given b64 or gzip or (gz+b64).
# The content will be decoded accordingly and then written to the path that is
# provided.
#
# Note: Content strings here are truncated for example purposes.
ssh_pwauth: true
chpasswd:
  list: |
     root:Geheim1234
  expire: false

Then follow a few new Linux users that are to be created on all VMs initially, each with the respective ssh-rsa public key we created above (insert only the key string here), so that we can log in to the VMs from the KVM host without a password.

# User 'ansible' is used for ansible
users:
  - name: juergen
    ssh_authorized_keys:
      - ssh-rsa AAAAB3Nza...
    sudo: ['ALL=(ALL) NOPASSWD:ALL']
    shell: /bin/bash
    groups: wheel
  - name: root
    ssh_authorized_keys:
      - ssh-rsa AAAAB3NzaC1...
  - name: ansible
    ssh_authorized_keys:
      - ssh-rsa AAAAB3NzaC1yc2E...
    sudo: ['ALL=(ALL) NOPASSWD:ALL']
    shell: /bin/bash
    groups: wheel

The next step is to specify the host name and the FQDN (the variables are filled in terraform.tfvars and then passed on to cloud_init.cfg via main.tf).

# Set hostname based on main.tf variables 
preserve_hostname: false 
fqdn: ${hostname}.${domainname}
hostname: ${hostname}

Afterwards a reboot follows to make the new hostname known in DHCP and DNS. We also install python36, which we need later for Ansible. I have commented out the general system update of all packages here, because Ansible will do that for me later.

# Initiate a reboot after setting the fqdn. This is necessary to update the DNS/DHCP information in the libvirt dnsmasq
power_state:
 delay: "+1" 
 mode: reboot
 condition: true

# Install python for ansible
packages:
  - python36

#package_update: true
#package_upgrade: true
#package_reboot_if_required: true

3, 2, 1, Server!

Now we have everything together to start a first deployment attempt via Terraform. The rest is just a handful of commands that execute our plan (see the example run after this list):

  • terraform init (once for initialization of the project)
  • terraform plan (checks the plan for code errors and displays the expected result on the console)
  • terraform apply (actually implements the plan)
  • terraform destroy (tears everything down again, removes all VMs, virtual disks and networks related to this project)
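A complete round trip then looks like this (console output omitted); whether the VM is actually up can be checked afterwards with virsh:

 juergen  ~  Scripts  terraform  newsrv  terraform init
 juergen  ~  Scripts  terraform  newsrv  terraform plan
 juergen  ~  Scripts  terraform  newsrv  terraform apply
 juergen  ~  Scripts  terraform  newsrv  virsh -c qemu:///system list
 juergen  ~  Scripts  terraform  newsrv  virsh -c qemu:///system domifaddr newsrv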

If the command "terraform plan" runs through without error messages, it shows on the console what it would do; here you can check your setup once more. Afterwards, the command "terraform apply" actually executes the plan and hopefully creates the virtual machine with all configurations just as we imagined. This process usually takes less than a minute, depending on the complexity and on whether the base OS image is already available locally. If it doesn't all work at the first go, Terraform provides very informative support for error analysis in the console output. In addition, Google will certainly help as well; by now there are answers to most questions on the net.

Finally, “terraform destroy” allows you to quickly delete the entire project and reset everything to the starting point. Once you have become familiar with this system, you will certainly have a lot of fun deploying your environments using terraform code.

The second part of my Quickstart Guide is based on this part and shows how to use Ansible to configure your VM.

Optional: Bridged instead of NAT Network

In the example above, we created the virtual machine using libvirt’s default NAT network. Sometimes it is useful or necessary to run the VM on the same subnet as the KVM host. For this purpose the VM has to be attached to a so-called bridge network device.

For this we first have to create a network bridge under Linux.

 juergen  ~  sudo brctl addbr br1

 juergen  ~  sudo brctl show
bridge name	bridge id		STP enabled	interfaces
br1		8000.f6e545eb05ba	no		
docker0		8000.02420865003e	no		
virbr0		8000.525400b83e08	yes		virbr0-nic
							vnet0

 juergen  ~  sudo brctl addif br1 enp0s31f6

 juergen  ~  brctl show
bridge name	bridge id		STP enabled	interfaces
br1		8000.f6e545eb05ba	no		enp0s31f6
							vnet0
docker0		8000.02420865003e	no		
virbr0		8000.525400b83e08	yes		virbr0-nic

We can now use this virtual Bridge Device (br1) with the libvirt provider in our Terraform module to create a libvirt network. Examples of libvirt_network can be found here: https://github.com/dmacvicar/terraform-provider-libvirt/blob/master/website/docs/r/network.markdown

resource "libvirt_network" "vmbridge" {
  # the name used by libvirt
  name = "vmbridge"

  # mode can be: "nat" (default), "none", "route", "bridge"
  mode = "bridge"

  # (optional) the bridge device defines the name of a bridge device
  # which will be used to construct the virtual network.
  # (only necessary in "bridge" mode)
  bridge = "br1"
  autostart = true
}
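To let the VMs use this network, only the network name in terraform.tfvars has to be changed. One caveat: main.tf in my example skips wait_for_lease only for a network called "host-bridge"; if you name the network "vmbridge" as here, adapt that condition accordingly:

# Domain and network settings
domainname = "mydomain.vm"
networkname = "vmbridge"    # bridged network on device br1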

Read also the second part of my Quick Start guide.