Infrastructure as Code

How do you manage an expensive on-demand process? “Infrastructure as code” provides the answer, and Google Cloud Platform provides the tool.

Preface

At Jobrapido, we need to launch a process with a high computational cost, but only for the time strictly required by its run, without any continuous or background tasks. It is a low-frequency, “on demand” process that does only what it is designed for and nothing more.

Specifically, the process takes a list representing all the searches coming to our website and passes them through a series of refinement/categorization steps, in order to reduce them to only the most relevant keywords.

At the beginning, we thought of renting the whole Infrastructure as a Service in the cloud, keeping it “up & running”. But we soon understood that the user interaction model of our application does not call for such an approach, since there’s no activity that requires continuous listening.

Also, since manually creating and destroying the whole IaaS was not sustainable, we found the Infrastructure as Code approach easier to use, more replicable and more testable.

A Brief Theory

Infrastructure as a Service (IaaS)

The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, and deployed applications; and possibly limited control of select networking components (e.g., host firewalls). [P. Mell and T. Grance, 2011]

Infrastructure as Code (IaC)

Infrastructure as code describes the approach to program the creation and modification of your infrastructure including virtual servers, networking, storage, and more. [A. Wittig and M. Wittig, 2016]

In our context, IaC means being able to apply the techniques we use for our application code to the infrastructure as well, improving its quality by having it versioned, automatically tested and “built & deployed” via pipelines.

Let’s Proceed

Since the IaC approach worked well for us, here’s a small example to explain the concepts behind our solution.

Scenario

What we want to achieve with this working example is to design an infrastructure that (as we do in our real one) takes a list of usernames, collects their birth dates and determines which of them are actually adults.
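
To make the scenario concrete, here is a minimal sketch of the same logic as plain functions, with no infrastructure involved. The type names, the sample data and the threshold of 18 years are assumptions made for illustration only; in the article's setup the adult check is performed by the external blackbox application.

```typescript
// Hypothetical shape of a user record; not taken from the real project.
interface User {
  username: string;
  birthDate: string; // ISO date, e.g. "1990-05-17"
}

// Age in whole years at the reference date `today` (computed in UTC).
function ageInYears(birthDate: Date, today: Date): number {
  let age = today.getUTCFullYear() - birthDate.getUTCFullYear();
  const birthdayPassed =
    today.getUTCMonth() > birthDate.getUTCMonth() ||
    (today.getUTCMonth() === birthDate.getUTCMonth() &&
      today.getUTCDate() >= birthDate.getUTCDate());
  if (!birthdayPassed) {
    age -= 1;
  }
  return age;
}

// Stand-in for the external blackbox; the threshold of 18 is an assumption.
function isAdult(age: number): boolean {
  return age >= 18;
}

// Keep only the usernames whose owners are adults at `today`.
function findAdults(users: User[], today: Date): string[] {
  return users
    .filter((user) => isAdult(ageInYears(new Date(user.birthDate), today)))
    .map((user) => user.username);
}

const sampleUsers: User[] = [
  { username: "alice", birthDate: "1990-05-17" },
  { username: "bob", birthDate: "2015-01-01" },
];
const adults = findAdults(sampleUsers, new Date("2024-06-01T00:00:00Z"));
```

The rest of the article is about spreading these three steps (read users, compute, ask the blackbox) across separate machines that exist only while the process runs.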

Requirements

The most important requirement is to not waste computational power, in order to pay strictly for what we consume, without leaving any resources unused.

From the business perspective, users must be as autonomous as possible in managing their runs, without needing to know what happens under the bonnet from an infrastructure point of view.

Constraints

We have to accept the fact that we can’t control the whole infrastructure, due to the presence of a “blackbox” application that comes from an external provider. In our example, it is the piece of code that determines whether someone is an adult based on the birth date.

Our Implementation

In the context of IaC, the described scenario can be managed by following a declarative approach.

This means defining a script that declares the needed elements and the structure of their relations, without describing the control flow.
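
As a toy illustration of the declarative idea, the sketch below treats the desired state as plain data and lets a small function derive the actions. The resource names are reused from this article, while the `plan` function and the `Plan` type are hypothetical; this is not how Deployment Manager works internally, only a way to show where the control flow ends up.

```typescript
// A resource is identified here only by its name; real resources carry much more.
type ResourceName = string;

interface Plan {
  toCreate: ResourceName[];
  toDelete: ResourceName[];
}

// Compare the declared state with what currently exists and derive the actions.
// The caller never says "create this, then delete that": it only declares the goal.
function plan(desired: ResourceName[], existing: ResourceName[]): Plan {
  return {
    toCreate: desired.filter((name) => existing.indexOf(name) === -1),
    toDelete: existing.filter((name) => desired.indexOf(name) === -1),
  };
}

const desiredState = ["frontend-app", "backend-app", "blackbox-app"];

// Nothing exists yet: everything has to be created.
const firstRun = plan(desiredState, []);

// The run is over: declaring an empty state removes everything.
const tearDown = plan([], desiredState);
```

The same declaration serves both the creation and the destruction of the infrastructure, which is what makes the on-demand model practical.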

The Choice: Google Cloud Platform

We chose to use Google Cloud Platform (GCP) since we are skilled in it and also because Jobrapido has a profitable collaboration with Google.

The IaC module that GCP offers is Deployment Manager.

This module is based on these main concepts:

  • The deployment is a unit that collects a set of Google Cloud resources.
  • The resource is a unit described by a configuration file. It could be, for example, a VM or a database.
  • The configuration is a descriptor file with all the characteristics of a specific instance of a resource.
  • The template is a reusable building block, an abstraction written in Jinja or Python that can be used in multiple configurations. Jinja is simpler because the template stays close to the YAML it renders, but it offers only basic rendering constructs (variables, if…else…, for…in…). If you have more complex needs, you can use Python instead.
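
The relation between templates, configurations and resources can be sketched by modelling a template as a function from properties to a resource. Real Deployment Manager templates are written in Jinja or Python, so the TypeScript below is only an analogy; `vmTemplate`, `VmProperties` and the worker names are hypothetical.

```typescript
// A resource as it would appear in a rendered configuration.
interface Resource {
  name: string;
  type: string;
  properties: Record<string, string | number>;
}

// The parameters a configuration passes to the template.
interface VmProperties {
  zone: string;
  machineType: string;
}

// Hypothetical template: one reusable definition of "a VM"...
function vmTemplate(resourceName: string, props: VmProperties): Resource {
  return {
    name: resourceName,
    type: "compute.v1.instance",
    properties: { zone: props.zone, machineType: props.machineType },
  };
}

// ...reused by a configuration that declares several resources of that kind.
const deployment: Resource[] = ["worker-a", "worker-b"].map((resourceName) =>
  vmTemplate(resourceName, { zone: "us-central1-c", machineType: "f1-micro" })
);
```

Changing the template changes every resource built from it, which is the main reason to factor common parts out of the configuration.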

In Detail

We need an infrastructure with a front-end app that exposes an API to the user and manages the pipeline, where the first step is a call to a custom back-end app (a CRUD layer in front of the database) and the last is the blackbox app mentioned above.

For all the code, please see this link on GitHub: https://github.com/gcddemotest.

In an IaC project all the machines are created with a specific configuration, based on a specific template, that defines the type of machine, the network configuration and the environment variables, and launches the application declared as a Docker image.

Front-end App

Since the front-end app is the entry point of our infrastructure, we will use its descriptor to go a little bit deeper in the analysis. All the common parts will be skipped for the other modules, with only the peculiarities pointed out.

In particular, for the front-end app, we can see that the IaC descriptor declares:

  • The application name.
  • The type, that is, the template used for its instantiation. In our case we’ve used docker-vm-pool.jinja, a Jinja template that provides features such as groups of instances of a “dockerized” application, health checks and autoscaling behind a load balancer.
  • The application, as a Docker image, where each image is saved in the project’s Google registry.
  • The environment variables. In this case, the endpoints of the other modules it has to call.

- name: frontend-app
  type: tpl/docker-vm-pool.jinja
  properties:
    containerImage: gcr.io/gcd-jr-demo/frontend-sample-app:latest
    zone: us-central1-c
    machineType: f1-micro
    size: 2
    maxSize: 3
    containerPort: 8080
    hostPort: 8080
    coolDownPeriodSec: 15
    healthCheckPort: 8080
    healthCheckPath: /
    containerEnv:
      - name: NODE_ENV
        value: prod
      - name: BACKEND_APP_ADDRESS
        value: $(ref.backend-app.address)
      - name: BACKEND_APP_PORT
        value: 8080
      - name: BLACKBOX_APP_ADDRESS
        value: $(ref.blackbox-app.address)
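
The $(ref.…) values in this descriptor point at other resources, so they also imply an order: a referenced resource has to exist before its address can be injected. Deployment Manager resolves these references itself; the sketch below only illustrates the idea, and the helper names `referencedResources` and `creationOrder` are hypothetical.

```typescript
// Extract the resource names referenced through $(ref.<name>.<property>).
function referencedResources(values: string[]): string[] {
  const pattern = /\$\(ref\.([A-Za-z0-9-]+)\.[A-Za-z0-9_.-]+\)/g;
  const names: string[] = [];
  for (const value of values) {
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(value)) !== null) {
      if (names.indexOf(match[1]) === -1) {
        names.push(match[1]);
      }
    }
  }
  return names;
}

// Order resources so that each one comes after the resources it references
// (a depth-first topological sort that rejects circular references).
function creationOrder(deps: Record<string, string[]>): string[] {
  const ordered: string[] = [];
  const visiting: string[] = [];
  const visit = (resource: string): void => {
    if (ordered.indexOf(resource) !== -1) return;
    if (visiting.indexOf(resource) !== -1) {
      throw new Error(`Circular reference at ${resource}`);
    }
    visiting.push(resource);
    for (const dep of deps[resource] || []) visit(dep);
    visiting.pop();
    ordered.push(resource);
  };
  Object.keys(deps).forEach((resource) => visit(resource));
  return ordered;
}

// Environment values copied from the front-end descriptor.
const frontendEnv = [
  "prod",
  "$(ref.backend-app.address)",
  "8080",
  "$(ref.blackbox-app.address)",
];

const order = creationOrder({
  "frontend-app": referencedResources(frontendEnv),
  "backend-app": ["cloudsql-proxy-instance"],
  "blackbox-app": [],
  "cloudsql-proxy-instance": [],
});
```

With these inputs the front-end app comes last, after everything it needs to call.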

The front-end app is a Node.js application which exposes a REST method that internally builds the control flow of the full application:

@route(HttpMethod.GET, "/process-users")
public async getEndpoint(exchange: Exchange) {

      [...]

      // Step 1: ask the back-end app for the list of users.
      const response = await axios.get<Array<Readonly<User>>>(`http://${backendApp.address}:${backendApp.port}/users`);
      // Step 2: one call to the blackbox app per user.
      const operations = response
        .data.map((user) => new Promise<ProcessedUser>(async (resolve, reject) => {
            [...]
            const blackboxResponse = await axios.get<Readonly<BlackboxServiceResponse>>(`http://${blackboxApp.address}:${blackboxApp.port}/adult`,
              {
                params: { age },
              });

            [...]

      }));
      exchange.response.send({ users });

      [...]
  }
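
Since the listing above is abridged, here is a self-contained sketch of the same fan-out that can run without a network. It is one possible way to express the flow, using Promise.all over async callbacks instead of wrapping them in new Promise; the `PipelineDeps` interface and the shapes of `User` and `ProcessedUser` are assumptions, because the real types are not shown in the article.

```typescript
// Hypothetical shapes; the real User and ProcessedUser types are not shown here.
interface User {
  username: string;
  birthDate: string;
}

interface ProcessedUser {
  username: string;
  adult: boolean;
}

// Injected dependencies stand in for the HTTP calls to the back-end and blackbox apps.
interface PipelineDeps {
  fetchUsers: () => Promise<User[]>;
  ageOf: (user: User) => number;
  checkAdult: (age: number) => Promise<boolean>;
}

async function processUsers(deps: PipelineDeps): Promise<ProcessedUser[]> {
  // Step 1: read the users from the back-end.
  const users = await deps.fetchUsers();

  // Step 2: one blackbox call per user; all calls are started before any is awaited.
  const operations = users.map(
    async (user): Promise<ProcessedUser> => ({
      username: user.username,
      adult: await deps.checkAdult(deps.ageOf(user)),
    })
  );

  // Resolves once every call has completed and rejects on the first failure.
  return Promise.all(operations);
}
```

In the real application the injected functions would wrap the axios calls to the back-end and blackbox endpoints, while a test can pass in-memory fakes.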

Back-end App

The IaC descriptor of the back-end app is similar to the one of the front-end app, but since it acts as a CRUD for our datasource, it is the only machine that knows the location and credentials of the datasource.

- name: backend-app
  type: tpl/docker-vm-pool.jinja
  properties:
    containerImage: gcr.io/gcd-jr-demo/backend-sample-app:latest
    zone: us-central1-c
    machineType: f1-micro
    size: 2
    maxSize: 10
    containerPort: 8080
    hostPort: 8080
    coolDownPeriodSec: 15
    healthCheckPort: 8080
    healthCheckPath: /
    containerEnv:
      - name: NODE_ENV
        value: prod
      - name: DB_HOST
        value: $(ref.cloudsql-proxy-instance.address)
      - name: DB_USER
        value: myuser
      - name: DB_PASSWORD
        value: mypassword
      - name: DB_DATABASE
        value: mydb
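
On the application side, the back-end can pick up these settings from its environment at startup. The sketch below shows one way to do that and to fail early when a variable is missing; `loadDbConfig`, `requireVar` and the `DbConfig` type are hypothetical names, and the host value in the example is a placeholder.

```typescript
// Settings the back-end needs to reach the datasource.
interface DbConfig {
  host: string;
  user: string;
  password: string;
  database: string;
}

// Same shape as process.env in Node.js, passed in explicitly to keep the sketch testable.
type Env = Record<string, string | undefined>;

// Read one variable, failing with a clear message if it is absent or empty.
function requireVar(env: Env, key: string): string {
  const value = env[key];
  if (value === undefined || value === "") {
    throw new Error(`Missing environment variable ${key}`);
  }
  return value;
}

// Map the variable names used in the descriptor to a typed configuration object.
function loadDbConfig(env: Env): DbConfig {
  return {
    host: requireVar(env, "DB_HOST"),
    user: requireVar(env, "DB_USER"),
    password: requireVar(env, "DB_PASSWORD"),
    database: requireVar(env, "DB_DATABASE"),
  };
}

// Placeholder values; in the real application the argument would be process.env.
const dbConfig = loadDbConfig({
  DB_HOST: "10.0.0.5",
  DB_USER: "myuser",
  DB_PASSWORD: "mypassword",
  DB_DATABASE: "mydb",
});
```

Failing at startup makes a misconfigured descriptor visible immediately, instead of at the first query.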

Database Tier

We need to declare a descriptor for the database tier too, because the pattern that Google Cloud uses to connect an application to a database passes through a proxy: this is the cloudsql-proxy-instance resource that the back-end app references in its DB_HOST variable. In our case we have chosen to keep the actual database outside the IaC.

Note: this descriptor uses a different kind of Jinja template, one that simply creates a single VM where a binary application runs, thanks to a startup script, without the scalability features used by the previous modules.


Blackbox App

The IaC descriptor of the blackbox app represents the VM that runs it.

- name: blackbox-app
  type: tpl/instance.jinja
  properties:
    zone: us-central1-c
    machineType: f1-micro
    metadata-from-file:
      startup-script: blackbox-vm.sh
    metadata:
      NODE_ENV: prod

Putting It All Together…

At this point, we have a number of fragments, each representing a piece of the infrastructure. They can be put in the same descriptor, so that a single file represents all the modules that compose our application.

This file can be consumed by a shell script based on the gcloud command-line tool, in the context of a “build plan” with Jenkins, Bamboo or other CI/CD tools.

gcloud deployment-manager deployments create my-architecture --config=infrastructure.yml

Another possibility is to use Google Cloud’s Deployment Manager API, passing this descriptor as an argument, to let a non-technical user create and/or destroy the whole infrastructure on demand.

POST https://www.googleapis.com/deploymentmanager/v2/projects/gcd-jr-demo/global/deployments
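
A small service sitting between the business user and this API could build its requests along the following lines. Nothing is sent over the network in this sketch and authentication is left out. The body shape (a deployment name plus the YAML descriptor under target.config.content) and the DELETE path are written from our reading of the v2 API and should be checked against the official reference; `ApiRequest` and the two builder functions are hypothetical names.

```typescript
// Description of an HTTP request; sending it is left to the HTTP client of your choice.
interface ApiRequest {
  method: "POST" | "DELETE";
  url: string;
  body?: string;
}

const API_ROOT = "https://www.googleapis.com/deploymentmanager/v2";

// Build the request that creates a deployment from a YAML descriptor.
function createDeploymentRequest(
  project: string,
  deploymentName: string,
  configYaml: string
): ApiRequest {
  return {
    method: "POST",
    url: `${API_ROOT}/projects/${encodeURIComponent(project)}/global/deployments`,
    // Assumed body shape: the descriptor travels as a string inside the deployment.
    body: JSON.stringify({
      name: deploymentName,
      target: { config: { content: configYaml } },
    }),
  };
}

// Build the request that destroys the whole deployment once the run is over.
function deleteDeploymentRequest(
  project: string,
  deploymentName: string
): ApiRequest {
  return {
    method: "DELETE",
    url:
      `${API_ROOT}/projects/${encodeURIComponent(project)}` +
      `/global/deployments/${encodeURIComponent(deploymentName)}`,
  };
}

const createRequest = createDeploymentRequest(
  "gcd-jr-demo",
  "my-architecture",
  "resources: []"
);
const deleteRequest = deleteDeploymentRequest("gcd-jr-demo", "my-architecture");
```

Exposing only these two operations behind a button is enough to give a non-technical user the create/destroy cycle described above.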

Conclusions

So, what did we achieve at the end of the day? We have a code base, versioned and tested, representing our infrastructure that:

  • a business user can create, take advantage of and destroy on demand;
  • a developer can easily debug, share and adapt to future projects;
  • makes our financial department happy, by completely removing unused resources and their costs.

Omar Brescianini – Software Engineer @ Jobrapido
Gianluca Moretti – Software Engineer @ Jobrapido
