Automated multi-country mail sending application: from requirements to deployment on a Mesos cluster

Jobrapido is the world’s biggest job search engine. We conduct business in 58 countries and help jobseekers apply for a role in line with their expectations.   

One way to keep jobseekers updated on new jobs is by sending them emails, and the most crucial aspect is to do it at the right time; sending notifications at the wrong time could have a dramatic negative impact on the effectiveness of the service (the conversion rate of the mail campaign).

Since mail recipients are distributed all over the world, it is important to take care of the time zones. Our mail marketing team has put in a great effort to identify the best time to send emails for each country. Also, they have provided the query conditions to select the target audience of the emails.   

In this article, we will describe how we handled the task of sending emails at the right time (taking care of time zones), focusing on some technical details on how we automated and scheduled the recurring task of jobseeker extraction. 

The Project 

The complete process of sending emails is represented by the following picture: 

The objective of this article is to describe in depth the first part, the mail jobseeker extractor: a recurrently scheduled job that extracts the right jobseekers using the conditions provided by the marketing team and writes their ids to a database table.

The second step (not covered in this article) is composed of a processor that reads the ids from the database, retrieves all the information needed to send emails, and enqueues messages to the mailing platform that sends emails. 

A closer look at the requirement

The Mail Jobseeker Extractor Configurations 

  • Queries: a list of pre-defined SQL queries: <query_key, sql (where conditions)> 
  • Schedule (one for each country): 
    • Period: time range (start – end). E.g., from 2020-01-01 to 2020-01-31.
    • Interval: frequency in calendar days at which mails should be sent. E.g., if 7, a run is scheduled every week for the selected period.
    • Time: time (hh:mm) of the day when the mail jobseeker extractor process should start.
    • Query: key of the SQL query selected for this schedule.
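
As a rough illustration, the configuration above could be modelled with types like the following. This is a sketch, not the project's actual schema: the type and field names (QueryCatalog, IScheduleConfig, queryKey and so on), the sample condition and the sample values are all assumptions chosen to mirror the list.

```typescript
// Hypothetical types mirroring the configuration list above.
// Names and sample values are illustrative assumptions, not the project's schema.

/** Pre-defined queries, keyed by query key; the value holds the WHERE conditions. */
type QueryCatalog = { [queryKey: string]: string };

/** One schedule per country. */
interface IScheduleConfig {
  country: string;   // country code, e.g. "it"
  start: string;     // first day of the period, "YYYY-MM-DD"
  end: string;       // last day of the period, "YYYY-MM-DD"
  interval: number;  // calendar days between two runs
  time: string;      // local time of day ("HH:mm") at which a run may start
  queryKey: string;  // key of the query selected for this schedule
}

const sampleQueries: QueryCatalog = {
  // Made-up condition, only to show the shape of an entry.
  active_last_30_days: "last_login >= now() - interval '30 days'",
};

const sampleSchedule: IScheduleConfig = {
  country: "it",
  start: "2020-01-01",
  end: "2020-01-31",
  interval: 7,
  time: "09:30",
  queryKey: "active_last_30_days",
};

/** Returns the WHERE conditions selected by a schedule, or throws if the key is unknown. */
function resolveQuery(schedule: IScheduleConfig, queries: QueryCatalog): string {
  const sql = queries[schedule.queryKey];
  if (sql === undefined) {
    throw new Error("Unknown query key: " + schedule.queryKey);
  }
  return sql;
}
```

Keeping the queries in a separate catalog and referencing them by key lets several countries share the same conditions while each keeps its own period, interval and time.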

The Mail Jobseeker Extractor Process

The entire process can be summarized in 3 steps:    

  1. A task is scheduled every 15 minutes to call the mail jobseeker extractor application.  
  2. The mail jobseeker extractor application fetches the schedule properties and checks if it is time to run.  
  3. If it is time to run, it selects the configured query (to extract the target jobseeker ids for the current run) and performs an insert into the database.  

To keep track of the last run date and to avoid unwanted multiple runs, we save some information for each country in the database. This data enables the next scheduled task to decide whether to proceed or not.  
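
The per-country bookkeeping can be pictured with a small in-memory sketch. The record holds a running flag and the date of the last run; the InMemoryStateStore class, its method names and the "1970-01-01" default are assumptions made for illustration only, since the real application persists this state in the database.

```typescript
// In-memory stand-in for the per-country state kept in the database.
// Illustrative assumption: class, method names and defaults are not from the project.

interface ICountryRunState {
  running: boolean;  // true while an extraction is in progress
  lastrun: string;   // local date ("YYYY-MM-DD") of the last successful run
}

class InMemoryStateStore {
  private states: { [country: string]: ICountryRunState } = {};

  get(country: string): ICountryRunState {
    // Countries never seen before are idle and are treated as never having run.
    return this.states[country] || { running: false, lastrun: "1970-01-01" };
  }

  /** Marks the country as running; returns false if a run is already in progress. */
  tryLock(country: string): boolean {
    const current = this.get(country);
    if (current.running) {
      return false;
    }
    this.states[country] = { running: true, lastrun: current.lastrun };
    return true;
  }

  /** Clears the running flag and records the date to remember as the last run. */
  unlock(country: string, lastrun: string): void {
    this.states[country] = { running: false, lastrun: lastrun };
  }
}

const stateStore = new InMemoryStateStore();
const firstAttempt = stateStore.tryLock("it");   // true: nothing was running
const secondAttempt = stateStore.tryLock("it");  // false: the first run still holds the lock
stateStore.unlock("it", "2020-01-08");           // run finished, remember its date
```

Because the task fires every 15 minutes, a slow extraction can still be in progress when the next task starts; the running flag is what lets that second task back off.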

Now that the requirement is clear, let’s dive into some technical details of how we implemented it.

The Mail Jobseeker Extractor Application 

We chose Node.js as the runtime for the core application because it makes several network calls and, as you might know, Node.js performs well in I/O-intensive applications. We chose TypeScript as the language to take advantage of its type system.

The application is deployed on a Mesos cluster, leveraging two of its main frameworks: Chronos and Marathon. Mesos is a cluster management platform that abstracts the underlying computing resources (such as CPU, memory and storage) so that a data center can be used as if it were one large machine. On top of it, Chronos provides distributed and fault-tolerant execution of scheduled jobs, and Marathon hosts highly available, long-running (containerized) applications.

For simplicity, we have implemented both the job and the microservices in the same project.  

Since Chronos can be used to run arbitrary commands on a Mesos cluster, we have used it to schedule (as mentioned above, every 15 minutes) a task to start our mail jobseeker extractor application (in the project it is called run mode).  

We have leveraged Marathon to deploy some microservices (we call this serve mode). The Chronos job interacts with these microservices to retrieve all the information (e.g. time zones, schedule and query configurations) needed to implement the core logic of the mail jobseeker extraction process described above.

In order to hide some details that are out of scope for this article, the microservices have been implemented by exposing some REST APIs that return a hardwired result. 
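
To give an idea of what a service returning a hardwired result looks like, here is a minimal sketch written as a plain function, without any HTTP framework. The route (/timezones/<country>), the payload shape and the country-to-time-zone table are hypothetical; the actual endpoints are in the repository.

```typescript
// Hypothetical stub service: every known route returns a hardwired payload.
// Paths and payloads are made up for illustration; see the repository for the real ones.

interface IStubReply {
  statusCode: number;
  body: string; // JSON-encoded payload
}

const hardwiredTimezones: { [country: string]: string } = {
  it: "Europe/Rome",
  jp: "Asia/Tokyo",
};

/** Resolves a GET request path such as "/timezones/it" to a canned reply. */
function stubReply(path: string): IStubReply {
  const match = /^\/timezones\/([a-z]{2})$/.exec(path);
  const country = match ? match[1] : undefined;
  const timezone = country !== undefined ? hardwiredTimezones[country] : undefined;
  if (timezone === undefined) {
    return { statusCode: 404, body: JSON.stringify({ error: "not found" }) };
  }
  return { statusCode: 200, body: JSON.stringify({ timezone: timezone }) };
}
```

Swapping the hardwired table for a real data source later does not change the contract seen by the scheduled job.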

Below is the whole technology stack: 

You can find the project on GitHub:  

https://github.com/jobrapido/mail-jobseeker-extractor

Follow the README instructions to install and run it. 

Code Analysis

Serve & Run

As mentioned earlier, our application handles two different types of commands:   

  • the run command, invoked by Chronos every time the scheduled task starts
  • the serve command, invoked just once when the application is deployed by Marathon 

Which command Marathon and Chronos invoke is specified in their deployment descriptor files (i.e. marathon.json and chronos.json; we go into more detail on these files in the deployment sections below).

To handle these commands in the application, we used commander, a Node.js library for building command-line interfaces. Commander provides a global object, program, which can be used to define (sub)commands. Each command can be given a description, options, and an action handler.

import * as program from "commander"; 

program
  .command("run")
  .description("Run the query configured for the specified country")
  .option("-c, --country <required>", "The country to execute the jobseeker mail extractor")
  .action(async (command) => {
  … 
  … 
  }); 

program
  .command("serve")
  .description("Serve rest interface via http")
  .action(() => {
    const app = container.resolve(Application);
    app.start(8080);
  });

Run Logic

The following conditions determine whether it is time for the task to run for a particular country:

  • the job is not already running
  • the current date falls within the configured period
  • the configured interval has elapsed since the last run
  • the configured time of day has been reached

If all the conditions are met, the task runs and the ids of the extracted jobseekers are written to the database.

Since we run the task in 58 countries, it is very important to handle the time zones correctly. We used the Moment.js library for this purpose, together with its Moment Timezone add-on, which provides the moment.tz function used below.

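// Runs the extraction for one country if its schedule says it is due.
// Returns the number of rows inserted (0 when the run is skipped).
public async insert(country: string) {
  const config = await this.datastoreService.composeMailJobseekerExtractorConfig(country);
  const currentState = await this.getCurrentRunningState(country);

  if (this.canRun(config, currentState)) {
    // Mark the country as running so that the next scheduled task skips it.
    await this.lock(country, config.timezone);
    try {
      const sql = this.queryComposerService.composeInsert(config);
      const result = await this.postgresService.execute(country, sql);
      // Success: release the lock and store today's date (in the country's time zone) as the last run.
      await this.unlock(country, config.timezone, moment.tz(config.timezone).format("YYYY-MM-DD"));
      return result.rowCount;
    } catch (error) {
      // Failure: release the lock but keep the previous last-run date.
      await this.unlock(country, config.timezone, currentState.lastrun);
      throw error;
    }
  }
  return 0;
}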

// All four conditions must hold, evaluated in the country's own time zone.
private canRun(config: IMailJobseekerExtractorConfig, currentState: ICountryState) {
  const today = moment.tz(config.timezone);
  return !currentState.running
    && this.todayBetweenBoundaries(today, config.start, config.end)
    && this.configuredDaysIntervalElapsed(today, config.interval, moment.tz(currentState.lastrun, config.timezone))
    && this.configuredHourElapsed(today.format("HH:mm"), config.time);
}

// True if today falls within the configured period, boundary days included ("[]").
private todayBetweenBoundaries(today: Moment, start: Moment, end: Moment) {
  return today.isBetween(start, end, "day", "[]");
}

// True once the day that lies `interval` days after the last run has started.
private configuredDaysIntervalElapsed(today: Moment, interval: number, last: Moment) {
  const nextAvailableTick = last.clone().add(interval, "days");
  return nextAvailableTick.isBefore(today);
}

// True once the current time of day has reached the configured "HH:mm".
private configuredHourElapsed(currentTime: string, configuredTime: string) {
  return moment(currentTime, "HH:mm").isSameOrAfter(moment(configuredTime, "HH:mm"));
}
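
To see the four checks at work on concrete instants without pulling in Moment.js, here is a dependency-free sketch that uses the built-in Intl.DateTimeFormat to obtain the local date and time for a time zone. It is not the project's implementation: the function names, the interfaces and the sample values are assumptions, and dates are compared as "YYYY-MM-DD" strings, which sort chronologically.

```typescript
// Dependency-free sketch of the run checks (the article's code uses Moment.js instead).
// All names and sample values below are illustrative assumptions.

interface ILocalClock { date: string; time: string; } // "YYYY-MM-DD" and "HH:mm"
interface IRunConfig { timezone: string; start: string; end: string; interval: number; time: string; }
interface IRunState { running: boolean; lastrun: string; }

function pad2(n: number): string {
  return n < 10 ? "0" + n : String(n);
}

/** Local calendar date and wall-clock time of `instant` in an IANA time zone. */
function localClock(instant: Date, timeZone: string): ILocalClock {
  const text = new Intl.DateTimeFormat("en-GB", {
    timeZone: timeZone, hour12: false,
    year: "numeric", month: "2-digit", day: "2-digit",
    hour: "2-digit", minute: "2-digit",
  }).format(instant); // en-GB order: day, month, year, hour, minute
  const digits: string[] = text.match(/\d+/g) || [];
  const [day = 1, month = 1, year = 1970, hour = 0, minute = 0] = digits.map(Number);
  // Some runtimes print midnight as 24, hence the modulo.
  return { date: year + "-" + pad2(month) + "-" + pad2(day), time: pad2(hour % 24) + ":" + pad2(minute) };
}

/** Adds calendar days to a "YYYY-MM-DD" date (computed in UTC, so DST cannot interfere). */
function addDays(date: string, days: number): string {
  const [y = 1970, m = 1, d = 1] = date.split("-").map(Number);
  return new Date(Date.UTC(y, m - 1, d + days)).toISOString().slice(0, 10);
}

function canRunAt(instant: Date, config: IRunConfig, state: IRunState): boolean {
  const now = localClock(instant, config.timezone);
  return !state.running
    && config.start <= now.date && now.date <= config.end     // inside the period (inclusive)
    && addDays(state.lastrun, config.interval) <= now.date    // interval elapsed
    && config.time <= now.time;                               // scheduled time reached
}

const sampleRomeConfig: IRunConfig = {
  timezone: "Europe/Rome", start: "2020-01-01", end: "2020-01-31", interval: 7, time: "09:30",
};
const sampleIdleState: IRunState = { running: false, lastrun: "2020-01-08" };

// 08:15 UTC is 09:15 in Rome in January: too early for the 09:30 schedule.
const tooEarly = canRunAt(new Date("2020-01-15T08:15:00Z"), sampleRomeConfig, sampleIdleState); // false
// Fifteen minutes later it is 09:30 in Rome and all four checks pass.
const onTime = canRunAt(new Date("2020-01-15T08:30:00Z"), sampleRomeConfig, sampleIdleState);   // true
```

The two sample calls are one polling interval apart, which shows how a task that fires every 15 minutes picks up each country at its own local time.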

Deploy on Chronos 

To deploy the Chronos app, we use the following command:

curl -vvv -L -H 'Content-Type: application/json' -X POST -d@chronos-env.json http://${CHRONOS_HOST}:4400/scheduler/iso8601

Here, chronos-env.json (see the chronos-deploy.sh file to understand how the env file is created) is the Chronos deployment descriptor file. It holds all the information needed for deployment in the Mesos environment, such as the schedule frequency, the Docker container image, the environment variables, the available resources (CPUs, memory, etc.) and the command to invoke (i.e. node main.js run -c ${DEPLOY_APP_COUNTRY}).

Since we have 58 countries and hence 58 Chronos tasks to deploy, we use a bash script (available in the source code of the project linked above) to launch the command for all the countries. 
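
The project does this with a bash script; purely as an illustration of the idea, the sketch below builds one job descriptor per country in TypeScript. The field names follow the Chronos job JSON format, while the job name prefix, image name, resource figures, start date and country list are placeholders; check chronos.json in the repository for the real values.

```typescript
// Sketch: one Chronos job descriptor per country, all firing every 15 minutes.
// Job name prefix, image, resource figures, start date and countries are placeholders.

interface IChronosJob {
  name: string;
  schedule: string;  // ISO 8601 repeating interval: R[n]/<start>/<period>
  command: string;
  cpus: number;
  mem: number;       // MB
  container: { type: string; image: string };
}

function chronosJobFor(country: string): IChronosJob {
  return {
    name: "mail-jobseeker-extractor-" + country,
    // "R" without a count repeats indefinitely; PT15M is a 15-minute period.
    schedule: "R/2020-01-01T00:00:00Z/PT15M",
    command: "node main.js run -c " + country,
    cpus: 0.1,
    mem: 128,
    container: { type: "DOCKER", image: "example/mail-jobseeker-extractor:latest" },
  };
}

const sampleCountries = ["it", "fr", "de"]; // the real deployment covers 58 countries
const chronosJobs = sampleCountries.map(chronosJobFor);
```

Each element of chronosJobs would then be serialized to JSON and posted to the scheduler endpoint, as the curl command above does for a single country.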

Deploy on Marathon

To deploy the microservices, the command is similar to the Chronos one: 

curl -X PUT http://${MARATHON_HOST}:8080/${MARATHON_APP_ID} -d@marathon-env.json -H "Content-type: application/json"

Compared with chronos-env.json, marathon-env.json has a slightly different syntax, but the content is similar (e.g. the command to invoke is node main.js serve). Since the deployed application in this case is long-running, no schedule frequency is specified; instead, other information such as the number of instances, the service port, the upgrade strategy and the health checks is required.
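
For orientation, a Marathon app definition carrying that information might look roughly like the object below. The field names are those of Marathon's app definition format, but the app id, image, resource figures, service port, timings and the /health path are placeholders rather than the project's actual settings; marathon.json in the repository is the reference.

```typescript
// Sketch of a Marathon app definition for the long-running "serve" mode.
// App id, image, figures, service port and the /health path are placeholders.

const marathonApp = {
  id: "/mail-jobseeker-extractor",
  cmd: "node main.js serve",
  instances: 2,
  cpus: 0.1,
  mem: 128, // MB
  container: {
    type: "DOCKER",
    docker: {
      image: "example/mail-jobseeker-extractor:latest",
      network: "BRIDGE",
      // Port 8080 inside the container; hostPort 0 lets Marathon pick a free host port.
      portMappings: [{ containerPort: 8080, hostPort: 0, servicePort: 10080 }],
    },
  },
  healthChecks: [
    { protocol: "HTTP", path: "/health", portIndex: 0, intervalSeconds: 30, maxConsecutiveFailures: 3 },
  ],
  // Keep at least half of the instances healthy while a new version rolls out.
  upgradeStrategy: { minimumHealthCapacity: 0.5 },
};

// JSON body for the PUT request.
const marathonPayload = JSON.stringify(marathonApp, null, 2);
```

Unlike the Chronos descriptor, there is no schedule field: Marathon keeps the requested number of instances running and restarts any that fail their health checks.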

Conclusions

This article gives just an overview of how you can set up, deploy, and run both scheduled jobs and long-running applications using Mesos frameworks. Further details on how to configure Chronos and Marathon according to your needs (e.g. how often a scheduled job has to run, or how to configure autoscaling) can be found in the official docs: Chronos (https://mesos.github.io/chronos/), Marathon (https://mesosphere.github.io/marathon/), Mesos (https://mesos.apache.org/).

Even though the core of this article is about Mesos frameworks, we also wanted to explain some other interesting aspects of the application we developed. Scheduling recurring jobs that regularly extract data from the database (based on evolving conditions) is a commonly requested feature, especially in mail sending applications. Another important and not so easy task is handling time zones in a multi-country application, where jobs have to start at a specific country-local time.

So, clone the GitHub project and… have fun!

Sanchi Goyal, Software Engineer @ Jobrapido
Howard Scordio, Software Engineer @ Jobrapido

Cover: Freepik
