Kenshoo Tech Blog: February 2013

Kenshoo has been handling large-scale applications for a while now. Terabytes of data are managed and processed by a feature-rich set of applications and products. Not surprisingly, one of the key tools for scaling these applications successfully is distribution: identifying components that are over-burdened by functionality or load, and redesigning them into smaller, decoupled, single-responsibility services that can be scaled horizontally.

At first, when faced with this task, we developers were somewhat puzzled: while our larger legacy systems are already surrounded by a cozy network of operations-ready tools (for logging, deploying, monitoring, configuring, and provisioning servers), these tools were not applicable to our new services, as they had grown somewhat coupled to the applications they support.

We needed an improved set of tools to be used uniformly across the board, so that our operations teams could deploy, manage, and troubleshoot dozens of different services without using dozens of different tools. Put concisely: we needed a one-size-fits-all "Kenshoo Box" with built-in solutions for all operational aspects, into which developers can easily pour their business logic, knowing they can deliver it to production without making the operations teams angry.

If you need a similar "box" - this post is for you.

Solution Overview

At the core of this "box" stands Dropwizard, an excellent open-source project developed for these same purposes by Yammer. Dropwizard is an easy-to-use combination of mature, open-source Java projects (such as Jetty, Metrics, Jackson, Jersey and others) that lets you configure them all in one place and provides built-in integration between them.

By itself, Dropwizard almost gets the job done: it lets the developer create web services, expose metrics, and configure loggers and appenders, and it exposes a built-in set of administrative services over HTTP (healthchecks, ping, threads, etc.) that makes monitoring easy for both humans and machines. To make your service "droppable", it lets you package it as a standalone JAR with a single accompanying configuration file (in YAML format).
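To give a feel for the "machines" side of that monitoring, here is a minimal shell sketch of a probe against the administrative endpoints. It assumes Dropwizard's default admin port (8081) and that `curl` is installed; the function names and the host name in the usage note are hypothetical.

```shell
#!/bin/sh
# Dropwizard serves its administrative endpoints on a separate port
# (8081 unless the yml file says otherwise).
ADMIN_PORT="${ADMIN_PORT:-8081}"

# Build the URL of an admin endpoint (ping, healthcheck, threads, metrics).
admin_url() {
    printf 'http://%s:%s/%s\n' "$1" "$ADMIN_PORT" "$2"
}

# Succeed only if the service answers /ping with "pong" within two seconds.
is_alive() {
    body="$(curl -fsS --max-time 2 "$(admin_url "$1" ping)" 2>/dev/null)" || return 1
    [ "$(printf '%s' "$body" | tr -d '[:space:]')" = "pong" ]
}

# Succeed only if /healthcheck returns a 2xx status
# (curl -f turns HTTP error statuses into a non-zero exit code).
is_healthy() {
    curl -fsS --max-time 5 -o /dev/null "$(admin_url "$1" healthcheck)" 2>/dev/null
}
```

A monitoring job could then run something like `is_alive app-host-01 && is_healthy app-host-01 || echo "app-host-01 needs attention"`.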

The missing pieces are deployment (how do you manage dozens of boxes running different services, and upgrade a specific service?), and centralized logging - which we found to be crucial for troubleshooting truly distributed systems. To complete Dropwizard into a self-deploying, centrally-manageable service, we've added a few features using other open-source tools:

"Daemonizing": using Ubuntu's Upstart, we make the service run as a daemon (yes, we deploy all services on Ubuntu).
Logging: using a custom Dropwizard ConfiguredBundle, we integrate a Graylog2 appender for centralized logging.
Packaging: using our own (open-sourced) Gradle FPM Plugin, we produce a Deb package that includes the JAR, yml file, and upstart script. The package also describes its dependencies on other packages (e.g. the relevant Java version).
Provisioning and Deployment: using Puppet, we automatically provision new servers and deploy the latest version of our service on new or existing servers.

The rest of this post describes how we put each of these tools to use.

Daemonize with Upstart

We're using a very simple Upstart file that just constructs the Java command (java -jar ...) with the arguments expected by Dropwizard, and executes it. The most important part of this script is the injection of external configuration, which makes it easy to pass local, environment-dependent settings (hosts, passwords etc.) to the application. We will later describe how this local configuration file gets there (hint: Puppet).

Here's an example of such an upstart script:
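As a minimal sketch (not the exact production file), the shell logic inside such a job's `script` stanza could look like the following. The paths, the service name, the key=value format of `server.cfg`, and the choice to pass each entry through Dropwizard's `-Ddw.` system-property overrides are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical locations; adjust to wherever the package installs its files.
APP_HOME="${APP_HOME:-/opt/my-service}"
LOCAL_CFG="${LOCAL_CFG:-/etc/my-service/server.cfg}"

# Turn each key=value line of the local config file into a -Ddw.key=value
# flag, skipping blank lines and comments. Values must not contain spaces.
config_overrides() {
    cfg_file="$1"
    [ -r "$cfg_file" ] || return 0
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            ''|'#'*) continue ;;
        esac
        printf ' -Ddw.%s' "$line"
    done < "$cfg_file"
}

# Assemble the full launch command: JVM flags, the JAR, then the
# "server <yml>" arguments that Dropwizard expects.
build_command() {
    printf 'java%s -jar %s/my-service.jar server %s/my-service.yml\n' \
        "$(config_overrides "$LOCAL_CFG")" "$APP_HOME" "$APP_HOME"
}

# Inside the Upstart script stanza the last line would be:
#   exec $(build_command)
build_command
```

Keeping the overrides in a separate file means the packaged yml file stays identical on every machine, while anything environment-specific lives in a file that the provisioning tool owns.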

Log with Graylog2

While Dropwizard provides the convenient Logback as its logging framework, and includes the logging configuration in the all-encompassing yml file, it only allows creating a limited set of appenders, all of which create local files on the server running your service. Since we have hundreds of servers running dozens of services, we need a centralized log viewer that displays logs from all servers. Graylog2 is a good choice for that (though other tools, including some hosted solutions, would also work).

To send logs to our centralized Graylog2 instance, we need to configure an appropriate GELF appender. We're using Moocar's logback-gelf, but we must configure it programmatically since such custom configurations are not supported by Dropwizard through its yml file. To do that, we implemented a ConfiguredBundle that creates the appender and adds it to the relevant Logger:

Then, we call bootstrap.addBundle(new LogbackConfigurer()) in our Dropwizard Service implementation (read more about it here).

Package with Gradle

This phase makes deployment simpler in two ways. First, it packages all three artifacts (JAR, yml file, upstart script) into one artifact. Second, it declares the version of the artifact and describes its dependencies, so that the work left for the deployment tool is as simple as can be. This is all done by the FPM Plugin for Gradle; read all about it in the project readme: https://github.com/kenshoo/gradle-fpm-plugin#readme
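For orientation, the plugin builds on the fpm packaging tool, and a roughly equivalent manual invocation can be sketched in shell. The package name, version, Java dependency, and staging directory below are hypothetical, and the sketch only prints the command rather than running fpm.

```shell
#!/bin/sh
# All names, versions, and paths below are hypothetical.
NAME="my-service"
VERSION="1.0.0"
JAVA_DEP="openjdk-7-jre-headless"   # stands in for "the relevant Java version"
STAGING_DIR="build/package"         # mirrors the target filesystem layout,
                                    # e.g. opt/my-service/*.jar, etc/init/my-service.conf

# Print the fpm command that turns the staging directory into a Deb package:
#   -s dir / -t deb : build a Deb from a plain directory
#   -n / -v         : package name and version
#   -d              : a dependency on another package
#   -C              : change into the staging directory before collecting files
fpm_command() {
    printf 'fpm -s dir -t deb -n %s -v %s -d %s -C %s .\n' \
        "$NAME" "$VERSION" "$JAVA_DEP" "$STAGING_DIR"
}

fpm_command
```

Because the version and dependencies travel inside the package, the deployment tool only has to ask the package manager for "the latest my-service".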

In our case, configuring the plugin in our build.gradle file looks something like this:

Provision and deploy with Puppet

We've come a long way so far: we have a deb file ready, containing everything we need. All we have to do now is install it on any Ubuntu machine. But we want to be able to do that on new machines and existing ones alike, on machines with previous versions running, on multiple machines at the same time, and so on.

Another concern we still have to deal with is environment-specific configuration. We "taught" our upstart script to read a configuration file with such settings and feed it to our service when starting it - so now we will take care of creating these configuration files with Puppet.

If you're not familiar with Puppet, it should suffice to note that it allows you to programmatically define a server's desired configuration, apply it to a given server (or set of servers), and then continuously enforce it as either the definitions or the servers change over time. Such definitions may include creation of users, installation of any package or program, file and folder management and whatever else an IT guy with a keyboard and SSH access can do...

So what should our Puppet code do? To install the service we've created and packaged, we need to:

Create an appropriate user, user group, and the folders our service expects to exist
Create a server.cfg file with any external configuration required by our service
Define our Debian repository as a source, then fetch and install the latest Debian package of our service
Restart the service
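For readers new to Puppet, it may help to see what these four steps amount to when done by hand. The shell sketch below is a dry run that only prints the commands; it is not Puppet code, and the service name, repository line, and paths are hypothetical.

```shell
#!/bin/sh
# Hypothetical names; real values depend on your service and repository.
SERVICE="my-service"
APT_SOURCE="deb http://apt.example.com/ stable main"

# Print the manual commands for the four steps instead of running them.
plan() {
    # 1. User, group, and the folders the service expects
    printf '%s\n' "groupadd --system $SERVICE"
    printf '%s\n' "useradd --system --gid $SERVICE $SERVICE"
    printf '%s\n' "install -d -o $SERVICE -g $SERVICE /etc/$SERVICE /var/log/$SERVICE"
    # 2. External configuration read by the upstart script
    printf '%s\n' "install -o $SERVICE -g $SERVICE -m 0640 server.cfg /etc/$SERVICE/server.cfg"
    # 3. Register the repository, then fetch and install the latest package
    printf '%s\n' "echo '$APT_SOURCE' > /etc/apt/sources.list.d/$SERVICE.list"
    printf '%s\n' "apt-get update"
    printf '%s\n' "apt-get install -y $SERVICE"
    # 4. Restart the service
    printf '%s\n' "service $SERVICE restart"
}

plan
```

Run by hand, these commands are neither repeatable (the second `useradd` fails) nor enforced over time, which is exactly the gap a declarative Puppet definition closes.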

Here's some sample Puppet code that does just that:

Fire away!

Apart from the four simple pieces of code you've seen here, not much else is needed to create a fully operational, easy-to-manage, scalable platform of independent services. Just throw your business logic into the comforting arms of Dropwizard, which will help you create REST APIs, monitors, DAOs and whatever else you need, and your code is out there, getting things done.

This post was co-authored by Lior Harel and Tzach Zohar, architects at Kenshoo.

Wednesday, February 27, 2013

Building and delivering production-ready services with Gradle, Dropwizard, Graylog2 and Puppet