Tuesday, September 3, 2013

Implementing Cassandra at Kenshoo - Lessons Learned

We had a problem...

Our tracking system at Kenshoo got too big.
Too much data.
Spread over many isolated MySQL servers.
So around two years ago we turned to NoSQL.

How to choose a NoSQL solution?

Out of the many options out there, we initially considered three: MongoDB, Hadoop (HBase) and Cassandra.

We finally picked Cassandra, because:
  1. we liked its clean, symmetric design.
  2. our benchmarks showed it seemed to be good at writes (tracking systems are generally very write-intensive).
  3. it was easy to set up (unlike Hadoop).
  4. it had a more appropriate data model (unlike MongoDB).
  5. it had a growing, active community (they all do).


Lesson #1: Cassandra is very good at writes, but the benchmarks we ran were not helpful. Even if you get to run your tests on production-like scenarios, machines and network topology (we didn’t), the actual performance you eventually see from Cassandra is extremely sensitive to server configuration, real-life usage patterns, and the internal operations (compaction, read-repair etc.) Cassandra performs. There are benchmarks out there showing that Cassandra compares very well with other leading NoSQL solutions (it outperforms them on writes, is comparable on reads, and scales linearly with the number of nodes), but we suggest taking even these (presumably) more carefully crafted benchmarks with more than a grain of salt.

Lesson #2: NoSQL frameworks are “fast-moving targets”, currently evolving very rapidly, copying features from one another, closing some of the gaps with traditional SQL, and so on. Since implementing them over existing projects usually takes quite some time (over a year in our case), by the time you finish, some of the original considerations are no longer valid. There isn’t much one can do here, aside from the obvious advice to look at these projects’ road-maps and hope they reflect their future trajectory.

To rewrite or not to rewrite?

The next issue we faced was just how much of the existing tracking system we should rewrite. Our existing code was complicated, tightly coupling business logic, database access, configuration and customizations. The temptation to simply “start afresh” with a new system was strong, though that would have meant continuously merging any bug-fixes and new features from the “MySQL branch” into the “Cassandra branch”. We eventually chose the other path - gradual evolution from one system to the next. This meant first of all adding a lot of missing tests - unit, end-to-end etc. - to add confidence to the change, then abstracting the repository and database-access layers, implementing the new Cassandra-facing repository, and finally being able to work in a “composite” mode against the two very different systems - MySQL and Cassandra.

Lesson #3: although not a universal maxim, it seems that overcoming the temptations of rewrites usually pays off.

Data model

We store most of our tracking data in a single “events” column-family:

Events column-family:
  • Rows are keyed by user-id
  • Column names represent single events
  • Column values are JSONs, like so:

    {
      "event-time": 1367515779,
      "type": "click",
      "device": "mobile",
      ...
    }
Since we occasionally need to look up data that’s not user-specific, we also created an additional “index-lookup” column-family. This data model allows us to serve the most time-critical queries (namely, “get all events for a given user”) in the fastest manner. Alas, we did not at first duplicate the data appropriately in the indexes, so each index lookup required two round-trips to the cluster: 1) find the index entry, then 2) fetch the additional data needed for the event.
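
To make the duplication concrete, here is a rough sketch of how such a denormalized index could look (the row key and fields are illustrative, not our actual schema) - each index column carries enough event data to answer the query in a single round-trip:

    index-lookup column-family (illustrative layout)
    row key:      the indexed value, e.g. a campaign-id
    column name:  event-id
    column value: {"user-id": "u-123", "event-time": 1367515779, "type": "click", "device": "mobile", ...}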

Lesson #4: model your data to best serve your critical lookups, and don’t be shy about duplicating data - round-trips to the cluster are costly.

Lesson #5: JSON is great in terms of flexibility and readability, but there are faster binary serializations out there, and once all other factors have been trimmed, serialization can in turn become an issue.

Consistency

Cassandra is an eventually consistent system. We ran into consistency issues in two cases: 1) nodes going down; and 2) read-before-write race conditions (which is actually an “anti-pattern”). The first issue was solved by scheduling read-repairs at reasonable intervals, so Cassandra could overcome the inconsistencies itself. The second required changes in the application workflow: locally caching results and making sure no read-before-write race conditions occur.
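
As a minimal sketch of that second fix (not our actual code - the class and helper names here are hypothetical), the idea is to cache what was just written per user, so the application never has to read its own fresh writes back from the cluster:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    // Illustrative only: remember the latest write per user locally,
    // instead of immediately reading it back from Cassandra.
    public class LatestEventCache {

        private final ConcurrentMap<String, String> latestEventByUser =
                new ConcurrentHashMap<String, String>();

        public void recordEvent(String userId, String eventJson) {
            writeToCluster(userId, eventJson);          // issue the write to the cluster
            latestEventByUser.put(userId, eventJson);   // and keep a local copy
        }

        public String latestEventFor(String userId) {
            String cached = latestEventByUser.get(userId);
            // fall back to a cluster read only when there is no fresh local copy
            return cached != null ? cached : readFromCluster(userId);
        }

        // hypothetical data-access helpers, standing in for the real repository calls
        private void writeToCluster(String userId, String eventJson) { /* ... */ }

        private String readFromCluster(String userId) { /* ... */ return null; }
    }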

Lesson #6: consistency is obviously something you need to think about in these types of systems. Anti-patterns are generally best avoided.


Erasing data

We have two basic scenarios where we need to erase data: cleaning up data that’s too old, and correcting erroneous data. We started out by implementing an erasing process over Cassandra, as we did with MySQL. This turned out to be not only ineffective but actually completely unneeded: old data can be cleared very easily using Cassandra’s built-in time-to-live feature, and instead of deleting erroneous data we now keep all events with an “event version”, allowing us to use only the latest.

Lesson #7: Since tracking data is a type of CRAP, we gave up on the update and delete parts of CRUD, and stuck with creating and reading only.

Current installation specs and numbers

  • 16 node-cluster
  • Dell R720 servers
  • 2 x CPU sockets, 24 cores in total (12 physical cores plus hyper-threading)
  • 32GB of RAM
  • 6 x 600GB SAS 10K Drives
  • 2 x 2TB SATA (for backup)
  • 2 x 1Gbit/sec NICs (for redundancy)
  • Currently about 10TB of data in the cluster
  • Average write latency: 0.8 ms/op (min 0.5 ms/op, max: 2.5 ms/op)
  • Average read latency: 23 ms/op (min: 11.7 ms/op, max: 40 ms/op)

Development and testing environment

Although setting up a local cluster for development and testing is relatively easy, it nevertheless has some pitfalls: how do you clear data between tests? How do you allow developers to work without Cassandra at all? What about QA and staging? Currently, our tests run on both an embedded server and a dev cluster; there is a separate cluster for staging; and the application can be configured to load without Cassandra at all.

Lesson #8: as always, dev-ops tasks are tedious and time-consuming, and must be accounted for in any effort estimation.


Transitioning to Cassandra

To manage risk, we didn’t want to transition our tracking application from MySQL to Cassandra all at once. The path chosen was to:
  1. have the data written both to Cassandra and to MySQL but read only from MySQL, just to see how the cluster performs. Then,
  2. gradually move application servers to read from Cassandra, while continuing to write to MySQL in case we needed to backtrack (which indeed we did, more than once). And finally,
  3. stop using MySQL for tracking.
We obviously also had to migrate old data from MySQL to Cassandra - we wrote a dedicated process for this. The only downside of this play-it-safe approach is that being able to easily “switch back to MySQL” can create a negative incentive to move forward in the face of obstacles.

Lesson #9: keep the old data around - you’ll probably need it.


Thursday, August 1, 2013

JUnit Rule for Verifying Logback Logging

Testing logging behavior is tricky: loggers are inevitably a dependency, and as such they should be mocked when unit-testing your component. However, most logging frameworks rely on static access to loggers, which makes mocking impossible, or at least cumbersome. This post by Pat Kua outlines some alternatives to the mocking approach, among which is creating another appender that will capture the output. I find this approach to be the most elegant, as it does not impose any awkward changes to the tested code.

At Kenshoo, we're shifting from using the good old log4j to the newer Logback implementation of the slf4j logging facade. This post by Aurelien Broszniowski suggests an easy implementation of the add-capturing-appender approach using Logback. Heavily relying on this implementation, we created a LogbackVerifier, a JUnit Rule that wraps the whole thing to make it more readable and reusable.

You would use it similarly to other verification rules (such as ExpectedException). In this example, we verify that service.doSomething(String) logs an info message on success and an error message (with an IllegalArgumentException object) on bad input:
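
A sketch of what such a test might look like (the Service class and the rule's expectMessage API shown here are illustrative, not the exact signatures):

    import ch.qos.logback.classic.Level;
    import org.junit.Rule;
    import org.junit.Test;

    public class ServiceTest {

        @Rule
        public LogbackVerifier logbackVerifier = new LogbackVerifier();

        private final Service service = new Service();

        @Test
        public void logsInfoMessageOnSuccess() {
            // expect an INFO message to be logged by the successful call
            logbackVerifier.expectMessage(Level.INFO, "doSomething succeeded");
            service.doSomething("valid input");
        }

        @Test
        public void logsErrorWithExceptionOnBadInput() {
            // expect an ERROR message carrying an IllegalArgumentException
            logbackVerifier.expectMessage(Level.ERROR, "doSomething failed", IllegalArgumentException.class);
            service.doSomething("bad input");
        }
    }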

Here's the LogbackVerifier implementation:
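
In outline, such a rule attaches a mocked Logback Appender to the root logger and uses Mockito to verify the expected events once the test finishes; the sketch below follows that approach (details differ from our actual code):

    import ch.qos.logback.classic.Level;
    import ch.qos.logback.classic.Logger;
    import ch.qos.logback.classic.spi.ILoggingEvent;
    import ch.qos.logback.core.Appender;
    import org.junit.rules.ExternalResource;
    import org.mockito.ArgumentMatcher;
    import org.slf4j.LoggerFactory;

    import java.util.ArrayList;
    import java.util.List;

    import static org.mockito.Matchers.argThat;
    import static org.mockito.Mockito.atLeastOnce;
    import static org.mockito.Mockito.mock;
    import static org.mockito.Mockito.verify;

    public class LogbackVerifier extends ExternalResource {

        @SuppressWarnings("unchecked")
        private final Appender<ILoggingEvent> appender = mock(Appender.class);
        private final List<ExpectedEvent> expectedEvents = new ArrayList<ExpectedEvent>();

        public void expectMessage(Level level, String messageFragment) {
            expectMessage(level, messageFragment, null);
        }

        public void expectMessage(Level level, String messageFragment, Class<? extends Throwable> throwableClass) {
            expectedEvents.add(new ExpectedEvent(level, messageFragment, throwableClass));
        }

        @Override
        protected void before() {
            // capture all logging by attaching the mock appender to the root logger
            rootLogger().addAppender(appender);
        }

        @Override
        protected void after() {
            try {
                // verify each expected event was appended at least once
                for (final ExpectedEvent expected : expectedEvents) {
                    verify(appender, atLeastOnce()).doAppend(argThat(new ArgumentMatcher<ILoggingEvent>() {
                        @Override
                        public boolean matches(Object argument) {
                            return expected.matches((ILoggingEvent) argument);
                        }
                    }));
                }
            } finally {
                rootLogger().detachAppender(appender);
            }
        }

        private Logger rootLogger() {
            return (Logger) LoggerFactory.getLogger(Logger.ROOT_LOGGER_NAME);
        }

        private static class ExpectedEvent {
            private final Level level;
            private final String messageFragment;
            private final Class<? extends Throwable> throwableClass;

            ExpectedEvent(Level level, String messageFragment, Class<? extends Throwable> throwableClass) {
                this.level = level;
                this.messageFragment = messageFragment;
                this.throwableClass = throwableClass;
            }

            boolean matches(ILoggingEvent event) {
                boolean matched = event.getLevel().equals(level)
                        && event.getFormattedMessage().contains(messageFragment);
                if (throwableClass != null) {
                    matched = matched && event.getThrowableProxy() != null
                            && throwableClass.getName().equals(event.getThrowableProxy().getClassName());
                }
                return matched;
            }
        }
    }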

Implementation Notes:

  • Of course, unlike your production code, which can depend on the Logback package at runtime only (and use slf4j at compile time), your test code will need a compile-time Logback dependency in order to use the LogbackVerifier
  • This implementation uses Mockito, but one can easily replace it with any other mocking framework
  • The invocation order is not verified - it's easy to enhance this implementation to verify order. Since we're using Mockito, org.mockito.InOrder would do the trick
This post was authored by Tzach Zohar, architect at Kenshoo.

Become a Jenkins Slave



A few weeks ago, on a very busy day, our continuous integration server Jenkins was overloaded with builds, and the build queue grew too long. We decided we needed more computing power. Fortunately, Jenkins provides a built-in ability to distribute work by adding slave machines, so the only question was: what’s the fastest, most cost-effective way to set up more slaves?

We realized that some of our employees’ personal workstations often stand idle for hours, days or even months - because of maternity leaves, long vacations or part-time jobs. So we went looking for tools to utilize this wasted CPU power and create local Jenkins slaves on Kenshoo employees’ personal computers.

We chose Vagrant - a tool that makes it easy to build virtual machines on top of Oracle’s VirtualBox.

Over a year ago we started using Puppet as our automation tool for configuring machines, and created a Puppet module that sets up a Jenkins slave. Since Vagrant plays nice with Puppet (i.e. you can configure Vagrant to run a specific Puppet module on a VM), our work was already half done - we simply used the same Puppet modules to configure the Vagrant box.

Finally we used the Jenkins swarm plugin that enables automatic discovery of Jenkins slaves.

These 4 tools helped us package a Vagrant box and run “vagrant up” to turn a developer computer into a Jenkins slave.

The rest of this post will describe how each of these tools was used.

We created two Vagrant projects.
The first creates a jenkins-slave from a basic Ubuntu box taken from http://www.vagrantbox.es/
Here is the project Vagrantfile:
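In sketch form (the box name, VM resources and Puppet paths are illustrative):

    # Illustrative Vagrantfile - box name, resources and paths are assumptions
    Vagrant.configure("2") do |config|

      # a basic Ubuntu box (e.g. one listed on vagrantbox.es)
      config.vm.box     = "precise64"
      config.vm.box_url = "http://files.vagrantup.com/precise64.box"

      # give the slave enough resources to run builds
      config.vm.provider :virtualbox do |vb|
        vb.customize ["modifyvm", :id, "--memory", "4096", "--cpus", "2"]
      end

      # provision the box with the same Puppet modules we use for our real slaves
      config.vm.provision :puppet do |puppet|
        puppet.manifests_path = "manifests"
        puppet.manifest_file  = "jenkins.pp"
        puppet.module_path    = "modules"
      end
    end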


jenkins.pp is the starting point for the Puppet run; it includes the basic Jenkins-slave module, plus two file resources which copy resources to the created box from the Vagrant shared folder:
1. the Jenkins Swarm plugin jar, taken from here
2. a service file that runs the swarm jar with all the parameters necessary to connect to the Jenkins master.
jenkins.pp:
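In sketch form (the module name, paths and file names are illustrative):

    # illustrative jenkins.pp - module name, paths and file names are assumptions
    include jenkins_slave

    # copy the swarm client jar from the Vagrant shared folder into the box
    file { '/opt/jenkins-slave/swarm-client.jar':
      ensure => file,
      source => '/vagrant/swarm-client.jar',
      owner  => 'jenkins',
    }

    # copy the upstart service file that launches the swarm client
    file { '/etc/init/jenkins-swarm-slave.conf':
      ensure => file,
      source => '/vagrant/jenkins-swarm-slave.conf',
      mode   => '0644',
    }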

swarm service:
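Something along these lines - an upstart job that keeps the swarm client connected to the master (URL, credentials and paths are placeholders):

    # /etc/init/jenkins-swarm-slave.conf - illustrative; values are placeholders
    description "Jenkins swarm slave"

    start on runlevel [2345]
    stop on runlevel [016]

    respawn

    script
      exec java -jar /opt/jenkins-slave/swarm-client.jar \
        -master http://jenkins.example.com:8080 \
        -username jenkins-slave \
        -password secret \
        -executors 2 \
        -labels vagrant \
        -fsroot /opt/jenkins-slave
    end script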

The Puppetfile in this project contains all the puppet modules needed for the jenkins-slave.

We used “librarian-puppet install” before running Vagrant to fetch all the Puppet module dependencies of the jenkins-slave module into the project’s modules folder.
Now running “vagrant up” will create a virtual machine which will connect to our Jenkins master and become a Jenkins slave.

So why do we need the second Vagrant project?
We wanted to make it even simpler for developers to run a slave.
We used “vagrant package” to create a box from the first project, and host the .box file on one of our local file servers for everyone to use.
All a developer needs to do is install Vagrant and VirtualBox, download a simple Vagrantfile and run “vagrant up”.
There is no need to run librarian-puppet and Puppet, which can take a lot of time.

This post was authored by Alon Levi, build architect at Kenshoo.

Wednesday, February 27, 2013

Building and delivering production-ready services with Gradle, Dropwizard, Graylog2 and Puppet

Kenshoo has been handling large scale applications for a while now. Terabytes of data are being managed and processed by a feature-rich set of applications and products. Not surprisingly, one of the key tools for scaling these applications successfully is distribution - identifying components that are over-burdened by functionality or load, and redesigning them into smaller, decoupled, single-responsibility services that can be scaled horizontally.

At first, when faced with this task, we developers were somewhat puzzled: while our larger legacy systems are already surrounded by a cozy network of operations-ready tools (for logging, deploying, monitoring, configuring, and provisioning servers), these tools were not applicable to our new services, as they've grown to be somewhat coupled with the applications they support.

We needed an improved set of tools to be used uniformly across the board, so that our operations teams can deploy, manage, and troubleshoot dozens of different services without using dozens of different tools. To be more concise: we needed a one-size-fits-all "Kenshoo Box" with built-in solutions for all operational aspects, into which developers can pour their business logic easily, and know that they can deliver it to production without making the operations teams angry.

If you need a similar "box" - this post is for you.

Solution Overview

In the core of this "box" stands Dropwizard - an excellent open-source project developed for these same purposes by Yammer. Dropwizard is an easy-to-use combination of mature, open-source Java projects (such as Jetty, Metrics, Jackson, Jersey and others), that lets you configure them all in one place and provides built-in integration between them.

By itself, Dropwizard almost gets the job done - it lets the developer create web services, expose metrics, configure loggers and appenders, and it exposes a built-in set of administrative services over HTTP (healthchecks, ping, threads..) that makes monitoring easy for both humans and machines. To make your service "droppable", it lets you package your service as a standalone JAR with a single accompanying configuration file (in yml format). 

The missing pieces are deployment (how do you manage dozens of boxes running different services, and upgrade a specific service?), and centralized logging - which we found to be crucial for troubleshooting truly distributed systems. To complete Dropwizard into a self-deploying, centrally-manageable service, we've added a few features using other open-source tools:
  • "Daemonizing": using Ubuntu's Upstart, we make the service run as daemon (yes, we deploy all services on Ubuntu). 
  • Logging: using a custom Dropwizard ConfiguredBundle, we integrate a Graylog2 appender for centralized logging
  • Packaging: using our own (open-sourced) Gradle FPM Plugin, we produce a Deb package that includes the JAR, the yml file, and the upstart script. The package also describes its dependencies on other packages (e.g. the relevant Java version).
  • Provisioning and Deployment: using Puppet, we automatically provision new servers and deploy the latest version of our service on new or existing servers
The rest of this post will describe how each of these tools is implemented. 

Daemonize with Upstart 

We're using a very simple upstart file that constructs the Java command (java -jar ...) with the arguments expected by Dropwizard, and executes it. The most important part of this script is the injection of external configuration - this allows passing local, environment-dependent configuration (hosts, passwords etc.) to the application easily. We will later describe how this local configuration file gets there (hint: Puppet).

Here's an example of such an upstart script:
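Something along these lines (the service name, paths and the server.cfg location are placeholders):

    # /etc/init/myservice.conf - illustrative; service name and paths are assumptions
    description "myservice - a Dropwizard-based service"

    start on runlevel [2345]
    stop on runlevel [016]

    respawn

    script
      # local, environment-specific settings (hosts, passwords etc.) written by Puppet;
      # typically exported as variables such as JAVA_OPTS used below
      . /etc/myservice/server.cfg

      exec java $JAVA_OPTS -jar /usr/lib/myservice/myservice.jar \
        server /usr/lib/myservice/myservice.yml
    end script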



Log with Graylog2

While Dropwizard provides the convenient Logback as its logging framework, and includes the logging configuration in the all-encompassing yml file, it only allows creating a limited set of appenders, all of which create local files on the server running your service. Since we're running hundreds of servers and dozens of services, we need a centralized log viewer that will display logs from all servers. Graylog2 is a good choice for that (even though other tools, including some hosted solutions, are also applicable).

To send logs to our centralized Graylog2 instance, we need to configure an appropriate GELF appender. We're using Moocar's logback-gelf, but we must configure it programmatically, since such custom configurations are not supported by Dropwizard through its yml file. To do that, we implemented a ConfiguredBundle that creates the appender and adds it to the relevant Logger:
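In outline, such a bundle looks something like this (a sketch against Dropwizard 0.6.x; our configuration class and the GelfAppender property names are illustrative - check the logback-gelf docs for the exact ones):

    import ch.qos.logback.classic.Logger;
    import ch.qos.logback.classic.LoggerContext;
    import com.yammer.dropwizard.ConfiguredBundle;
    import com.yammer.dropwizard.config.Bootstrap;
    import com.yammer.dropwizard.config.Environment;
    import me.moocar.logbackgelf.GelfAppender;
    import org.slf4j.LoggerFactory;

    // Sketch only - MyServiceConfiguration and its getters are stand-ins for
    // whatever configuration class your Dropwizard service uses.
    public class LogbackConfigurer implements ConfiguredBundle<MyServiceConfiguration> {

        @Override
        public void initialize(Bootstrap<?> bootstrap) {
            // nothing to do before the configuration is loaded
        }

        @Override
        public void run(MyServiceConfiguration configuration, Environment environment) {
            Logger root = (Logger) LoggerFactory.getLogger(Logger.ROOT_LOGGER_NAME);
            LoggerContext context = (LoggerContext) LoggerFactory.getILoggerFactory();

            // create and start the GELF appender (property names are assumptions)
            GelfAppender gelf = new GelfAppender();
            gelf.setContext(context);
            gelf.setGraylog2ServerHost(configuration.getGraylogHost());
            gelf.setGraylog2ServerPort(configuration.getGraylogPort());
            gelf.setFacility(configuration.getServiceName());
            gelf.start();

            // attach it to the root logger so all log events are shipped to Graylog2
            root.addAppender(gelf);
        }
    }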


Then, we call bootstrap.addBundle(new LogbackConfigurer()) in our Dropwizard Service implementation (read more about it here).

Package with Gradle

This phase makes deployment simpler in two ways: first, it packages all three artifacts (JAR, yml file, upstart script) into one artifact; second, it declares the version of the artifact and describes its dependencies, so that the work left for the deployment tool is as simple as can be. This is all done by the FPM Plugin for Gradle - read all about it in the project readme: https://github.com/kenshoo/gradle-fpm-plugin#readme

In our case, configuring the plugin in our build.gradle file looks something like this:


Provision and deploy with Puppet

We've come a long way so far - we have a deb file ready, containing everything we need. All we have to do now is install it on any Ubuntu machine. But - we want to be able to do that both on new machines and existing ones; on machines with previous versions running; on multiple machines at the same time; and so on.

Another concern we still have to deal with is environment-specific configuration. We "taught" our upstart script to read a configuration file with such settings and feed it to our service when starting it - so now we will take care of creating these configuration files with Puppet.

If you're not familiar with Puppet, it should suffice to note that Puppet allows you to programmatically define a server's desired configuration, apply it to a given server (or set of servers), and then continuously enforce it as the definitions or the servers change over time. Such definitions may include creating users, installing any package or program, managing files and folders, and whatever else an IT guy with a keyboard and SSH access can do...

So what should our Puppet code do? To install the service we've created and packaged, we need to:

  1. Create an appropriate user, user group, and folders expected to exist by our service
  2. Create a server.cfg file with any external configuration required by our service
  3. Define our Debian repository as a source, then fetch and install the latest Debian package of our service
  4. Restart the service

Here's a sample Puppet code that does just that:
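For illustration, such a manifest might look like this (resource names, paths, the configuration template and the apt::source usage - from the puppetlabs-apt module - are all assumptions):

    # Illustrative Puppet manifest - names, paths and templates are assumptions
    class myservice {

      # 1. user, group and folders the service expects to exist
      group { 'myservice': ensure => present }

      user { 'myservice':
        ensure  => present,
        gid     => 'myservice',
        require => Group['myservice'],
      }

      file { ['/etc/myservice', '/var/log/myservice']:
        ensure => directory,
        owner  => 'myservice',
      }

      # 2. environment-specific configuration read by the upstart script
      file { '/etc/myservice/server.cfg':
        ensure  => file,
        content => template('myservice/server.cfg.erb'),
        owner   => 'myservice',
      }

      # 3. our internal Debian repository, and the latest version of the package
      apt::source { 'internal-repo':
        location => 'http://debian-repo.example.com/',
        release  => 'stable',
        repos    => 'main',
      }

      package { 'myservice':
        ensure  => latest,
        require => Apt::Source['internal-repo'],
      }

      # 4. (re)start the service whenever the package or the configuration changes
      service { 'myservice':
        ensure    => running,
        provider  => upstart,
        subscribe => [Package['myservice'], File['/etc/myservice/server.cfg']],
      }
    }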


Fire away!

Apart from the four simple pieces of code you've seen here, not much else is needed to create a fully operational, easy-to-manage, scalable platform of independent services. Just throw your business logic into the comforting arms of Dropwizard, which will help you create REST APIs, monitors, DAOs and whatever else you need, and your code is out there, getting things done.

This post was co-authored by Lior Harel and Tzach Zohar, architects at Kenshoo.