We had a problem... Our tracking system at Kenshoo got too big. Too much data, spread over many isolated MySQL servers. So around two years ago we turned to NoSQL.
How to choose a NoSQL solution?
Out of the many options out there, we initially considered three: MongoDB, Hadoop (HBase) and Cassandra. We finally picked Cassandra, because:
- we liked its clean, symmetric design.
- our benchmarks showed it seemed to be good at writes (tracking systems are generally very write-intensive).
- it was easy to set up (contra Hadoop).
- it had a more appropriate data model (contra MongoDB).
- it had a growing, active community (they all do).
Lesson #1: Cassandra is very good at writes, but the benchmarks we did were not helpful. Even if you get to run your tests on production-like scenarios, on production-like machines and network topology (we didn’t), the actual performance you eventually see in Cassandra is extremely sensitive to server configuration, real-life scenarios, and internal operations (compaction, read-repair, etc.) done by Cassandra. There are benchmarks out there that show Cassandra performs very well in comparison with other leading NoSQL solutions (it outperforms them on writes, is comparable on reads, and scales linearly with the number of nodes). But we suggest taking even these (presumably) more carefully crafted benchmarks with more than a grain of salt.
Lesson #2: NoSQL frameworks are “fast moving targets” that are currently evolving very rapidly, copying features from one another, closing some gaps versus traditional SQL, etc. Since implementing such a migration over existing projects usually takes quite some time overall (over a year in our case), by the time you finish, some of the original considerations are no longer valid. There is not much one can do here, aside from perhaps the obvious advice: take a look at these projects’ road-maps and hope they do reflect their future trajectory.
To rewrite or not to rewrite?
The next issue we faced was just how much rewriting of the existing tracking system we should do. Our existing code was complicated, tightly coupling business logic, database access, configuration and customizations. The temptation to simply “start afresh” with a new system was strong, though that would mean continuous merges of any bug-fixes/new-features from the “MySQL branch” to the “Cassandra branch”. We eventually chose the other path - gradual evolution from one system to the next. This meant first of all adding a lot of missing tests - unit, end-to-end etc. - to add confidence to the change, then abstracting the repository and database access layers, implementing the new Cassandra-facing repository, and finally being able to work in a “composite” mode against the two very different systems - MySQL and Cassandra.
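As a rough illustration of that abstraction step, here is a minimal sketch of the kind of repository seam involved; the interface and class names are illustrative, not our actual code:

import java.util.List;

// A minimal event value object, standing in for whatever the real system carries.
class Event {
    final String userId;
    final String payloadJson;
    Event(String userId, String payloadJson) {
        this.userId = userId;
        this.payloadJson = payloadJson;
    }
}

// The seam: business logic depends only on this interface, so a MySQL-backed,
// a Cassandra-backed, or a composite implementation can be plugged in without
// touching the callers.
interface EventRepository {
    void save(Event event);
    List<Event> findByUser(String userId);
}

The “composite” mode mentioned above is then just another implementation of this interface (sketched further down, in the transition section).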
Lesson #3: although not a universal maxim, it seems that overcoming the temptations of rewrites usually pays off.
Data model
We store most of our tracking data in a single “events” column-family:
- Rows are indexed by user-id.
- Column names represent single events.
- Column values are JSONs, like so:
{
  "event-time": 1367515779,
  "type": "click",
  "device": "mobile",
  ...
}
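For readers who prefer code, here is a minimal sketch of that layout expressed in CQL3 (where the clustering column stands in for the dynamic column name of the Thrift-era wide-row model), using the DataStax Java driver; the keyspace, table and column names are illustrative, not our actual schema:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class EventsSchemaSketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        // One wide row per user: all of a user's events live in a single partition.
        session.execute("CREATE KEYSPACE IF NOT EXISTS tracking WITH replication = "
                + "{'class': 'SimpleStrategy', 'replication_factor': 1}");
        session.execute("CREATE TABLE IF NOT EXISTS tracking.events ("
                + "user_id text, event_id timeuuid, payload text, "
                + "PRIMARY KEY (user_id, event_id))");

        // The column value is the JSON document shown above.
        session.execute("INSERT INTO tracking.events (user_id, event_id, payload) VALUES (?, now(), ?)",
                "user-123",
                "{\"event-time\": 1367515779, \"type\": \"click\", \"device\": \"mobile\"}");

        // The time-critical query: all events for a given user, served from one partition.
        session.execute("SELECT payload FROM tracking.events WHERE user_id = ?", "user-123");

        cluster.close();
    }
}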
Since we occasionally need to look up data that’s not user-specific, we also created an additional “index-lookup” column-family. This data model allows us to serve the most time-critical query - “get all events for a given user” - in the fastest manner. Alas, we did not at first duplicate the data appropriately in the indexes, so each “index-lookup” needed two round-trips to the cluster: 1) find the index entry, then 2) get the additional data needed for the event.
Lesson #4: model your data to best serve your critical lookups, and don’t be shy about duplicating data - round-trips to the cluster are costly.
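To make Lesson #4 concrete, here is a sketch of the denormalized index idea, continuing the illustrative schema above and assuming an already-connected Session named session; the lookup key (a click-id here) is purely an example:

// The index row duplicates the event payload, so a non-user-specific lookup
// no longer needs a second round-trip to fetch the event itself.
session.execute("CREATE TABLE IF NOT EXISTS tracking.events_by_click_id ("
        + "click_id text PRIMARY KEY, user_id text, payload text)");

// Written alongside the main events row at event time (duplicated on purpose).
session.execute(
        "INSERT INTO tracking.events_by_click_id (click_id, user_id, payload) VALUES (?, ?, ?)",
        "click-789", "user-123",
        "{\"event-time\": 1367515779, \"type\": \"click\", \"device\": \"mobile\"}");

// A single round-trip now answers the lookup.
session.execute("SELECT payload FROM tracking.events_by_click_id WHERE click_id = ?", "click-789");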
Lesson #5: JSON is great in terms of flexibility and readability, but there are faster binary serializations out there, and once all other factors have been trimmed, serialization itself can in turn become an issue.
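As a small illustration of the trade-off (the Smile format is just one binary alternative picked for this sketch, not necessarily what we run in production):

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.smile.SmileFactory;

import java.util.HashMap;
import java.util.Map;

public class SerializationSketch {
    public static void main(String[] args) throws Exception {
        Map<String, Object> event = new HashMap<>();
        event.put("event-time", 1367515779);
        event.put("type", "click");
        event.put("device", "mobile");

        // Same object, two wire formats: human-readable JSON vs. Jackson's
        // binary Smile encoding, which is typically smaller and faster to parse.
        byte[] json = new ObjectMapper().writeValueAsBytes(event);
        byte[] smile = new ObjectMapper(new SmileFactory()).writeValueAsBytes(event);

        System.out.printf("json: %d bytes, smile: %d bytes%n", json.length, smile.length);
    }
}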
Consistency
Cassandra is an eventual-consistency system. We ran into consistency issues in two cases: 1) nodes going down; and 2) read-before-write race conditions (which is actually an “anti-pattern”). The first issue was solved by scheduling read-repairs at reasonable intervals, so Cassandra could overcome the inconsistencies itself. The second needed changes in the application workflow: locally caching results and making sure no read-before-write race conditions occur.
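A minimal sketch of that second fix - a short-lived local cache so the application never reads back from the cluster what it has just written; the class and method names are illustrative:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WriteThroughCacheSketch {
    // Remembers recently written payloads so reads don't race the cluster.
    private final Map<String, String> recentlyWritten = new ConcurrentHashMap<>();

    public void saveEvent(String eventId, String payloadJson) {
        // Write to Cassandra here (omitted), then remember the value locally.
        recentlyWritten.put(eventId, payloadJson);
    }

    public String getEvent(String eventId) {
        // Serve from the local cache when possible; fall back to a cluster
        // read (which may lag behind the write) only when we must.
        String cached = recentlyWritten.get(eventId);
        return cached != null ? cached : readFromCassandra(eventId);
    }

    private String readFromCassandra(String eventId) {
        return null; // placeholder for the actual cluster read
    }
}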
Lesson #6: consistency is obviously something you need to think about in these types of systems. Anti-patterns are generally best avoided.
Erasing data
We have two basic scenarios where we need to erase data: cleaning up data that’s too old, and correcting erroneous data. We started out by implementing an erasing process over Cassandra, as we did with MySQL. This turned out to be not only ineffective but actually completely unneeded: old data can be cleared very easily using Cassandra’s built-in time-to-live feature, and instead of deleting erroneous data we now keep all events with an “event version”, allowing us to use only the latest.
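A sketch of both techniques, again with an illustrative CQL schema and an already-connected Session (the 90-day TTL is an example value, not our actual retention policy):

// Versions of the same event share a partition; the newest version sorts first.
session.execute("CREATE TABLE IF NOT EXISTS tracking.versioned_events ("
        + "event_id text, event_version timeuuid, payload text, "
        + "PRIMARY KEY (event_id, event_version)) "
        + "WITH CLUSTERING ORDER BY (event_version DESC)");

// Old data clears itself: this row expires after ~90 days, no delete process needed.
session.execute(
        "INSERT INTO tracking.versioned_events (event_id, event_version, payload) "
                + "VALUES (?, now(), ?) USING TTL 7776000",
        "event-42", "{\"type\": \"click\", \"device\": \"mobile\"}");

// "Correcting" data means writing a newer version; reads just take the latest one.
session.execute(
        "SELECT payload FROM tracking.versioned_events WHERE event_id = ? LIMIT 1",
        "event-42");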
Lesson #7: since tracking data is a type of CRAP, we gave up on the update and delete parts of CRUD, and stuck with creating and reading only.
Current installation specs and numbers
- Dell R720 servers
- 2 x CPU sockets, 24 cores in total (12 physical cores plus hyper-threading)
- 32GB of RAM
- 6 x 600GB 10K SAS drives
- 2 x 2TB SATA drives (for backup)
- 2 x 1Gb/s NICs (for redundancy)
- Currently about 10TB of data in the cluster
- Average write latency: 0.8 ms/op (min: 0.5 ms/op, max: 2.5 ms/op)
- Average read latency: 23 ms/op (min: 11.7 ms/op, max: 40 ms/op)
Development and testing environment
Although setting up a local cluster for development and testing is relatively easy, it nevertheless has some pitfalls: how do you clear up data between tests? How do you allow developers to work without Cassandra at all? What about QA and staging? Currently, our tests run on both an embedded server and a dev cluster; there is a separate cluster for staging; and the application can be configured to load without Cassandra at all.
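One way (among several) to get the embedded-server part of that setup is the cassandra-unit library; the following is a sketch of the approach, not a copy of our actual test harness:

import org.cassandraunit.utils.EmbeddedCassandraServerHelper;
import org.junit.AfterClass;
import org.junit.BeforeClass;

public abstract class EmbeddedCassandraTestBase {

    @BeforeClass
    public static void startEmbeddedCassandra() throws Exception {
        // Boots an in-process Cassandra instance for the test JVM.
        EmbeddedCassandraServerHelper.startEmbeddedCassandra();
    }

    @AfterClass
    public static void cleanUp() {
        // Drops the test data, answering the "how do you clear up data
        // between tests?" question for the embedded case.
        EmbeddedCassandraServerHelper.cleanEmbeddedCassandra();
    }
}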
Lesson #8: as always, dev-ops work is tedious and time-consuming, and must be accounted for in any effort estimation.
Transitioning to Cassandra
For risk-management reasons, we didn’t want to transition our tracking application from MySQL to Cassandra all at once. The path chosen was to:
- have the data written both to Cassandra and to MySQL but read only from MySQL, just to see how the cluster performs (this is the “composite” mode sketched after this list); then,
- gradually move application servers to read from Cassandra, all the while continuing to write to MySQL in case we needed to backtrack (which indeed we did, more than once); and finally,
- stop using MySQL for tracking.
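Here is what that composite, dual-write mode could look like against the EventRepository interface sketched earlier; the class name and the single boolean toggle are illustrative simplifications of our actual configuration:

import java.util.List;

public class CompositeEventRepository implements EventRepository {

    private final EventRepository mysql;
    private final EventRepository cassandra;
    private final boolean readFromCassandra; // flipped gradually, server by server

    public CompositeEventRepository(EventRepository mysql, EventRepository cassandra,
                                    boolean readFromCassandra) {
        this.mysql = mysql;
        this.cassandra = cassandra;
        this.readFromCassandra = readFromCassandra;
    }

    @Override
    public void save(Event event) {
        // Dual writes keep MySQL around as a safety net for as long as needed.
        cassandra.save(event);
        mysql.save(event);
    }

    @Override
    public List<Event> findByUser(String userId) {
        // A configuration flag decides which store serves reads, so switching
        // back to MySQL is a config change rather than a rollback.
        return readFromCassandra ? cassandra.findByUser(userId) : mysql.findByUser(userId);
    }
}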
We obviously also had to migrate old data from MySQL to Cassandra - we wrote a dedicated process for this. The only down-side of this play-it-safe approach is that being able to easily “switch back to MySQL” can create a negative incentive to move forward in the face of obstacles.
Lesson #9: keep the old data around - you’ll probably need it.