Thursday, February 5, 2015

Writing Pluggable code using IoC and Lists

One of the main challenges we software developers face is the need to write code that can be easily reused and later extended to meet new product requirements. In this post I will present a way of doing that using lists of injected values.

In order to understand this post you need a basic understanding of IoC in general (wiki link), and since this example uses the Spring Framework, you should have a fair understanding of that as well.
I'll be using Spring's @Autowired to inject dependencies, but this works just as well with @Inject, @Resource, etc.

As a first example, assume we have an interface called Validator and two classes that implement it: MaxLengthValidator and MinLengthValidator. Any class that contains an @Autowired list of Validators will have both implementations injected into that list. If we create new implementations of Validator, the injected list will grow automatically.
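A minimal sketch of the idea (the class names follow the post; method names and validation rules are my own illustration, and each type would live in its own file):

    import java.util.List;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.stereotype.Component;
    import org.springframework.stereotype.Service;

    public interface Validator {
        boolean isValid(String input);
    }

    @Component
    public class MaxLengthValidator implements Validator {
        public boolean isValid(String input) {
            return input.length() <= 255; // hypothetical limit
        }
    }

    @Component
    public class MinLengthValidator implements Validator {
        public boolean isValid(String input) {
            return input.length() >= 3; // hypothetical limit
        }
    }

    @Service
    public class ValidatorService {
        // Spring injects every Validator bean found in the context
        @Autowired
        private List<Validator> validators;

        public boolean validate(String input) {
            for (Validator validator : validators) {
                if (!validator.isValid(input)) {
                    return false;
                }
            }
            return true;
        }
    }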

This simple pattern can be used to effectively solve a variety of problems. One such example is implementing the Chain of Responsibility design pattern (a good article about it can be found here: http://rdafbn.blogspot.ie/2012/11/chain-of-responsibility-using-spring.html)

That's the basic use pattern of @Autowired lists.
Now let's see how this can allow us to easily extend our code.

We do that by first choosing the right module architecture. Looking at the example above, one appropriate module structure might be:
  • (Module) Validator-API (has a run-time dependency on Validator-Impl)
    • Validator (interface)
    • ValidatorService (class)
  • (Module) Validator-Impl (has a compile-time dependency on Validator-API)
    • Implementations of Validator (classes)

Notice the ValidatorService (which is part of Validator-API) is only run-time dependent on Validator-Impl. This means that at compile time it only needs to know about the Validator interface, but not about any of its implementations - the latter are indeed needed only at run-time. This modeling allows us to add as many Validators as we'd like without changing anything in the ValidatorService.

What if we want to control the order in which the various Validators appear in the list? By default, items in @Autowired lists are added alphabetically. In order to explicitly state a different ordering, we can add the @Order class annotation (or implement the Ordered interface), as in the sketch below.
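Lower @Order values are placed earlier in the injected list; a sketch reusing the validators from above:

    import org.springframework.core.annotation.Order;
    import org.springframework.stereotype.Component;

    @Order(1)
    @Component
    public class MinLengthValidator implements Validator {
        public boolean isValid(String input) { return input.length() >= 3; }
    }

    @Order(2)
    @Component
    public class MaxLengthValidator implements Validator {
        public boolean isValid(String input) { return input.length() <= 255; }
    }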


We can now try to create a full-fledged service that validates entities, does some calculations on them and finally saves them. We'll want to write a generic service - one that will do this for any Entity.

We will start with the API and Impl module structure above, and create an EntityType enum and an Entity POJO:
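Something like this (a sketch - CAMPAIGN comes from the example further below; the other entity types are placeholders):

    public enum EntityType {
        CAMPAIGN, KEYWORD, AD
    }

    // a simple POJO
    public class Entity {
        private Long id;
        private EntityType type;

        public Long getId() { return id; }
        public void setId(Long id) { this.id = id; }
        public EntityType getType() { return type; }
        public void setType(EntityType type) { this.type = type; }
    }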


Then we can create the validation, calculation and save interfaces:
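A sketch of the three interfaces (names are illustrative); each implementation declares which EntityType it handles, so the service can route to it:

    public interface EntityValidator {
        EntityType getType();
        boolean validate(Entity entity);
    }

    public interface EntityCalculator {
        EntityType getType();
        void calculate(Entity entity);
    }

    public interface EntitySaver {
        EntityType getType();
        void save(Entity entity);
    }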



Finally, we can create the EntityService interface that ties them all together:
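For example (the doTheStuff name is the one referenced later in the post):

    public interface EntityService {
        // validate the entity, run the calculations on it, then save it
        void doTheStuff(Entity entity);
    }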



That's it - we’ve finished the API, and can now proceed with the implementation layer. 
Let's say we want to add a Campaign entity:
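A sketch of the Campaign implementations (the bodies are placeholders):

    import org.springframework.stereotype.Component;

    @Component
    public class CampaignValidator implements EntityValidator {
        public EntityType getType() { return EntityType.CAMPAIGN; }
        public boolean validate(Entity entity) {
            return entity.getId() != null; // hypothetical rule
        }
    }

    @Component
    public class CampaignCalculator implements EntityCalculator {
        public EntityType getType() { return EntityType.CAMPAIGN; }
        public void calculate(Entity entity) {
            // campaign-specific calculations go here
        }
    }

    @Component
    public class CampaignSaver implements EntitySaver {
        public EntityType getType() { return EntityType.CAMPAIGN; }
        public void save(Entity entity) {
            // persist the campaign
        }
    }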



Next we need to create an EntityServiceImpl:
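A sketch of what it might look like - the injected lists are indexed into maps once the bean is wired (see the explanation below):

    import java.util.EnumMap;
    import java.util.List;
    import java.util.Map;
    import javax.annotation.PostConstruct;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.stereotype.Service;

    @Service
    public class EntityServiceImpl implements EntityService {

        @Autowired
        private List<EntityValidator> validators;
        @Autowired
        private List<EntityCalculator> calculators;
        @Autowired
        private List<EntitySaver> savers;

        // built once after injection, for quick lookup by entity type
        private final Map<EntityType, EntityValidator> validatorByType = new EnumMap<>(EntityType.class);
        private final Map<EntityType, EntityCalculator> calculatorByType = new EnumMap<>(EntityType.class);
        private final Map<EntityType, EntitySaver> saverByType = new EnumMap<>(EntityType.class);

        @PostConstruct
        public void buildMaps() {
            for (EntityValidator v : validators) { validatorByType.put(v.getType(), v); }
            for (EntityCalculator c : calculators) { calculatorByType.put(c.getType(), c); }
            for (EntitySaver s : savers) { saverByType.put(s.getType(), s); }
        }

        @Override
        public void doTheStuff(Entity entity) {
            EntityType type = entity.getType();
            if (validatorByType.get(type).validate(entity)) {
                calculatorByType.get(type).calculate(entity);
                saverByType.get(type).save(entity);
            }
        }
    }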



Now each time the doTheStuff method is called, it gets the stuff done using the entity implementation provided.
Notice I've added a map to hold each entity implementation for quick access. We construct these maps using the @PostConstruct annotation, which is triggered right after the bean has been created and its members autowired, including the aforementioned lists. So the lists can be transformed into maps for quicker access.

This type of structure allows us to add as many entities as we'd like, and our EntityService will handle them automatically. This applies not only to our own implementations - any external JAR containing classes that implement our interfaces will have them added to the injected list, thanks to the run-time dependency.

Speaking of external JARs, we might want to allow others - either inside or outside of our organization - to implement our interfaces more easily. So let's create an "interface of interfaces" that, once implemented, can be integrated into our system. To achieve this, we can create an EntityServiceProvider that holds the various interface implementations for any EntityType. We then modify our EntityService to use it:
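A sketch of how that might look (imports as in the previous snippet; names are again illustrative):

    // one interface bundling all per-entity behavior; implementing it is all
    // an external JAR needs to do in order to plug a new entity into the system
    public interface EntityServiceProvider {
        EntityType getType();
        boolean validate(Entity entity);
        void calculate(Entity entity);
        void save(Entity entity);
    }

    @Service
    public class EntityServiceImpl implements EntityService {

        @Autowired
        private List<EntityServiceProvider> providers;

        private final Map<EntityType, EntityServiceProvider> providerByType = new EnumMap<>(EntityType.class);

        @PostConstruct
        public void buildMap() {
            for (EntityServiceProvider provider : providers) {
                providerByType.put(provider.getType(), provider);
            }
        }

        @Override
        public void doTheStuff(Entity entity) {
            EntityServiceProvider provider = providerByType.get(entity.getType());
            if (provider.validate(entity)) {
                provider.calculate(entity);
                provider.save(entity);
            }
        }
    }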




To wrap things up:
What we've shown is a pattern for creating flexible ("pluggable") code which is both service-oriented and testable. Each component is independent and can thus be tested separately: we can write separate unit tests for each component's logic, and a behavioral test for the wrapping service (EntityService) will provide good code coverage of the overall functionality.

Here at Kenshoo our code needs to expand regularly, for example to support new channels and new channel features. Hence we must have solid structures that enable new code to be easily integrated with the existing code in our system - and we use this pattern extensively.

Tuesday, February 3, 2015

Validation of RESTful (Swagger) Documentation

Documenting an API is probably the most important part after creating it. After all, without good documentation an API is useless. At Kenshoo we've chosen Swagger 2 to document our RESTful APIs.

About Swagger

The goal of Swagger™ is to define a standard, language-agnostic interface to REST APIs which allows both humans and computers to discover and understand the capabilities of a service without access to source code, additional documentation, or network traffic inspection. When properly defined via Swagger, a consumer can understand and interact with the remote service with a minimal amount of implementation logic. Similar to what interfaces have done for lower-level programming, Swagger removes the guesswork in calling the service.
Swagger Spec defines the specification.
Swagger Editor provides a convenient web editor to edit and preview the documentation, and also some convenient examples to get started.
Swagger UI visualizes the documentation as a human readable web page.

With Swagger 2 the API descriptor is kept in a single yaml file:
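A minimal, illustrative descriptor (the paths and names here are my own, not from our actual APIs):

    swagger: '2.0'
    info:
      title: Entity API
      version: 1.0.0
    basePath: /api
    paths:
      /campaigns:
        get:
          summary: List all campaigns
          responses:
            '200':
              description: OK
              schema:
                type: array
                items:
                  $ref: '#/definitions/Campaign'
    definitions:
      Campaign:
        properties:
          id:
            type: integer
          name:
            type: string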






But how can we validate that our documentation is accurate and matches the actual code?
The question becomes even more important as time passes and code evolves. 
Do resources still exist? Were paths changed? Maybe properties were renamed?


One way of solving these issues would be to regenerate the documentation from the code. But as we strongly believe in a "design first" approach - i.e. that documentation should be created before the code - we looked for a different path, one that would allow us to write the documentation first and then verify that the code matches our design.

Swagger-validator is a new component we're open sourcing at Kenshoo, that aims to help verify that the documentation of an API matches the actual code.

In order to use it, all the developer needs to do is map the paths and definitions of the yaml file to the Java classes. 
For example:
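Continuing the illustrative yaml from above (the class names here are hypothetical):

    paths:
      /campaigns:
        # fully qualified name of the resource class implementing this path
        x-javaClass: com.kenshoo.api.CampaignResource
        get:
          summary: List all campaigns
          responses:
            '200':
              description: OK
    definitions:
      Campaign:
        # fully qualified name of the POJO backing this definition
        x-javaClass: com.kenshoo.api.Campaign
        properties:
          id:
            type: integer
          name:
            type: string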



As you can see, under each path (and definition) a new custom element, x-javaClass, was added; it contains the fully qualified Java class name.
And that's all!
The validator can now run the necessary checks, throwing a meaningful exception if anything fails.

The validator can be activated as any other JVM process (be it Java, Scala, or Groovy), but the easiest way to run it is probably via a simple unit test:
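Along these lines - a sketch of the idea only; the validator class and method below are placeholders, so check the project's README for the actual entry point:

    import org.junit.Test;

    public class ApiDocumentationTest {

        @Test
        public void documentationMatchesCode() throws Exception {
            // point the validator at the yaml descriptor; it throws a
            // meaningful exception if any path or definition no longer
            // matches the mapped Java classes
            SwaggerValidator.validate("api.yaml"); // hypothetical API
        }
    }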


So what's validated?
  • Resources are validated against their paths, using the @Path annotations on their operations (GET, POST, etc.).
  • Objects (definitions) are validated against the properties they carry - matching properties to object fields.

The current implementation uses Java 7. Resources are validated via their JAX-RS annotations, and definitions are validated as POJOs.

Friday, January 23, 2015

Template Engines in Kenshoo

Recently we have begun re-engineering our RealTime Campaigns solution.
RealTime Campaigns (RTC) is an automated way for marketers to sync their product inventory with their online advertising. It works by taking a feed and applying rules to create ads, keywords and any other search-engine structures required.


For example:
Assume you have a feed of products, where each product has a list of properties (e.g. short description, description, price, color, brand). The tool allows you to create keywords using the template language: an advertiser uses the template engine to build the keywords, ad text, headlines and URL from the list of properties.
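To make this concrete, here is a small FreeMarker sketch (the property names and values are invented) showing a keyword template applied to one feed item:

    import java.io.StringReader;
    import java.io.StringWriter;
    import java.util.HashMap;
    import java.util.Map;
    import freemarker.template.Configuration;
    import freemarker.template.Template;

    public class KeywordTemplateDemo {
        public static void main(String[] args) throws Exception {
            Configuration cfg = new Configuration(Configuration.VERSION_2_3_21);

            // a template an advertiser might define over the feed properties
            Template template = new Template("keyword",
                    new StringReader("${brand} ${color} ${shortDescription} - only ${price}"),
                    cfg);

            // one feed item
            Map<String, Object> product = new HashMap<String, Object>();
            product.put("brand", "Acme");
            product.put("color", "red");
            product.put("shortDescription", "running shoes");
            product.put("price", "$49.99");

            StringWriter out = new StringWriter();
            template.process(product, out);
            System.out.println(out); // Acme red running shoes - only $49.99
        }
    }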


When designing the RTC engine we first needed to choose between two classes of engines:
1. Logic-less template engines (e.g. Mustache)
2. Logic-enabled template engines (e.g. FreeMarker, Velocity)


After evaluating both options against the business requirements, we decided to go with option 2, which allows for creating templates whose logic is based on the content of the feed items.


Then came the fun part: evaluating which engine we should work with.
We chose FreeMarker as it is the most mature, has a great set of string-manipulation functions, and is commonly used with other open-source platforms we use in-house (e.g. Spring and DropWizard view templating).


We rolled out the new feature and it was well received in the field, but there was one piece missing... users wanted to test their templates. You would think that a tool for experimenting with templates already exists in the wild, but after searching for an online simulator for FreeMarker we found none. We therefore decided to write one - you can find our new simulator at freemarker-online.kenshoo.com. If you are interested in the code behind it or want to contribute to it, we have open-sourced it and the code can be found here.
Enjoy!

Wednesday, November 12, 2014

Micro Services Hitting Production Environment / Micro Services & Shared Resources

In late 2013, we began to read more and more about a new development approach - micro services.


After watching James Lewis's lecture, we realized this is exactly what we need.


We had felt, for quite a long time, that our application was getting bigger and bigger, and that it was becoming more difficult to keep up the high velocity of our development cycle.


We have a release cycle of 2 weeks, so within 2 weeks, our development/QA teams must develop the new features, debug the new functionality and make sure nothing is broken.
As our application became VERY big, the latter task became harder and harder.


We realized that micro services were what we needed, and we decided - as the best practices suggest - to start developing new features as micro services alongside the existing (big) application.


Our application is written in Java, using various Spring frameworks for both REST and offline batch processing (including Spring MVC, Spring Batch, Spring Security and much more).


The application is deployed in a clustered, scalable environment running on Tomcat web servers.


As we are extensively using Spring, it was only natural to choose Spring Boot as our micro services "launcher". Using a predefined Spring Boot template (archetype), we enabled quick creation of new micro services, so that developers can focus on the business logic and not on wiring a new deployable project.


Since a micro service, by definition, has its own repository, build, Spring context, internal logic and RESTful API, we could build each service as a WAR file and deploy it on our Tomcat servers. In other words, each Tomcat server deploys multiple WAR files, and each WAR file is a standalone micro service.
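Each such WAR boots through the standard Spring Boot servlet initializer, along these lines (the class name is illustrative; this is the Boot 1.x API):

    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.EnableAutoConfiguration;
    import org.springframework.boot.builder.SpringApplicationBuilder;
    import org.springframework.boot.context.web.SpringBootServletInitializer;
    import org.springframework.context.annotation.ComponentScan;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    @EnableAutoConfiguration
    @ComponentScan
    public class MyMicroServiceApplication extends SpringBootServletInitializer {

        // used when the WAR is deployed to an external Tomcat
        @Override
        protected SpringApplicationBuilder configure(SpringApplicationBuilder builder) {
            return builder.sources(MyMicroServiceApplication.class);
        }

        // allows running the service standalone during development
        public static void main(String[] args) {
            SpringApplication.run(MyMicroServiceApplication.class, args);
        }
    }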


Deploying each micro service in its own Tomcat/machine was a less-preferred option, because it would complicate our deployment and scaling logic.


The WAR prototype Maven build included:
   <dependencies>
       <dependency>
           <groupId>org.springframework.boot</groupId>
           <artifactId>spring-boot-starter-web</artifactId>
           <exclusions>
               <exclusion>
                   <groupId>org.springframework.boot</groupId>
                   <artifactId>spring-boot-starter-tomcat</artifactId>
               </exclusion>
           </exclusions>
       </dependency>
       <dependency>
           <groupId>org.springframework.boot</groupId>
           <artifactId>spring-boot-starter-security</artifactId>
       </dependency>
      ….
   </dependencies>


As expected, the development and QA cycles were dramatically improved. Now we only had to test the logic of a single, small application and its API with the 'big' application.


And we were happy & satisfied with our decision...


Two weeks later, the new micro services went to the production environment and then we started to see some issues we didn't anticipate at first.


The micro services approach - particularly when each micro service is built as a standalone deployable WAR file - is great for development and testing. But in our production environment all the micro services were deployed on the same scalable Tomcat servers, and that was the problem.


Resource allocation:
Each service in our system (whether it is a micro service or just a piece of code in the application) usually needs resources:
* Database connections
* Threads from a thread pool


When you have a single application, you can make some rough assumptions about the load and capacity of each server and, based on that, pre-allocate thread pools and database connection pools.


When deploying multiple (tens of...) WAR files in a single JVM (Tomcat), it is very hard to make those assumptions. In some use cases, all the resources of a single machine should be allocated to a single service; in other use cases, the resources should be spread between several services.


If you allocate each WAR file / micro service resources for the "worst case scenario", you'll exhaust your external resources (a database has a limited number of connections...).
If you under-allocate the resources per service, you may not be able to serve requests in certain scenarios.


Application boot time:
As we are using a scalable environment, we allocate more servers based on the load of requests.
In this case, it is critical that the new servers will be available to serve requests ASAP.
When you deploy multiple WAR files in a Tomcat server, each with its own Spring context that needs to initialize, Tomcat deploys the WAR files one by one; each service creates a new application context (which again takes time) and starts its own services.
In fact, the time between Tomcat start and application availability increased almost tenfold with this approach.


So what do we do?
Deploying each micro service in a dedicated Tomcat would solve the thread pool issue but not the database connection pool issue (and would dramatically complicate the deployment procedure).


Of course, we didn't want to ditch the micro services approach and move back, so we decided to use micro services with a different approach:


Each micro service is built as a JAR file, NOT a WAR file.
External resources, such as thread pools and database connections, are autowired and injected by the context (which is NOT part of the micro service).
We used Spring Boot (again) for this approach, and the configuration was:
   <dependencies>
       <dependency>
           <groupId>org.springframework.boot</groupId>
           <artifactId>spring-boot-starter-batch</artifactId>
           <exclusions>
               <exclusion>
                   <groupId>org.springframework.boot</groupId>
                   <artifactId>spring-boot-starter-logging</artifactId>
               </exclusion>
               <exclusion>
                   <groupId>org.hsqldb</groupId>
                   <artifactId>hsqldb</artifactId>
               </exclusion>
           </exclusions>
       </dependency>
       <dependency>
           <groupId>org.springframework</groupId>
           <artifactId>spring-jdbc</artifactId>
           <version>3.1.0.RELEASE</version>
       </dependency>
       <dependency>
           <groupId>org.springframework.boot</groupId>
           <artifactId>spring-boot-starter-jdbc</artifactId>
       </dependency>
     ….
   </dependencies>


We created a single WAR project that serves as the container of all the micro services (again using Spring Boot).
This new WAR holds all the micro services as dependencies, so once it has been deployed, it injects the thread pools, database connection pools and all other shared resources into the various JARs / micro services, along the lines of the sketch below.
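A sketch of the container's shared-resources configuration (the pool sizes, driver and URL are illustrative):

    import javax.sql.DataSource;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

    @Configuration
    public class SharedResourcesConfiguration {

        // a single connection pool, sized once for the whole JVM and
        // autowired into every micro service JAR
        @Bean
        public DataSource dataSource() {
            org.apache.tomcat.jdbc.pool.DataSource ds = new org.apache.tomcat.jdbc.pool.DataSource();
            ds.setDriverClassName("com.mysql.jdbc.Driver");
            ds.setUrl("jdbc:mysql://db-host/app");
            ds.setMaxActive(100);
            return ds;
        }

        // a single shared thread pool
        @Bean
        public ThreadPoolTaskExecutor taskExecutor() {
            ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
            executor.setCorePoolSize(20);
            executor.setMaxPoolSize(50);
            return executor;
        }
    }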


So in this approach, we have:


1) A single allocation of resources that is used among ALL micro services.
2) A single Spring application context that is initialized at boot time - hence system boot time improved dramatically.


To Illustrate:


When each micro service had its own WAR container:


After migrating the WAR projects to JAR projects and creating a shared resources WAR container:


Roy Udassin

Friday, June 27, 2014

Making integration tests run faster

The problem - slow tests


Our integration tests, using both JUnit and Cucumber, are at least two orders of magnitude slower than our unit tests, which is to be expected. But we noticed that, over time, our integration tests were just getting slower.

Sidestepping for the moment the debate of if and when integration tests are appropriate (some actually call any integration testing "a scam"), the simple fact of the matter is: we have them, many of them. Whether it's BDD we'd like to support, end-to-end tests, strict integrations with other frameworks (web services, message queues, persistence layers etc.) or those murky integration tests put in place to circumvent very non-test-friendly legacy code - they all need to be supported.

So, what can be done?

Profiling our test suites, we found - unsurprisingly - that loading the Spring context was the number one hot-spot, both when running Cucumber and when using JUnit. And it was the growing size of the context that made single-test runs slower over time - starting up the context simply took more and more time. For our context loading, we found Spring took about 20 seconds to package-scan our annotated beans and another 5 minutes to actually load the beans, doing the needed wiring and initializations.
Start-up time isn't a big issue for our Jenkins builds - they reuse the same context between tests, so the load is done only once - but it was a big issue for developers. Waiting over five minutes just for a test to start meant developers simply weren't running them locally.

Making Spring context load faster

The first step we took was to try and load the Spring beans lazily.
The easiest way to do this is declaratively, via the XML files (an attribute of either the <bean> or <beans> tags), as shown below. Alas, for us this approach was not sufficient, as some of these XMLs are imported from other jars, and so we had difficulty controlling them: the thing with lazy initialization is that it needs to be done "all the way down" - if some bean isn't lazy, it will force all the beans it depends on to be loaded eagerly as well. So misbehaving beans imported from other jars hampered our cause.
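For reference, the declarative form (standard Spring XML; the bean name is a placeholder):

    <!-- per bean -->
    <bean id="someBean" class="com.example.SomeBean" lazy-init="true"/>

    <!-- or for a whole file -->
    <beans xmlns="http://www.springframework.org/schema/beans"
           default-lazy-init="true">
        ...
    </beans>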
What eventually worked for us was to implement our own custom context loader for tests - specifically, our own version of Spring's SmartContextLoader. This loader changes the bean definitions to lazy during context start, making sure all beans are indeed defined as lazy (code below).
Two caveats are called for though:
  1. In production, we usually want to have our context loaded eagerly (that's Spring's default), because we want to fail fast if it's broken. Having it load lazily in tests means your load sequence is different from production's. If your beans do any non-trivial stuff inside their initialization (they really shouldn't - but we can't always have it our way), be aware of this difference.
  2. It seems there are certain types of beans that simply can't be set as lazy without breaking the context loading, so these need to either be filtered out from the context loading (if possible) or kept unchanged.

Another important lesson we learned was to be cautious with Spring Batch. Spring Batch jobs are nice to have, but they come with a price: they put a very large burden on the context, creating lots of AOP proxy beans, and these can't be loaded lazily. If you define such jobs, take extra care that they are defined in separate XMLs, added to the context only when really needed.

Componentization is key

The more profound steps we've embarked upon were to improve our application's componentization, at two levels: 1) breaking our main applications (AKA mother-ship / monolith) into a set of smaller services; and 2) better defining the internal component structure of our main applications.

Well defined components allow for:
  1. Smaller, isolated and self-sufficient contexts that load quickly. Tests can then load only the minimal contexts needed to run the tests.
  2. Stabler code: Spring contexts with many direct and transitive dependencies easily break due to "far-away" changes made by distant team-members working on some seemingly completely unrelated feature.
  3. Most importantly: clear and well defined components help one understand what their code is doing.

Neglecting to pay attention to one's higher levels of componentization tends to lead, over time, to applications where everything is connected to everything else - a situation also known as a big ball of mud. Unfortunately, improving an application's level of componentization is difficult - it requires much more skill, and work, than refactoring single classes. In Uncle Bob's excellent Object Oriented Principles one can find six different principles to adhere to in order to reach this super-important goal. One of the nice "side effects" of this effort is, well, faster integration tests.

Our custom context loader:
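A minimal sketch of such a loader (illustrative, not the verbatim original): it extends Spring's GenericXmlContextLoader (a SmartContextLoader) and flips every bean definition to lazy just before the context is refreshed:

    import org.springframework.beans.factory.config.BeanFactoryPostProcessor;
    import org.springframework.beans.factory.config.ConfigurableListableBeanFactory;
    import org.springframework.context.support.GenericApplicationContext;
    import org.springframework.test.context.support.GenericXmlContextLoader;

    public class LazyContextLoader extends GenericXmlContextLoader {

        @Override
        protected void customizeContext(GenericApplicationContext context) {
            // runs after bean definitions are loaded, before any bean is instantiated
            context.addBeanFactoryPostProcessor(new BeanFactoryPostProcessor() {
                @Override
                public void postProcessBeanFactory(ConfigurableListableBeanFactory beanFactory) {
                    for (String name : beanFactory.getBeanDefinitionNames()) {
                        // per caveat 2 above: filter out bean types that break when lazy
                        beanFactory.getBeanDefinition(name).setLazyInit(true);
                    }
                }
            });
        }
    }

Tests then opt in via the loader attribute of @ContextConfiguration (the locations value is a placeholder):

    @ContextConfiguration(locations = "classpath:test-context.xml", loader = LazyContextLoader.class)
    public class MyIntegrationTest { ... }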