Lightweight Tests for your Nginx API Gateway

An API Gateway is a design pattern often used in microservice architectures to provide a single access point to the underlying system. While building the Meltwater API, we have used this pattern frequently.

In this article we explain why we have found it crucial to have meaningful tests for our API gateway. We also show you our test setup, and how you can use this approach yourself. You should continue reading if you need a simple yet effective way to verify the basic correctness of your API gateway.

Key Component

In the Meltwater API team, we use our API gateway for two purposes. Firstly, it encapsulates the internal system architecture, so that changes in underlying components do not affect the public API. Examples of such changes are adding new services or moving functionality between existing services. Secondly, the API gateway allows us to reduce the user-facing interface and simplify its documentation by providing a tailored public API.

In terms of technology, we chose Nginx, a high performance reverse proxy. After a couple of months of running it in production and adding more and more functionality, we ran into issues related to overall testability.

Fig. 1. Nginx gateway in front of some microservices, composing them into a number of user facing APIs

Verifying Correctness Is Tricky

Before we introduced tests, we had around 40 Nginx location directives, 8 underlying microservices (composing 3 public API products) and no automated tests. This level of complexity slowed down our development and release process. Every change had to be verified manually in either a local or a separate test environment. The biggest hassle was verifying the correctness of the regular expressions in the Nginx location directives and ensuring that existing API routes were not affected.

location /path1 {
  set $backend_upstream "https://service1.myapi.com/v1/path1";
  proxy_pass $backend_upstream;
}

location ~ ^/path1/(?<id>.*)$ {
  set $backend_upstream "https://service2.myapi.com/v1/path1/";
  proxy_pass $backend_upstream$id;
}

location ~ ^/path2/(?<id>.*)$ {
  set $backend_upstream "https://service3.myapi.com/v1/path2/";
  proxy_pass $backend_upstream$id;
}

Fig. 2. Sample configuration. Unless your regex skills are strong, it is not always clear which paths this will match

Common issues we were facing included:

  • A regular expression that is too broad in a matching location directive
  • Forgetting to pass dynamic path elements and query parameters on to the target URL
  • Wrong ordering of location directives
  • Forgetting to anchor the beginning (^) and end ($) of the string in the regular expression
  • A configuration that is hard to understand at a glance

For example, if we mistakenly defined the first location as a regex match (~ ^/path1) instead of a prefix match (/path1), the behaviour would change drastically. Nginx evaluates regex locations in configuration order and the first match wins, so requests to /path1/id would get proxied to https://service1.myapi.com/v1/path1 instead of https://service2.myapi.com/v1/path1/id.
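To make the ordering rule concrete, here is a minimal Ruby sketch of how Nginx chooses between prefix and regex locations. It is a deliberately simplified model and not Nginx's implementation: the `Location` struct and the `pick_location` helper are hypothetical names, the `=`, `^~` and `~*` modifiers are left out, and Ruby's regex engine stands in for PCRE.

```ruby
# Simplified model of Nginx location selection (illustrative only).
# kind is :prefix or :regex; upstream is just a label for the target service.
Location = Struct.new(:kind, :pattern, :upstream)

def pick_location(locations, path)
  # Regex locations are tried in configuration order; the first match wins.
  regex_hit = locations.find do |loc|
    loc.kind == :regex && Regexp.new(loc.pattern).match?(path)
  end
  return regex_hit if regex_hit

  # If no regex matches, the longest matching prefix location is used.
  locations
    .select { |loc| loc.kind == :prefix && path.start_with?(loc.pattern) }
    .max_by { |loc| loc.pattern.length }
end

# The sample configuration from Fig. 2.
CORRECT = [
  Location.new(:prefix, "/path1", "service1"),
  Location.new(:regex, "^/path1/(?<id>.*)$", "service2"),
  Location.new(:regex, "^/path2/(?<id>.*)$", "service3")
].freeze

# The mistake: the first location is written as a regex instead of a prefix.
BROKEN = ([Location.new(:regex, "^/path1", "service1")] + CORRECT.drop(1)).freeze

puts pick_location(CORRECT, "/path1/id").upstream # service2
puts pick_location(BROKEN, "/path1/id").upstream  # service1
```

The only difference between the two tables is the kind of the first entry, yet /path1/id ends up at a different service, which is the sort of regression a manual check of the new route alone would miss.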

Initially, we verified such configuration changes through manual tests during development. This process was prone to human error and did not give us enough confidence that we had not introduced a regression. Our manual tests often covered only a fraction of the functionality, usually the path or location that had been added or modified. Adding a new path with a wrong regex or in the wrong position could affect existing routes, as shown above.

Finally, this type of testing was slow, and since the gateway is the single access point to the API, it was also a bottleneck in our development and release process. Changes needed to be deployed fast and with a high degree of certainty. We did not want to waste time on manual smoke tests and potential rollbacks.

We decided to introduce automated tests. Having a couple of solutions on the table, we picked an approach where we run integration tests against mocked services. We wanted to catch potential bugs as early as possible and keep the test setup as lightweight as possible.

As an aside: executing end-to-end tests against a Staging environment was not an option for us, since we only have a Production environment that follows the blue/green deployment paradigm.

Creating the Test Setup

Our main requirements for our new test environment were to:

  • not affect our Production configuration. All the required steps should be done on top of it
  • encapsulate the test environment in a Docker container for the sake of CI / CD integration. By doing this we also gain full control over the network configuration in our test environment (which we need in order to override DNS lookups, as you will learn later in this post)
  • stub internal services, so that the response they provide allows us to verify that the gateway is proxying to the desired target URL
  • implement tests in one of the frameworks we use already, which are RSpec (Ruby) and ExUnit (Elixir)

Fig. 3. Test environment components and test flow

Overriding DNS Resolution

Since the API gateway proxies requests to a number of internal services, the first step was to override the DNS lookup for these hostnames so that they resolve to 127.0.0.1. For this purpose we decided to use dnsmasq, a network utility bundle that includes a DNS server. A useful property of dnsmasq is that DNS names can be defined in the /etc/hosts file. So, to have all our internal service hostnames resolve to 127.0.0.1, all we needed to do was add the following lines to /etc/hosts and run dnsmasq as a daemon inside our test Docker container:

127.0.0.1  service1.myapi.com
127.0.0.1  service2.myapi.com
127.0.0.1  service3.myapi.com

You can verify that the DNS setup inside the test container is working as expected using the dig command:

$ dig +noall +answer service1.myapi.com @127.0.0.1 service2.myapi.com @127.0.0.1 service3.myapi.com @127.0.0.1
service1.myapi.com.    0    IN    A    127.0.0.1
service2.myapi.com.    0    IN    A    127.0.0.1
service3.myapi.com.    0    IN    A    127.0.0.1
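If you would rather check the hosts entries from a script, for example before baking them into the container image, the following Ruby sketch generates the lines and confirms that every upstream hostname is mapped. It uses `Resolv::Hosts` from Ruby's standard library, which only reads a hosts file; it does not query dnsmasq, so it complements the dig check rather than replacing it. The helper names `hosts_entries` and `unmapped_hosts` are hypothetical.

```ruby
require "resolv"
require "tempfile"

# Hostnames taken from the sample gateway configuration.
UPSTREAM_HOSTS = %w[service1.myapi.com service2.myapi.com service3.myapi.com].freeze

# Builds the lines to append to /etc/hosts inside the test container.
def hosts_entries(hostnames, address = "127.0.0.1")
  hostnames.map { |name| "#{address}  #{name}" }.join("\n") + "\n"
end

# Returns the hostnames that the given hosts file does NOT map to `address`.
def unmapped_hosts(hosts_path, hostnames, address = "127.0.0.1")
  resolver = Resolv::Hosts.new(hosts_path)
  hostnames.reject { |name| resolver.getaddresses(name).include?(address) }
end

# Demonstration against a temporary file instead of the real /etc/hosts.
Tempfile.create("hosts") do |file|
  file.write(hosts_entries(UPSTREAM_HOSTS))
  file.flush
  missing = unmapped_hosts(file.path, UPSTREAM_HOSTS)
  puts missing.empty? ? "all upstream hosts mapped" : "missing: #{missing.join(', ')}"
end
```

Pointing `unmapped_hosts` at the container's real /etc/hosts gives a quick failure message when a newly added upstream hostname has not been added to the test setup yet.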

Finally, we wanted Nginx to use the local DNS resolver in the test environment. We did this by moving the production and test resolvers to separate files to make it easy to overwrite the production config with a test one:

server {
  listen                *:80;
  include               /etc/nginx/dns.conf;
  server_name           gateway localhost;
}

In the test environment, /etc/nginx/dns.conf contains:

resolver 127.0.0.1;

Stubbing Internal Services

The next step was to stub the internal services. We decided to reuse the already running Nginx server. We defined a virtual server configuration and injected it into the /etc/nginx/conf.d directory of the Nginx gateway under test:

server {
  listen                *:443 ssl;
  server_name           *.myapi.com;
  ssl_protocols         TLSv1 TLSv1.1 TLSv1.2;
  ssl_ciphers           AES128-SHA:AES256-SHA:RC4-SHA:DES-CBC3-SHA:RC4-MD5;
  ssl_certificate       /etc/ssl/certs/nginx-selfsigned.crt;
  ssl_certificate_key   /etc/ssl/private/nginx-selfsigned.key;

  location ~ (?<endpoint>.*)$ {
    add_header X-Powered-By service-mock;
    return 200 "https://$host$endpoint$is_args$args";
  }
}

This server handles all requests to *.myapi.com hosts and returns the complete target URL, that is, the URL to which Nginx proxied the request. This allows us to verify that a call to a specific path in the gateway is proxied to the expected target URL, which was the basic validation we wanted. Sample requests against the test environment:

$ curl localhost:81/path1
https://service1.myapi.com/v1/path1

$ curl localhost:81/path1/id
https://service2.myapi.com/v1/path1/id

$ curl localhost:81/path2/id
https://service3.myapi.com/v1/path2/id
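Because the mock echoes the full target URL, a test can also take that URL apart and assert on host, path and query parameters separately. This is handy for catching dropped query parameters, one of the issues listed earlier. The Ruby sketch below shows one way to do that with the standard `uri` library. `EchoedTarget` and `parse_echo` are hypothetical names, and the sample URL with a query string is made up for illustration.

```ruby
require "uri"

# The parts of the echoed target URL that a routing test usually cares about.
EchoedTarget = Struct.new(:host, :path, :query)

# Parses the body returned by the service mock into host, path and a hash of
# query parameters (empty when the URL carries no query string).
def parse_echo(body)
  uri = URI.parse(body.strip)
  params = URI.decode_www_form(uri.query.to_s).to_h
  EchoedTarget.new(uri.host, uri.path, params)
end

# Illustrative body, as the mock would echo it for a request with two
# query parameters.
target = parse_echo("https://service2.myapi.com/v1/path1/id?page=2&size=10")

puts target.host          # service2.myapi.com
puts target.path          # /v1/path1/id
puts target.query["page"] # 2
```

Comparing the parsed query hash rather than the raw string keeps such assertions independent of parameter order.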

If you are curious about what it took to carry out all the steps above and build our test Docker container, check out the repository with our demo application.

Writing And Executing Tests

Having it all set up, writing tests was the easiest part. We chose Ruby’s RSpec testing framework, since we already use it. Here is an example test spec, which tests an Nginx running at localhost:81:

# Client stands for the HTTP client used by the suite; its responses are
# expected to expose an integer `code` and a string `body`.
describe "gateway" do
  GATEWAY_BASE_URL = "http://localhost:81"
  context "/path1" do
    context "get" do
      # The request is issued once, when the file is loaded, and the result
      # is shared by both examples below.
      result = Client.get "#{GATEWAY_BASE_URL}/path1"
      it "return 200" do
        expect(result.code).to eq(200)
      end
      it "proxy to service1" do
        expect(result.body).to eq("https://service1.myapi.com/v1/path1")
      end
    end
  end
  context "/path2" do
    context "/some_id" do
      context "get" do
        result = Client.get "#{GATEWAY_BASE_URL}/path2/some_id"
        it "return 200" do
          expect(result.code).to eq(200)
        end
        it "proxy to service3" do
          expect(result.body).to eq("https://service3.myapi.com/v1/path2/some_id")
        end
      end
    end
  end
end

To make it visible which paths and HTTP methods we are testing, we group path elements and methods into separate contexts. The test output makes this structure even more explicit:

gateway
  /path1
    get
      return 200
      proxy to service1
  /path2
    /some_id
      get
        return 200
        proxy to service3

Finished in 0.00287 seconds (files took 0.56456 seconds to load)
4 examples, 0 failures
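As the number of routes grows, the same check can be driven from a table of expected targets. The following Ruby sketch is one possible shape for that and is not taken from our test suite. `EXPECTED_ROUTES`, `route_mismatches` and `HTTP_FETCH` are hypothetical names, and the fetcher is passed in as a callable so the comparison logic can be tried without a running gateway.

```ruby
require "net/http"
require "uri"

# Expected routing table, based on the sample configuration in this article.
EXPECTED_ROUTES = {
  "/path1"         => "https://service1.myapi.com/v1/path1",
  "/path1/some_id" => "https://service2.myapi.com/v1/path1/some_id",
  "/path2/some_id" => "https://service3.myapi.com/v1/path2/some_id"
}.freeze

# Returns a hash of path => [expected, actual] for every route whose echoed
# target differs from the expectation. `fetch` is any callable that maps a
# gateway path to the response body.
def route_mismatches(routes, fetch)
  routes.each_with_object({}) do |(path, expected), failures|
    actual = fetch.call(path)
    failures[path] = [expected, actual] unless actual == expected
  end
end

# Real fetcher; assumes the gateway test container listens on localhost:81.
HTTP_FETCH = ->(path) { Net::HTTP.get(URI("http://localhost:81#{path}")) }

# Stand-in for a misconfigured gateway that sends everything under /path1
# to service1, so the sketch can run without the container.
broken_gateway = lambda do |path|
  path.start_with?("/path1") ? "https://service1.myapi.com/v1/path1" : EXPECTED_ROUTES[path]
end

route_mismatches(EXPECTED_ROUTES, broken_gateway).each do |path, (expected, actual)|
  puts "#{path}: expected #{expected}, got #{actual}"
end
```

Passing `HTTP_FETCH` instead of `broken_gateway` runs the same comparison against the test container, and each table entry could just as well be turned into its own RSpec example.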

Outcome

Having all of the above scripted, automated and integrated into CI / CD gave us enough confidence when making further changes to our API routes. With this setup in place, it is easy to modify the gateway and write tests for such changes, or even to do test-first development. Moreover, the test structure and output are self-documenting, making it easier for us to reason about our Nginx configuration.

Finally, to encourage others to try out a similar approach, we have built a demo application, which should make it easy to bootstrap a similar test environment or just check the details of our solution.

We hope you found this useful. If you have questions, or if you end up using our approach, or if you have tried out other gateway testing approaches, please leave a comment below.