Drone CI at Scale

Drone is a powerful open source continuous integration tool that we run as an internal service for Meltwater’s engineers. Over the past two years the service has been adopted by the majority of our development teams, and we currently run over 1,200 pipelines per day on average.

In this post, our Foundation team shares some of the features that have made Drone such an important tool in the daily workflows of our engineers.

A Bit of History

Foundation is a “mission” within Meltwater dedicated to helping our engineering teams deliver business value quickly. Foundation provides training and best practices, and runs services (such as Drone) that help accelerate our software delivery lifecycle.

Over the years teams have stood up or adopted various continuous integration (CI) tools to meet their needs at the time (Jenkins, GoCD, TravisCI, CircleCI, and more). Thankfully our Drone service has been able to replace almost all of them. Reducing the number of different tools in use in our organization reduces context switching and gives our engineers common ground when working together on automating their CI workflows.

The graph below shows adoption of our Drone service over time, with an annotation marking our upgrade to Drone 1.0, after which adoption grew significantly.

Today, Drone covers almost 50% of Meltwater’s continuous integration footprint.

Autoscaling

The ability to automatically scale our Drone agents to handle spikes in demand has been vital to our service. With over 300 engineers working in timezones around the globe, demand can rise and fall at unexpected times.

Drone’s official autoscaler has been rock-solid in our experience. The Drone autoscaler is its own docker image which runs as a separate process. It continuously polls the Drone build queue and launches or terminates instances based on volume. Our service runs in AWS EC2, but the autoscaler supports other cloud services, such as Google Cloud and DigitalOcean.
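
The scaling decision can be pictured roughly as follows. This is a minimal sketch of the idea, not the autoscaler's actual algorithm: the function names, their parameters, and the capacity formula are all assumptions made for illustration.

```python
import math


def desired_agents(pending: int, running: int, concurrency: int,
                   pool_min: int, pool_max: int) -> int:
    """Estimate how many agents the current build volume needs.

    Hypothetical helper: assumes every agent runs `concurrency` builds at a
    time and that the pool is bounded by `pool_min` and `pool_max`.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    needed = math.ceil((pending + running) / concurrency)
    # Clamp the estimate to the configured pool bounds.
    return max(pool_min, min(pool_max, needed))


def scaling_delta(current: int, target: int) -> int:
    """Positive: launch that many instances. Negative: terminate that many."""
    return target - current


if __name__ == "__main__":
    target = desired_agents(pending=100, running=20, concurrency=2,
                            pool_min=2, pool_max=60)
    print(target, scaling_delta(current=10, target=target))
```

Clamping to a minimum keeps a few agents warm for the first build of the day, while the maximum caps cost during large spikes.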

Beyond normal daily development workflows, we have seen a few patterns in developer activity that necessitate effective autoscaling of our Drone agents.

Drone’s cron feature allows our teams to schedule triggers of their build and test pipelines. This can quickly queue well over 100 builds in Drone, and thanks to the autoscaler, it usually only takes a few minutes for everything to complete as seen in this Kibana graph:

Our teams also take advantage of tools like Dependabot and Renovate to automatically open pull requests in their repositories with security fixes and other updates to their dependencies. This can result in spikes of queued pipelines.

The result is that in the course of a week, we can have scaling events that require over 60 agents at peak times. Here is a graph showing the drone_server_count Prometheus metric from the autoscaler process over one week. The Drone server and autoscaler processes both provide many useful Prometheus metrics at their /metrics endpoint.
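
If you want to read such a metric without a full Prometheus setup, the text returned by a /metrics endpoint is simple to parse. The sketch below works on an inline sample, not a live endpoint. The sample values and the example_builds_total metric are made up for illustration, and only drone_server_count comes from our setup.

```python
from typing import Optional

# Illustrative payload in the Prometheus text exposition format.
# The numbers and the example_builds_total metric are hypothetical.
SAMPLE = """\
# HELP drone_server_count Illustrative help text.
# TYPE drone_server_count gauge
drone_server_count 12
example_builds_total{status="success"} 42
"""


def read_metric(text: str, name: str) -> Optional[float]:
    """Return the first sample value for `name`, or None if it is absent."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "}" in line:
            # Labelled sample: name{label="value"} <value>
            head, _, tail = line.rpartition("}")
            metric = head.split("{", 1)[0]
        else:
            # Plain sample: name <value>
            metric, _, tail = line.partition(" ")
        if metric == name:
            return float(tail.split()[0])
    return None


if __name__ == "__main__":
    print(read_metric(SAMPLE, "drone_server_count"))
```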

Scaling up quickly means that our engineers rarely have to wait for their pipelines to start after committing code. Developer time is very valuable!

The Drone autoscaler does have a limitation, in that it is only compatible with the docker runner. Drone provides other runners where pipelines can execute, each optimized for a different use case. We use the exec runner for some pipelines where docker is not desired. Unfortunately, we can’t leverage the Drone autoscaler to scale these runners.

Path-based Pipelines

One feature we found lacking in Drone was the equivalent of Jenkins’ “Included/Excluded Regions”, which would trigger a pipeline based on files changed in the git commit range. After some discussion with the Drone developers, we learned we could write our own conversion extension to accomplish this. Conversion extensions can modify a .drone.yml configuration file before it is parsed and processed by the Drone server process, which gave us full control over the resulting .drone.yml file. We have open sourced our conversion extension as drone-convert-pathschanged and hope that you will find it useful.

This extension has proven an invaluable feature in monolithic repositories that require different steps to run when files change beneath certain directory structures. The resulting pipelines that execute don’t require teams to write their own complex scripting to determine what should happen when certain paths change.

Here is an example utilizing both include and exclude patterns. This ensures the message step runs when a .yml file other than .drone.yml is changed in the root of the repository:

---
kind: pipeline
type: docker
name: default

steps:
- name: message
  image: busybox
  commands:
  - echo "A .yml file in the root of the repo other than .drone.yml was changed"
  when:
    paths:
      include:
      - "*.yml"
      exclude:
      - .drone.yml
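
To make the include/exclude behaviour concrete, here is a small sketch of one plausible reading of it: a step runs when at least one changed file matches an include pattern and no exclude pattern. It is not the extension's actual code. The helper names are hypothetical, and the glob translation is a simplification in which * stays within one path segment and ** may span directories.

```python
import re
from typing import Iterable, Sequence


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a simplified glob into a regex (assumption, not doublestar)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")      # may cross path separators
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")   # stays within one path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def matches_any(path: str, patterns: Iterable[str]) -> bool:
    return any(glob_to_regex(p).fullmatch(path) for p in patterns)


def should_run(changed_files: Iterable[str], include: Sequence[str],
               exclude: Sequence[str] = ()) -> bool:
    """True if some changed file is included and not excluded."""
    for path in changed_files:
        if matches_any(path, exclude):
            continue
        if matches_any(path, include):
            return True
    return False


if __name__ == "__main__":
    print(should_run([".drone.yml"], ["*.yml"], [".drone.yml"]))
    print(should_run(["docker-compose.yml"], ["*.yml"], [".drone.yml"]))
```

Under these assumptions, a commit that only touches .drone.yml or config/app.yml would skip the message step, while a change to docker-compose.yml in the repository root would run it.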

With Drone, you are not limited to a single pipeline that runs on a single agent. It is possible to define multiple pipelines, each of which can run on its own agent. Combined with Drone’s routing feature, we can trigger pipelines on specific agents based on the paths changed.

When combining paths and routes, we can write a .drone.yml like this:

---
kind: pipeline
type: docker
name: datacenter/A
node:
  datacenter: A

trigger:
  paths:
    include:
    - datacenter/A/**

steps:
- name: deploy
  image: alpine
  commands:
  - ./datacenter/A/deploy.sh
---
kind: pipeline
type: docker
name: datacenter/B
node:
  datacenter: B

trigger:
  paths:
    include:
    - datacenter/B/**

steps:
- name: deploy
  image: alpine
  commands:
  - ./datacenter/B/deploy.sh

Note that the above example uses ** which matches any sequence of characters including path separators (see doublestar).

The datacenter/A pipeline above runs on an agent with the environment variable DRONE_RUNNER_LABELS=datacenter:A, while the datacenter/B pipeline runs on an agent with DRONE_RUNNER_LABELS=datacenter:B.
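
The relationship between a pipeline's node section and an agent's labels can be sketched as below. This is an illustration, not Drone's scheduler code: the function names are hypothetical, and we assume labels are written as comma-separated key:value pairs and that routing requires an exact match between the two maps.

```python
from typing import Dict


def parse_labels(value: str) -> Dict[str, str]:
    """Parse a string such as "datacenter:A" into a dictionary."""
    labels: Dict[str, str] = {}
    for pair in value.split(","):
        pair = pair.strip()
        if not pair:
            continue
        key, sep, val = pair.partition(":")
        if not sep:
            raise ValueError(f"expected key:value, got {pair!r}")
        labels[key.strip()] = val.strip()
    return labels


def runner_accepts(node: Dict[str, str], runner_labels: Dict[str, str]) -> bool:
    """Assumption: a runner picks up a pipeline only on an exact label match."""
    return node == runner_labels


if __name__ == "__main__":
    agent_a = parse_labels("datacenter:A")
    print(runner_accepts({"datacenter": "A"}, agent_a))
    print(runner_accepts({"datacenter": "B"}, agent_a))
```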

Here is a recording of what the workflow for a repository with the above .drone.yml might look like, making commits to different directories and watching the resulting pipeline execute:

Docker Registry Plugin

Running each pipeline step in a separate docker container is a core feature of Drone. Most of the time, public docker images are all you need when constructing your pipelines. However, sometimes you need to run a docker image from a private docker registry in a pipeline step.

Drone supports pulling private images with the image_pull_secrets parameter. This requires that someone manually create the secret in their repository settings.

Meltwater has multiple shared private docker registries that many teams use in their pipeline steps. Our teams also need to run docker images from AWS ECR, which is not possible with the image_pull_secrets method, since ECR tokens are only valid for 12 hours.

The drone-registry-plugin is a registry extension that solves both of these use cases. It gives us a way to let all pipelines in our service authenticate with the private docker registries we choose, even AWS ECR (teams only need to share their ECR registry with our AWS account).

The configuration file for the drone-registry-plugin looks like this:

# private docker registry
- address: docker.io
  username: octocat
  password: correct-horse-battery-staple

# AWS ECR
- address: 012345678910.dkr.ecr.eu-west-1.amazonaws.com
  aws_access_key_id: a50d28f4dd477bc184fbd10b376de753
  aws_secret_access_key: bc5785d3ece6a9cdefa42eb99b58986f9095ff1c

Pipelines could then run images from these private registries as steps, for example:

---
kind: pipeline
type: docker
name: default

steps:
- name: test
  image: 012345678910.dkr.ecr.eu-west-1.amazonaws.com/mytestimg:1.0
  commands:
  - npm install
  - npm test
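
Conceptually, a registry extension has to decide which configured entry applies to the image a step asks for. The sketch below shows one way to do that lookup. It is not the plugin's implementation: the function names are hypothetical, and the rule for telling a registry host apart from a Docker Hub namespace follows the usual Docker image reference convention.

```python
from typing import List, Optional

# Illustrative entries mirroring the shape of the configuration file above.
# Credentials are left out on purpose.
REGISTRIES = [
    {"address": "docker.io"},
    {"address": "012345678910.dkr.ecr.eu-west-1.amazonaws.com"},
]


def registry_host(image: str) -> str:
    """Return the registry host of an image reference.

    The first path component counts as a host only if it contains a dot or
    a colon, or is "localhost". Otherwise the image lives on docker.io.
    """
    first, sep, _ = image.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def find_registry(image: str, registries: List[dict]) -> Optional[dict]:
    """Return the configured entry whose address matches the image's host."""
    host = registry_host(image)
    for entry in registries:
        if entry["address"] == host:
            return entry
    return None


if __name__ == "__main__":
    image = "012345678910.dkr.ecr.eu-west-1.amazonaws.com/mytestimg:1.0"
    print(find_registry(image, REGISTRIES))
```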

Drone Cache Plugin

Often when running a continuous integration pipeline, you will have steps that download dependencies which may or may not be changing between builds. This is where proper dependency caching can save you a lot of time!

A team in our organization did not feel that the caching plugins available at the time offered enough features, so they took the initiative to write their own drone-cache pipeline plugin.

Our drone-cache plugin caches desired workspace files between builds to reduce build times. You can provide your own cache key templates, use different archive formats, and even use multiple storage backends depending on your needs.
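
The idea behind a cache key template can be illustrated with a short sketch. It is not drone-cache's template syntax or implementation. The function name, the key layout, and the choice to hash a dependency lock file are assumptions for illustration.

```python
import hashlib


def cache_key(repo: str, branch: str, lockfile_bytes: bytes) -> str:
    """Build a key that changes only when the dependency lock file changes.

    Hypothetical layout: <repo>/<branch>/<first 16 hex chars of SHA-256>.
    """
    digest = hashlib.sha256(lockfile_bytes).hexdigest()[:16]
    return f"{repo}/{branch}/{digest}"


if __name__ == "__main__":
    unchanged = cache_key("example/repo", "master", b"left-pad@1.3.0\n")
    bumped = cache_key("example/repo", "master", b"left-pad@1.4.0\n")
    # Identical lock files reuse the cache. A dependency bump produces a new key.
    print(unchanged == cache_key("example/repo", "master", b"left-pad@1.3.0\n"))
    print(unchanged == bumped)
```

Keying on the lock file means builds that do not touch dependencies restore the same archive, while a dependency change naturally starts a fresh cache entry.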

For our internal Drone service, we manage an AWS S3 bucket where the instance role of our agents grants them read/write access to the bucket. This means that teams don’t have to manage their own S3 bucket just to use the drone-cache plugin in our service.

Drone pipeline plugins have an advantage over plugins in other CI tools, in that they are just another docker container included in a pipeline step. Teams do not need to work with the Drone server administrators to install plugins, and a newer version of a plugin cannot break some pipelines and not others. Teams manage their plugins and plugin versions entirely in their own .drone.yml pipeline steps.

We have a separate post dedicated to the drone-cache plugin at Making Drone Builds 10 Times Faster!

Drone CLI

In addition to its powerful web interface, Drone also provides a command line application (CLI). The Drone CLI lets our users examine their build history, manage secrets, view logs, trigger builds, and much more.

Here is an example of using the Drone CLI to show the most recent build number and author, then viewing more information about that particular build:

$ export DRONE_SERVER=https://cloud.drone.io
$ export DRONE_TOKEN=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...

$ drone build ls --limit 1 --format "Build #{{ .Number }} Author: {{ .Author }}" meltwater/drone-convert-pathschanged
Build #54 Author: jlehtimaki

$ drone build info meltwater/drone-convert-pathschanged 54
Number: 54
Status: success
Event: push
Commit: fb5ce12e6cacb0c7a7e636ff8dd2ea5b3403043e
Branch: master
Ref: refs/heads/master
Author: jlehtimaki <joonas.lehtimaki@gmail.com>
Message: Bitbucket-server support (#29)

* Bitbucket-server support

The Drone CLI also integrates with the Drone autoscaler, allowing Drone administrators to list current agents, create/destroy agents, pause/resume autoscaling, and more.

Here is an example of using the Drone CLI to list the current agents the autoscaler is managing:

$ export DRONE_SERVER=https://drone.example.com
$ export DRONE_TOKEN=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
$ export DRONE_AUTOSCALER=https://autoscaler.example.com

$ drone server ls
agent-hlCPFyxT
agent-8vrp8tTl

$ drone server ls --format="{{ .Name }} <{{ .Address }}>"
agent-hlCPFyxT <198.51.100.88>
agent-8vrp8tTl <198.51.100.245>

Try Drone

These days there are more continuous integration tools available to developers than ever before, and the ecosystem continues to evolve and expand. In this post we have shared some of the features that have made Drone successful in Meltwater. We encourage you to give Drone a try and see what it can offer your organization.

If you are interested in trying Drone yourself, cloud.drone.io is free for public GitHub repositories. When you are ready to run it yourself, Drone provides both OSS and enterprise offerings.