Maven Build Cache Extension

At Meltwater, our teams have the flexibility to select the programming languages that best suit their needs. With many options available, JVM languages have emerged as a popular choice.

For managing JVM projects, some teams opt for Gradle, while my team prefers Maven as its project management tool.

Gradle has offered build caching for years, and we Maven users envied it until the Maven Build Cache Extension arrived. As of June 2023, the extension works with Maven 3.9.0 and later.

The Maven Build Cache Extension is a tool that helps to improve build efficiency by reusing outputs that were generated in previous builds. This cache system stores build outputs either locally or remotely. When inputs have not been modified, new builds can retrieve these outputs from the cache. This approach saves time by avoiding the regeneration of outputs, resulting in more efficient and quicker build processes.

In this article, we will explore the benefits of using the extension and how to integrate it into a CI/CD pipeline. I have also created a GitHub repository with a sample remote server and application, which I refer to below to explain the actions and concepts.

Caching Local Builds

Enabling Maven Cache for local builds is quite straightforward. For the basic setup, you only need to follow these steps:

Update the Maven wrapper version (assuming you’re using a Maven wrapper):

https://github.com/cenkakin/maven-cache-server/pull/1/files#diff-262c1fd0bdcab53ac574c65d8f3cb1618291cfbe28b93a0ff34f5dc9a240ebb4R1

Declare the caching extension in your project (either in pom.xml or .mvn/extensions.xml):

https://github.com/cenkakin/maven-cache-server/pull/1/files#diff-f7de712954aa6928742af835b0a84af8546ba732ca535f13e3636e14ae94a047
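For readers who cannot open the diff, a minimal .mvn/extensions.xml might look like the sketch below. The groupId and artifactId are those of the Apache Maven Build Cache Extension. The version number is an assumption for illustration, so check for the latest release before copying it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<extensions xmlns="http://maven.apache.org/EXTENSIONS/1.0.0"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://maven.apache.org/EXTENSIONS/1.0.0 http://maven.apache.org/xsd/core-extensions-1.0.0.xsd">
    <extension>
        <groupId>org.apache.maven.extensions</groupId>
        <artifactId>maven-build-cache-extension</artifactId>
        <!-- Assumed version for illustration; use the latest release available to you -->
        <version>1.0.1</version>
    </extension>
</extensions>
```

Keeping the declaration in .mvn/extensions.xml rather than pom.xml leaves the POM untouched, which can be convenient when several modules share one repository.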

Add maven-build-cache-config.xml in .mvn/ to customize the default behavior (optional)
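As a rough idea of what such a customization can look like, here is a small sketch of a local-only configuration. The element names follow the extension's documented configuration schema as far as I know it, and the values are assumptions for illustration, so validate the file against the XSD before relying on it.

```xml
<cache xmlns="http://maven.apache.org/BUILD-CACHE-CONFIG/1.0.0"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://maven.apache.org/BUILD-CACHE-CONFIG/1.0.0 https://maven.apache.org/xsd/build-cache-config-1.0.0.xsd">
    <configuration>
        <!-- Explicitly enable the build cache for this project -->
        <enabled>true</enabled>
        <!-- Algorithm used to fingerprint the build inputs (illustrative choice) -->
        <hashAlgorithm>SHA-256</hashAlgorithm>
        <!-- Ask the extension to validate this file against its schema -->
        <validateXml>true</validateXml>
        <local>
            <!-- Illustrative value: limit how many builds are kept in the local cache -->
            <maxBuildsCached>3</maxBuildsCached>
        </local>
    </configuration>
</cache>
```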

That’s all there is to it for local builds. You can try it out with the sample app.

Remote Cache

Using Maven Cache in a remote setup, for example on a CI Server, requires a bit more work. The cache needs a server for storage to share the results between builds. While the documentation is an excellent source to learn more about the plugin’s inner workings and configurations, it doesn’t offer users a standard server. It simply suggests: “The simplest option is to set up an HTTP server that supports HTTP PUT/GET/HEAD operations (Nginx, Apache, or similar).”

WebDAV server

One option was to use a WebDAV server. To test our pipeline, we initially tried the server that the extension's own integration tests use. It's a basic WebDAV file server with an optional authentication feature. Here is an example docker-compose file to run it:

version: "3.8"
services:
    web-dav:
        image: xama/nginx-webdav@sha256:84171a7e67d7e98eeaa67de58e3ce141ec1d0ee9c37004e7096698c8379fd9cf
        ports:
          - "80:80"
        environment:
          WEBDAV_USERNAME: admin
          WEBDAV_PASSWORD: admin
        volumes:
          - ./cache-storage/:/var/webdav/public

The server does the job. However, the number of files stored on the server (jars, build report XMLs, etc.) keeps growing over time, so you need to clean them up regularly, either manually or through automation.

Our Custom Solution - Maven Cache Server

We decided to take a different approach and started using S3 for storing files with a lifecycle policy. To make this possible, we implemented our own remote server. It's a pretty simple Spring application with optional HTTP basic authentication.

Here is a simple docker-compose file (see the README for more detailed instructions):

version: "3.8"
services:
    maven-cache-server:
        image: ghcr.io/cenkakin/maven-cache-server:latest
        ports:
            - "8080:8080"
        environment:
            # Compose expects boolean-like environment values to be quoted strings
            authentication.enabled: "true"
            authentication.user.name: admin
            authentication.user.password: admin
            maven-cache-server.storeType: S3
            maven-cache-server.baseFolderToStore: YOUR_BUCKET_PATH
            aws.accessKeyId: YOUR_ACCESS_KEY
            aws.secretAccessKey: YOUR_SECRET_KEY
            aws.region: YOUR_REGION

You should provide the necessary AWS properties to let it connect to your S3 bucket. Here is a page for S3 bucket Terraform configuration with a lifecycle policy.
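To illustrate what such a lifecycle policy does, here is the same idea expressed in S3's native lifecycle XML rather than Terraform. This is only a sketch: the rule ID and the 30-day expiry are assumptions for illustration, not the values from our setup.

```xml
<LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
    <Rule>
        <!-- Hypothetical rule name -->
        <ID>expire-old-maven-cache-entries</ID>
        <!-- An empty prefix applies the rule to every object in the bucket -->
        <Filter>
            <Prefix></Prefix>
        </Filter>
        <Status>Enabled</Status>
        <!-- Assumed retention: delete objects 30 days after creation -->
        <Expiration>
            <Days>30</Days>
        </Expiration>
    </Rule>
</LifecycleConfiguration>
```

With a rule like this, S3 removes stale cache entries on its own, which takes care of the cleanup chore described for the WebDAV server.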

Connect The Parts

When it comes to setting up a CI/CD pipeline, the type of server you use is irrelevant as the necessary changes are the same:

Add a .mvn/maven-build-cache-config.xml file with the remote server base URL. From the sample project:

<cache xmlns="http://maven.apache.org/BUILD-CACHE-CONFIG/1.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://maven.apache.org/BUILD-CACHE-CONFIG/1.0.0 https://maven.apache.org/xsd/build-cache-config-1.0.0.xsd">
   <configuration>
       <validateXml>true</validateXml>
       <remote enabled="false" id="cache-remote-server">
           <url>http://localhost:8080</url>
       </remote>
   </configuration>
</cache>

Update your ~/.m2/settings.xml if the remote server requires authentication. From the sample project:

<settings>
    <servers>
        <server>
            <id>cache-remote-server</id>
            <username>admin</username>
            <password>admin</password>
        </server>
    </servers>
</settings>

Note that the remote id in .mvn/maven-build-cache-config.xml and the server id in ~/.m2/settings.xml must match (cache-remote-server in the examples above).

In the provided cache configuration, the remote server is disabled by default, because we only want to use the remote cache in CI/CD builds. You can enable it by adding these flags to your Maven build command: -Dmaven.build.cache.remote.enabled=true -Dmaven.build.cache.remote.save.enabled=true

Benefits

We have been continually improving our CI/CD efficiency (parallel executions, tweaking Maven parameters, etc.), and adding the Maven cache has been the biggest boost to our productivity so far. Our build times have been significantly reduced, which is crucial for us as we deploy our applications multiple times per day.

We enjoy working on our monorepo, where team members can work simultaneously on different modules within the same repository. Monorepos in particular (or any multi-module project) allow Maven users to take maximum advantage of the Maven cache. It reuses cached outputs from previous or parallel builds for every module whose inputs have not changed. This helps us to speed up our development process and work with greater efficiency.