Setting up Matomo Auto-Archiving in Docker

October 15, 2020 - 5 min read

TL;DR

Scroll to the end of the page and copy the full Dockerfile from there.

The base idea

A Docker container is built on a regular Linux userland, so we can use a cron daemon just like on a normal install and add our own cron tasks inside the container. Unlike on a normal install, though, nothing starts cron for us automatically. To add cron tasks, we need to build our own container image using a Dockerfile. The required steps depend on which base image you use: Alpine (clearly labeled in the tag) or Debian (all non-Alpine images, including the default).

To begin with, your Dockerfile should look something like this. Feel free to use any variant listed on the Matomo Docker Hub page; I will tell you what to watch out for.

FROM matomo:fpm-alpine
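If you prefer a Debian-based image, only the tag in the first line changes. A minimal sketch, assuming the fpm and apache tags currently listed on Docker Hub (check the page for the exact tag you want, and consider pinning a version):

```dockerfile
# Debian-based PHP-FPM variant
FROM matomo:fpm

# Alternatively, the default Apache variant (also Debian-based):
# FROM matomo:apache
```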

Installing Cron (Debian-based images only)

If you are using a Debian-based image, you first need to install the cron package manually. The package lists are cleared in the image by default, so you have to update your package sources before apt can download cron.

Therefore, add the following to your Dockerfile:

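# Install Cron (refresh the package lists, install cron, then remove the lists again)
RUN apt-get update && apt-get install -y cron && rm -rf /var/lib/apt/lists/*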

Adding our Crontab entries

I didn’t want to rely on external files, so I add the tasks with shell commands instead. The instruction might look strange at first, so let’s break it down:

RUN echo "*/5 * * * * /usr/local/bin/php /var/www/html/console core:archive --url=https://analytics.example.com/" >> /etc/crontabs/root

The >> operator redirects the output of a command into a file and appends it to the end, instead of overwriting the file like > would. Since we always want the same output, we simply use the echo command to print a fixed string. The path after the chevrons is the file we want to append to.
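To double-check that the entry was actually appended, you could temporarily print the file during the build. This is a debugging sketch, not part of the final setup; with BuildKit you may need docker build --progress=plain (and --no-cache if the step is cached) to see the output of the RUN step.

```dockerfile
FROM matomo:fpm-alpine

# Append the archiving job; ">>" keeps whatever the file already contains
RUN echo "*/5 * * * * /usr/local/bin/php /var/www/html/console core:archive --url=https://analytics.example.com/" >> /etc/crontabs/root

# Temporary check: print the resulting crontab to the build log
RUN cat /etc/crontabs/root
```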

If you are using Debian, use /etc/crontab as the file path instead of /etc/crontabs/root. Also, since the file path no longer specifies which user should run the task, we have to put that information into the cron entry itself, as an extra field right after the schedule. Make sure to apply these changes to the other entries as well! The result will look like this:

RUN echo "*/5 * * * * root /usr/local/bin/php /var/www/html/console core:archive --url=https://analytics.example.com/" >> /etc/crontab

The string we append consists of two parts (three on Debian, where the user field sits in between): the cron schedule expression (e.g., */5 * * * *, which runs every 5 minutes) and the command that should be executed on that schedule (e.g., php console scheduled-tasks:run). As cron expressions can be very confusing at first, I recommend you play around with a generator like crontab.guru.
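As an illustration of how the five schedule fields work, here is the same archiving entry with an hourly schedule instead. The expressions in the comments are only examples; pick whatever interval suits your traffic.

```dockerfile
# Field order: minute  hour  day-of-month  month  day-of-week  command
#   */5 * * * *  -> every 5 minutes (minute 0, 5, 10, ..., 55 of every hour)
#   0 * * * *    -> once an hour, at minute 0
#   30 2 * * *   -> once a day, at 02:30 container time (usually UTC)
RUN echo "0 * * * * /usr/local/bin/php /var/www/html/console core:archive --url=https://analytics.example.com/" >> /etc/crontabs/root
```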

Running the archiving script on a schedule

You start the Matomo archiving process by having PHP execute the console script located at /var/www/html/console (the file has no extension) with the core:archive command. In practice, it looks like this:

$ /usr/local/bin/php /var/www/html/console core:archive --url=https://analytics.example.com/

Of course, you’ll have to adjust the --url argument to match the URL of your Matomo installation. Also, note that I entered the full path to the php executable: cron runs jobs with a minimal environment, so an absolute path avoids depending on what is in PATH.
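If you’d rather not hard-code the URL, one option is a build argument. This is a sketch; MATOMO_URL is a hypothetical name, and its value is only baked into the crontab when the image is built, so changing it means rebuilding.

```dockerfile
FROM matomo:fpm-alpine

# Hypothetical build argument holding your Matomo URL; override it with
#   docker build --build-arg MATOMO_URL=https://your-matomo.example/ .
ARG MATOMO_URL=https://analytics.example.com/

# The shell expands ${MATOMO_URL} while this RUN step executes at build time
RUN echo "*/5 * * * * /usr/local/bin/php /var/www/html/console core:archive --url=${MATOMO_URL}" >> /etc/crontabs/root
```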

Putting this together with the crontab syntax from above, we get the following instruction. When the image is built, it adds a crontab entry that has PHP run the Matomo console script every 5 minutes to start the archiving process.

# Run archive script every 5 minutes
RUN echo "*/5 * * * * /usr/local/bin/php /var/www/html/console core:archive --url=https://analytics.example.com/" >> /etc/crontabs/root

Running all scheduled tasks on a schedule

Apart from archiving, the Matomo console also has a command that runs all scheduled tasks. These include, for example, sending emails.

# Run scheduled tasks every 20 minutes
RUN echo "*/20 * * * * /usr/local/bin/php /var/www/html/console scheduled-tasks:run" >> /etc/crontabs/root
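On a Debian-based image, the same entry goes into /etc/crontab and needs the user field, as described for the archiving entry earlier:

```dockerfile
# Run scheduled tasks every 20 minutes (Debian: system crontab with user field)
RUN echo "*/20 * * * * root /usr/local/bin/php /var/www/html/console scheduled-tasks:run" >> /etc/crontab
```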

Custom start command

Finally, we need a custom start command: the default one only starts PHP (or Apache), so the cron daemon would never run. Our command first starts the cron daemon, which then runs in the background, and afterwards starts the main PHP process, which keeps running in the foreground.

# Start Cron and PHP
CMD crond && php-fpm

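Important: The exec form (the JSON array syntax) doesn’t work here, because it doesn’t run the command through a shell, so && would not be interpreted. Stick to the shell form shown above!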

If you use a Debian base image, you’ll need to replace crond with /etc/init.d/cron start. If you use Apache (aka the regular version of the image, which is Debian-based), also replace php-fpm with apache2-foreground. Keep in mind that both of these changes may apply to you!
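Applied to the Debian-based images, the start command would look like one of these. This is a sketch following the replacements just described; keep only one CMD, since Docker only uses the last one in a Dockerfile.

```dockerfile
# Debian-based Apache image (the default variant): start cron, then Apache in the foreground
CMD /etc/init.d/cron start && apache2-foreground

# Debian-based FPM image: start cron, then PHP-FPM in the foreground
# CMD /etc/init.d/cron start && php-fpm
```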

Final Dockerfile

Combining all of the previous instructions, plus some extra spaces so the cron fields line up, we end up with the following Dockerfile. Of course, it will look slightly different if you use a different base image.

FROM matomo:fpm-alpine

# Run archive script every 5 minutes
RUN echo "*/5     *       *       *       *       /usr/local/bin/php /var/www/html/console core:archive --url=https://analytics.example.com/" >> /etc/crontabs/root

# Run scheduled tasks every 20 minutes
RUN echo "*/20    *       *       *       *       /usr/local/bin/php /var/www/html/console scheduled-tasks:run" >> /etc/crontabs/root

# Start Cron and PHP
CMD crond && php-fpm
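For the default Debian-based Apache image, putting together the Debian-specific changes from the previous sections would give something like the following. Treat it as an untested sketch; the matomo:apache tag is an assumption, so adjust it to the tag you actually use.

```dockerfile
FROM matomo:apache

# Install Cron (Debian images don't include it)
RUN apt-get update && apt-get install -y cron && rm -rf /var/lib/apt/lists/*

# Run archive script every 5 minutes (system crontab, so the user field is required)
RUN echo "*/5     *       *       *       *       root    /usr/local/bin/php /var/www/html/console core:archive --url=https://analytics.example.com/" >> /etc/crontab

# Run scheduled tasks every 20 minutes
RUN echo "*/20    *       *       *       *       root    /usr/local/bin/php /var/www/html/console scheduled-tasks:run" >> /etc/crontab

# Start Cron and Apache
CMD /etc/init.d/cron start && apache2-foreground
```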

Now build this image and use it in place of the official image. Since it only adds a few instructions on top of the official one, it should work as a drop-in replacement.

All that’s left to do is go to your Matomo dashboard > General settings > Archiving settings and disable browser-based archiving. In addition, you might want to adjust how often reports get archived: if that interval is longer than the one at which the cron job runs, this setting, not the cron schedule, is the limiting factor.

Screenshot of Archiving settings