Writing About Building Docker Exec (Part 1)
A while ago I made a start on the Project Euler problems - a set of mathematics-based programming questions - both because I wanted to improve my mathematical thinking and because their clear definition makes them a great target for learning new programming languages. With my trademark focus, I got completely sidetracked and instead of doing a lot of the exercises in one language, I solved exercise one in lots of languages.
Solving the problem in all those languages meant installing tens of compilers and interpreters on my local machine, and then, when I wanted to work on the solutions on a different computer, installing all those tools a second time. Some of the compilers took a long time to install (at the time, Rust was built from source), or had tricky configurations on Linux (Objective-C with GCC), not to mention that upgrading them was a pain. I figured there must be a better way.
A typical way to resolve complex tooling requirements is to make use of a virtual machine - preinstalling all the required tools on said VM and providing that to anyone who might want to use them. All the compilers and interpreters could be installed on a single VM, making running the solutions trivial. But what about the size of the VM? And what if I wanted to use a compiler with a tricky configuration without requiring all the others? Separate VMs? Vagrant has helped in this area, but virtual machines are quite a heavyweight solution for this and come with costs in both size and boot time.
Instead, the way I chose to address these issues was to provide the tooling through Docker images.
The first challenge was to take the manual steps I’d followed to install the compilers and interpreters and turn them into Dockerfiles. Some Dockerfiles were simple - I used the official Node.js and Java images, for example - while others, such as Objective-C and Clojure, required some research to get working. Eventually, I had Dockerfiles for all of the languages I was targeting. These Dockerfiles were stored in a single git repository, in subfolders named for each image.
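For the simpler languages, a Dockerfile amounted to little more than installing a package on a stock base image. As a rough illustration - the base image, package, and paths here are assumptions for the sketch, not necessarily what I actually used:

    # Hypothetical sketch of one of the simpler images: install an
    # interpreter on a stock base image. Base image and package names
    # are illustrative.
    FROM debian:jessie

    RUN apt-get update && \
        apt-get install -y --no-install-recommends python3 && \
        rm -rf /var/lib/apt/lists/*

    # Source files get mounted into the container at run time.
    WORKDIR /tmp/build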
I created automated build jobs on Docker Hub for each image. Automated builds on Docker Hub can be configured for a given path within a repository, but the build-trigger functionality seemed a little broken: instead of building an image only when something in its specified path changed, Docker Hub triggered a build whenever anything in the whole repository changed. This meant that changing something for just one image rebuilt all of the others. I didn’t like this, so I broke the images out into individual repositories.
Now I had thirty-odd repositories in my GitHub account dedicated to these Dockerfiles, so I decided to move them to a separate GitHub organisation: Docker Exec. As there were now many individual repositories to maintain, I started putting batch operations into shell scripts, which I stored in the docker-exec-automation repository. In addition, I didn’t want the mass commits this generated contributing to my public activity, so I created a user account, ‘docker-exec-bot’, to perform these operations on my behalf.
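The batch scripts were nothing fancy - loops over the image repositories applying the same change to each. A hypothetical sketch, with an illustrative repository list and commit message:

    #!/usr/bin/env bash
    # Hypothetical sketch: apply the same change to every image
    # repository in the docker-exec organisation. The repository
    # list here is illustrative.
    for repo in cpp clojure nodejs; do
        git clone "git@github.com:docker-exec/${repo}.git"
        (
            cd "${repo}" || exit 1
            # ... apply the shared change here ...
            git commit -am "Apply shared update"
            git push origin master
        )
        rm -rf "${repo}"
    done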
The next step was to come up with a way to pass source files to the Docker container and have the container do something with them. Docker provides a simple volume-mounting mechanism, where a volume can be either a file or a directory. As my intention was to compile and execute (or just interpret) the source, I decided to mount just the source file, read-only, preventing any build artefacts from being copied back to the host system.
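On the docker command line that means a -v mount with the ro option - something like the following, where the container-side path is illustrative:

    # General form: -v <host path>:<container path>:<options>
    # The "ro" option makes the mounted file read-only inside the
    # container, so nothing can be written back to the host.
    docker run --rm -v "$(pwd)/foo.cpp:/src/foo.cpp:ro" some-image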
With the source file mounted in the Docker image, I needed a way to compile and execute, or interpret, it. The most lightweight means to achieve this was a shell script. Writing shell scripts to perform this task and bundling them with the Docker images was trivial, but if I wanted to change how the scripts operated - for example, how they parsed arguments - across all of the images, that would mean updating them in 25+ places. Centralising the scripts in a single repository that each image repository would include as a git submodule seemed a good solution to this, and taking it a step further I was able to generalise the 25+ scripts into four families which drew from common functionality. A sketch of one such script follows.
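To give a flavour of the ‘compile then execute’ family, here is a stripped-down sketch; the variable names and paths are mine for illustration, not the actual script:

    #!/usr/bin/env bash
    # Hypothetical sketch of a generic "compile then execute" wrapper.
    # Each image would parameterise the compiler and its flags; the
    # mounted source file is passed as the first argument.
    source_file="$1"
    shift

    # COMPILER and COMPILER_FLAGS would be set per image; the flags
    # are intentionally unquoted so multiple flags split correctly.
    "${COMPILER}" ${COMPILER_FLAGS} -o /tmp/a.out "${source_file}"

    # Execute the compiled binary, forwarding any remaining arguments.
    exec /tmp/a.out "$@"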
At this point I had Docker images capable of executing a source file in a variety of languages, achieved by setting each image’s entrypoint to the parameterised script included as a git submodule in its repository. Taking a C++ source file as an example, the syntax for this looked something like the following (the image name and container path are from memory, so treat them as approximate):
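    # Mount foo.cpp read-only, then let the image's entrypoint script
    # compile and execute it.
    docker run -t --rm \
        -v "$(pwd)/foo.cpp:/tmp/build/foo.cpp:ro" \
        dexec/cpp foo.cpp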
This was enough for the Project Euler problems, as I could easily invoke it from the shell script I was using to verify the solutions. And this is exactly what I did.
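A minimal sketch of what that verification might have looked like - the file names and expected-answer layout are assumptions:

    #!/usr/bin/env bash
    # Hypothetical sketch: run one solution in its container and
    # compare the output with a stored expected answer. Paths and
    # the image name are illustrative.
    expected=$(cat "expected/problem-001.txt")
    actual=$(docker run --rm \
        -v "$(pwd)/solution.cpp:/tmp/build/solution.cpp:ro" \
        dexec/cpp solution.cpp)

    if [ "${actual}" = "${expected}" ]; then
        echo "PASS: solution.cpp"
    else
        echo "FAIL: solution.cpp (got '${actual}', expected '${expected}')"
    fi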
Carry on reading: Building Docker Exec Part 2