I recently found a very cool site for practicing Linux sysadmin-type tasks, https://sadservers.com. The idea behind it is to let learners try solving realistic challenges that they might face in an actual system administration setting. The architecture basically involves giving each user their own virtual machine to run commands in, and in that way it closely mimics the target task environment.
I wondered, though, whether it would be possible to target the same set of learning objectives with a different architecture. Enter my favorite tool (for the reasons explained earlier, as well as in this post): Gitpod! Since almost all of the scenarios involved situations that would be reproducible inside a container, I felt like Gitpod would be a very usable platform.
Apart from being very cool authentic tasks, they were also things that I personally wanted to develop my skills in, so I immediately set off to reproduce what I could!
A Few of the Activities
The very first thing I found was the activity to discover which process was writing to a log file. As an integral component of any production system, logging is super relevant, and trying to make logs either more or less verbose is something I’ve dealt with a lot.
In this case, you could imagine that the logs need to be reduced, either because they’re filling up the disk, or leaking sensitive information, or just generally logging more than we need to be logging…so finding the “thing” that’s logging would be the starting point to reducing the logs anyway.
For this activity, it was actually my first authentic opportunity to check file usage (as opposed to port usage) for running processes, and I had a great time googling around to see which commands could do that. This one was easily reproducible in a container, so that’s what I did.
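As a rough illustration of what those commands do under the hood (tools such as lsof and fuser answer this kind of question), here is a minimal Python sketch that walks /proc and reports which processes hold a given file open. It is not the solution used in the scenario; the function name pids_with_file_open is made up for this example, and it assumes a Linux system with /proc mounted. Without root it can only inspect processes owned by the current user.

```python
import os


def pids_with_file_open(path):
    """Return a sorted list of PIDs holding an open descriptor on `path`.

    Hypothetical helper for illustration. Linux only: it relies on the
    symlinks under /proc/<pid>/fd, and returns [] if /proc is missing.
    """
    target = os.path.realpath(path)
    pids = []
    if not os.path.isdir("/proc"):
        return pids
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        fd_dir = os.path.join("/proc", entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to look
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) == target:
                    pids.append(int(entry))
                    break  # one match per process is enough
            except OSError:
                continue  # descriptor was closed while we were looking
    return sorted(pids)
```

Calling pids_with_file_open on the log file's path gives the candidate PIDs, which can then be looked up with ps to find the offending program.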
Two of the other activities involved an apache web server that wasn’t able to serve the index.html, and a dockerized app that wasn’t able to start. These were also very fun to troubleshoot, and since Gitpod allows running docker inside the pod, I was able to re-implement those here and here.
But where it REALLY got cool was the activity involving fixing a scenario where you can’t ping google.com. It led me down a very long and circuitous path about just what happens in Linux when you try to resolve the DNS for a particular domain, and I learned a lot! It took a bit of customizing the underlying container image running in the pod (similar to the previous activities), but I finally got this scenario implemented in Gitpod as well.
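To make that resolution path a little more concrete, here is a small Python sketch of the usual glibc lookup chain: the hosts line in /etc/nsswitch.conf decides which sources are consulted and in what order (typically files, meaning /etc/hosts, and then dns), and /etc/resolv.conf lists the nameservers used for the dns source. The helper names below are hypothetical, and this says nothing about what was actually broken in the scenario; it only shows where one might start looking.

```python
import re
import socket


def parse_nsswitch_hosts(text):
    """Return the lookup sources on the `hosts:` line of nsswitch.conf text."""
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if line.startswith("hosts:"):
            rest = line[len("hosts:"):]
            # Remove action items such as [NOTFOUND=return]
            rest = re.sub(r"\[[^\]]*\]", " ", rest)
            return rest.split()
    return []


def parse_nameservers(text):
    """Return the nameserver addresses listed in resolv.conf text."""
    servers = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(("#", ";")):
            continue  # resolv.conf allows both comment markers
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def can_resolve(hostname):
    """Return True if the system resolver can turn `hostname` into an address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False
```

Feeding the contents of the two files to the parsers and then calling can_resolve("google.com") shows quickly whether the problem is in the lookup order, the nameserver list, or somewhere further down the network path.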
The Case for Containers (and disposable learning environments in general)
The most obvious reason why these types of learning activities might be better suited to an environment like Gitpod is the complexity of running them. As described in the linked architecture above, there is a LOT of infrastructure that needs to be set up, maintained, and debugged just to give learners a shell into a Linux system.
Granted, k8s is also very complex, but that’s all abstracted away from the learning materials designer, who is only interested in creating scenarios to target specific learning objectives (not set up a proxy server and worker queue). The beauty of Gitpod is that it takes the magical abstraction of Docker, and removes the need to understand Docker. It’s essentially running containers with a web browser, and so it’s suitable for all types of interesting learning materials.
(Obviously, if somebody is working on debugging networking in Linux, they are probably capable of running Docker themselves, but I’m also thinking about ecology science students who want to follow along with a workshop on analyzing tree measurements in RStudio, where introducing Docker as a dependency has the potential to bog down the learners in incidental complexity unrelated to the core learning objectives.)
It’s also WAY cheaper! Big shout out to the amazing team at Gitpod and the very generous free plan!
One of the obvious advantages to giving the learners their own VM (and keeping that configuration out of source control), is that there isn’t anywhere obvious to look for how the learning environment was configured. The virtual machines are based on private machine images, so it’s not possible to take a peek at how the activities were set up.
When setting these up in Gitpod, I ran into a few challenges achieving something similar.
Because the intent of Gitpod is to be initiated from a source control repository (they support GitHub, GitLab, and Bitbucket), the entire configuration of the scenario needs to be in code (technically I could have used private repositories, but then the activities wouldn’t have been publicly accessible).
That’s where a particular feature of Gitpod came in handy, the ability to specify a custom Dockerfile for the running pod. And if you can specify a custom base image, you can use a multi-stage build for that image, which allowed me to tweak the underlying file structure (I suppose if any learners are reading this, then they’re probably ahead of the curve anyway!) so that slight modifications to things in /etc/nginx/nginx.conf could be copied over, and anyone inspecting the layers from the multi-stage build would just see that /etc had been copied over, which is still a fairly large haystack (as opposed to copying ONLY the one configuration file, which would be 100% obvious in the layers from the resulting image).
To the extent possible, it’s good to avoid giving clues about the answer in the activity design/instructions, and I’m reasonably happy that I was able to do this.
Webapps are the Map, and Running Systems are the Territory
I previously wrote about a few web-based micromaterials for IP addresses and cron expressions, and while these are still very nice materials for basic learning outcomes, they are necessarily different from something that’s running in Gitpod.
The two linked materials above are fine for lower-level learning outcomes like knowledge (in the case of the IP material) and comprehension (and a bit of application, in the cron material).
Those “systems” are still based on logic that I created and embedded into the software that’s running them. So, for example, the cron strings rely on gnarly functions that I wrote myself, and therefore the assessment of whether the learner has correctly achieved the learning objective is also based on logic that I wrote myself.
Obviously, the particular learning objectives in the cron exercises are just focused on people getting familiar with turning words into a cron string and vice versa. What would be even more useful (because it is a more authentic target-use task) would be to have learners take a description in words and set up a cron job on a running system, since this is actually the type of task you need to know cron syntax for anyway.
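To give a sense of what hand-written assessment logic of this kind looks like, here is a minimal Python sketch that checks whether a five-field cron string would fire at a given time. This is not the code behind the linked cron material; the function names are invented for this example, and it is deliberately simplified (no month or weekday names, no @daily-style shortcuts, and Sunday is only accepted as 0, not 7).

```python
from datetime import datetime

# Allowed (min, max) for: minute, hour, day-of-month, month, day-of-week
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def expand_field(field, low, high):
    """Expand one cron field such as '*/15', '1-5' or '0,30' into a set of ints."""
    values = set()
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            start, end = low, high
        elif "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < low or end > high or start > end or step < 1:
            raise ValueError(f"field {field!r} is outside {low}-{high}")
        values.update(range(start, end + 1, step))
    return values


def cron_matches(expression, when):
    """Return True if the 5-field cron `expression` fires at datetime `when`."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected: minute hour day-of-month month day-of-week")
    minute, hour, dom, month, dow = (
        expand_field(f, lo, hi) for f, (lo, hi) in zip(fields, FIELD_RANGES)
    )
    weekday = (when.weekday() + 1) % 7  # cron: 0 = Sunday; Python: 0 = Monday
    if fields[2] != "*" and fields[4] != "*":
        # Traditional cron: when both day fields are restricted, either may match
        day_ok = when.day in dom or weekday in dow
    else:
        day_ok = when.day in dom and weekday in dow
    return (when.minute in minute and when.hour in hour
            and when.month in month and day_ok)
```

Even a toy like this has to make judgment calls (how the two day fields combine, what counts as Sunday), which is exactly why checking the result against a real cron daemon on a running system is the more trustworthy assessment.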
Likewise, just knowing what the strace command does is nice and everything, but actually using it in a running Linux system is where the real learning happens, because that’s where the real assessment happens (i.e., does the system do the thing you were trying to make it do?). Related to situated cognition, the actual expression of the knowledge lies in its use, and that’s my favorite thing about the sadservers project.
I wanted to close with a few notes about the scenarios that can’t currently be implemented in Gitpod. One of them involves determining if you’re inside a VM or a container, and given that Gitpod is by definition inside a container, this seems a bit unnecessary to implement.
An additional exercise involves fixing a k3s deployment, and I’m currently looking into whether this is possible in Gitpod. It looks like there’s an open issue to run k3s natively, whereas it does seem possible to run k3s via emulation (it’s just a very convoluted and long setup process).
And obviously, some things will only be possible inside VMs (eg, dealing with boot issues and file system mounts), and so Gitpod is definitely not a silver bullet…
But there are a LOT of things that can be worked on inside a containerized environment, and as I continue to work through the sadservers scenarios, I’ll see if I can implement them inside Gitpod (cause it’s fun!).