Creating a Docker Alternative in Python


Docker is a popular containerization platform that allows developers to easily package applications into lightweight containers that can run isolated on any system. Containers provide a convenient way to deploy and scale applications by bundling together all the dependencies and configurations needed to run the app.

In this guide, we will walk through how to create a simple Docker alternative in Python. The goal is to build a basic container runtime that can build images, run containers, and manage container lifecycles. While this will only cover a subset of Docker’s functionality, it will demonstrate the core concepts needed to build a container engine.

Overview

At a high level, here are the key components we need to implement:

  • Image Builder: Allow building images from a Dockerfile
  • Container Runtime: Run containers using Linux namespaces and cgroups
  • Networking: Enable networking between containers
  • Storage: Allow mounting host directories into containers
  • Container Lifecycle: Start, stop and delete containers
  • CLI: Command line interface to build, run and manage containers

For simplicity, we won’t be implementing orchestration features like Swarm or Kubernetes. Our focus is just on building and running containers locally.

Implementing the Image Builder

First, we need a way to build container images. Docker images are made up of read-only layers that represent filesystem changes made during the image build process. Images are built from a Dockerfile which defines a series of instructions to assemble the image.

We can implement a simple image builder in Python like this:

import io
import subprocess
import tarfile

class ImageBuilder:

  def __init__(self, tag):
    self.tag = tag
    self.layers = []

  def run(self, cmd):
    # Execute a build step and archive its output as a "layer"
    result = subprocess.run(cmd, capture_output=True, check=True)
    self.layers.append(result.stdout)

  def save(self):
    # Write all layers into a single tarball image
    with tarfile.open(self.tag + '.tar', 'w') as tar:
      for i, layer in enumerate(self.layers):
        tarinfo = tarfile.TarInfo(str(i))
        tarinfo.size = len(layer)
        tar.addfile(tarinfo, io.BytesIO(layer))

The ImageBuilder class stores each command’s output as a separate layer, and the save method writes the layers into a tarball image that can be loaded later. Note this is a heavy simplification: real Docker layers are filesystem diffs, not command output, but the layered-tarball structure is the same idea.

To build an image, we can create a Dockerfile like:

FROM ubuntu:18.04

RUN apt-get update && apt-get install -y python3 python3-pip
RUN pip3 install flask

CMD ["python3", "app.py"]

And build it in Python:

builder = ImageBuilder('myimage')

builder.run(['docker', 'pull', 'ubuntu:18.04'])
builder.run(['apt-get', 'update']) 
builder.run(['apt-get', 'install', '-y', 'python3'])
builder.run(['pip3', 'install', 'flask'])

builder.save()

This will execute each RUN command on the host and capture its output into a layer; the resulting myimage.tar contains the layers for the image (we also lean on docker pull as a shortcut to fetch the base image). A real builder would instead run each step inside the base image’s root filesystem, for example via chroot, and snapshot the filesystem diff as the layer.
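A rough sketch of that chroot approach, with run_in_rootfs as a hypothetical helper (capturing the resulting filesystem diff is left out):

import subprocess

def run_in_rootfs(rootfs, cmd):
  # Run one build step inside the unpacked base image via chroot.
  # Requires root. A real builder would snapshot the filesystem
  # changes afterwards to form the layer.
  subprocess.run(['chroot', rootfs] + cmd, check=True)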

Implementing the Container Runtime

To run containers, we need to implement a container runtime that can launch processes in isolated environments. Linux provides namespaces and control groups (cgroups) that allow partitioning resources between processes.
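Before reaching for a runtime, it helps to see the raw primitive. This minimal sketch (which assumes root privileges) calls libc’s unshare(2) directly to give a child process its own UTS namespace, so it can have a private hostname without affecting the host:

import ctypes
import os

CLONE_NEWUTS = 0x04000000  # new UTS (hostname) namespace

libc = ctypes.CDLL(None, use_errno=True)

pid = os.fork()
if pid == 0:
  # Child: detach into a new UTS namespace and set a private hostname
  if libc.unshare(CLONE_NEWUTS) != 0:
    raise OSError(ctypes.get_errno(), 'unshare failed')
  libc.sethostname(b'container', len('container'))
  os.execvp('hostname', ['hostname'])  # prints "container"
else:
  os.waitpid(pid, 0)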

Rather than wiring up every namespace and cgroup by hand, we can delegate to the runc binary (the same low-level OCI runtime Docker itself uses), shelling out to it from Python:

import json
import os
import subprocess

class Container:

  def __init__(self, image, cmd, name):
    self.image = image
    self.cmd = cmd
    self.name = name
    self.bundle = os.path.join('/var/lib/pycontainer', self.name)

  def start(self):
    # Unpack the image tarball into the bundle's rootfs directory
    rootfs = unpack_image(self.image, os.path.join(self.bundle, 'rootfs'))

    # Generate a default OCI runtime spec (config.json), then patch it
    subprocess.run(['runc', 'spec'], cwd=self.bundle, check=True)
    config_path = os.path.join(self.bundle, 'config.json')
    with open(config_path) as f:
      config = json.load(f)
    config['process']['args'] = self.cmd
    config['root'] = {'path': 'rootfs', 'readonly': True}
    with open(config_path, 'w') as f:
      json.dump(config, f)

    # runc sets up the namespaces and cgroups and runs the process
    subprocess.run(['runc', 'run', '--bundle', self.bundle, self.name],
                   check=True)

  def stop(self):
    subprocess.run(['runc', 'kill', self.name, 'SIGTERM'], check=True)

  def delete(self):
    subprocess.run(['runc', 'delete', self.name], check=True)

The start method extracts the root filesystem from the image tarball, writes an OCI runtime spec, and has runc spawn the container process inside new namespaces. stop and delete manage the container lifecycle.
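The unpack_image helper is not part of any library; here is a minimal sketch, assuming each layer in the tarball can simply be extracted in order into one directory (so later layers overwrite earlier ones):

import os
import tarfile

def unpack_image(image_tar, dest):
  # Extract the image tarball into a single rootfs directory.
  # Extracting layers in order approximates Docker's layer stacking,
  # where later layers shadow files from earlier ones.
  os.makedirs(dest, exist_ok=True)
  with tarfile.open(image_tar) as tar:
    tar.extractall(dest)
  return dest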

With this, we can start a container from the image we built earlier:

container = Container('myimage.tar', ['python3', 'app.py'], 'mycontainer')
container.start()

This will launch app.py isolated inside the container’s namespaces, with its filesystem set up according to the image.

Adding Container Networking

For networking, we want containers to have their own virtual interfaces so they can communicate with each other.

The simplest approach is to shell out to the ip and iptables command-line tools to create a network namespace, a virtual interface pair, and forwarding rules when starting containers:

import subprocess

def sh(cmd):
  subprocess.run(cmd, check=True)

class Container:

  def start(self):
    # Create a network namespace for the container
    sh(['ip', 'netns', 'add', self.name])

    # Create a veth pair and move one end into the namespace as eth0
    sh(['ip', 'link', 'add', f'veth-{self.name}', 'type', 'veth',
        'peer', 'name', 'eth0', 'netns', self.name])

    # Assign an address and bring the interface up inside the namespace
    sh(['ip', 'netns', 'exec', self.name,
        'ip', 'addr', 'add', '10.0.0.2/24', 'dev', 'eth0'])
    sh(['ip', 'netns', 'exec', self.name,
        'ip', 'link', 'set', 'eth0', 'up'])

    # Allow traffic to be forwarded to and from the container
    sh(['iptables', '-A', 'FORWARD', '-i', f'veth-{self.name}', '-j', 'ACCEPT'])
    sh(['iptables', '-A', 'FORWARD', '-o', f'veth-{self.name}', '-j', 'ACCEPT'])

    # ...then start the container process as before

This gives each container its own eth0 on a private subnet, isolated inside its network namespace, while the iptables rules allow traffic to be forwarded between the container’s veth interface and the rest of the system. (The address above is hardcoded; in practice each container needs a unique one.)
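A simple in-memory allocator for handing out those unique addresses might look like this (hypothetical helper; a real engine would persist leases):

import ipaddress

SUBNET = ipaddress.ip_network('10.0.0.0/24')
_hosts = SUBNET.hosts()
next(_hosts)  # reserve 10.0.0.1 for the host-side gateway

def allocate_ip():
  # Hand out the next free address on the container subnet.
  # State lives only in this process; a real engine would track
  # leases on disk so restarts don't reuse addresses.
  return str(next(_hosts))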

We can test connectivity by starting two containers and pinging between them (assuming we track each container’s assigned ip and add the exec helper sketched below):

container1 = Container(...) 
container2 = Container(...)

container1.start()
container2.start()

container1.exec(['ping', '-c', '3', container2.ip])
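The exec helper used above is not defined yet; a minimal sketch delegates to runc’s built-in exec command:

class Container:

  def exec(self, cmd):
    # Run an additional process inside the running container
    subprocess.run(['runc', 'exec', self.name] + cmd, check=True)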

Persistent Storage with Volumes

For persistent storage, we want to allow containers to mount host directories as data volumes.

The OCI runtime spec makes this easy - bind mounts are just entries in the mounts section of the config.json we generate for runc:

mount = {
  'type': 'bind',
  'source': '/host/directory',
  'destination': '/container/directory',
  'options': ['bind', 'rw']
}

config['mounts'].append(mount)  # before writing config.json back out

This will bind mount /host/directory into the container at /container/directory, allowing the container to persist data.

We can improve the developer experience by exposing this through a simple volume parameter:

container = Container(..., volumes={'/data': '/usr/app/data'})

The container runtime would handle mapping this to the appropriate bind mount.
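A sketch of that mapping, assuming volumes is a dict of host paths to container paths:

def volume_mounts(volumes):
  # Translate {'/host/path': '/container/path'} into OCI mount entries
  return [
    {
      'type': 'bind',
      'source': src,
      'destination': dst,
      'options': ['bind', 'rw'],
    }
    for src, dst in volumes.items()
  ]

config['mounts'].extend(volume_mounts(volumes))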

Implementing a CLI

So far we have a Python API to build images and run containers. To make this tool more usable, we should add a command line interface.

We can use the argparse module with subparsers, so each command can take its own flags:

import argparse

parser = argparse.ArgumentParser(prog='container')
subparsers = parser.add_subparsers(dest='command', required=True)

build = subparsers.add_parser('build')
build.add_argument('-t', '--tag', required=True)
build.add_argument('path')

run = subparsers.add_parser('run')
run.add_argument('-d', '--detach', action='store_true')
run.add_argument('--name')
run.add_argument('image')

subparsers.add_parser('stop').add_argument('name')
subparsers.add_parser('rm').add_argument('name')

args = parser.parse_args()

if args.command == 'build':
  ...  # invoke the ImageBuilder with args.tag and args.path
elif args.command == 'run':
  ...  # create and start a Container
# etc...

This allows us to expose familiar Docker-style commands:

# Build image
$ container build -t myimage .

# Run container
$ container run -d --name mycontainer myimage

# Stop running container
$ container stop mycontainer 

# Remove container
$ container rm mycontainer

We can continue expanding the CLI to implement more Docker functionality, such as image tagging, container listing, logs, and exec.

Conclusion

In this guide, we built a simple Docker-like container engine in Python using Linux namespaces, cgroups, and iptables. The key components include:

  • An image builder to generate root filesystem tarballs from Dockerfiles
  • A container runtime that shells out to runc to launch isolated processes
  • Networking using virtual interfaces and iptables rules
  • Volumes to allow binding host directories into containers
  • A CLI for users to build, run and manage containers

This covers the foundational aspects of building a container engine. Additional work could include:

  • Expanding the CLI to cover more Docker commands
  • Adding image distribution using a registry
  • Implementing Swarm-style orchestration across multiple hosts
  • Adding security features like user namespaces, AppArmor, and seccomp

While still very basic, this demonstrates how Docker’s container runtime could be implemented in Python. The modular design allows each component to be improved and expanded independently. With the power of Python and the Linux primitives underneath, a working container engine fits in a surprisingly small amount of code.