Docker Storage: An Introduction

There are lots of places inside Docker (both at the engine level and container level) that use or work with storage.

In this post, I’ll take a broad look at a few of them, including: image storage, the copy-on-write mechanism, union file systems, storage drivers, and volumes.

You’ll need Docker installed locally on your machine if you want to try out some of the commands in this post. Check out the official docs for how to install Docker on Linux, or our previous post showing how to install Docker on a non-Linux machine.

Let’s dive in.

Image Storage

How are Docker images stored?

Let’s imagine we want to pull a Docker image from a registry, like so:

$ sudo docker pull nginx

When you run this command, Docker will attempt to pull the nginx image from the Docker Hub, which is a bit like GitHub but for Docker images. On the Docker Hub, you can see the descriptions of Docker images and take a look at their Dockerfiles, which contain the instructions that tell Docker how to build the image from the source.
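
To give a feel for what those instructions look like, here is a minimal, hypothetical Dockerfile for an nginx-style image. It is not the official nginx Dockerfile, just an illustration of the kind of instructions you will find in one:

FROM debian:jessie
RUN apt-get update && apt-get install -y nginx
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]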

Once the command completes, you should have the nginx image on your local machine, managed by your local Docker engine.

We can verify this is the case by listing the local images:

$ sudo docker images

You should see something like this:

REPOSITORY     TAG       IMAGE ID       CREATED        VIRTUAL SIZE
alpine         latest    3e467a6273a3   3 weeks ago    4.797 MB
mysql          5.7       ea0aca21950d   5 months ago   360.3 MB
nginx          latest    5328fdfe9b8e   5 months ago   133.9 MB

Now, if we want to launch an nginx container, the process is very fast because we already have the nginx image stored locally.

We can launch it like so:

$ sudo docker run --name web1 -d -p 8080:80 nginx

This command maps port 80 of the container to port 8080 of the host machine. After it has run, you can connect to localhost:8080 to verify that nginx responds.

But what’s going on in the background, as far as this container’s file system is concerned? To understand that, we need to look at the copy-on-write mechanism.

The Copy-on-Write Mechanism

When we launch a container from an image, the Docker engine does not make a full copy of the stored image. Instead, it uses something called the copy-on-write mechanism. This is a standard UNIX pattern that provides a single shared copy of some data, until the data is modified.

To do this, changes between the image and the running container are tracked. Just before any write operation is performed in the running container, a copy of the file that would be modified is placed on the writeable layer of the container, and that is where the write operation takes place. Hence the name, “copy-on-write”.

If this wasn’t happening, each time you launched an image, a full copy of the filesystem would have to be made. This would add time to the startup process and would end up using a lot of disk space.

Because of the copy-on-write mechanism, running containers can take less than 0.1 seconds to start up, and can occupy less than 1MB on disk. Compare this to Virtual Machines (VMs), which can take minutes and can occupy gigabytes of disk space, and you can see why Docker has seen such fast adoption.
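
You can see the writable-layer side of this for yourself with the --size flag of docker ps, which shows how much data each running container has actually written, alongside the (much larger) virtual size of the underlying image. The figures below are purely illustrative, and some columns are trimmed:

$ sudo docker ps -s
CONTAINER ID   IMAGE   COMMAND                  NAMES   SIZE
6c3a7a6b4b83   nginx   "nginx -g 'daemon off"   web1    2 B (virtual 133.9 MB)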

But how is the copy-on-write mechanism implemented? To understand that, we need to take a look at the Union File System.

The Union File System

The Union File System (UFS) specialises in not storing duplicate data.

If two images have identical data, that data does not have to be recorded twice on disk. Instead, you can store the data once and then use it in many locations. This is possible with something called a layer.

Each layer is a file system, and as the name suggests, they can be layered on top of each other. Crucially, single layers containing shared files can be used in many images. This allows images to be constructed and deconstructed as needed, via the composition of different file system layers.

The layers that come with an image you pull from the Docker Hub are read-only. But when you run a container, you add a new layer on top of that. And the new layer is writable.
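
You can inspect the read-only layers that make up an image with docker history. The IDs and sizes below are illustrative and will differ on your machine:

$ sudo docker history nginx
IMAGE          CREATED        CREATED BY                                      SIZE
5328fdfe9b8e   5 months ago   /bin/sh -c #(nop) CMD ["nginx" "-g" "daemon o   0 B
e7e7054d2ea5   5 months ago   /bin/sh -c apt-get update && apt-get install    58.5 MB
9ee13ca3b908   5 months ago   /bin/sh -c #(nop) ADD file:b5391cb13172fb513d   125.1 MB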

When you write to that layer, the entire stack is searched for the file you are writing to. And if a file is found, it is first copied to the writable layer. The write operation is then performed on that layer, not the underlying layer.

This works because when reading from a UFS volume, a search is done for the file that is being read. The first file that is found, reading from top to bottom, is used. So files on the writeable layer of your container are always used.

If we were to run thousands of containers based on the same base layers, we would reap huge benefits in both startup time and disk space.

One example setup that would benefit is a web app that horizontally scales many identical web servers. Another would be a hosting company that provides the same basic image to all customers, and then only writes the data that customers add or change.

Storage Drivers

Docker has the benefit of being a complete product (the “batteries included” model) but also providing pluggability in case you want to add things.

By default, Docker ships with the AUFS storage driver. However, other storage drivers can be plugged in, such as OverlayFS, Device Mapper, Btrfs, VFS, and ZFS. They all implement image composition and the copy-on-write mechanism, among other features.

To see what storage driver your Docker engine is using, run:

$ sudo docker info

If you’re using the Docker default storage driver, you should see something like this:

Containers: 0
Images: 622
Server Version: 1.9.1
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 624
 Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.19.0-56-generic
Operating System: Ubuntu 15.04
CPUs: 2
Total Memory: 3.593 GiB
Name: mavungu-Aspire-5250
ID: 6MUZ:QTM5:GEHK:KQF5:4GUD:BQVX:NKCM:XH4M:6ESI:BGB7:6PYS:AEJY
Username: mazembo
Registry: https://index.docker.io/v1/
WARNING: No swap limit support

Notice the Storage Driver: aufs line in this output. That means we’re using the stock AUFS driver.
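
If you want to experiment with a different driver, the choice is made when the Docker daemon starts. As a rough sketch (assuming a system where you start the daemon by hand and the overlay module is available in your kernel), that would look like:

$ sudo docker daemon --storage-driver=overlay

Be aware that images and containers created under one storage driver are not visible while the daemon is running another, so this is best tried on a fresh test machine.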

That’s all we’ll say about storage drivers for now, as there’s way too much to cover in this post. If you want to know more, the official docs are a good place to start.

Let’s look at the way Docker works with app generated data.

Volumes

A volume is a directory mounted inside a container that exists outside of the union file system. Volumes are created via a Dockerfile or the Docker CLI tool. A volume can map to an existing directory on the host machine, or to a remote NFS device.

The directory a volume maps to exists independently from any containers that mount it. This means you can create containers, write to volumes, and then destroy the containers again, without fear of losing any app data.

Volumes are great when you need to share data (or state) between containers, by mounting the same volume in multiple containers. Though take note: it’s important to implement locks or some other concurrent write access protection.
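
One long-standing way to do this is the --volumes-from flag, which mounts all the volumes of an existing container into a new one. A minimal sketch, assuming a container called appdata that was started with a /data volume:

$ sudo docker run -itd --volumes-from appdata --name worker1 ubuntu /bin/bash

Both containers now see the same /data directory, which is exactly the situation where the write-access protection mentioned above matters.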

They’re also great when you want to share data between containers and the host machine, for example when accessing source code.
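
Mapping an existing host directory uses the same -v flag, this time with a host-path:container-path pair. For example, to mount a source tree from the host into a container (the paths here are only placeholders):

$ sudo docker run -it -v /home/me/myapp:/src ubuntu /bin/bash

Edits made in /home/me/myapp on the host show up immediately in /src inside the container, and vice versa.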

Another common use of volumes is when you’re dealing with large files, such as logs or databases. That’s because writing to a volume is faster than writing to the union file system, which uses the (IO-expensive) copy-on-write mechanism.

To demonstrate the power of volumes and how to use them, let’s look at two scenarios.

Running a Container With a Volume Flag

Launch a container with -v, the volume flag:

$ sudo docker run -d -v /code -p 8080:80 --name mynginx nginx

This creates a procedurally named directory (which we will look at shortly) on the host machine and then maps it to the /code directory in the container.

You can see the volume has been created and mounted with this command:

$ sudo docker inspect mynginx

You should see a long JSON-like output like this:

"Mounts": [
        {
            "Name": "12f6b6d488484c65bedcda8300166d76e6879a496ce2d0742ab23981621c8b1a",
            "Source": "/var/lib/docker/volumes/12f6b6d488484c65bedcda8300166d76e6879a496ce2d0742ab23981621c8b1a/_data",
            "Destination": "/code",
            "Driver": "local",
            "Mode": "",
            "RW": true
        }
    ],

"Image": "nginx",
        "Volumes": {
            "/code": {},
            "/var/cache/nginx": {}
        },

This output confirms the creation of the volume at the Docker engine level, as well as its mapping to the container’s /code directory. Also take note of /var/lib/docker/volumes/12f6[...]/_data, which is the volume’s path on the host. We will use this path to access our data on the host machine.

Okay, next, grab a shell inside the container:

$ sudo docker exec -it mynginx /bin/bash

Check the /code directory exists:

$ ls
bin  boot  code  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var

Change to the /code directory:

$ cd code

Write something to a test file:

$ echo Hello > myfile

And exit the container:

$ exit

Cool. So we just wrote some data to a file inside the volume mounted in our container. Let’s look inside the host directory we saw in the docker inspect output above, to see if we can find the data we wrote.

Login as the superuser, so you can access the Docker lib files:

$ sudo -i

Now, change to the directory listed in the previous docker inspect output:

$ cd /var/lib/docker/volumes/12f6b6d488484c65bedcda8300166d76e6879a496ce2d0742ab23981621c8b1a/_data

Check the contents of the directory:

$ ls
myfile

Bingo! That’s the file we created inside the container.

You can even run cat myfile if you want to check the contents are the same. Or additionally, you could modify the contents here and then grab a shell inside the container and check that it has been updated there.
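
For example, still on the host, you could append to the file and then use docker exec to read it back from inside the running container. You should see something like this:

$ echo "Hello again" >> myfile
$ docker exec mynginx cat /code/myfile
Hello
Hello again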

Create Engine Level Volumes and Storage for Transient Containers

Since Docker 1.9, it is possible to create volumes using the Docker API.

You can create a volume via the Docker API like this:

$ sudo docker volume create --name myvolume

We can check it worked like so:

$ sudo docker volume inspect myvolume
[
    {
        "Name": "myvolume",
        "Driver": "local",
        "Mountpoint": "/var/lib/docker/volumes/myvolume/_data"
    }
]

Now, let’s run a little test:

$ sudo docker run -d -v myvolume:/data busybox sh -c "echo Hello > /data/myfile.txt"

What’s happening here?

First, we launch a busybox container and mount the myvolume volume to the /data directory. Then we execute a command inside the container that writes “Hello” to the /data/myfile.txt file. After that command has run, the container is stopped.

You can modify the above command to run cat /data/myfile.txt if you want to read the data from inside the container at any point.
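
For instance, a throwaway container that mounts the same volume can read the file straight back (the --rm flag simply removes the container once the command finishes):

$ sudo docker run --rm -v myvolume:/data busybox cat /data/myfile.txt
Hello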

So, let’s see if we can find that file on our host machine.

Log in as the superuser:

$ sudo -i

Then change directory to the path listed as the Mountpoint in the output from the docker volume inspect myvolume command above.

$ cd /var/lib/docker/volumes/myvolume/_data

And again, check the contents:

$ ls
myfile.txt

You can then read this file, write to it, and so on. And everything you do will be reflected inside the container. And vice versa.

Conclusion

In this post on Docker storage, we saw:

  • How docker images are stored locally by the Docker engine
  • How the copy-on-write mechanism and the union file system optimize storage and start up time for Docker containers
  • The variety of storage drivers compatible with Docker
  • How volumes provide shared persistent data for Docker containers

This article was previously published, with the consent of the author, on the Deis blog and the Codeship blog.

About the Author

Dr Mazembo Mavungu Eddy is a Tech and Social Science Writer, Developer, Senior Research Consultant and Founder of Dielais. He lives in France and provides training and consultancy on Docker and related container technologies. Get in touch by writing to mazemb_eddy at yahoo.fr or dielainfos at gmail.com. Follow me on Twitter: @mazembo

Connecting Docker Containers, Part Two

This post is part two of a miniseries looking at how to connect Docker containers.

In part one, we looked at the bridge network driver that allows us to connect containers that all live on the same Docker host. Specifically, we looked at three basic, older uses of this network driver: port exposure, port binding, and linking.

In this post, we’ll look at a more advanced, and up-to-date use of the bridge network driver.

We’ll also look at using the overlay network driver for connecting Docker containers across multiple hosts.

User-Defined Networks

Docker 1.9.0 was released in early November 2015 and shipped with some exciting new networking features. With these changes, for two containers to communicate, all that is now required is to place them on the same network or sub-network.

Let’s demonstrate that.

First, let’s see what we already have:

$ sudo docker network ls
NETWORK ID          NAME                DRIVER
362c9d3713cc        bridge              bridge
fbd276b0df0a        singlehost          bridge
591d6ac8b537        none                null
ac7971601441        host                host

Now, let’s create a network:

$ sudo docker network create backend

If that worked, our network list will show our newly created network:

$ sudo docker network ls
NETWORK ID          NAME                DRIVER
362c9d3713cc        bridge              bridge
fbd276b0df0a        singlehost          bridge
591d6ac8b537        none                null
ac7971601441        host                host
d97889cef288        backend             bridge

Here we can see the backend network has been created using the default bridge driver. This is a bridge network, as covered in part one of this miniseries, and is available to all containers on the local host.
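
If you want more detail, such as the subnet Docker allocated and which containers are attached, you can inspect the network. The values below are illustrative; yours will differ:

$ sudo docker network inspect backend
[
    {
        "Name": "backend",
        "Id": "d97889cef288...",
        "Scope": "local",
        "Driver": "bridge",
        "IPAM": {
            "Driver": "default",
            "Config": [
                {}
            ]
        },
        "Containers": {}
    }
]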

We’ll use the client_img and server_img images we created in part one of this miniseries. So, if you don’t already have them set up on your machine, go back and do that now. It won’t take a moment.

Got your images set up? Cool.

Let’s run a server container from the server_img image and put it on the backend network using the --net option.

Like so:

$ sudo docker run -itd --net=backend --name=server server_img /bin/bash

Like before, attach to the container:

$ sudo docker attach server

If you do not see the shell, press the up arrow.

Now start the Apache HTTP server:

$ /etc/init.d/apache2 start

At this point, any container on the backend network will be able to access our Apache HTTP server.

We can test this by starting a client container on a different terminal, and putting it on the backend network.

Like so:

$ sudo docker run -itd --net=backend --name=client client_img /bin/bash

Attach to the container:

$ sudo docker attach client

Again, if you do not see the shell, press the up arrow.

Now run:

$ curl server

You should see the default web page HTML. This tells us our network is functioning as expected.

As mentioned in part one of this miniseries, Docker takes care of setting up container names as resolvable hostnames, which is why we can curl server directly without knowing the IP address.

Multiple user-defined networks can be created, and containers can be placed in one or more networks, according to application topology. This flexibility, then, is especially useful for anyone wanting to deliver microservices, multitenancy, and micro-segmentation architectures.
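
Containers do not even need to be recreated to join additional networks. Since Docker 1.9, you can attach a running container to another user-defined network, and detach it again, on the fly. A quick sketch, assuming we also want our client on a hypothetical frontend network:

$ sudo docker network create frontend
$ sudo docker network connect frontend client
$ sudo docker network disconnect frontend client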

Multi-Host Networking

What if you want to create networks that span multiple hosts? Well, since Docker 1.9.0, you can do just that!

So far, we’ve been using the bridge network driver, which has a local scope, meaning bridge networks are local to the Docker host. Docker now provides a new overlay network driver, which has global scope, meaning overlay networks can exist across multiple Docker hosts. And those Docker hosts can exist in different datacenters, or even different cloud providers!

To set up an overlay network, you’ll need:

  • A host with a kernel version of 3.16 or higher
  • A key-value store (e.g. etcd, Consul, or Apache ZooKeeper)
  • A cluster of hosts with connectivity to the key-value store
  • A properly configured Docker Engine daemon on each host in the cluster
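
That last point deserves a quick sketch. Each Docker Engine daemon in the cluster has to be told where the key-value store lives and which interface to advertise to its peers. With Consul, the daemon flags look something like the following (the address and interface name are placeholders; the script used below takes care of this for us):

$ docker daemon \
    --cluster-store=consul://192.168.99.100:8500 \
    --cluster-advertise=eth1:2376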

Let’s take a look at an example.

For the purposes of this post, I am going to use the multihost-local.sh script with Docker Machine to get three virtual hosts up and running.

This script spins up Virtual Machines (VMs), not containers. We then run Docker on these VMs to simulate a cluster of Docker hosts.

After running the script, here’s what I have:

$ docker-machine ls
NAME         ACTIVE   DRIVER       STATE     URL                         SWARM   ERRORS
mhl-consul   -        virtualbox   Running   tcp://192.168.99.100:2376
mhl-demo0    -        virtualbox   Running   tcp://192.168.99.101:2376
mhl-demo1    -        virtualbox   Running   tcp://192.168.99.102:2376

Okay, let’s rewind and look at what just happened.

This script makes use of Docker Machine, which you must have installed. For this post, we used Docker Machine 0.5.2. For instructions on how to download and install 0.5.2 for yourself, see the release notes.

The multihost-local.sh script uses Docker Machine to provision three VirtualBox VMs, install Docker Engine on them, and configure them appropriately.

Docker Machine works with most major virtualization hypervisors and cloud service providers. It has support for AWS, Digital Ocean, Google Cloud Platform, IBM Softlayer, Microsoft Azure and Hyper-V, OpenStack, Rackspace, VirtualBox, VMware Fusion®, vCloud® Air™ and vSphere®.

We now have three VMs:

  • mhl-consul: runs Consul
  • mhl-demo0: Docker cluster node
  • mhl-demo1: Docker cluster node

The Docker cluster nodes are configured to coordinate through the VM running Consul, our key-value store. This is how the cluster comes to life.

Cool. Fast-forward.

Now, let’s set up an overlay network.

First, we need to grab a console on the mhl-demo0 VM, like so:

$ eval $(docker-machine env mhl-demo0)

Once there, run:

$ docker network create -d overlay myapp

This command creates an overlay network called myapp across all the hosts in the cluster. This is possible because Docker is coordinating with the rest of the cluster through the key-value store.

To confirm this has worked, we can grab a console on each VM in the cluster and list out the Docker networks.

Copy the eval command above, replacing mhl-demo0 with the relevant host name.

Then run:

$ docker network ls
NETWORK ID          NAME                DRIVER
7b9e349b2f01        host                host
1f6a49cf5d40        bridge              bridge
38e2eba8fbc8        none                null
385a8bd92085        myapp               overlay

Here you see the myapp overlay network.

Success!

Remember though: all we’ve done so far is create a cluster of Docker VMs and configure an overlay network which they all share. We’ve not actually created any Docker containers yet. So let’s do that and test the network.

We’re going to:

  1. Run the default nginx image on the mhl-demo0 host (this provides us with a preconfigured Nginx HTTP server)
  2. Run the default busybox image on the mhl-demo1 host (this provides us with a minimal OS and tools like wget)
  3. Add both containers into the myapp network
  4. Test they can communicate

First, grab a console on the mhl-demo0 host:

$ eval $(docker-machine env mhl-demo0)

Then, run the nginx image:

$ docker run --name ng1 --net=myapp -d nginx

To recap, we now have:

  • An Nginx HTTP server,
  • Running in a container called ng1,
  • In the myapp network,
  • On the mhl-demo0 host

To test this is working, let’s try to access it from another container on another host.

Grab a console on the mhl-demo1 host this time:

$ eval $(docker-machine env mhl-demo1)

Then run:

$ docker run -it --net=myapp busybox wget -qO- ng1

What this does:

  • Creates an unnamed container from the busybox image,
  • Adds it to the myapp network,
  • Runs the command wget -qO- ng1,
  • And stops the container (we left our other containers running before)

The ng1 in that wget command is the name of our Nginx container. Docker lets us use the container name as a resolvable hostname, even though the container is running on a different Docker host.

If everything is successful, you should see something like this:

<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>

Voila! We have a multi-host container network.

Conclusion

Docker offers the advantages of lightweight, self-contained, and isolated environments. However, it is crucial that containers are able to communicate with each other and with the host network if they are going to be useful to us.

In this miniseries, we have explored a few ways to connect containers locally and across multiple hosts. We’ve also looked at how to network containers with the host network.

About the Author

Dr Mazembo Mavungu Eddy is a Tech and Social Science Writer, Developer, Senior Research Consultant and Founder of Dielais. He lives in France and provides training and consultancy on Docker and related container technologies. Get in touch by writing to mazemb_eddy at yahoo.fr or dielainfos at gmail.com. Follow me on Twitter: @mazembo

Connecting Docker Containers, Part One

Docker containers are self-contained, isolated environments. However, they’re often only useful if they can talk to each other.

There are many ways to connect containers. And we won’t attempt to cover them all. But in this miniseries, we will look at some common ways.

This topic seems elementary, but grasping these techniques and the underlying design concepts is important for working with Docker.

Understanding this topic will:

  • Help developers and ops people explore the broad spectrum of container deployment choices
  • Let developers and ops people embark more confidently on a microservice architecture
  • Empower developers and ops people to better orchestrate more complex distributed applications

Fortunately, the large number of connection options for containers enables a broad range of approaches, giving us the flexibility to choose an architecture that suits the needs of any application.

In this post, we’ll look at three of the older, more basic ways of connecting Docker containers. Using this knowledge and experience as a foundation, we’ll then move on to two newer, easier, more powerful ways in the next post.

Setup

Before we can demonstrate how containers can be connected, we need to create a pair of them for use in our examples.

The first image will be derived from a simple Ubuntu installation. It will act as a client container.

First, we create the container and attach to it:

$ sudo docker run -itd --name=client_setup ubuntu /bin/bash
$ sudo docker attach client_setup

Then, once we have a shell inside the container, we run:

$ apt-get update && apt-get install -y curl

If you do not see the shell, press the up arrow.

Now, detach from the client container using CTRL+p then CTRL+q.

Then, stop it, and commit:

$ sudo docker stop client_setup
$ sudo docker commit client_setup client_img

We now have an image called client_img to use.

The second container we want to use is, again, derived from an Ubuntu installation. But this time, we’ll modify it to run an Apache HTTP server.

First, we create it and attach, like before:

$ sudo docker run -itd --name=server_setup ubuntu /bin/bash
$ sudo docker attach server_setup

Then, once we have a shell inside the container, we install the Apache HTTP server:

$ apt-get update && apt-get install -y apache2

When the installation is complete, detach from the container using CTRL+p and CTRL+q.

Now, stop the container, and commit it:

$ sudo docker stop server_setup
$ sudo docker commit server_setup server_img

We now have two images: client_img and server_img.

Now we have this set up, we can explore the various connection possibilities.

The Docker Bridge

By default, a Docker container is isolated from other containers and the external network. Docker provides a bridge, the docker0 interface, which is created with the installation of the Docker Engine.

It is through the Docker bridge that communication is possible amongst containers and between containers and the host machine.

You can see the Docker bridge by running this command on a Docker host:

$ ifconfig docker0

You should see something like this in the output:

docker0   Link encap:Ethernet  HWaddr 02:42:a2:dc:0f:a8  
          inet addr:172.17.0.1  Bcast:0.0.0.0  Mask:255.255.0.0
          inet6 addr: fe80::42:a2ff:fedc:fa8/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1477 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2436 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:83901 (83.9 KB)  TX bytes:3606039 (3.6 MB)  

The bridge interface works locally, on a single Docker host, and is the connection mechanism behind all three approaches we cover in this post. In the next post, we’ll move on to the overlay interface, which allows us to network containers across multiple Docker hosts.

Exposing Ports

First, let’s see how we can run a server container so that it exposes port 80 (HTTP) to other containers.

To do so, we run the container with the --expose option, which tells Docker to expose the specified port number when it runs the container. An exposed port is one that other containers can reach.

Let’s run server_img as a container called server1, exposing port 80:

$ sudo docker run -itd --expose=80 --name=server1 server_img /bin/bash

We’ll name our containers sequentially (server1, server2, and so on) as we go.

Then, attach to the container:

$ sudo docker attach server1

Again, if you do not see the shell, press the up arrow.

Start the Apache HTTP server:

$ /etc/init.d/apache2 start

And get the IP address:

$ ifconfig
eth0      Link encap:Ethernet  HWaddr 02:42:ac:11:00:03  
          inet addr:172.17.0.3  Bcast:0.0.0.0  Mask:255.255.0.0

Okay, so we have an IP address of 172.17.0.3. Let’s test we can see that from a client container.

Open a second terminal.

Start the client1 container:

$ sudo docker run -itd --name=client1 client_img /bin/bash

Attach to it:

$ sudo docker attach client1

If you do not see the shell, press the up arrow.

Make a test connection to the server1 container’s IP address:

$ curl 172.17.0.3 

If everything works, you should see the HTML of the Apache HTTP server’s default page. This indicates that the client1 container is able to make connections to the server1 container’s HTTP port.

Port Binding

What if we want to expose our HTTP server to the host network, including applications on the host machine and other machines on the host network? In this scenario, we need to bind a host port to a container port.

To expose the Apache HTTP server to the host network, we need to bind port 80 on the server container to port 8080 on the host machine.

We can do this like so:

$ sudo docker run -itd -p 8080:80 --name=server2 server_img /bin/bash

The -p 8080:80 option is the thing to pay attention to here.

Now, attach to the container:

$ sudo docker attach server2 

If you do not see the shell, press the up arrow. Then start the Apache HTTP server:

$ /etc/init.d/apache2 start

Now, from the host system, visit http://localhost:8080/ and you should see the Apache HTTP server default page.

Any machine on your host network that can access port 8080 of your host machine will also be able to access this.

Linking Containers

Another approach to connecting containers involves something Docker calls linking.

When you link one container to another, Docker will make some information about the linked container available via environment variables.

Let’s take a look.

First, start the server container:

$ sudo docker run -itd --name=server3 server_img /bin/bash

Now, start the client container and link it to the server container, like so:

$ sudo docker run -itd --link server3 --name=client3 client_img /bin/bash

Notice the --link server3 option being used here.

Now attach to the client container:

$ sudo docker attach client3

And check out the available environment variables:

$ env | grep SERVER3
SERVER3_PORT_80_TCP_PROTO=tcp
SERVER3_PORT=tcp://172.17.0.2:80
SERVER3_PORT_80_TCP_PORT=80
SERVER3_NAME=/client3/server3
SERVER3_PORT_80_TCP=tcp://172.17.0.2:80
SERVER3_PORT_80_TCP_ADDR=172.17.0.2

Docker also updates the /etc/hosts file in the client container to add server3 as a local hostname pointing to the server container.
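
You can see this for yourself from inside the client3 container. The IP address will match whatever the server container was assigned; here it lines up with the environment variables above:

$ grep server3 /etc/hosts
172.17.0.2	server3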

To demonstrate this, run:

$ curl server3

You should see the same default HTML again.

Wrap-Up

In part one of this miniseries, we introduced the Docker bridge interface, which lets us connect containers on the same host.

We looked at three connection methods:

  • Connection through port exposure
  • Binding of the host port to the container port
  • Linking two containers through the link option

In part two, we’ll look at isolating containers inside user-defined networks. We’ll also introduce the overlay interface, and take a look at how to use it to network Docker containers across multiple Docker hosts. Even across datacenters and cloud providers!

About the Author

Dr Mazembo Mavungu Eddy is a Tech and Social Science Writer, Developer, Senior Research Consultant and Founder of Dielais. He lives in France and provides training and consultancy on Docker and related container technologies. Get in touch by writing to mazemb_eddy at yahoo.fr or dielainfos at gmail.com. Follow me on Twitter: @mazembo

The State of Containers and the Future of the Docker Ecosystem

Containers (and in particular, Docker) are getting ever more popular.

A recent report by O’Reilly Media and Ruxit presents interesting findings on the adoption and use patterns of containers and Docker.

For instance: the deployment of containers in production is likely to increase significantly in the short term. The report also highlights that one of the major barriers preventing production adoption has to do with the need for better operations tools. This sort of information may be crucial in guiding decision making on investment and innovation priorities.

This post considers some key aspects of the report. I first present the approach used for the research, then highlight the main findings. I conclude with a quick comparison to similar research reports published during the course of this year.

Approach

O’Reilly Media and Ruxit invited individuals from the O’Reilly community to share how their organisations currently use (or plan to use) containers, which container technologies and infrastructures they opt for, and what motivations and challenges are associated with opting to use containers.

Some 138 self-selected participants responded. They came from a range of industries and represented a broad spectrum of company sizes. About half of the respondents came from organisations with fewer than 500 employees, while 13% of responses came from individuals working at companies or organisations with over 10,000 employees.

These demographic data alone confirm that containers are being adopted by companies of a broad range of sizes.

Findings

Who’s Using Containers and What Sort of Containers Are Being Used?

The survey reveals containers are used by 65% of participants, and predominantly by those companies running more than 10 hosts in their infrastructure.

As to the type of container used, Docker is the most popular at 78%, followed by LXC at 24%, rkt at 16%, and 11% for other technologies, including Cloud Foundry’s Warden and Microsoft’s Hyper-V.

What Infrastructure Are Containers Being Deployed On?

The likelihood of running containers did not significantly vary depending on the number of hosts a company was operating.

With regards to the underlying OS being used to run containers, EC2 Amazon Linux and the Ubuntu/Debian distributions were found to be the most popular, followed by CentOS and RedHat Enterprise Linux.

As for the OS running inside the containers, Ubuntu/Debian was the most popular choice, with 67% of respondents using or planning to use it.

Reasons for Using Containers

The survey asked participants to select the factors that motivated them to start using containers.

These were the responses:

  • Faster or easier deployment (85%)
  • Flexibility in deployment (62%)
  • Better isolation (54%)
  • Architectural reasons, i.e. microservices (48%)
  • Cost savings (30%)

Given that Docker is by far the most predominant container technology, participants’ emphasis on deployment and workflow gains reflects how Docker has successfully simplified the process of building, packaging, and shipping applications across multiple environments and infrastructures.

Further analysis of participants’ views on how containers impact infrastructure reveals containerisation has simplified and sped up deployment, making it easier and faster to test, iterate, and rollback if required.

Containerisation has also improved operational management by making it easier to automate deployment and integrate with DevOps tools. This improvement in the workflow has resulted in more frequent and regular deployments.

The increased capacity to run multiple containers per host has resulted in fewer hosts and has thus reduced infrastructure costs. However, container adoption has meant that companies have to run their own on-premises private registries and significantly change their monitoring and alerting infrastructure.

In addition, for companies running a small number of services, containerisation may result in increased complexity, and does not necessarily lead to a reduced number of hosts or the associated cost savings.

Production Adoption Lags Behind

The survey finds adoption in production (currently 40%) is slower than for development (86%) and testing (64%) environments.

However, production deployment of containers is slated to increase significantly in the short term, with just over half of all respondents (53%) reporting they intended to use containers in production within the next 6-12 months.

These figures demonstrate a broad emerging consensus that containers are production ready.

Why Aren’t Containers Being Widely Adopted in Production?

In response to this question, the following challenges were identified:

  • Technology maturity (56%)
  • Orchestration (50%)
  • Monitoring (46%)
  • Automation (40%)

A number of participants also indicated it was difficult to convince clients, management, or the development team of the benefits of adopting containers.

However, as the container ecosystem matures and better tools are developed, many of these issues will be addressed, likely leading to increased adoption.

Conclusion

The O’Reilly Media and Ruxit Survey is by no means the only attempt at measuring container adoption and use trends. Over the past months, several other organisations have produced similar data, including: Datadog, ClusterHQ and DevOps.com, vmblog.com and StackEngine, and New Relic on Docker. While these research initiatives differ in their approach and focus areas, they do not contradict the general picture emerging from the O’Reilly Media and Ruxit survey report.

Across the board, this body of research seems to agree. Containers are getting more popular. Docker dominates the container ecosystem. There is a lag in production deployments, but this should increase in the near future. And there is a need for better tooling to facilitate orchestration, automation, and monitoring.

Deis, which provides an open source Heroku-inspired workflow on top of Docker, is one such response to this need.

About the Author

Dr Mazembo Mavungu Eddy is a Tech and Social Science Writer, Developer, Senior Research Consultant and Founder of Dielais. He lives in France and provides training and consultancy on Docker and related container technologies. Get in touch by writing to mazemb_eddy at yahoo.fr or dielainfos at gmail.com. Follow me on Twitter: @mazembo

DockerCon 2015 and the Future of the Container Technology

In my previous post, I told the story of how I came to find out about Docker, and how Docker has improved my development practice as a freelancer. In this post, I want to talk about why I’m excited by the interesting announcements, hot tools, and cool demos made public during the last Docker conference, DockerCon, held in San Francisco this June.

The theme of this year’s DockerCon was “Docker in production”. There were around two thousand attendees. That’s four times as many as the previous year. And the previous year was the first ever DockerCon. This increase in attendance is a reflection of the phenomenal success and rising popularity of Docker.

Here are my take-aways from the conference.

Runtime Solutions

Keynote speeches highlighted major achievements of the past year.

Developers and ops folk are familiar with the phrase: “It works on my machine, but not on the production server!” This issue (we call it the runtime problem) is solved with Docker’s container infrastructure. And the challenge of packaging and distributing images is solved with the establishment of a Docker registry for storing and distributing Docker images, i.e. Docker Hub.

In addition to choosing between private and public repositories for Docker images, the Docker company now offers a commercial product consisting of an on-premise Docker registry suitable for companies with a need for ownership and security.

Building and Deploying Distributed Applications

To address the challenges of building and deploying distributed applications in a scalable, reliable, and secure way, a number of things have emerged:

  • Docker Compose: makes it easy to describe various components and services of a distributed application so the application can be deployed easily and in a repeatable way taking advantage of Docker’s container infrastructure.
  • Docker Machine: makes it possible to declare a cluster of machines and make them easily configurable so the task of deploying and scaling distributed applications with all their services is smoothly and transparently managed.
  • Docker networking: is the underlying networking infrastructure which Docker has re-worked in its effort to re-engineer the way distributed applications are approached by developers and ops folks.
  • Docker Swarm: provides a way to orchestrate smooth integration between the different tools in the Docker ecosystem. It provides a transparent way to move an application from development to production, taking full advantage of Docker.
  • Docker Notary: strengthens the security of images and the Docker platform. This tool is still being evaluated, but promises to provide tightened security and trust in the hosting and distribution of Docker images and applications code.

While the number of tools being developed around Docker keeps increasing, the Docker company and early adopters call for “incremental” change, meaning: companies and developers are advised to pick the tool that will improve their processes one at a time without feeling the pressure to swap out all of their existing tool sets. Examples of Docker integration with existing tool sets are also being investigated and promoted.

Cool Demos

There were a number of cool demos showing how some of the new tools work, including:

  • Use of Docker machine on a cloud provider where an application is deployed and later scaled to serve more requests.
  • Transparent migration of a containerised application from one region to another without shutting down the application.
  • Demonstration of Notary’s ability to protect against untrusted sources injecting malicious code into application images.

Feedback From Cloud Providers and End-Users

A number of cloud providers and Docker users provided feedback on using Docker in production or integrating Docker in their existing development and deployment processes.

Microsoft, IBM, and Amazon showed how they used—and were planning to use—the Docker platform to offer to their customers Docker-compatible containers and related tools from development to the deployment stage. Business Insider and the General Services Administration (GSA) of the US Federal Government shared positive experiences integrating Docker into their production software stacks.

Looking to the Future

Lastly, important announcements were made concerning the next tools the Docker team will be working on and the future direction of the Docker project.

Solomon Hykes, Docker CTO, announced the extraction of a number of plumbing tools from the Docker code by popular demand. Of the tens of thousands of lines of code that constitute the Docker platform, roughly 50% is plumbing! Docker has plumbing for interacting with both Linux and Windows native capabilities; it has plumbing for networking; service discovery; master election; security; and more. These plumbing tools have been isolated and can now be used outside the Docker platform. Security work, and more precisely the integration of the work that has been done on Notary, will also be prioritized.

Docker announced the building of a tool that integrates all of the tools together in a unified and transparent manner. The product is called Project Orca and a demonstration of working code was provided, but the project is still under active development.

There were also developments in how the Docker open source project will move forward. Given that Docker has emerged as a de facto standard in container software, major players in the industry have agreed to promote it to a de jure standard, i.e. one formalised in a documented open standard.

For this to happen, a number of decisions have been made. The most important of which was the creation of the Open Container Initiative (OCI) which is now responsible for promoting and protecting the standard.

The Docker container code, renamed runC, has been donated as the reference code for the OCI. In partnership with the Linux Foundation, the founding members of the OCI are going to put in place a light governance model which will steer the work in this area without becoming a barrier to innovation and creativity.

Conclusion

As a freelance developer who has often had to worry about the full application stack, I warmly welcome the addition of Docker to my workflow.

Docker has provided a unified, repeatable, reliable, and broadly standardised way of dealing with the infrastructural aspect of building, running, and deploying applications.

My excitement is further reinvigorated by the large array of tools and products being developed around Docker, many of which were unveiled at DockerCon 2015. These include Docker Machine, Docker networking, Docker Swarm, Notary, and Project Orca.

The fact that there is a broad consensus in the industry to promote Docker as the standard container model is an additional reason for optimism.

About the Author

Dr Mazembo Mavungu Eddy is a Tech and Social Science Writer, Developer, Senior Research Consultant and Founder of Dielais. He lives in France and provides training and consultancy on Docker and related container technologies. Get in touch by writing to mazemb_eddy at yahoo.fr or dielainfos at gmail.com. Follow me on Twitter: @mazembo

Why I’m Excited About Docker

Docker is already being used in production by big companies, has received a tremendous amount of publicity, and continues to spur excitement in the tech industry. For a project that is only two years old, this success is unprecedented.

In the first post of this miniseries, I’d like to share the reasons behind my excitement about Docker and explain why I think Docker has made my work as a developer easier. In the second post in this miniseries, I will summarise what I learnt from the interesting announcements and demos at DockerCon, which took place in San Francisco in June.

My Encounter With Docker

Docker is a relatively new technology. In two years of existence, it has attracted a great deal of publicity and excitement. And many believe, as I do, that it’s going to shape the industry for the years to come.

I came across this technology six months ago when, at a Python meetup, I saw another developer wearing a Docker t-shirt. After the meetup, I went ahead and looked up “Docker” and discovered the vibe and dynamism behind this community.

As a freelance developer and a former network and system administrator, I’m really excited by the possibilities Docker opens up. I remember how challenging and time-consuming it used to be: setting up your own development environment, testing various alternative environment configurations, reproducing production environments on a local Linux box, setting up your production environment on a VPS and deploying applications in different ways depending on your VPS provider, and so on.

These operations were repetitive, error prone, and time consuming. One could spend an entire week—or even more—just working on these operational tasks that were far removed from the real business of your application.

Docker changed all this in a dramatic way.

And my evaluation is coming from the perspective of a freelance developer who only deals with small projects most of the time. For large-scale or enterprise operations, Docker represents an even greater step forward in terms of higher velocity and a more unified, integrated, repeatable, and reliable process for building, running, and delivering software.

Docker Features I Am Excited About

There are many features of Docker that make developers and ops people alike happier in their jobs. These are some of the features I appreciate most.

Runtime

Docker solves the runtime problem by offering repeatable, declarative environments, defined in a file called Dockerfile, or docker-compose.yml in the case of a multi-container application.

Using these definitions, Docker can bring your environment to life anywhere you want to run it. You build your environment once, but can run it everywhere. For an overview of Docker Compose, check out the tutorial.
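
As a taste of what such a multi-container definition looks like, here is a minimal, hypothetical docker-compose.yml for a web application backed by PostgreSQL. The service names and port are just examples:

web:
  build: .
  ports:
    - "8000:8000"
  links:
    - db
db:
  image: postgres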

Experimentation

Docker makes it easier to play with various technologies. With the Docker Hub, ready-made images are good to go. You can easily play with tools such as Hadoop, Redis, ElasticSearch, RabbitMQ, and so on. You can learn how they behave in a specific setup and just destroy the container once you’re done experimenting.

This kind of exercise used to require a lot of time and infrastructure. With a lightweight container like Docker, it’s now a trivial operation. The increased capacity to learn and experiment with new technologies in containers can be transformational from a creative development perspective.
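
As a quick sketch of what that looks like in practice, you can pull and run Redis, poke at it, and throw it away again in under a minute:

$ docker run -d --name redis-test redis
$ docker exec -it redis-test redis-cli ping
PONG
$ docker rm -f redis-test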

Service Independence

With major cloud providers partnering with Docker in the provision of container services in the cloud, developers have increased opportunities and a broader spectrum of choices of cloud services.

Moving from one cloud provider to another becomes less painful as more components of the container ecosystem become standardised. Not to mention the usual cost benefits of swapping amortised physical hardware costs for the cloud’s pay-for-what-you-use model, meaning your app scales down in cost as it scales down in traffic.

Improved OSS Experience

I take pleasure in getting open source applications up and running, seeing how they run, learning from them, and seeing what can be reused.

A stumbling block has always been how difficult it is to assemble the services or micro-components that the application relies on. With Docker, open source project maintainers are sharing not only the application code, but also a pre-made running environment as defined in the project’s Dockerfile.

Traditionally, open source applications have a bad reputation for taking hours, or even days, to get properly set up. But with Docker, it is just a matter of building and running the Docker image. This represents a great step forward for ease of use and contributor onboarding.

If you want a feel of how easy this workflow is, check out the youtube-audio-dl project on GitHub. The README has instructions for getting started with Docker and Docker Compose.

This reproducibility also has a knock-on effect for collaboration and bug squashing, because everyone receives not only the application code, but also the environment it runs in. So the usual excuse “it works on my machine” is a thing of the past.

Development Model

Another benefit of Docker is that it encourages you to shift your thinking towards new and improved development models for the cloud, for example 12 Factor Applications, and towards modularised architectures, i.e. microservices.

Following the 12 Factor Application model, your app is going to be built for the cloud (and the associated cost, scaling, availability, and fault-tolerance models) right from the start.

Developing with microservices can open up many opportunities. For example, it allows you to develop each part of your application with the stack that is suitable for the job. Not only will you make your job easier, but you also separate out different parts of your app so they can be worked on independently by people with different skill sets.

How Does Docker Affect My Development Workflow?

With a Docker-powered development environment, it is no longer necessary to set up a separate virtual environment with the virtualenv tool (for Python) in order to isolate application packages from system modules.

Whether it is a Flask or a Django application that I am working on, the description inserted in the Dockerfile (for the image) and the requirements.txt (for the necessary modules) will ensure that I am running the right stack in an isolated manner.
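
A minimal sketch of what such a Dockerfile might look like for a Django project (the base image, port, and command are examples only, not a recommendation):

FROM python:2.7
WORKDIR /code
COPY requirements.txt /code/
RUN pip install -r requirements.txt
COPY . /code/
CMD ["python", "manage.py", "runserver", "0.0.0.0:8000"]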

The ease with which one can fire up a new development environment, try it out, change it, destroy it, and design a new one is just amazing. As a developer, I spend more of my time attending to the real business of the application, instead of troubleshooting installation issues.

With the introduction of Docker in my development environment, the workflow and commands remain generally unchanged. There’s no need to learn a totally new vocabulary to do the development work.

For instance, to fire up a new Rails application, I run:

$ docker-compose run web rails new . --force --database=postgresql --skip-bundle

To create a database:

$ docker-compose run web rake db:create

The same experience applies to the Django framework.

To start a new project, I run:

$ docker-compose run web django-admin.py startproject blog .

And to set up my database and run migrations, I run:

$ docker-compose run web python manage.py syncdb

To a Rails or Django developer, these commands should look very familiar!

This makes the development process in a containerised environment smooth and easy to adapt to.

There are, of course, still issues that surface with Ruby on Rails. For instance, whenever your Gemfile changes, perhaps because you have added a new gem, bundle installs not only the new gem but all the gems in your Gemfile. This isn’t the most efficient way to set up the development environment. However, various approaches have been suggested for caching bundle install with Docker, as sketched below.
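
The usual trick is to copy the Gemfile (and Gemfile.lock) into the image and run bundle install before copying the rest of the application code, so that Docker’s build cache only reruns the bundle step when the Gemfile actually changes. A rough sketch of that Dockerfile ordering:

FROM ruby:2.2
WORKDIR /app
COPY Gemfile Gemfile.lock ./
RUN bundle install
COPY . ./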

For more information on how to set up Docker Compose in a multi-container development environment, check out the Django docs or the Rails docs.

Finally, running my database-driven web application in multiple Docker containers using Docker Compose gives me flexibility and scalability I did not have before. My web and database layers can evolve and scale along different paths.

Conclusion

Docker is very exciting.

Innovations brought about by Docker are likely to have an enduring impact on how we develop and deploy applications.

As an individual freelance developer, several things make my life easier:

  • Better, easier runtimes
  • Increased ability to experiment with new technologies
  • Increased independence from service providers
  • Enhanced OSS use and collaboration
  • A shift towards improved development models

Docker also works well with frameworks like Django and Rails and the Docker Compose commands integrate well with the existing commands from these frameworks.

In part two of this mini series, I will share with you some highlights of what I saw at DockerCon and explain why I am excited by these developments.

This article was previously published, with the consent of the author, on the Deis blog.

About the Author

Dr Mazembo Mavungu Eddy is a Tech and Social Science Writer, Developer, Senior Research Consultant and Founder of Dielais. He lives in France and provides training and consultancy on Docker and related container technologies. Get in touch by writing to mazemb_eddy at yahoo.fr or dielainfos at gmail.com. Follow me on Twitter: @mazembo