Since the announcement of Docker 1.12 Swarm mode at DockerCon 16, many people including me got the same questions:
- How large does the Swarm mode scale?
- Where is the Swarm limitation?
To answer these questions, I wanted to form the largest possible Swarm cluster. I heard from many sources that Swarm could support as large as 1000 nodes without any problems. If I could identify Swarm limitations in real condition, I could notify the Docker team. This would help to enhance Swarm and get a better orchestration system!
I started the project called Swarm2k to form a 2,000 nodes cluster by crowdsourcing them.
In less than 5 days, we reached a total of 2,384 servers for the experiment. Scaleway committed 1,200 nodes, the major part of the cluster and we've got many other contributions from the Docker community from North America, Europe and Asia.
We kicked off the project for the pre-test round 5 hours before the planned experiment time with the Scaleway team. A set of 3 managers were located in New York, while massive numbers of Scaleway nodes were from Europe.
Everything went well and other contributors started to join.
The 1st experiment was done. We formed a 100 nodes cluster without any problems.
Reaching 2K nodes
During the joining process, I decided to start the 2nd experiment: live upgrading of each manager.
I did it the hard way by bringing down each manager one by one without demotion.
There were 3 managers
mg0 was the leader at that time, so the upgrading was done starting from
mg2. It brought down then I upgraded it from a 16GB machine to a 64GB machine. After that
mg1 was upgraded in the same way.
One of the most exciting moment was during the upgrading of
mg0, the current leader.
mg0 down during the joining process. Some contributors started to have issues to join the cluster.
mg0 went down. The leader election was started.
mg1 became the new leader.
mg0 went up. CPU usage of all managers went peak. The Raft logs seemed to be invalidated and readjusted. After the Raft recovery process was done the overall system came back and everything worked fine again.
All high CPU periods lasted around 25 minutes during the upgrade.
You can see the disconnection on the graph above:
- 19.50 - 20.00
mg2down for upgrading
- 20.10 - 20.20
mg1down for upgrading
- 20.35 - 20.45
mg0down for upgrading.
After this test, the contributors progressively joined to reach 2,000 nodes. During that time several leave/join processes happened. More than 800 rejoined nodes were falsely reported to be down. So we cleaned up the down nodes to get the correct numbers.
The time had come and we started filling with tasks our historical crowdsourcing cluster as the third experiment. We started with small number of tasks and everything seemed fine. I progressively increased the task rate from 1000 to 10,000 to finally reach 100,000. I was carefully monitoring the task failure which was significantly increasing and finally decided to stop.
Numbers of Tasks Matter
From the observation, you can go far beyond 2,300 Swarm nodes with no problem. But it doesn't make sense to go that far as you'll be capped by the number of running tasks, not the number of nodes.
The limitation of the current version of Docker Swarm 1.12-rc4 is the number of tasks handled by the managers.
This number is around 95,000.
It's actually a very large number and, in practice, you probably won't need that many tasks.
This seems to be a Swarm limitation as each node was running only 40-45 tasks as reported by contributors. The maximum number of containers allowed on the smallest node (512MB of RAM) is around 60-70 containers, there was still some room.
I would like to thank you @Scaleway for being the major part of this experiment without any single node failing!
It would have been impossible for this historical cluster to be formed without the help from Scaleway and all other contributors.
Thanks to the Docker engineering teams for this great software and all critical advises during the experiment.
A special thanks to all of our Swarm2K Heroes. Thank you very much for being together. I'm looking forwards to do this huge experiment with all of you again!!