I am wondering whether it is possible to create a virtual cluster with Docker so that I can run scripts designed for HPC clusters that use SGE for cluster management. These are large, complicated workflows, so it's not something I can simply rewrite for, say, TORQUE/PBS. In theory, I should be able to trick Docker into thinking there are multiple nodes, just like on my internal HPC cluster. If it can't be done, I would greatly appreciate someone saving me the pain by telling me so.
Warning: I am not a cluster admin; I'm more of an end user. I am running on Mac OS X 10.9.5.
Client version: 1.7.0
Client API version: 1.19
Go version (client): go1.4.2
Git commit (client): 0baf609
OS/Arch (client): darwin/amd64
Server version: 1.7.0
Server API version: 1.19
Go version (server): go1.4.2
Git commit (server): 0baf609
OS/Arch (server): linux/amd64
bash-3.2$ boot2docker version
Boot2Docker-cli version: v1.7.0
Git commit: 7d89508
I've been using a derivative of an image (the Dockerfile is here). My steps are pretty straightforward and follow the instructions on the website:
- Create a machine
docker-machine create -d virtualbox local
- Make it the active machine
eval "$(docker-machine env local)"
- Get the swarm image and generate a discovery token
docker run --rm swarm create
- Create swarm master
docker-machine create \
    -d virtualbox \
    --swarm \
    --swarm-master \
    --swarm-discovery token://$TOKEN \
    swarm-master
- Use the token to create the swarm nodes
docker-machine create \
    -d virtualbox \
    --swarm \
    --swarm-discovery token://$TOKEN \
    swarm-agent-00
- Add another node
docker-machine create \
    -d virtualbox \
    --swarm \
    --swarm-discovery token://$TOKEN \
    swarm-agent-01
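The steps above rely on $TOKEN being set and repeat the same agent command for each node. A minimal sketch of how that could be scripted follows. It assumes that `docker run --rm swarm create` prints only the token on stdout; `create_swarm_token` and `create_swarm_agents` are hypothetical helper names, not part of docker or docker-machine.

```shell
# Hypothetical helpers; neither is part of docker or docker-machine.

# Print a new discovery token.
# Assumption: `swarm create` writes only the token to stdout.
create_swarm_token() {
  docker run --rm swarm create
}

# Create COUNT agents named swarm-agent-00, swarm-agent-01, ...
# Arguments: $1 = discovery token, $2 = number of agents.
create_swarm_agents() {
  token="$1"
  count="$2"
  i=0
  while [ "$i" -lt "$count" ]; do
    name=$(printf 'swarm-agent-%02d' "$i")
    docker-machine create \
      -d virtualbox \
      --swarm \
      --swarm-discovery "token://$token" \
      "$name"
    i=$((i + 1))
  done
}

# Usage (not executed here):
#   TOKEN=$(create_swarm_token)
#   create_swarm_agents "$TOKEN" 2
```

Capturing the token once and passing it explicitly should keep the master and every agent on the same discovery token.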
Now here is the crazy part. When I try to set up my environment for the swarm using this command:
eval "$(docker-machine env --swarm swarm-master)"
I get this error:
Cannot connect to the Docker daemon. Is 'docker -d' running on this host?
I then tried
eval $(docker-machine env swarm-master)
and it works, but I'm not 100% sure it's the right thing to do:
NAME             ACTIVE   DRIVER       STATE     URL                         SWARM
local                     virtualbox   Running   tcp://192.168.99.105:2376
swarm-agent-00            virtualbox   Running   tcp://192.168.99.107:2376   swarm-master
swarm-agent-01            virtualbox   Running   tcp://192.168.99.108:2376   swarm-master
swarm-master     *        virtualbox   Running   tcp://192.168.99.106:2376   swarm-master (master)
- At this point, I build my multi-container app using this yaml file:
bior:
  image: stevenhart/bior_annotate
  command: login -f sgeadmin
  volumes:
    - .:/Data
  links:
    - sge
sge:
  build: .
  ports:
    - "6444"
    - "6445"
    - "6446"
using docker-compose up
- And then finally start a container from the new image
docker run -it --rm dockersge_sge login -f sgeadmin
But here is the problem: when I run qhost, I get the following:
HOSTNAME                ARCH         NCPU NSOC NCOR NTHR  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
----------------------------------------------------------------------------------------------
global                  -               -    -    -    -     -       -       -       -       -
6bf6f6fda409            lx-amd64        1    1    1    1  0.01  996.2M   96.2M    1.1G     0.0
Shouldn't it think there are multiple CPUs, i.e. one for each of my swarm nodes?
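For reference, here is a minimal sketch of how the number of hosts SGE reports could be checked from the qhost output. It assumes the layout shown above: two header lines followed by the "global" pseudo-host row. `count_exec_hosts` is a hypothetical helper name, not an SGE command.

```shell
# Hypothetical helper: count the execution hosts listed by qhost.
# Assumption: the first two lines are the header and the dashed rule,
# and "global" is a pseudo-host rather than a real node.
count_exec_hosts() {
  awk 'NR > 2 && NF > 0 && $1 != "global" { n++ } END { print n + 0 }'
}

# Usage (not executed here):
#   qhost | count_exec_hosts
```

With the output above, only the single container host (6bf6f6fda409) would be counted.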