Using Docker for HPC with Sun Grid Engine

I am wondering if it is possible to create a virtual cluster with Docker so that I can run scripts that were designed for HPC clusters using SGE cluster management. These are pretty large/complicated workflows, so it's not something I can just rewrite for, say, TORQUE/PBS. Theoretically, I should be able to trick Docker into thinking there are multiple nodes, just like my internal HPC cluster. If someone can save me the pain by telling me it can't be done, I would greatly appreciate it.

Warning: I am not a cluster admin; I'm more like the end user. I am running Mac OS X 10.9.5:

Client version: 1.7.0
Client API version: 1.19
Go version (client): go1.4.2
Git commit (client): 0baf609
OS/Arch (client): darwin/amd64
Server version: 1.7.0
Server API version: 1.19
Go version (server): go1.4.2
Git commit (server): 0baf609
OS/Arch (server): linux/amd64

bash-3.2$ boot2docker version
Boot2Docker-cli version: v1.7.0
Git commit: 7d89508

I've been using a derivative of an image (the Dockerfile is here). My steps are pretty straightforward and follow the instructions on the website:

  1. Create the VM that will host the containers:
docker-machine create -d virtualbox local
  2. Make it the active machine:
eval "$(docker-machine env local)"
  3. Generate a swarm discovery token (captured here so the later steps can use $TOKEN):
TOKEN=$(docker run --rm swarm create)
  4. Create the swarm master:
docker-machine create \
    -d virtualbox \
    --swarm \
    --swarm-master \
    --swarm-discovery token://$TOKEN \
    swarm-master
  5. Use the token to create a swarm node:
docker-machine create \
    -d virtualbox \
    --swarm \
    --swarm-discovery token://$TOKEN \
    swarm-agent-00
  6. Add another node:
docker-machine create \
    -d virtualbox \
    --swarm \
    --swarm-discovery token://$TOKEN \
    swarm-agent-01

Now here is the crazy part. When I try to point my shell at the swarm with eval "$(docker-machine env --swarm swarm-master)", I get this stupid error: Cannot connect to the Docker daemon. Is 'docker -d' running on this host?. I then tried eval $(docker-machine env swarm-master) (without --swarm) and it works, but I'm not 100% sure it's the right thing to do; a quick sanity check is sketched after the listing below:

NAME             ACTIVE   DRIVER       STATE     URL                         SWARM 
local                     virtualbox   Running   tcp://192.168.99.105:2376   
swarm-agent-00            virtualbox   Running   tcp://192.168.99.107:2376   swarm-master
swarm-agent-01            virtualbox   Running   tcp://192.168.99.108:2376   swarm-master
swarm-master     *        virtualbox   Running   tcp://192.168.99.106:2376   swarm-master (master)
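
For what it's worth, here is a quick way to check which endpoint the shell is talking to (a minimal sketch; port 3376 is docker-machine's default for the Swarm manager, 2376 for a single engine):

# With --swarm, DOCKER_HOST should point at the Swarm manager (port 3376);
# without --swarm, it points at swarm-master's own engine (port 2376).
docker-machine env --swarm swarm-master | grep DOCKER_HOST
docker-machine env swarm-master | grep DOCKER_HOST

# Against the Swarm endpoint, docker info lists every node in the cluster;
# against the plain engine, it only describes swarm-master itself.
docker info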
  7. At this point, I build my multi-container app from this YAML file:
bior:
  image: stevenhart/bior_annotate
  command: login -f sgeadmin
  volumes:
    - .:/Data
  links:
    - sge

sge:
  build: .
  ports:
    - "6444"
    - "6445"
    - "6446"

and bring it up with docker-compose up.
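
(For what it's worth, docker-compose can also start several copies of the sge service. This is just a hypothetical sketch, and each copy would still be its own isolated host as far as SGE is concerned:)

docker-compose up -d
# Hypothetically scale to three sge containers; since the ports ("6444"
# etc.) have no fixed host mapping, the copies don't collide.
docker-compose scale sge=3
docker-compose ps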

  8. And then finally, run a container from the newly built image:

docker run -it --rm dockersge_sge login -f sgeadmin
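
(Side note: that docker run starts a brand-new container from the built image, separate from the one docker-compose started. To get a shell in the compose-managed container instead, something like the following should work; the container name is a guess based on compose's <project>_<service>_N convention:)

docker exec -it dockersge_sge_1 bash   # container name assumed; check docker ps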

But here is the problem: when I run qhost, I get the following:

HOSTNAME                ARCH         NCPU NSOC NCOR NTHR  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
----------------------------------------------------------------------------------------------
global                  -               -    -    -    -     -       -       -       -       -
6bf6f6fda409            lx-amd64        1    1    1    1  0.01  996.2M   96.2M    1.1G     0.0

Shouldn't it see multiple hosts, i.e. one for each of my swarm nodes?
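
For context, qhost only lists hosts that are registered with the qmaster as execution hosts, so extra swarm VMs won't show up on their own. A minimal sketch of what registering a second container might involve (the hostname here is hypothetical, and the exact scripts vary by gridengine install):

# On the qmaster container: declare the new host (name is hypothetical).
qconf -ah sge-exec-01    # allow it as an administrative host
qconf -as sge-exec-01    # allow it as a submit host
# On sge-exec-01 itself: install and start the execution daemon so it
# registers; after that, qhost should list it as a second host.
/etc/init.d/sgeexecd start    # script name/path varies by install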

Markusmarl asked 21/7, 2015 at 2:28 Comment(0)

I assume you are running qhost inside your Docker container.

The thing with Swarm is that it doesn't combine all the hosts into one big machine (I used to think it did).

Instead, if you have, say, five one-core machines, Swarm will pick the machine running the fewest containers and run your container on that machine.

So Swarm is a controller that spreads containers across a cluster, rather than something that combines the hosts into one.
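
You can see this for yourself with something like the following rough sketch (assuming your shell is pointed at the Swarm endpoint):

# Start a few throwaway containers; Swarm places each one on some node.
for i in 1 2 3; do docker run -d --name demo-$i busybox sleep 600; done
# Under Swarm, the NAMES column in docker ps is prefixed with the node
# that got each container (e.g. swarm-agent-00/demo-1); a container
# always lives on exactly one host.
docker ps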

Hope it helps! If you have additional questions, please ask :)

UPDATE

I'm not sure if it suits you, but if you can't get there with Swarm, I would recommend Kubernetes. I use it on my Raspberry Pis. It is very cool and more mature than Swarm, with things like auto-healing and so on.

I don't know for sure, but there's probably a way of integrating Docker with Hadoop too...

Luff answered 21/7, 2015 at 7:09 Comment(3)
Thanks. That makes sense given what I am seeing. I just pinged the developer Google group to see if it is still possible through some sort of workaround (maybe not even with swarm?). – Markusmarl
Any advice on extending your comment on Kubernetes? How could one install SGE (and add nodes) in Docker with the aim of running it on Kubernetes? – Mcrae
Unfortunately, I'm not familiar with SGE, but once you have a functional SGE setup in Docker, it shouldn't be hard to convert it to Kubernetes. In fact, Kubernetes just handles the scheduling of the containers. You'll have to google it :) – Luff
