hpc Questions
1
Solved
In an sbatch script, you can directly launch programs or scripts (for example, an executable file myapp), but in many tutorials people use srun myapp instead.
Despite reading some documentation on th...
1
Solved
The terminology used in the sbatch man page might be a bit confusing. Thus, I want to be sure I am getting the options set right. Suppose I have a task to run on a single node with N threads. Am I ...
Decolorize asked 2/7, 2018 at 15:45
5
Can someone elaborate on the differences between the OpenMPI and MPICH implementations of MPI?
Which of the two is the better implementation?
1
Solved
If one is running an array job on a slurm cluster, how can one restart a failed worker job?
In a Sun Grid Engine queue, one can add #$ -r y to the job file to indicate the job should be restarted ...
Neuropathy asked 2/6, 2018 at 22:34
1
Solved
I am wondering how I might be able to run 500 parallel jobs in R using the Rscript function. I currently have an R file that has the header on top:
args <- commandArgs(TRUE)
B <- as.nu...
Rumery asked 2/6, 2018 at 5:50
1
Solved
Let's say I'm generating tuples and I want to concatenate them as they come. How do I do this? The following does element-wise addition:
if ts = ("foo", "cat"), t = ("bar", "dog")
ts += t gives t...
Hl asked 26/1, 2018 at 0:30
2
Solved
I am trying to get a hybrid OpenMP / MPI job to run so that OpenMP threads are separated by core (only one thread per core). I have seen other answers which use numa-ctl and bash scripts to set env...
2
Solved
I am using a cluster with environment modules. This means that I must specifically load any R version other than the default (2.13) so to load R 3.0.1, I have to specify
module load R/3.0.1
R
I...
Trinitrotoluene asked 1/10, 2013 at 21:35
2
Solved
I found some very similar questions which helped me arrive at a script which seems to work; however, I'm still unsure if I fully understand why, hence this question.
My problem (example): On 3 node...
Execrative asked 25/8, 2017 at 14:12
1
Solved
I am working on Python code with MPI (mpi4py) and I want to run my code across many nodes (each node has 16 processors) in a queue on an HPC cluster.
My code is structured as below:
from mpi...
Neal asked 25/5, 2017 at 4:43
2
Solved
My program is well-suited for MPI. Each CPU does its own, specific (sophisticated) job, produces a single double, and then I use an MPI_Reduce to multiply the result from every CPU.
But I repeat t...
2
Solved
Usually when I use mpirun, I can "overload" it, using more processors than there actually are on my computer. For example, on my four-core Mac, I can run mpirun -np 29 python -c "print 'hey'" no p...
2
Solved
In terms of performance, what are the benefits of allocating a contiguous memory block versus separate memory blocks for a matrix? I.e., instead of writing code like this:
char **matrix = malloc(s...
Dorran asked 10/6, 2014 at 22:36
7
We work on scientific computing and regularly submit calculations to different computing clusters. For that, we connect using a Linux shell and submit jobs through SGE, Slurm, etc. (it depends on t...
Freetown asked 15/9, 2016 at 13:39
1
Solved
Let
x = matrix(rnorm(1000000), nrow = 5000)
I would like to compute matrix multiplication with its transpose: x %*% t(x).
After googling, I found that a possibly faster way of doing the above is
tc...
1
Solved
I have downloaded COMPSs 1.4 and some test programs from http://www.bsc.es/computer-sciences/grid-computing/comp-superscalar/downloads-and-documentation and I am trying to test them. Java execution...
Burka asked 28/7, 2016 at 10:39
1
I am running a Python script on a Windows HPC cluster. A function in the script uses starmap from the multiprocessing package to parallelize a certain computationally intensive process.
When I run...
Dossier asked 14/5, 2016 at 0:24
2
Solved
This post is closely related to another one I posted some days ago. This time, I wrote a simple code that just adds a pair of arrays of elements, multiplies the result by the values in another arra...
Rosewater asked 27/10, 2011 at 16:38
3
I have a problem where I must analyse 500C5 combinations (255244687600) of something. Distributing it over a 10-node cluster where each node processes roughly 10^6 combinations per second means ...
Sundsvall asked 15/1, 2011 at 8:15
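The arithmetic behind the excerpt above can be checked with a short sketch. It assumes that the quoted rate of roughly 10^6 combinations per second applies to each of the 10 nodes, and that the work splits evenly with no communication overhead; both are assumptions, since the excerpt is cut off before it states a runtime. The function name estimated_runtime_seconds is made up for illustration, and math.comb needs Python 3.8 or later.

```python
import math

# Figures taken from the question excerpt; the per-node rate is the
# asker's rough estimate, applied here to each node (an assumption).
TOTAL_COMBINATIONS = math.comb(500, 5)  # "500C5"
NODES = 10
RATE_PER_NODE = 10**6  # combinations per second, per node


def estimated_runtime_seconds(total, nodes, rate_per_node):
    """Ideal wall-clock time: perfect load balance, zero overhead."""
    return total / (nodes * rate_per_node)


seconds = estimated_runtime_seconds(TOTAL_COMBINATIONS, NODES, RATE_PER_NODE)

print(TOTAL_COMBINATIONS)        # 255244687600, matching the excerpt
print(round(seconds / 3600, 2))  # 7.09 (hours, under the assumptions above)
```

Under these idealised assumptions the full enumeration takes about 7 hours; real runs would be slower once scheduling and result collection are included.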
1
Solved
I have developed a high performance Cholesky factorization routine, which should have peak performance at around 10.5 GFLOPs on a single CPU (without hyperthreading). But there is some phenomenon w...
Bleary asked 1/4, 2016 at 18:41
2
Solved
Is there a way to instruct GCC (I'm using 4.8.4) to unroll the while loop in the bottom function completely, i.e., peel this loop? The number of iterations of the loop is known at compilation time:...
Frescobaldi asked 20/3, 2016 at 5:36
1
Solved
I am running with COMPSs the Increment application shown in the COMPSs Sample Application Manual. I have added the -m flag to enable the monitoring feature:
$ runcompss -m --debug increment.Increm...
Thirteen asked 11/3, 2016 at 14:46
1
Solved
I am learning COMPSs. Until now, everything has been working really well, but I only executed the examples given in the manual.
Now that I want to run my own test application, I can't get it to wo...
Dimension asked 11/3, 2016 at 15:24
1
Solved
I have a cluster which has a shared disk between the different nodes.
How can I configure COMP superscalar to take into account this shared disk in order to avoid file transfers?
Liberia asked 11/3, 2016 at 14:35
0
I've googled for a while and the only useful information I found is:
github.com/barnex/cuda5
mumax.github.io/
Unfortunately, the latest Arch Linux only provides the CUDA 7.5 package, so barnex's proje...