numa Questions
5
I am working on a Java application for solving a class of numerical optimization problems - large-scale linear programming problems to be more precise. A single problem can be split up into smaller...
Retrospection asked 14/11, 2019 at 20:29
1
Solved
In the technical overview published by Intel, "Sub-NUMA Clustering" and "Hemisphere and Quadrant Modes" are described separately. But the main difference between them is not cle...
Sciamachy asked 28/4, 2023 at 8:51
1
Solved
This question is a spin-off of the one posted here: Measuring bandwidth on a ccNUMA system
I've written a micro-benchmark for the memory bandwidth on a ccNUMA system with 2x Intel(R) Xeon(R) Platin...
Acyclic asked 13/5, 2022 at 12:21
1
Solved
I'm attempting to benchmark the memory bandwidth on a ccNUMA system with 2x Intel(R) Xeon(R) Platinum 8168:
24 cores @ 2.70 GHz,
L1 cache 32 kB, L2 cache 1 MB and L3 cache 33 MB.
As a reference, ...
Seeder asked 10/5, 2022 at 7:55
3
Solved
I'm attempting to create a std::vector<std::set<int>> with one set for each NUMA-node, containing the thread-ids obtained using omp_get_thread_num().
Topo:
Idea:
Create data which is ...
Interested asked 3/3, 2022 at 16:50
4
Solved
I've set up my code to carefully load and process data locally on my NUMA system. I think. That is, for debugging purposes I'd really like to be able to use the pointer addresses being accessed ins...
Endothelioma asked 2/11, 2011 at 20:27
1
Solved
I'm building a topological tree of sockets, NUMA nodes, caches, cores, and threads for any Intel or AMD system in C.
Building this hierarchy, I want to ensure hardware threads are grouped together ...
3
I have an Intel Xeon Phi 64-core CPU with 16GB on-chip memory set as NUMA node 1. I want to bind a process running inside a Docker container to this NUMA node, but it errors out:
root@Docker$ sudo...
Perth asked 6/4, 2017 at 23:44
3
Linux can have both standard 4KiB page memory and 1GiB (huge) paged memory (and 2MiB pages, but I don't know if anyone uses that).
Is there a standard call to get the page size from an arbitrary vi...
5
Solved
2
The MPI-3 standard introduces shared memory that can be read and written by all processes sharing it, without using calls to the MPI library.
While there are examples of one-sided communic...
Bullroarer asked 19/2, 2020 at 10:33
1
Solved
This question is for:
kernel 3.10.0-1062.4.3.el7.x86_64
non transparent hugepages allocated via boot parameters and might or might not be mapped to a file (e.g. mounted hugepages)
x86_64
Accord...
Unrepair asked 14/1, 2020 at 1:8
1
Solved
#include <cstdint>
#include <iostream>
#include <numaif.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <errno.h>
#include <unistd.h>
#include <str...
1
Solved
TL;DR
How are MMIO, IO and PCI configuration requests routed to the right node in a NUMA system?
Each node has a "routing table" but I'm under the impression that the OS is supposed to be...
Resurge asked 30/7, 2019 at 18:31
5
We've just bought a 32-core Opteron machine, and the speedups we get are a little disappointing: beyond about 24 threads we see no speedup at all (it actually gets slower overall) and after about 6 th...
Mullion asked 20/11, 2012 at 1:45
1
Solved
I see that g++ generates a simple mov for x.load() and mov+mfence for x.store(y).
Consider this classic example:
#include<atomic>
#include<thread>
std::atomic<bool> x,y;
bool r1...
Mydriatic asked 12/2, 2019 at 14:46
0
Using mbind, one can set the memory policy for a given mapped memory segment.
Q: How can I tell mbind to interleave a segment on all nodes?
If done after allocation but before usage, MPOL_INTERLEAV...
Vermin asked 18/11, 2018 at 0:12
2
Solved
I have a dual-socket Xeon E5522 2.26 GHz machine (with hyperthreading disabled) running Ubuntu Server on Linux kernel 3.0 with NUMA support. The architecture layout is 4 physical cores per socket.
An ...
Kape asked 14/8, 2012 at 20:4
1
Solved
I have a Jetson TX2, Python 2.7, TensorFlow 1.5, CUDA 9.0
TensorFlow seems to be working, but every time I run the program, I get this warning:
with tf.Session() as sess:
print (sess.run(y,feed_dict)...
Giff asked 7/8, 2018 at 18:18
2
I'm trying to understand what the node distances in numactl --hardware mean.
On our cluster, it outputs the following
numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 12 13 14 15 ...
1
Solved
I'm working on a legacy application initially developed for multicore processor systems. To leverage multicore processing, OpenMP and PPL have been used.
Now a new requirement is to run the software...
Hellhole asked 5/3, 2018 at 7:31
1
Recently I have been observing performance effects in memory-intensive workloads I was unable to explain. Trying to get to the bottom of this I started running several microbenchmarks in order to d...
Comedian asked 11/12, 2017 at 9:36
1
Solved
Consider this scenario: a user process running on a NUMA machine calls mmap to create a new mapping in the virtual address space. It then uses the memory returned by mmap for its processing (stori...
Chokeberry asked 3/11, 2017 at 13:40
0
I am developing a real-time application on a server with two NUMA nodes. Below is a simplified version of the system diagram (the OS is Ubuntu 14.04):
.-------------. .-------------.
| Device 0 | |...
Selvage asked 4/8, 2017 at 9:29
0
I built TensorFlow from source using Bazel, and when I finally open a session, I get the following warning:
2017-05-07 15:45:40.816127: I
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893]...
Ecosphere asked 7/5, 2017 at 10:22