cpu-architecture Questions
4
Solved
It is unclear to me how the compiler will automatically know to compile for 64-bit when it needs to. How does it know when it can confidently target 32-bit?
I am mainly curious about how the com...
Severance asked 22/8, 2012 at 5:13
1
Solved
A question I have on Problem 5.5 of the book Computer Systems: A Programmer's Perspective about clock cycles required to perform arithmetic operations:
For context, clock cycles required for...
Arana asked 8/10, 2022 at 6:27
0
These are the original tests. The figure shows that CPU=8 on the Intel Core i9-12900K has the fastest access to all other cores.
Therefore, I am wondering what causes this.
Besides, I am also curious...
Afforest asked 19/9, 2022 at 1:38
1
Solved
I am aware of a few tools that measure power consumption of programs, such as powerTOP, RAPL and the like.
However, I was wondering if there exists some kind of benchmark such as Agner Fog's benchm...
Forespent asked 12/9, 2022 at 16:0
1
Solved
Consider massively SIMD-vectorized loops on very large amounts of floating point data (hundreds of GB) that, in theory, should benefit from non-temporal ("streaming" i.e. bypassing cache...
Vivyanne asked 27/8, 2022 at 10:29
0
In the MESI protocol you write to the cache line only when holding it in the Exclusive/Modified state. To acquire the Exclusive state, you send an Invalidate request to all the cores holding the same c...
Ritual asked 27/8, 2022 at 6:48
1
While trying to benchmark implementations of a simple sparse unit lower triangular backward solve in CSC format, I observe strange behavior. The performance seems to vary drastically, depending on ...
Underage asked 20/8, 2022 at 16:8
1
Can masking improve the performance of AVX-512 memory operations (load/store/gather/scatter and non-shuffling load-ops)?
Seeing as masked out elements don't trigger memory faults, one would assume ...
Midmost asked 10/8, 2022 at 10:18
6
Solved
Is there a way to identify the iOS device CPU architecture in runtime?
Thank you.
Cassidycassie asked 8/11, 2013 at 12:42
4
I found references to hart on page 35 of the RISC-V 2.1 spec. However, I could not find a definition for hart in this document. Does hart refer to a hardware-thread or something more sinister?
Garibald asked 8/3, 2017 at 16:33
9
Solved
As part of my app I'm using the NDK and was wondering if it's worth bundling x86 and mips binaries alongside the standard ARM binaries.
I figured the best way would be to track what my users actua...
Miyokomizar asked 16/8, 2012 at 14:39
2
Solved
Before you laugh at me: I want to ask whether arithmetic operations are done in the ALU or in the accumulator. I read a book that says the accumulator is a register for doing arithmetic.
This Accumulator said
Withou...
Peewee asked 26/6, 2016 at 18:7
2
Short version
I would like to know the technical reasons why Docker images need to be created for multiple architectures. Also, it is not clear whether the point here is creating an image for e...
Mariellamarielle asked 9/6, 2020 at 23:52
4
Essentially, how does 4 Gb turn into 4 GB? If memory is byte-addressed, shouldn't the number of possibilities be 2^(32/8)?
Antiphlogistic asked 13/9, 2014 at 7:26
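The arithmetic behind this question can be sketched in C. This is a minimal sketch, assuming a byte-addressable machine in which every distinct address names one byte, so 32 address bits give 2^32 bytes rather than 2^(32/8). The helper names `addressable_units` and `bits_to_bytes` are hypothetical and introduced only for illustration.

```c
#include <stdint.h>

/* Number of distinct addresses that address_bits can encode (valid for 0..63). */
uint64_t addressable_units(unsigned address_bits)
{
    return (uint64_t)1 << address_bits;
}

/* Convert a capacity expressed in bits to bytes (8 bits per byte). */
uint64_t bits_to_bytes(uint64_t bits)
{
    return bits / 8;
}
```

With one byte per address, `addressable_units(32)` is 4294967296 bytes, i.e. 4 GiB. By contrast, 2^32 bits of storage is `bits_to_bytes(addressable_units(32))`, or 536870912 bytes (512 MiB). The address width fixes how many locations exist; the size of each location is a separate choice.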
1
Solved
Is there a way that we can write some code to produce "spurious failures" for the "weak" version of compare_exchange? While the same code should work well for compare_exchange...
Melmon asked 27/6, 2022 at 1:50
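One way to look for spurious failures is to run a weak compare-exchange in a single thread, where no other writer exists and any failure must therefore be spurious. The following is a hedged sketch, not a guaranteed reproducer: it uses C11's `atomic_compare_exchange_weak` as a stand-in for the C++ member function, and the function name `count_spurious_failures` is hypothetical. Whether it ever reports a nonzero count depends on the target: architectures that implement CAS with load-linked/store-conditional pairs can fail spuriously, while on x86 the weak form typically compiles to `lock cmpxchg` and is not expected to.

```c
#include <stdatomic.h>

/* Increment *value until successful_swaps CAS operations have succeeded.
 * Called from a single thread, so every failed CAS is a spurious failure.
 * Returns the number of such failures observed. */
long count_spurious_failures(atomic_int *value, int successful_swaps)
{
    long spurious = 0;
    int done = 0;

    while (done < successful_swaps) {
        int expected = atomic_load(value);
        if (atomic_compare_exchange_weak(value, &expected, expected + 1)) {
            done++;        /* swap happened */
        } else {
            spurious++;    /* value was unchanged, yet the CAS failed */
        }
    }
    return spurious;
}
```

Calling this on a counter initialized to 0 with `successful_swaps` set to 1000 always leaves the counter at 1000; only the returned failure count varies by platform. This is also why the weak form is normally used inside a retry loop.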
3
Solved
I understand how things like numbers and letters are encoded in binary, and thus can be stored as 0's and 1's.
But how are functions stored in memory? I don't see how they could be stored as 0's a...
Propositus asked 15/8, 2014 at 23:58
2
Solved
CPUs intended to provide high-performance number crunching, end up with some kind of vector instruction set. There are basically two kinds:
SIMD. This is conceptually straightforward, e.g. instead...
Cawnpore asked 29/5, 2022 at 9:35
1
It's clear why a 16-bit floating-point format has started seeing use for machine learning; it reduces the cost of storage and computation, and neural networks turn out to be surprisingly insensitiv...
Wideawake asked 2/6, 2022 at 10:33
7
Solved
I was reading Agner Fog's optimization manuals, and I came across this example:
double data[LEN];
void compute()
{
    const double A = 1.1, B = 2.2, C = 3.3;
    int i;
    for(i=0; i<LEN; i++) {
        dat...
Mikamikado asked 19/5, 2022 at 14:39
1
Solved
The following appears in linux/Documentation/x86/x86_64/5level-paging.rst:
Original x86-64 was limited by 4-level paging to 256 TiB of virtual address space and 64 TiB of physical address space.
I k...
Volition asked 28/5, 2022 at 16:19
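The sizes quoted from the kernel documentation can be checked with a little arithmetic. This sketch assumes the standard x86-64 layout of 9 index bits per paging level plus a 12-bit page offset; the 46-bit physical width is simply read off the quoted 64 TiB figure, not derived here. The function names are hypothetical.

```c
#include <stdint.h>

/* Virtual-address bits on x86-64: 9 index bits per paging level
 * plus a 12-bit offset within a 4 KiB page. */
unsigned virtual_address_bits(unsigned paging_levels)
{
    return 9u * paging_levels + 12u;
}

/* Size in TiB of an address space with the given width (valid for 40..63 bits). */
uint64_t address_space_tib(unsigned bits)
{
    return (uint64_t)1 << (bits - 40);
}
```

Four levels give `virtual_address_bits(4)` = 48 bits and `address_space_tib(48)` = 256 TiB, matching the quoted virtual limit, while `address_space_tib(46)` = 64 TiB matches the quoted physical limit. Five levels give 57 bits, or 131072 TiB (128 PiB).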
1
Solved
I've been wondering.. It's called SIMD as in single instruction multiple data. So why does it have single data instructions?
For example, vaddss is the single data equivalent of the multiple data v...
Asmodeus asked 27/5, 2022 at 1:9
1
Solved
This question is a spin-off of the one posted here: Measuring bandwidth on a ccNUMA system
I've written a micro-benchmark for the memory bandwidth on a ccNUMA system with 2x Intel(R) Xeon(R) Platin...
Acyclic asked 13/5, 2022 at 12:21
5
In the book Computer Systems: A Programmer's Perspective, Exercise 5.5 shows a piece of code to compute the value of a polynomial
double poly(double a[], double x, int degree)
{
    long in...
Trici asked 9/2, 2013 at 9:30
1
Solved
After serious development, CPUs gained many cores, distributed blocks of cores on multiple chiplets, NUMA systems, etc., but a piece of data still has to pass through not only L1 cache (if on ...
Cherimoya asked 11/5, 2022 at 19:36
1
Solved
I know that store buffers and invalidate queues can cause memory reordering. What I don't know is whether out-of-order execution can cause memory reordering.
In my opinion, Out-of-Order-Execution...
Bailee asked 6/4, 2022 at 14:32