cpu-architecture Questions
0
I have been experimenting with a simple true/false sharing benchmark, which does regular load+increment+write on a pointer. Basically this:
static void do_increments(volatile size_t *buffer, size_t...
Touraco asked 30/6, 2024 at 10:55
4
I have been curious about this for a while, since compression is used in almost everything.
Are there any basic compression support instructions in the silicon on a typical modern CPU chip?
If not...
Kimkimball asked 4/5, 2018 at 22:19
3
Solved
A handy method to verify if a positive integer n is a power of two (like 1, 2, 4, 8, etc.) is to use the following test for having no more than 1 bit set:
bool test = (n & (n - 1)) == 0;
This op...
Perilous asked 12/6, 2024 at 14:02
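Because `==` binds more tightly than `&` in C and C++, the parentheses around `n & (n - 1)` are essential: without them the expression parses as `n & ((n - 1) == 0)`. The bit trick also reports 0 as a power of two, so a separate `n != 0` check is usually added. A minimal C sketch (the name `is_power_of_two` is ours, not the asker's):

```c
#include <stdbool.h>
#include <stdint.h>

/* True when exactly one bit of n is set.
   n & (n - 1) clears the lowest set bit, so the result is zero only if
   n had at most one bit set; n != 0 rules out the zero case. */
bool is_power_of_two(uint64_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}
```

GCC and Clang also accept `__builtin_popcountll(n) == 1`, which can compile to a single POPCNT instruction when the target supports it.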
1
My program adds float arrays and is unrolled 4x when compiled with max optimizations by MSVC and G++. I didn't understand why both compilers chose to unroll 4x, so I did some testing and found only ...
Melonie asked 19/6, 2022 at 5:11
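For reference, this is roughly what a 4x unroll of an element-wise float add looks like when written by hand. It is a sketch only: the name `add_arrays_unrolled4` is hypothetical, and the unroll factor the compilers actually pick comes from their own cost models, not from anything shown here.

```c
#include <stddef.h>

/* c[i] = a[i] + b[i], unrolled by 4: each trip handles four elements,
   so the counter update and loop branch are paid once per four adds. */
void add_arrays_unrolled4(const float *a, const float *b, float *c, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        c[i]     = a[i]     + b[i];
        c[i + 1] = a[i + 1] + b[i + 1];
        c[i + 2] = a[i + 2] + b[i + 2];
        c[i + 3] = a[i + 3] + b[i + 3];
    }
    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; ++i)
        c[i] = a[i] + b[i];
}
```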
4
Solved
I need to analyze some "timing channels" in x86 binary code, and this question is about understanding the bsf/bsr opcodes.
At a high level, these two opcodes can be modeled as a "loo...
Hemeralopia asked 4/2, 2019 at 2:46
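A loop model of what BSF and BSR compute, assuming 32-bit operands. It describes only the result: whether the hardware's timing depends on the operand is a separate question this model does not answer. For a zero source both instructions set ZF and Intel documents the destination as undefined; the sketch returns -1 in that case (function names are ours).

```c
#include <stdint.h>

/* Model of BSF: index of the lowest set bit, scanning up from bit 0.
   Returns -1 for x == 0 (real instruction: ZF = 1, destination undefined). */
int model_bsf(uint32_t x) {
    if (x == 0) return -1;
    int i = 0;
    while (((x >> i) & 1u) == 0) ++i;
    return i;
}

/* Model of BSR: index of the highest set bit, scanning down from bit 31. */
int model_bsr(uint32_t x) {
    if (x == 0) return -1;
    int i = 31;
    while (((x >> i) & 1u) == 0) --i;
    return i;
}
```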
2
I understand big endian and little endian. However, all the processors of all the computers accessible to me (AMD, Intel, Broadcom) are little-endian. This leads me to wonder whether ther...
Tittivate asked 23/1, 2022 at 1:07
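A machine's byte order can be checked at run time by looking at the lowest-addressed byte of a multi-byte integer. A small C sketch (the function name is ours):

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian machine, 0 on a big-endian one.
   The first byte in memory of 0x01020304 is 0x04 when the least
   significant byte is stored first (little-endian). */
int is_little_endian(void) {
    uint32_t value = 0x01020304u;
    unsigned char first;
    memcpy(&first, &value, 1);
    return first == 0x04;
}
```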
7
Solved
Background
The von Neumann architecture describes the stored-program computer, where instructions and data are stored in memory and the machine works by changing its internal state, i.e. an instruct...
Ulund asked 6/5, 2010 at 14:55
1
I am wondering why this code:
size_t hash_word(const char* c, size_t size) {
    size_t hash = uchar(c[0]);
    hash ^= uchar(c[size - 1]);
    hash ^= uchar(c[size - 2]);
    return hash;
}
When compiled:
m...
Enfeeble asked 5/2, 2024 at 2:21
2
Solved
Today I had a disagreement with my professor in the Parallel Programming class about what "false sharing" is. What my professor said makes little sense, so I pointed it out immediately. She...
Contraction asked 31/3, 2014 at 15:49
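False sharing occurs when threads write different variables that happen to sit in the same cache line, so the line bounces between cores even though no data is logically shared. A common remedy is to give each per-thread variable its own line. The sketch below assumes 64-byte cache lines (typical of current x86 cores, but not universal) and uses C11 `_Alignas`; the type and macro names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64  /* assumption: 64-byte cache lines */

/* Prone to false sharing: both counters can land in one cache line. */
struct counters_shared {
    uint64_t a;  /* written by thread 1 */
    uint64_t b;  /* written by thread 2 */
};

/* Each counter is aligned to (and therefore fills) its own line, so a
   write by one thread does not invalidate the other thread's line. */
struct padded_counter {
    _Alignas(CACHE_LINE) uint64_t value;
};

struct counters_padded {
    struct padded_counter a;
    struct padded_counter b;
};
```

For heap-allocated instances, `aligned_alloc` (C11) is needed for the alignment to be honored.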
4
Solved
This is a great article that talks about low-level optimization techniques and shows an example where the author converts expensive divisions into cheap comparisons.
https://www.facebook.com/notes...
Muncy asked 7/8, 2013 at 0:57
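The linked article is not reproduced here, but a typical instance of this trick is counting decimal digits: instead of one division per digit, compare against powers of ten and divide only once per four digits. A sketch in C (both function names are hypothetical):

```c
#include <stdint.h>

/* Baseline: one division per decimal digit. */
unsigned digits_by_division(uint64_t v) {
    unsigned n = 1;
    while (v >= 10) {
        v /= 10;
        ++n;
    }
    return n;
}

/* Comparisons settle up to four digits per step, so only one
   division is needed for every four digits. */
unsigned digits_by_comparison(uint64_t v) {
    unsigned n = 1;
    for (;;) {
        if (v < 10) return n;
        if (v < 100) return n + 1;
        if (v < 1000) return n + 2;
        if (v < 10000) return n + 3;
        v /= 10000;
        n += 4;
    }
}
```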
1
We compile our code with g++ -march=ivybridge -mtune=skylake. If somebody runs it on an older or incompatible architecture, I want the app to inform the user and exit gracefully. How do I do this? What about AMD proc...
Burgee asked 18/9, 2020 at 9:18
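GCC and Clang provide `__builtin_cpu_supports` for run-time x86 feature checks. Because it tests CPUID feature flags rather than CPU models, the same check applies to AMD processors. The sketch below tests only a few features that Ivy Bridge has (SSE4.2, POPCNT, AVX); which other extensions your build actually relies on is an assumption you would need to fill in. The check has to run before any AVX-compiled code, and the file containing it is best built without `-march=ivybridge`, otherwise the check itself might execute unsupported instructions. Function names are ours.

```c
#include <stdio.h>
#include <stdlib.h>

/* Returns 1 if the CPU reports every feature listed, 0 otherwise.
   Only a subset of what -march=ivybridge may use is checked here. */
int cpu_meets_baseline(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse4.2")
        && __builtin_cpu_supports("popcnt")
        && __builtin_cpu_supports("avx");
#else
    return 0;  /* not an x86 GCC/Clang build: this sketch cannot tell */
#endif
}

/* Call at the very start of main, before any AVX-compiled code runs. */
void require_cpu_or_exit(void) {
    if (!cpu_meets_baseline()) {
        fputs("This program requires a CPU with SSE4.2, POPCNT and AVX.\n", stderr);
        exit(EXIT_FAILURE);
    }
}
```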
5
Solved
CPU switches from user mode to kernel mode: what exactly does it do? How does it make this transition?
EDIT:
Even if it is architecture-dependent, please give me an answer. The architect...
Ecumenicity asked 19/3, 2010 at 16:59
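On x86-64 Linux, user code enters the kernel with the `syscall` instruction. In hardware it saves the return address in RCX and RFLAGS in R11, loads RIP from the IA32_LSTAR MSR (set up by the kernel), loads the kernel code and stack segment selectors from IA32_STAR, masks RFLAGS with IA32_FMASK, and continues at privilege level 0; the kernel's entry code then switches to a kernel stack, and `sysret` reverses the transition. A minimal sketch of a raw system call, assuming x86-64 Linux and GCC/Clang inline assembly (the wrapper name `raw_syscall0` is ours):

```c
#if !defined(__x86_64__) || !defined(__linux__)
#error "This sketch assumes x86-64 Linux."
#endif

#include <sys/syscall.h>  /* SYS_getpid */
#include <unistd.h>       /* getpid, for comparison */

/* Issue a system call with no arguments. The Linux ABI puts the call
   number in RAX and returns the result in RAX; the syscall instruction
   itself overwrites RCX (return RIP) and R11 (saved RFLAGS). */
long raw_syscall0(long number) {
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(number)
                      : "rcx", "r11", "memory");
    return ret;
}
```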
1
I am conducting a test to measure the message synchronization latency between different cores of a CPU. Specifically, I am measuring how many clock cycles it takes for CPU2 to detect changes in the...
Chema asked 24/11, 2023 at 14:00
1
I see that in the AVX2 instruction set, Intel distinguishes XOR operations on integers, doubles and floats with different instructions: for integers there's "VPXORD", for doubles "VXORPD", and for floats "VXO...
Halfmoon asked 5/3, 2019 at 18:32
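All of these variants produce the same bits; what differs is the execution domain they are intended for. On some Intel microarchitectures, passing a result between the integer and floating-point SIMD domains can add a cycle or so of bypass latency, and in legacy SSE encoding XORPS is one byte shorter than XORPD or PXOR. The sketch uses the 128-bit SSE/SSE2 intrinsics (`_mm_xor_ps`, `_mm_xor_si128`, `_mm_xor_pd`) instead of the 256-bit AVX/AVX2 ones, an assumption made so it compiles for any x86-64 target without extra flags; function names are ours, and the compiler remains free to pick a different XOR instruction than the intrinsic suggests.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in the SSE ones */

/* XOR the bit patterns of four floats with the FP-domain XORPS. */
void xor_float_domain(const float a[4], const float mask[4], float out[4]) {
    __m128 va = _mm_loadu_ps(a);
    __m128 vm = _mm_loadu_ps(mask);
    _mm_storeu_ps(out, _mm_xor_ps(va, vm));
}

/* The same XOR through the integer-domain PXOR. */
void xor_int_domain(const float a[4], const float mask[4], float out[4]) {
    __m128i va = _mm_castps_si128(_mm_loadu_ps(a));
    __m128i vm = _mm_castps_si128(_mm_loadu_ps(mask));
    _mm_storeu_ps(out, _mm_castsi128_ps(_mm_xor_si128(va, vm)));
}

/* The double-precision form, XORPD, on two doubles. */
void xor_double_domain(const double a[2], const double mask[2], double out[2]) {
    _mm_storeu_pd(out, _mm_xor_pd(_mm_loadu_pd(a), _mm_loadu_pd(mask)));
}
```

XORing with -0.0 flips only the sign bit, so all three functions negate their inputs.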
2
Solved
Is there any way for a programmer to write data directly into video memory? I know OSes are very strict about this, but then how do some types of applications (like video players or computer games) c...
Wolfram asked 20/2, 2016 at 8:05
2
This is my makefile:
task0 : main.o numbers.o add.o
	gcc -m32 -g -Wall -o task0 main.o numbers.o add.o
main.o : main.c
	gcc -g -Wall -m32 -ansi -c -o main.o main.c
numbers.o : numbers.c
	gcc -g -Wall -m3...
Collbaith asked 3/3, 2014 at 21:04
2
Solved
I came across several references to the concept of a dual-issue processor (I hope that even makes sense in a sentence), but I can't find any explanation of what exactly dual issue is. Google gives me l...
Kriss asked 4/11, 2011 at 19:28
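Roughly, a dual-issue core can start up to two instructions in the same cycle, provided neither depends on the other's result and the hardware has resources for both. Source code can expose or hide that parallelism. In the illustrative sketch below (names are ours), the single-accumulator sum forms one dependency chain in which each add waits for the previous one, while the two-accumulator version has two independent chains whose adds could issue together. Whether the compiler keeps this structure, or vectorizes either loop, depends on the compiler and flags.

```c
#include <stddef.h>
#include <stdint.h>

/* One dependency chain: each add needs the previous total. */
uint64_t sum_one_chain(const uint32_t *x, size_t n) {
    uint64_t s = 0;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* Two independent chains: updates to s0 and s1 do not depend on each
   other, so a core that issues two adds per cycle can overlap them. */
uint64_t sum_two_chains(const uint32_t *x, size_t n) {
    uint64_t s0 = 0, s1 = 0;
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        s0 += x[i];
        s1 += x[i + 1];
    }
    if (i < n)
        s0 += x[i];  /* odd leftover element */
    return s0 + s1;
}
```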
2
Solved
With Intel AVX, branchless code is possible.
Instead of branching for case0 or case1, you can compute both cases and blend the results based on a condition.
AVX does this 8-way for flo...
Pomposity asked 22/5, 2022 at 19:27
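The compute-both-then-select pattern can be sketched with SSE instead of AVX (an assumption, so that no extra compiler flags are needed): a comparison produces an all-ones or all-zeros mask per lane, and AND/ANDNOT/OR merge the two results. With AVX the merge step is typically a single `_mm256_blendv_ps` over 8 lanes. The function name and the two example cases are hypothetical.

```c
#include <xmmintrin.h>  /* SSE: _mm_cmplt_ps, _mm_and_ps, _mm_andnot_ps, _mm_or_ps */

/* For each of 4 lanes: out = (x < 0) ? -x : x * 2, without branching.
   Both cases are computed for every lane, then selected by a mask. */
void select_branchless(const float in[4], float out[4]) {
    __m128 x = _mm_loadu_ps(in);
    __m128 case0 = _mm_mul_ps(x, _mm_set1_ps(2.0f));      /* x * 2 */
    __m128 case1 = _mm_sub_ps(_mm_setzero_ps(), x);       /* -x */
    __m128 mask = _mm_cmplt_ps(x, _mm_setzero_ps());      /* all-ones where x < 0 */
    __m128 result = _mm_or_ps(_mm_and_ps(mask, case1),     /* case1 where mask set */
                              _mm_andnot_ps(mask, case0)); /* case0 elsewhere */
    _mm_storeu_ps(out, result);
}
```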
0
After a release operation A is performed on an atomic object M, the
longest continuous subsequence of the modification order of M that
consists of:
Writes performed by the same thread that perfor...
Vastha asked 10/9, 2023 at 11:57
11
Solved
In a book I read the following:
32-bit processors have 2^32 possible addresses, while current 64-bit processors have a 48-bit address space
My expectation was that if it's a 64-bit processor, ...
Header asked 16/7, 2011 at 11:09
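With 48-bit virtual addresses, x86-64 requires bits 63 through 47 of an address to be copies of bit 47 ("canonical" form). That leaves 2^48 bytes = 256 TiB of virtual address space, split between a lower and an upper half. A small C check (names are ours; 5-level paging on newer CPUs widens this to 57 bits, which the sketch does not handle):

```c
#include <stdbool.h>
#include <stdint.h>

#define VA_BITS 48  /* assumption: 4-level paging */

/* Canonical if bits 63..47 are all zeros or all ones. */
bool is_canonical48(uint64_t addr) {
    uint64_t top = addr >> (VA_BITS - 1);  /* the 17 bits 63..47 */
    return top == 0 || top == (UINT64_C(1) << (64 - VA_BITS + 1)) - 1;
}
```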
10
Solved
I was trying to figure out the maximum amount of memory I can malloc on my machine (1 GB RAM, 160 GB HD, Windows).
I read that the maximum memory malloc can allocate is limited to phys...
Prebend asked 9/5, 2010 at 16:31
4
Solved
I want to write a program to get my cache sizes (L1, L2, L3). I know the general idea:
1. Allocate a big array.
2. Access parts of it of different sizes each time.
So I wrote a little program.
Here'...
Rushton asked 2/10, 2013 at 12:25
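One way to turn that plan into measurements is to make each access depend on the previous one and to visit the buffer in random order, so hardware prefetchers cannot hide the latency; the average time per access then steps up as the working set outgrows each cache level. A sketch in C11, assuming `timespec_get` is available (the function name is ours). Timings are noisy, and the step positions only approximate the L1/L2/L3 sizes.

```c
#include <stdlib.h>
#include <time.h>

/* Average nanoseconds per dependent load while chasing a random cycle
   through a buffer of `bytes` bytes. Returns -1.0 on failure. */
double ns_per_access(size_t bytes, size_t accesses) {
    size_t n = bytes / sizeof(size_t);
    if (n < 2) return -1.0;
    size_t *next = malloc(n * sizeof *next);
    if (!next) return -1.0;
    for (size_t i = 0; i < n; ++i) next[i] = i;
    /* Sattolo's algorithm: a random permutation forming one single cycle,
       so the chase visits every slot before repeating. (rand() has a small
       RAND_MAX on some platforms, which weakens the shuffle for big n.) */
    for (size_t i = n - 1; i > 0; --i) {
        size_t j = (size_t)rand() % i;
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
    struct timespec t0, t1;
    volatile size_t sink;
    size_t p = 0;
    timespec_get(&t0, TIME_UTC);
    for (size_t k = 0; k < accesses; ++k)
        p = next[p];  /* each load depends on the previous one */
    timespec_get(&t1, TIME_UTC);
    sink = p;  /* keep the chase from being optimized away */
    (void)sink;
    free(next);
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)accesses;
}
```

Calling it for sizes from, say, 4 KiB to 64 MiB in powers of two and printing the results is the usual way to look for the steps.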
4
Solved
I was just wondering: how do machines interpret binary code? All I understand is that your code gets turned into 1s and 0s so the machine can understand it, but how does it do that? Is it just a nor...
Monorail asked 3/3, 2012 at 15:00
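In a real CPU, decoding is done by hardware logic that looks at fixed bit fields of each instruction word and routes them to the right units, but a software interpreter for a made-up instruction set shows the same fetch, decode, execute cycle. The 16-bit encoding, the opcodes and the name `run_toy` below are entirely hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-bit encoding: bits 15..12 opcode, 11..8 register rd,
   7..0 an immediate, a source register (low 4 bits), or a jump target. */
enum { OP_HALT = 0x0, OP_LOADI = 0x1, OP_ADD = 0x2, OP_DEC = 0x3, OP_JNZ = 0x4 };

/* Fetch-decode-execute loop. Returns r0 at HALT, or -1 on a bad opcode,
   a PC outside the program, or after max_steps instructions. */
int32_t run_toy(const uint16_t *program, size_t len, unsigned max_steps) {
    int32_t reg[16] = {0};
    size_t pc = 0;
    while (max_steps-- > 0) {
        if (pc >= len) return -1;
        uint16_t insn = program[pc++];        /* fetch */
        unsigned op = insn >> 12;             /* decode: slice bit fields */
        unsigned rd = (insn >> 8) & 0xFu;
        unsigned low = insn & 0xFFu;
        switch (op) {                         /* execute */
        case OP_HALT:  return reg[0];
        case OP_LOADI: reg[rd] = (int32_t)low;     break;
        case OP_ADD:   reg[rd] += reg[low & 0xFu]; break;
        case OP_DEC:   reg[rd] -= 1;               break;
        case OP_JNZ:   if (reg[rd] != 0) pc = low; break;
        default:       return -1;
        }
    }
    return -1;
}
```

For example, the words 0x1000, 0x1105, 0x2001, 0x3100, 0x4102, 0x0000 encode `r0 = 0; r1 = 5; loop: r0 += r1; r1 -= 1; if r1 != 0 goto loop; halt`, and running them returns 15.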
5
Solved
Most architectures I've seen with native hardware support for scalar FP shove the FP registers off into a completely separate register space, apart from the main set of registers.
Most architectures I'...
Ribosome asked 23/7, 2018 at 5:20
4
Solved
The problem:
I'm trying to figure out how to write code (C preferred, ASM only if there is no other solution) that would make branch prediction miss in 50% of cases.
So it has to be a p...
Smyth asked 10/3, 2015 at 10:34
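One common approach is to branch on bits from a pseudo-random generator: if each direction is effectively a fair coin flip, a predictor cannot do much better than about 50% on that branch. The sketch uses a small xorshift32 generator (our choice, for reproducibility; names are ours). Compilers may turn a simple `if` into a branchless conditional move, which would remove the branch altogether; the `volatile` updates on each path are meant to discourage that, but it is worth checking the generated assembly or measuring with something like `perf stat -e branch-misses`.

```c
#include <stddef.h>
#include <stdint.h>

/* xorshift32 PRNG (Marsaglia's 13/17/5 variant); state must be nonzero. */
static uint32_t xorshift32(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Executes a data-dependent branch `iterations` times; each direction is
   picked by one pseudo-random bit. Returns how often it was taken. */
size_t random_branches(size_t iterations, uint32_t seed) {
    uint32_t state = seed ? seed : 1u;
    /* volatile side effects on both paths discourage if-conversion */
    volatile size_t taken = 0, not_taken = 0;
    for (size_t i = 0; i < iterations; ++i) {
        if (xorshift32(&state) & 1u)  /* the branch under test */
            ++taken;
        else
            ++not_taken;
    }
    return taken;
}
```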
© 2022 - 2025 — McMap. All rights reserved.