cpu-architecture Questions

0

I have been experimenting with a simple true/false sharing benchmark, which does regular load+increment+write on a pointer. Basically this: static void do_increments(volatile size_t *buffer, size_t...

4

I have been curious about this for a while, since compression is used in just about everything. Are there any basic compression support instructions in the silicon on a typical modern CPU chip? If not...
Kimkimball asked 4/5, 2018 at 22:19

3

Solved

A handy method to verify if a positive integer n is a power of two (like 1, 2, 4, 8, etc.) is to use the following test for having no more than 1 bit set: bool test = n & (n - 1) == 0; This op...
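
A minimal sketch of the test quoted above, assuming a hypothetical helper name `is_power_of_two`. In C and C++, `==` binds more tightly than `&`, so the expression as written parses as `n & ((n - 1) == 0)`; the version below adds the parentheses. The extra `n != 0` check is an assumption added here so that zero is rejected.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true when n has exactly one bit set.
   The parentheses around (n & (n - 1)) matter: without them,
   `n & (n - 1) == 0` parses as `n & ((n - 1) == 0)`.
   The n != 0 check is an added assumption that rejects zero. */
bool is_power_of_two(size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

/* The unparenthesized form, written out explicitly only to show
   what it computes: it is nonzero only when n == 1. */
bool unparenthesized_test(size_t n) {
    return n & ((n - 1) == 0);
}
```

With these definitions, `is_power_of_two(8)` is true while `unparenthesized_test(8)` is false, which is the difference the parentheses make.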

1

My program adds float arrays and is unrolled 4x when compiled with max optimizations by MSVC and G++. I didn't understand why both compilers chose to unroll 4x so I did some testing and found only ...
Melonie asked 19/6, 2022 at 5:11

4

Solved

I am on the hook to analyze some "timing channels" in some x86 binary code. I am posting this question to understand the bsf/bsr opcodes. At a high level, these two opcodes can be modeled as a "loo...
Hemeralopia asked 4/2, 2019 at 2:46
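
A rough sketch of what a loop-style model of bit scanning might look like, assuming 32-bit operands. The names `model_bsf` and `model_bsr` are hypothetical, and returning -1 for a zero input is a convention chosen here (the real instructions set ZF in that case). This models the result only; it says nothing about how hardware implements or times the instructions.

```c
#include <stdint.h>

/* Hypothetical model of BSF: index of the lowest set bit,
   scanning upward from bit 0. Returns -1 when x == 0
   (an assumed convention, not the hardware behavior). */
int model_bsf(uint32_t x) {
    for (int i = 0; i < 32; i++) {
        if (x & (UINT32_C(1) << i)) {
            return i;
        }
    }
    return -1;
}

/* Hypothetical model of BSR: index of the highest set bit,
   scanning downward from bit 31. Returns -1 when x == 0. */
int model_bsr(uint32_t x) {
    for (int i = 31; i >= 0; i--) {
        if (x & (UINT32_C(1) << i)) {
            return i;
        }
    }
    return -1;
}
```

For example, for the input `0x18` (binary 11000) the forward scan stops at bit 3 and the reverse scan stops at bit 4.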

2

I understand big endian and little endian. However, all the processors of all the computers accessible to me -- AMD, Intel, Broadcom -- are little endian. This leads me to wonder whether ther...
Tittivate asked 23/1, 2022 at 1:7

7

Solved

Background: The von Neumann architecture describes the stored-program computer where instructions and data are stored in memory and the machine works by changing its internal state, i.e. an instruct...

1

I am wondering why this code: size_t hash_word(const char* c, size_t size) { size_t hash = uchar(c[0]); hash ^= uchar(c[size - 1]); hash ^= uchar(c[size - 2]); return hash; } When compiled: m...
Enfeeble asked 5/2 at 2:21

2

Solved

Today I had a disagreement with my professor in the Parallel Programming class about what "false sharing" is. What my professor said made little sense, so I pointed it out immediately. She...

4

Solved

This is a great article which talks about low level optimization techniques and shows an example where the author converts expensive divisions into cheap comparisons. https://www.facebook.com/notes...

1

We compile our code with g++ -march=ivybridge -mtune=skylake. In case somebody runs on an older/incompatible architecture, I want the app to inform the user and exit gracefully. How do I do this? How about AMD proc...
Burgee asked 18/9, 2020 at 9:18

5

Solved

When a CPU switches from user mode to kernel mode, what exactly does it do? How does it make this transition? EDIT: Even if it is architecture-dependent, please provide me with an answer. The architect...
Ecumenicity asked 19/3, 2010 at 16:59

1

I am conducting a test to measure the message synchronization latency between different cores of a CPU. Specifically, I am measuring how many clock cycles it takes for CPU2 to detect changes in the...
Chema asked 24/11, 2023 at 14:0

1

I see that in the AVX2 instruction set, Intel distinguishes the XOR operations for integer, double and float with different instructions. For integer there's "VPXORD", for double "VXORPD", and for float "VXO...
Halfmoon asked 5/3, 2019 at 18:32

2

Solved

Is there any way for a programmer to write data directly into video memory? I know OSes are very strict about this, but then how some types of applications (like video players or computer games) c...
Wolfram asked 20/2, 2016 at 8:5

2

This is my makefile: task0 : main.o numbers.o add.o gcc -m32 -g -Wall -o task0 main.o numbers.o add.o main.o : main.c gcc -g -Wall -m32 -ansi -c -o main.c numbers.o : numbers.c gcc -g -Wall -m3...
Collbaith asked 3/3, 2014 at 21:4

2

Solved

I came across several references to the concept of a dual issue processor (I hope this even makes sense in a sentence). I can't find any explanation of what exactly dual issue is. Google gives me l...
Kriss asked 4/11, 2011 at 19:28

2

Solved

On Intel AVX, there is a possibility of branchless code. Instead of branching for case0 or case1, you can compute both cases, and blend the results based on a condition. AVX does this 8 way for flo...
Pomposity asked 22/5, 2022 at 19:27

0

After a release operation A is performed on an atomic object M, the longest continuous subsequence of the modification order of M that consists of: Writes performed by the same thread that perfor...
Vastha asked 10/9, 2023 at 11:57

11

Solved

In a book I read the following: 32-bit processors have 2^32 possible addresses, while current 64-bit processors have a 48-bit address space. My expectation was that if it's a 64-bit processor, ...
Header asked 16/7, 2011 at 11:9
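
To put the two figures from the quoted book passage side by side, here is a small arithmetic sketch. The helper name `address_count` is hypothetical, and it assumes a width below 64 bits so the shift stays well defined.

```c
#include <stdint.h>

/* Hypothetical helper: number of distinct addresses for a given
   address width in bits. Assumes bits < 64 so the shift is defined. */
uint64_t address_count(unsigned bits) {
    return UINT64_C(1) << bits;
}
```

With byte addressing, `address_count(32)` is 4294967296 (4 GiB) and `address_count(48)` is 281474976710656 (256 TiB).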

10

Solved

I was trying to figure out the maximum amount of memory I can malloc on my machine (1 GB RAM, 160 GB HD, Windows platform). I read that the maximum memory malloc can allocate is limited to phys...

4

Solved

I want to write a program to get my cache sizes (L1, L2, L3). I know the general idea: allocate a big array, then access a part of it of a different size each time. So I wrote a little program. Here'...
Rushton asked 2/10, 2013 at 12:25

4

Solved

I was just thinking, how do machines interpret binary code? All I understand is that your code gets turned into 1s and 0s so the machine can understand them, but how do they do that? Is it just a nor...

5

Solved

Most architectures I've seen that have native scalar hardware FP support shove it off into a completely separate register space, separate from the main set of registers. Most architectures I'...
Ribosome asked 23/7, 2018 at 5:20

4

Solved

The problem: I'm trying to figure out how to write code (C preferred, ASM only if there is no other solution) that would make branch prediction miss in 50% of the cases. So it has to be a p...
