cpu-architecture Questions
1
Solved
I wrote a toy program that compares the performance of two very similar functions. The entire file (minus a couple of macros) looks like this:
constexpr int iterations = 100000;
using u64 = uint64...
Dahlberg asked 1/5, 2022 at 9:11
3
Solved
Specifically, is:
mov %eax, %ds
Slower than
mov %eax, %ebx
Or are they the same speed? I've researched online but have been unable to find a definitive answer.
I'm not sure if this is a sill...
Ovalle asked 3/7, 2018 at 22:56
1
Solved
How can I check whether a particular CPU core belongs to the P-core or E-core group? Is there any way to list information about Performance/Efficiency cores in a running Linux x86_64 Alder Lake system? Like, Printin...
Bern asked 15/2, 2022 at 7:54
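A possible approach, assuming a Linux kernel with hybrid-CPU perf support: the perf subsystem exposes one PMU per core type, and each PMU directory has a `cpus` file, `/sys/devices/cpu_core/cpus` for P-cores and `/sys/devices/cpu_atom/cpus` for E-cores. These paths are an assumption about the running system (they are absent on non-hybrid CPUs and older kernels). The files use the kernel's usual CPU-list format such as `0-15` or `0-3,8`. The helpers `parse_cpu_list` and `read_cpu_list` are hypothetical names for this sketch.

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Parse a kernel CPU-list string such as "0-3,8,10-11" into CPU numbers.
std::vector<int> parse_cpu_list(const std::string& text) {
    std::vector<int> cpus;
    std::stringstream ss(text);
    std::string item;
    while (std::getline(ss, item, ',')) {
        if (item.empty()) continue;
        std::size_t dash = item.find('-');
        int first = std::stoi(item.substr(0, dash));
        int last = (dash == std::string::npos) ? first
                                               : std::stoi(item.substr(dash + 1));
        for (int c = first; c <= last; ++c) cpus.push_back(c);
    }
    return cpus;
}

// Read one sysfs CPU-list file (assumed path, see above).
// Returns an empty vector if the file does not exist or is empty.
std::vector<int> read_cpu_list(const std::string& path) {
    std::ifstream in(path);
    std::string line;
    if (!in || !std::getline(in, line)) return {};
    return parse_cpu_list(line);
}
```

On a hybrid system, `read_cpu_list("/sys/devices/cpu_core/cpus")` would list the P-cores and `read_cpu_list("/sys/devices/cpu_atom/cpus")` the E-cores; elsewhere both calls return empty vectors.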
4
I want to know how variables are initialized:
#include <stdio.h>
int main( void )
{
int ghosts[3];
for(int i =0 ; i < 3 ; i++)
printf("%d\n",ghosts[i]);
return 0;
}
thi...
Huntingdon asked 25/2, 2022 at 15:21
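An array with automatic storage duration (a local, non-`static` array) is not initialized by default, so its elements hold indeterminate values and the program may print anything; in C++, reading them is undefined behavior. Objects with static storage duration are zero-initialized. A minimal sketch in C++, assuming zero is the intended starting value (in C, `int ghosts[3] = {0};` has the same effect); the function names are hypothetical.

```cpp
#include <cstdio>

// An empty brace initializer value-initializes every element to 0.
// Returns the sum so callers can check the contents.
int print_ghosts() {
    int ghosts[3]{};              // all three elements are 0
    int sum = 0;
    for (int i = 0; i < 3; i++) {
        std::printf("%d\n", ghosts[i]);
        sum += ghosts[i];
    }
    return sum;
}

// Static storage duration: zero-initialized even without an initializer.
static int static_ghosts[3];

int static_ghost_sum() {
    return static_ghosts[0] + static_ghosts[1] + static_ghosts[2];
}
```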
0
My understanding of Intel CPUs in general is that demand loads to consecutive physical addresses trigger the L2 hardware stream prefetcher, which can prefetch quite far in advance up to the page bo...
Dogberry asked 24/2, 2022 at 0:40
3
My professor claimed that LOOP is faster on 8086 because only one instruction is fetched instead of two, like in dec cx, jnz. So I think we are saving time by avoiding the extra fetch and decode pe...
Adversary asked 14/2, 2022 at 19:17
2
Solved
I'm a complete novice to computer architecture and the low-level stuff that happens at the processor/memory level. I'll start by saying that. What I've done with computers has pretty much always be...
Rhineland asked 11/12, 2014 at 17:13
1
Solved
int main()
{
00211000 push ebp
00211001 mov ebp,esp
00211003 sub esp,10h
char charVar1;
short shortVar1;
int intVar1;
long longVar1;
charVar1 = 11;
00211006 mov byte ptr [charVar1],0Bh
...
Ciliary asked 15/1, 2022 at 5:36
6
Solved
I always wondered what's the purpose of the rotate instructions some CPUs have (ROL, RCL on x86, for example). What kind of software makes use of these instructions?
I first thought they may be use...
Casandra asked 12/2, 2011 at 6:07
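Rotates are common wherever bits need to be mixed without discarding any of them, for example in hash functions and ciphers. A sketch of the portable rotate idiom, which GCC and Clang typically recognize and compile to a single `rol` on x86 (C++20 also provides `std::rotl` in `<bit>`). The `mix` function is a simplified illustration modeled on the per-block update in MurmurHash3, with its key-scrambling step omitted; it is not a complete hash.

```cpp
#include <cstdint>

// Rotate a 32-bit value left by n bits. Masking the shift counts keeps
// them in 0..31, so n == 0 never produces a (undefined) shift by 32.
constexpr std::uint32_t rotl32(std::uint32_t x, unsigned n) {
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}

// Simplified hash-style mixing step: the rotate moves high bits down
// so that later arithmetic can be influenced by them.
constexpr std::uint32_t mix(std::uint32_t h, std::uint32_t k) {
    h ^= k;
    h = rotl32(h, 13);
    return h * 5u + 0xe6546b64u;
}
```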
3
Solved
Modern x86 CPUs break down the incoming instruction stream into micro-operations (uops¹) and then schedule these uops out-of-order as their inputs become ready. While the basic idea is clear, I'd l...
Tonguetied asked 18/11, 2016 at 15:58
1
Solved
Node.js has built-in ways of detecting the machine it is running on:
process.arch returns the operating system CPU architecture. Possible values:
arm - 32-bit Advanced RISC Machine
arm64 - 64-bi...
Snooker asked 11/12, 2021 at 4:29
1
Solved
Intel recommends using instruction prefixes to mitigate the performance consequences of the JCC erratum.
MSVC, when compiling with /QIntel-jcc-erratum, follows the recommendation and inserts prefixed instr...
Dyslexia asked 3/12, 2021 at 15:27
13
Solved
In C++:
Why is a boolean 1 byte in size and not 1 bit?
Why aren't there types like 4-bit or 2-bit integers?
I'm missing these features while writing a CPU emulator.
Montpelier asked 7/1, 2011 at 15:02
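In C++ the byte (`char`) is the smallest addressable unit, and every distinct object needs its own address, so a `bool` occupies at least one byte; `sizeof(bool)` is implementation-defined but is 1 on mainstream compilers. Sub-byte fields are still available through bit-fields or explicit masking. A sketch for an emulator-style flags byte, where the `Flags` layout is a hypothetical example. Bit-field ordering and padding are implementation-defined, so when an exact hardware layout matters, masking by hand (as `pack` does) is the more predictable choice.

```cpp
#include <cstdint>

// Bit-fields: several narrow fields stored in one underlying integer.
// The order of the fields within the byte is implementation-defined.
struct Flags {
    std::uint8_t carry : 1;
    std::uint8_t zero  : 1;
    std::uint8_t mode  : 2;   // 2-bit field: values 0..3
    std::uint8_t nib   : 4;   // 4-bit field: values 0..15
};

// Manual packing gives the same layout on every compiler:
// bit 0 = carry, bit 1 = zero, bits 2-3 = mode, bits 4-7 = nib.
constexpr std::uint8_t pack(unsigned carry, unsigned zero,
                            unsigned mode, unsigned nib) {
    return static_cast<std::uint8_t>((carry & 1u) | ((zero & 1u) << 1) |
                                     ((mode & 3u) << 2) | ((nib & 15u) << 4));
}

// Extract the 4-bit field from a packed byte.
constexpr unsigned nib_of(std::uint8_t byte) { return (byte >> 4) & 15u; }
```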
2
Solved
I'm using godbolt to get assembly of the following program:
#include <stdio.h>
volatile int a = 5;
volatile int res = 0;
int main() {
res = a * 36;
return 1;
}
If I use -Os optimization, t...
Fremont asked 11/12, 2021 at 15:59
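Because 36 = 32 + 4 = 9 × 4, a multiplication by 36 can be rewritten as shifts and adds; on x86 the ×9 step maps onto `lea` with a scale factor of 8 (`lea eax, [rax+rax*8]`). Whether a compiler emits such a sequence or a single `imul` depends on the target and optimization level (`-Os` favors smaller code), so this sketch only checks the arithmetic identities, not any compiler's actual output. It uses unsigned arithmetic so that overflow wraps instead of being undefined; the function names are hypothetical.

```cpp
#include <cstdint>

// Direct multiplication.
constexpr std::uint32_t times36_mul(std::uint32_t a) { return a * 36u; }

// 36 = 32 + 4: two shifts and an add.
constexpr std::uint32_t times36_shift(std::uint32_t a) {
    return (a << 5) + (a << 2);
}

// 36 = 9 * 4: a*9 = a + a*8 (what lea with scale 8 computes), then << 2.
constexpr std::uint32_t times36_lea(std::uint32_t a) {
    return (a + (a << 3)) << 2;
}
```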
1
Solved
I read about Herb's atomic<> Weapons talk and had a question about page 42:
He mentioned that (50:00 in the video):
(x86) stores are much stronger than they need to be...
What I don't unde...
Frambesia asked 6/12, 2021 at 17:43
1
Solved
Consider the store buffer litmus test with SC atomics:
// Initial
std::atomic<int> x(0), y(0);
// Thread 1 // Thread 2
x.store(1); y.store(1);
auto r1 = y.load(); auto r2 = x.load();
Can th...
Hob asked 2/12, 2021 at 18:07
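With the default `std::memory_order_seq_cst`, all four operations appear in a single total order consistent with each thread's program order. Each thread's store comes before its own load, so the operation that comes last in that order is a load that follows both stores and must read 1; the outcome `r1 == 0 && r2 == 0` is therefore forbidden. (On x86 this is why compilers typically implement a seq_cst store with `xchg`, or `mov` followed by `mfence`.) A runnable sketch that repeats the test; `run_store_buffer_test` and the iteration count are made up for illustration. Spawning fresh threads per iteration makes real overlap rare, so this is a demonstration rather than a stress test, and a finite run can never prove the outcome impossible. Older toolchains may need `-pthread`.

```cpp
#include <atomic>
#include <thread>

// Runs the store-buffer litmus test `iterations` times with seq_cst
// atomics and counts runs that ended with r1 == 0 && r2 == 0.
// Under sequential consistency that count must be 0.
int run_store_buffer_test(int iterations) {
    int forbidden = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x(0), y(0);
        int r1 = -1, r2 = -1;
        std::thread t1([&] {
            x.store(1);          // seq_cst by default
            r1 = y.load();
        });
        std::thread t2([&] {
            y.store(1);
            r2 = x.load();
        });
        t1.join();               // join makes r1/r2 visible to this thread
        t2.join();
        if (r1 == 0 && r2 == 0) ++forbidden;
    }
    return forbidden;
}
```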
1
Solved
Since different processes have their own page tables, how does the TLB cache differentiate between two page tables?
Or is the TLB flushed every time a different process gets the CPU?
Spinner asked 1/12, 2021 at 15:59
1
Solved
I am trying to compare the methods mentioned by Peter Cordes in his answer to the question 'Set all bits in CPU register to 1'.
Therefore, I wrote a benchmark to set all 13 registers to all bi...
Allynallys asked 27/11, 2021 at 3:10
1
Hardware instructions for integer division have historically been very slow. For example, DIVQ on Skylake has a latency of 42-95 cycles [1] (and a reciprocal throughput of 24-90) for 64-bit inputs.
The...
Orgy asked 27/11, 2021 at 7:40
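Because `div` is slow, compilers replace division by a compile-time constant with a multiplication by a precomputed "magic" reciprocal followed by a shift. For an unsigned 32-bit `x`, `x / 3` equals `(x * 0xAAAAAAAB) >> 33` computed in 64 bits, since 0xAAAAAAAB = ⌈2^33 / 3⌉. This sketch illustrates only the constant-divisor technique; the rest of the question is truncated, so it is not a claim about what the asker proposed. `div3` is a hypothetical name.

```cpp
#include <cstdint>

// Division by the constant 3 without a divide instruction:
// multiply by ceil(2^33 / 3) = 0xAAAAAAAB in 64-bit arithmetic,
// then shift right by 33 to keep the quotient.
constexpr std::uint32_t div3(std::uint32_t x) {
    return static_cast<std::uint32_t>(
        (static_cast<std::uint64_t>(x) * 0xAAAAAAABull) >> 33);
}
```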
0
I would like to ask for some clarification about what was discussed in the thread "How is load->store reordering possible with in-order commit?" (sorry, I don't have enough reputation to add comments d...
Auriferous asked 19/11, 2021 at 16:23
1
First of all, I do not know whether I should be asking this here or in the Electronics StackExchange, so please let me know if you think I should ask it there.
I am interested in measuring the ener...
Ween asked 10/6, 2021 at 16:46
1
Solved
I'm trying to verify the conclusion that two fuseable pairs can be decoded in the same clock cycle, using my Intel i7-10700 and Ubuntu 20.04.
The test code is arranged like below, and it is copied ...
Excited asked 12/11, 2021 at 3:12
7
What is the most reliable way to find out CPU architecture when compiling C or C++ code? As far as I can tell, different compilers have their own set of non-standard preprocessor definitions (_M_X8...
Suberin asked 30/9, 2008 at 6:57
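There is no standard macro for the target architecture, so the usual approach is a chain of checks over the compilers' predefined ones: `__x86_64__`, `__i386__`, `__aarch64__` and `__arm__` for GCC and Clang, and `_M_X64`, `_M_IX86`, `_M_ARM64` and `_M_ARM` for MSVC. A sketch covering these common cases; `target_arch` is a hypothetical name, and other targets fall through to `"unknown"`.

```cpp
// Compile-time architecture detection via compiler-predefined macros.
// Only one return statement survives preprocessing.
constexpr const char* target_arch() {
#if defined(__x86_64__) || defined(_M_X64)
    return "x86_64";
#elif defined(__i386__) || defined(_M_IX86)
    return "x86";
#elif defined(__aarch64__) || defined(_M_ARM64)
    return "arm64";
#elif defined(__arm__) || defined(_M_ARM)
    return "arm";
#else
    return "unknown";
#endif
}
```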
25
Solved
In this C++ code, sorting the data (before the timed region) makes the primary loop ~6x faster:
#include <algorithm>
#include <ctime>
#include <iostream>
int main()
{
// Generat...
Mozarab asked 27/6, 2012 at 13:51
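With random data, a data-dependent `if` inside the loop is mispredicted roughly half the time; after sorting, the branch outcome changes only once, so the predictor is almost always right. A common workaround is to make the loop branch-free at the source level. This sketch assumes a loop that conditionally sums elements at or above a threshold; the `threshold` parameter and function names are assumptions, since the excerpt is cut off before the loop. Whether either version compiles to a branch, a `cmov`, or vector code depends on the compiler and flags.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Branchy version: the `if` becomes a data-dependent conditional jump,
// which is cheap only when its outcome is predictable.
std::int64_t sum_branchy(const std::vector<int>& data, int threshold) {
    std::int64_t sum = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (data[i] >= threshold) sum += data[i];
    }
    return sum;
}

// Branch-free version: multiply by the 0/1 result of the comparison,
// so the source contains no data-dependent jump to mispredict.
std::int64_t sum_branchless(const std::vector<int>& data, int threshold) {
    std::int64_t sum = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        sum += static_cast<std::int64_t>(data[i]) * (data[i] >= threshold);
    }
    return sum;
}
```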
1
Consider the following two alternative pieces of code:
Alternative 1:
if (variable != new_val) // (1)
variable = new_val;
f(); // This function reads `variable`.
Alternative 2:
variable = ...
Busterbustle asked 25/10, 2021 at 16:36
© 2022 - 2024 — McMap. All rights reserved.