cpu-architecture Questions

1

Solved

I have the following snippet: static long F(long a, long b, long c, long d) { return a + b + c + d; } which generates: <Program>$.<<Main>$>g__F|0_0(Int64, Int64, Int64, Int64) ...

4

Solved

In my opinion: soft reset: boots from the reset vector. Hard reset: pulls the electrical level of the CPU.
Gramnegative asked 8/5, 2012 at 2:48

2

All benchmarks are run on either Icelake or Whiskey Lake (in the Skylake family). Summary: I am seeing a strange phenomenon where it appears that when a loop transitions from running out of the Uop Cache...

1

I searched for an answer and I know at a high level how to use lock etc. in a multithreading environment. This question has bugged me for a long time and I think I am not the only one. TL;DR at the ...
Burgrave asked 8/4, 2021 at 3:14

1

Do non-temporal stores (such as movnti), to the same cache line, issued by the same thread, reach the memory in program order? So that for a system with NVRAM (like Intel Cascade Lake processor wit...
Hayfork asked 2/4, 2021 at 10:46

3

Solved

I'm quite new to the MIPS assembly language and am currently taking a class on computer architecture which has a large section on MIPS coding. I've studied several other high-level programming lang...
Triplicate asked 26/10, 2013 at 22:11

1

Solved

My goal is to load a static structure into the L1D cache. After that, I perform some operations using those structure members and, once done, run invd to discard all the modified c...
Engelbert asked 23/3, 2021 at 23:20

1

This is about the cache coherency protocol across different layers of cache. My understanding (x86_64) of L1 is that it is owned exclusively by a core, L2 is shared between 2 cores, and L3 for all the co...
Pincenez asked 21/3, 2021 at 10:43

2

Suppose there are 2 caches, L1 and L2. L1: hit rate = 0.8, access time = 2 ns, and transfer time between L1 and CPU is 10 ns. L2: hit rate = 0.9, access time = 5 ns, and transfer time b/w L...
Bartz asked 30/11, 2015 at 14:32
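
Questions of this kind usually reduce to an average access time formula. The following is a minimal Python sketch, assuming hierarchical access (L2 is probed only after an L1 miss) and treating each level's cost as its access time plus its transfer time. The L2 transfer time and the main-memory time are cut off in the excerpt, so they appear only as caller-supplied parameters; the function names `level_cost` and `average_access_time` are hypothetical.

```python
def level_cost(access_ns: float, transfer_ns: float) -> float:
    """Cost of one cache level: access time plus transfer time (assumed model)."""
    return access_ns + transfer_ns


def average_access_time(h1: float, t1: float, h2: float, t2: float, t_mem: float) -> float:
    """Average access time for a two-level hierarchy under hierarchical access.

    h1, h2 : hit rates of L1 and L2 (0.0 to 1.0)
    t1, t2 : per-level costs in ns (for example from level_cost)
    t_mem  : main-memory cost in ns, paid only when both levels miss
    """
    return t1 + (1.0 - h1) * (t2 + (1.0 - h2) * t_mem)


if __name__ == "__main__":
    # Only the L1 figures are fully stated in the excerpt: 2 ns access, 10 ns transfer.
    print("L1 cost (ns):", level_cost(2, 10))
```

Other textbooks use a simultaneous-access model instead, which weights each level's time by the probability of hitting there; which one the original exercise intends is not visible in the excerpt.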

2

Solved

For fun, I'm writing a bignum library in Rust. My goal (as with most bignum libraries) is to make it as efficient as I can. I'd like it to be efficient even on unusual architectures. It seems intu...
Thaxter asked 23/5, 2020 at 15:10

4

Being a beginner and self-learner, I am learning assembly and currently reading chapter 3 of the book The C Companion by Allen Holub. I can't understand the description of Program Counter or...
Carbonization asked 25/8, 2018 at 16:59

2

Solved

So I decided to take a look at how to use SSE, AVX, ... in C via Intel® Intrinsics. Not because of any actual interest to use it for something, but out of pure curiosity. Trying to check if code us...
Sanhedrin asked 12/3, 2021 at 14:44

1

Solved

I discovered this popular ~9-year-old SO question and decided to double-check its outcomes. So, I have AMD Ryzen 9 5950X, clang++ 10 and Linux, I copy-pasted code from the question and here is what...
Falgout asked 7/3, 2021 at 20:57

3

The following example comes from the MSDN. public class ThreadSafe { // Field totalValue contains a running total that can be updated // by multiple threads. It must be protected from unsynchroni...

1

Solved

I have the following C++17 code that I compile with VS 2019 (version 16.8.6) in x64 mode: struct __declspec(align(16)) Vec2f { float v[2]; }; struct __declspec(align(16)) Vec4f { float v[4]; }; st...
Godfearing asked 2/3, 2021 at 11:36

3

Solved

So, this AVX thing - it's like a small machine for each core? Or it's just like one engine-unit for whole CPU? Like, can I use it on each core somehow? I'm playing with it, and I'm feeling like I m...
Matchmaker asked 20/2, 2021 at 18:25

0

Check out Edit3. I was getting the wrong results because I was measuring without including prefetch-triggered events, as discussed here. That being said, AFAIK I am only seeing a reduction in RFO requests...

1

Solved

"Late-forwarding" is mentioned in "Arm Neoverse E1 Core Software Optimization Guide" (as well as in their optimization guides for some other CPU models): Instruction Group I...
Reticent asked 15/2, 2021 at 17:3

1

Solved

Modern CPUs use a store buffer to delay commit into cache until retirement, also avoiding WAR and WAW memory hazards. I'm wondering how weak ISAs resolve WAW hazards using the store buffer, which i...
Brannon asked 16/2, 2021 at 12:14

0

All benchmarks are done on: Icelake: Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz (ark). Edit: I was not able to reproduce this on Broadwell, and @PeterCordes was unable to reproduce it on Skylake. I was...
Janenejanenna asked 12/2, 2021 at 1:22

2

Solved

Consider two threads, T1 and T2, that store and load an atomic integer a_i respectively. And let's further assume that the store is executed before the load starts being executed. By before, I mean...
Omniscient asked 4/2, 2021 at 22:16

4

Solved

Quoting Intel® 64 and IA-32 architectures optimization reference manual, §2.4.6 "REP String Enhancement": The performance characteristics of using REP string can be attributed to two components:...
Tori asked 24/11, 2015 at 19:21

2

Solved

I see several posts (such as size_t vs. uintptr_t) about size_t versus uintptr_t/ptrdiff_t, but none about the relative sizes of these new C99 pointer-size types. Example machine: vanilla ubuntu 14lts...
Beller asked 5/6, 2015 at 23:51

1

As I understand it, the Return Stack Buffer only supports 4 to 16 entries (from wiki: http://en.wikipedia.org/wiki/Branch_predictor#Prediction_of_function_returns) and is not a key-value pair (based on in...
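
To make the effect of a small, fixed-size return stack concrete, here is a toy Python model. It is only a sketch under stated assumptions: the buffer is a plain stack of return addresses (no tags or keys), a push onto a full buffer silently drops the oldest entry, and a pop from an empty buffer yields no prediction. Real predictors differ in how they handle overflow and underflow, and the function name `count_correct_returns` and the addresses are hypothetical.

```python
from collections import deque


def count_correct_returns(call_depth: int, capacity: int) -> int:
    """Nest `call_depth` calls, then return from all of them.

    Returns how many of the returns the toy return-stack model predicts correctly.
    """
    rsb = deque(maxlen=capacity)  # a full deque drops its oldest entry on append
    # Hypothetical return addresses, one per call site.
    return_addrs = [0x1000 + 4 * i for i in range(call_depth)]

    for addr in return_addrs:          # each call pushes its return address
        rsb.append(addr)

    correct = 0
    for expected in reversed(return_addrs):  # returns unwind in LIFO order
        predicted = rsb.pop() if rsb else None
        if predicted == expected:
            correct += 1
    return correct


if __name__ == "__main__":
    for depth in (3, 6, 20):
        print(depth, count_correct_returns(depth, capacity=4))
```

In this model, call chains no deeper than the capacity are predicted perfectly, while deeper chains lose the predictions for the outermost frames.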

3

Solved

Back in 1982, when Intel released the 80286, they added 4 privilege levels to the segmentation scheme (rings 0-3), specified by 2 bits in the Global Descriptor Table (GDT) and Local Descriptor Tabl...
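
As a small illustration of where those 2-bit privilege fields live, the Python sketch below decodes a 16-bit segment selector (RPL in bits 0-1, table indicator in bit 2, index in bits 3-15) and extracts the DPL from an 8-byte descriptor (bits 45-46, i.e. bits 5-6 of the access byte). The function names are hypothetical, and the two descriptor values are the flat code-segment encodings commonly seen in tutorials, not values taken from the question.

```python
def decode_selector(selector: int) -> dict:
    """Split a 16-bit x86 segment selector into its three fields."""
    selector &= 0xFFFF
    return {
        "index": selector >> 3,      # entry number in the GDT or LDT
        "ti": (selector >> 2) & 1,   # 0 = GDT, 1 = LDT
        "rpl": selector & 0b11,      # requested privilege level (ring 0-3)
    }


def descriptor_dpl(descriptor: int) -> int:
    """Extract the descriptor privilege level from a 64-bit segment descriptor."""
    return (descriptor >> 45) & 0b11


if __name__ == "__main__":
    print(decode_selector(0x2B))
    print(descriptor_dpl(0x00CF9A000000FFFF))  # access byte 0x9A
    print(descriptor_dpl(0x00CFFA000000FFFF))  # access byte 0xFA
```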

© 2022 - 2024 — McMap. All rights reserved.