cpu-architecture Questions

2

What are the costs of a failed store-to-load forwarding on recent x86 architectures? In particular, store-to-load forwarding that fails because the load partly overlaps an earlier store, or becaus...
Parallelepiped asked 9/9, 2017 at 21:43

7

Solved

Starting with the Pentium Pro (P6 microarchitecture), Intel redesigned its microprocessors and used an internal RISC core under the old CISC instructions. Since the Pentium Pro, all CISC instructions are divi...
Utilize asked 27/4, 2011 at 15:27

1

Solved

I'm currently trying to understand the performance properties of certain loops on x86_64 (specifically, my Intel(R) Core(TM) i3-8145U CPU @ 2.10GHz processor). In particular, adding an extra instruc...
Koine asked 17/1, 2021 at 1:6

1

Solved

I made a bubble sort implementation in C, and was testing its performance when I noticed that the -O3 flag made it run even slower than no flags at all! Meanwhile -O2 was making it run a lot faster...
Mons asked 9/10, 2021 at 2:35

1

Solved

I am designing a 16-bit ALU in Verilog based on an existing RISC ISA. The ISA says that the carry flag is set when the operation is unsigned, and overflow is set when the operation is signed. The i...
Jackqueline asked 9/9, 2021 at 21:23

1

Solved

I've been studying the memory model and saw this (quote from https://research.swtch.com/hwmm): Litmus Test: Write Queue (also called Store Buffer) Can this program see r1 = 0, r2 = 0? // Thread 1 /...
Galactometer asked 9/9, 2021 at 3:51

1

I'm struggling to understand the difference between data dependence and control dependence. The example I saw was: data dependence, e.g., instruction uses data created by another instruc...

3

Solved

I was reading about differences between threads and processes, and literally everywhere online, one difference is commonly written without much explanation: If a process gets blocked, remaining pr...

3

Solved

Disassembling write(1,"hi",3) on Linux, built with gcc -s -nostdlib -nostartfiles -O3 results in: ba03000000 mov edx, 3 ; thanks for the correction jester! bf01000000 mov edi, 1 31c0 xor eax, eax ...
Yclept asked 10/1, 2017 at 16:23

1

Solved

I'm trying to wrap my head around the x86 instruction encoding format. All the sources that I read still make the subject confusing. I'm starting to understand it a little bit but one thing that I'...

1

With ARMv8.3, a new instruction has been introduced: LDAPR. When there is an STLR followed by an LDAR to a different address, these two can't be reordered, and hence it is called RCsc (release consi...

2

Solved

Assume we have 3 bits to play with. I am going to represent plus and minus 3 in 2's complement: +3 = 0b011 -3 = 0b101 When doing addition you always have a dangling bit when an overflow happens li...
Vashtivashtia asked 29/7, 2021 at 21:9

2

Solved

My textbook (Computer Systems: A Programmer's Perspective) states that a latency bound is encountered when a series of operations must be performed in strict sequence, while a throughput bound char...
Kleptomania asked 26/7, 2020 at 2:1

1

Solved

Given this code snippet from this textbook that I am currently studying. Randal E. Bryant, David R. O’Hallaron - Computer Systems. A Programmer’s Perspective [3rd ed.] (2016, Pearson) (global editi...
Farad asked 10/7, 2021 at 6:48

0

64-bit architectures like x86-64 have a word size of 64 bits. In this case, if a memory access crosses a word boundary, it will require double the time to access the data. So alignment is requi...

4

Solved

Just going off Wikipedia: The page table, generally stored in main memory, keeps track of where the virtual pages are stored in the physical memory. This method uses two memory accesses (one for...
Saltus asked 15/12, 2016 at 20:53

3

Solved

I am not able to understand the difference between an instruction set and an instruction set architecture. I know what an instruction set is. An instruction set just defines the possible instructions we ca...

2

Solved

I searched the web and the Intel software manual, but I am unable to confirm whether all Intel 64 architectures support up to SSSE3, SSE4.1, SSE4.2, AVX, etc., so that I would be able to use ...
Archil asked 28/1, 2015 at 6:14

1

I'm reading this textbook: Randal E. Bryant, David R. O’Hallaron - Computer Systems. A Programmer’s Perspective [3rd ed.] (2016, Pearson). Currently I am unsure about Problem 5.5 in the textbook. ...
Polestar asked 26/6, 2021 at 14:24

0

I'm a long-time user of cachegrind for program profiling, and recently went back to check the official documentation once more: https://valgrind.org/docs/manual/cg-manual.html In it, there are mult...

4

Solved

I want to understand what exactly an interrupt is for my 6502 work-alike processor project in Logisim. I know that an interrupt does the following steps: Stops the current program from processing ...
Doriedorin asked 21/3, 2019 at 11:31

1

Solved

I was running some tests to compare C to Java and ran into something interesting. Running my exactly identical benchmark code with optimization level 1 (-O1) in a function called by main, rather th...
Adena asked 7/6, 2021 at 19:52

12

Solved

I have a need to work with Windows executables which are made for x86, x64, and IA64. I'd like to programmatically figure out the platform by examining the files themselves. My target language is ...
Elle asked 13/10, 2008 at 15:16

0

An analysis on https://ridiculousfish.com/blog/posts/benchmarking-libdivide-m1-avx512.html finds that the new Apple CPU has spent a lot of resources making integer division massively faster. This i...

2

Solved

I was just trying to get a clearer understanding of what exactly multiple cores are being used for, and what the difference is between multiple cores and multiple CPUs. I was trying to understand i...
Verwoerd asked 10/5, 2021 at 0:54
