cpu-architecture - 3

4

Solved

What is the purpose of the "Prefer 32-bit" setting in Visual Studio and how does it actually work?

It is unclear to me how the compiler will automatically know to compile for 64-bit when it needs to. How does it know when it can confidently target 32-bit? I am mainly curious about how the com...

c# .net visual-studio compilation cpu-architecture

Severance asked 22/8, 2012 at 5:13

1

Solved

Clock cycles required for multiplication and addition operations [duplicate]

A question I have on Problem 5.5 of the book Computer Systems: A Programmer's Perspective about clock cycles required to perform arithmetic operations: For context, clock cycles required for...

c performance x86 cpu-architecture

Arana asked 8/10, 2022 at 6:27

0

Why does CPU=8 in Intel Core i9-12900K have the fastest access to all other cores?

These are the original tests. The figure shows that CPU=8 in Intel Core i9-12900K has the fastest access to all other cores. Therefore, I am wondering what makes it happen. Besides, I am also curious...

performance cpu intel cpu-architecture

Afforest asked 19/9, 2022 at 1:38

1

Solved

Energy consumption per x86 instruction?

I am aware of a few tools that measure power consumption of programs, such as powerTOP, RAPL and the like. However, I was wondering if there exists some kind of benchmark such as Agner Fog's benchm...

assembly x86 intel cpu-architecture energy

Forespent asked 12/9, 2022 at 16:0

1

Solved

Does anyone have an example where _mm256_stream_load_si256 (non-temporal load to bypass cache) actually improves performance?

Consider massively SIMD-vectorized loops on very large amounts of floating point data (hundreds of GB) that, in theory, should benefit from non-temporal ("streaming" i.e. bypassing cache...

performance x86 cpu-architecture hpc avx

Vivyanne asked 27/8, 2022 at 10:29

0

Does Cache Coherence always prevent reading a stale value? Do invalidation queues allow it?

In MESI protocol you write to the cache line only when holding it in the Exclusive/Modified state. To acquire the Exclusive state, you send an Invalidate request to all the cores holding the same c...

caching cpu-architecture cpu-cache memory-barriers mesi

Ritual asked 27/8, 2022 at 6:48

1

How to handle branch mispredictions that seem to depend on machine code position?

While trying to benchmark implementations of a simple sparse unit lower triangular backward solve in CSC format, I observe strange behavior. The performance seems to vary drastically, depending on ...

c++ performance benchmarking cpu-architecture branch-prediction

Underage asked 20/8, 2022 at 16:8

1

Performance of AVX-512 masked memory accesses

Can masking improve the performance of AVX-512 memory operations (load/store/gather/scatter and non-shuffling load-ops)? Seeing as masked out elements don't trigger memory faults, one would assume ...

performance x86 cpu-architecture avx512

Midmost asked 10/8, 2022 at 10:18

6

Solved

How can I get the iOS device CPU architecture at runtime

Is there a way to identify the iOS device CPU architecture at runtime? Thank you.

ios objective-c cpu-architecture

Cassidycassie asked 8/11, 2013 at 12:42

4

RISC-V spec references the word 'hart' - what does 'hart' mean?

I found references to hart on page 35 of the RISC-V 2.1 spec. However, I could not find a definition for hart in this document. Does hart refer to a hardware-thread or something more sinister?

cpu-architecture riscv hyperthreading cpu-cores

Garibald asked 8/3, 2017 at 16:33

9

Solved

API call to get processor architecture

As part of my app I'm using the NDK and was wondering if it's worth bundling x86 and mips binaries alongside the standard ARM binaries. I figured the best way would be to track what my users actua...

android cpu-architecture

Miyokomizar asked 16/8, 2012 at 14:39

2

Solved

Must CPU have an accumulator?

Before you laugh at me: I want to ask whether arithmetic operations are done in the ALU or in the accumulator. I read a book that says the accumulator is a register for doing arithmetic. This Accumulator said Withou...

performance assembly cpu cpu-architecture cpu-registers

Peewee asked 26/6, 2016 at 18:7

2

Why are Docker multi-architecture images needed (instead of the Docker Engine abstracting the differences)

Short version: I would like to know the technical reasons why Docker images need to be created for multiple architectures. Also, it is not clear whether the point here is creating an image for e...

docker cpu-architecture docker-image docker-engine

Mariellamarielle asked 9/6, 2020 at 23:52

4

How does 32-bit address 4GB if 2³² bits = 4 Billion bits not Bytes?

Essentially, how does 4Gb turn into 4GB? If the memory is addressing Bytes, should not the possibilities be 2^(32/8)?

cpu-architecture memory-address addressing

Antiphlogistic asked 13/9, 2014 at 7:26

1

Solved

c++11: how to produce "spurious failures" upon compare_exchange_weak?

Is there a way that we can write some code to produce "spurious failures" for the "weak" version of compare_exchange? While the same code should work well for compare_exchange...

c++ cpu-architecture stdatomic

Melmon asked 27/6, 2022 at 1:50

3

Solved

How are functions encoded/stored in memory?

I understand how things like numbers and letters are encoded in binary, and thus can be stored as 0's and 1's. But how are functions stored in memory? I don't see how they could be stored as 0's a...

function memory encoding cpu-architecture machine-code

Propositus asked 15/8, 2014 at 23:58

2

Solved

Are there any problems for which SIMD outperforms Cray-style vectors?

CPUs intended to provide high-performance number crunching end up with some kind of vector instruction set. There are basically two kinds: SIMD. This is conceptually straightforward, e.g. instead...

vectorization cpu-architecture simd instruction-set

Cawnpore asked 29/5, 2022 at 9:35

1

Why does bfloat16 have so many exponent bits?

It's clear why a 16-bit floating-point format has started seeing use for machine learning; it reduces the cost of storage and computation, and neural networks turn out to be surprisingly insensitiv...

machine-learning neural-network floating-point cpu-architecture half-precision-float

Wideawake asked 2/6, 2022 at 10:33

7

Solved

Why does this code execute more slowly after strength-reducing multiplications to loop-carried additions?

I was reading Agner Fog's optimization manuals, and I came across this example: double data[LEN]; void compute() { const double A = 1.1, B = 2.2, C = 3.3; int i; for(i=0; i<LEN; i++) { dat...

assembly optimization x86-64 cpu-architecture simd

Mikamikado asked 19/5, 2022 at 14:39

1

Solved

Why 4-level paging can only cover 64 TiB of physical address

linux/Documentation/x86/x86_64/5level-paging.rst says: "Original x86-64 was limited by 4-level paging to 256 TiB of virtual address space and 64 TiB of physical address space." I k...

linux-kernel x86-64 cpu-architecture paging page-tables

Volition asked 28/5, 2022 at 16:19

1

Solved

Why does SIMD have single data instructions when it's called SIMD?

I've been wondering.. It's called SIMD as in single instruction multiple data. So why does it have single data instructions? For example, vaddss is the single data equivalent of the multiple data v...

cpu-architecture simd sse cpu-registers avx

Asmodeus asked 27/5, 2022 at 1:9

1

Solved

Explanation for why effective DRAM bandwidth reduces upon adding CPUs

This question is a spin-off of the one posted here: Measuring bandwidth on a ccNUMA system I've written a micro-benchmark for the memory bandwidth on a ccNUMA system with 2x Intel(R) Xeon(R) Platin...

performance parallel-processing intel cpu-architecture numa

Acyclic asked 13/5, 2022 at 12:21

5

Determine the critical path in the data flow [duplicate]

In the book Computer Systems: A Programmer's Perspective, Exercise 5.5 shows a piece of code to compute the value of a polynomial: double poly(double a[], double x, int degree) { long in...

cpu cpu-architecture cpu-cycles

Trici asked 9/2, 2013 at 9:30

1

Solved

Why didn't x86 implement direct core-to-core messaging assembly/cpu instructions?

After serious development, CPUs gained many cores, distributed blocks of cores on multiple chiplets, NUMA systems, etc., but a piece of data still has to pass through not only L1 cache (if on ...

assembly x86 intel cpu-architecture message-passing

Cherimoya asked 11/5, 2022 at 19:36

1

Solved

Can CPU Out-of-Order-Execution cause memory reordering?

I know that store buffers and invalidate queues can cause memory reordering. What I don't know is whether Out-of-Order-Execution can cause memory reordering. In my opinion, Out-of-Order-Execution...

cpu cpu-architecture cpu-cache riscv memory-barriers

Bailee asked 6/4, 2022 at 14:32
