x86 Questions

4

Solved

Many of you may recall the old DOS program--debug. Though outdated in many respects, one of the nice things about it was that one could easily find the byte-sequence for a given instruction w...
Chloe asked 8/7, 2010 at 17:56

8

Solved

I want to create a wrapper around the x86 instructions PDEP (Parallel Bit Deposit) and PEXT (Parallel Bit Extract). On architectures where these aren't available (and the corresponding intrinsics a...
Boxer asked 17/1, 2024 at 16:54
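
For readers who want a concrete picture of what such a fallback could look like, here is a minimal portable sketch in plain C. It is not the asker's wrapper: the names `pext_soft` and `pdep_soft` are hypothetical, and the bit-by-bit loop is only one simple way to emulate the two instructions when the hardware versions are unavailable.

```c
#include <stdint.h>

/* Hypothetical software fallback for PEXT: gather the bits of src that are
   selected by mask and pack them into the low bits of the result. */
uint64_t pext_soft(uint64_t src, uint64_t mask) {
    uint64_t result = 0;
    for (uint64_t out_bit = 1; mask != 0; out_bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1); /* isolate lowest set mask bit */
        if (src & lowest) {
            result |= out_bit;
        }
        mask &= mask - 1;                     /* clear that mask bit */
    }
    return result;
}

/* Hypothetical software fallback for PDEP: scatter the low bits of src to
   the bit positions that are set in mask. */
uint64_t pdep_soft(uint64_t src, uint64_t mask) {
    uint64_t result = 0;
    for (uint64_t in_bit = 1; mask != 0; in_bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1); /* isolate lowest set mask bit */
        if (src & in_bit) {
            result |= lowest;
        }
        mask &= mask - 1;                     /* clear that mask bit */
    }
    return result;
}
```

Each loop runs once per set bit in the mask, so the cost grows with the mask's population count; a real wrapper would presumably dispatch to the hardware instructions where they exist.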

9

Solved

Those familiar with x86 assembly programming are very used to the typical function prologue / epilogue: push ebp ; Save old frame pointer. mov ebp, esp ; Point frame pointer to top-of-stack. sub e...

3

Solved

While writing new code for Windows, I stumbled upon _cpuinfo() from the Windows API. As I am mainly dealing with a Linux environment (GCC) I want to have access to the CPUInfo. I have tried the fol...
Threephase asked 10/1, 2013 at 20:33

3

I want to clamp 32-bit unsigned ints to a fixed value (0x10000) using only SSE2 instructions. Basically, this C code: if (c>0x10000) c=0x10000; This code below works, but I'm wondering if it can b...
Franzen asked 2/2, 2024 at 17:46
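
As an illustration of the problem (not the asker's code, which is cut off above), one common SSE2-only approach is sketched below. SSE2 has no unsigned 32-bit compare or minimum, so the sketch flips the sign bit of both operands and uses the signed compare instead. The function names `clamp_u32_sse2` and `clamp_array4` are hypothetical, and the block assumes an x86 target with SSE2 enabled.

```c
#include <stdint.h>
#include <emmintrin.h> /* SSE2 intrinsics */

/* Clamp four unsigned 32-bit lanes to 0x10000 using SSE2 only. */
static inline __m128i clamp_u32_sse2(__m128i c) {
    const __m128i limit = _mm_set1_epi32(0x10000);
    const __m128i bias  = _mm_set1_epi32(INT32_MIN); /* 0x80000000 per lane */

    /* Flipping the sign bit maps unsigned order onto signed order, so the
       signed compare below behaves like an unsigned c > limit. */
    __m128i gt = _mm_cmpgt_epi32(_mm_xor_si128(c, bias),
                                 _mm_xor_si128(limit, bias));

    /* Select limit where c > limit, otherwise keep c. */
    return _mm_or_si128(_mm_and_si128(gt, limit),
                        _mm_andnot_si128(gt, c));
}

/* Convenience wrapper for four values in memory (unaligned access). */
void clamp_array4(const uint32_t in[4], uint32_t out[4]) {
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, clamp_u32_sse2(v));
}
```

Whether this is shorter or faster than the asker's version cannot be judged from the excerpt; it only shows the sign-bias trick that SSE2 unsigned comparisons typically rely on.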

1

This question used to be a part of this (now updated) question, but it seems like it should be another question, since it didn't help to get an answer to the other one. My starting point is a lo...
Housewares asked 27/1, 2020 at 17:59

0

Are there processors on which VPMASKMOVD generates faults for the masked-out elements? Going by the Intel Software Developer's Manual, the answer is plainly "no": Faults occur only due t...
Eastereasterday asked 28/1, 2024 at 15:16

4

Solved

There is a recent publication at nature.com, Faster sorting algorithms discovered using deep reinforcement learning, where it talks about AlphaDev discovering a faster sorting algorithm. This caugh...
Whitechapel asked 22/6, 2023 at 3:21

2

Solved

I am trying to wrap my mind around pointers in Assembly. What exactly is the difference between: mov eax, ebx and mov [eax], ebx and when should dword ptr [eax] be used? Also when I try to...
Nikolaos asked 3/5, 2017 at 20:31

2

GCC has 128-bit integers. Using these I can get the compiler to use the mul (or imul with only one operand) instructions. For example uint64_t x,y; unsigned __int128 z = (unsigned __int128)x*y; ...
Khat asked 13/3, 2015 at 10:11
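
A small sketch of the pattern described in the excerpt, assuming a 64-bit GCC or Clang target where the `unsigned __int128` extension is available. The helper name `mul64x64` is hypothetical; it simply splits the 128-bit product into high and low halves, which is the form compilers typically lower to a single widening multiply.

```c
#include <stdint.h>

/* Full 64x64 -> 128-bit unsigned multiply, returned as two 64-bit halves.
   Relies on the GCC/Clang unsigned __int128 extension. */
void mul64x64(uint64_t x, uint64_t y, uint64_t *hi, uint64_t *lo) {
    unsigned __int128 z = (unsigned __int128)x * y; /* widen before multiplying */
    *lo = (uint64_t)z;          /* low 64 bits of the product */
    *hi = (uint64_t)(z >> 64);  /* high 64 bits of the product */
}
```

The cast on `x` matters: without it the multiplication would be done in 64 bits and the high half would be lost before the widening.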

1

We compile our code with g++ -march=ivybridge -mtune=skylake. In case somebody runs it on an older/incompatible architecture, I want the app to inform the user and exit gracefully. How do I do this? How about AMD proc...
Burgee asked 18/9, 2020 at 9:18

4

Solved

This is a somewhat low-level question. In x86 assembly there are two SSE instructions: MOVDQA xmmi, m128 and MOVNTDQA xmmi, m128 The IA-32 Software Developer's Manual says that the NT i...
Millisent asked 31/8, 2008 at 20:18

6

Solved

I know that x87 has higher internal precision, which is probably the biggest difference that people see between it and SSE operations. But I have to wonder, is there any other benefit to using x87?...
Multipara asked 4/12, 2009 at 3:33

2

Solved

I'm trying to make my own custom OS and I need some help with my code. This is my bootloader.asm: [ORG 0x7c00] start: cli xor ax, ax mov ds, ax mov ss, ax mov es, ax mov [BOOT_DRIVE], dl m...
Lateen asked 9/11, 2015 at 6:51

1

Solved

I am looking for GCC/Clang compiler flags to generate the BEXTR instruction. template <auto uSTART, auto uLENGTH, typename Tunsigned> constexpr Tunsigned bit_extract(Tunsigned uInput) { retur...
Dominations asked 22/12, 2023 at 18:46

3

Solved

While debugging some software I noticed INT3 instructions are inserted in between subroutines in many cases. I assume these are not technically inserted 'between' functions, but instead after t...
Nigro asked 19/10, 2016 at 9:40

1

I would like to implement the following function using SSE. It blends elements from a with packed elements from b, where elements are only present if they are used. void packedBlend16(uint8_t mask...
Cherie asked 16/5, 2020 at 19:52

7

I often hear people around me discussing writing in assembly language, which is one of the reasons I also want to learn to write it. Currently I'm learning assembly and C t...
Checkerboard asked 15/12, 2011 at 17:44

3

Solved

Two different threads within a single process can share a common memory location by reading and/or writing to it. Usually, such (intentional) sharing is implemented using atomic operations using ...
Shied asked 10/8, 2017 at 0:37

4

Solved

I have a row-wise array of floats (~20 cols x ~1M rows) from which I need to extract two columns at a time into two __m256 registers. ...a0.........b0...... ...a1.........b1...... // ... ...a7.......
Marolda asked 27/2, 2017 at 23:58

1

I am conducting a test to measure the message synchronization latency between different cores of a CPU. Specifically, I am measuring how many clock cycles it takes for CPU2 to detect changes in the...
Chema asked 24/11, 2023 at 14:00

8

Solved

I'd like to write a very small proof-of-concept JIT compiler for a toy language processor I've written (purely academic), but I'm having some trouble in the middle-altitudes of design. Conceptually...
Mendiola asked 6/2, 2011 at 6:32

11

Solved

Suppose a1, b1, c1, and d1 point to heap memory, and my numerical code has the following core loop. const int n = 100000; for (int j = 0; j < n; j++) { a1[j] += b1[j]; c1[j] += d1[j]; } This...
Scornful asked 17/12, 2011 at 20:40

1

I see in AVX2 instruction set, Intel distinguishes the XOR operations of integer, double and float with different instructions. For Integer there's "VPXORD", and for double "VXORPD", for float "VXO...
Halfmoon asked 5/3, 2019 at 18:32

2

Solved

Consider the following small search function: template <uint32_t N> int32_t countsearch(const uint32_t *base, uint32_t needle) { uint32_t count = 0; // #pragma clang loop vectorize(disable)...
Ambriz asked 22/7, 2018 at 4:00
