x86 Questions
4
Solved
Many of you may recall the old DOS program--debug. Though outdated in many respects, one of the nice things about it was that one could easily find the byte-sequence for a given instruction w...
8
Solved
I want to create a wrapper around the x86 instructions PDEP (Parallel Bit Deposit) and PEXT (Parallel Bit Extract).
On architectures where these aren't available (and the corresponding intrinsics a...
Boxer asked 17/1, 2024 at 16:54
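For architectures without PDEP/PEXT, one possible software fallback is a loop over the set bits of the mask. The sketch below is only an illustration, not the asker's wrapper: the names pext64_soft and pdep64_soft are hypothetical, and it makes no attempt to match the speed of the hardware instructions.

```c
#include <stdint.h>

/* Hypothetical portable fallback for PEXT: gather the bits of `src`
   selected by `mask` into the low bits of the result, keeping their order. */
uint64_t pext64_soft(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    for (uint64_t out_bit = 1; mask != 0; out_bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1); /* isolate lowest set mask bit */
        if (src & lowest)
            result |= out_bit;
        mask &= mask - 1;                     /* clear that mask bit */
    }
    return result;
}

/* Hypothetical portable fallback for PDEP: scatter the low bits of `src`
   to the positions of the set bits in `mask`. */
uint64_t pdep64_soft(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    for (uint64_t in_bit = 1; mask != 0; in_bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1); /* next deposit position */
        if (src & in_bit)
            result |= lowest;
        mask &= mask - 1;                     /* clear that mask bit */
    }
    return result;
}
```

Each loop runs once per set bit in the mask, so sparse masks are cheap and dense masks cost up to 64 iterations.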
9
Solved
Those familiar with x86 assembly programming are very used to the typical function prologue / epilogue:
push ebp ; Save old frame pointer.
mov ebp, esp ; Point frame pointer to top-of-stack.
sub e...
Learnt asked 12/10, 2014 at 8:20
3
Solved
While writing new code for Windows, I stumbled upon _cpuinfo() from the Windows API. As I am mainly dealing with a Linux environment (GCC) I want to have access to the CPUInfo.
I have tried the fol...
3
1
This question used to be a part of this (now updated) question, but it seems like it should be another question, since it didn't help to get an answer to the other one.
My starting point is a lo...
Housewares asked 27/1, 2020 at 17:59
0
Are there processors on which VPMASKMOVD generates faults for the masked-out elements?
Going by the Intel Software Developer's Manual, the answer is plainly "no":
Faults occur only due t...
Eastereasterday asked 28/1, 2024 at 15:16
4
Solved
There is a recent publication at nature.com, Faster sorting algorithms discovered using deep reinforcement learning, where it talks about AlphaDev discovering a faster sorting algorithm. This caugh...
2
Solved
I am trying to wrap my mind around pointers in Assembly.
What exactly is the difference between:
mov eax, ebx
and
mov [eax], ebx
and when should dword ptr [eax] be used?
Also when I try to...
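As a rough C analogy for the two mov forms above (a sketch only; the function names are hypothetical and the variables are named after the registers purely for illustration), the first copies a value and the second stores a value at the address held in the destination:

```c
#include <stdint.h>

/* Analogue of `mov eax, ebx`: the value of one variable is copied
   into another; no memory is accessed through a pointer. */
uint32_t copy_value(uint32_t ebx)
{
    uint32_t eax = ebx;
    return eax;
}

/* Analogue of `mov [eax], ebx`: eax holds an address, and the value
   of ebx is stored at that address. */
void store_through_pointer(uint32_t *eax, uint32_t ebx)
{
    *eax = ebx;
}

/* Analogue of `mov dword ptr [eax], 1`: with an immediate source the
   assembler cannot infer the operand size from a register, so it has
   to be stated. In C the pointer type carries that information. */
void store_immediate(uint32_t *eax)
{
    *eax = 1;
}
```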
2
GCC has 128-bit integers. Using these I can get the compiler to use the mul (or imul with only one operand) instructions. For example
uint64_t x,y;
unsigned __int128 z = (unsigned __int128)x*y;
...
Khat asked 13/3, 2015 at 10:11
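The 128-bit product from the excerpt above can be split into its two 64-bit halves as in the following sketch. It assumes a GCC or Clang 64-bit target where unsigned __int128 is available; the struct u128_parts and the function mul_64x64 are hypothetical names, and which multiply instruction the compiler picks is up to the compiler.

```c
#include <stdint.h>

/* Hypothetical holder for the two halves of a 128-bit product. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128_parts;

/* Full 64x64 -> 128-bit unsigned multiply using the GCC/Clang
   unsigned __int128 extension. */
u128_parts mul_64x64(uint64_t x, uint64_t y)
{
    unsigned __int128 z = (unsigned __int128)x * y;
    u128_parts r;
    r.lo = (uint64_t)z;         /* low 64 bits of the product  */
    r.hi = (uint64_t)(z >> 64); /* high 64 bits of the product */
    return r;
}
```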
1
We compile our code with g++ -march=ivybridge -mtune=skylake. If somebody runs it on an older or incompatible architecture, I want the app to report this and exit gracefully. How do I do this? How about AMD proc...
Burgee asked 18/9, 2020 at 9:18
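One possible approach is a runtime check early in startup using the GCC/Clang builtin __builtin_cpu_supports. The sketch below checks only "avx" as a representative Ivy Bridge feature, which is an assumption and not a complete test for -march=ivybridge; the helper names are hypothetical. Note also that if the whole program is built with -march=ivybridge, the compiler may already have emitted AVX instructions in code that runs before the check, so the check would likely need to live in a translation unit built for a baseline target.

```c
#include <stdio.h>

/* Returns 1 if the running CPU reports AVX support, 0 otherwise.
   On compilers or targets without the builtin, conservatively returns 0. */
int cpu_has_avx(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();                       /* populate CPU feature data */
    return __builtin_cpu_supports("avx") ? 1 : 0;
#else
    return 0;
#endif
}

/* Hypothetical startup helper: prints a message and returns 0 when the
   required feature is missing, so the caller can exit gracefully. */
int check_cpu_or_report(void)
{
    if (!cpu_has_avx()) {
        fputs("This build requires a CPU with AVX support.\n", stderr);
        return 0;
    }
    return 1;
}
```

A caller would typically invoke check_cpu_or_report() at the top of main and return a non-zero exit status when it reports failure.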
4
Solved
This is a somewhat low-level question. In x86 assembly there are two SSE instructions:
MOVDQA xmmi, m128
and
MOVNTDQA xmmi, m128
The IA-32 Software Developer's Manual says that the NT i...
2
Solved
I'm trying to make my own custom OS and I need some help with my code.
This is my bootloader.asm:
[ORG 0x7c00]
start:
cli
xor ax, ax
mov ds, ax
mov ss, ax
mov es, ax
mov [BOOT_DRIVE], dl
m...
Lateen asked 9/11, 2015 at 6:51
1
Solved
I am looking for compiler flags of GCC/CLANG to generate BEXTR instruction.
template <auto uSTART, auto uLENGTH, typename Tunsigned>
constexpr Tunsigned bit_extract(Tunsigned uInput)
{
retur...
Dominations asked 22/12, 2023 at 18:46
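For reference, a plain shift-and-mask version of the operation can be sketched in C as below. This is not the asker's template (which is truncated above); bit_extract_u64 is a hypothetical name. Whether GCC or Clang turn such code into a BEXTR instruction depends on the target flags (BMI1 must be enabled, for example with -mbmi) and on the optimizer, and is not guaranteed.

```c
#include <stdint.h>

/* Extract `length` bits of `input` starting at bit `start` (bit 0 = LSB).
   Out-of-range starts and zero lengths yield 0, which avoids undefined
   shifts by 64 or more. */
uint64_t bit_extract_u64(uint64_t input, unsigned start, unsigned length)
{
    if (start >= 64 || length == 0)
        return 0;
    uint64_t shifted = input >> start;
    if (length >= 64)
        return shifted;                       /* nothing left to mask off */
    return shifted & ((UINT64_C(1) << length) - 1);
}
```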
3
Solved
While debugging some software I noticed INT3 instructions are inserted in between subroutines in many cases.
I assume these are not technically inserted 'between' functions, but instead after t...
Nigro asked 19/10, 2016 at 9:40
1
I would like to implement the following function using SSE. It blends elements from a with packed elements from b, where elements are only present if they are used.
void packedBlend16(uint8_t mask...
7
Often I hear people around me who like to discuss writing in assembly language, which is one of the reasons I also want to learn to write it. Currently I'm learning assembly and C t...
3
Solved
Two different threads within a single process can share a common memory location by reading and/or writing to it.
Usually, such (intentional) sharing is implemented using atomic operations using ...
Shied asked 10/8, 2017 at 0:37
4
Solved
I have a row-wise array of floats (~20 cols x ~1M rows) from which I need to extract two columns at a time into two __m256 registers.
...a0.........b0......
...a1.........b1......
// ...
...a7.......
1
I am conducting a test to measure the message synchronization latency between different cores of a CPU. Specifically, I am measuring how many clock cycles it takes for CPU2 to detect changes in the...
Chema asked 24/11, 2023 at 14:0
8
Solved
I'd like to write a very small proof-of-concept JIT compiler for a toy language processor I've written (purely academic), but I'm having some trouble in the middle-altitudes of design. Conceptually...
Mendiola asked 6/2, 2011 at 6:32
11
Solved
Suppose a1, b1, c1, and d1 point to heap memory, and my numerical code has the following core loop.
const int n = 100000;
for (int j = 0; j < n; j++) {
a1[j] += b1[j];
c1[j] += d1[j];
}
This...
Scornful asked 17/12, 2011 at 20:40
1
I see that in the AVX2 instruction set, Intel distinguishes the XOR operations for integer, double, and float with different instructions. For integer there's "VPXORD", for double "VXORPD", and for float "VXO...
Halfmoon asked 5/3, 2019 at 18:32
2
Solved
Consider the following small search function:
template <uint32_t N>
int32_t countsearch(const uint32_t *base, uint32_t needle) {
uint32_t count = 0;
// #pragma clang loop vectorize(disable)...
Ambriz asked 22/7, 2018 at 4:0