x86-64 Questions

1

Solved

If we look at a few modern calling conventions, like x86-64 SysV style or AArch64 style (document aapcs64.pdf titled "Procedure Call Standard for the Arm® 64-bit Architecture"), we see ex...
Stentorian asked 23/10 at 10:57

1

Solved

I have populated a zmm register with an array of byte integers from 0-63. The numbers serve as indices into a matrix. Non-zero elements represent rows in the matrix that contain data. Not all rows ...
Oxazine asked 10/5, 2020 at 19:28

2

Solved

I'm studying x86-64 NASM, and here is the current situation. This code is for education only, not for running on a client-facing system. RCX holds the loop count, between 1 and 1000. At the beginnin...
Improbability asked 8/9 at 15:50

1

I was playing around with Compiler Explorer and ran into an anomaly (I think). If I want to make the compiler vectorize a sin calculation using libmvec, I would write: #include <cmath> #def...
Eugine asked 20/9, 2016 at 9:54

3

Solved

I want to get the address of _GLOBAL_OFFSET_TABLE_ in my program. One way is to use the nm command in Linux, maybe redirect the output to a file and parse that file to get the address of _GLOBAL_OFFSET...
Onceover asked 13/3, 2012 at 15:11

2

Solved

I need to build OpenSSL on OS X for 32 and 64 bit architectures. What options do I need to give to ./Configure so that it is built for both architectures into the same .a file?
Ithyphallic asked 27/8, 2014 at 14:54

4

Solved

At some point in my program I compute an integer divisor d. From that point onward d is going to be constant. Later in the code I will divide by that d several times - performing an integer divisi...
Blaise asked 27/7, 2017 at 14:23

1

So someone on a forum asked why this C function (which I added const and restrict to, just in case): void foo(int *const restrict dest, const int *const restrict source) { *dest = (*source != -1) ...

7

Solved

While running a program I've written in assembly, I get an Illegal instruction error. Is there a way to know which instruction is causing the error, without debugging that is, because the machine I'm ...
Perspicacious asked 27/4, 2012 at 16:11

0

How can I use the LD_PRELOAD trick on Windows to circumvent MKL performance degradation on AMD CPUs? The documentation linked here explains that the LD_PRELOAD trick can be used to force MKL to use...

5

Solved

GCC and Clang both compile bool pred(); void f(); void g(); void h() { if (pred()) { f(); } else { g(); } } to some variation of # Clang -Os output. -O3 is the same h(): push rax call pred...
Parhelion asked 24/4 at 18:46

0

I was wondering how Golang preempts goroutines since version 1.14, where the scheduler became non-cooperative. I studied the source code, but it seems my knowledge is not enough to comprehen...
Coniferous asked 19/4 at 11:23

2

I often forget the registers that I need to use for each argument in a syscall, and every time I forget I just visit this question. The right order for integer/pointer args to x86_64 user-space func...
Balloon asked 14/9, 2020 at 21:21

1

Solved

I've stumbled across an oddity in MSVC's codegen, regarding structures that are used as return values. Consider the following code (live demo here): struct Result { uint64_t value; }; Result makeR...
Crapulous asked 28/3 at 14:56

4

I am trying to compile and link my first program in assembly. I try to compile the following code: ; %include "stud_io.inc" global _main section .text _main: xor eax, eax again: ; PRINT "He...
Ace asked 31/12, 2012 at 16:0

2

When compiling the code below: global main extern printf, scanf section .data msg: db "Enter a number: ",10,0 format:db "%d",0 section .bss number resb 4 section .text main: mov rdi, msg mov a...
Weidman asked 27/6, 2018 at 20:13

1

I am wondering why this code: size_t hash_word(const char* c, size_t size) { size_t hash = uchar(c[0]); hash ^= uchar(c[size - 1]); hash ^= uchar(c[size - 2]); return hash; } When compiled: m...
Enfeeble asked 5/2 at 2:21

2

I recently came across this post describing the smallest possible ELF executable for Linux, however the post was written for 32 bit and I was unable to get the final version to compile on my machin...
Akeyla asked 19/11, 2018 at 21:3

2

Solved

From here, I learned that the support of AVX doesn't imply the support of BMI1. So how about AVX2: Do all CPUs that support AVX2 also support BMI2? Further, does the support of AVX2 imply the suppo...
Palermo asked 8/6, 2023 at 1:33

1

Solved

I've tried to write a few functions to carry out matrix-vector multiplication using a single matrix together with an array of source vectors. I've once written those functions in C++ and once in x8...
Ermeena asked 21/1 at 0:13

0

I'm just starting to learn C, moving from C++, and I was just trying out a ton of variable types. I am using the MinGW-w64 toolset with the GCC compiler. This version supposedly uses the UCRT runtime i...
Allomerism asked 12/1 at 13:8

2

I have a constant (64-bit) address that I want to load into a register. This address is located in the code segment, so it could be addressed relative to RIP. What are the differences between movabs...
Disinterest asked 5/1 at 15:0

6

Solved

I know that x87 has higher internal precision, which is probably the biggest difference that people see between it and SSE operations. But I have to wonder, is there any other benefit to using x87?...
Multipara asked 4/12, 2009 at 3:33

1

I would like to implement the following function using SSE. It blends elements from a with packed elements from b, where elements are only present if they are used. void packedBlend16(uint8_t mask...
Cherie asked 16/5, 2020 at 19:52

0

Note: this question is about CPU instructions, not high-level languages (where you are at the mercy of the compiler) From a popular answer: The same floating-point operations, run on the same har...
Dunkirk asked 13/11, 2023 at 19:25
