x86-64 - McMap

1

Solved

Why modern calling conventions pass variadic arguments in registers?

If we look at a few modern calling conventions, like x86-64 SysV style or AArch64 style (document aapcs64.pdf titled "Procedure Call Standard for the Arm® 64-bit Architecture"), we see ex...

c assembly x86-64 arm64 calling-convention

Stentorian asked 23/10, 2024 at 10:57

1

Solved

Emulate AVX512 VPCOMPRESSB byte packing without AVX512_VBMI2

I have populated a zmm register with an array of byte integers from 0-63. The numbers serve as indices into a matrix. Non-zero elements represent rows in the matrix that contain data. Not all rows ...

x86-64 simd avx avx512

Oxazine asked 10/5, 2020 at 19:28

2

Solved

Is it "too clever" for using LEA to load constant to register?

I'm studying x86-64 NASM, and here is the current situation: this code is for education only, not for running on a client-facing system. RCX holds the loop count, between 1 and 1000. At the beginnin...

assembly x86-64 nasm micro-optimization

Improbability asked 8/9, 2024 at 15:50

1

Vectorization of sin and cos

I was playing around with Compiler Explorer and ran into an anomaly (I think). If I want to make the compiler vectorize a sin calculation using libmvec, I would write: #include <cmath> #def...

c++ gcc vectorization x86-64 trigonometry

Eugine asked 20/9, 2016 at 9:54

3

Solved

How can I get the _GLOBAL_OFFSET_TABLE_ address in my program?

I want to get the address of _GLOBAL_OFFSET_TABLE_ in my program. One way is to use the nm command in Linux, maybe redirect the output to a file and parse that file to get address of _GLOBAL_OFFSET...

c linux gcc x86-64

Onceover asked 13/3, 2012 at 15:11

2

Solved

Build Multiarch OpenSSL on OS X

I need to build OpenSSL on OS X for 32 and 64 bit architectures. What are the options I need to give to ./Configure so that I get it built for both architectures into same .a file?

macos openssl x86-64 i386

Ithyphallic asked 27/8, 2014 at 14:54

4

Solved

Repeated integer division by a runtime constant value

At some point in my program I compute an integer divisor d. From that point onward d is going to be constant. Later in the code I will divide by that d several times - performing an integer divisi...

c++ assembly optimization x86-64 integer-division

Blaise asked 27/7, 2017 at 14:23
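One common approach to the repeated-division question above is to precompute a fixed-point reciprocal once, when d becomes known, and replace each later division with a multiply that keeps the high half of the product. The sketch below uses the 32-bit "round-up reciprocal" method (M = ceil(2^64 / d)) described by Lemire, Kaser and Kurz; it assumes GCC or Clang's `unsigned __int128` extension and divisors 2 <= d < 2^32, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: precompute M = ceil(2^64 / d) once, when d becomes known.
   Valid for 2 <= d < 2^32; for d == 1 the true value 2^64 does not fit in 64 bits. */
static inline uint64_t div_precompute_u32(uint32_t d) {
    assert(d >= 2);
    return UINT64_C(0xFFFFFFFFFFFFFFFF) / d + 1;
}

/* Hypothetical helper: floor(n / d) computed as the high 64 bits of the
   128-bit product M * n. Relies on GCC/Clang's unsigned __int128 extension. */
static inline uint32_t div_by_precomputed_u32(uint32_t n, uint64_t M) {
    return (uint32_t)(((unsigned __int128)M * n) >> 64);
}
```

On x86-64 the division step should typically compile to a 64x64->128-bit `mul` plus use of the high half, instead of a `div`. If d can be 1, handle it as a special case (n / 1 == n).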

1

Why does this ternary generate more Assembly than an equivalent if?

So someone on a forum asked why this C function (which I added const and restrict to, just in case): void foo(int *const restrict dest, const int *const restrict source) { *dest = (*source != -1) ...

c assembly x86-64 compiler-optimization conditional-operator

Lange asked 14/6, 2024 at 13:03

7

Solved

Find which assembly instruction caused an Illegal Instruction error without debugging

While running a program I've written in assembly, I get an Illegal instruction error. Is there a way to know which instruction is causing the error, without debugging that is, because the machine I'm ...

c linux assembly x86-64 yasm

Perspicacious asked 27/4, 2012 at 16:11

0

How can I use the LD_PRELOAD trick on Windows to circumvent MKL performance degradation on AMD CPUs?

How can I use the LD_PRELOAD trick on Windows to circumvent MKL performance degradation on AMD CPUs? The documentation linked here explains that the LD_PRELOAD trick can be used to force MKL to use...

windows x86-64 intel-mkl amd-processor numerical-computing

Epicene asked 14/5, 2024 at 23:27

5

Solved

Why do GCC and Clang pop on both branches instead of only once? (Factoring parts of the epilogue out of tail-duplication)

GCC and Clang both compile bool pred(); void f(); void g(); void h() { if (pred()) { f(); } else { g(); } } to some variation of # Clang -Os output. -O3 is the same h(): push rax call pred...

c++ assembly gcc x86-64 compiler-optimization

Parhelion asked 24/4, 2024 at 18:46

0

Golang goroutine preemption

I was wondering how Golang does preemption of goroutines after version 1.14, when the scheduler became non-cooperative, and I studied the source code, but it seems my knowledge is not enough to comprehen...

go assembly x86-64 scheduler fibers

Coniferous asked 19/4, 2024 at 11:23

2

What's the best way to remember the x86-64 System V arg register order?

I often forget the registers that I need to use for each argument in a syscall, and every time I forget I just visit this question. The right order for integer/pointer args to x86_64 user-space func...

assembly x86-64 cpu-registers calling-convention abi

Balloon asked 14/9, 2020 at 21:21
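As a quick reference for the question above, the two integer/pointer argument orders can be written down as C lookup tables. The tables reflect the x86-64 System V ABI for ordinary function calls and the Linux syscall convention; `arg_position` is a hypothetical helper added only so the tables can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Integer/pointer argument registers, in order, for ordinary function calls
   under the x86-64 System V ABI (floating-point args use xmm0-xmm7 instead). */
static const char *const sysv_func_arg_regs[6] = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};

/* Linux x86-64 syscalls use the same order except that the 4th argument goes
   in r10, because the syscall instruction overwrites rcx (saved RIP) and
   r11 (saved RFLAGS). The syscall number itself goes in rax. */
static const char *const linux_syscall_arg_regs[6] = {"rdi", "rsi", "rdx", "r10", "r8", "r9"};

/* Hypothetical helper: 0-based argument position of `reg` in `table`, or -1. */
static int arg_position(const char *const table[6], const char *reg) {
    assert(reg != NULL);
    for (int i = 0; i < 6; i++)
        if (strcmp(table[i], reg) == 0)
            return i;
    return -1;
}
```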

1

Solved

Why does MSVC never return struct in RAX for member-functions?

I've stumbled across an oddity in MSVC's codegen regarding structures that are used as return values. Consider the following code (live demo here): struct Result { uint64_t value; }; Result makeR...

c++ assembly visual-c++ x86-64 calling-convention

Crapulous asked 28/3, 2024 at 14:56

4

How to compile using nasm on MacOSX

I am trying to compile and link my first program on Assembler. I try to compile the following code: ; %include "stud_io.inc" global _main section .text _main: xor eax, eax again: ; PRINT "He...

macos assembly x86-64 nasm

Ace asked 31/12, 2012 at 16:00

2

glibc scanf Segmentation faults when called from a function that doesn't align RSP

When compiling the code below: global main extern printf, scanf section .data msg: db "Enter a number: ",10,0 format:db "%d",0 section .bss number resb 4 section .text main: mov rdi, msg mov a...

linux assembly nasm x86-64 calling-convention

Weidman asked 27/6, 2018 at 20:13

1

Why does GCC use movzbl again to zero-extend a register that's already zero-extended?

I am wondering why this code: size_t hash_word(const char* c, size_t size) { size_t hash = uchar(c[0]); hash ^= uchar(c[size - 1]); hash ^= uchar(c[size - 2]); return hash; } When compiled: m...

c++ assembly gcc x86-64 cpu-architecture

Enfeeble asked 5/2, 2024 at 2:21

2

Smallest executable program (x86-64 Linux)

I recently came across this post describing the smallest possible ELF executable for Linux, however the post was written for 32 bit and I was unable to get the final version to compile on my machin...

linux assembly x86-64 executable elf

Akeyla asked 19/11, 2018 at 21:03

2

Solved

Do all CPUs that support AVX2 also support BMI2 or popcnt?

From here, I learned that the support of AVX doesn't imply the support of BMI1. So how about AVX2: Do all CPUs that support AVX2 also support BMI2? Further, does the support of AVX2 imply the suppo...

assembly x86-64 avx2 bmi

Palermo asked 8/6, 2023 at 1:33

1

Solved

AVX512 auto-vectorized C++ matrix-vector functions are much slower when source = destination, in-place

I've tried to write a few functions to carry out matrix-vector multiplication using a single matrix together with an array of source vectors. I've once written those functions in C++ and once in x8...

c++ assembly x86-64 avx512 auto-vectorization

Ermeena asked 21/1, 2024 at 0:13

0

Are long doubles broken using MinGW-w64?

I'm just starting to learn C, moving from C++, and I was just trying out a ton of variable types. I am using the MinGW-w64 toolset with the GCC compiler. This version supposedly uses the UCRT runtime i...

c printf x86-64 mingw-w64 long-double

Allomerism asked 12/1, 2024 at 13:08

2

LEA vs MOV imm64 for loading address-constant into register

I have a constant (64-bit) address that I want to load into a register. This address is located in the code segment, so it could be addressed relative to RIP. What are the differences between movabs...

assembly x86-64 micro-optimization

Disinterest asked 5/1, 2024 at 15:00

6

Solved

Benefits of x87 over SSE

I know that x87 has higher internal precision, which is probably the biggest difference that people see between it and SSE operations. But I have to wonder, is there any other benefit to using x87?...

x86 x86-64 sse fpu x87

Multipara asked 4/12, 2009 at 3:33

1

Best way to do a packed 16 element blend using SSE

I would like to implement the following function using SSE. It blends elements from a with packed elements from b, where elements are only present if they are used. void packedBlend16(uint8_t mask...

assembly x86 x86-64 intel sse

Cherie asked 16/5, 2020 at 19:52

0

How reproducible are floating point CPU operations on x86-64?

Note: this question is about CPU instructions, not high-level languages (where you are at the mercy of the compiler) From a popular answer: The same floating-point operations, run on the same har...

assembly x86 floating-point x86-64 ieee-754

Dunkirk asked 13/11, 2023 at 19:25
