Returning a __m128d from a MASM procedure to a C caller

I am porting a function from inline assembly to MASM in Visual Studio 2013 and am having trouble getting a return value out of it.

Here are the C++ caller and the assembly function prototype:

extern "C" void AbsMax(__m128d* samples, int len, __m128d* pResult);

__m128d AbsMax(__m128d* samples, int len)
{
    __m128d absMax = { 0, 0 };
    AbsMax(samples, len, &absMax);
    return absMax;
}

And the assembly function:

.686              ;Target processor.  Use instructions for Pentium class machines
.xmm

.model flat, c    ;Use the flat memory model. Use C calling conventions
.code             ;Indicates the start of a code segment.

AbsMax proc samples:PTR DWORD, len:DWORD, result:PTR XMMWORD
    ;; Load up registers. xmm0 is min, xmm1 is max. L is Ch0, H is Ch1.
    mov     ecx,  [len]
    shl     ecx,  4
    mov     esi,  [samples]
    lea     esi,  [esi+ecx]
    neg     ecx
    pxor    xmm0, xmm0
    pxor    xmm1, xmm1

ALIGN 16
_loop:
    movaps  xmm2, [esi+ecx]
    add     ecx,  16
    minpd   xmm0, xmm2
    maxpd   xmm1, xmm2
    jne     _loop

    ;; Store larger of -min and max for each channel. xmm2 is -min.
    pxor    xmm2, xmm2
    subpd   xmm2, xmm0
    maxpd   xmm1, xmm2
    movaps  [result], xmm1  ; <=== access violation here

    xor eax, eax
    xor ebx, ebx
    ret
AbsMax ENDP 
END 

As I understand the convention for MASM, return values are normally passed back in the EAX register. However, since I'm trying to return a 128-bit value, I'm assuming an out parameter is the way to go. As you can see in the assembly listing, the store to the out parameter (movaps [result], xmm1) causes an access violation (Access violation reading location 0x00000000). I've checked the address of result in the debugger and it looks fine.

What am I doing wrong?

Desberg answered 9/12, 2014 at 19:18 Comment(12)
Is the address aligned properly? - Sainfoin
Is it possible that you could modify the caller to return a pointer to the __m128d instead? - Nictitate
@Mehrdad. Yes, it is. __m128d is defined with a __declspec to align it properly and I double-checked the address in the debugger. - Desberg
@Nictitate In this particular case I can return the value through xmm0. However, I have some other functions where I need to return multiple values, so I really need to figure out how to get an out parameter working. Is that what you meant? - Desberg
@Desberg Yes, that's what I was asking. - Nictitate
When I last programmed in assembly, we used CDECL calling conventions, and we never used out params. If there's room, you could use registers to hold the return values and access them in the caller afterwards. I don't know how to do it another way aside from passing a reference to an array holding the return values. - Nictitate
@Nictitate Thanks. I'm using cdecl too. Also, the whole reason I'm going through this exercise is that VC++ doesn't have an inline assembler in 64-bit mode. So I don't know how I'd even access the registers from the caller. Thanks - Desberg
Can't you use __asm { ... } and read any register you want into some variables? See http://msdn.microsoft.com/en-us/library/fabdxz08.aspx. If this helps, I'll put it in an answer. - Nictitate
Or is this intrinsically not possible because there is no inline assembler? - Nictitate
@Desberg I tried searching Google for ways to access registers in C++. Here's one that may help: accu.org/index.php/journals/281 - Nictitate
I'm curious what happens if you load result into a register and do the indirect write through it: movaps XMMWORD PTR [ebx], xmm1 for example. - Oviparous
You can also use intrinsics instead of MASM: _mm_setzero_pd, _mm_min_pd, _mm_max_pd, _mm_sub_pd, and _mm_store_pd should do the job, and be portable between x86 and x64 targets. - Oviparous

For educational purposes, I wrote up a version of your function that uses intrinsics:

#include <immintrin.h>

extern "C" void AbsMax(__m128d* samples, int len, __m128d* pResult)
{
    __m128d min = _mm_setzero_pd();
    __m128d max = _mm_setzero_pd();
    while (len--)
    {
        min = _mm_min_pd(min, *samples);
        max = _mm_max_pd(max, *samples);
        ++samples;
    }
    *pResult = _mm_max_pd(max, _mm_sub_pd(_mm_setzero_pd(), min));
}

Then I compiled it with the VC++ x64 compiler, using cl /c /O2 /FA absmax.cpp to generate an assembly listing (edited to remove line comments):

; Listing generated by Microsoft (R) Optimizing Compiler Version 18.00.31101.0 
include listing.inc

INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES

PUBLIC  AbsMax
_TEXT   SEGMENT
samples$ = 8
len$ = 16
pResult$ = 24
AbsMax PROC                     ; COMDAT
    xorps   xmm3, xmm3
    movaps  xmm2, xmm3
    movaps  xmm1, xmm3
    test    edx, edx
    je  SHORT $LN6@AbsMax
    npad   3
$LL2@AbsMax:
    minpd   xmm2, XMMWORD PTR [rcx]
    maxpd   xmm1, XMMWORD PTR [rcx]
    lea rcx, QWORD PTR [rcx+16]
    dec edx
    jne SHORT $LL2@AbsMax
$LN6@AbsMax:
    subpd   xmm3, xmm2
    maxpd   xmm1, xmm3
    movaps  XMMWORD PTR [r8], xmm1
    ret 0
AbsMax  ENDP
_TEXT   ENDS
END

Noting that x64 uses a __fastcall-style register convention by default, and reserves shadow space for the parameters on the stack, I see that the out parameter is in fact being written indirectly through r8, which holds the third integer parameter in x64 code, per MSDN. I think if your assembly code adopts this parameter convention, it will work.

The shadowed stack space is not initialized with the actual parameter values; it's intended for callees, if they need a place to stash the values while using the registers. That's why you're getting a zero value dereference error in your code. There's a calling convention mismatch. The debugger knows about the calling convention, so it can show you the enregistered value for the parameter.
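
For the case mentioned in the comments where several values have to come back, the same trick of modelling the prototype with intrinsics and reading the /FA listing should carry over. Below is a minimal sketch, assuming a two-output variant of the function; the name MinMaxModel and the parameters pMin and pMax are hypothetical and do not appear in the original code. Under the Microsoft x64 convention the first four integer or pointer arguments are passed in RCX, RDX, R8 and R9, so the two out pointers here would be expected in R8 and R9.

```cpp
#include <cassert>      // assert, used to catch null pointers early
#include <emmintrin.h>  // SSE2 intrinsics: __m128d, _mm_min_pd, _mm_max_pd

// Hypothetical two-output model of the routine (not from the original post).
// Expected argument registers under the Microsoft x64 convention:
//   samples -> RCX, len -> EDX, pMin -> R8, pMax -> R9
extern "C" void MinMaxModel(const __m128d* samples, int len,
                            __m128d* pMin, __m128d* pMax)
{
    assert(samples != nullptr && pMin != nullptr && pMax != nullptr);

    // Both accumulators start at zero, as in the AbsMax version above.
    __m128d lo = _mm_setzero_pd();
    __m128d hi = _mm_setzero_pd();
    for (int i = 0; i < len; ++i)
    {
        lo = _mm_min_pd(lo, samples[i]);  // per-channel running minimum
        hi = _mm_max_pd(hi, samples[i]);  // per-channel running maximum
    }

    // Each result is stored through its pointer, which is what a
    // hand-written version would do with a register holding the address.
    *pMin = lo;
    *pMax = hi;
}
```

Compiling this with the same cl /c /O2 /FA command and looking at which registers the two final stores go through should show where a hand-written MASM version needs to pick up its out pointers.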

Oviparous answered 18/12, 2014 at 22:31 Comment(1)
Using intrinsics isn't possible for me in all cases. The code emitted by VC++, at least, can be pretty horrific in some cases, and the inline assembly that I'm trying to port consists of highly optimized inner loops for signal processing. I do like the idea, though, of using intrinsics to at least model my function prototypes. Thanks. - Desberg
