You can write your own memory allocation routines that allocate aligned data on the heap. You can specify your own alignment (not just 16 bytes, but also 32 bytes, 64 bytes, and so on). The routines below take the alignment as a number of bits, so 4 means 16 bytes:
    procedure GetMemAligned(const bits: Integer; const src: Pointer;
      const SrcSize: Integer; out DstAligned, DstUnaligned: Pointer;
      out DstSize: Integer);
    var
      Bytes: NativeInt;
      i: NativeInt;
    begin
      if src <> nil then
      begin
        // clear the low "bits" bits and compare with the original address
        i := NativeInt(src);
        i := i shr bits;
        i := i shl bits;
        if i = NativeInt(src) then
        begin
          // the source is already aligned, nothing to do
          DstAligned := src;
          DstUnaligned := src;
          DstSize := SrcSize;
          Exit;
        end;
      end;
      // over-allocate by one alignment unit so an aligned address always fits
      Bytes := 1 shl bits;
      DstSize := SrcSize + Bytes;
      GetMem(DstUnaligned, DstSize);
      FillChar(DstUnaligned^, DstSize, 0);
      // move forward by one unit, then round down to the alignment boundary
      i := NativeInt(DstUnaligned) + Bytes;
      i := i shr bits;
      i := i shl bits;
      DstAligned := Pointer(i);
      if src <> nil then
        Move(src^, DstAligned^, SrcSize);
    end;

    procedure FreeMemAligned(const src: Pointer; var DstUnaligned: Pointer;
      var DstSize: Integer);
    begin
      // if src = DstUnaligned, nothing was allocated by GetMemAligned
      if src <> DstUnaligned then
      begin
        if DstUnaligned <> nil then
          FreeMem(DstUnaligned, DstSize);
      end;
      DstUnaligned := nil;
      DstSize := 0;
    end;
Then work with pointers, and use procedures that return the result through a third argument.
You can also use functions, but how they return the result is less obvious.
    type
      PVector = ^TVector;
      TVector = packed array [1..4] of Single;
Then allocate these objects like this:
    const
      SizeAligned = SizeOf(TVector);
    var
      DataUnaligned, DataAligned: Pointer;
      SizeUnaligned: Integer;
      V1: PVector;
    begin
      GetMemAligned(4 {align by 4 bits, i.e. by 16 bytes}, nil, SizeAligned,
        DataAligned, DataUnaligned, SizeUnaligned);
      V1 := DataAligned;
      // now you can work with your vector via V1^ - it is aligned by 16 bytes and stays in the heap
      FreeMemAligned(nil, DataUnaligned, SizeUnaligned);
    end;
As you may have noticed, we passed nil as the src argument to GetMemAligned and FreeMemAligned. This parameter is needed when you want to align existing data, for example data received as a function argument.
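The rounding step inside GetMemAligned can be looked at in isolation. The following is only a minimal sketch: NextAligned is a hypothetical helper name, not part of the routines above, and it uses NativeUInt instead of NativeInt so that the shifts are clearly unsigned. It shows that "add one alignment unit, then shr/shl" always lands on a boundary above the input address, which is why the allocation is padded by one full unit.

```pascal
program AlignDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

// Hypothetical helper (an assumption for illustration): returns the first
// multiple of 2^Bits that lies strictly above Addr, mirroring the
// "add Bytes, then shr/shl" step in GetMemAligned.
function NextAligned(Addr: NativeUInt; Bits: Integer): NativeUInt;
var
  Bytes: NativeUInt;
begin
  Bytes := NativeUInt(1) shl Bits;
  Result := ((Addr + Bytes) shr Bits) shl Bits;
end;

var
  Raw: Pointer;
  Aligned: NativeUInt;
begin
  WriteLn(NextAligned(17, 4)); // 32
  WriteLn(NextAligned(32, 4)); // 48: an aligned input still moves by a full unit

  // Over-allocate by 16 bytes so a 16-byte payload fits after rounding.
  GetMem(Raw, 16 + 16);
  try
    Aligned := NextAligned(NativeUInt(Raw), 4);
    if ((Aligned and 15) = 0) and (Aligned + 16 <= NativeUInt(Raw) + 32) then
      WriteLn('aligned block fits inside the allocation');
  finally
    FreeMem(Raw);
  end;
end.
```

Because the aligned address is never more than one unit past the raw pointer, SrcSize + Bytes is always enough room. The raw pointer still has to be kept around, since that is the one FreeMem expects.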
In assembly routines, use register names directly rather than parameter names. That way nothing gets overwritten by accident under the register calling convention. Otherwise you risk modifying a register without realizing that the parameter name you used is just an alias for it.
Under Win64, with the Microsoft calling convention, the first parameter is always passed in RCX, the second in RDX, the third in R8, the fourth in R9, and the rest on the stack. A function returns its result in RAX. But if a function returns a structure ("record") such as TVector, the result is not returned in RAX; it is written through an implicit argument that holds the address of the result.
Your function may modify the following registers without restoring them: RAX, RCX, RDX, R8, R9, R10, R11 (and, for SSE code, XMM0 through XMM5). The rest must be preserved.
See https://msdn.microsoft.com/en-us/library/ms235286.aspx for more details.
Under Win32, with the Delphi register calling convention, the first parameter is passed in EAX, the second in EDX, the third in ECX, and the rest on the stack.
The following table summarizes the differences:
    Parameter   64-bit   32-bit
    ---------   ------   ------
    1)          rcx      eax
    2)          rdx      edx
    3)          r8       ecx
    4)          r9       stack
So, your function will look like this (32-bit):
    procedure add4(const a, b: TVector; out Result: TVector); register; assembler;
    asm
      movaps xmm0, [eax]
      movaps xmm1, [edx]
      addps xmm0, xmm1
      movaps [ecx], xmm0
    end;
Under 64-bit:
    procedure add4(const a, b: TVector; out Result: TVector); register; assembler;
    asm
      movaps xmm0, [rcx]
      movaps xmm1, [rdx]
      addps xmm0, xmm1
      movaps [r8], xmm0
    end;
By the way, according to Microsoft, floating-point arguments in the 64-bit calling convention are passed directly in the XMM registers: the first in XMM0, the second in XMM1, the third in XMM2, the fourth in XMM3, and the rest on the stack. So you can pass them by value, not by reference.
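As a rough illustration of passing scalars by value, here is a minimal sketch. AddSingles is a hypothetical name, and the assembly branch assumes the Delphi compiler targeting Win64; every other target uses a plain Pascal fallback so the program still compiles. Note that this applies to scalar Single and Double parameters. A 16-byte TVector is still passed by reference.

```pascal
program AddScalarsDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

{$IF Defined(WIN64) and not Defined(FPC)}
// Assumption: Delphi on Win64. a arrives in XMM0, b in XMM1,
// and a Single result is returned in XMM0.
function AddSingles(a, b: Single): Single;
asm
  addss xmm0, xmm1
end;
{$ELSE}
// Plain Pascal fallback for all other compilers and targets.
function AddSingles(a, b: Single): Single;
begin
  Result := a + b;
end;
{$IFEND}

begin
  WriteLn(AddSingles(1.5, 2.25):0:2); // 3.75
end.
```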
CODEALIGN aligns code. If you want to align data you can use the ALIGN directive. – Encase
movaps xmm1, [b] is pointless. Use addps xmm0, [b] if your inputs are aligned. – Rescission