Efficiently copying Swift Array to memory buffer for iOS Metal
J

3

7

I am writing an iOS application using Apple's new Metal framework. I have an array of Matrix4 objects (see Ray Wenderlich's tutorial) that I need to pass in to a shader via the MTLDevice.newBufferWithLength() method. The Matrix4 object is leveraging Apple's GLKit (it contains a GLKMatrix4 object).

I'm leveraging instancing with the GPU calls.

I will later change this to a struct that includes more data per instance (beyond just the Matrix4 object).

  1. How can I efficiently copy the array of [Matrix4] objects into this buffer?

  2. Is there a better way to do this? Again, I'll expand this to use a struct with more data in the future.

Below is a subset of my code:

let sizeOfMatrix4 = sizeof(Float) * Matrix4.numberofElements()

// This returns an array of [Matrix4] objects.
let boxArray = createBoxArray(parentModelViewMatrix)

let sizeOfUniformBuffer = boxArray.count * sizeOfMatrix4
var uniformBuffer = device.newBufferWithLength(sizeOfUniformBuffer, options: .CPUCacheModeDefaultCache)
let bufferPointer = uniformBuffer?.contents()

// Ouch - way too slow.  How can I optimize?
for i in 0..<boxArray.count
{
    memcpy(bufferPointer! + (i * sizeOfMatrix4), boxArray[i].raw(), sizeOfMatrix4)
}

renderEncoder.setVertexBuffer(uniformBuffer, offset: 0, atIndex: 2)

Note: The boxArray[i].raw() method is defined as this in the Objective-C code:

- (void *)raw {
    return glkMatrix.m;
}

You can see I'm looping through each array object and then doing a memcpy. I did this since I was experiencing problems treating the array as a contiguous set of memory.

Thanks!

Jasper answered 31/8, 2015 at 18:26 Comment(1)
You should be using simd.float4x4. - Admittance
A
8

A Swift Array is promised to be contiguous memory, but you need to make sure it's really a Swift Array and not secretly an NSArray. If you want to be completely certain, use a ContiguousArray. That will ensure contiguous memory even if the objects in it are bridgeable to ObjC. If you want even more control over the memory, look at ManagedBuffer.

With that, you should be using newBufferWithBytesNoCopy(_:length:options:deallocator:) to create an MTLBuffer around your existing memory.
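
As a rough sketch of the contiguous-array idea (not the answerer's own code), the snippet below assumes the per-instance data is a plain value type rather than the Objective-C Matrix4 class, so the whole array can be copied in one call instead of one memcpy per element. `InstanceMatrix`, `copyInstances`, and `makeInstanceBuffer` are hypothetical names, and the syntax is current Swift rather than the Swift 2 used in the question. Wrapping the array's own storage with the no-copy call would additionally require page-aligned memory, which an array does not guarantee, so this sketch copies once instead.

```swift
import Foundation
#if canImport(Metal)
import Metal
#endif

// Hypothetical stand-in for a 4x4 float matrix: a value type made of four
// 16-byte columns, 64 bytes in total.
struct InstanceMatrix {
    var c0 = SIMD4<Float>(1, 0, 0, 0)
    var c1 = SIMD4<Float>(0, 1, 0, 0)
    var c2 = SIMD4<Float>(0, 0, 1, 0)
    var c3 = SIMD4<Float>(0, 0, 0, 1)
}

/// Copies every element into `destination` with a single bulk copy and
/// returns the number of bytes written.
func copyInstances(_ instances: ContiguousArray<InstanceMatrix>,
                   to destination: UnsafeMutableRawPointer) -> Int {
    return instances.withUnsafeBytes { source -> Int in
        // baseAddress is nil only for an empty array.
        guard let base = source.baseAddress else { return 0 }
        destination.copyMemory(from: base, byteCount: source.count)
        return source.count
    }
}

#if canImport(Metal)
/// Creates a Metal buffer initialized from the array in one copy.
func makeInstanceBuffer(device: MTLDevice,
                        instances: ContiguousArray<InstanceMatrix>) -> MTLBuffer? {
    return instances.withUnsafeBytes { source -> MTLBuffer? in
        guard let base = source.baseAddress else { return nil }
        return device.makeBuffer(bytes: base, length: source.count, options: [])
    }
}
#endif
```

With an existing buffer, `copyInstances(instances, to: buffer.contents())` would replace the per-element loop from the question, provided the buffer is at least `instances.count * MemoryLayout<InstanceMatrix>.stride` bytes long.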

Allium answered 31/8, 2015 at 19:21 Comment(2)
Rob, thanks for the feedback. I've been trying to work this out since you posted a response but simply am not having any luck. Would you mind providing source code illustrating how you start with an array of [Matrix4] objects, create the MTLDevice buffer and then memcpy to that buffer? - Jasper
Rob, do you think ManagedBuffer provides page-aligned storage? A cursory glance at the source code and the documentation suggests that this only enforces element alignment (by honoring MemoryLayout<Element>.alignment), not page alignment. Page alignment is potentially far more wasteful and so I'd expect to see it explicitly mentioned. - Slurp
O
4

I've done this with an array of particles that I pass to a compute shader.

In a nutshell, I define some constants and declare a handful of mutable pointers and a mutable buffer pointer:

let particleCount: Int = 1048576
var particlesMemory:UnsafeMutablePointer<Void> = nil
let alignment:UInt = 0x4000
let particlesMemoryByteSize:UInt = UInt(particleCount) * UInt(sizeof(Particle))
var particlesVoidPtr: COpaquePointer!
var particlesParticlePtr: UnsafeMutablePointer<Particle>!

var particlesParticleBufferPtr: UnsafeMutableBufferPointer<Particle>!

When I set up the particles, I populate the pointers and use posix_memalign() to allocate the memory:

    posix_memalign(&particlesMemory, alignment, particlesMemoryByteSize)

    particlesVoidPtr = COpaquePointer(particlesMemory)
    particlesParticlePtr = UnsafeMutablePointer<Particle>(particlesVoidPtr)

    particlesParticleBufferPtr = UnsafeMutableBufferPointer(start: particlesParticlePtr, count: particleCount)
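
For readers on current Swift, the allocation step above might look roughly like the following. This is a sketch under assumptions, not the answerer's code: the `Particle` layout and the `allocateParticles` helper are hypothetical, the page size is queried with `getpagesize()` instead of the hard-coded 0x4000, and the byte count is rounded up to a whole number of pages because the no-copy buffer call expects a page-multiple length.

```swift
import Foundation

// Hypothetical particle layout; the answer does not show the real type.
struct Particle {
    var positionX: Float = 0
    var positionY: Float = 0
    var velocityX: Float = 0
    var velocityY: Float = 0
}

/// Allocates zeroed, page-aligned storage for `count` particles.
/// Returns nil if the allocation fails. The caller releases `raw` with free().
func allocateParticles(count: Int) -> (raw: UnsafeMutableRawPointer,
                                       particles: UnsafeMutableBufferPointer<Particle>,
                                       byteCount: Int)? {
    let pageSize = Int(getpagesize())
    let needed = count * MemoryLayout<Particle>.stride
    // Round up to a whole number of pages.
    let byteCount = (needed + pageSize - 1) / pageSize * pageSize

    var raw: UnsafeMutableRawPointer?
    guard posix_memalign(&raw, pageSize, byteCount) == 0, let memory = raw else {
        return nil
    }
    memset(memory, 0, byteCount)

    // Typed view over the same memory, replacing the COpaquePointer casts.
    let typed = memory.bindMemory(to: Particle.self, capacity: count)
    let buffer = UnsafeMutableBufferPointer(start: typed, count: count)
    return (raw: memory, particles: buffer, byteCount: byteCount)
}
```

The returned `particles` view can be indexed like an array, and `raw` together with `byteCount` is what would be handed to the no-copy buffer call.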

The loop to populate the particles is slightly different - I now loop over the buffer pointer:

    for index in particlesParticleBufferPtr.startIndex ..< particlesParticleBufferPtr.endIndex
    {
        [...]

        let particle = Particle(positionX: positionX, positionY: positionY, velocityX: velocityX, velocityY: velocityY)

        particlesParticleBufferPtr[index] = particle
    }

Inside the applyShader() function, I create a buffer that wraps the existing memory without copying it, and use it as both the input and output buffer:

    let particlesBufferNoCopy = device.newBufferWithBytesNoCopy(particlesMemory, length: Int(particlesMemoryByteSize),
        options: nil, deallocator: nil)

    commandEncoder.setBuffer(particlesBufferNoCopy, offset: 0, atIndex: 0)

    commandEncoder.setBuffer(particlesBufferNoCopy, offset: 0, atIndex: 1)
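
In current Swift the no-copy call is spelled `makeBuffer(bytesNoCopy:length:options:deallocator:)`, and it expects both the pointer and the length to be page-aligned. The sketch below is one possible way to guard that requirement; `isPageAligned` and `wrapWithoutCopying` are hypothetical helpers, and passing a deallocator that calls `free` is an assumption that the memory came from `posix_memalign`, as in this answer.

```swift
import Foundation
#if canImport(Metal)
import Metal
#endif

/// True when `pointer` and `length` are both multiples of the page size,
/// which the no-copy buffer call requires.
func isPageAligned(_ pointer: UnsafeRawPointer, length: Int) -> Bool {
    let pageSize = Int(getpagesize())
    return length > 0
        && Int(bitPattern: pointer) % pageSize == 0
        && length % pageSize == 0
}

#if canImport(Metal)
/// Wraps existing page-aligned memory in an MTLBuffer without copying.
/// The deallocator frees the memory once Metal releases the buffer.
func wrapWithoutCopying(device: MTLDevice,
                        memory: UnsafeMutableRawPointer,
                        length: Int) -> MTLBuffer? {
    guard isPageAligned(memory, length: length) else { return nil }
    return device.makeBuffer(bytesNoCopy: memory,
                             length: length,
                             options: .storageModeShared,
                             deallocator: { pointer, _ in free(pointer) })
}
#endif
```

Because the buffer and the CPU-side pointer refer to the same memory, writes through the pointer are visible to the GPU without another copy, as long as they do not race with in-flight GPU work.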

...and after the shader has run, I put the shared memory (particlesMemory) back into the buffer pointer:

    particlesVoidPtr = COpaquePointer(particlesMemory)
    particlesParticlePtr = UnsafeMutablePointer(particlesVoidPtr)

    particlesParticleBufferPtr = UnsafeMutableBufferPointer(start: particlesParticlePtr, count: particleCount)

There's an up-to-date Swift 2.0 version of this in my GitHub repo.

Opponent answered 4/9, 2015 at 10:45 Comment(1)
Can you outline the Swift 2 differences? - Norge
S
3

Obviously the point of using shared memory and MTLDevice.makeBuffer(bytesNoCopy:...) is to avoid redundant memory copies. Therefore, ideally we look for a design that allows us to easily manipulate the data after it's already been loaded into the MTLBuffer object.

After researching this for a while, I've decided to try and create a semi-generic solution to allow for simplified allocation of page-aligned memory, loading your content into that memory, and subsequently manipulating your items in that shared memory block.

I've created a Swift array implementation called PageAlignedArray that matches the interface and functionality of the built-in Swift array, but always resides on page-aligned memory, and so can be very easily made into an MTLBuffer. I've also added a convenience method to directly convert PageAlignedArray into a Metal buffer.

Of course, you can continue to mutate your array afterwards and your updates will be automatically available to the GPU courtesy of the shared-memory architecture. However, keep in mind that you must regenerate your MTLBuffer object whenever the array's length changes.

Here's a quick code sample:

  var alignedArray : PageAlignedContiguousArray<matrix_double4x4> = [matrixTest, matrixTest]
  alignedArray.append(item)
  alignedArray.removeFirst() // Behaves just like a built-in array, with all convenience methods

  // When it's time to generate a Metal buffer:
  let testMetalBuffer = device?.makeBufferWithPageAlignedArray(alignedArray)

The sample uses matrix_double4x4, but the array should work for any Swift value types. Please note that if you use a reference type (such as any kind of class), the array will contain pointers to your elements and so won't be usable from your GPU code.
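
To illustrate why the MTLBuffer has to be regenerated after the array grows, here is a deliberately minimal growable store on page-aligned memory. It is a hypothetical sketch and not the PageAlignedArray implementation: `PageAlignedStore` and `allocatePages` are made-up names, it only supports appending and reading, and it assumes `Element` is a trivial value type. When capacity runs out, the storage is reallocated and the base address changes, so any buffer wrapping the old address is stale.

```swift
import Foundation

/// Allocates `byteCount` bytes (a multiple of `pageSize`) on a page boundary.
private func allocatePages(byteCount: Int, pageSize: Int) -> UnsafeMutableRawPointer {
    var raw: UnsafeMutableRawPointer?
    guard posix_memalign(&raw, pageSize, byteCount) == 0, let memory = raw else {
        fatalError("page-aligned allocation failed")
    }
    return memory
}

/// Hypothetical append-only store whose memory is always page-aligned.
/// Only safe for trivial value types (no references inside Element).
final class PageAlignedStore<Element> {
    private(set) var base: UnsafeMutableRawPointer
    private(set) var capacityBytes: Int
    private(set) var count = 0
    private let pageSize: Int

    init() {
        let page = Int(getpagesize())
        pageSize = page
        capacityBytes = page
        base = allocatePages(byteCount: page, pageSize: page)
    }

    deinit { free(base) }

    func append(_ element: Element) {
        let stride = MemoryLayout<Element>.stride
        if (count + 1) * stride > capacityBytes {
            // Grow by doubling until the new element fits; the base address
            // changes here, so a buffer wrapping the old memory becomes invalid.
            var newCapacity = capacityBytes * 2
            while (count + 1) * stride > newCapacity { newCapacity *= 2 }
            let newBase = allocatePages(byteCount: newCapacity, pageSize: pageSize)
            newBase.copyMemory(from: base, byteCount: count * stride)
            free(base)
            base = newBase
            capacityBytes = newCapacity
        }
        base.storeBytes(of: element, toByteOffset: count * stride, as: Element.self)
        count += 1
    }

    subscript(index: Int) -> Element {
        precondition(index >= 0 && index < count, "index out of range")
        return base.load(fromByteOffset: index * MemoryLayout<Element>.stride,
                         as: Element.self)
    }
}
```

In this sketch, `base` and `capacityBytes` are the values that would be passed to a no-copy buffer call, and they should be re-read after any append that may have triggered a reallocation.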

Slurp answered 15/10, 2016 at 4:57 Comment(4)
Brilliant!!! Just a question - I was thinking of just creating a mutable array with the initializer that takes a mutable buffer pointer - did you consider this, and if so, why did you reject it? - Commutual
@DavidH How would the array grow if set up in this manner? I had my own class do the allocation so as to allow the array to grow as needed. - Slurp
Of course you are right. I was thinking of a fixed-size mutable array, but of course there is no way to prevent someone from trying to append. Again, great post! - Commutual
How can I allocate all the memory at once? With native arrays I can call [Float](repeating: 0, count: 40_000_000) to allocate 160 MB of RAM in one go, but with this library I have to loop 40_000_000 times and append to the array, which takes about 40 seconds. - Escrow
