Passing a JavaScript string to a Rust function compiled to WebAssembly

I have this simple Rust function:

#[no_mangle]
pub fn compute(operator: &str, n1: i32, n2: i32) -> i32 {
    match operator {
        "SUM" => n1 + n2,
        "DIFF" => n1 - n2,
        "MULT" => n1 * n2,
        "DIV" => n1 / n2,
        _ => 0
    }
}

I can compile this to WebAssembly successfully, but I can't manage to pass the operator parameter from JS to Rust.

The JS line which calls the Rust function looks like this:

instance.exports.compute(operator, n1, n2);

operator is a JS String and n1, n2 are JS Numbers.

n1 and n2 are passed properly and can be read inside the compiled function so I guess the problem is how I pass the string around. I imagine it is passed as a pointer from JS to WebAssembly but can't find evidence or material about how this works.

I am not using Emscripten and would like to keep it standalone (compilation target wasm32-unknown-unknown), but I see they wrap their compiled functions in Module.cwrap, maybe that could help?

Ayeshaayin answered 27/2, 2018 at 17:29 Comment(4)
WebAssembly doesn't have the concept of strings. It only has numbers. See the related question "How to return a string (or similar) from Rust in WebAssembly?" – Reider
Possible duplicate of "How to return a string (or similar) from Rust in WebAssembly?" – Wicker
Never, ever return Rust types (e.g. &str) across an FFI boundary. Check out my Rust FFI Omnibus. While it doesn't have anything for WebAssembly (yet), the concepts are all still valid. – Reider
For practical uses, I think serializing the types with Cap'n Proto or protobuf is a sensible way to cross FFI boundaries. – Saluki

Easiest and most idiomatic solution

Most people should use wasm-bindgen, which makes this whole process much simpler!

Low-level manual implementation

To transfer string data between JavaScript and Rust, you need to decide

  1. The encoding of the text: UTF-8 (Rust native) or UTF-16 (JS native).
  2. Who will own the memory buffer: the JS (caller) or Rust (callee).
  3. How to represent the string's data and length: NUL-terminated (C-style) or distinct length (Rust-style).
  4. How to communicate the data and length, if they are separate.
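As a rough illustration of decision 3, the two representations could be read on the Rust side along the lines sketched below. The names `str_from_ptr_len` and `str_from_nul_terminated` are hypothetical helpers, not part of any library. They are written as plain functions so they can be tried natively, and both assume the bytes are UTF-8.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Rust-style: the caller passes a pointer and an explicit length.
/// Safety: `data` must point to `len` readable bytes.
pub unsafe fn str_from_ptr_len<'a>(data: *const u8, len: usize) -> Option<&'a str> {
    if data.is_null() {
        return None;
    }
    let bytes = unsafe { std::slice::from_raw_parts(data, len) };
    // Reject anything that is not valid UTF-8 instead of trusting the caller.
    std::str::from_utf8(bytes).ok()
}

/// C-style: the length is implied by a terminating NUL byte.
/// Safety: `data` must point to a NUL-terminated byte sequence.
pub unsafe fn str_from_nul_terminated<'a>(data: *const c_char) -> Option<&'a str> {
    if data.is_null() {
        return None;
    }
    let c_str = unsafe { CStr::from_ptr(data) };
    c_str.to_str().ok()
}
```

With an explicit length the string may contain NUL bytes and its size is known up front; the NUL-terminated form needs only one value to cross the boundary but has to scan for the terminator.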

Common setup

It's important to build a C dynamic library (cdylib) for WASM, because that helps keep the output small.

Cargo.toml

[package]
name = "quick-maths"
version = "0.1.0"
authors = ["An Devloper <[email protected]>"]

[lib]
crate-type = ["cdylib"]

.cargo/config

[target.wasm32-unknown-unknown]
rustflags = [
    "-C", "link-args=--import-memory",
]

package.json

{
  "name": "quick-maths",
  "version": "0.1.0",
  "main": "index.js",
  "author": "An Devloper <[email protected]>",
  "license": "MIT",
  "scripts": {
    "example": "node ./index.js"
  },
  "dependencies": {
    "fs-extra": "^8.0.1",
    "text-encoding": "^0.7.0"
  }
}

I'm using NodeJS 12.1.0.

Execution

$ rustup component add rust-std --target wasm32-unknown-unknown
$ cargo build --release --target wasm32-unknown-unknown

Solution 1

I decided:

  1. To convert JS strings to UTF-8, which means that the TextEncoder JS API is the best fit.
  2. The caller should own the memory buffer.
  3. To have the length be a separate value.
  4. Another struct and allocation should be made to hold the pointer and length.

src/lib.rs

// A struct with a known memory layout that we can pass string information in
#[repr(C)]
pub struct JsInteropString {
    data: *const u8,
    len: usize,
}

// Our FFI shim function    
#[no_mangle]
pub unsafe extern "C" fn compute(s: *const JsInteropString, n1: i32, n2: i32) -> i32 {
    // Check for NULL (see corresponding comment in JS)
    let s = match s.as_ref() {
        Some(s) => s,
        None => return -1,
    };

    // Convert the pointer and length to a `&[u8]`.
    let data = std::slice::from_raw_parts(s.data, s.len);

    // Convert the `&[u8]` to a `&str`    
    match std::str::from_utf8(data) {
        Ok(s) => real_code::compute(s, n1, n2),
        Err(_) => -2,
    }
}

// I advocate that you keep your interesting code in a different
// crate for easy development and testing. Have a separate crate
// with the FFI shims.
mod real_code {
    pub fn compute(operator: &str, n1: i32, n2: i32) -> i32 {
        match operator {
            "SUM"  => n1 + n2,
            "DIFF" => n1 - n2,
            "MULT" => n1 * n2,
            "DIV"  => n1 / n2,
            _ => 0,
        }
    }
}

index.js

const fs = require('fs-extra');
const { TextEncoder } = require('text-encoding');

// Allocate some memory.
const memory = new WebAssembly.Memory({ initial: 20, maximum: 100 });

// Connect these memory regions to the imported module
const importObject = {
  env: { memory }
};

// Create an object that handles converting our strings for us
const memoryManager = (memory) => {
  var base = 0;

  // NULL is conventionally at address 0, so we "use up" the first 4
  // bytes of address space to make our lives a bit simpler.
  base += 4;

  return {
    encodeString: (jsString) => {
      // Convert the JS String to UTF-8 data
      const encoder = new TextEncoder();
      const encodedString = encoder.encode(jsString);

      // Organize memory with space for the JsInteropString at the
      // beginning, followed by the UTF-8 string bytes.
      const asU32 = new Uint32Array(memory.buffer, base, 2);
      const asBytes = new Uint8Array(memory.buffer, asU32.byteOffset + asU32.byteLength, encodedString.length);

      // Copy the UTF-8 into the WASM memory.
      asBytes.set(encodedString);

      // Assign the data pointer and length values.
      asU32[0] = asBytes.byteOffset;
      asU32[1] = asBytes.length;

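      // Update our memory allocator base address for the next call.
      // The new base is absolute (not added to the old one) and is
      // rounded up to a multiple of 4 so the next Uint32Array view
      // stays aligned.
      const originalBase = base;
      base = Math.ceil((asBytes.byteOffset + asBytes.byteLength) / 4) * 4;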

      return originalBase;
    }
  };
};

const myMemory = memoryManager(memory);

fs.readFile('./target/wasm32-unknown-unknown/release/quick_maths.wasm')
  .then(bytes => WebAssembly.instantiate(bytes, importObject))
  .then(({ instance }) => {
    const argString = "MULT";
    const argN1 = 42;
    const argN2 = 100;

    const s = myMemory.encodeString(argString);
    const result = instance.exports.compute(s, argN1, argN2);

    console.log(result);
  });

Execution

$ yarn run example
4200

Solution 2

I decided:

  1. To convert JS strings to UTF-8, which means that the TextEncoder JS API is the best fit.
  2. The module should own the memory buffer.
  3. To have the length be a separate value.
  4. To use a Box<String> as the underlying data structure. This allows the allocation to be further used by Rust code.

src/lib.rs

// Very important to use `transparent` to prevent ABI issues
#[repr(transparent)]
pub struct JsInteropString(*mut String);

impl JsInteropString {
    // Unsafe because we create a string and say it's full of valid
    // UTF-8 data, but it isn't!
    unsafe fn with_capacity(cap: usize) -> Self {
        let mut d = Vec::with_capacity(cap);
        d.set_len(cap);
        let s = Box::new(String::from_utf8_unchecked(d));
        JsInteropString(Box::into_raw(s))
    }

    unsafe fn as_string(&self) -> &String {
        &*self.0
    }

    unsafe fn as_mut_string(&mut self) -> &mut String {
        &mut *self.0
    }

    unsafe fn into_boxed_string(self) -> Box<String> {
        Box::from_raw(self.0)
    }

    unsafe fn as_mut_ptr(&mut self) -> *mut u8 {
        self.as_mut_string().as_mut_vec().as_mut_ptr()
    }
}

#[no_mangle]
pub unsafe extern "C" fn stringPrepare(cap: usize) -> JsInteropString {
    JsInteropString::with_capacity(cap)
}

#[no_mangle]
pub unsafe extern "C" fn stringData(mut s: JsInteropString) -> *mut u8 {
    s.as_mut_ptr()
}

#[no_mangle]
pub unsafe extern "C" fn stringLen(s: JsInteropString) -> usize {
    s.as_string().len()
}

#[no_mangle]
pub unsafe extern "C" fn compute(s: JsInteropString, n1: i32, n2: i32) -> i32 {
    let s = s.into_boxed_string();
    real_code::compute(&s, n1, n2)
}

mod real_code {
    pub fn compute(operator: &str, n1: i32, n2: i32) -> i32 {
        match operator {
            "SUM"  => n1 + n2,
            "DIFF" => n1 - n2,
            "MULT" => n1 * n2,
            "DIV"  => n1 / n2,
            _ => 0,
        }
    }
}

index.js

const fs = require('fs-extra');
const { TextEncoder } = require('text-encoding');

class QuickMaths {
  constructor(instance) {
    this.instance = instance;
  }

  difference(n1, n2) {
    const { compute } = this.instance.exports;
    const op = this.copyJsStringToRust("DIFF");
    return compute(op, n1, n2);
  }

  copyJsStringToRust(jsString) {
    const { memory, stringPrepare, stringData, stringLen } = this.instance.exports;

    const encoder = new TextEncoder();
    const encodedString = encoder.encode(jsString);

    // Ask Rust code to allocate a string inside of the module's memory
    const rustString = stringPrepare(encodedString.length);

    // Get a JS view of the string data
    const rustStringData = stringData(rustString);
    const asBytes = new Uint8Array(memory.buffer, rustStringData, encodedString.length);

    // Copy the UTF-8 into the WASM memory.
    asBytes.set(encodedString);

    return rustString;
  }
}

async function main() {
  const bytes = await fs.readFile('./target/wasm32-unknown-unknown/release/quick_maths.wasm');
  const { instance } = await WebAssembly.instantiate(bytes);
  const maffs = new QuickMaths(instance);

  console.log(maffs.difference(100, 201));
}

main();

Execution

$ yarn run example
-101

Note that this process can be used for other types. You "just" have to decide how to represent data as a set of bytes that both sides agree on then send it across.
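A minimal sketch of that idea for a non-string type, assuming both sides agree on a made-up layout of two little-endian `i32` values back to back (8 bytes). The names `decode_point` and `point_sum` are hypothetical and are shown as plain functions so they can be tested natively; exporting them would need the same `#[no_mangle]` / `extern "C"` treatment as the shims above.

```rust
/// Hypothetical agreed layout: two little-endian i32 values, back to back (8 bytes).
pub fn decode_point(bytes: &[u8]) -> Option<(i32, i32)> {
    if bytes.len() != 8 {
        return None;
    }
    let x = i32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    let y = i32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
    Some((x, y))
}

/// Hypothetical shim body: turn a pointer and length into a slice, then decode.
/// Returns -1 for a NULL pointer and -2 for a buffer of the wrong size,
/// mirroring the error codes used in Solution 1.
/// Safety: `data` must point to `len` readable bytes.
pub unsafe fn point_sum(data: *const u8, len: usize) -> i32 {
    if data.is_null() {
        return -1;
    }
    let bytes = unsafe { std::slice::from_raw_parts(data, len) };
    match decode_point(bytes) {
        Some((x, y)) => x.wrapping_add(y),
        None => -2,
    }
}
```

On the JS side, the matching bytes could be produced with a `DataView` and `setInt32(offset, value, true)`, where the last argument selects little-endian.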

Reider answered 28/2, 2018 at 1:6 Comment(2)
Regarding solution 1, how can you be sure that you are not overwriting memory in use by the Rust program? Does the Rust program allocate on a separate memory arena? – Fluctuation
So is the stuff mentioned here not working yet? rustwasm.github.io/docs/wasm-bindgen/reference/types/… – Fecundity

A WebAssembly program has its own memory space. This space is often managed by the WebAssembly program itself, with the help of an allocator library such as wee_alloc.

JavaScript can see and modify that memory space, but it has no way of knowing how the allocator's data structures are organized. So if we simply write to the WASM memory from JavaScript, we'll likely overwrite something important and mess things up. Therefore the WebAssembly program itself must allocate the memory region first and pass it to JavaScript; then JavaScript can fill that region with the data.

In the following example we do just that: allocate a buffer in the WASM memory space, copy the UTF-8 bytes there, pass the buffer location to a Rust function, then free the buffer.

Rust:

use std::alloc::Layout;

// Allocate `len` bytes inside the module's memory; `len` must be greater than zero.
#[no_mangle]
pub fn alloc(len: i32) -> *mut u8 {
    let layout = Layout::from_size_align(len as usize, 1).expect("!from_size_align");
    let ptr = unsafe { std::alloc::alloc(layout) };
    assert!(!ptr.is_null(), "!alloc");
    ptr
}

// Free a buffer previously returned by `alloc`; `len` must match the original request.
#[no_mangle]
pub fn dealloc(ptr: *mut u8, len: i32) {
    let layout = Layout::from_size_align(len as usize, 1).expect("!from_size_align");
    unsafe { std::alloc::dealloc(ptr, layout) }
}

#[no_mangle]
pub fn is_foobar(buf: *const u8, len: i32) -> i32 {
    let js = unsafe { std::slice::from_raw_parts(buf, len as usize) };
    let js = unsafe { std::str::from_utf8_unchecked(js) };
    if js == "foobar" {
        1
    } else {
        0
    }
}

TypeScript:

// cf. https://github.com/Microsoft/TypeScript/issues/18099
declare class TextEncoder {constructor (label?: string); encode (input?: string): Uint8Array}
declare class TextDecoder {constructor (utfLabel?: string); decode (input?: ArrayBufferView): string}
// https://github.com/DefinitelyTyped/DefinitelyTyped/blob/master/types/webassembly-js-api/index.d.ts
declare namespace WebAssembly {
  class Instance {readonly exports: any}
  interface ResultObject {instance: Instance}
  function instantiateStreaming (file: Promise<Response>, options?: any): Promise<ResultObject>}

var main: {
  memory: {readonly buffer: ArrayBuffer}
  alloc (size: number): number
  dealloc (ptr: number, len: number): void
  is_foobar (buf: number, len: number): number}

function withRustString (str: string, cb: (ptr: number, len: number) => any): any {
  // Convert the JavaScript string to an array of UTF-8 bytes.
  const utf8 = (new TextEncoder()).encode (str)
  // Reserve a WASM memory buffer for the UTF-8 array.
  const rsBuf = main.alloc (utf8.length)
  // Copy the UTF-8 array into the WASM memory.
  new Uint8Array (main.memory.buffer, rsBuf, utf8.length) .set (utf8)
  // Pass the WASM memory location and size into the callback.
  const ret = cb (rsBuf, utf8.length)
  // Free the WASM memory buffer.
  main.dealloc (rsBuf, utf8.length)
  return ret}

WebAssembly.instantiateStreaming (fetch ('main.wasm')) .then (results => {
  main = results.instance.exports
  // Prints "foobar is_foobar? 1".
  console.log ('foobar is_foobar? ' +
    withRustString ("foobar", function (buf, len) {return main.is_foobar (buf, len)}))
  // Prints "woot is_foobar? 0".
  console.log ('woot is_foobar? ' +
    withRustString ("woot", function (buf, len) {return main.is_foobar (buf, len)}))})

P.S. Module._malloc in Emscripten might be semantically equivalent to the alloc function we implemented above. Under the "wasm32-unknown-emscripten" target you can use Module._malloc with Rust.

Batish answered 11/3, 2018 at 22:12 Comment(1)
Can you add some further prose that describes why this is different / better than the existing answer(s)? – Reider

As pointed out by Shepmaster, only numbers can be passed to WebAssembly, so we need to convert the string into a Uint16Array.

To do so we can use this str2ab function found here:

function str2ab(str) {
  var buf = new ArrayBuffer(str.length*2); // 2 bytes for each char
  var bufView = new Uint16Array(buf);
  for (var i=0, strLen=str.length; i < strLen; i++) {
    bufView[i] = str.charCodeAt(i);
  }
  return buf;
}

This now works:

instance.exports.compute(
    str2ab(operator), 
    n1, n2
);

Because we're passing a reference to an array of unsigned integers.

Ayeshaayin answered 27/2, 2018 at 19:19 Comment(3)
"which is just what Rust is expecting as str": this is not what a &str (note the ampersand) is. A &str is a pointer to data and a length, where the pointer is to a set of u8, not u16. You are not passing the length anywhere. You should never have such a type in an FFI function. – Reider
What would be a better solution? – Ayeshaayin
You have to do the same as the linked question, but in reverse. You'll need to decide on an encoding (such as UTF-8 or UTF-16), put that encoded data into a buffer, decide on how to transfer that pointer and length across the boundary, then "just do it". – Reider
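The point made in the comments above can be checked with a small native sketch. The helper names `encoded_lengths`, `raw_parts` and `str_ref_is_two_words` are made up for illustration. It shows that a `&str` is a pointer plus a length (two machine words, so it cannot travel as a single number), and that the UTF-8 byte count can differ from the UTF-16 code unit count a JS string reports as its `length`.

```rust
use std::mem::size_of;

/// Returns (UTF-8 byte length, UTF-16 code unit count) for the same text.
pub fn encoded_lengths(s: &str) -> (usize, usize) {
    (s.len(), s.encode_utf16().count())
}

/// Splits a `&str` into the two values that would have to cross the boundary:
/// a pointer to the first u8 and the number of bytes.
pub fn raw_parts(s: &str) -> (*const u8, usize) {
    (s.as_ptr(), s.len())
}

/// A `&str` is a "fat" pointer: a data pointer plus a length.
pub fn str_ref_is_two_words() -> bool {
    size_of::<&str>() == 2 * size_of::<usize>()
}
```

For pure ASCII operators such as "MULT" both counts agree, which is one reason a mismatch in encoding can go unnoticed until non-ASCII text is passed.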
