Higher-order functions in C as a syntactic sugar with minimal effort
Asked Answered
I

2

6

I want to implement higher-order functions (HOFs) in C as a syntactic sugar with minimal effort. For example, for the following code

function add(int x) {
  return int(int y) {
    return x + y;
  };
}

int main() {
  function add1 = add(1);
  return add1(2);
}

it is transcompiled into pure C as

#include <stdlib.h>

typedef struct {
  void *ctx;
  void* (*fun)(void *arg, void *ctx);
} function;

function new_function(void *ctx, void* (*fun)(void *, void *)) {
  function f = {.ctx=ctx, .fun=fun};
  return f;
}

void* apply(function f, void *arg) {
  return (*(f.fun))(arg, f.ctx);
}

typedef struct {
  int x;
} context$1;

void* new_context$1(int x) {
  context$1 *ctx = malloc(sizeof(context$1));
  ctx->x = x;
  return ctx;
}

void* function$1(void *arg, void *ctx) {
  int y = (int)arg;
  int x = ((context$1*)ctx)->x;
  return (void*)(x + y);
}

function add(int x) {
  return new_function(new_context$1(x), function$1);
}

int main() {
  function add1 = add(1);
  return (int)apply(add1, (void*)2);
}

I have run this (manually) transcompiled version and it works fine. For implementation, I believe some AST manipulation and lambda lifting would be enough.

Are there any potential flaws in my approach? Are there any easier methods for HOFs, or can I improve my approach to make it easier to implement?

Igniter answered 4/1, 2013 at 16:3 Comment(0)
V
2

There are two obvious problems by now:

  • Using void* to represent parameters and return values of any type will eventually break. Consider "long long" parameters on ia-32, or any structure passed by value.
  • It's hard (if possible) to make higher-order functions useful without garbage collection. You can see it from your own example where context$1 is malloc'ed but never free'd. Boehm GC may help here.
Viglione answered 4/1, 2013 at 16:25 Comment(5)
I totally agree with your second point and +1 for Boehm GC :-). But could please educate me how would void* eventually break? I believe they would work fine because they're automatically generated.Igniter
@XiaoJia: the problem with void* is in function$1. What if the type of x + y is bigger than void*? In this example that would mean sizeof(int) > sizeof(void*), which would be a strange C implementation, but not so strange if x had type long long instead of int. Then the cast to void* loses information.Chellean
Well, you can always say that your automatic translator will take care of it somehow (as long as you don't describe the algorithm but only provide an example translation). But I just look at (void*)2 and see that it would lose information if there's some wider-than-pointer value instead of 2. (And casting to/from void* won't work at all for structures passed by value; are you refusing to support them at all, or will you translate such functions differently?)Viglione
@AntonKovalenko I see. Thanks for pointing out. I will try to use a better strategy for translation.Igniter
@SteveJessop Thanks! I think I will drop the generic function structure and create a customized structure for each HOF in the code.Igniter
A
1

Be aware that your generated code invokes undefined behavior. In several places, you are converting between integral types and pointer types via direct cast. This isn't legal in C. You have no guarantee that a pointer and an int are the same size or that you can even cast between them without altering or corrupting the value. The code may work on your particular system by coincidence, but it will break on many other systems.

The only way that you can generically handle both pointers and integral types (as well as structures, for that matter) is to pass parameters using variable-length argument lists. That way, you can extract each argument regardless of the size of the underlying data type.

Another option is to drop the generic function structure and create a customized structure for each HOF in the code. The function pointer could then list all of the necessary parameters directly instead of trying to abstract them behind a generic interface, which would eliminate the problematic casts and allow the code to work as expected regardless of the data types used.

As far as usability goes, consider changing your interface so that the return type of the HOF is specified (or at least listed) on the line where it is declared. C objects always list their type in the declaration. In your example, the declaration is function add1 = add(1);. To find out what data type this function returns, you have to dig through the definition for the HOF. This isn't a difficult task for your sample code, but this could be non-trivial for more complex code. You may not have code for the HOF at all if it is coming from a library. Perhaps something like function(int) add1 = add(1); might be more explicit.

On a side note, I recommend that you define auto-generated functions as static to help prevent name collisions between modules.

Abirritate answered 4/1, 2013 at 19:4 Comment(1)
Very helpful comment. Thanks! I will try to update my strategy for translation.Igniter

© 2022 - 2024 — McMap. All rights reserved.