How to store Fortran-style long character scalar in C++ data structure
Asked Answered
C

2

6

I am working with a legacy Fortran library that requires a character scalar PATH as an argument to the subroutine. The original interface was:

SUBROUTINE MINIMAL(VAR1, ..., PATH)

CHARACTER (LEN=4096) PATH

...

I need to be able to call this from C++ so I have made the following changes:

SUBROUTINE MINIMAL(VAR1, ..., PATH) &
    BIND (C, NAME="minimal_f")

    USE ISO_C_BINDING, ONLY: C_CHAR, C_NULL_CHAR

    CHARACTER (KIND=C_CHAR, LEN=1), DIMENSION(4096), INTENT(IN) :: PATH
    CHARACTER (LEN=4096):: new_path

!   Converting C char array to Fortran CHARACTER.
    new_path = " "
    loop_string: do i=1, 4096
        if ( PATH (i) == c_null_char ) then
            exit loop_string
        else
            new_path (i:i) = PATH (i)
        end if
    end do loop_string

as per this answer. This works to convert the C-style char array to its Fortran scalar equivalent, with two problems:

  1. This code is on the critical path so doing the same conversion every time when the answer is the same is inefficient
  2. I would strongly prefer to not have to edit legacy code

I have tried:

  • Just accepting a CHARACTER (LENGTH=4096) :: new_path directly with the ISO C binding, but I get the following compiler error: Error: Character argument 'new_path' at (1) must be length 1 because procedure 'minimal' is BIND(C) This answer and others that I have read suggest that the ISO C binding seems to restrict what I can pass as parameters to the function, although I haven't found any official documentation yet.
  • This answer, which gives another algorithm to turn a C-style string into a Fortran-style equivalent in the C code and passing it to the Fortran subroutine without using the ISO C binding. (This function suggests a similar algorithm). This seems like exactly what I want but I have a linker error without the binding:
Undefined symbols for architecture x86_64:
  "_minimal", referenced from:

C++-side function declaration:

extern "C" { 
    double minimal(int* var1, ..., const char* path);
}

This suggests that my compiler (gcc) prepends the function name with an underscore when in an extern block. gfortran, however, does not let me name the subroutine _minimal so the linker can't find the symbol _minimal. (The aforementioned link suggests adding an underscore to the end of the C-side function name but this doesn't work either because of the leading underscore.)

I want to process a C-style string into a Fortran-style character scalar once in my C++ code and be able to pass it into the original interface. Any ideas?

Crackdown answered 19/8, 2021 at 0:11 Comment(2)
The simplest way would be to keep a copy of the last string passed, either on the Fortran side or the calling C++ side, and compare against that. The downsides are the mechanism for storing that string (e.g. if a global, your code will be limited in ability to be used in multiple threads) and, obviously, the overhead of storing and comparison. Depending on your needs, a simple pointer comparison will be enough (e.g. if the C++ code only passes string literals) rather than comparing strings character by character.Harlen
The ISO_C_BINDING solution you have is correct. It's a bit unfortunate that ISO_C_BINDING does not have a clean way of passing C strings to Fortran strings without requiring the copy, but it is what it is. Then again, I suspect in the vast majority of cases the PATH is going to be short (if it's a name of a file, maybe a dozen characters at most?), so you'll be hard pressed to notice any inefficiency. The other alternative, as you have seen, goes into compiler (and compiler-version) specific calling convention issues.Hereupon
K
4

Fortran 2018 allows interoperable procedures to have character dummy arguments of assumed length, relaxing the restriction that such dummy arguments must be of length one.

So we can write a Fortran procedure as

subroutine minimal(path) bind(c)
  use, intrinsic :: iso_c_binding, only : c_char
  character(*,c_char), intent(in) :: path
  ...
end subroutine minimal

and continue our life knowing that we've also improved our Fortran code by using an assumed length scalar instead of an explicit length one. No "Fortran side" copy of this character dummy is required.

The sad part of this story is that the dummy argument path is not interoperable with a char. So instead of the formal parameter of the C (or C++) function being char * , it must be CFI_cdesc_t *. For (C) example:

#include "ISO_Fortran_binding.h"
#include "string.h"

void minimal(CFI_cdesc_t *);

int main(int argc, char *argv[]) {
  /* Fortran argument will be a scalar (rank 0) */
  CFI_CDESC_T(0) fpath;
  CFI_rank_t rank = 0;

  char path[46] = "afile.txt";

  CFI_establish((CFI_cdesc_t *)&fpath, path, CFI_attribute_other, 
        CFI_type_char, strlen(path)*sizeof(char), rank, NULL);

  minimal((CFI_cdesc_t *)&fpath);
  return 0;
}

A C++ example will be similar.

An notable part of the story is that you'll need a Fortran compiler which implements this part of Fortran 2018. GCC 11 does not.


IanH's answer draws attention to an approach which avoids modifying the original Fortran subroutine at all. There certainly are times when avoiding any change there is good (repeating slightly what IanH said):

  • using bind(c) means an explicit interface will now always be required when calling the modified subroutine through Fortran itself. Perhaps some parts of your code used it with an implicit interface
  • the original was tested (or wasn't) and you don't want to break anything
  • you don't want to potentially change the argument from default kind to interoperable kind (if these do differ)
  • the explicit length dummy argument really is wanted
  • you just don't want to modify it if not required

Any one of those would make a good argument, so in that spirit I'll add to the C example with the thin wrapper.

Fortran:

subroutine minimal_wrap(path) bind(c, name='minimal')
  use, intrinsic :: iso_c_binding, only : c_char
  character(*,c_char), intent(in) :: path

  call minimal(path)
end subroutine minimal_wrap

subroutine minimal(path)
  character(4096) path
  print*, trim(path)
end subroutine minimal

C:

#include "ISO_Fortran_binding.h"
#include "string.h"

void minimal(CFI_cdesc_t *);

static const int pathlength=4096;

int main(int argc, char *argv[]) {
  /* Fortran argument will be a scalar (rank 0) */
  CFI_CDESC_T(0) fpath;
  CFI_rank_t rank = 0;

  char path[pathlength];

  /* Set path as desired. Recall that it shouldn't be null-terminated
     for Fortran */

  CFI_establish((CFI_cdesc_t *)&fpath, path, CFI_attribute_other, 
        CFI_type_char, pathlength*sizeof(char), rank, NULL);

  minimal((CFI_cdesc_t *)&fpath);
  return 0;
}

C++ using containers will arguably be nicer.

Recall that this puts responsibility on the C side to ensure the array is long enough (as you have in pure Fortran calls).

Equally if you need to be robust to differences in default character and interoperable character with that copy (as in IanH's answer) you can apply those same tricks to copy as required (or you can do this with conditional compilation and configure-time checks). By this point however, you may as well just assume always copy or use the array argument.

Krutz answered 19/8, 2021 at 9:18 Comment(0)
A
2

The answer to the question title is typically a std::string object, padded to the relevant fixed Fortran CHARACTER scalar length with spaces. Alternative storage objects (std::vector<char>, or a C-style char array) could be used on the C++ side, but the approach is similar.

(If the Fortran code used an assumed length character argument, rather than fixed length, then the padding would not be required. Whether this change is possible depends on the details of the MINIMAL subroutine. Fixed length character variables are typically an anachronism - this answer is not advocating their use in new code.)

On the Fortran side, you can write a thin wrapper that the C++ can call, that uses sequence and pointer association to avoid the need to copy the string data, for typical C++/Fortran platforms of today. A copy (or modification of the legacy Fortran code) is unavoidable if the interoperable character kind is not the same as the character kind of the legacy Fortran procedure. The example code below is robust to this situation, but I expect platforms that require that code path to be rare.

For default character and C_CHAR interoperable character arguments, sequence association permits an array dummy argument to be associated with the sequence of characters designated by the actual argument. This effectively permits association between character scalars and arrays with different lengths.

(Do not confuse the ISO_C_BINDING intrinsic module with the BIND(C) procedure suffix. BIND(C) fundamentally changes the interface of a procedure to enable calls between C and Fortran - ISO_C_BINDING is just a module with some handy types, constants and procedures for such calls.)

Example C++:

#include <string>
#include <cassert>

const int path_length = 4096;

extern "C" int legacy_cintf(char* array);

int main()
{
  std::string some_long_text 
      = "It was the best of times, it was the worst of times, it was "
        "the age of wisdom, it was the age of foolishness, it was the "
        "epoch of belief, it was the epoch of incredulity, it was the "
        "season of light, it was the season of darkness, it was the "
        "spring of hope, it was the winter of despair.";
  
  assert(some_long_text.size() < path_length);
  
  std::string path = std::string(path_length, ' ');
  path.replace(0, some_long_text.size(), some_long_text);
  
  legacy_cintf(&path[0]);
  
  return 0;
}

Example Fortran:

MODULE m
  IMPLICIT NONE
CONTAINS
  SUBROUTINE legacy_cintf(array) BIND(C, NAME='legacy_cintf')
    USE, INTRINSIC :: ISO_C_BINDING, ONLY: C_INT, C_CHAR
    CHARACTER(LEN=1,KIND=C_CHAR), TARGET :: array(4096)
    
    CHARACTER(LEN=SIZE(array)), POINTER :: scalar
    LOGICAL :: copy_required
    
    copy_required = C_CHAR /= KIND(scalar)
    IF (copy_required) THEN
      ALLOCATE(scalar)
      CALL do_copy(array, scalar)
    ELSE
      CALL do_associate(array, scalar)
    END IF
    
    CALL LEGACY(scalar)
    
    IF (copy_required) DEALLOCATE(scalar)
  END SUBROUTINE legacy_cintf
  
  
  SUBROUTINE do_associate(arg, scalar)
    CHARACTER(*), INTENT(OUT), POINTER :: scalar
    CHARACTER(LEN=LEN(scalar)), INTENT(IN), TARGET :: arg(1)
    scalar => arg(1)
  END SUBROUTINE
  
  
  SUBROUTINE do_copy(arg, scalar)
    USE, INTRINSIC :: ISO_C_BINDING, ONLY: C_CHAR
    CHARACTER(*), INTENT(OUT) :: scalar
    CHARACTER(LEN=LEN(scalar), KIND=C_CHAR), INTENT(IN) :: arg(1)
    scalar = arg(1)
  END SUBROUTINE do_copy
END MODULE m

      SUBROUTINE LEGACY(PATH)
      CHARACTER(4096) :: PATH
      PRINT *, TRIM(PATH)
      END SUBROUTINE LEGACY
Ample answered 19/8, 2021 at 22:26 Comment(1)
Is it worth being more explicit on the "trick" of making the dummy an array of shape [1] to allow the sequence association?Krutz

© 2022 - 2024 — McMap. All rights reserved.