How is the standard library for a programming language implemented?

I have a problem understanding how standard libraries for programming languages other than C are written.

As far as I understand, C standard libraries can be implemented in a mixture of C and assembler, where the assembler is needed so that system calls can be invoked and thus fopen, fscanf, ... can be used.

How do other programming languages accomplish this functionality (working with I/O, files, and everything else for which system calls are needed) in their standard library? Do they all allow inline assembler like C does, or is there some other way?

I have read that C and its standard library can be used to implement other languages' libraries, but I am not sure how this is done.

edit1. Trying to be more specific.
(The language for which the standard library is implemented is referred to as new_lang.)

Could someone elaborate on how the second approach (using the C runtime) is done at the object code level and at the implementation level? Some things I can't get my head around are:

  1. Is the C runtime invoked using C syntax or new_lang syntax? How do we call ssize_t write(int fd, const void *buf, size_t count) from somewhere within the new_lang library?
  2. What happens if new_lang doesn't have pointers as a data type? How is the second argument to write, const void *buf, passed from new_lang? How does new_lang follow the C runtime API if it doesn't have C data types?
  3. If some function from the new_lang library calls the C runtime, does that mean it must obey its ABI? Must the data sizes for types such as integer and char match in new_lang and C for a given platform (along with the other things specified by the ABI, such as whether arguments are passed on the stack or in registers)?
    Isn't this a little too restrictive? For example, what if new_lang needs more bytes to be reserved for char?

I tried to be as general as possible, but I am not sure how to explain the problem without going into a little detail.

Fenny answered 2/4, 2016 at 20:16 Comment(1)
I'm not flagging this question, but it strikes me as being too broad. - Monanthous

It depends on the language, and there can even be several options for a single language. Note that standard libraries/runtimes implemented in C often use compiler-specific extensions and attributes, and are therefore not written in standard, unextended C.

For a language such as Pascal, multiple approaches are possible and do exist. Pascal is a language on the same level as C (and/or C++, since most surviving dialects are object oriented too). For example, FreePascal has its runtime library written in Pascal and assembler, and can run on Linux without linking to any C-compiled code.

The reasons to go for C are usually more about management (availability of tools and programmers) than technical.

At the same time, GNU Pascal is basically a gcc modification, and builds on libgcc, glibc etc.

Answer to edit1:

  1. As far as I know, that is very internal to the exact target you are using. There is a write() that is callable from the system C compiler, but it may be a runtime library function (man section 3) that wraps the syscall rather than the syscall itself (man section 2). I believe section 3 functions are guaranteed to be real functions and not macros, but I'm not entirely sure about that.

On BSD the syscalls are fairly close to an ordinary function call; on Linux/i386 they are not. The syntax doesn't matter; what matters is how the C compiler interprets it, and the code you generate must be equivalent (not identical, but close). Usually the only thing guaranteed to work (as far as classic POSIX philosophy goes) is the system C compiler, since it is the only one guaranteed to be able to interpret the system headers, which often contain non-standard extensions or modifiers. Anything else has to make sure it matches, possibly on a per-target basis. Most languages therefore build on top of the C runtime and usually have a C part in their own runtime.
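To make the "wrapper versus syscall" distinction concrete, here is a rough Linux-only sketch in C. raw_write and libc_write_error are names made up for illustration, and the inline assembler uses GCC/Clang syntax for x86_64 and aarch64 only (other targets fall back to libc). It is one way of showing the idea, not how any particular runtime does it.

```c
#include <errno.h>
#include <sys/syscall.h>   /* SYS_write: the syscall number for this target */
#include <unistd.h>        /* write(), syscall() */

/* raw_write (hypothetical helper) enters the kernel directly and returns
   the kernel's own result: a byte count, or a negative errno on failure. */
long raw_write(int fd, const void *buf, unsigned long count)
{
#if defined(__x86_64__)
    long ret;
    /* Linux x86_64: number in rax, arguments in rdi, rsi, rdx. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"((long)SYS_write), "D"((long)fd), "S"(buf), "d"(count)
                      : "rcx", "r11", "memory");
    return ret;
#elif defined(__aarch64__)
    /* Linux aarch64: number in x8, arguments in x0..x2, result in x0. */
    register long x8 __asm__("x8") = SYS_write;
    register long x0 __asm__("x0") = fd;
    register const void *x1 __asm__("x1") = buf;
    register unsigned long x2 __asm__("x2") = count;
    __asm__ volatile ("svc #0"
                      : "+r"(x0)
                      : "r"(x8), "r"(x1), "r"(x2)
                      : "memory");
    return x0;
#else
    /* Other targets: go through libc and undo its errno translation. */
    long r = syscall(SYS_write, fd, buf, count);
    return r < 0 ? -(long)errno : r;
#endif
}

/* libc_write_error: what the C-level wrapper reports for a failing call.
   It returns the errno value set by write(), or 0 if the call succeeded. */
int libc_write_error(int fd)
{
    errno = 0;
    ssize_t r = write(fd, "x", 1);
    return r == -1 ? errno : 0;
}
```

With an invalid descriptor such as -1, the raw call hands back -EBADF directly, while the libc wrapper returns -1 and stores EBADF in errno. A runtime that skips libc has to reproduce this kind of translation itself, per target.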

  2. You must somehow make them match the C compiler on a per-target basis: either by adapting automatically (your whole system is based on C and the C compiler, so type equivalence propagates more or less automatically), by painfully crafting the equivalence target by target, or by wrapping each function in C or assembler code. Sometimes this has to be done several times per target (e.g. MS VC and mingw, though these are more compatible now than they were 10-15 years ago, when gcc wasn't COM compatible, for example).

E.g. Free Pascal has a cdecl; modifier to mark C callable functions, and the compiler then generates calling code equivalent to the system C compiler on that target.

This sounds bad, but there are usually only a few variants. That still doesn't make it easy: e.g. the x86_64 ABI differs slightly between Linux/FreeBSD on one side (System V), Windows (its own Win64 convention) and OS X (System V based, with small deviations). One can avoid this by implementing as much of your system as possible in C, but then you are stuck with it (and a hybrid language system) forever. Moreover, this way C-isms and Unix-isms creep into your new language, because that is easier.

Many languages on *nix go this way, because it makes quick initial ports to something new easier. But in return you get to maintain a hybrid language system. Such a system usually also inherits many build-related C traits, such as external preprocessors, headers that are included as text and reinterpreted over and over again, and a make-based build system.

For a list of possible issues see How to design a C / C++ library to be usable in many client languages?

  3. Yes, but only the binary part of it, since of course the C compiler can't do strict forms of type checking across languages. But sizes, field offsets (packing), calling order, register use, and things like whether small structs are passed in registers must match.
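As a small illustration of the "sizes and field offsets (packing)" part, the sketch below lets the C compiler report the layout it chose for a record, which is what a new_lang binding would have to reproduce byte for byte. The struct and function names are invented for the example, and the packed attribute is a GCC/Clang extension.

```c
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>

/* Hypothetical record shared between new_lang and C, natural layout. */
struct record {
    char    tag;     /* 1 byte */
    int32_t value;   /* the compiler may insert padding before this field */
};

/* Same fields with padding disabled (GCC/Clang extension). */
struct __attribute__((packed)) record_packed {
    char    tag;
    int32_t value;
};

/* Layout facts the foreign side must agree with. */
size_t record_value_offset(void)        { return offsetof(struct record, value); }
size_t record_size(void)                { return sizeof(struct record); }
size_t record_packed_value_offset(void) { return offsetof(struct record_packed, value); }
size_t record_packed_size(void)         { return sizeof(struct record_packed); }
```

On common targets the natural layout typically puts value at offset 4 with a total size of 8, while the packed variant puts it at offset 1 with a size of 5. If new_lang assumes one layout and the C code was compiled with the other, every access to value reads the wrong bytes, and nothing at link time will complain.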
Trapshooting answered 2/4, 2016 at 20:37 Comment(4)
How is the linking phase done if the C runtime is used? How can object code from, let's say, Pascal and C be compatible? Even if they follow the same ABI, the sizes of some data types can differ, and then the C runtime code calculates differently where the arguments to a function are. Example: if I wanted to implement writeln from Pascal I would use printf, so when WriteLn('something') is found in a Pascal file it will ultimately call printf("something"). But what if char is 2 bytes in Pascal and 1 byte in C? When the code for printf calculates where its arguments are using the ebp register, it will "miss". - Fenny
Yes, so the types used must match. But the same applies if you link code from two different C compilers, and it is simply the reality of using hybrid systems. printf is no substitute for writeln. Writeln is internally quite complex and routes to Pascal file I/O (taking advantage of buffering if need be), which on *nix systems is ultimately routed to write(2). Worse, trying to do so would cause valid code like writeln('%s%d%s') to crash. - Trapshooting
I realize WriteLn is more complex. I tried to emphasize the passing of arguments (following the API and ABI), which I don't understand, so I made my comment about printf and writeln unrealistically simple (and I only had 600 chars to explain it :)). I edited the question so that it explains the problem more precisely (if we are on a *nix system, how are arguments passed to that write(2) call...). Btw, I voted your answer up, but my reputation isn't high enough for the change to be shown immediately. - Fenny
I tried to write up an answer. It will probably be very unsatisfactory, but the reality is simply that very little is guaranteed for other languages that interface with the C system, especially if you are not based on the C system (being based on it means e.g. that your compiler is written in C or something compatible, such as C++). Free Pascal is a nice example. The only recommendation I can give you is not to try to solve all problems at once. Better to start with a few good targets (e.g. Linux/x86, Linux/arm and Windows), prepare them well and in a user-friendly way, and forget about catering to everybody in 1.0. - Trapshooting
