Linux memcpy restrict keyword syntax

I know that the restrict qualifier in C specifies that the memory regions pointed to by two pointers should not overlap. It was my understanding that the Linux (not SUS) prototype for memcpy looks like -

void* memcpy(void *restrict dest, const void *restrict src, size_t count);
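For reference, that restrict-qualified prototype is ordinary C99 and can be implemented exactly as declared. Below is a minimal byte-by-byte sketch. my_memcpy is a hypothetical name, and the overlap assertion is a debug heuristic that assumes a flat address space. It is not something memcpy itself is required to do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* my_memcpy is a hypothetical name; the signature mirrors the C99/POSIX
   memcpy prototype. restrict tells the compiler that, during the call, the
   bytes written through dest are not accessed through src, so the two
   regions must not overlap (memmove is the function for overlapping ones). */
void *my_memcpy(void *restrict dest, const void *restrict src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    /* Debug-only check of the caller's no-overlap promise. This is a
       heuristic: it assumes uintptr_t values compare like addresses,
       which ISO C does not guarantee. */
    assert(n == 0
           || (uintptr_t)d + n <= (uintptr_t)s
           || (uintptr_t)s + n <= (uintptr_t)d);

    while (n-- > 0)
        *d++ = *s++;
    return dest;
}
```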

However, when I looked at man7.org/memcpy, the declaration seems to be -

void *memcpy(void dest[restrict .n], const void src[restrict .n], size_t n);

My questions are -

  1. When was this syntax introduced? In C99 or later, or is it some GNU extension?
  2. What does the . before n signify? I am familiar with variable length array declarations. Is the . there because the variable appears after the array specification? Is this part of the standard?
Gracye answered 4/9, 2023 at 5:41 Comment(3)
I don't think that's valid syntax even in gcc. It's probably just a new way of documenting. Can't say I like it. - Elaterite
I think the important thing is what those extra things mean in the Linux documentation. It is not like OP is not sure about the C standard itself. - Robeson
The thing with pointer to VLA is that n needs to be known in advance in order to be used. But in memcpy it is the right-most parameter, so that isn't possible. Then in the Linux world, there are a lot of people who love to complicate things as much as possible just for the heck of it... - Praemunire

TL;DR: It's an ad hoc syntax, created in a discussion on the linux-man mailing list, for expressing the size of a VLA parameter before the parameter holding that size is declared. The . in .n means that n refers to a parameter of the current function declaration, even though n may appear after the parameter currently being declared. They have also extended the usual int a[restrict n] parameter declaration to the void type. I have no idea where this syntax is documented officially, but the mailing list has all the details.


The change to the memcpy syntax in the Linux library functions manual was introduced by commit c64cd13e. The commit message is copied here verbatim for reference.

Various pages: SYNOPSIS: Use VLA syntax in 'void *' function parameters

Use VLA syntax also for void *, even if it's a bit more weird.

Admittedly, it is weird from the C language perspective: while void f(int n, int[restrict n]) is valid VLA syntax, void f(int n, void[restrict n]) is not, because arrays of void are not allowed.
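To make the contrast concrete, here is a minimal sketch of the valid form next to the two invalid ones, assuming a C99-or-later compiler. copy_ints, copy_bytes and copy_late are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* copy_ints is a hypothetical helper. This is ordinary C99 VLA-parameter
   syntax: n is declared before the array parameters, so it is already in
   scope when dst[restrict n] and src[restrict n] are read. The compiler
   adjusts them to int *restrict dst and const int *restrict src. */
void copy_ints(size_t n, int dst[restrict n], const int src[restrict n])
{
    if (n == 0)
        return;                      /* nothing to copy */
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, n * sizeof dst[0]);
}

/* Neither of these is valid ISO C:
 *   void copy_bytes(size_t n, void dst[restrict n], const void src[restrict n]);
 *       -> arrays of void do not exist
 *   void copy_late(int dst[restrict n], size_t n);
 *       -> n is not yet declared where dst[restrict n] uses it
 *          (unless some unrelated n happens to be in scope)
 */
```

The int version compiles because n comes first and int arrays exist. The man-page form breaks both rules, which is why it can only be read as documentation.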

For the . before n, if we dig deeper we can find this thread from the linux-man mailing list.

Let's take an example:

    int getnameinfo(const struct sockaddr *restrict addr,
                    socklen_t addrlen,
                    char *restrict host, socklen_t hostlen,
                    char *restrict serv, socklen_t servlen,
                    int flags);

and some transformations:

    int getnameinfo(const struct sockaddr *restrict addr,
                    socklen_t addrlen,
                    char host[restrict hostlen], socklen_t hostlen,
                    char serv[restrict servlen], socklen_t servlen,
                    int flags);


    int getnameinfo(socklen_t hostlen;
                    socklen_t servlen;
                    const struct sockaddr *restrict addr,
                    socklen_t addrlen,
                    char host[restrict hostlen], socklen_t hostlen,
                    char serv[restrict servlen], socklen_t servlen,
                    int flags);

(I'm not sure if I used correct GNU syntax, since I never used that extension myself.)
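For readers who have not used the GNU extension referred to here, this is a minimal sketch of GCC's parameter forward declaration. It assumes GCC and is not ISO C. sum_bytes is a hypothetical name. Callers still pass only the real parameters, (buf, n).

```c
#include <assert.h>
#include <stddef.h>

/* GNU C only: "size_t n;" is a parameter forward declaration, which lets
   buf[n] use n even though the real n parameter comes last. It is not ISO C,
   and compilers other than GCC may reject it. sum_bytes is hypothetical. */
unsigned long sum_bytes(size_t n; const unsigned char buf[n], size_t n)
{
    assert(n == 0 || buf != NULL);
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        total += buf[i];
    return total;
}
```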

The first transformation above is non-ambiguous, as concise as possible, and its only issue is that it might complicate the implementation a bit too much. I don't think forward-using a parameter's size would be too much of a parsing problem for human readers.

I personally find the second form not terrible. Being able to read code left-to-right, top-down is helpful in more complicated examples.

The second one is unnecessarily long and verbose, and semicolons are not very distinguishable from commas, for human readers, which may be very confusing.

    int foo(int a; int b[a], int a);
    int foo(int a, int b[a], int o);

Those two are very different to the compiler, and yet very similar to the human eye. I don't like it. The fact that it allows for simpler compilers isn't enough to overcome the readability issues.

This is true, I would probably use it with a comma and/or syntax highlighting.

I think I'd prefer having the forward-using syntax as a non-standard extension --or a standard but optional language feature-- to avoid forcing small compilers to implement it, rather than having the GNU extension standardized in all compilers.

The problems with the second form are:

  • it is not 100% backwards compatible (which may be ok though) as the semantics of the following code change:

int n; int foo(int a[n], int n); // refers to different n!

Code written for new compilers could then be misunderstood by old compilers when a variable with 'n' is in scope.

  • it would generally be fundamentally new to C to have backwards references and parsers might need to be changed to allow this

  • a compiler or tool then has to deal also with ugly corner cases such as mutual references:

int foo(int (*a)[sizeof(*b)], int (*b)[sizeof(*a)]);

We could consider new syntax such as

int foo(char buf[.n], int n);

Personally, I would prefer the conceptual simplicity of forward declarations and the fact that these exist already in GCC over any alternative. I would also not mind new syntax, but then one has to define the rules more precisely to avoid the aforementioned problems.

As I understand it, this means the . is a way to refer to a parameter, used as a VLA size, before that parameter is declared, and one use case is handling mutual references.
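The backwards-compatibility concern in the quoted list is easy to reproduce with today's scoping rules. This is a minimal sketch, assuming a C99-or-later compiler. The global n and row_size are hypothetical names chosen to mirror the mailing-list example.

```c
#include <assert.h>
#include <stddef.h>

int n = 4;   /* hypothetical file-scope n, as in the mailing-list example */

/* Today's ISO C rules: when (*a)[n] is parsed, the parameter n has not been
   declared yet, so this n is the file-scope variable above. The parameter n
   then shadows it inside the body, but a's type was fixed using the global,
   whose value is read on function entry. row_size is a hypothetical name. */
size_t row_size(int (*a)[n], int n)
{
    (void)n;              /* the parameter is intentionally unused */
    assert(a != NULL);
    return sizeof *a;     /* global n * sizeof(int), whatever the argument */
}
```

Called as row_size(&row, 100) with int row[4], it returns 4 * sizeof(int), not 100 * sizeof(int). If a[n] were reinterpreted as a forward reference to the parameter, the same source would silently change meaning. That is the compatibility problem the quoted list raises before suggesting new syntax.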

There is a follow-up thread that states,

I am ok with the syntax, but I am not sure how this would work. If the type is determined only later you would still have to change parsers (some C compilers do type checking and folding during parsing, so need the types to be known during parsing) and you also still have the problem with the mutual dependencies.

We thought about using this syntax

int foo(char buf[.n], int n);

because it is new syntax which means we can restrict the size to be the name of a parameter instead of allowing arbitrary expressions, which then makes forward references less problematic. It is also consistent with designators in initializers and could also be extended to annotate flexible array members or for storing pointers to arrays in structures:

struct { int n; char buf[.n]; };

struct { int n; char (*buf)[.n]; };
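For comparison, this is what the first struct can express in today's C. It uses a flexible array member whose length lives in a separate field, and nothing in the type ties the two together. This is a minimal sketch, assuming C99 or later. struct sized_buf and sized_buf_new are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Standard C99 flexible array member. The length is stored in n, but nothing
   in the type connects buf to n; the proposed "char buf[.n]" annotation
   would make that link explicit. sized_buf and sized_buf_new are
   hypothetical names. */
struct sized_buf {
    size_t n;
    char buf[];            /* flexible array member: bytes follow the struct */
};

/* Allocates a sized_buf holding a copy of len bytes from data.
   Returns NULL on allocation failure; release the result with free(). */
struct sized_buf *sized_buf_new(const char *data, size_t len)
{
    assert(data != NULL || len == 0);
    struct sized_buf *sb = malloc(sizeof *sb + len);
    if (sb == NULL)
        return NULL;
    sb->n = len;
    if (len > 0)
        memcpy(sb->buf, data, len);
    return sb;
}
```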

Of course, there were also objections, which I think many people in the SO community would agree with:

the only point i strongly care about is this one:

Manual pages should not use

  • non-standard syntax
  • non-portable syntax
  • ambiguous syntax (i.e. syntax that might have different meanings with different compilers or in different contexts)
  • syntax that might be invalid or dangerous with some widely used compiler collections like GCC or LLVM
Robeson answered 4/9, 2023 at 7:19 Comment(7)
That's a glibc man page, not the Linux kernel. The memcpy being documented is the one user-space C programs can use in libc.so, not the kernel internals. Some of the Linux manual pages maintained as part of the same project are for system calls, and the glibc wrappers for them, so those man pages document differences between the library API and the raw kernel syscall, e.g. man7.org/linux/man-pages/man2/brk.2.html#NOTES and man7.org/linux/man-pages/man2/clone.2.html#VERSIONS. But others are purely for glibc user-space functions like printf and memcpy. - Diphyllous
Adjusted the wording. You are right, section 3 is not for syscalls and the original wording is a bit misleading. Still, I think the phrase Linux kernel mailing list is an acceptable description for the mailing list. - Robeson
The Linux kernel mailing list is a specific mailing list, the LKML, linux-kernel@vger.kernel.org (en.wikipedia.org/wiki/Linux_kernel_mailing_list / FAQ: vger.kernel.org/lkml). The list where the discussion took place was the Linux man-pages list. The topic of that mailing list is the Linux man-pages project, not the Linux kernel specifically. It's a Linux mailing list, but not precisely a Linux kernel mailing list (kernel.org/doc/man-pages/linux-man-ml.html). The fact that it's hosted on kernel.org doesn't make it a Linux kernel mailing list. - Diphyllous
Fixed. Also replaced all links with those from marc.info. - Robeson
All this effort thinking about forward declarations, and no one found the obvious solution void *memcpy(dst, src, n) void *[restrict n] dst; void const *[restrict n] src; size_t n; { ... } - Rattletrap
For what it's worth, types are required in the parameter list since C23, and K&R syntax like yours will become invalid. However, as they have already gone that far in creating an ad hoc syntax just for the sake of it, it doesn't seem they actually care that much about standard conformity. Also refer to the points in the answer by @JaredoMills. - Robeson
@WeijunZhou: I don't think there has ever really been a consensus among Committee members as to whether the Standard should seek to define all the constructs programmers might need, or whether it should allow implementations to extend the language to accommodate such needs with the expectation that they will do so. Unfortunately, when there is consensus neither that a construct should be included, or that it should be excluded, the Standard waives jurisdiction without making clear that it's doing so, and some compiler writers interpret that as an intention to forbid the construct. - Mccarthy

For both questions, the VLA notation appears to serve a C23 design principle: "APIs should be self-documenting when possible". See Programming Language C - C23 Charter.

The dot notation does not appear in the April 2023 C23 draft, and I speculate it is a wish-list item for a future revision of the standard. The author of the dot notation openly admits that it's not valid syntax, and explains why he chose it, in commit 1eed67e.

The notation seems to originate in the Linux development community, and its use in published man-pages documentation appears to be somewhat speculative. It was introduced with commits 1eed67e (whose commit message answers this question better than I can) and c64cd13, the latter with the message "Use VLA syntax also for void *, even if it's a bit more weird.".

The phrase "even if it's a bit more weird" suggests to me that the author hopes the syntax might eventually be considered for inclusion in the C standard, since he doesn't cite any authoritative source such as a draft or a compiler implementation.

As for the variable length array feature, GCC has supported it as an extension in C90 mode, and it has been standard since C99: GCC Variable Length documentation. AFAIK, the dot notation used in the man pages is not implemented in any GCC version.
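For contrast with the man-page notation, here is the standard VLA feature in practice. The sketch shows the distinction between VLA parameters (variably modified types) and automatic VLA objects, which C11 made optional via __STDC_NO_VLA__. It assumes a C99-or-later compiler such as GCC. trace and sum_indices are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* VLA parameter with the dimension first: m is adjusted to double (*m)[n],
   a pointer to a VLA. Such variably modified types were mandatory in C99,
   optional in C11/C17 (__STDC_NO_VLA__), and mandatory again in C23.
   trace is a hypothetical helper. */
double trace(size_t n, double m[n][n])
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += m[i][i];
    return sum;
}

/* Sums 0 + 1 + ... + (n - 1) through a scratch buffer. Automatic VLA objects
   stay optional even in C23, so fall back to malloc when the implementation
   defines __STDC_NO_VLA__. sum_indices is hypothetical. */
long sum_indices(size_t n)
{
    assert(n > 0);                  /* a zero-length VLA is undefined */
#ifndef __STDC_NO_VLA__
    long scratch[n];                /* automatic VLA, released on return */
#else
    long *scratch = malloc(n * sizeof *scratch);
    if (scratch == NULL)
        return -1;
#endif
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        scratch[i] = (long)i;
        total += scratch[i];
    }
#ifdef __STDC_NO_VLA__
    free(scratch);
#endif
    return total;
}
```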

At the time of this writing (Sep 3, 2023), glibc's header files use the void * notation.

Chukar answered 4/9, 2023 at 7:45 Comment(10)
This syntax will not be in C23 so it seems like nonsense to me. And you can't have arrays of void anyway so I don't know what they were even thinking... - Praemunire
I think the goal of "self-documenting APIs" is fantastic. void issue aside, the readability of the language would benefit from this VLA notation. The idea needs work, but I think it has merit. - Chukar
"the notation seems to be in the proposal stage" is a bit misleading. This normally means that a proposal was written and submitted to the C standard committee, which doesn't seem to be the case here. Or do you mean proposed for use in the man pages? - Crape
@Crape I took the text of the introductory commit to literally be a proposal to WG14 members: 1eed67e - it is written, public, and aimed at the standard committee. With the text of this commit message in mind, how could I word my phrase better? - Chukar
A proposal is normally a paper submitted to the committee as described here. "how could I word my phrase better" I would get rid of the word "proposal", and of the "might eventually/soon become part of a draft" part. - Crape
@Crape you're right; I will amend. - Chukar
@JaredoMills "I think the goal of "self-documenting APIs" is fantastic" Except if you don't know how memcpy works without reading man, you probably shouldn't be writing kernel level stuff anyway... :) - Praemunire
@Lundin, quite right! The 1eed67e commit's message shows more ambition (that one day all C code will be better self-documented, not only Linux code). It's a beautifully written commit message. - Chukar
As for whether this was proposed to the ISO WG, I think not. There's a copy of the commit/proposal sent by email to members of the ISO WG14 but that's about it. It is not a formal proposal. Notably, Uecker is the one behind the VLA changes proposal N2992 etc. This got voted in and will be in C23. But it only means that pointer to VLA types is once again mandatory like in C99. Declaring objects of VLA type is still optional as per __STDC_NO_VLA__. And this alien VLA syntax in man will not be in C23. - Praemunire
Right... the "weird" notation/syntax is an issue distinct from the VLA features that have been adopted for inclusion in C23, like the automatic scoped allocation. OP's question is about the exotic notation and its origin, rather than what new bells and whistles will be added in the next iteration of the C standard; so that's what the answer focuses on: the notation and the goal of improving documentation (self-documentation and manpage docs). - Chukar
