Are there any platforms where pointers to different types have different sizes?
Tags: C

Question (score 71)

The C standard allows pointers to different types to have different sizes, e.g. sizeof(char*) != sizeof(int*) is permitted. It does, however, require that if a pointer is converted to a void* and then converted back to its original type, it must compare as equal to its original value. Therefore, it follows logically that sizeof(void*) >= sizeof(T*) for all types T, correct?
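A minimal sketch of the round-trip guarantee the question relies on (C11 6.3.2.3p1: a pointer to any object type may be converted to `void *` and back, and the result compares equal to the original). The helper names and `struct point` are illustrative only. Note that these checks passing says nothing by itself about how `sizeof(void *)` compares with `sizeof(T *)`.

```c
#include <assert.h>

struct point { int x, y; };   /* illustrative object type */

/* Each helper converts T * -> void * -> T * and reports whether the
   result compares equal to the original pointer. */
static int int_round_trip(int *p)
{
    void *v = p;          /* int * -> void * (implicit) */
    int *back = v;        /* void * -> int * (implicit) */
    return back == p;
}

static int double_round_trip(double *p)
{
    void *v = p;
    double *back = v;
    return back == p;
}

static int point_round_trip(struct point *p)
{
    void *v = p;
    struct point *back = v;
    return back == p;
}

/* Exercises the guarantee for a few object types. */
void check_round_trips(void)
{
    int i = 1;
    double d = 2.0;
    struct point pt = { 3, 4 };
    assert(int_round_trip(&i));
    assert(double_round_trip(&d));
    assert(point_round_trip(&pt));
}
```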

On most common platforms in use today (x86, PPC, ARM, and 64-bit variants, etc.), the size of all pointers equals the native register size (4 or 8 bytes), regardless of the pointed-to type. Are there any esoteric or embedded platforms where pointers to different types might have different sizes? I'm specifically asking about data pointers, although I'd also be interested to know if there are platforms where function pointers have unusual sizes.
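To inspect a particular implementation, a program along these lines prints the sizes in question. `struct opaque` and `fn_ptr` are illustrative names. On the mainstream platforms listed, the values typically all match, but the only equality asserted here is one the standard actually guarantees: `void *` and `char *` share the same representation (C11 6.2.5p28).

```c
#include <assert.h>
#include <stdio.h>

struct opaque;                  /* never completed; pointers to it are still fine */
typedef void (*fn_ptr)(void);   /* a function pointer type, for comparison */

/* Prints pointer sizes for the current implementation. */
void print_pointer_sizes(void)
{
    /* Guaranteed: void * and char * have the same representation. */
    assert(sizeof(void *) == sizeof(char *));

    printf("char *            %zu\n", sizeof(char *));
    printf("int *             %zu\n", sizeof(int *));
    printf("double *          %zu\n", sizeof(double *));
    printf("void *            %zu\n", sizeof(void *));
    printf("struct opaque *   %zu\n", sizeof(struct opaque *));
    printf("void (*)(void)    %zu\n", sizeof(fn_ptr));
}
```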

I'm definitely not asking about C++'s pointer-to-members and pointer-to-member-functions. Those take on unusual sizes on common platforms, and can even vary within one platform, depending on the properties of the pointer-to class (non-polymorphic, single inheritance, multiple inheritance, virtual inheritance, or incomplete type).

Candicecandid asked 27/5/2009 at 14:35. Comments:
Curious: what section of the standard allows for different pointer sizes? Would you mind posting that section? – Placebo
Nit-pick: the "native integral type" in C has to be int, which is rarely 64-bit even on 64-bit platforms, AFAIK. In other words, LP64 is more common than ILP64. – Feud
@JaredPar: I'm not exactly sure where it says so in the standard, but this page lysator.liu.se/c/rat/d9.html#4-9-6-1 mentions it, regarding the %p fprintf format specifier. @unwind: s/native integer size/native register size/ – Candicecandid
I would like to know if there is an N for which the following holds: sizeof(anypointer) <= N*sizeof(int). For instance, if I assume that a pointer is at most N=4 times larger than an int, what are the odds that this assumption is wrong? – Output
Minor observation: The size of all struct X * pointers is the same. The reason is that you can forward declare a struct, e.g. struct X; and then put a pointer to it in another struct, struct Y { struct X *x; }; and the compiler can still know how much space struct Y requires, without knowing anything about struct X. – Yuki
Itanium and PowerPC had something like two-word function pointers: one word pointed to the actual instructions, and one word pointed to globals used by the function (which might differ between libraries). See, for example, stackoverflow.com/questions/18907934/… and mikedimmick.blogspot.com/2004/01/ia64s-global-pointer.html. The "pointer" from the perspective of your C code was often a single-word pointer to a pair of words, though. – Erdah
Just curious: shouldn't the initial statement of the question say "pointers of different types to have..." instead of "pointers to different types to have..."? char * p_dummy is a pointer of type char, not a pointer to type char. – Elwina
A variation of the problem was exhibited on ICL (and Three Rivers Corporation) Perq machines back in the mid-80s (predating standard C and void pointers). The machine was (16-bit) word-addressed. The char * address for a memory location was different from the `anything_bigger *` address for the same memory location. AFAICR, the pointers were all the same size (32 bits), but the bit patterns were different. This is from the days when going from 1 MiB to 2 MiB of main memory was a massive improvement; the fact that the bit patterns in the most significant bits were different didn't matter. – Donar
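The forward-declaration point made in Yuki's comment can be checked directly. In this sketch `struct X` and `struct Y` follow that comment, while `payload` and `tag` are made-up members. The rule that makes it work is C11 6.2.5p28, which requires all pointers to structure types to share the same representation and alignment requirements, so the layout of `struct Y` never depends on the contents of `struct X`.

```c
#include <assert.h>

struct X;                 /* incomplete: nothing is known about its layout yet */

struct Y {
    struct X *x;          /* OK: the size of struct X * does not depend on struct X */
    int tag;
};

/* Completing struct X afterwards changes nothing about struct Y. */
struct X {
    double payload[16];
};

void check_struct_pointers(void)
{
    /* Pointers to two unrelated structure types have the same size. */
    assert(sizeof(struct X *) == sizeof(struct Y *));
    /* struct Y holds at least one pointer and one int. */
    assert(sizeof(struct Y) >= sizeof(struct X *) + sizeof(int));
}
```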
Answer (score 51)

Answer from the C FAQ:

The Prime 50 series used segment 07777, offset 0 for the null pointer, at least for PL/I. Later models used segment 0, offset 0 for null pointers in C, necessitating new instructions such as TCNP (Test C Null Pointer), evidently as a sop to all the extant poorly-written C code which made incorrect assumptions. Older, word-addressed Prime machines were also notorious for requiring larger byte pointers (char *'s) than word pointers (int *'s).

The Eclipse MV series from Data General has three architecturally supported pointer formats (word, byte, and bit pointers), two of which are used by C compilers: byte pointers for char * and void *, and word pointers for everything else. For historical reasons during the evolution of the 32-bit MV line from the 16-bit Nova line, word pointers and byte pointers had the offset, indirection, and ring protection bits in different places in the word. Passing a mismatched pointer format to a function resulted in protection faults. Eventually, the MV C compiler added many compatibility options to try to deal with code that had pointer type mismatch errors.

Some Honeywell-Bull mainframes use the bit pattern 06000 for (internal) null pointers.

The CDC Cyber 180 Series has 48-bit pointers consisting of a ring, segment, and offset. Most users (in ring 11) have null pointers of 0xB00000000000. It was common on old CDC ones-complement machines to use an all-one-bits word as a special flag for all kinds of data, including invalid addresses.

The old HP 3000 series uses a different addressing scheme for byte addresses than for word addresses; like several of the machines above, it therefore uses different representations for char * and void * pointers than for other pointers.

The Symbolics Lisp Machine, a tagged architecture, does not even have conventional numeric pointers; it uses the pair <NIL, 0> (basically a nonexistent <object, offset> handle) as a C null pointer.

Depending on the "memory model" in use, 8086-family processors (PC compatibles) may use 16-bit data pointers and 32-bit function pointers, or vice versa.

Some 64-bit Cray machines represent int * in the lower 48 bits of a word; char * additionally uses some of the upper 16 bits to indicate a byte address within a word.

Additional links: A message from Chris Torek with more details about some of these machines.
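Several of these machines stored the null pointer with a non-zero bit pattern. A short sketch of the practical consequence: comparing against `NULL` (or `0`) in source is always portable, because the compiler converts the constant to the implementation's null representation, whereas inspecting the stored bytes (or assuming `memset`/`calloc` produce null pointers) is not. The function names are illustrative; what `null_is_all_bits_zero` returns depends on the platform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Reports whether this implementation stores a null int * as all-bits-zero.
   Common today, but not required by the C standard. */
bool null_is_all_bits_zero(void)
{
    int *p = NULL;                          /* a null pointer, whatever its bits */
    unsigned char zero[sizeof p] = { 0 };
    return memcmp(&p, zero, sizeof p) == 0;
}

/* These comparisons hold on every conforming implementation,
   regardless of the null pointer's bit pattern. */
void check_null_comparisons(void)
{
    int *p = NULL;
    int x = 0;
    assert(p == NULL);
    assert(!p);
    p = &x;
    assert(p != NULL);
}
```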

Threap answered 8/10/2009 at 17:07. Comments:
@David Cowden: I suspect this answer was written 4 months after the accepted one. Since the OP asked for an example and got one, that answer was accepted. This post, certainly more complete, deserved its high vote rating. Maybe highly rated non-accepted answers deserve a community "accept override"? That sounds like a "meta" question, though. – Scyphate
@chux: Probably better to just notify the asker that a response to their question has vastly surpassed the accepted answer and let them reconsider. Ultimately it's up to the asker to determine which response best answers their question. I simply commented to underscore the fact that I very much enjoy this answer. – Sumikosumma
@chux: That's what the 'Populist' badge is for: stackoverflow.com/help/badges/62/populist – Elora
Answer (score 33)

Not quite what you're asking, but back in the 16-bit DOS/Windows days, you did have the distinction between a (near) pointer and a far pointer, the latter being 32 bits.

I might have the syntax wrong...

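#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>   /* _fmalloc: non-standard, 16-bit Microsoft C */

int *pInt = malloc(sizeof(int));          /* near pointer (small model): offset only */
int far *fpInt = _fmalloc(sizeof(int));   /* far pointer: segment + offset */

printf("pInt: %d, fpInt: %d\n", (int)sizeof(pInt), (int)sizeof(fpInt));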

Output:

pInt: 2, fpInt: 4
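The 32-bit far pointer here is a segment:offset pair. A small model of 8086 real-mode address translation (hypothetical types, not any compiler's API) shows why it needs twice the storage of a near pointer, and also why two different far pointers can refer to the same byte, so their bit patterns cannot simply be compared.

```c
#include <assert.h>
#include <stdint.h>

/* A real-mode 8086 far pointer: 16-bit segment + 16-bit offset = 32 bits,
   versus 16 bits (offset only) for a near pointer. */
struct far_ptr {
    uint16_t segment;
    uint16_t offset;
};

/* Physical address = segment * 16 + offset. The 8086 has 20 address lines,
   so the result wraps around at 1 MiB. */
uint32_t far_to_linear(struct far_ptr fp)
{
    return (((uint32_t)fp.segment << 4) + fp.offset) & 0xFFFFFu;
}

/* Different segment:offset pairs can name the same byte. */
void check_aliasing(void)
{
    struct far_ptr a = { 0x1234, 0x0010 };
    struct far_ptr b = { 0x1235, 0x0000 };
    assert(far_to_linear(a) == 0x12350);
    assert(far_to_linear(b) == 0x12350);
}
```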

Elora answered 27/5/2009 at 14:41. Comments:
Bah, I completely forgot about near and far pointers. I was aware of their existence, but when I was writing this question, they totally slipped my mind. – Candicecandid
Is 16-bit DOS supposed to be an example with a standard-conforming compiler? – Erbe
@sellibitze: You are right: the far attribute is not in any C standard, so the snippet in the answer is not valid standard C. Thus this arguably does not quite answer the question (which seems to ask about standard C), but it's still a valuable answer IMHO. – Icy
Answer (score 16)

Therefore, it follows logically that sizeof(void*) >= sizeof(T*) for all types T, correct?

That doesn't necessarily follow, since sizeof is about the storage representation, and not all bit patterns have to be valid values. I think you could write a conforming implementation where sizeof(int*) == 8 and sizeof(void*) == 4, but there are no more than 2^32 possible values for an int*. Not sure why you'd want to.
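To make that argument concrete, here is a toy model of such an implementation, simulated with ordinary integers. `hypo_int_ptr` and `hypo_void_ptr` are hypothetical representations, not how any real compiler works: an 8-byte `int *` whose upper 4 bytes are always zero padding, converted to and from a 4-byte `void *` without losing any valid value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 8-byte int *: only 32 bits carry information. */
typedef struct {
    uint32_t addr;      /* meaningful bits */
    uint32_t padding;   /* always 0 in a valid value */
} hypo_int_ptr;

/* Hypothetical 4-byte void *. */
typedef uint32_t hypo_void_ptr;

/* Conversions drop and restore the padding. */
hypo_void_ptr int_to_void(hypo_int_ptr p)
{
    return p.addr;
}

hypo_int_ptr void_to_int(hypo_void_ptr v)
{
    hypo_int_ptr p = { v, 0 };
    return p;
}

/* For every valid hypothetical int *, the round trip is lossless. */
int round_trips(hypo_int_ptr p)
{
    assert(p.padding == 0);
    hypo_int_ptr back = void_to_int(int_to_void(p));
    return back.addr == p.addr && back.padding == p.padding;
}
```

The larger type is "wider" in storage only; the number of distinct valid values is what matters for the round-trip guarantee.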

Dissipation answered 27/5/2009 at 15:08. Comments:
It depends on your definition of "logically" ;v) – Erida
How could malloc() be used to allocate, and then assign, an int*? Also, did you mean "There are no more than 2^32 possible values for an int*" or "for an int"? – Carpet
@smci: I meant for an int*. The point is that provided there are no more than 2^32 legal values for an int*, it doesn't matter what the size of int* is, or which values are the legal ones: the implementation can still implement conversion from int* to a 4-byte void* and back again. You can allocate and then assign an int* using malloc as follows: int **ppint = malloc(sizeof(*ppint)); *ppint = 0;. But I don't see what that has to do with either the question or my answer. – Dissipation
But you were talking about the case sizeof(int*) == 8. I guess you meant "8 bytes, but with the high 4 bytes zero-padded". That's not really the general case sizeof(int*) == 8. – Carpet
Right, it could be the high bytes that are padding. It could be some other bytes. The distribution of used and unused values could be very complex. As I said, I can't think of any good reasons to actually implement that, but I'm not talking about the general case, I'm talking about whether or not something "follows logically". The general case is that sizeof(T*) is the same for all object types T. – Dissipation
I believe the original statement is true, but only for data pointers, not function pointers! – Cockatoo
The Standard makes little effort to distinguish between things implementations might theoretically do that would obviously be silly and useless on most platforms but might conceivably be useful on some obscure platform, versus things that wouldn't possibly be useful on any platform. The more obviously useless something would be, the less need the authors of the Standard would have seen to prohibit it (a fact which unfortunately leads to modern compilers doing some absurdly silly things). – Desma
Answer (score 14)

Back in the golden years of DOS, 8088s and segmented memory, it was common to specify a "memory model" in which, e.g., all code would fit into 64 KB (one segment) but data could span multiple segments; this meant that a function pointer would be 2 bytes and a data pointer 4 bytes. Not sure if anybody is still programming for machines of that kind; maybe some still survive in embedded uses.
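As a reference sketch, these are the usual default pointer sizes for the classic 16-bit x86 memory models; the model names are the conventional ones used by compilers such as Turbo C and Microsoft C. The `memory_model` struct and `find_model` helper are illustrative only, and in real code explicit `near`/`far` qualifiers could override these defaults.

```c
#include <assert.h>
#include <string.h>

/* Default pointer sizes in bytes for each classic 16-bit x86 memory model. */
struct memory_model {
    const char *name;
    int code_ptr;   /* size of a function pointer */
    int data_ptr;   /* size of a data pointer */
};

static const struct memory_model models[] = {
    { "tiny",    2, 2 },   /* code and data share one 64 KB segment */
    { "small",   2, 2 },   /* one code segment, one data segment */
    { "medium",  4, 2 },   /* far code, near data */
    { "compact", 2, 4 },   /* near code, far data: the case described in this answer */
    { "large",   4, 4 },   /* far code and far data */
    { "huge",    4, 4 },   /* like large, but lifts the 64 KB limit on data */
};

/* Looks up a model by name; returns NULL if it is not in the table. */
const struct memory_model *find_model(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof models / sizeof models[0]; i++)
        if (strcmp(models[i].name, name) == 0)
            return &models[i];
    return NULL;
}
```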

Nickolas answered 27/5/2009 at 14:39. Comments:
They are not that uncommon in the embedded world. DOS is still used a lot. – Ozellaozen
@Nils, I recently (well after I'd posted this) interviewed a new grad (EE) and his main assembly experience (from embedded uses, of course) turned out to be with Intel 8051 and Freescale 6811, 8-bit descendants of CPUs I studied at college in the '70s (!), and even then we hankered for more powerful ones such as the Zilog Z80. So 8088 and DOS would be a big step up there...! – Nickolas
Answer (score 8)

Near and far pointers are still used on some embedded microcontrollers with paged flash or RAM, to allow you to point to data in the same page (near pointer), or another page (far pointer, which is larger because it includes page information).

For example, Freescale's HCS12 microcontroller uses a 16-bit von Neumann architecture, which means that no address can be more than 16 bits. Because of the limitation this would put on the amount of code space available, there is an 8-bit page register.

So to point to data in the same code page, you just specify the 16-bit address; this is a near pointer.

To point to data in another code page, you have to include both the 8-bit page number and the 16-bit address within that page, resulting in a 24-bit far pointer.
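A hypothetical sketch of such a paged far pointer: an 8-bit page number plus a 16-bit address within the page, which is 24 bits of information versus 16 for a near pointer. The types `near_ptr` and `far_ptr24` and the three-byte layout (page first, then address high/low) are assumptions for illustration; this does not reproduce the HCS12's real memory map or any compiler's actual encoding.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t near_ptr;   /* hypothetical: 16-bit address in the current page */

typedef struct {
    uint8_t  page;           /* value to load into the page register */
    uint16_t addr;           /* 16-bit address within that page */
} far_ptr24;                 /* hypothetical; a host compiler may pad this to 4 bytes */

/* Pack into exactly three bytes, as a compiler might store it in memory.
   The byte order here is an assumption. */
void far_pack(far_ptr24 fp, uint8_t out[3])
{
    assert(out != 0);
    out[0] = fp.page;
    out[1] = (uint8_t)(fp.addr >> 8);
    out[2] = (uint8_t)(fp.addr & 0xFF);
}

far_ptr24 far_unpack(const uint8_t in[3])
{
    far_ptr24 fp;
    fp.page = in[0];
    fp.addr = (uint16_t)((in[1] << 8) | in[2]);
    return fp;
}
```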

Higgle answered 27/5/2009 at 15:51
Answer (score 8)

It is possible, for example, for pointers to data to differ in size from pointers to functions. This commonly occurs in microprocessors for embedded systems; Harvard-architecture machines like the ones dmckee mentioned make this easy to happen.

It turns out that it makes gcc backends a pain to develop! :)

Edit: I can't go into the details of the specific machine I am talking about, but let me add why Harvard machines make this easy to happen. The Harvard architecture has separate storage and pathways for instructions and data, so if the address bus for instructions is wider than the one for data, you're bound to have function pointers that are bigger than pointers to data!
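Because function pointers and data pointers may differ in size on such targets, ISO C does not define converting a function pointer to `void *`. It does guarantee that converting between function pointer types and back yields the original pointer (C11 6.3.2.3p8), so a generic function pointer type is the portable container. A minimal sketch; `generic_fn`, `to_generic`, `call_int_fn` and `add_one` are illustrative names.

```c
#include <assert.h>

/* Portable "any function" container: a function pointer type, not void *. */
typedef void (*generic_fn)(void);

static int add_one(int x) { return x + 1; }

/* Store an int (*)(int) in a generic_fn ... */
generic_fn to_generic(int (*f)(int))
{
    return (generic_fn)f;
}

/* ... and convert back to the original type before calling it.
   Calling through the wrong type would be undefined behavior. */
int call_int_fn(generic_fn g, int arg)
{
    int (*f)(int) = (int (*)(int))g;
    assert(f != 0);
    return f(arg);
}
```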

Pawl answered 10/3/2010 at 12:59
Answer (score 7)

One could easily imagine a Harvard architecture machine having different sizes for function pointers and all other pointers. Don't know of an example...

Jahdal answered 27/5/2009 at 14:49. Comments:
The Harvard architecture is still often used in embedded processors (e.g. PIC) in 2013. – Scyphate
Yes, Harvard architectures show up in a lot of embedded chips. But do you know of one that implements function pointers that are a different size than the other pointers on the same platform? – Jahdal
PIC16 (CCS compiler) used a goofy 9-10 bit RAM address (a 1-2 bit page register plus an 8-bit offset), which made even memcpy() cumbersome. Non-volatile data is stuck in ROM (I think a 14-16 bit even address pointing to 1 byte) and needed a special memcpy()/strcpy(). Functions are hard to get and use via a function pointer; they also have a 14-16 bit address into a half 14-bit instruction word, so the address must be even. I'm sure I've gotten this story somewhat mixed up, as I use a PIC24 much more often and let the compiler deal with addresses, knowing that I, the coder, must not willy-nilly mix pointer types. – Scyphate
