Array memory allocation - paging

Asked 26/10, 2011 at 9:6 Answered 26/10, 2011 at 9:35

I'm not sure whether the answer is the same for Java, C# and C++, so I tagged all of them. An answer covering all three languages would be nice.

I have always thought that when I allocate an array, all of its cells are placed in one contiguous block of memory. So if the system has no single free block that is large enough, an out-of-memory exception is raised.

Is what I said correct? Or is it possible that an allocated array would be paged?

Lashundalasker answered 26/10, 2011 at 9:6 Comment(0)

C++ arrays are contiguous, meaning that the memory has consecutive addresses, i.e. it's contiguous in virtual address space. It need not be contiguous in physical address space, since modern processors (or their memory subsystems) have a big map that associates virtual pages with physical pages. Processes running in user mode never see physical addresses of their arrays.
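A minimal sketch of what "consecutive addresses" means in practice. The helper names `is_contiguous` and `heap_array_is_contiguous` are made up for this illustration. The language already guarantees the result, so the check only makes the guarantee visible, and it assumes a platform where converting a pointer to `std::uintptr_t` yields a flat virtual address:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: true if every element starts exactly sizeof(T) bytes
// after the previous one, i.e. the array is one unbroken virtual address range.
template <typename T>
bool is_contiguous(const T* first, std::size_t count) {
    assert(count == 0 || first != nullptr);
    for (std::size_t i = 1; i < count; ++i) {
        const auto prev = reinterpret_cast<std::uintptr_t>(&first[i - 1]);
        const auto curr = reinterpret_cast<std::uintptr_t>(&first[i]);
        if (curr - prev != sizeof(T)) return false;
    }
    return true;
}

// Hypothetical helper: allocates `count` ints on the heap and checks them.
// With a large count the buffer spans many pages, yet the addresses the
// program sees are still consecutive.
bool heap_array_is_contiguous(std::size_t count) {
    std::vector<int> values(count);
    return is_contiguous(values.data(), values.size());
}
```

Calling `heap_array_is_contiguous(1u << 20)` checks a buffer of about 4 MiB (assuming a 4-byte `int`). Nothing here says anything about where the physical pages are, which is the point of the paragraph above.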

I think in practice most or all Java implementations are the same. But the programmer never sees an actual address of an array element, just a reference to the array and the means to index it. So in theory, a Java implementation could fracture arrays and hide that fact in the [] operator, although JNI code can still view the array in the C++ style, at which point a contiguous block would be needed. This is assuming there's nothing in the JVM spec about the layout of arrays, which jarnbjo tells me there isn't.

I don't know C#, but I expect the situation is pretty similar to Java - you can imagine that an implementation might use the [] operator to hide the fact that an array isn't contiguous in virtual address space. The pretense would fail as soon as someone obtained a pointer into it. [Edit: Polynomial says that arrays in C# can be discontiguous until someone pins them, which makes sense since you know you have to pin objects before passing them into low-level code that uses addresses.]

Note that if you allocate an array of some large object type, then in C++ the array actually is that many large structures laid end-to-end, so the required size of the contiguous allocation depends on the size of the object. In Java, an array of objects is "really" an array of references, so it needs a smaller contiguous block than the C++ array. For primitive types the two are the same.
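To put rough numbers on that difference, here is a small sketch. `Big` and `kCount` are invented for the example, and a raw C++ pointer only approximates a Java reference (a JVM may use a different reference size):

```cpp
#include <cstddef>

// Hypothetical large value type: 1 KiB per object.
struct Big {
    char payload[1024];
};

constexpr std::size_t kCount = 1000;

// C++ style: the objects themselves are laid end to end,
// so the contiguous block grows with sizeof(Big).
constexpr std::size_t kInlineBytes = sizeof(Big[kCount]);

// Java style, approximated with pointers: only the references need to be
// contiguous; each object can live anywhere else on the heap.
constexpr std::size_t kReferenceBytes = sizeof(Big*[kCount]);

static_assert(kInlineBytes == kCount * sizeof(Big),
              "array of objects holds the objects inline");
static_assert(kReferenceBytes == kCount * sizeof(Big*),
              "array of pointers holds only the pointers");
```

With these assumed sizes the inline array needs 1,024,000 contiguous bytes, while the pointer array needs only `kCount * sizeof(Big*)` bytes (8,000 on a typical 64-bit platform).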

Reimer answered 26/10, 2011 at 9:19 Comment(2)

There is no requirement in the language or VM specification that Java arrays have to be contiguous in memory. This works even with JNI, since you have no direct access to the Java array from native code, but have to call one of the Get<DataType>ArrayElements functions to get a native "view" of your Java array. AFAIK all JVM implementations do, however, use contiguous memory for array storage, and newer VMs use heap defragmentation techniques if not enough contiguous space is available to allocate a requested array. – Langton 26/10, 2011 at 10:58

@jarnbjo: fair enough, so it's only at the point where GetWhateverArrayElements is called that the implementation needs to find a contiguous address range for the array. Answer updated, thanks. – Reimer 26/10, 2011 at 11:1

In C# you can't guarantee that the memory block will be contiguous. The CLR tries to allocate the memory in one contiguous block, but it may allocate it in several blocks. There is little defined behaviour about how a CLR should manage C# memory, because it is designed to be abstracted away by managed constructs.

The only time it should really matter in C# is if you're passing the array as a pointer via P/Invoke to some unmanaged code, in which case you should pin it (with the fixed statement, or with GCHandle.Alloc and GCHandleType.Pinned) to lock the object's location in memory. Perhaps someone else will be able to explain how the CLR and GC handles the need for contiguous memory in this case.

Nobility answered 26/10, 2011 at 9:18 Comment(2)

Just to clarify, there are obvious performance issues with fragmenting objects, but the CLR and GC will always try to allocate and collect in an optimal way. – Nobility 26/10, 2011 at 9:20

"Perhaps someone else will be able to explain how the CLR and GC handles the need for contiguous memory in this case" -- it "handles" it because your claim that it doesn't guarantee contiguous memory for a single allocation is false. There's too much in .NET that would just plain break if memory for arrays wasn't allocated as a single contiguous block (e.g. Buffer.BlockCopy()). In addition, there would be no need for the large-object heap if .NET could just allocate large arrays as some collection of smaller allocations. – Jornada 26/9, 2017 at 18:48

Is what I said correct?

True in Java and C#, but in C++ you will only get an error once you have reached the process or system limit. The difference is that in Java and C# it's the application imposing a limit on itself. In C++ the limit is imposed by the OS.

Or is it possible that an allocated array would be paged?

This is also possible. However, in Java, having the heap paged out is very bad for performance: when a GC runs, all the objects it examines have to be in memory. In C++ it's not great either, but it has less impact.

If you want large structures which could be paged in Java, you can use ByteBuffer.allocateDirect() or memory-mapped files. This works by using memory off the heap (basically what C++ uses).

Buckboard answered 26/10, 2011 at 9:18 Comment(0)

In C and C++ programs, arrays are typically contiguous in the virtual address space (if, of course, there is such a thing on the platform in question). The exception would be code that is interpreted instead of being compiled and executed directly.

There, if a big array can't be allocated contiguously, even if there's enough free memory, you will get either the std::bad_alloc exception (in C++) or NULL (from malloc()-like functions in C/C++ or nonthrowing operator new in C++).
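A hedged sketch of the three failure modes just mentioned. The function names and the `huge_request()` size are made up for the example; the request is simply chosen to be far larger than any contiguous range a process could plausibly get, and the outcome assumes an unoptimized build on a typical desktop platform:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <limits>
#include <new>

// Hypothetical size: half of the addressable range, which no ordinary
// process can obtain as one contiguous block.
inline std::size_t huge_request() {
    return std::numeric_limits<std::size_t>::max() / 2;
}

// Throwing form: failure arrives as std::bad_alloc (or a class derived from it).
bool throwing_new_succeeds(std::size_t bytes) {
    assert(bytes > 0);
    try {
        volatile char* block = new char[bytes];
        block[0] = 1;  // touch the block so the allocation is really performed
        delete[] const_cast<char*>(block);
        return true;
    } catch (const std::bad_alloc&) {
        return false;
    }
}

// Non-throwing form: failure arrives as a null pointer.
bool nothrow_new_succeeds(std::size_t bytes) {
    assert(bytes > 0);
    volatile char* block = new (std::nothrow) char[bytes];
    if (block == nullptr) return false;
    block[0] = 1;
    delete[] const_cast<char*>(block);
    return true;
}

// C-style allocation: failure is also reported as a null pointer.
bool malloc_succeeds(std::size_t bytes) {
    assert(bytes > 0);
    void* block = std::malloc(bytes);
    if (block == nullptr) return false;
    static_cast<volatile char*>(block)[0] = 1;
    std::free(block);
    return true;
}
```

All three should report failure for `huge_request()` and success for a small size such as 64 bytes. On real systems the failing size depends on how fragmented the address space is, not only on how much memory is free.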

Virtual memory (and paging to/from disk) usually doesn't solve virtual address space fragmentation problems, or at least not directly; its purpose is different. It's normally used to let programs think there's enough memory when in fact there isn't. The RAM is effectively extended by the free disk space, at the expense of lower performance, because the OS has to exchange data between the RAM and disk when there's memory pressure.

Your array (in parts or in whole) can be offloaded to the disk by the OS. But this is made transparent to your program because whenever it needs to access something from the array the OS will load it back (again, in parts or in whole, as the OS deems necessary).

On systems without virtual memory there is no virtual-to-physical address translation, so your program works directly with physical memory. It therefore has to deal with physical memory fragmentation and also compete with other programs for both free memory and address space, which makes allocation failures more likely in general. (Systems with virtual memory often run programs in separate virtual address spaces, so fragmentation in app A's virtual address space won't affect app B's.)

Aftermost answered 26/10, 2011 at 9:26 Comment(0)

With Java and C# certainly. We can show this by running byte[] array = new byte[4097]; on a Windows machine where the memory page size is 4096 bytes. Hence it must span more than one page.
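The arithmetic behind that claim, as a small sketch in C++. Both function names are invented here, and the 4096-byte page size is the one assumed above. A managed array also carries an object header, which this ignores, so the real footprint is slightly larger than the element count suggests:

```cpp
#include <cstddef>

// Fewest pages a buffer of `bytes` bytes can occupy, i.e. when it starts
// exactly on a page boundary (ceiling division).
constexpr std::size_t min_pages(std::size_t bytes, std::size_t page_size) {
    return (bytes + page_size - 1) / page_size;
}

// Pages touched when the buffer starts `offset` bytes into a page.
constexpr std::size_t pages_touched(std::size_t offset, std::size_t bytes,
                                    std::size_t page_size) {
    return bytes == 0
               ? 0
               : (offset % page_size + bytes + page_size - 1) / page_size;
}

static_assert(min_pages(4096, 4096) == 1, "exactly one page fits in one page");
static_assert(min_pages(4097, 4096) == 2, "one extra byte forces a second page");
```

So 4097 bytes can never fit in a single 4096-byte page, and depending on where the buffer starts it may touch more than two, for example `pages_touched(4095, 4098, 4096)` is 3.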

Of course paging impacts performance, but this can be one of the cases where GC-based frameworks like .NET or Java have an advantage, because the GC was written by people who know paging happens. There are still advantages in structures that make it more likely that related elements sit on the same page (favouring array-backed collections over pointer-chasing collections). This also helps with CPU caches. (Large arrays are still one of the best ways to cause heap fragmentation that the GC has to struggle with. Still, since the GC is pretty good at doing so, it's going to be a win over many other ways of dealing with the same issue.)

With C++ almost certainly, because we normally code at the level of the memory-management of the operating system - arrays are in contiguous virtual space (whether on the heap or the stack), not contiguous physical space. It's possible in C or C++ to code at a level below that, but that's normally only done by people actually writing the memory-management code itself.

Scrofulous answered 26/10, 2011 at 9:35 Comment(0)
