Or does the JRE allocate separate stacks for each thread?
Conceptually yes. (See this JVM spec link, for example.)
How the spec's conceptualization gets implemented in a particular JVM is implementation specific. However, my understanding is that current-generation JVMs (e.g. HotSpot) allocate each thread stack in a separate block of memory requested from the OS, e.g. using an mmap syscall [1].
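To see the "one stack per thread" behaviour from the Java side, here is a minimal sketch. It relies on the standard `Thread(ThreadGroup, Runnable, String, long stackSize)` constructor; the class and method names (`PerThreadStackDemo`, `maxDepth`, `descend`) are made up for illustration. Note that the `stackSize` argument is only a hint that a JVM is free to round or ignore, so the depths printed will vary by JVM, platform and JIT state.

```java
final class PerThreadStackDemo {

    // Recurse forever; each call pushes a frame on the *calling thread's* stack.
    static void descend(int[] depth) {
        depth[0]++;
        descend(depth);
    }

    // Run descend() in a fresh thread with the requested stack size (a hint only)
    // and report how many frames fitted before that thread's stack overflowed.
    static int maxDepth(long stackSizeBytes) {
        int[] depth = {0};
        Runnable task = () -> {
            try {
                descend(depth);
            } catch (StackOverflowError expected) {
                // Only the probe thread's stack is exhausted; other threads are unaffected.
            }
        };
        Thread probe = new Thread(null, task, "stack-probe", stackSizeBytes);
        probe.start();
        try {
            probe.join(); // join() also makes the probe's writes to depth[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for probe", e);
        }
        return depth[0];
    }

    public static void main(String[] args) {
        // The calling thread keeps running normally after each probe thread overflows.
        System.out.println("depth with 256 KiB hint: " + maxDepth(256 * 1024));
        System.out.println("depth with 4 MiB hint:   " + maxDepth(4L * 1024 * 1024));
    }
}
```

The point of the sketch is that overflowing one thread's stack leaves the calling thread's stack intact, which is what you would expect if each stack lives in its own block of memory.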
There is certainly no wholesale copying of stack content when a thread switch occurs. However, a thread context switch does entail saving and loading registers, and it (indirectly) puts extra load on the memory caches and the TLB. This can be significant, which is why excessive thread context switching (e.g. caused by lock contention or excessive wait/notify) can be bad for performance.
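If you want a rough feel for the cost of thread handoffs, something like the following sketch can help. It bounces a token between two threads through a pair of `java.util.concurrent.SynchronousQueue` instances and times the round trips. The names (`HandoffCostSketch`, `pingPongNanos`) are hypothetical, and this is not a rigorous benchmark: the queue implementation may spin briefly before blocking, so not every handoff is necessarily an OS-level context switch, and JIT warm-up is not accounted for.

```java
import java.util.concurrent.SynchronousQueue;

final class HandoffCostSketch {

    // Time `roundTrips` token exchanges between the calling thread and an echo thread.
    static long pingPongNanos(int roundTrips) {
        SynchronousQueue<Integer> ping = new SynchronousQueue<>();
        SynchronousQueue<Integer> pong = new SynchronousQueue<>();

        // The echo thread sends every token it receives straight back.
        Thread echo = new Thread(() -> {
            try {
                for (int i = 0; i < roundTrips; i++) {
                    pong.put(ping.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "echo");
        echo.start();

        long start = System.nanoTime();
        try {
            for (int i = 0; i < roundTrips; i++) {
                ping.put(i);   // hand the token over; blocks until the echo thread takes it
                pong.take();   // wait for the echo thread to hand it back
            }
            long elapsed = System.nanoTime() - start;
            echo.join();
            return elapsed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during ping-pong", e);
        }
    }

    public static void main(String[] args) {
        int roundTrips = 100_000;
        long nanos = pingPongNanos(roundTrips);
        System.out.println("average ns per round trip: " + (nanos / roundTrips));
    }
}
```

Comparing the per-round-trip figure with the cost of a plain method call on one thread gives an idea of why designs that force frequent handoffs tend to scale poorly.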
1 - Some JVMs include a protected (non-writable) "red-zone" page at the end of each stack segment. (This means that a thread stack overflow triggers a memory fault, so the JVM doesn't need to check explicitly for stack overflow on each method call, which would be a significant performance hit.) Anyhow, my understanding is that the "red-zone" page requires the thread stacks to be requested using mmap.