As the documentation states and Adonis already mentions this, a write can only be performed by one thread at a time. You won't achieve performance gains through concurreny, moreover, you should only worry about performance if it's an actual issue, because writing concurrently to a disk may actually degrade your performance (probably less for SSDs than HDDs).
The underlying media is in most cases (SSD, HDD, Network) single-threaded - actually, there is no such thing as a thread on hardware level, threads are nothing but an abstraction.
In your case the media is an SSD.
While the SSD internally may write data to multiple modules concurrently ( they may reach a level of parallism where writes may be as fast and even outperform a read), the internal mapping datastructures are shared resource and therefore contended, especially on frequent updates such as concurrent writes. Nevertheless, the updates of this datastructure is quite fast and therefore nothing to worry about unless it becomes a problem.
But apart from this, those are just internals of the SSD. On the outside you communicate over a Serial ATA interface, thus one-byte-at-a-time (actually packets in a Frame Information Structure, FIS). On top of this is a OS/Filesystem that again has a probably contended datastructure and/or applies their own means of optimization such as write-behind-caching.
Further, as you know what your media is, you may optimize especially for that and SSDs are really fast when one single threads writes a large piece of data.
Thus, instead of using multiple threads for writing, you may create a large In-Memory Buffer (probably consider a memory-mapped file) and write concurrently into this buffer. The memory itself is not contended, as long as you ensure each thread access it's own address space of the buffer. Once all threads are done, you write this one buffer to the SSD (not needed if using memory-mapped file).
See also this good summary about developing for SSDs:
A Summary – What every programmer should know about solid-state drives
The point for doing pre-allocation (or to be more precise, file_.setLength()
, which acutally maps to ftruncate
) is that the resizing of the file may use extra-cycles and you may wan't to avoid that. But again, this may depend on the OS/Filesystem.