Because when you write to a socket, the headers of the packet are assembled in a different place from your user-data, so to be coalesced into a network packet, the device needs "gather" capability, at least to get the headers and data.
Also to avoid the CPU having to read the data (and thus, fill its cache up with useless stuff it's never going to need again), the network card also needs to generate its own IP and TCP checksums (I'm assuming TCP here, because 99% of your bulk data transfers are going to be TCP). This is OK, because nowadays they all can.
What I'm not sure is, how this all interacts with TCP_CORK.
Most protocols tend to have their own headers, so a hypothetical protocol looks like:
Client: Send request
Server: Send some metadata; send the file data
So we tend to have a server application assembling some headers in memory, issuing a write(), followed by a sendfile()-like operation. I suppose the headers still get copied into a kernel buffer in this case.