Answering to old message since there is stuff to correnct:
Unlike many answers here claim, TCP does not guarantee data to arrive uncorrupted. Not even practically.
TCP protocol has a 2-byte crc-checksum that obviously has a 1:65536 chance of collision if more than one bit flips. This is such a small chance it will never be encountered in tests, but if you are developing something that either transmits large amounts of data and/or is used by very many end users, that dice gets thrown trillions of times (not kidding, youtube throws it about 30 times a second per user.)
Option 2: size field is the only practical option for the reasons you yourself listed. Fixed length messages would be wasteful, and delimiter marks necessitate running the entire payload through some sort of encoding-decoding stage to replace at least three different symbols: start-symbol, end-symbol, and the replacement-symbol that signals replacement has occurred.
In addition to this one will most likely want to use some sort of error checking with a serious checksum. Probably implemented in tandem with the encryption protocol as a message validity check.
As to the possibility of getting out of sync:
This is possible per message, but has a remedy.
A useful scheme is to start each message with a header. This header can be quite short (<30 bytes) and contain the message payload length, eventual correct checksum of the payload, and a checksum for that first portion of the header itself. Messages will also have a maximum length. Such a short header can also be delimited with known symbols.
Now the receiving end will always be in one of two states:
- Waiting for new message header to arrive
- Receiving more data to an ongoing message, whose length and checksum are known.
This way the receiver will in any situation get out of sync for at most the maximum length of one message. (Assuming there was a corrupted header with corruption in message length field)
With this scheme all messages arrive as discrete payloads, the receiver cannot get stuck forever even with maliciously corrupted data in between, the length of arriving payloads is know in advance, and a successfully transmitted payload has been verified by an additional longer checksum, and that checksum itself has been verified. The overhead for all this can be a mere 26 byte header containing three 64-bit fields, and two delimiting symbols.
(The header does not require replacement-encoding since it is expected only in a state whout ongoing message, and the entire 26 bytes can be processed at once)