Vbence's solution of using a GZIPInputStream
is a good suggestion. The way this is done in most commercial software - Windows Remote Desktop, VNC, etc. is that only changes to the screen-buffer are sent. So you keep a copy on the server of what the client 'sees', and with each consecutive capture you calculate what is different in terms of screen areas. Then you only send these screen areas to the client along with their top-left coords, width, height. And update the server copy of the client 'view' with just these new areas.
That will MASSIVELY reduce the amount of network data you use, while I have been typing this answer, only 400 or so pixels (20x20) are changing with each keystroke. This on a 1920x1080 screen is just 1/10,000th of the screen, so clearly worth thinking about.
The only expensive part is how you calculate the 'difference' between one frame and the next. There are plenty of libraries out there to do that cheaply, most of them very mathematical (discrete cosine transform type stuff, way over my head), but it can be done relatively cheaply.