Unix vs BSD vs TCP vs Internet sockets?
C

2

14

I am reading The Linux Programming Interface and it describes several different types of socket used on Linux:

  • Unix domain
  • Berkeley
  • TCP
  • Internet

One of the things the book says is that if you want to communicate between remote hosts, you can't use Unix domain sockets because they are for IPC on the same host. You have to use "Internet" sockets.

However, I am still a little confused about how this relates to "TCP" sockets, Berkeley sockets and the other two. What is their relationship? Why would you have an Internet socket as well as a TCP socket?

In short I am trying to understand all (have I missed any out?) the various different types of Unix sockets and under what circumstances I would use them?

Circularize answered 6/4, 2014 at 18:16 Comment(0)
G
25

A socket is an abstraction. The tag definition used on SO for a socket is as good as any:

An endpoint of a bidirectional inter-process communication flow. This often refers to a process flow over a network connection, but by no means is limited to such.

So from that, a major distinction is between sockets that (1) use a network and (2) do not.

Unix domain sockets do not use the network. Their API makes it appear to be (mostly) the same to the developer as a network socket but all the communication is done through the kernel and the sockets are limited to talking to processes on the machine upon which they are running.
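As a rough illustration of the point above, here is a minimal sketch of two connected Unix domain sockets exchanging data on one machine. It uses Python's `socket` module, which wraps the same calls as the C API; the variable names are mine, and it assumes a Unix-like system where `AF_UNIX` is available.

```python
import socket

# socketpair() returns two already-connected sockets. With AF_UNIX the
# data is passed through the local kernel; no network interface is involved.
parent, child = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
try:
    parent.sendall(b"ping")
    received = child.recv(16)
finally:
    parent.close()
    child.close()

print(received)
```

The calls are the same ones you would make on a network socket; only the address family differs.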

Berkeley sockets are what we know as network sockets on POSIX platforms today. In the past there were different lines of Unix development (e.g. Berkeley or BSD, System V or SysV, etc.). Berkeley sockets essentially won in the marketplace and are effectively synonymous with Unix sockets today.

Strictly speaking there isn't a TCP socket. There are network sockets that can communicate using TCP. It's just linguistic shorthand to call them TCP sockets, to distinguish them from sockets using another protocol, e.g. UDP, a routing protocol or whatnot.
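To make that concrete, here is a small sketch (Python's `socket` module, which mirrors the C `socket()` call; variable names are illustrative). A "TCP socket" and a "UDP socket" come from the same call in the same address family. Only the type and protocol arguments change.

```python
import socket

# Same address family (IPv4) for both; the type/protocol pair selects TCP or UDP.
tcp_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP)
udp_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
try:
    same_family = tcp_sock.family == udp_sock.family
    tcp_type, tcp_proto = tcp_sock.type, tcp_sock.proto
    udp_type, udp_proto = udp_sock.type, udp_sock.proto
finally:
    tcp_sock.close()
    udp_sock.close()

print(same_family, tcp_type, udp_type)
```

Passing the protocol explicitly is optional here. With a protocol of 0, the system picks the default for the family and type, which is TCP for `SOCK_STREAM` and UDP for `SOCK_DGRAM` in the IPv4 domain.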

An "Internet" socket is mostly a meaningless distinction. It's a socket using a network protocol. That eliminates Unix domain sockets, but most network protocols can be used to communicate on a LAN or the Internet, which is just a collection of networks. (Though do note there are protocols specific to local networks as well as those that manage collections of networks.)

Greige answered 6/4, 2014 at 18:48 Comment(2)
Thanks for this. So am I right in thinking there are only really two raw sockets: POSIX/Berkeley and Unix domain? You can take the POSIX/Berkeley socket and "bolt on" various settings which effectively turn it into an "internet" socket or a TCP socket, but you need a Berkeley/POSIX socket underneath?Circularize
That sounds about right. You might want to use the term "base" rather than "raw", since raw sockets imply IP layer datagrams without any transport level protocol formatting, e.g. TCP, UDP, etc.Greige
A
0

With due respect, where exactly do you see that subdivision in the book?

I'm assuming we're talking about The Linux Programming Interface: A Linux and UNIX® System Programming Handbook by Michael Kerrisk, published by No Starch Press (SF, CA, USA) in 2010 with the ISBN-13: 978-1-59327-220-3?

I'm not sure what edition I've got (I believe it's the first one), but in Part 7, Chapter 56, which is where Kerrisk introduces sockets, I don't see any such subdivision between "Unix Domain", "Berkeley", "TCP", and "Internet". I was quite surprised to read that claim of yours, because, well, Kerrisk is supposed to be one of the authorities in the field, and I rather assumed that he wouldn't get such a simple fact wrong.

So, re-reading the first pages of the above chapter(s), I confirmed that Kerrisk doesn't give those four options at all. Instead, as expected, he starts by explaining that a Unix kernel offers another IPC mechanism, rather different from the queues or pipes he had explained in previous chapters: sockets.

And he starts by explaining that there are two main domains of sockets, namely, the so-called "Unix Domain" (internal to the host) and the "Internet Domain" (using the Internet Protocol to communicate to the external world), of which there are two subtypes, IPv4 and IPv6.

Besides the domains, there are also (at least) two socket types. Stream sockets ("a reliable, bidirectional, byte-stream communication channel") are also known as connection-oriented, and on the Internet we usually associate them with TCP. Datagram sockets ("[...] message boundaries are preserved, but data transmission is not reliable. Messages may arrive out of order, be duplicated, or not arrive at all.") are a type of connectionless communication, usually using UDP.

Kerrisk further explains that "Instead of using the terms Internet domain datagram socket and Internet domain stream socket, we’ll often just use the terms UDP socket and TCP socket, respectively."
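The domain and type combination described above can be sketched as a lookup table. This is only an illustration in Python (whose `socket` module exposes the same constants as the C API); `COMMON_NAMES` and `describe` are hypothetical names of mine, not anything from the book, and the labels are informal.

```python
import socket

# Informal shorthand for (domain, type) pairs.
COMMON_NAMES = {
    (socket.AF_INET, socket.SOCK_STREAM): "TCP socket (IPv4)",
    (socket.AF_INET, socket.SOCK_DGRAM): "UDP socket (IPv4)",
    (socket.AF_INET6, socket.SOCK_STREAM): "TCP socket (IPv6)",
    (socket.AF_INET6, socket.SOCK_DGRAM): "UDP socket (IPv6)",
    (socket.AF_UNIX, socket.SOCK_STREAM): "Unix domain stream socket",
    (socket.AF_UNIX, socket.SOCK_DGRAM): "Unix domain datagram socket",
}

def describe(family, sock_type):
    """Create a socket and return the shorthand name for its domain and type."""
    with socket.socket(family, sock_type) as sock:
        return COMMON_NAMES.get((sock.family, sock.type), "something else")

inet_stream = describe(socket.AF_INET, socket.SOCK_STREAM)
unix_dgram = describe(socket.AF_UNIX, socket.SOCK_DGRAM)
print(inet_stream)
print(unix_dgram)
```

So "TCP socket" is not a fifth kind of socket, just the usual name for one cell of this table.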

I can imagine that your "TCP" classification might have come from this paragraph.

Note that Kerrisk very carefully explains how all of the above are just examples, or specific applications, of certain classifications. Just because the Linux kernel has, by default, three domains for sockets (at least that was how it worked in 2010!), it doesn't mean that all Unixes are restricted to "just three". You can use sockets over other communication protocols, and expose a standard socket interface to the library; the end-user (i.e., the actual developer opening the socket) may not even need to worry how exactly the underlying communication protocol works (it may, after all, be IPoAC, defined in RFC1149 and updated by RFC2549 with QoS, and finally RFC6214 for IPv6) — so long as you use the socket API (and assuming it is well implemented), any transport mechanism can (again: in theory) be abstracted using a socket.

Furthermore, all socket domains, in theory, can use any of the two types, depending on the application (indeed, even Unix Domain Sockets can be used in "stream" or "datagram" mode).
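A quick way to see the two types within a single domain is the sketch below (Python's `socket` module on a Unix-like system; `roundtrip` is a hypothetical helper of mine). It sends two messages over a Unix domain socket pair and reads once, first in datagram mode and then in stream mode.

```python
import socket

def roundtrip(sock_type):
    """Send two messages over an AF_UNIX socket pair and return one recv()."""
    a, b = socket.socketpair(socket.AF_UNIX, sock_type)
    try:
        a.send(b"one")
        a.send(b"two")
        return b.recv(1024)
    finally:
        a.close()
        b.close()

# Datagram mode preserves message boundaries: one recv() yields one message.
dgram_first = roundtrip(socket.SOCK_DGRAM)

# Stream mode is a byte stream: one recv() may return both messages run together.
stream_first = roundtrip(socket.SOCK_STREAM)

print(dgram_first, stream_first)
```

The datagram read returns only the first message, while the stream read is free to return the bytes of both, since a stream has no notion of message boundaries.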

That said, the reference to "Berkeley sockets", IMHO, is merely a historical framework, as mentioned by @duck. One of the many ways to classify the complex Unix family tree is to (roughly) split it between the "System V" and "BSD" ("Berkeley Software Distribution") flavours, which, again, are the result of the "Unix wars" in the past.

Anteroom answered 18/4 at 9:49 Comment(1)
Very good, but Berkeley sockets or BSD sockets refers to the actual 'sockets' API, as opposed to AT&T's misguided System V attempt to sell us the OSI API with TCP/IP sockets as a special case (the t_ functions). I spent many pointless weeks 30 years ago trying to paper over this crack in the universe.Satterwhite
