What are file descriptors, explained in simple terms?

13

620
  1. What would be a more simplified description of file descriptors compared to Wikipedia's? Why are they required? Say, take shell processes as an example and how does it apply for it?

  2. Does a process table contain more than one file descriptor? If yes, why?

Reach answered 10/3, 2011 at 7:17 Comment(11)
What about the concepts of stdin, stdout, stderr, etc.? Say I have a browser process open, and it has opened some temporary files to display my HTML. Does the process use the same fd to read and write? Also, the process table has entries like fd0 pointer, fd1 pointer, fd2 pointer... does that mean all these files are in RAM? Why else would they be pointers?Reach
When you open a file, the OS creates a stream and connects it to the opened file; the descriptor in fact represents that stream. Similarly, there are some default streams created by the OS. These streams are connected to your terminal instead of files. So when you type something in the terminal, it goes to the stdin stream. And when you run the "ls" command in the terminal, its output is written to the stdout stream. The stdout stream is connected to your terminal, so you can see the output there.Geosphere
Regarding the browser example, the browser does not necessarily keep the files open. It depends on the browser's implementation, but in most cases the browser opens a temporary file, writes to it, and closes it, so the file is not necessarily open even if the web page is open. And a descriptor just holds information about the file; it doesn't necessarily keep the file in RAM. When you read data from a descriptor, the OS reads the data from the hard disk. The information in the file descriptor just represents the location of the file on the hard disk, etc.Geosphere
File descriptor to file is not a one to one mapping. I could open() the same file 4 times and get 4 different file descriptors. Each of which could be used (depending on the flags passed to the open()) for reading, writing or both. As far as whether the file lives in RAM or on disk - this is hidden from you by the kernel, and its various caches. Ultimately what is in the cache will match what is on the disk (for writing), and the kernel will not go back to disk, for reading, if the data is already in the cache.Tharp
My question may go too far but how does OS determine which tty to show outputs when the indexes of fd1 and fd2 are fixed?Imam
This is a good article to understand it easily bottomupcs.com/file_descriptors.xhtmlChekhov
@Imam They go to whatever they went to in the parent process, unless something overrides that.Disafforest
@KrishanGopal: Thanks for sharing the article. It was really well-written.Barometrograph
Here is a good survey of the history: Standard streams from wikipedia. The article goes back to the 1950's and discusses Unix design, and explains why they were standardized and the values.Grotto
@KrishanGopal thank you for the article. This is a great question, but none of the answers gave a concrete example like this article.Vestavestal
also this question might be usefulChrysotile
834

In simple words, when you open a file, the operating system creates an entry to represent that file and stores information about the opened file. So if there are 100 files open in your OS, there will be 100 entries in the OS (somewhere in the kernel). These entries are represented by integers like (...100, 101, 102....). This entry number is the file descriptor. So it is just an integer that uniquely represents an opened file for the process. If your process opens 10 files, then your process table will have 10 entries for file descriptors.

Similarly, when you open a network socket, it is also represented by an integer, called a socket descriptor. I hope you understand.
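
To illustrate that a descriptor number only has meaning inside one process, here is a minimal shell sketch. The file names a.txt and b.txt and the temporary directory are made up for the example. Two background subshells, which are separate processes, each use descriptor 3 for a different file:

```shell
#!/bin/sh
# Each background subshell is its own process with its own descriptor table.
tmpdir=$(mktemp -d)
( exec 3>"$tmpdir/a.txt"; echo "from process A" >&3 ) &
( exec 3>"$tmpdir/b.txt"; echo "from process B" >&3 ) &
wait   # let both children finish

# Descriptor 3 named a different file in each process.
a=$(cat "$tmpdir/a.txt")
b=$(cat "$tmpdir/b.txt")
rm -rf "$tmpdir"
echo "a.txt: $a"
echo "b.txt: $b"
```

Neither process interferes with the other, even though both use the number 3.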

Geosphere answered 10/3, 2011 at 7:31 Comment(17)
Great, tayyab. I have put some comments on the question; please answer those too.Reach
Also, this is why you can run out of file descriptors, if you open lots of files at once. Which will prevent *nix systems from running, since they open descriptors to stuff in /proc all the time.Indecipherable
Strictly speaking the Process Table will not have the file descriptors, the file descriptors are held in the u structure of the process, and are therefore unique to the process.Tharp
if I open a file, close it, and open it again, would the fd be the same for both open?Clotilda
@ErbenMo: No, it may not be the same. When you open a file, the operating system assigns an available FD; when you close it, the OS releases the FD and may assign it to another file opened after that. It's the operating system's way of tracking open files, and it has nothing to do with a specific file.Geosphere
Isn't a 'file descriptor' a characteristic of the process? Let me clarify: I have two processes, p1 and p2. By default the OS has connected their stdin, stdout and stderr to keyboard, monitor and monitor respectively. These 3 standard streams have their FDs 0, 1 and 2 in both the processes. I can connect, say stdin of p1 to some data.txt. Hence, the FDs of p1 are independent from the file FDs of p2. Am I wrong in my logic here somewhere?Stripteaser
"So it is just an integer number that uniquely represents an opened file in operating system." This is incorrect. That integer uniquely represents an opened file within a process. File descriptor 0, for example, will represent one opened file in one process and a completely different opened file in another process.Piperine
@KeithThompson: No that is not correct. File Descriptors are unique at operating system level. If File descriptor 293 is assigned to an opened file in Process A, then this number will not be assigned to any other file in any other process on same operating system. Of course if you close the file and descriptor is free then it can be assigned somewhere else.Geosphere
@Tayyab: I believe you're mistaken. File descriptors 0, 1, and 2 are standard input, standard output, and standard error for each running process. A successful initial call to open() will give you file descriptor 3, even if another running process happens to have a file descriptor 3. See the POSIX definition of open(): "The open() function shall return a file descriptor for the named file that is the lowest file descriptor not currently open for that process." (emphasis added).Piperine
@KeithThompson: Yes you are right. Actually its about the level of abstraction. Actually two tables are maintained, where first one is per-process and the second one is system wide. FD in per-process table (i.e fdtable) is not unique system wide. However it maps to v-node table that contains the system wide unique entries. So when you call fopen() and fileno() function to check the descriptor then you can get same FD number in 2 different processes because it returns the index of fdtable which is per-process. Thanks for bringing it up!!Geosphere
Are file descriptors unique? That is, if lsof -p $SOMEPID shows some common entries with lsof -p $SOMEOPTHERPID then both processes are accessing the same entry?Niggerhead
In Nginx, when a connection opens up a socket, does this count as a file descriptor? Or is an open socket different from an open file?Lacrimatory
WRONG ANSWER. File objects are NOT file descriptors, and are NOT numbered 1...100. File descriptors are per-process indicators that map a number (the descriptor), to a file object. Hence multiple processes can refer to the same file object, but with different descriptors (and numbers).Thunderstorm
@Geosphere: You should edit this answer to add more details which came up in comments.Circuity
If I have not closed the fd and my process is still alive, will it always stay in the kernel until my process exits?Ronni
What happens from the programming perspective? Like suppose if I do something like variable file = fopen('file-name'), then is the variable file a file descriptor?Remitter
Do file descriptors use the NIC to pass their data, or are they handled directly by the kernel and a file-reading protocol? I'm asking especially about data communication via protocols like RPC, gRPC, etc. on a single machine. Or, say, we open a mongodb shell, which uses TCP to interact with the mongod service: does this mean it uses the NIC?Forensic
196

A file descriptor is an opaque handle that is used in the interface between user and kernel space to identify file/socket resources. Therefore, when you use open() or socket() (system calls to interface to the kernel), you are given a file descriptor, which is an integer (it is actually an index into the processes u structure - but that is not important). Therefore, if you want to interface directly with the kernel, using system calls to read(), write(), close() etc. the handle you use is a file descriptor.

There is a layer of abstraction overlaid on the system calls, which is the stdio interface. This provides more functionality/features than the basic system calls do. For this interface, the opaque handle you get is a FILE*, which is returned by the fopen() call. Many functions use the stdio interface, such as fprintf(), fscanf() and fclose(), which are there to make your life easier. In C, stdin, stdout, and stderr are FILE*, which in UNIX respectively map to file descriptors 0, 1 and 2.

Tharp answered 10/3, 2011 at 9:30 Comment(1)
what is meant by "an index into the processes u structure"?Hawaiian
167

Hear it from the Horse's Mouth : APUE (Richard Stevens).
To the kernel, all open files are referred to by File Descriptors. A file descriptor is a non-negative number.

When we open an existing file or create a new file, the kernel returns a file descriptor to the process. The kernel maintains a table of all open file descriptors, which are in use. The allotment of file descriptors is generally sequential and they are allotted to the file as the next free file descriptor from the pool of free file descriptors. When we close the file, the file descriptor gets freed and is available for further allotment.
See this image for more details:

[Image: Two Process]

When we want to read or write a file, we identify the file with the file descriptor that was returned by the open() or creat() function call, and use it as an argument to either read() or write().
By convention, UNIX System shells associate file descriptor 0 with the standard input of a process, file descriptor 1 with the standard output, and file descriptor 2 with the standard error.
File descriptors range from 0 to OPEN_MAX-1. The maximum number of open file descriptors can be obtained with ulimit -n. For more information, go through the 3rd chapter of the APUE book.
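
The per-process limit can be inspected from the shell. This is a small sketch only; the values are system-dependent, so no output is shown. The -H form is supported by common shells such as bash and dash:

```shell
#!/bin/sh
soft=$(ulimit -n)     # soft limit on open descriptors for this process
hard=$(ulimit -Hn)    # hard limit (ceiling for the soft limit)
echo "soft limit: $soft"
echo "hard limit: $hard"

# Descriptors are numbered from 0, so the highest usable one is soft - 1.
case $soft in
    ''|*[!0-9]*) echo "no numeric limit reported" ;;
    *) echo "highest usable descriptor: $((soft - 1))" ;;
esac
```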

Heisel answered 19/7, 2013 at 8:11 Comment(3)
Since 0, 1, 2 are associated with "stdin", "stdout", and "stderr" of a process, can we use those descriptors at the same time for different processes?Barometrograph
@Tarik: file descriptors are per process. To see this, download osquery and execute osqueryi <<< echo '.all process_open_files' in a bash shell.Braxton
You say "Hear it from the Horse's Mouth : APUE (Richard Stevens).", however it's unclear what you're quoting from Stevens's APUE. Use > to highlight the quoted passage.Psychologize
106

Other answers added great stuff. I will add just my 2 cents.

According to Wikipedia we know for sure: a file descriptor is a non-negative integer. The most important thing I think is missing, would be to say:

File descriptors are bound to a process ID.

We know the most famous file descriptors are 0, 1 and 2. 0 corresponds to STDIN, 1 to STDOUT, and 2 to STDERR.

Say, take shell processes as an example and how does it apply for it?

Check out this code

#>sleep 1000 &
[12] 14726

We created a process with the ID 14726 (PID). Using lsof -p 14726 we get something like this:

COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
sleep   14726 root  cwd    DIR    8,1     4096 1201140 /home/x
sleep   14726 root  rtd    DIR    8,1     4096       2 /
sleep   14726 root  txt    REG    8,1    35000  786587 /bin/sleep
sleep   14726 root  mem    REG    8,1 11864720 1186503 /usr/lib/locale/locale-archive
sleep   14726 root  mem    REG    8,1  2030544  137184 /lib/x86_64-linux-gnu/libc-2.27.so
sleep   14726 root  mem    REG    8,1   170960  137156 /lib/x86_64-linux-gnu/ld-2.27.so
sleep   14726 root    0u   CHR  136,6      0t0       9 /dev/pts/6
sleep   14726 root    1u   CHR  136,6      0t0       9 /dev/pts/6
sleep   14726 root    2u   CHR  136,6      0t0       9 /dev/pts/6

The fourth column, FD, and the next column, TYPE, correspond to the file descriptor and the file descriptor type.

Some of the values for the FD can be:

cwd – Current Working Directory
txt – Program text (code and data)
mem – Memory mapped file
mmap – Memory mapped device

But the real file descriptor is shown as:

NUMBER – Represents the actual file descriptor.

The character after the number, e.g. the "u" in "1u", represents the mode in which the file is opened: r for read, w for write, u for read and write.

TYPE specifies the type of the file. Some of the values of TYPEs are:

REG – Regular File
DIR – Directory
FIFO – First In First Out

In this example, all three numbered file descriptors are CHR – Character special file (or character device file), because they point to the terminal.

Now we can easily identify the file descriptors for STDIN, STDOUT and STDERR with lsof -p PID, or we can see the same with ls /proc/PID/fd.
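
A quick way to try this on the current shell, assuming a Linux-style /proc. The temporary file and the choice of descriptor 3 are made up for the example:

```shell
#!/bin/sh
tmp=$(mktemp)
exec 3>"$tmp"                     # open the file on descriptor 3 in this shell
target=""
if [ -d "/proc/$$/fd" ]; then
    ls -l "/proc/$$/fd"           # one symlink per open descriptor
    target=$(readlink "/proc/$$/fd/3")
    echo "fd 3 -> $target"
else
    echo "this system has no /proc/<pid>/fd"
fi
exec 3>&-                         # close descriptor 3 again
rm -f "$tmp"
```

Each entry in the directory is named after a descriptor number and links to whatever that descriptor currently refers to.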

Note also that the file descriptor table that the kernel keeps track of is not the same as the files table or the inodes table. These are separate, as some other answers explained.

[Image: fd table]

You may ask yourself where these file descriptors physically are, and what is stored in /dev/pts/6, for instance:

sleep   14726 root    0u   CHR  136,6      0t0       9 /dev/pts/6
sleep   14726 root    1u   CHR  136,6      0t0       9 /dev/pts/6
sleep   14726 root    2u   CHR  136,6      0t0       9 /dev/pts/6

Well, /dev/pts/6 lives purely in memory. These are not regular files, but so called character device files. You can check this with: ls -l /dev/pts/6 and they will start with c, in my case crw--w----.
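
The file type can also be checked from a script. This is a small sketch using the standard test operators -c (character device) and -t (descriptor is open and refers to a terminal). /dev/null is used here only because it is a character device on practically every Unix-like system:

```shell
#!/bin/sh
if [ -c /dev/null ]; then
    echo "/dev/null is a character device"
fi

# Check where the three standard descriptors currently point.
for fd in 0 1 2; do
    if [ -t "$fd" ]; then
        echo "fd $fd is connected to a terminal"
    else
        echo "fd $fd is not a terminal (file, pipe, or closed)"
    fi
done
```

Run it once directly in a terminal and once with the output redirected to a file to see the answer for descriptor 1 change.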

Just to recall, most Linux-like OSes define seven types of files:

  • Regular files
  • Directories
  • Character device files
  • Block device files
  • Local domain sockets
  • Named pipes (FIFOs) and
  • Symbolic links
Explode answered 22/9, 2018 at 14:26 Comment(5)
Thanks. Indeed it is important to point out that it is per process! It helps to visualize things better.Reach
The types of files defined by the OS that you mentioned in your answer really help in understanding files on a lower level.Saltwater
where do I learn more about file descriptors and what comes along with it ?Doerr
@theartist An Operating Systems book, e.g., Arpaci-Dusseau's Operating Systems: Three Easy Pieces, is a good start, specifically the chapters about file systems. Then Stevens's Advanced Programming in the Unix Environment and/or Kerrisk's The Linux Programming Interface.Psychologize
I would word "File descriptors are bound to a process ID" as "File descriptors are bound/unique to a given process". The former makes it sound as if file descriptors are computed based on the process ID.Psychologize
34

File Descriptors (FD)

  • In Linux/Unix, everything is a file. Regular "files", directories, and even devices are files. Every open file has an associated number called the file descriptor (FD), a non-negative integer that starts at 0.

  • Your terminal/console is a device, and therefore has a file descriptor associated with it. When an executing program prints something to the screen, the output is sent to the screen's file descriptor and then displayed on your screen by the associated device. Similarly, if the program's output is sent to the printer's file descriptor, the output is printed.

  • Whenever you execute a program/command at the terminal, the shell opens three files by default, each of them with an associated file descriptor and a pre-assigned role.

    File             File Descriptor  Connected to (by default)  Behavior
    Standard Input   0                keyboard                   A process uses it to take input from a source. By default, the keyboard.
    Standard Output  1                terminal/console           A process uses it to send normal output to a sink. By default, the terminal.
    Standard Error   2                terminal/console           A process uses it to send error output to a sink. By default, the terminal.

    Because they're opened by default and have pre-assigned roles within any given process, they're collectively called standard streams.

Error Redirection

These standard streams are crucial for redirection and pipelines. For example, the command ls outputs an error to the terminal if the provided directory doesn't exist:

$ ls ./non-existent
ls: ./non-existent: No such file or directory

However, you can send that output into a file instead, e.g., errors.log, by using the redirection meta-character for standard error, 2>:

$ ls ./non-existent 2> errors.log

This means "whatever output that comes out of standard error, redirect it to file errors.log instead of the screen". Now errors.log contains the output that previously was sent to the screen.
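
A self-contained sketch of the same idea. The function emit is made up for the example and writes one line to each stream; temporary files stand in for errors.log:

```shell
#!/bin/sh
emit() {
    echo "normal output"        # written to descriptor 1 (standard output)
    echo "error output" >&2     # written to descriptor 2 (standard error)
}

out_file=$(mktemp)
err_file=$(mktemp)
emit >"$out_file" 2>"$err_file"   # send each stream to its own file

out=$(cat "$out_file")
err=$(cat "$err_file")
rm -f "$out_file" "$err_file"
echo "captured from fd 1: $out"
echo "captured from fd 2: $err"
```

Because the two streams use different descriptors, they can be redirected independently of each other.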

Pipeline

The standard streams also make it easy to make a program's output another program's input. In order to do this in the shell, you use the pipe (|) meta-character.

$ ls -l | wc -l

This means "send the standard output of the ls -l command into the standard input of the wc -l command, and then send the result to the screen".
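
Note that | connects only descriptor 1 to the next command. In this sketch the function emit is made up for the example and writes one line to each stream. Adding 2>&1 duplicates descriptor 2 onto the pipe as well:

```shell
#!/bin/sh
emit() {
    echo "normal output"
    echo "error output" >&2
}

# Only standard output travels through the pipe; stderr is discarded here.
only_stdout=$(emit 2>/dev/null | wc -l)

# 2>&1 makes descriptor 2 a copy of descriptor 1, which is already the pipe.
both=$(emit 2>&1 | wc -l)

echo "lines through the pipe, stdout only: $((only_stdout))"
echo "lines through the pipe, with 2>&1: $((both))"
```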

Fillbert answered 2/9, 2018 at 17:44 Comment(0)
27

More points regarding File Descriptor:

  1. File Descriptors (FD) are non-negative integers (0, 1, 2, ...) that are associated with files that are opened.

  2. 0, 1, 2 are the standard FDs, corresponding to STDIN_FILENO, STDOUT_FILENO and STDERR_FILENO (defined in unistd.h); they are opened by default on behalf of the shell when the program starts.

  3. FDs are allocated in sequential order, meaning each new FD is the lowest unallocated integer value.

  4. The FDs of a particular process can be seen in /proc/$pid/fd (on Linux and other Unix-like systems that provide /proc).

Noticeable answered 24/3, 2017 at 17:56 Comment(0)
22

As an addition to other answers, unix considers everything as a file system. Your keyboard is a file that is read-only from the perspective of the kernel. The screen is a write-only file. Similarly, folders, input-output devices, etc. are also considered to be files. Whenever a file is opened, say when the device drivers [for device files] request an open(), or a process opens a user file, the kernel allocates a file descriptor, an integer that specifies the access to that file, such as it being read-only, write-only, etc. [for reference : https://en.wikipedia.org/wiki/Everything_is_a_file ]

Conceptionconceptual answered 23/10, 2016 at 21:14 Comment(2)
File descriptors can also refer to things that don't exist in the file system, like anonymous pipes and network sockets.Margarettamargarette
"everything as a file system", I think you meant to say "everything as a file".Psychologize
18

File descriptors

  • To the kernel, all open files are referred to by file descriptors, including those entities that aren't files per se such as anonymous pipes and network sockets.
  • A file descriptor is a non-negative integer, starting at 0.
  • When we open an existing file or create a new one, the kernel returns a file descriptor to the calling code.
  • When we want to read or write a file, we identify the file with the file descriptor that was returned by functions such as open() or creat(), and provide it to either read() or write().
  • Historically, each UNIX process had 20 file descriptors at its disposal, numbered 0 through 19, but many systems later raised the upper limit to 63.
  • The first three are already opened when the process begins:
    • 0, the standard input
    • 1, the standard output
    • 2, the standard error
  • When the parent process forks a process, the child process inherits the file descriptors of the parent.
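
The inheritance in the last point can be sketched in the shell, where a parenthesised subshell is a forked child. The temporary file and the choice of descriptor 3 are made up for the example:

```shell
#!/bin/sh
tmp=$(mktemp)
exec 3>"$tmp"                      # parent opens descriptor 3
echo "parent, before fork" >&3
( echo "child" >&3 )               # forked child inherits descriptor 3 and writes through it
echo "parent, after child" >&3     # continues after the child's line
exec 3>&-                          # parent closes the descriptor

content=$(cat "$tmp")
rm -f "$tmp"
echo "$content"
```

The three lines end up in the file in order: parent and child share the same open file, including its offset, so the child's write does not overwrite the parent's.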
Insociable answered 3/10, 2018 at 13:8 Comment(0)
14

All the answers provided are great; here is my version:

File Descriptors are non-negative integers that act as an abstract handle to “Files” or I/O resources (like pipes, sockets, or data streams). These descriptors help us interact with these I/O resources and make working with them very easy. The I/O system is visible to a user process as a stream of bytes (I/O stream). A Unix process uses descriptors (small unsigned integers) to refer to I/O streams. The system calls related to the I/O operations take a descriptor as an argument.

Valid file descriptors range from 0 to a configurable maximum (per process via ulimit -n; /proc/sys/fs/file-max sets the system-wide limit on open files). The kernel assigns descriptors for standard input (0), standard output (1) and standard error (2) in the FD table. If opening a file fails, open() returns -1.

[Image: FD]

When a process makes a successful request to open a file, the kernel returns a file descriptor which points to an entry in the kernel's global file table. The file table entry contains information such as the inode of the file, byte offset, and the access restrictions for that data stream (read-only, write-only, etc.).
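
Because the byte offset lives in the file table entry, two separate opens of the same file keep independent positions. A minimal sketch, where the temporary file and descriptors 3 and 4 are made up for the example:

```shell
#!/bin/sh
tmp=$(mktemp)
printf 'line1\nline2\n' >"$tmp"

exec 3<"$tmp" 4<"$tmp"   # two separate opens of the same file
read -r a <&3            # fd 3 reads line1 and its offset advances
read -r b <&4            # fd 4 has its own offset, so it also reads line1
read -r c <&3            # fd 3 continues with line2
exec 3<&- 4<&-           # close both descriptors
rm -f "$tmp"

echo "fd 3 read: $a, then $c"
echo "fd 4 read: $b"
```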

Rajiv answered 18/8, 2022 at 16:28 Comment(1)
That's a nice diagram, where did you get it?Psychologize
7

File descriptors are nothing but references to any open resource. As soon as you open a resource, the kernel assumes you will be doing some operations on it. All communication between your program and the resource happens over an interface, and this interface is provided by the file-descriptor.

Since a process can open more than one resource, it is possible for a resource to have more than one file-descriptors.
You can view all file-descriptors linked to a process by simply running ls -li /proc/<pid>/fd/, where pid is the process ID of your process.

Benevento answered 18/7, 2020 at 13:16 Comment(1)
> "Since a process can open more than one resource, it is possible for a resource to have more than one file-descriptors." - This is not a proper cause-and-effect sentence...Poseur
5

Any operating system has processes (p's) running, say p1, p2, p3 and so forth. Each process usually makes an ongoing usage of files.

Each process has an associated table (a process tree, or a process table, in another phrasing).

Usually, Operating systems represent each file in each process by a number (that is to say, in each process tree/table).

The first file used in the process is file0, second is file1, third is file2, and so forth.

Any such number is a file descriptor.

File descriptors are usually integers (0, 1, 2 and not 0.5, 1.5, 2.5).

Given that we often describe processes as "process tables", and given that tables have rows (entries), we can say that the file descriptor cell in each entry is used to represent the whole entry.

In a similar way, when you open a network socket, it has a socket descriptor.

In some operating systems you can run out of file descriptors, but such a case is extremely rare, and the average computer user shouldn't worry about it.

File descriptors might be global (process A starts in say 0, and ends say in 1 ; Process B starts say in 2, and ends say in 3) and so forth, but as far as I know, usually in modern operating systems, file descriptors are not global, and are actually process-specific (process A starts in say 0 and ends say in 5, while process B starts in 0 and ends say in 10).

Uhl answered 20/12, 2016 at 23:23 Comment(1)
Read more on FD's in Linux here: unix.stackexchange.com/questions/358022/…Uhl
5

In addition to all the simplified responses above:

If you are working with files in a bash script, it's better to use a file descriptor.

For example, if you want to read from and write to the file "test.txt", use the file descriptor as shown below:

FILE=$1 # name of the file, given on the command line
exec 5<>"$FILE" # open FILE for reading and writing on descriptor 5

# Read the file line by line through the descriptor
while IFS= read -r LINE; do
    echo "$LINE"
done <&5

# Write to the file through the descriptor (the offset is now at the end, so this appends)
echo "Adding the date: $(date)" >&5
exec 5<&- # close the descriptor
Elderly answered 20/4, 2019 at 16:37 Comment(2)
why do we need to use a FD when we can simply pass the file itself to while loop ?Canice
What does the exec 5<>$FILE do, in particular the <>, in other tutorials I see just <, but I think using <> will prevent a file creation which seems better, but not sure what it does exactly here.Ultramontanism
0

I don't know the kernel code, but I'll add my two cents here since I've been thinking about this for some time, and I think it'll be useful.

When you open a file, the kernel returns a file descriptor to interact with that file.

A file descriptor is an implementation of an API for the file you're opening. The kernel creates this file descriptor, stores it in an array, and gives it to you.

This API requires an implementation that allows you to read and write to the file, for example.

Now, think about what I said again, remembering that everything is a file — printers, monitors, HTTP connections etc.

That's my summary after reading https://www.bottomupcs.com/file_descriptors.xhtml.

Duky answered 8/8, 2021 at 20:14 Comment(0)
