The exec
system call of the Linux kernel understands shebangs (#!
) natively
When you do on bash:
./something
on Linux, this calls the exec
system call with the path ./something
.
This line of the kernel gets called on the file passed to exec
: https://github.com/torvalds/linux/blob/v4.8/fs/binfmt_script.c#L25
if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!'))
It reads the very first bytes of the file, and compares them to #!
.
If the comparison is true, then the rest of the line is parsed by the Linux kernel, which makes another exec
call with:
- executable:
/usr/bin/env
- first argument:
node
- second argument: script path
therefore equivalent to:
/usr/bin/env node /path/to/script.js
env
is an executable that searches PATH
to e.g. find /usr/bin/node
, and then finally calls:
/usr/bin/node /path/to/script.js
The Node.js interpreter does see the #!
line in the file, but it must be programmed to ignore that line even though #
is not in general a valid comment character in Node (unlike many other languages such as Python where it is), see also: Pound Sign (#) As Comment Start In JavaScript?
And yes, you can make an infinite loop with:
printf '#!/a\n' | sudo tee /a
sudo chmod +x /a
/a
Bash recognizes the error:
-bash: /a: /a: bad interpreter: Too many levels of symbolic links
#!
just happens to be human readable, but that is not required.
If the file started with different bytes, then the exec
system call would use a different handler. The other most important built-in handler is for ELF executable files: https://github.com/torvalds/linux/blob/v4.8/fs/binfmt_elf.c#L1305 which checks for bytes 7f 45 4c 46
(which also happens to be human readable for .ELF
). Let's confirm that by reading the 4 first bytes of /bin/ls
, which is an ELF executable:
head -c 4 "$(which ls)" | hd
output:
00000000 7f 45 4c 46 |.ELF|
00000004
So when the kernel sees those bytes, it takes the ELF file, puts it into memory correctly, and starts a new process with it. See also: How does kernel get an executable binary file running under linux?
Finally, you can add your own shebang handlers with the binfmt_misc
mechanism. For example, you can add a custom handler for .jar
files. This mechanism even supports handlers by file extension. Another application is to transparently run executables of a different architecture with QEMU.
I don't think POSIX specifies shebangs however: https://unix.stackexchange.com/a/346214/32558 , although it does mention in on rationale sections, and in the form "if executable scripts are supported by the system something may happen". macOS and FreeBSD also seem to implement it however.
PATH
search motivation
Likely, one big motivation for the existence of shebangs is the fact that in Linux, we often want to run commands from PATH
just as:
basename-of-command
instead of:
/full/path/to/basename-of-command
But then, without the shebang mechanism, how would Linux know how to launch each type of file?
Hardcoding the extension in commands:
basename-of-command.js
or implementing PATH search on every interpreter:
node basename-of-command
would be a possibility, but this has the major problem that everything breaks if we ever decide to refactor the command into another language.
Shebangs solve this problem beautifully.
node
– Clarineclarinetnpm
to install a Node.js source script as a (potentially globally available) CLI, you must use a shebang line - andnpm
will even make that work on Windows; see my once-again updated answer. – Threadfin