Are shell scripts sensitive to encoding and line endings?

I am making an NW.js app on macOS, and want to run the app in dev mode by double-clicking on an icon. As a first step, I'm trying to make my shell script work.

Using VS Code on Windows (I wanted to save time), I created a run-nw file at the root of my project, containing this:

#!/bin/bash

cd "src"
npm install

cd ..
./tools/nwjs-sdk-v0.17.3-osx-x64/nwjs.app/Contents/MacOS/nwjs "src" &

but I get this output:

$ sh ./run-nw

: command not found  
: No such file or directory  
: command not found  
: No such file or directory  

Usage: npm <command>

where <command> is one of:  (snip commands list)

(snip npm help)

[email protected] /usr/local/lib/node_modules/npm  
: command not found  
: No such file or directory  
: command not found

Some things I don't understand.

  • It seems that it takes empty lines as commands. In my editor (VS Code) I have tried to replace \r\n with \n (in case the \r creates problems) but it changes nothing.
  • It seems that it doesn't find the folders (with or without the dirname instruction), or maybe it doesn't know about the cd command?
  • It seems that it doesn't understand the install argument to npm.
  • The part that really weirds me out is that it still runs the app (if I did an npm install manually)...

Not able to make it work properly, and suspecting something weird with the file itself, I created a new one directly on the Mac, using vim this time. I entered the exact same instructions, and... now it works without any issues.
A diff on the two files reveals exactly zero differences.

What can be the difference? What can make the first script not work? How can I find out?

Update

Following the accepted answer's recommendations, after the wrong line endings came back, I checked multiple things. It turns out that since I copied my ~/.gitconfig from my Windows machine, I had autocrlf=true, so every time I modified the bash file under Windows, it re-set the line endings to \r\n.
So, in addition to running dos2unix (which you will have to install using Homebrew on a Mac), if you're using Git, check your .gitconfig file.

Stoma answered 16/9, 2016 at 9:5 Comment(7)
If you run a shell script on Linux, all the shell implementations I have encountered so far would get upset if they found a \r somewhere. Now you say that you have removed the \r, and I hope you verified that they are really gone. To be on the safe side, you should look at your file at the hexadecimal level, to ensure that you don't have other weird characters in it. The next step would then be to execute the script with sh -x ./run-nw, to get more information.Resourceful
Another good command to look for weird characters in a text file is LC_ALL=C cat -vet /path/to/file. If the file is normal, it'll look normal (except for having a "$" at the end of each line). Anything abnormal should stand out fairly well. DOS/Windows files will have "^M$" at the end of lines.Glasswork
You don't need to install dos2unix; the tr command will suffice, and is part of the standard OS install. One of the answers below shows how to use it, and probably deserves more upvotes.Huron
There's also a feature in dd which does this IIRC, but it's arguably too obscure to put in an answer.Huron
tr can't fix UTF-8 with BOM (which is an abomination anyway); perhaps see also https://mcmap.net/q/20691/-what-39-s-the-difference-between-utf-8-and-utf-8-with-bom for background and stackoverflow.com/questions/45240387/… for how to remove it. At least some versions of dos2unix can fix this, but I guess not all.Huron
Tangentially, if your shebang says bash, you should use bash ./run-nw to run the script, or simply chmod it so that you can run it with ./run-nw, in which case the shebang actually controls which interpreter to use. See also Difference between sh and bashHuron
See also stackoverflow.com/questions/45772525/… which is more explicitly phrased as a canonical for this topic. The answers are similar, of course.Huron

Yes. Bash scripts are sensitive to line-endings, both in the script itself and in data it processes. They should have Unix-style line-endings, i.e., each line is terminated with a Line Feed character (decimal 10, hex 0A in ASCII).

DOS/Windows line endings in the script

With Windows or DOS-style line endings, each line is terminated with a Carriage Return followed by a Line Feed character. You can see the otherwise invisible Carriage Return in the output of cat -v yourfile:

$ cat -v yourfile
#!/bin/bash^M
^M
cd "src"^M
npm install^M
^M
cd ..^M
./tools/nwjs-sdk-v0.17.3-osx-x64/nwjs.app/Contents/MacOS/nwjs "src" &^M

In this case, the carriage return (^M in caret notation or \r in C escape notation) is not treated as whitespace. Bash interprets the first line after the shebang (consisting of a single carriage return character) as the name of a command/program to run.

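  • Since there is no command named ^M, it prints : command not found. The message appears to start with a colon because the carriage return in the command name moves the cursor back to the start of the line, so ": command not found" overwrites the script name and line number that Bash printed first.
  • Since there is no directory named src^M, it prints : No such file or directory (overwritten in the same way).
  • It passes install^M instead of install as an argument to npm, which causes npm to complain.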
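To answer the "How can I find out?" part directly: a small helper like the one below (a sketch; the function name show_cr_lines is made up here) lists every line containing a carriage return, numbered, with the CR made visible as ^M by cat -v. It assumes bash for the $'\r' quoting.

```shell
#!/bin/bash
# Print the numbered lines of a file that contain a carriage return.
# grep -n adds line numbers; cat -v renders each CR as ^M.
show_cr_lines() {
    grep -n -- $'\r' "$1" | cat -v
}

# Demo on a throwaway file with the first lines of the question's script.
demo=$(mktemp)
printf '#!/bin/bash\r\n\r\ncd "src"\r\nnpm install\r\n' > "$demo"
show_cr_lines "$demo"
rm -f "$demo"
```

For the demo file it prints 1:#!/bin/bash^M, 2:^M and so on; no output at all means the file contains no carriage returns.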

DOS/Windows line endings in input data

Similarly, if you have an input file with carriage returns:

hello^M
world^M

then it will look completely normal in editors and when writing it to screen, but tools may produce strange results. For example, grep will fail to find lines that are obviously there:

$ grep 'hello$' file.txt || grep -x "hello" file.txt
$

(no match because the line actually ends in ^M)

Appended text will seem to overwrite the line because the carriage return moves the cursor to the start of the line:

$ sed -e 's/$/!/' file.txt
!ello
!orld

String comparison will fail, even though strings appear to be the same when writing to screen:

$ a="hello"; read b < file.txt
$ if [[ "$a" = "$b" ]]
  then echo "Variables are equal."
  else echo "Sorry, $a is not equal to $b"
  fi

Sorry, hello is not equal to hello
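One way to make that comparison work, sketched here in bash and not the only option, is to drop a single trailing carriage return from the variable with the ${var%pattern} parameter expansion right after reading it:

```shell
#!/bin/bash
# Same comparison as above, but with a trailing CR removed after read.
tmp=$(mktemp)
printf 'hello\r\nworld\r\n' > "$tmp"

a="hello"
IFS= read -r b < "$tmp"
b=${b%$'\r'}   # remove one trailing \r if present; unchanged otherwise

if [[ "$a" = "$b" ]]; then
    echo "Variables are equal."
else
    echo "Sorry, $a is not equal to $b"
fi
rm -f "$tmp"
```

This only patches the one variable; converting the input file itself (see Solutions) fixes every consumer at once.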

Solutions

The solution is to convert the file to use Unix-style line endings. There are a number of ways this can be accomplished:

  1. Using the dos2unix program:

    dos2unix filename
    
  2. Open the file in a capable text editor (Sublime, Notepad++, not Notepad) and configure it to save files with Unix line endings, e.g., with Vim, run the following command before (re)saving:

    :set fileformat=unix
    
  3. If you have a version of the sed utility that supports the -i or --in-place option, e.g., GNU sed, you could run the following command to strip trailing carriage returns:

    sed -i 's/\r$//' filename
    

    With other versions of sed, you could use output redirection to write to a new file. Be sure to use a different filename for the redirection target (it can be renamed later).

    sed 's/\r$//' filename > filename.unix
    
  4. Similarly, the tr translation filter can be used to delete unwanted characters from its input:

    tr -d '\r' <filename >filename.unix
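Since tr (like the redirect form of sed) cannot safely write back to its own input file, a small wrapper can make it behave like an in-place edit. This is a sketch, not a standard tool, and strip_cr is a hypothetical name. Copying the result back with cat, instead of mv, keeps the original file's permissions such as the executable bit. Note that tr -d '\r' removes every carriage return, not just those at line ends.

```shell
#!/bin/bash
# Delete all carriage returns from each file argument, "in place".
strip_cr() {
    local f tmp
    for f in "$@"; do
        tmp=$(mktemp) || return 1
        # Filter into a temp file first: "tr ... <f >f" would truncate f.
        if tr -d '\r' < "$f" > "$tmp"; then
            cat "$tmp" > "$f"   # overwrite contents, keep inode and mode
        fi
        rm -f "$tmp"
    done
}
```

Usage: strip_cr run-nw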
    

Cygwin Bash

With the Bash port for Cygwin, there’s a custom igncr option that can be set to ignore the Carriage Return in line endings (presumably because many of its users use native Windows programs to edit their text files). This can be enabled for the current shell by running set -o igncr.

Setting this option applies only to the current shell process, so it can be useful when sourcing files with extraneous carriage returns. If you regularly encounter shell scripts with DOS line endings and want this option to be set permanently, you could set an environment variable called SHELLOPTS (all capital letters) to include igncr. This environment variable is used by Bash to set shell options when it starts (before reading any startup files).

Useful utilities

The file utility is useful for quickly seeing which line endings are used in a text file. Here’s what it prints for each file type:

  • Unix line endings: Bourne-Again shell script, ASCII text executable
  • Mac line endings: Bourne-Again shell script, ASCII text executable, with CR line terminators
  • DOS line endings: Bourne-Again shell script, ASCII text executable, with CRLF line terminators
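Where file is not available, a rough pure-shell approximation is possible. The sketch below (eol_type is a made-up name; it only counts bytes and does not reproduce file's own logic) counts carriage returns, line feeds, and lines ending in CR, then reports LF, CR, CRLF or mixed:

```shell
#!/bin/bash
# Rough guess at a file's line-ending style by counting bytes.
# Prints LF, CR, CRLF or mixed. An empty file or one with no CRs prints LF.
eol_type() {
    local cr lf crlf
    cr=$(LC_ALL=C tr -cd '\r' < "$1" | wc -c)
    lf=$(LC_ALL=C tr -cd '\n' < "$1" | wc -c)
    if (( cr == 0 )); then
        echo LF
    elif (( lf == 0 )); then
        echo CR
    else
        # Number of lines whose last character is a CR.
        crlf=$(LC_ALL=C grep -c $'\r$' "$1")
        if (( crlf == lf && cr == lf )); then
            echo CRLF
        else
            echo mixed
        fi
    fi
}
```

Usage: eol_type run-nw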

The GNU version of the cat utility has a -v, --show-nonprinting option that displays non-printing characters.

The dos2unix utility is specifically written for converting text files between Unix, Mac and DOS line endings.

Useful links

Wikipedia has an excellent article covering the many different ways of marking the end of a line of text, the history of such encodings and how newlines are treated in different operating systems, programming languages and Internet protocols (e.g., FTP).

Files with classic Mac OS line endings

With Classic Mac OS (pre-OS X), each line was terminated with a Carriage Return (decimal 13, hex 0D in ASCII). If a script file was saved with such line endings, Bash would only see one long line like so:

#!/bin/bash^M^Mcd "src"^Mnpm install^M^Mcd ..^M./tools/nwjs-sdk-v0.17.3-osx-x64/nwjs.app/Contents/MacOS/nwjs "src" &^M

Since this single long line begins with an octothorpe (#), Bash treats the line (and the whole file) as a single comment.

Note: In 2001, Apple launched Mac OS X which was based on the BSD-derived NeXTSTEP operating system. As a result, OS X also uses Unix-style LF-only line endings and since then, text files terminated with a CR have become extremely rare. Nevertheless, I think it’s worthwhile to show how Bash would attempt to interpret such files.
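Should you meet such a file anyway, turning each CR into an LF is enough; unlike the DOS case, nothing has to be deleted. A minimal sketch with a hypothetical cr_to_lf helper:

```shell
#!/bin/bash
# Convert classic Mac OS (CR-only) line endings to Unix LF.
# Writes to a separate output file; never redirect back onto the input.
cr_to_lf() {
    tr '\r' '\n' < "$1" > "$2"
}

# Demo: two CR-terminated lines become two LF-terminated lines.
demo=$(mktemp)
printf 'cd "src"\rnpm install\r' > "$demo"
cr_to_lf "$demo" "$demo.unix"
wc -l < "$demo.unix"
rm -f "$demo" "$demo.unix"
```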

Bidentate answered 16/9, 2016 at 9:26 Comment(16)
Great explanation, only one little piece is missing here: is there any real reason these days for the genuine bash to continue to treat \r as a meaningful character at the end of the line?Angleaangler
@AlexCohn There is no compelling functional reason, but changing this behavior could break existing scripts. I'm sure this must have been proposed and rejected by the maintainers multiple times. If you can devise a good transition plan for how to make it optional now and mandatory in the future, it could gain some support; but I can predict a lot of old-timers will tell you "teach the younguns to not use Windows editors instead".Huron
Thanks for the clarification, @Huron . I had started to research an answer to Alex's question and intended to experiment with scripts using Cygwin Bash with the igncr option set but I haven’t had convenient access to a Windows OS in a long time.Bidentate
@AlexCohn It isn't bash, it's the Linux kernel.Originative
I have been using cygwin for years with igncr option set, and I don't remember seeing any problems.Tomb
@NatanYellin: The Real WTF (tm) is the assumption that it's OK to load a script in binary mode -- instead of text mode, where the issue could be handled gracefully (as text mode is allowed to turn \r\n into just \n, while binary mode isn't)...Hindsight
@DevSolar, eh, "text mode" doesn't exist; it's just an abstraction some languages put on top of the real syscalls, but it's all "binary mode" under the hood. :PStaal
@CharlesDuffy It's an abstraction of the C language, which is what both bash and the Linux kernel are implemented in. Line endings should not matter in a scripting language. That they do is, IMHO, a defect.Hindsight
@DevSolar, eh. It's an abstraction of the standard C library, but the kernel doesn't use that.Staal
@DevSolar, ...and I fully disagree that line endings shouldn't matter. You couldn't round-trip arbitrary binary data if bash treated content as text. It's difficult now (and requires ksh or bash extensions), but the change you propose would make it outright impossible.Staal
...binary mode, moreover, just makes more sense. You can seek() and pass an address direct to the kernel without needing to do translation between number-of-characters-in vs number-of-bytes-in. You don't have different, conflicting definitions of the length of your data and the size of your data. Sure, there's a call for high-level tools that understand multibyte characters, but I don't want to get any of that on me.Staal
@CharlesDuffy I don't see where you're coming from, like, at all. A bash script having the ability to handle binary data and a bash script itself being considered binary data are two different things entirely. Binary mode makes sense for binary. Text mode makes sense for text. Number of characters equals number of bytes only if you limit yourself to 8-bit encoding, a shortcut you cannot really afford even in kernel space and that has hurt computing so much it's crazy people still gloss it over. And you know that binary mode does not guarantee round-trip either? (Trailing nulls...)Hindsight
Re: NULs, it's solvable with arrays and craftiness. (One assumes that the last element has no trailing NUL, so if the data to be round-tripped has such a NUL, then one adds a '' onto the end of the array).Staal
This is a place where I have some sympathy for UTF-32; if you're going to do multi-byte characters, might as well do it in a way that lets one reliably map from number of characters to number of bytes and the inverse. Moving into a place where that mapping is inconsistent is voluntarily entering a world of pain, and I want nothing of it.Staal
@DevSolar, ...that said, you might want to interpret my position in this thread as that of the stereotypical old man on his porch shaking his cane and yelling at the clouds.Staal
@CharlesDuffy Due to combining characters, UTF-32 (which is a wide encoding, not a multibyte one) still does not have a 1:1 correlation between code points and characters printed. ;-) Sorry but this is a brave new world, old man... (>50 myself.)Hindsight

In JetBrains products (PyCharm, PHPStorm, IDEA, etc.), you'll need to click on CRLF/LF to toggle between the two types of line separators (\r\n and \n).

screenshot showing LF selected in the status bar

screenshot showing CRLF selected in the status bar

Lingwood answered 26/2, 2019 at 3:39 Comment(1)
On IntelliJ on Windows, open Settings (Ctrl+Alt+S) | Editor | Code Style. On the right select Unix and macOS (\n) for Line Separator. This is an alternative to changing the setting for each file.Amyotonia

I was trying to start up my Docker container from Windows and got this:

Bash script and /bin/bash^M: bad interpreter: No such file or directory

I was using Git Bash, and the problem was my Git configuration. I followed the steps below and it worked. The first step configures Git not to convert line endings to CRLF on checkout:

  1. git config --global core.autocrlf input
  2. delete your local repository
  3. clone it again.

Many thanks to Jason Harmon in this link: https://forums.docker.com/t/error-while-running-docker-code-in-powershell/34059/6

Before that, I tried the following, which didn't work:

  1. dos2unix scriptname.sh
  2. sed -i -e 's/\r$//' scriptname.sh
  3. sed -i -e 's/^M$//' scriptname.sh
Pharmacopsychosis answered 19/8, 2020 at 22:4 Comment(3)
I have the same issue, and dos2unix does not mitigate. Your command works! Thank you so much!Roer
Thanks for the answer; it deserves more votes. The command works well for me, too.Terraqueous
it helped in my case, it was because of Git bash terminal configuration! Thank you so much :)Wart

If you're using the read command to read from a file (or pipe) that is (or might be) in DOS/Windows format, you can take advantage of the fact that read will trim whitespace from the beginning and end of lines. If you tell it that carriage returns are whitespace (by adding them to the IFS variable), it'll trim them from the ends of lines.

In bash (or zsh or ksh), that means you'd replace this standard idiom:

IFS= read -r somevar    # This will not trim CR

with this:

IFS=$'\r' read -r somevar    # This *will* trim CR

(Note: the -r option isn't related to this, it's just usually a good idea to avoid mangling backslashes.)

If you're not using the IFS= prefix (e.g. because you want to split the data into fields), then you'd replace this:

read -r field1 field2 ...    # This will not trim CR

with this:

IFS=$' \t\n\r' read -r field1 field2 ...    # This *will* trim CR

If you're using a shell that doesn't support the $'...' quoting mode (e.g. dash, the default /bin/sh on some Linux distros), or your script even might be run with such a shell, then you need to get a little more complex:

cr="$(printf '\r')"
IFS="$cr" read -r somevar    # Read trimming *only* CR
IFS="$IFS$cr" read -r field1 field2 ...    # Read trimming CR and whitespace, and splitting fields

Note that normally, when you change IFS, you should put it back to normal as soon as possible to avoid weird side effects; but in all these cases, it's a prefix to the read command, so it only affects that one command and doesn't have to be reset afterward.

Glasswork answered 19/5, 2021 at 22:44 Comment(0)

Since VS Code is being used, we can see CRLF or LF in the bottom right, depending on which is in use, and clicking on it lets us switch between them (LF is being used in the example below):

Screenshot of shortcut UI

We can also use the "Change End of Line Sequence" command from the Command Palette. Use whichever is easier to remember, since they're functionally the same.

Freespoken answered 2/4, 2021 at 5:17 Comment(0)

Coming from a duplicate, if the problem is that you have files whose names contain ^M at the end, you can rename them with

for f in *$'\r'; do
    mv "$f" "${f%$'\r'}"
done

You probably want to fix whatever gave these files broken names in the first place (perhaps the script that created them should be run through dos2unix and then rerun?), but sometimes this is not feasible.

The $'\r' syntax is Bash-specific; if you have a different shell, maybe you need to use some other notation. Perhaps see also Difference between sh and bash
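For a POSIX shell such as dash, one option is to build the carriage return with printf instead. This is a sketch; wrapping the loop in a function that takes a directory argument (strip_cr_names, a made-up name) is only for convenience:

```shell
#!/bin/sh
# Rename files in directory $1 whose names end in a carriage return,
# without the Bash-only $'\r' syntax. Runs in a subshell so cd is local.
strip_cr_names() (
    cd "$1" || exit 1
    cr=$(printf '\r')            # a literal CR; $(...) only strips newlines
    for f in *"$cr"; do
        [ -e "$f" ] || continue  # no match: the pattern stays unexpanded
        mv -- "$f" "${f%"$cr"}"
    done
)
```

Usage: strip_cr_names .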

Huron answered 19/4, 2019 at 6:12 Comment(1)
I haven’t had this problem but given that many users will arrive here from duplicate questions, this answer deserves greater visibility. I’m upvoting it to start its movement up the answer list.Bidentate

I ran into this issue when using Git with WSL. Git has a feature that changes the line endings of files according to the OS you are using: on Windows, it makes sure the line endings are \r\n, which is not compatible with Linux, which uses only \n.

You can resolve this problem by adding a file named .gitattributes to your Git root directory, with lines like the following:

config/* text eol=lf
run.sh text eol=lf

In this example, all files inside the config directory, as well as the run.sh file, will have LF-only line endings.

Interlunation answered 4/9, 2020 at 7:27 Comment(1)
This is a really smart solution, without burdening subsequent code with additional seds, trs and similar. Good text editors such as Notepad++ or Idea do not turn it back to CRLFs (and if they did, it would come out at git commit).Slung

For Notepad++ users, this can be solved by: Edit > EOL Conversion > Unix (LF)

screenshot showing above steps

Chimaera answered 6/5, 2022 at 15:12 Comment(1)
Using a Windows editor is usually the root cause of the problem in the first place. Probably avoid that.Huron

Lots of answers mention Git, but none mention renormalizing the line endings in place. Just go to the root of your repo and run:

git add --renormalize .

Only the files whose line endings need normalizing will be staged again; commit them afterwards. The diff may look as if nothing changed, because line endings are invisible.

Garrity answered 10/6, 2023 at 0:58 Comment(0)

One more way to get rid of the unwanted CR ('\r') character is to run the tr command, for example:

$ tr -d '\r' < dosScript.py > nixScript.py
Tufted answered 2/3, 2018 at 19:25 Comment(3)
It should be noted that a new user might assume they can also do tr -d '\r' < myFile > myFile which is NOT a good idea, as their myFile will now be deleted or at least truncated. When using < infile > outFile redirections, always use different filenames for infile and outfile. You can then rename as needed. Good luck to all.Pisciform
Also, tr is unusual in that it refuses to take a file name argument; you have to use redirection like tr x y <inputfile (not tr x y inputfile)Huron
This will also delete CRs that aren't at line endings, but hopefully that's rare. That's one reason to prefer the other options in Anthony's answer.Suksukarno

The simplest way on macOS/Linux: create a file using the touch command, open it with vi or Vim, paste your code, and save. This would automatically remove the Windows characters.

Pugnacious answered 5/3, 2019 at 18:1 Comment(4)
This is very much not the simplest way, and would not necessarily remove the Windows characters, which are valid characters.Stoma
True, but copy/pasting in vi/vim is not what I'd call "easiest" :D I'll un-downvote, though.Stoma
Agree, kind of a life-hack for guys like me, who are not experts in shell scripting :)Pugnacious
touch is a programEure

If you are using a text editor like BBEdit, you can change the line endings from the status bar, which has a selector for switching between them.

Change the CRLF to LF using BBEdit

Vannavannatta answered 15/10, 2020 at 9:40 Comment(0)

For IntelliJ users writing a Linux script, here is the solution: File > Line Separators, then select LF - Unix and macOS (\n).

screenshot showing above steps

Cleisthenes answered 18/11, 2021 at 10:5 Comment(0)

Scripts may call each other, so an even better solution is to convert all scripts in the folder and its subfolders at once:

find . -name "*.sh" -exec sed -i -e 's/\r$//' {} +

You can use dos2unix too, but many servers do not have it installed by default. Note that this command relies on GNU sed; BSD/macOS sed handles -i and \r differently.
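If GNU sed is not available (as on macOS), a sketch that avoids sed -i altogether filters each file through tr and copies the result back, which also preserves the scripts' executable bit. The function name strip_cr_tree is hypothetical, and like any tr -d '\r' approach it removes every CR, not only those at line ends:

```shell
#!/bin/sh
# Strip carriage returns from every *.sh file under directory $1 (default: .),
# without relying on GNU sed's -i option.
strip_cr_tree() {
    find "${1:-.}" -type f -name '*.sh' -exec sh -c '
        for f do
            tmp=$(mktemp) || exit 1
            # Filter into a temp file, then copy back to keep permissions.
            tr -d "\r" < "$f" > "$tmp" && cat "$tmp" > "$f"
            rm -f "$tmp"
        done
    ' sh {} +
}
```

Usage: strip_cr_tree . (run from the folder whose scripts you want to convert)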

Penicillate answered 8/4, 2022 at 1:44 Comment(0)

I've had corrupted bash scripts so many times from this issue.

There are already many solutions posted on how to change the file. However, I didn't see any mention of the built-in Vim method for this task.

Open the shell script in Vim, run this command, and then save the file:

:set ff=unix

Then edit your .gitattributes to get a permanent fix

Penthea answered 23/5, 2023 at 21:54 Comment(1)
In the accepted answer there is :set fileformat=unix. I think it's a built-in capability. Anyway, it's better to include useful links supporting your content. Thank you!Tartarean

For the sake of completeness, I'll point out another solution which can solve this problem permanently without the need to run dos2unix all the time:

sudo ln -s /bin/bash `printf 'bash\r'`
Originative answered 22/11, 2020 at 19:43 Comment(2)
While this works for python, it won't work for bash in general, as bash does not include \r in IFS by default (so it will be considered a real character and not whitespace) so \r characters other than on the shebang line will still cause problemsSutra
Shouldn't it be ... ln -s bash /bin/$(printf 'bash\r')? Otherwise bash\r will be created in whatever the working directory is. I also changed the backticks for the newer syntax, which is better practice, but inconsequential in this case.Suksukarno
