How to read lines from a file into an array?
P

4

14

I'm trying to read in a file as an array of lines and then iterate over it with zsh. The code I've got works most of the time, except if the input file contains certain characters (such as brackets). Here's a snippet of it:

#!/bin/zsh
LIST=$(cat /path/to/some/file.txt)
SIZE=${${(f)LIST}[(I)${${(f)LIST}[-1]}]}
POS=${${(f)LIST}[(I)${${(f)LIST}[-1]}]}
while [[ $POS -le $SIZE ]] ; do
    ITEM=${${(f)LIST}[$POS]}
    # Do stuff
    ((POS=POS+1))
done

What would I need to change to make it work properly?

Peripatetic answered 29/9, 2012 at 8:57 Comment(0)
F
19
#!/bin/zsh
zmodload zsh/mapfile
FNAME=/path/to/some/file.txt
FLINES=( "${(f)mapfile[$FNAME]}" )
LIST="${mapfile[$FNAME]}" # Not required unless stuff uses it
integer POS=1             # Not required unless stuff uses it
integer SIZE=$#FLINES     # Number of lines, not required unless stuff uses it
for ITEM in $FLINES; do
    # Do stuff
    (( POS++ ))
done

You have some strange things in your code:

  1. Why are you splitting LIST each time instead of making it an array variable? It is just a waste of CPU time.
  2. Why don’t you use for ITEM in ${(f)LIST}?
  3. You can ask zsh for the array length directly: $#ARRAY. There is no need to determine the index of the last occurrence of the last element.
  4. POS gets the same value as SIZE in your code. Hence it will iterate only once.
  5. The brackets most likely cause trouble because of point 3: the (I) subscript flag matches against a pattern, so brackets in a line are read as a character class rather than literally. See the zsh documentation on subscript flags.
Foliose answered 29/9, 2012 at 11:23 Comment(2)
I'd originally written it that way because I was parsing the output of elinks -dump, and I wanted the resulting string split by newlines. If the script got cancelled (I have a very unreliable net connection) then I could read in the list that elinks fetched and resume where I left off.Peripatetic
Hey @ZyX, you don't need zsh/mapfile to read a file, check my answer below.Canterbury
C
25

I know it's been a lot of time since the question was answered but I think it's worth posting a simpler answer (which doesn't require the zsh/mapfile external module):

#!/bin/zsh
for line in "${(@f)"$(</path/to/some/file.txt)"}"
{
  # do something with each $line
}
Canterbury answered 18/12, 2016 at 20:56 Comment(7)
mapfile does not spawn a subprocess and thus is a bit faster; also, zsh/mapfile ships with zsh. BTW, I found what may be a problem for some people: none of the solutions to this question preserve empty lines. Mine loses empty lines twice: at (f) (it needs (@f) instead, plus removal of the trailing empty item) and at for ITEM in $LINES (it needs for ITEM in "${LINES[@]}"). Yours loses them three times: the same two, plus ( $arr ) will lose them as well; it needs double quotes.Foliose
You're right about the empty lines, just fixed my answer. AFAIK the $(<file) syntax does not spawn a subprocess, can you prove otherwise?Canterbury
Indeed it does not (strace -e clone shows nothing relevant, but does show subprocess if I use cat in place of <), yet tests (time (for (( I=0; I<10000; I++ )) { V=$(</var/log/messages) }) vs time (for (( I=0; I<10000; I++ )) { V=${mapfile[/var/log/messages]} })) showed that mapfile is a bit (2.4s vs 1.7s total) faster on my system. Though cat is far slower. (Actually, I did not check whether there is a subprocess, just whether there is a difference in timing; but zsh has again shown that it is more optimized (bash does use clone() call and spawn a child).)Foliose
How is it that double quotes are not conflicting with each other? I mean you have a pair of them within another one.Trainload
@Trainload the shell does not parse it as nested strings but as a double-quoted string, followed by a $() expansion, followed by another double-quoted string. It evaluates those in order and concatenates the results.Canterbury
@PabloLalloni You sure about that? The ${ and } are not treated as string components to be concatenated. Zsh can nest the quotes because ${ starts a variable expression that must be closed by the }.Cadell
This is kind of redundant, no? If you're just going to loop and discard the array, use while IFS=$'\n' read -r -d ''; do ...; done. See unix.stackexchange.com/a/29748/73256Salve
F
6

Let's say, for the purpose of example, that file.txt contains the following text:

one

two

three

The solution depends on whether or not you'd like to elide the empty lines in file.txt:


  • Creating an array lines from file file.txt, eliding empty lines:

    typeset -a lines=("${(f)"$(<file.txt)"}")
    
    print ${#lines}
    

    Expected output:

    3
    

  • Creating an array lines from file file.txt, without eliding empty lines:

    typeset -a lines=("${(@f)"$(<file.txt)"}")
    
    print ${#lines}
    

    Expected output:

    5
    

In the end, the difference in the resulting array comes down to whether or not the parameter expansion flag (@) is provided during parameter expansion.

Fraya answered 28/10, 2021 at 11:52 Comment(4)
It's a great concise solution. Would you be kind enough to offer an explanation of the strange syntax, if that's not too complicated, since it's the first result on Google? Humanity (and I) would thank you.Schonfeld
See zsh.sourceforge.io/Doc/Release/Parameters.html#Array-ParametersCombust
@Schonfeld to summarize: (@f) are "parameter expansion flags". The f splits the result into separate "words" at line breaks (i.e. \n, maybe also \r but I'm not sure about that). The @ ensures that the result of expansion is split into separate words even when quoted. The quoting itself ensures that empty strings '' are not dropped, so you usually want both @ and the quoting. The $(<file) syntax is an extension of command substitution syntax that simply reads the contents of a file, like $(cat file) but without spawning external process.Salve
Finally, ( ... ) defines an array, putting each word into a separate array element. So that whole incantation first reads from the file, splits each line into a separate word, and then puts each of those words into a separate array element, including completely empty lines. Note that typeset -a is redundant here, you can write only lines=( ... ) and it would work the same way.Salve
R
2
ARRAY=()
# IFS= keeps leading and trailing whitespace on each line
while IFS= read -r line; do
    ARRAY+=("$line")
done < file.txt
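
One thing to watch with the read loop above: if the last line of the file has no terminating newline, read returns a non-zero status for it and the loop body is skipped for that line. A common workaround is sketched below, using a made-up temporary file and the variable name lines as assumptions.

```zsh
#!/bin/zsh
# Hypothetical sample file whose last line has no trailing newline.
sample=$(mktemp)
printf 'one\n\n  two\nthree' > "$sample"

lines=()
# IFS= keeps leading/trailing whitespace, -r keeps backslashes literal.
# The || test picks up a final line that lacks a terminating newline.
while IFS= read -r line || [[ -n $line ]]; do
    lines+=( "$line" )
done < "$sample"

printf 'read %d lines\n' "${#lines[@]}"
rm -f -- "$sample"
```

Unlike the $(<file) approaches, this reads line by line, so empty lines are kept without any extra flags.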
Rachael answered 23/5, 2022 at 22:25 Comment(0)
