Save modifications in place with NON GNU awk

I came across a question (on SO itself) where the OP needs to edit Input_file(s) and save the changes back into the same file(s).

I know that for a single Input_file we could do the following:

awk '{print "test here..new line for saving.."}' Input_file > temp && mv temp Input_file

Now let's say we need to make the same kind of change to many files of the same type (assume .txt here).

What I have tried/thought for this problem: looping over the .txt files with a for loop and calling awk once per file is painful and NOT recommended, since it wastes CPU cycles and gets slower as the number of files grows.

So what could be done to perform an in-place edit on multiple files with a NON GNU awk, which does not support the inplace option? I have also gone through the thread Save modifications in place with awk, but it has little on NON GNU awk or on changing multiple files in place within awk itself.

NOTE: I am adding the bash tag because my answer uses bash commands to rename the temporary files to their actual Input_file names.



EDIT: As per Ed sir's comment, adding sample input and expected output here, though the code in this thread could be used for general-purpose in-place editing too.

Sample Input_file(s):

cat test1.txt
onetwo three
tets testtest

cat test2.txt
onetwo three
tets testtest

cat test3.txt
onetwo three
tets testtest

Sample of expected output:

cat test1.txt
1
2

cat test2.txt
1
2

cat test3.txt
1
2
Mayhew answered 9/12, 2019 at 5:42 Comment(1)
Interesting and pertinent awk problem ++Fourthclass

Since the main aim of this thread is how to do an in-place SAVE with a NON GNU awk, I am first posting a template that should help with any kind of requirement: add the FNR==1 block and the END block to your code, keep your main BLOCK as your requirement dictates, and it should then do the in-place edit:

NOTE: The following writes all of its output to the temporary output file, so if you want to print anything to standard output, add a print statement without the > (out) redirection.

Generic Template:

awk -v out_file="out" '
FNR==1{
close(out)
out=out_file count++
rename=(rename?rename ORS:"") "mv \047" out "\047 \047" FILENAME "\047"
}
{
    .....your main block code.....
}
END{
 if(rename){
   system(rename)
 }
}
' *.txt


Specific provided sample's solution:

I have come up with the following approach within awk itself (for the added samples, this is my approach to solve the problem and save the output into the Input_file itself):

awk -v out_file="out" '
FNR==1{
  close(out)
  out=out_file count++
  rename=(rename?rename ORS:"") "mv \047" out "\047 \047" FILENAME "\047"
}
{
  print FNR > (out)
}
END{
  if(rename){
    system(rename)
  }
}
' *.txt

NOTE: this is only a test for saving edited output into the Input_file(s) themselves. One could reuse the FNR==1 block along with the END block in their own program; the main section should follow the requirement of the specific question.

Fair warning: since this approach creates new temporary out files in the current path, make sure there is enough space on the system. Only the main Input_file(s) remain at the end, but the temporary files need space in the directory while the program runs.
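The space concern above could probably be reduced by renaming each temporary file as soon as its Input_file has been read, so that only one temporary file exists at a time. The following is only a sketch under that assumption: the finish() function and the prev variable are names made up here, and the first three lines just create sample files in a scratch directory.

```shell
cd "$(mktemp -d)" || exit 1
printf 'onetwo three\ntets testtest\n' > test1.txt
printf 'onetwo three\ntets testtest\n' > test2.txt

awk -v out_file="out" '
# finish (hypothetical helper): close the current temp file and move it
# over the Input_file it was generated from.
function finish() {
  if (out != "") {
    close(out)
    system("mv \047" out "\047 \047" prev "\047")
  }
}
FNR==1 {
  finish()                    # previous Input_file is complete, rename now
  out  = out_file (count++)   # out0, out1, ...
  prev = FILENAME             # remember which file this temp file belongs to
}
{ print FNR > (out) }
END { finish() }              # rename the temp file of the last Input_file
' *.txt
```

As with the template, this assumes that no files named out0, out1, etc. already exist and that every Input_file produces at least one line of output.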



Following is a test of the above code.

Execution of the program with an example: Let's assume the following are the .txt Input_file(s):

cat << EOF > test1.txt
onetwo three
tets testtest
EOF

cat << EOF > test2.txt
onetwo three
tets testtest
EOF

cat << EOF > test3.txt
onetwo three
tets testtest
EOF

Now when we run the following code:

awk -v out_file="out" '
FNR==1{
  close(out)
  out=out_file count++
  rename=(rename?rename ORS:"") "mv \047" out "\047 \047" FILENAME "\047"
}
{
  print "new_lines_here...." > (out)
}
END{
  if(rename){
    system("ls -lhtr;" rename)
  }
}
' *.txt

NOTE: I have placed ls -lhtr in the system section intentionally to show which (temporary) output files it creates, because they are later renamed to their actual names.

-rw-r--r-- 1 runner runner  27 Dec  9 05:33 test2.txt
-rw-r--r-- 1 runner runner  27 Dec  9 05:33 test1.txt
-rw-r--r-- 1 runner runner  27 Dec  9 05:33 test3.txt
-rw-r--r-- 1 runner runner  38 Dec  9 05:33 out2
-rw-r--r-- 1 runner runner  38 Dec  9 05:33 out1
-rw-r--r-- 1 runner runner  38 Dec  9 05:33 out0

When we run ls -lhtr after the awk script has finished, we see only the .txt files there.

-rw-r--r-- 1 runner runner  27 Dec  9 05:33 test2.txt
-rw-r--r-- 1 runner runner  27 Dec  9 05:33 test1.txt
-rw-r--r-- 1 runner runner  27 Dec  9 05:33 test3.txt


Explanation: Adding a detailed explanation of the above command here:

awk -v out_file="out" '                                    ##Start the awk program and create a variable named out_file; its value SHOULD BE a prefix that does NOT match any file already present in the current directory. Temporary files are created under this name and later renamed to the actual files.
FNR==1{                                                    ##If this is the very first line of the current Input_file then do the following.
  close(out)                                               ##Close the previous temp file, since output goes to temp files that are renamed later; this avoids a "too many open files" error.
  out=out_file count++                                     ##Set out to the value of out_file (defined with awk -v) followed by count, which is incremented by 1 every time this block runs.
  rename=(rename?rename ORS:"") "mv \047" out "\047 \047" FILENAME "\047"     ##Append an mv command to the rename variable; the collected commands are executed in the END section once all the Input_file(s) are processed.
}                                                          ##Closing BLOCK for FNR==1 condition here.
{                                                          ##Starting main BLOCK from here.
  print "new_lines_here...." > (out)                       ##Print to the out file in this example.
}                                                          ##Closing main BLOCK here.
END{                                                       ##Starting END block for this specific program here.
  if(rename){                                              ##If the rename variable is NOT NULL then do the following.
    system(rename)                                         ##Run the rename variable through system(), which executes the mv commands that rename out0, out1 etc to the Input_file names.
  }
}                                                          ##Closing END block of this program here.
' *.txt                                                    ##Mentioning Input_file(s) with their extensions here.
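
One caveat with building the mv commands as explained above: a file name that itself contains a single quote would end the quoting early. A possible way around it is sketched below, assuming a small helper (shquote is a name made up here, not part of the original answer) that escapes embedded single quotes before wrapping the name. The first lines only create sample files in a scratch directory.

```shell
cd "$(mktemp -d)" || exit 1
printf 'a\nb\n' > "it's a test.txt"
printf 'a\nb\n' > plain.txt

awk -v out_file="out" '
# shquote (hypothetical helper): wrap s in single quotes for the shell,
# turning each embedded single quote into the sequence quote-backslash-quote-quote.
function shquote(s) {
  gsub(/\047/, "\047\\\047\047", s)
  return "\047" s "\047"
}
FNR==1 {
  close(out)
  out = out_file (count++)
  rename = (rename ? rename ORS : "") "mv " shquote(out) " " shquote(FILENAME)
}
{ print FNR > (out) }
END { if (rename) system(rename) }
' *.txt
```

File names that start with a dash or contain newlines are still not handled by this sketch.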
Mayhew answered 9/12, 2019 at 5:42 Comment(0)

I'd probably go with something like this if I were to try to do this:

$ cat ../tst.awk
FNR==1 { saveChanges() }
{ print FNR > new }
END { saveChanges() }

function saveChanges(   bak, result, mkBackup, overwriteOrig, rmBackup) {
    if ( new != "" ) {
        bak = old ".bak"
        mkBackup = "cp \047" old "\047 \047" bak "\047; echo \"$?\""
        if ( (mkBackup | getline result) > 0 ) {
            if (result == 0) {
                overwriteOrig = "mv \047" new "\047 \047" old "\047; echo \"$?\""
                if ( (overwriteOrig | getline result) > 0 ) {
                    if (result == 0) {
                        rmBackup = "rm -f \047" bak "\047"
                        system(rmBackup)
                    }
                }
            }
        }
        close(rmBackup)
        close(overwriteOrig)
        close(mkBackup)
    }
    old = FILENAME
    new = FILENAME ".new"
}

$ awk -f ../tst.awk test1.txt test2.txt test3.txt

I'd have preferred to copy the original file to the backup first and then operate on that, saving changes to the original, but doing so would change the value of the FILENAME variable for every input file, which is undesirable.

Note that if you had original files named whatever.bak or whatever.new in your directory then you'd overwrite them with temp files, so you'd need to add a test for that too. A call to mktemp to get the temp file names would be more robust.
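
A rough sketch of what the mktemp idea might look like from inside awk, assuming a mktemp that accepts a template argument (it is not specified by POSIX, though GNU and BSD systems ship one). The newTemp() function and the inplace.XXXXXX template are made up for illustration, and the backup handling from the script above is left out to keep it short:

```shell
cd "$(mktemp -d)" || exit 1
printf 'onetwo three\ntets testtest\n' > test1.txt
printf 'onetwo three\ntets testtest\n' > test2.txt

awk '
# newTemp (hypothetical helper): ask mktemp for a fresh, unused file name in
# the current directory; returns "" if mktemp could not be run.
function newTemp(    cmd, name) {
  cmd = "mktemp ./inplace.XXXXXX"
  if ((cmd | getline name) <= 0) name = ""
  close(cmd)                  # so the next call runs mktemp again
  return name
}
function saveChanges() {
  if (tmp != "") {
    close(tmp)
    system("mv -- \047" tmp "\047 \047" old "\047")
  }
}
FNR==1 {
  saveChanges()               # finish the previous file, if any
  old = FILENAME
  tmp = newTemp()
  if (tmp == "") exit 1       # END still runs, but has nothing to rename
}
{ print FNR > tmp }
END { saveChanges() }
' test1.txt test2.txt
```

Since mktemp creates its files readable and writable by the owner only, the edited files end up with those permissions; restoring the original mode would need an extra step.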

The FAR more useful thing to have in this situation would be a tool that executes any other command and does the "inplace" editing part since that could be used to provide "inplace" editing for POSIX sed, awk, grep, tr, whatever and wouldn't require you to change the syntax of your script to print > out etc. every time you want to print a value. A simple, fragile, example:

$ cat inedit
#!/usr/bin/env bash

for (( pos=$#; pos>1; pos-- )); do
    if [[ -f "${!pos}" ]]; then
        filesStartPos="$pos"
    else
        break
    fi
done

files=()
cmd=()
for (( pos=1; pos<=$#; pos++)); do
    arg="${!pos}"
    if (( pos < filesStartPos )); then
        cmd+=( "$arg" )
    else
        files+=( "$arg" )
    fi
done

tmp=$(mktemp)
trap 'rm -f "$tmp"; exit' 0

for file in "${files[@]}"; do
    "${cmd[@]}" "$file" > "$tmp" && mv -- "$tmp" "$file"
done

which you'd use as follows:

$ awk '{print FNR}' test1.txt test2.txt test3.txt
1
2
1
2
1
2

$ ./inedit awk '{print FNR}' test1.txt test2.txt test3.txt

$ tail test1.txt test2.txt test3.txt
==> test1.txt <==
1
2

==> test2.txt <==
1
2

==> test3.txt <==
1
2

One obvious problem with that inedit script is the difficulty of identifying the input/output files separately from the command when you have multiple input files. The script above assumes all of the input files appear as a list at the end of the command and the command is run against them one at a time but of course that means you can't use it for scripts that require 2 or more files at a time, e.g.:

awk 'NR==FNR{a[$1];next} $1 in a' file1 file2

or scripts that set variables between files in the arg list, e.g.:

awk '{print $7}' FS=',' file1 FS=':' file2

Making it more robust is left as an exercise for the reader, but look to the xargs synopsis as a starting point for how a robust inedit would need to work :-).
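
One possible reading of the xargs hint, sketched here purely as an assumption about what such a tool could look like: take the file names on standard input, one per line, and the command as the arguments, so that the two no longer have to be told apart. The name inedit_stdin is made up, and the sketch still runs the command on one file at a time, so it does not cover the two awk examples above.

```shell
cd "$(mktemp -d)" || exit 1
printf 'onetwo three\ntets testtest\n' > test1.txt
printf 'onetwo three\ntets testtest\n' > test2.txt

# inedit_stdin (hypothetical name): run "$@" once per file named on stdin and
# replace each file with the command output only if the command succeeds.
# The body is a subshell, so its variables and exit calls stay local to it.
inedit_stdin() (
    while IFS= read -r file; do
        tmp=$(mktemp "$file.XXXXXX") || exit 1
        # </dev/null keeps the command from consuming the list of file names
        if "$@" "$file" </dev/null >"$tmp"; then
            mv -- "$tmp" "$file"
        else
            rm -f -- "$tmp"
            exit 1
        fi
    done
)

printf '%s\n' test1.txt test2.txt | inedit_stdin awk '{print FNR}'
```

File names containing newlines would not survive the line-based input, and the mktemp permissions caveat applies here as well.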

Biome answered 12/12, 2019 at 2:8 Comment(0)

The shell solution is simple and likely quick enough:

for f in *.txt
do  awk '...' "$f" > "$f.tmp" &&
    mv "$f.tmp" "$f"
done

Only search for a different solution if you have conclusively demonstrated that this is too slow. Remember: premature optimization is the root of all evil.

Suffice answered 9/12, 2019 at 14:46 Comment(0)
