Print lines from their beginning to selected characters
I want to print lines from their beginning to selected characters.
Example:

a/b/i/c/f/d  
a/e/b/f/r/c  
a/f/d/g  
a/n/m/o  
a/o/p/d/l  
a/b/c/d/e  
a/c/e/v  
a/d/l/k/f  
a/e/f/c   
a/n/d/c

Command:

 ./hhh.csh 03_input.txt c

Output:

a/b/i/c  
a/e/b/f/r/c  
a/b/c  
a/c   
a/e/f/c  
a/n/d/c

I use this code, but in the condition $i == first I don't see the values being checked against the first value I assigned.

awk'  
BEGIN{  
ARGC=2  
 first = ARGV[2]  
}  
{  
for(i=1;i<=NF;++i){  
arr[i]=$i  
if($i == first){  
print arr[i]  
}    
}  
}' "$1" "$2"  
Gielgud answered 14/6 at 6:15 Comment(4)
As an aside, Csh stopped being popular in the 1990s for very, very good reasons. You should probably write your scripts for standard sh, even when they are completely trivial. (Separately, probably don't put any extensions on your interactive scripts; you don't need to know or care which language ls is implemented in, nor should it matter for tools of your own.) - Mme
There needs to be a space between awk and the single quote. - Mme
@ThụyPhạm, good that you have accepted an answer. You could also check the other answers and up-vote the ones you find helpful. - Kelley
Do you want to print up to a character, up to a /-separated string, up to a string that matches a regexp, or something else? Your current answers would do different things with different characters, different input, etc., and I strongly suspect they don't do what you actually want; you just haven't thought about or tested with the right input yet to discover the problems. See #65621825 for considerations, then clarify your requirements and provide more realistic sample input/output, maybe in a new question. - Putscher
S
8

As awk is tagged, filter for a match on /c/, then print the substr from position 1 to position RSTART, which is where the pattern was found:

# expecting the filename (e.g. 03_input.txt) on $1,
# and the pattern (e.g. c) on $2
awk -v pat="$2" 'match($0, pat) {print substr($0, 1, RSTART)}' "$1"
a/b/i/c
a/e/b/f/r/c
a/b/c
a/c
a/e/f/c
a/n/d/c

Note: You may want to replace RSTART with RSTART+RLENGTH-1 if patterns longer than one character are expected.
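A minimal sketch of that variant, wrapped in a shell function so it is easy to try. The function name prefix_through and the sample lines are made up for illustration; the pattern is treated as an awk regular expression, just as in the command above.

```shell
# prefix_through: hypothetical helper name, not part of the answer above.
# Prints each matching line from its start through the END of the first match.
prefix_through() {
  # $1 = pattern (awk regular expression); remaining args = input files (or stdin)
  pat=$1
  shift
  awk -v pat="$pat" 'match($0, pat) { print substr($0, 1, RSTART + RLENGTH - 1) }' "$@"
}

# Invented sample input: the second line has no "bc" and is skipped.
printf '%s\n' 'a/bc/d' 'a/x/y' 'q/bc/bc' | prefix_through 'bc'
```

With plain RSTART the first line would be cut to a/b; with RSTART+RLENGTH-1 the whole match is kept.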

Senegambia answered 14/6 at 6:26 Comment(0)
E
5

Another short awk approach is just to select only lines containing a 'c' and then substitute everything from the 'c' to the end of line with a 'c', e.g.

awk '/c/ {sub(/c.*$/,"c"); print}' file

Example Use/Output

With your example in file that would result in:

$ awk '/c/ {sub(/c.*$/,"c"); print}' file
a/b/i/c
a/e/b/f/r/c
a/b/c
a/c
a/e/f/c
a/n/d/c

Which matches your shown result. Let me know if you have questions.

Using sed

Even shorter and using the same logic:

sed -n '/c/s/c.*$/c/p' file

(same output)

Enrollment answered 14/6 at 6:51 Comment(2)
If I am not mistaken, the sed could be even shorter: sed -n 's/c.*/c/p' file, right? - Silvey
@Thefourthbird Right you are! The greedy nature of REGEX was overlooked :) - Enrollment
K
5

With your shown samples, please try the following awk code. It was written and tested with the shown samples only, in GNU awk. It sets the field separator to ^c\\/|^c/?$|\\/c[[:space:]]*$|\\/c\\/ and then, in the main program, checks whether the number of fields is greater than 1; if so, it prints the 1st field followed by /c. Otherwise, if the line matches the regex $0~/^\/?c\/?$/, it prints that line too.

awk -F'^c\\/|^c/?$|\\/c[[:space:]]*$|\\/c\\/' '
NF>1{
  print $1 "/c"
  next
}
$0~/^\/?c\/?$/
'   Input_file
Kelley answered 14/6 at 7:19 Comment(2)
I am an old-timer, I like the picket-fence look :). Glad to see you are alive and kicking! - Enrollment
@DavidC.Rankin, trying to stick around - Kelley
M
3

This could be done with just grep.

#!/bin/sh
char=$1
shift
grep -o ".*$char" "$@"

I switched the order of the arguments so that you can specify an arbitrary number of input files, like you can for all sane file-processing tools in Unix (grep, cut, awk, ls, etc etc etc).

When you specify multiple input files to grep, it will prefix each match with the name of the file which contains it. Use grep -h if you don't want that.

If there are multiple occurrences of the desired character (or really, substring), this will print up through the last one.
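To illustrate that last point with a made-up line containing two c's (not taken from the question's sample), the greedy .* runs up to the last one:

```shell
# Invented input line with two occurrences of "c".
# ".*c" is greedy, so the match extends through the LAST "c" on the line.
printf '%s\n' 'a/c/e/c/x' | grep -o '.*c'
```

Note that -o is an extension (available in GNU and BSD grep), not something POSIX guarantees.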


The immediate problem with your Awk attempt is that you loop over whitespace-separated fields, but the logic appears to assume slash-separated fields. Did you forget -F /?
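For comparison, a minimal sketch of the original ARGV-based attempt with -F / added. The function name first_through_field is made up for illustration, and the sketch assumes the input lines carry no trailing whitespace (otherwise a last field such as "c  " would not compare equal to "c").

```shell
# first_through_field: hypothetical name; keeps the question's ARGV approach.
# $1 = input file, $2 = field value to stop at
first_through_field() {
  awk -F/ '
  BEGIN {
    first = ARGV[2]   # the second operand is the value, not a file name
    ARGC = 2          # so awk only treats ARGV[1] as an input file
  }
  {
    out = ""
    for (i = 1; i <= NF; ++i) {
      # rebuild the line field by field, joined with the separator
      out = (i == 1 ? $i : out FS $i)
      if ($i == first) {   # whole-field comparison, not a substring match
        print out
        next
      }
    }
  }' "$1" "$2"
}

# Usage (file name taken from the question):
#   first_through_field 03_input.txt c
```

Lines that never contain the value as a complete field print nothing.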

Mme answered 14/6 at 6:40 Comment(0)
A
3

After the awk and grep solutions, it can also be done with sed:

#!/bin/bash
char=$1
shift
sed -n "s/$char.*/$char/p" "$@"
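One thing to watch, as far as I can tell: because $char is pasted into the s command, a / in it would clash with the delimiter. A hedged sketch that sidesteps this by using | as the delimiter instead (the function name up_to_char is made up; $char is still interpreted as a regular expression and now must not contain |):

```shell
# up_to_char: hypothetical wrapper around the sed command above.
# $1 = character to cut at; remaining args = input files (or stdin)
up_to_char() {
  char=$1
  shift
  # "|" as the s-command delimiter, so that char itself may be "/"
  sed -n "s|$char.*|$char|p" "$@"
}

# Invented sample: only the first line contains "/".
printf '%s\n' 'a/b/c/d' 'x-y-z' | up_to_char /
```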
Armijo answered 14/6 at 6:52 Comment(1)
I like the way you think. - Enrollment
S
3

In your example no field separator is set, so $1 points to the first field, which is the string with all the characters. Instead, you can set the field separator to /.

Instead of adding the character to an array, you can do a string concatenation of all the characters that should be printed when there is a match so you don't have to loop through the array to print the result.

Instead of using ARGV I would add a parameter "first" like -v first="c"

If there is a match, you can use next to stop processing and go to the next record.

awk -F/ -v first="c" '
{
  result = ""
  for(i=1;i<=NF;++i){
    result = (result == "" ? $i : result FS $i)
    if($i == first) {
      print result
      next
    }
  }
}' file

Output

a/b/i/c
a/e/b/f/r/c
a/b/c
a/c
a/e/f/c
a/n/d/c

As an alternative, if you want to print until the first occurrence of c you could use grep with a negated character class

grep -o "[^c]*c" file
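One caveat with the unanchored pattern: grep -o prints every match on a line, so a line with more than one c yields several output lines. A small sketch, assuming only the prefix through the first c is wanted, anchors the pattern at the start of the line (the input line is made up, and -o is a GNU/BSD extension):

```shell
# Unanchored: every non-overlapping match is printed on its own line.
printf '%s\n' 'a/c/e/c/x' | grep -o '[^c]*c'

# Anchored with ^: only the prefix through the first "c" is printed.
printf '%s\n' 'a/c/e/c/x' | grep -o '^[^c]*c'
```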
Silvey answered 14/6 at 6:56 Comment(2)
The grep command is useful, but I didn't want to go there because the behavior isn't what you expect for a multi-character string. For example, grep -o "^[^ex]*ex" will only match lines which do not contain any e or x before the first occurrence of ex (i.e. it will not print anything for the line fedex); if you omit the beginning-of-line anchor, it will print the text after the last e or x and up through the first ex (so dex out of fedex). - Mme
@Mme Yes, but in this case it is about a single character c. - Silvey
S
2

I want to explain the behavior of your code:

awk'  
BEGIN{  
ARGC=2  
 first = ARGV[2]  
}  
{  
for(i=1;i<=NF;++i){  
arr[i]=$i  
if($i == first){  
print arr[i]  
}    
}  
}' "$1" "$2"

You are using a for loop to iterate over fields, but you do not set FS (the field separator), so GNU AWK assumes you are dealing with a file where fields are separated by one or more whitespace characters, which for a file like

a/b/i/c/f/d  
a/e/b/f/r/c  
a/f/d/g  
a/n/m/o  
a/o/p/d/l  
a/b/c/d/e  
a/c/e/v  
a/d/l/k/f  
a/e/f/c   
a/n/d/c

means each line has 1 field (awk '{print NF}' file.txt would output just 1s).
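A quick way to see the difference, using the first line of the sample:

```shell
# Default FS (whitespace): the whole line is a single field.
printf '%s\n' 'a/b/i/c/f/d' | awk '{ print NF }'

# FS set to "/": each slash-separated piece is its own field.
printf '%s\n' 'a/b/i/c/f/d' | awk -F/ '{ print NF }'
```

The first command reports 1 field and the second reports 6.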

You might exploit GNU AWK field splitting to get your task done in the following way:

awk -v character="c" 'BEGIN{FS=character;ORS=FS"\n"}NF>1{print $1}' file.txt

which, for the file above, gives the output

a/b/i/c
a/e/b/f/r/c
a/b/c
a/c
a/e/f/c
a/n/d/c

Explanation: I inform GNU AWK that the provided character should be treated as the field separator (FS) and that the output record separator (ORS), which is added after each print, is that character followed by a newline. For each line with more than 1 field I print the 1st field. Disclaimer: this solution assumes that the character variable always holds exactly 1 character; if you are unable to meet this requirement, ignore this answer entirely.

(tested in GNU Awk 5.1.0)

Seigler answered 14/6 at 10:35 Comment(0)
G
0

The awk-based solution is either 5 bytes longer than the sed one, end-to-end, or 2 bytes shorter.

echo 'a/b/i/c/f/d  
a/e/b/f/r/c
a/f/d/g
a/n/m/o
a/o/p/d/l
a/b/c/d/e
a/c/e/v
a/d/l/k/f
a/e/f/c
a/n/d/c' | 

awk 'NF=1<NF' FS=c ORS=c\\n     # 5-bytes longer
awk 'sub(/c.*/,"c")'            # 2-bytes shorter

sed -n '/c/s/c.*$/c/p'          # credit @ David C. Rankin

a/b/i/c
a/e/b/f/r/c
a/b/c
a/c
a/e/f/c
a/n/d/c
General answered 17/6 at 11:58 Comment(0)
