I want to run ack or grep on HTML files that often have very long lines. I don't want to see very long lines that wrap repeatedly. But I do want to see just that portion of a long line that surrounds a string that matches the regular expression. How can I get this using any combination of Unix tools?
You could use the grep options -oE, possibly in combination with changing your pattern to ".{0,10}<original pattern>.{0,10}" in order to see some context around it:
-o, --only-matching Show only the part of a matching line that matches PATTERN.
-E, --extended-regexp Interpret pattern as an extended regular expression (i.e., force grep to behave as egrep).
For example (from @Renaud's comment):
grep -oE ".{0,10}mysearchstring.{0,10}" myfile.txt
Alternatively, you could try -c:
-c, --count Suppress normal output; instead print a count of matching lines for each input file. With the -v, --invert-match option (see below), count non-matching lines.
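As a rough sketch, the ".{0,N}" context pattern described above can be wrapped in a small function. The name grep_ctx and its argument order are made up for illustration. The pattern is wrapped in parentheses so that alternations such as foo|bar keep working, and -n is added so each excerpt shows which line it came from.

```shell
# Hypothetical helper (not part of grep): grep_ctx CONTEXT PATTERN [FILE...]
# Prints each match with up to CONTEXT characters on either side,
# prefixed with the line number it came from.
grep_ctx() {
    gc_n=$1
    gc_pat=$2
    shift 2
    grep -noE ".{0,${gc_n}}(${gc_pat}).{0,${gc_n}}" "$@"
}

# Demo on generated input: two matches on the same long line.
printf 'xxxxxfooyyyyyzzzzzzzzzzwwwwwfoovvvvv\n' | grep_ctx 3 foo
```

With -o, every match is printed on its own output line, so the demo prints two excerpts (1:xxxfooyyy and 1:wwwfoovvv) for the single input line.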
$ echo "eeeeeeeeeeeeeeeeeeeeqqqqqqqqqqqqqqqqqqqqMYSTRINGwwwwwwwwwwwwwwwwwwwwrrrrrrrrrrrrrrrrrrrrr" > fileonelongline.txt && grep -oE ".{0,20}MYSTRING.{0,20}" ./fileonelongline.txt
prints qqqqqqqqqqqqqqqqqqqqMYSTRINGwwwwwwwwwwwwwwwwwwww – Bebe
… -oE ".{0,20}mysearchstring.{0,20}", you lose the highlighting of the inner "original" string against the context, because the whole thing becomes the search pattern. Would love to find a way to keep some non-highlighted context around the search results, for much easier visual scanning and result interpretation. – Castaway
… -oE ".{0,x}foo.{0,x}" approach (where x is the number of characters of context) -- append ` | grep foo ` to the end. Works for either ack or grep solutions. More solutions also here: unix.stackexchange.com/questions/163726/… – Castaway
Pipe your results through cut. I'm also considering adding a --cut switch so you could say --cut=80 and only get 80 columns.
… | cut=c1-120 to the grep, worked for me (though don't know how to cut around matched text) – Joost
| cut=c1-120 didn't work for me, I needed to do | cut -c1-120 – Necessitarianism
| cut -c 1-100 https://mcmap.net/q/120607/-how-to-truncate-long-matching-lines-returned-by-grep-or-ack – Khorma
… --no-wrap option that uses $COLUMNS? – Hurd
… $COLUMNS – Arette
You could use less as a pager for ack and chop long lines: ack --pager="less -S"
This retains the long line but leaves it on one line instead of wrapping. To see more of the line, scroll left/right in less with the arrow keys.
I have the following alias setup for ack to do this:
alias ick='ack -i --pager="less -R -S"'
… --pager command in your ~/.ackrc file, if you always want to use it. – Arette
… ack. – Generation
… ack is pretty much just like grep, only simpler in the most common cases – Castaway
grep -oE ".{0,10}error.{0,10}" mylogfile.txt
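In the unusual situation where you cannot use -E, note that lowercase -e only introduces the pattern and does not enable extended syntax. With basic regular expressions, escape the braces instead: grep -o ".\{0,10\}error.\{0,10\}" mylogfile.txt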
… grep -oE ".{0,10}error.{0,10}" mylogfile.txt - at least in Z shell – Keeney
To get characters from 1 to 100.
cut -c 1-100
You might want to base the range off the current terminal, e.g.
cut -c 1-$(tput cols)
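A minimal sketch of the same idea as a reusable filter. The function name truncate_to_width and the fallback width of 80 are assumptions for illustration. It takes an explicit width if given, otherwise tries $COLUMNS and then tput cols.

```shell
# Hypothetical helper: truncate_to_width [WIDTH]
# Reads lines on stdin and keeps only the first WIDTH characters of each.
truncate_to_width() {
    # Width precedence: explicit argument, then $COLUMNS, then tput, then 80.
    ttw_width=${1:-${COLUMNS:-$(tput cols 2>/dev/null || echo 80)}}
    cut -c "1-${ttw_width}"
}
```

It could then be used as grep pattern file | truncate_to_width. Note that cut always counts from the start of the line, so a match far to the right of a long line will still be cut off.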
I put the following into my .bashrc:
grepl() {
    $(which grep) --color=always "$@" | less -RS
}
You can then use grepl on the command line with any arguments that are available for grep. Use the arrow keys to see the tail of longer lines. Use q to quit.
Explanation:
grepl() {: Define a new function that will be available in every (new) bash console.
$(which grep): Get the full path of grep. (Ubuntu defines an alias for grep that is equivalent to grep --color=auto. We don't want that alias but the original grep.)
--color=always: Colorize the output. (--color=auto from the alias won't work since grep detects that the output is put into a pipe and won't color it then.)
"$@": Put all arguments given to the grepl function here, quoted so that arguments containing spaces stay intact.
less: Display the lines using less.
-R: Show colors.
-S: Don't break long lines.
Taken from: http://www.topbug.net/blog/2016/08/18/truncate-long-matching-lines-of-grep-a-solution-that-preserves-color/
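One possible refinement, offered as a sketch rather than as part of the original answer: only start the pager when standard output is a terminal, so that the function still behaves like plain grep inside pipelines and scripts. This variant uses the shell builtin command in place of $(which grep). That is a substitution on my part, not what the answer above does.

```shell
# Sketch: page through less only when writing to a terminal.
grepl() {
    if [ -t 1 ]; then
        # Interactive use: force color and let less chop long lines.
        command grep --color=always "$@" | less -RS
    else
        # Piped or redirected: behave like ordinary grep.
        command grep "$@"
    fi
}
```

With this version, grepl pattern file | wc -l counts matching lines without color escape sequences or a pager getting in the way.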
The suggested approach ".{0,10}<original pattern>.{0,10}" is perfectly good, except that the highlighting color is often messed up. I've created a script with similar output that also preserves the color:
#!/bin/bash
# Usage:
# grepl PATTERN [FILE]
# how many characters around the searching keyword should be shown?
context_length=10
# What is the length of the control character for the color before and after the
# matching string?
# This is mostly determined by the environmental variable GREP_COLORS.
control_length_before=$(($(echo a | grep --color=always a | cut -d a -f '1' | wc -c)-1))
control_length_after=$(($(echo a | grep --color=always a | cut -d a -f '2' | wc -c)-1))
grep -E --color=always "$1" ${2:+"$2"} |
grep --color=none -oE \
".{0,$(($control_length_before + $context_length))}$1.{0,$(($control_length_after + $context_length))}"
Assuming the script is saved as grepl, then grepl pattern file_with_long_lines should display the matching lines but with only 10 characters around the matching string.
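A lighter-weight variant, following the earlier comment that suggests appending | grep foo: trim to the context window first without color, then run a second grep on the inner pattern only, so that just the original match is highlighted. The function name grep_ctx_hl and its argument order are hypothetical.

```shell
# Hypothetical helper: grep_ctx_hl CONTEXT PATTERN [FILE...]
grep_ctx_hl() {
    gh_n=$1
    gh_pat=$2
    shift 2
    # Stage 1 trims each match to its context window (uncolored, since it
    # writes to a pipe). Stage 2 re-matches only the inner pattern, so only
    # that part is highlighted when the output goes to a terminal.
    grep -oE ".{0,${gh_n}}(${gh_pat}).{0,${gh_n}}" "$@" |
        grep --color=auto -E "${gh_pat}"
}
```

Unlike the script above, this does not need to measure the length of the color escape sequences, at the cost of running the pattern twice.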
The Silver Searcher (ag) supports this natively via the --width NUM option. It replaces the rest of longer lines with [...].
Example (truncate after 120 characters):
$ ag --width 120 '@patternfly'
...
1:{"version":3,"file":"react-icons.js","sources":["../../node_modules/@patternfly/ [...]
In ack3, a similar feature is planned but currently not implemented.
… ag, the width "begins" with first character, so this doesn't quite work when the string is in the middle of a very long line – Lotti
Here's what I do:
function grep () {
tput rmam;
command grep "$@";
tput smam;
}
In my .bash_profile, I override grep so that it automatically runs tput rmam before and tput smam after, which disables wrapping and then re-enables it.
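Two caveats with the override above: the function returns the exit status of tput smam rather than that of grep, and the tput escape sequences are written even when the output is piped. A hedged sketch that addresses both follows. The name grep_nowrap is an assumption, chosen so that grep itself is not shadowed.

```shell
# Sketch: disable line wrapping around grep, but only on a terminal,
# and hand back grep's own exit status.
grep_nowrap() {
    if [ -t 1 ]; then
        tput rmam 2>/dev/null || true   # turn automatic margins (wrapping) off
    fi
    gnw_status=0
    command grep "$@" || gnw_status=$?
    if [ -t 1 ]; then
        tput smam 2>/dev/null || true   # turn wrapping back on
    fi
    return "$gnw_status"
}
```

Because the status is preserved, constructs such as if grep_nowrap -q pattern file; then ... keep working as they would with plain grep.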
ag can also take the regex trick, if you prefer it:
ag --column -o ".{0,20}error.{0,20}"
bgrep if lines don't necessarily fit into memory
grep only works if the lines fit into memory, but bgrep also works on huge lines that don't.
I keep coming back to this random repo from time to time: https://github.com/tmbinc/bgrep
Install:
curl -L 'https://github.com/tmbinc/bgrep/raw/master/bgrep.c' | gcc -O2 -x c -o $HOME/.local/bin/bgrep -
Use:
bgrep `printf %s saf | od -t x1 -An -v | tr -d '\n '` myfile.bin
Sample output:
myfile.bin: c80000003
\x02abc
myfile.bin: c80000007
dabc
I have tested it on files that don't fit into memory, and it worked just fine.
I've given further details at: https://unix.stackexchange.com/questions/223078/best-way-to-grep-a-big-binary-file/758528#758528
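The printf | od | tr pipeline in the usage line converts the search string into the hex form that bgrep takes as its pattern. As a small sketch, it can be pulled into a helper. Both function names below are hypothetical, and bgrep_text assumes bgrep is installed on your PATH as in the install step above.

```shell
# Convert a literal string to a hex string, e.g. abc -> 616263.
to_hex() {
    printf %s "$1" | od -t x1 -An -v | tr -d '\n '
}

# Hypothetical wrapper: bgrep_text STRING FILE
# Searches FILE for the literal bytes of STRING using bgrep.
bgrep_text() {
    bgrep "$(to_hex "$1")" "$2"
}
```

For example, bgrep_text abc myfile.bin would run bgrep 616263 myfile.bin.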
… ack? Is it a command you use when you don't like something? Something like ack file_with_long_lines | grep pattern? :-) – Armstead
ack (known as ack-grep on Debian) is grep on steroids. It also has the --thpppt option (not kidding). betterthangrep.com – Spoof
… --thpppt feature is somewhat controversial, the key advantage appears to be that you can use Perl regexes directly, not some crazy [[:space:]] and characters like {, [, etc. changing meaning with the -e and -E switches in a way that's impossible to remember. – Badalona
… grep --color=always | less -S -R. Then, type -R to unfold/fold the lines. – Quimper