Regular expression to only match X number of characters from end of line

Below you'll see a small excerpt of matches from the string 'octeon' in a 32b memory dump from a proprietary routing device. As you can see it contains some adjusted ASCII extending to 16 characters from the end of the line, then four 32-bit words (8 characters each, of course), then the address offset.

000b27a0: 41646a75 7374206f 6374656f 6e5f6970    Adjust octeon_ip
000b2850: 73740a00 00000000 6f637465 6f6e5f72    st......octeon_r
000b2870: 5f73697a 65000000 6f637465 6f6e5f72    _size...octeon_r
000b2990: 6164696e 672e0a00 6f637465 6f6e5f72    ading...octeon_r
000b29b0: 785f7369 7a650000 6f637465 6f6e5f72    x_size..octeon_r
000b3050: 780a0000 00000000 6f637465 6f6e5f70    x.......octeon_p
000b3650: 6564204f 6374656f 6e206d6f 64656c0a    ed Octeon model.
000bade0: 20307825 71780a00 6f637465 6f6e5f6c     0x%qx..octeon_l
000bafd0: 696e6720 4f637465 6f6e2045 78656375    ing Octeon Execu
000bd710: 6564204f 6374656f 6e204d6f 64656c21    ed Octeon Model!
000bd950: 4f435445 4f4e2070 61737320 3120646f    OCTEON pass 1 do
000bda20: 6564206f 6374656f 6e206d6f 64656c3a    ed octeon model:

While that data contains some useful information, tragically, the operating system (HiveOS) makes no attempt to allocate memory contiguously or to coalesce disparate heaps (and why should they?), so the vast majority of memory is a barren yet-to-be-malloc'd heap.

0004d6b0: 00000000 00000000 00000000 00000000    ................
0004d6c0: 00000000 00000000 00000000 00000000    ................
0004d6d0: 00000000 00000000 00000000 00000000    ................
0004d6e0: 00000000 00000000 00000000 00000000    ................
0004d6f0: 00000000 00000000 00000000 00000000    ................
0004d700: 00000000 00000000 00000000 00000000    ................
0004d710: 00000000 00000000 00000000 00000000    ................
0004d720: 00000000 00000000 00000000 00000000    ................
0004d730: 00000000 00000000 00000000 00000000    ................
0004d740: 00000000 00000000 00000000 00000000    ................
0004d750: 00000000 00000000 00000000 00000000    ................

I'd like to quickly and efficiently pull out strings of a certain size matching some arbitrary regular expression pattern ([a-zA-Z] comes to mind). You might naturally think that running the perennial object dump examination favorite 'strings' would yield a result, but the md util is a cruel mistress -- due to the presence of ASCII-coded hexadecimal banks & addresses, it identifies every line as containing a 'string'.

Sure, we all know there exists a trivial scripting solution (for line in hexdump: f.write(line[-16:]), then grep '[A-Za-z]' f).
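The scripting route mentioned above might look roughly like this. It is only a sketch: the three sample rows are copied from the dump in the question, and the function name `ascii_columns_with_letters` is made up for illustration.

```python
import re

# Three rows copied from the dump above: two with text, one all-zero row.
DUMP = """\
000b27a0: 41646a75 7374206f 6374656f 6e5f6970    Adjust octeon_ip
0004d6b0: 00000000 00000000 00000000 00000000    ................
000bade0: 20307825 71780a00 6f637465 6f6e5f6c     0x%qx..octeon_l
"""

def ascii_columns_with_letters(dump_text, width=16):
    """Return the last `width` characters of each line that contain a letter."""
    letter = re.compile(r"[A-Za-z]")
    columns = (line[-width:] for line in dump_text.splitlines())
    return [col for col in columns if letter.search(col)]

print(ascii_columns_with_letters(DUMP))
# ['Adjust octeon_ip', ' 0x%qx..octeon_l']
```

Slicing first and matching second means the hex words and the address never reach the regex, which is exactly what 'strings' cannot do on its own.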

However, sometimes I'm struck with the feeling that I should come to understand these dastardly oppressive, yet misunderstood regular expressions better, rather than slinking back to my easy-to-use newfangled programmin' languages. I really feel I can't start growing a real Unix neckbeard until I've completely replaced my entire development toolchain life with various stream editors and Awk scripts' regular expressions.

How does one match [a-zA-Z] within a certain number of characters from the end of the line (in my case, 16)? It seems like a pretty pithy construction, but all combinations of +, ?, {16} and otherwise that made sense to me in the past few minutes have promptly failed.

Ghyll answered 15/3, 2012 at 17:25 Comment(1)
Maybe I've misunderstood the request. Do you want to get the last 16 characters from each line, or extract only the letters from the last 16 characters? - Oro

Use the "non-matching" switch -v:

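grep -v '\.\{16\}$'

This will strip out all lines ending with 16 dots. (The pattern is quoted so that the shell passes the backslashes through to grep, and the braces are escaped because grep uses basic regular expressions by default; with grep -E you would write '\.{16}$' instead.)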

Here's the man documentation for it:

-v, --invert-match
Invert the sense of matching, to select non-matching lines.
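The same inverted match can be sketched outside grep to see which rows survive. This assumes Python's `re` module rather than grep itself, and the names `ALL_DOTS_TAIL` and `drop_blank_rows` are hypothetical.

```python
import re

# A line "matches" when its last 16 characters are all literal dots.
ALL_DOTS_TAIL = re.compile(r"\.{16}$")

def drop_blank_rows(lines):
    """Keep only the lines that do NOT end in 16 dots (the effect of grep -v)."""
    return [line for line in lines if not ALL_DOTS_TAIL.search(line)]

rows = [
    "000b2850: 73740a00 00000000 6f637465 6f6e5f72    st......octeon_r",
    "0004d6c0: 00000000 00000000 00000000 00000000    ................",
]
print(drop_blank_rows(rows))
```

One caveat with this approach: a row of 16 non-printable bytes that are not zero is also rendered as 16 dots, so it would be dropped along with the empty heap.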

Candide answered 16/3, 2012 at 23:15 Comment(0)

Does this do what you want? ".{16}$"

That will match any 16 characters from the end of the line. The $ ensures it matches the end of the line.


After closer inspection, if you want to extract only the lines that are not all periods, you could use this regex: " {4}(.*?\w.*?)$" There is a space before the {4} so that it matches the delimiter between the digits and the end of the line. It's not technically "only 16 characters," but given the data set, it does appear to provide the desired output. (Assuming the desired output is any line that has a word character in it, which is letters/numbers/underscore.)
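A quick way to see what that second pattern captures is to run it over a few rows from the question. This sketch assumes Python's `re` module, which supports the lazy `.*?` and `\w` used here (plain POSIX grep does not define lazy quantifiers); `PATTERN` and `ascii_column` are illustrative names.

```python
import re

# Four spaces (the gap before the ASCII column), then anything up to the
# end of the line, provided it contains at least one word character.
PATTERN = re.compile(r" {4}(.*?\w.*?)$")

def ascii_column(line):
    """Return the captured ASCII column, or None for rows with no word character."""
    match = PATTERN.search(line)
    return match.group(1) if match else None

text_row = "000b27a0: 41646a75 7374206f 6374656f 6e5f6970    Adjust octeon_ip"
zero_row = "0004d6b0: 00000000 00000000 00000000 00000000    ................"

print(ascii_column(text_row))  # Adjust octeon_ip
print(ascii_column(zero_row))  # None
```

The pattern relies on the four-space gap appearing only once per row, which holds for this dump because the hex words are separated by single spaces.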

Oro answered 15/3, 2012 at 17:34 Comment(6)
You're on the right track, but applying [a-zA-Z]{16}$ doesn't work :( - Ghyll
It looks like you want a result whenever it's not ................, so you could use a negative lookahead to make sure that doesn't match: (?![.]{16}).{16}$, which should match the last 16 characters on each line, unless it's a string of 16 periods. - Old
@Old - I ran a quick test with (?![.]{16}).{16}$, but that seems to still return even the lines with all periods. - Oro
You may need to enable multi-line mode (how to do this depends on what you are using to run the regex) for the $ anchor to behave as expected in this situation. - Old
Actually, if you are just using grep, I don't think it supports lookaheads at all, so that won't work. You could consider just running grep with the initial regex suggested (.{16}$), and then filter out the lines with only periods with a grep -v? - Old
[a-zA-Z]{16}$ won't work because your original data set has more than just alphabetic characters in the last 16 spots on each line. They are intermixed with periods, spaces, underscores, and some other characters. - Oro
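For what it is worth, the lookahead pattern from these comments can be checked in an engine that does support lookaheads. The sketch below assumes Python's `re` module, which is not the tool anyone in the thread named; `TAIL` and `tail_unless_blank` are made-up names. It also shows the multi-line point: when the whole dump is searched as one string, `$` needs `re.MULTILINE` to match at every line end.

```python
import re

# Last 16 characters of a line, unless those 16 characters are all dots.
TAIL = re.compile(r"(?![.]{16}).{16}$")

def tail_unless_blank(line):
    """Return the last 16 characters, or None when they are 16 dots."""
    match = TAIL.search(line)
    return match.group(0) if match else None

text_row = "000b3650: 6564204f 6374656f 6e206d6f 64656c0a    ed Octeon model."
zero_row = "0004d6b0: 00000000 00000000 00000000 00000000    ................"

print(tail_unless_blank(text_row))  # ed Octeon model.
print(tail_unless_blank(zero_row))  # None

# Searching the whole dump at once: $ must be allowed to match at each newline.
dump = text_row + "\n" + zero_row + "\n" + text_row
tails = re.findall(r"(?![.]{16}).{16}$", dump, flags=re.MULTILINE)
print(tails)
```

The lookahead works here because `.{16}$` can only start 16 characters before the end of a line, so the lookahead inspects exactly the characters that would be returned.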

A cheap trick to filter interesting lines is to pad the match with any character up to the end of the line. Here I select a character which is not a period and which is no further than 15 characters from the end of the line. (You are using POSIX basic regex, so the repetition quantifier must be written between \{ \} and not { }.)

grep '[^.].\{0,15\}$'

The lower bound is 0 so that a match in the very last column counts too. Then you can pipe the result into another grep to test further, or you can adapt the idea to another regex:

grep 'abc.\{0,13\}$'

will match the string "abc" in the last 16 characters.
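The padding idea generalizes: a literal of length n fits inside the last 16 characters when at most 16 - n characters follow it. A minimal sketch, assuming Python's `re` module in place of grep and a lower bound of 0 so that a match ending exactly at the end of the line also counts; `within_tail` and `NON_DOT_TAIL` are hypothetical names.

```python
import re

def within_tail(literal, width=16):
    """Compile a regex matching `literal` entirely inside the last `width` characters."""
    slack = width - len(literal)
    if slack < 0:
        raise ValueError("literal is longer than the tail window")
    # The literal, then at most `slack` arbitrary characters, then end of line.
    return re.compile(re.escape(literal) + ".{0,%d}$" % slack)

# Any non-dot character somewhere in the last 16 characters.
NON_DOT_TAIL = re.compile(r"[^.].{0,15}$")

text_row = "000b2850: 73740a00 00000000 6f637465 6f6e5f72    st......octeon_r"
zero_row = "0004d6b0: 00000000 00000000 00000000 00000000    ................"

print(bool(NON_DOT_TAIL.search(text_row)))           # True
print(bool(NON_DOT_TAIL.search(zero_row)))           # False
print(bool(within_tail("octeon").search(text_row)))  # True
```

In the all-zero row the nearest non-dot character is the space 17 characters from the end, one position outside the window, which is why that row is rejected.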

Carriecarrier answered 16/3, 2012 at 22:46 Comment(0)
