Using grep to search for hex strings in a file
N

6

51

Does anyone know how to get grep, or similar tool, to retrieve offsets of hex strings in a file?

I have a bunch of hexdumps (from GDB) that I need to check for strings and then run again and check if the value has changed.

I have tried hexdump and dd, but because their output is a stream, I lose the offsets into the files.

Someone must have had this problem and a workaround. What can I do?

To clarify:

  • I have a series of dumped memory regions from GDB (typically several hundred MB)
  • I am trying to narrow down a number by searching for all the places the number is stored, then doing it again and checking if the new value is stored at the same memory location.
  • I cannot get grep to do anything because I am looking for hex values so all the times I have tried (like a bazillion, roughly) it will not give me the correct output.
  • The hex dumps are just complete binary files; the patterns are float values, so at the largest 8 bytes?
  • The patterns are not line-wrapping, as far as I am aware. I know what the value changes to, and I can repeat the same process and compare the lists to see which offsets match.

Perl COULD be an option, but at this point, I would assume my lack of knowledge of bash and its tools is the main culprit.

Desired output format

It's a little hard to explain the output I am getting since I really am not getting any output.

I am anticipating (and expecting) something along the lines of:

<offset>:<searched value>

which is pretty much the standard output I would normally get with grep -URbFo <searchterm> . > <output>

What I tried:

A. The problem is that when I search for hex values, grep does not treat them as hex. If I search for 00 I should get like a million hits, because that's always the blank space, but instead it searches for 00 as text, which in hex is 3030. Any ideas?

B. I CAN force it through hexdump or something of the like, but because it's a stream it will not give me the offsets and the filename that it found a match in.

C. Using the grep -b option doesn't seem to work either. I did try all the flags that seemed useful to my situation, and nothing worked.

D. Using xxd -u /usr/bin/xxd as an example, I get an output that would be useful, but I cannot use that for searching.

0004760: 73CC 6446 161E 266A 3140 5E79 4D37 FDC6  s.dF..&j1@^yM7..
0004770: BF04 0E34 A44E 5BE7 229F 9EEF 5F4F DFFA  ...4.N[."..._O..
0004780: FADE 0C01 0000 000C 0000 0000 0000 0000  ................

Nice output, just what I want to see, but it just doesn't work for me in this situation..

E. Here are some of the things I've tried since posting this:

xxd -u /usr/bin/xxd | grep 'DF'
00017b0: 4010 8D05 0DFF FF0A 0300 53E3 0610 A003  @.........S.....

root# grep -ibH "df" /usr/bin/xxd
Binary file /usr/bin/xxd matches
xxd -u /usr/bin/xxd | grep -H 'DF'
(standard input):00017b0: 4010 8D05 0DFF FF0A 0300 53E3 0610 A003  @.........S.....
Nora answered 12/6, 2011 at 3:5 Comment(8)
I think we need a clearer walkthrough of what you are doing. – Gutbucket
Is it a stream or is it a file? What have you tried, what output do you expect, and what are you getting for output? Good luck! – Jerky
What format do the hex dumps take? Do the patterns that you are looking for wrap around lines? Is there an offset at the start of each line? How long is the pattern you are looking for? When the pattern changes, do you know what it changes to? How big are the hex dumps? Did you consider using Perl? – Zimbabwe
See the post #1733958. This has helpful formatting and examples of hex data. It is still too hard to tell what you're trying to accomplish and what your problems are. Questions in the form of 1. I have this input, 2. I want this output, 3. (but) I'm getting this output, 4. with this code .... {code here} .... have a much better chance of getting a reasonable response in a reasonable amount of time ;-) Good luck. – Jerky
Also, do you know about grep -b srchTarget file file ...? The -b means binary search. Reading the man page for GNU grep doesn't make me certain that it will help with your situation, but it's worth a try. (GNU grep man page seems to say that -b is for DOS versus Unix line endings. I had assumed from other posts here on S.O. that it would also deal with NUL (\000) chars, like in a hex dump. I don't have a way to test this right now.) Good luck. – Jerky
So xxd -u /usr/bin/xxd | grep 'srchTarg' doesn't give you what you want? With srchTarg being the hex string you are looking for? Give us an example besides 3030. Also please compose a sample of the output you need and paste it in using the formatting tools at the top of the edit box; hover over {} and you'll see 'Code Sample ..'. Good luck. – Jerky
@user650649: Does @jm666's answer help you? Thank you for updating your posting, BUT you didn't tell us what is wrong with the output from xxd -u /usr/bin/xxd | grep 'DF'. And if you don't like the output from xxd -u /usr/bin/xxd | grep -H 'DF' because of the (standard input): header, then try xxd -u /usr/bin/xxd > /tmp/xxd.hex ; grep -H 'DF' /tmp/xxd.hex. Good luck. – Jerky
xxd -u /usr/bin/xxd > /tmp/xxd.hex ; grep -H 'DF' /tmp/xxd.hex looks like it might just work, using the -ps flag instead to get a direct output. The only problem I am running into is that for some reason I am getting a . every 30 bytes. I found a way around it using -c 10000000000; it's not a very elegant solution, but it works. Thanks so much for all the help guys, you got it for me shelter! – Nora
J
15

We tried several things before arriving at an acceptable solution:

xxd -u /usr/bin/xxd | grep 'DF'
00017b0: 4010 8D05 0DFF FF0A 0300 53E3 0610 A003  @.........S.....


root# grep -ibH "df" /usr/bin/xxd
Binary file /usr/bin/xxd matches
xxd -u /usr/bin/xxd | grep -H 'DF'
(standard input):00017b0: 4010 8D05 0DFF FF0A 0300 53E3 0610 A003  @.........S.....

Then found we could get usable results with

xxd -u /usr/bin/xxd > /tmp/xxd.hex ; grep -H 'DF' /tmp/xxd.hex

Note that using a simple search target like 'DF' will incorrectly match characters that span across byte boundaries, i.e.

xxd -u /usr/bin/xxd | grep 'DF'
00017b0: 4010 8D05 0DFF FF0A 0300 53E3 0610 A003  @.........S.....
--------------------^^

So we use an ORed regexp to search for ' DF' OR 'DF ' (the searchTarget preceded or followed by a space char).

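The final result seems to be

xxd -u -ps -c 10000000000 DumpFile > DumpFile.hex
egrep ' DF|DF ' DumpFile.hex

Note that -ps produces bare hex digits with no offsets or spaces, so the space-anchored pattern and the sample output below apply to the default xxd -u layout (that is, without -ps -c ...).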

0001020: 0089 0424 8D95 D8F5 FFFF 89F0 E8DF F6FF  ...$............
-----------------------------------------^^
0001220: 0C24 E871 0B00 0083 F8FF 89C3 0F84 DF03  .$.q............
--------------------------------------------^^
Jerky answered 13/6, 2011 at 0:23 Comment(2)
What I actually ended up using for xxd was: xxd -ps -u -c 100000000000000000000 input.file > output.file in order to get rid of the excess information and give me raw hex. This gave me a way to use grep to search the hex itself, but when it returns an offset, remember to divide the offset by 2 to get the actual offset. Thank you so much for your help! Oh, and I can't vote up yet. – Nora
xxd has a -g option which will help you prevent matching across two bytes, i.e. use xxd -g1 instead of xxd. – Bozen
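
The plain-hex approach from the comments above can be sketched as follows. This is only a minimal sketch, not the commenters' exact commands: it assumes od and tr as a stand-in for xxd -ps -c <large> (both give one long line of hex digits), and the sample bytes and variable names are made up for illustration. It halves each character offset to get a byte offset, and it drops odd character offsets, which are hits that straddle two bytes.

```shell
# Hypothetical sample file holding the bytes 00 0D F0 DF 00 DF (octal escapes).
sample=$(mktemp)
printf '\000\015\360\337\000\337' > "$sample"

# One long line of hex digits, no offsets or spaces (assumed stand-in for
# the `xxd -ps -c <large>` output described in the comments).
hex=$(od -An -v -tx1 "$sample" | tr -d ' \n')

# grep -b -o prints "<character offset>:<match>" for every hit in the hex text.
# Odd character offsets straddle two bytes (here "0d f0"), so keep only the
# even ones and halve them to get the byte offset in the original file.
offsets=$(printf '%s\n' "$hex" | grep -b -o 'df' | awk -F: '$1 % 2 == 0 { print $1 / 2 }')

printf '%s\n' "$offsets"
rm -f "$sample"
```

For this sample the sketch prints 3 and 5; the hit at character offset 3, which spans the bytes 0D F0, is discarded. Because grep -o reports non-overlapping matches, longer patterns with repeated digits may need extra care.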
F
87

This seems to work for me:

LANG=C grep --only-matching --byte-offset --binary --text --perl-regexp "<\x-hex pattern>" <file>

short form:

LANG=C grep -obUaP "<\x-hex pattern>" <file>

Example:

LANG=C grep -obUaP "\x01\x02" /bin/grep

Output (cygwin binary):

153: <\x01\x02>
33210: <\x01\x02>
53453: <\x01\x02>

So you can grep this again to extract offsets. But don't forget to use binary mode again.

Note: LANG=C is needed to avoid utf8 encoding issues.
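
As a rough sketch of how the offsets could be extracted and compared between two dumps (the workflow described in the question), assuming GNU grep built with -P support: the file names, sample bytes, and the find_offsets helper are hypothetical. It sets LC_ALL=C rather than LANG=C, on the assumption that LC_ALL may already be set in the environment and would otherwise take precedence.

```shell
# Two hypothetical dumps: the pattern 01 02 sits at offsets 2 and 6 in the
# first file, and at offsets 2 and 8 in the second.
dir=$(mktemp -d)
printf 'AA\001\002BB\001\002' > "$dir/dump1.bin"
printf 'AA\001\002BBCC\001\002' > "$dir/dump2.bin"

# Hypothetical helper: print only the byte offset of each match
# (the part of grep's output before the first colon).
find_offsets() {
    LC_ALL=C grep -obUaP "$1" "$2" | cut -d: -f1
}

find_offsets '\x01\x02' "$dir/dump1.bin" > "$dir/run1.txt"
find_offsets '\x01\x02' "$dir/dump2.bin" > "$dir/run2.txt"

# Offsets present in both runs (whole-line, fixed-string comparison).
common=$(grep -Fx -f "$dir/run1.txt" "$dir/run2.txt")
printf '%s\n' "$common"
rm -rf "$dir"
```

Here only offset 2 appears in both lists, so that is the one location where the value is stored in both dumps.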

Flavopurpurin answered 18/6, 2013 at 12:27 Comment(10)
Caveat: Darwin's (OS X's) and hence presumably also BSD's grep does not have the --perl-regexp option. – Courtnay
What am I missing? xxd -u system.raw.img.tmp | grep 53EF | wc -l gives me 2105, grep -obUaP "\x53\xEF" system.raw.img.tmp | wc -l gives me 18. – Verse
Got it; UTF-8 messes this up. This works: LANG=C grep -obUaP "\x53\xEF" system.raw.img.tmp – Verse
Unfortunately I can't use this to search hex strings that have \x0A in them. I had to write my own hex search tool. – Clercq
Wow, this actually works with the Windows port of grep out of the box ^^ – Centavo
Had almost the same issue: I wanted to extract the list of files from a zip file, but could not get that part of the file directly into a variable, as the null bytes were removed. So my solution is to find the byte offsets of the start and end of that part, store them in variables, advance the address to the appropriate place, then extract that part with dd. grep -boaP "\x50\x4b\x05\x06" weirdfiles.zip | cut -d: -f1 Result: 100483 Parameters: b gives the byte offset, o shows only the match and reports the match start instead of the line start, a searches binary files (ones with non-printable characters), P enables Perl regex. – Byandby
Thanks, it works! I am able to find every file that has the hex byte sequence 46 41 43 00 using this command: find . -type f -exec grep -q -obUaP "\x46\x41\x43\x00" {} \; -exec echo {} \; – Omega
You should get double points for "LANG=C", since otherwise grep can't find bytes that are plain as day in hexdump. – Denudation
NOTE: this does not work reliably!!! I'm searching .o files plus the resulting files. Only some .o files show up but the resulting file. So there are still some issues. \x04\xe7\x88\x2f\x00\x2f\x2a does not work, but \xe7\x88\x2f\x00\x2f\x2a finds more results, despite the \x04 being present. – Engraving
@truthadjustr to list matching files, you should use the option -l, e.g.: find . -type f -exec grep -lUaP "\x46\x41\x43\x00" {} +, or you can use GNU parallel: find . -type f | parallel grep -lUaP '"\x46\x41\x43\x00"' – Moonshot
A
26

There's also a pretty handy tool called binwalk, written in Python, which provides binary pattern matching (and quite a lot more besides). Here's how you would search for a binary string; the output gives the offset in decimal and hex (from the docs):

$ binwalk -R "\x00\x01\x02\x03\x04" firmware.bin
DECIMAL     HEX         DESCRIPTION
--------------------------------------------------------------------------
377654      0x5C336     Raw string signature
Allisan answered 10/12, 2013 at 13:7 Comment(0)
A
11

grep has a -P switch that enables Perl regexp syntax, and Perl regexps can match individual bytes using the \x.. syntax.

So you can look for a given hex string in a file with: grep -aP "\xdf"

But the output won't be very useful; it is better to run a regexp over the hexdump output.

grep -P can be useful, however, to just find files matching a given binary pattern, or to do a binary query for a pattern that actually occurs in text (see for example How to regexp CJK ideographs (in utf-8)).
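
A minimal sketch of the "find files matching a binary pattern" use, assuming GNU grep with -P support; the directory, file names, and byte values are hypothetical. It adds -r and -l to the command above (so grep recurses and prints only the names of matching files), and sets LC_ALL=C so the pattern is matched byte by byte.

```shell
# Two hypothetical files: one contains the byte pair DE AD, the other
# contains the same bytes in the opposite order (AD DE).
dir=$(mktemp -d)
printf 'xx\336\255yy' > "$dir/has_pattern.bin"
printf 'xx\255\336yy' > "$dir/no_pattern.bin"

# -r: recurse into the directory, -l: print only names of matching files,
# -a: treat binary files as text, -P: Perl regexp so \xNN matches a byte.
matches=$(LC_ALL=C grep -rlaP '\xde\xad' "$dir")

printf '%s\n' "$matches"
rm -rf "$dir"
```

Only has_pattern.bin is listed, since byte order matters for the match.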

Alleman answered 22/2, 2013 at 10:3 Comment(0)
L
9

I just used this:

grep -c $'\x0c' filename

To search for a page control character (form feed) in the file and count the lines that contain it.

So to include an offset in the output:

grep -b -o $'\x0c' filename | less

I am just piping the result to less because the character I am grepping for does not print well, and less displays the results cleanly. Output example:

21:^L
23:^L
2005:^L
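
One detail worth spelling out: grep -c counts matching lines, not individual occurrences. A small sketch of the difference, using a made-up sample file; it builds the form feed with printf instead of $'\x0c' (an assumption made so the sketch also runs in shells without that quoting style), and adds -a in case grep decides the file is binary.

```shell
# Hypothetical sample: three form feeds (octal 014) spread over two lines.
tmp=$(mktemp)
printf 'page1\014page2\014\npage3\014\n' > "$tmp"

# The form feed character itself, built portably.
ff=$(printf '\014')

# -c counts lines that contain at least one form feed.
lines_with_ff=$(grep -a -c "$ff" "$tmp")

# -o prints each occurrence on its own line, so wc -l counts occurrences.
total_ff=$(grep -a -o "$ff" "$tmp" | wc -l | tr -d ' ')

# -b -o prints "<byte offset>:<match>"; keep only the offsets.
offsets=$(grep -a -b -o "$ff" "$tmp" | cut -d: -f1)

echo "lines: $lines_with_ff, occurrences: $total_ff"
printf '%s\n' "$offsets"
rm -f "$tmp"
```

For this sample, two lines contain a form feed, there are three occurrences in total, and they sit at byte offsets 5, 11, and 18.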
Lepto answered 20/5, 2014 at 11:0 Comment(2)
Just to point out, this doesn't seem to work in OSX, and maybe not in FreeBSD either, but it does if you just replace the single quotes with double ones, as in: grep -c $"\x0c" filename – Trometer
You may pipe the result through hexdump; this will deal with non-printable chars. – Meletius
H
7

If you want to search for printable strings, you can use:

strings -ao filename | grep string

strings will output all printable strings from a binary with offsets, and grep will search within.

If you want to search for any binary string, here is your friend: bgrep

Haemophilic answered 12/6, 2011 at 7:42 Comment(1)
I looked at bgrep; it's not printable strings I'm looking for. A lot of the time I am trying to find things that are unprintable, and it's certainly possible to end up with values in hex that end up being backspaces etc. I will try bgrep and see if it works for me. – Nora
