How to analyze a binary file?
S

13

40

I have a binary file. I don't know how it's formatted; I only know it comes from a Delphi program.

Is there any way to analyze a binary file?

Is there any "pattern" for analyzing and deserializing the binary content of a file with an unknown format?

Stabler answered 22/6, 2009 at 8:24 Comment(4)
Could you tell us more about this Delphi code?Azarcon
It's a Delphi program that lets you create exam tests. The file it produces is a binary one.Stabler
Do you have access to a program that can read this file type and display the exam? If so that will make your reversing experience MUCH easier as you can hook into that app and watch what it does.Winfrid
Voting to close as unclear / too wide. What do you want to analyze?Batchelder
S
30

Try these:

  1. Deserialize the data: find out how the exe was compiled (try File Analyzer). Try to deserialize the binary data with the language you discover. Then serialize it to XML, a language-independent format that any programming language can read.
  2. Analyze the binary data: save several versions of the file with small variations, then use a diff program together with a hex editor to work out the meaning of every bit. Combine this with binary hacking techniques (like How to crack a Binary File Format by Frans Faase).
  3. Reverse engineer the application: try to recover the code using reverse engineering tools for the programming language the app was built with (found with File Analyzer). Otherwise use a disassembler such as IDA Pro.
Stabler answered 22/6, 2009 at 14:11 Comment(0)
A
18

For my hobby project I had to reverse engineer some old game files. My approaches were:

  • Have a good hex editor.
  • Look for readable words in the binary file. Note how they are distributed. If the distance between them is constant, you are probably looking at a listing of fixed-size entries.
  • Look for 2-3 consecutive zeros. They might indicate an int32 value, since a small number stored in four bytes leaves the upper bytes zero.
  • Some dwords might be pointers into the file.
  • Try to identify recurring patterns in the file.
  • Seeing lots of C0-CF might indicate RLE compressed data.
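
The "readable words at a constant distance" check above can be automated. The following is a minimal Python sketch, not a finished tool: the function names and the synthetic `sample` bytes are made up for illustration, and only printable ASCII is considered (a Delphi program may well store text in another encoding).

```python
import re

def find_strings(data: bytes, min_len: int = 4):
    """Return (offset, text) for every run of printable ASCII of at least min_len bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]

def gaps(offsets):
    """Distances between consecutive offsets; a constant gap hints at fixed-size records."""
    return [b - a for a, b in zip(offsets, offsets[1:])]

# Synthetic sample (an assumption, not a real file): three names, each padded to 16 bytes.
sample = b"".join(name.ljust(16, b"\x00") for name in (b"alpha", b"bravo", b"charlie"))

found = find_strings(sample)
string_gaps = gaps([offset for offset, _ in found])
print(found)
print(string_gaps)
```

If the gaps are all equal, that value is a good first guess for the record size. To try it on a real file, replace `sample` with `open(path, "rb").read()`.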
Azarcon answered 22/6, 2009 at 9:7 Comment(0)
W
7

Reverse engineering a binary file when you have some idea of what it represents is a very time consuming process. If you have no idea what it is then it will be even harder.

It is possible though, but you have to have a pretty good reason for doing so.

The first step would be to open it in a hex editor of your choice and see if you can find any English text that points you toward what the file is supposed to represent. From there, Google "Reverse Engineering binary files"; many people more knowledgeable than me have written guides about it.

Winfrid answered 22/6, 2009 at 8:31 Comment(0)
R
5

The "strings" program from GNU binutils is very useful. It will print the strings of printable characters in a file, quite often giving a clue to what a file contains or a program does.

Ragged answered 22/6, 2009 at 8:41 Comment(1)
I tried it, but it returns only a list of words like "sdf@1#£"Stabler
E
5

If the data represents serialized Delphi objects, you should start by reading about the Delphi serialization process. If that's the case, I think your best bet would be to load it using Delphi and continue your analysis from the IDE. Some information about Delphi serialization can be found here.

EDIT: if the file does contain serialized Delphi objects, then you should write a small Delphi program that loads it and converts the data yourself to something neutral, like XML. Before doing the conversion by hand, check whether Delphi supports serializing to XML directly. Then you could access those objects from any language.
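
One quick way to test the "serialized Delphi objects" hypothesis before writing any Delphi code: streams written by Delphi's component streaming (the mechanism behind binary form files) begin with the four-byte signature `TPF0`. The Python sketch below only searches for that signature. It assumes the program used the standard component streaming; a program that writes its records by hand will not contain the signature at all. The function name and the `sample` bytes are made up for illustration.

```python
DELPHI_FILER_SIGNATURE = b"TPF0"  # signature written at the start of a binary component stream

def find_component_streams(data: bytes):
    """Return every offset at which the Delphi filer signature occurs."""
    offsets = []
    pos = data.find(DELPHI_FILER_SIGNATURE)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(DELPHI_FILER_SIGNATURE, pos + 1)
    return offsets

# Synthetic sample (an assumption): two unknown bytes followed by a signature and a class name.
sample = b"\x00\x01" + b"TPF0" + b"\x05TExam"
signature_offsets = find_component_streams(sample)
print(signature_offsets)
```

A hit is only a hint, since the four bytes can also occur by chance, but the class name that follows the signature is usually readable and tells you which Delphi class to load the stream into.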

Elrod answered 22/6, 2009 at 8:49 Comment(3)
If it's serialized Delphi data, how can I use it in a C# or Objective-C program?Stabler
But that needs a Delphi interpreter. If I have a single application that opens this file, I can't do that; I would have to run two distinct applications.Stabler
Yes. You need to convert the object to something usable from anywhere. You can execute the converter from your main application's code and work on the resulting files. It's how I would do it.Elrod
R
3

The Unix "file" command is really useful - I don't know if there is anything like it on Windows. You run it like this:

file myfile.ext

And it spits out a text description based on the magic numbers and data contained therein.

It is probably included in Cygwin.
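
To illustrate what "magic numbers" means, here is a rough Python sketch of the idea. It is not how `file` is implemented; it only compares the first bytes against a handful of well-known signatures, and the `MAGIC` table and `sniff` function are names made up for this example.

```python
# A few well-known signatures; the real "file" database contains thousands of rules.
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"\x1f\x8b", "gzip compressed data"),
    (b"%PDF-", "PDF document"),
]

def sniff(data: bytes) -> str:
    """Return a description if the data starts with a known signature, else "data"."""
    for magic, description in MAGIC:
        if data.startswith(magic):
            return description
    return "data"  # "file" also reports plain "data" when nothing matches

zip_guess = sniff(b"PK\x03\x04" + b"\x14\x00")
unknown_guess = sniff(b"\x00\x01\x02\x03")
print(zip_guess)
print(unknown_guess)
```

For a custom format the answer will be "data", but the check is still worth a few seconds: many programs store their documents as a renamed ZIP or gzip stream.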

Rettarettig answered 22/6, 2009 at 8:29 Comment(3)
He will probably get "octet-stream" which will confuse him more. ".bin" files (I guess it is) aren't "standardized" and as colithium said, he probably needs to RE.Estuarine
That's what "file" does - it doesn't look at the extension at allRettarettig
"file" looks for magic numbers as you said, but only magic numbers of knows filetypes. So it most likely will find .jpg, .tar.gz, .avi etc. etc., but a custom binary file-structure is not a known filetype (if it was, he wouldn't have this problem in the first place :) )Severable
R
3

If you have access to the application that creates the file, you can apply changes to the application, then save the file and see the effects (Keep in mind that numbers are probably stored in little endian):

  • First, create the file repeatedly. If the files are not byte-for-byte identical, the current date/time is probably stored in the area where the differences occur.
  • You may want to repeat that with the software running under different environments to see if the OS version etc. is stored, but this is rather unusual.
  • Next, change single variables and create several files that differ only in the value of one variable. This helps you identify where that variable is stored.
  • That way you can also exclude variables that are not stored in the file: if you change them but the created files are identical, they are not stored.
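
The comparison in the steps above can be scripted. This is a minimal Python sketch under stated assumptions: the two "files" are synthetic byte strings built for the example, the changed variable is assumed to be a 32-bit little-endian integer, and `diff_ranges` is a made-up helper name.

```python
import struct

def diff_ranges(a: bytes, b: bytes):
    """Return half-open (start, end) ranges where two equal-length byte strings differ."""
    ranges = []
    start = None
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y and start is None:
            start = i                      # a differing run begins
        elif x == y and start is not None:
            ranges.append((start, i))      # the run ended at i
            start = None
    if start is not None:
        ranges.append((start, min(len(a), len(b))))
    return ranges

# Synthetic files (an assumption): a 4-byte tag, one uint32 "score", four padding bytes.
file_a = b"EXAM" + struct.pack("<I", 10) + b"\x00" * 4
file_b = b"EXAM" + struct.pack("<I", 300) + b"\x00" * 4

changed = diff_ranges(file_a, file_b)
value_a = struct.unpack_from("<I", file_a, changed[0][0])[0]
value_b = struct.unpack_from("<I", file_b, changed[0][0])[0]
print(changed, value_a, value_b)
```

Note that the differing range covers only the bytes that actually changed, which can be narrower than the field itself, so try several values of different magnitude before deciding on a field width.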

In order to test the hypotheses you worked out with the steps above, edit one of the files and have the application read it.

If you don't have access to the application itself, I suggest that you forget about it and find another way to solve your problem. There is a very high probability that it will be faster...

Rutile answered 22/6, 2009 at 8:59 Comment(0)
M
3

If file does not give a meaningful answer, you may want to try TRiD by Marco Pontello to determine whether your data is stored in a known format.

Mccleary answered 22/6, 2009 at 14:30 Comment(1)
I tried it but it says about the file: "Program X Format". Well... I already know that it's a file coming from the program XStabler
P
3

Get the Delphi application and open it in the freeware version of IDA Pro, find where it writes the file, and work out from there how the file is written.

Unless it's plain text.

Plagiarism answered 23/6, 2009 at 4:31 Comment(0)
E
3

Unlike traditional hex editors which only display the raw hex bytes of a file, 010 Editor can also parse a file into a hierarchical structure using a Binary Template. The results of running a Binary Template are much easier to understand and edit than using just the raw hex bytes.

http://www.sweetscape.com/010editor/
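
The same idea can be tried without 010 Editor once you have a hypothesis about the layout. The Python sketch below is not Binary Template syntax; it is a rough analogue using the standard `struct` module. The layout it parses (a 4-byte tag, two little-endian uint16 fields, then records made of a length-prefixed name and a uint32) is entirely hypothetical and only shows the technique.

```python
import struct

HEADER = "<4sHH"  # hypothetical header: tag, version, record count (little-endian)

def parse(data: bytes) -> dict:
    """Parse the hypothetical layout into a nested structure."""
    magic, version, count = struct.unpack_from(HEADER, data, 0)
    pos = struct.calcsize(HEADER)
    records = []
    for _ in range(count):
        name_len = data[pos]               # one length byte, Pascal-style string
        pos += 1
        name = data[pos:pos + name_len].decode("ascii")
        pos += name_len
        (value,) = struct.unpack_from("<I", data, pos)
        pos += 4
        records.append({"name": name, "value": value})
    return {"magic": magic, "version": version, "records": records}

# Synthetic sample built to match the hypothetical layout above.
sample = (b"DEMO" + struct.pack("<HH", 1, 2)
          + b"\x02Q1" + struct.pack("<I", 5)
          + b"\x02Q2" + struct.pack("<I", 7))
parsed = parse(sample)
print(parsed)
```

If the hypothesis is wrong, the parse usually fails quickly (implausible counts, undecodable names, reads past the end of the data), which is useful feedback in itself.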

Eats answered 6/10, 2014 at 20:12 Comment(0)
V
2

Do you know the program that uses it? If so, you can hook that program's write-to-file function and get an idea of what data it's writing, the size of the data, and where it goes.

More Info: http://www.codeproject.com/KB/DLL/Win32APIHooking_Trouble.aspx

Violinist answered 22/6, 2009 at 8:29 Comment(0)
H
0

Try to open it in a hex editor and analyse.

Hager answered 22/6, 2009 at 8:30 Comment(0)
C
0

Including the ones already mentioned, here are a few useful free software tools available in the standard repositories of most Linux distributions, approximately ordered from high level to low level:

  1. file: from the manpage, file tests each argument in an attempt to classify it using different tests. In practice, this is most likely to work if the file is already in a known format but can be useful if the extension has been incorrectly named as the file utility also runs other tests on the content of the file.
  2. strings: from the manpage, strings prints the sequences of printable characters in files. This will allow you to get a relatively high-level overview of what human-readable text might be contained within the file. Sometimes this can be enough to extract the information you need but you'll need more than this if you're attempting to understand the structure.
  3. binwalk. Usually not included in the standard install but often in the standard repositories. https://github.com/ReFirmLabs/binwalk is a Python utility that can analyse a binary (usually firmware) file and extract different parts of it for further analysis.
  4. gzip and other compression utilities. If you think you're dealing with encrypted binary data (or part of the file might be), this will allow you to assess the amount of entropy in the file. If the file size doesn't reduce much after compressing, you're likely to have an encrypted and/or compressed file.
  5. xxd: command line hexdump utility. Once you have identified parts of the file that may be of interest, you can inspect those parts (e.g. separators close to the strings, offsets identified with binwalk) using xxd. Useful options are: -cols N (number of columns) and -g N (how many bytes to group together). Pipe the xxd output through less to quickly search through bits of the file that might be of interest (/ key for search within less).
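
The compression test from item 4 can also be done from a script. This is a minimal Python sketch that uses `zlib` in place of the gzip command line tool and adds a byte-level Shannon entropy estimate; the function names and both sample inputs are made up for illustration.

```python
import math
import os
import zlib
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, close to 8.0 for random data."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in Counter(data).values())

def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size; near 1.0 means it barely compresses."""
    return len(zlib.compress(data, 9)) / len(data)

# Synthetic inputs (assumptions): repetitive text versus random bytes standing in for ciphertext.
text_like = b"question=1;answer=B;" * 500
random_like = os.urandom(10000)

for label, blob in (("text-like", text_like), ("random-like", random_like)):
    print(label, round(shannon_entropy(blob), 2), round(compression_ratio(blob), 3))
```

Running this over fixed-size windows of a real file, rather than over the whole file, shows which regions are plain structured data and which are compressed or encrypted.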
Cedilla answered 4/5, 2024 at 6:19 Comment(0)
