Translation of machine code into LLVM IR (disassembly / reassembly of x86_64, x86, ARM into LLVM bitcode)

6

31

I would like to translate X86_64, x86, ARM executables into LLVM IR (disassembly).

What solution do you suggest?

Jen answered 8/8, 2011 at 12:5 Comment(4)
I've checked llvm-objdump, but as far as I understand it produces x86_64/x86 assembly. What I want is LLVM assembly. - Jen
There are plenty of disassemblers for x86_64, x86, and ARM. Maybe there is an assembler or compiler with an "LLVM IR" target? - Jen
That's not disassembly, that's translation! - Reborn
At least for the x86 part of your question, there is blog.llvm.org/2010/01/x86-disassembler.html - Chong
16

McSema is a production-quality binary lifter. It takes x86 and x86-64 binaries and statically "lifts" them to LLVM IR. It's actively maintained, BSD licensed, and has extensive tests and documentation.

https://github.com/trailofbits/mcsema
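
To make "lifting" concrete, here is a minimal Python sketch that turns a hypothetical three-instruction subset (mov with an immediate, register-to-register add, ret) into textual LLVM IR. It is not McSema's algorithm or output format. The function name `lift`, the tuple encoding of instructions, and the choice to model each register as an `alloca` slot are all assumptions made for illustration, and the `ptr` type assumes a recent LLVM with opaque pointers.

```python
from itertools import count


def lift(name, instrs):
    """Emit textual LLVM IR for a toy instruction list.

    instrs uses a hypothetical encoding (not real disassembler output):
      ("mov", reg, imm)  -> reg = imm
      ("add", dst, src)  -> dst = dst + src
      ("ret", reg)       -> return reg
    """
    # Every string operand is treated as a register name (assumption).
    regs = sorted({op for ins in instrs for op in ins[1:] if isinstance(op, str)})
    tmp = count()  # counter for fresh temporaries %t0, %t1, ...
    lines = [f"define i64 @{name}() {{", "entry:"]
    # Model each machine register as a 64-bit stack slot.
    for reg in regs:
        lines.append(f"  %{reg} = alloca i64")
    for ins in instrs:
        if ins[0] == "mov":
            _, reg, imm = ins
            lines.append(f"  store i64 {imm}, ptr %{reg}")
        elif ins[0] == "add":
            _, dst, src = ins
            a, b, s = (f"%t{next(tmp)}" for _ in range(3))
            lines += [
                f"  {a} = load i64, ptr %{dst}",
                f"  {b} = load i64, ptr %{src}",
                f"  {s} = add i64 {a}, {b}",
                f"  store i64 {s}, ptr %{dst}",
            ]
        elif ins[0] == "ret":
            value = f"%t{next(tmp)}"
            lines += [f"  {value} = load i64, ptr %{ins[1]}", f"  ret i64 {value}"]
        else:
            raise ValueError(f"unsupported toy instruction: {ins[0]}")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(lift("f", [("mov", "rax", 1), ("mov", "rbx", 2),
                     ("add", "rax", "rbx"), ("ret", "rax")]))
```

A real lifter also has to deal with flags, memory, calls, and control flow, which is where most of the difficulty lies; this sketch only shows the shape of the register-to-IR mapping.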

Exorcism answered 6/8, 2015 at 15:29 Comment(1)
But it requires an IDA Pro license. - Eponymy
11

Consider using the RevGen tool developed within the S2E project. It converts x86 binaries to LLVM IR. The source code can be checked out from the Revgen branch of the Git repository at https://dslabgit.epfl.ch/git/s2e/s2e.git.

Vulpecula answered 31/1, 2012 at 7:46 Comment(6)
I see you've mentioned another paper related to x86 -> LLVM translation here. Thanks for the great references. - Jen
I have problems with the links provided: git clone https://dslabgit.epfl.ch/git/s2e/s2e.git fails to clone :/. - Jen
I haven't had any problems on Ubuntu 10.10. Make sure you have git installed and that your firewall/proxy settings are correct. You may also find related documentation on the project's web site: s2e.epfl.ch/embedded/s2e/index.html - Vulpecula
OK, now it works. It was probably a temporary network problem or a short server downtime. I'd love to take a look at RevGen ASAP :). - Jen
Grzegorz Wierzowiecki, is there a ready-to-use converter from x86 binaries to LLVM IR here? Which part of s2e.git is RevGen itself? - Aromatize
Hi osgx, you can find it at <s2e_root>/tools/tools/static-translator. - Vulpecula
10

Regarding the RevGen tool mentioned by @bsa2000, the more recent paper "A compiler level intermediate representation based binary analysis and rewriting system" points out some limitations of S2E and Revnic.

I quote them here.

  1. Shortcoming of dynamic translation:

    S2E [16] and Revnic [14] present a method for dynamically translating x86 to LLVM using QEMU. Unlike our approach, these methods convert blocks of code to LLVM on the fly which limits the application of LLVM analyses to only one block at a time.

  2. Incomplete IR:

    Revnic [14] and RevGen [15] recover an IR by merging the translated blocks, but the recovered IR is incomplete and is only valid for current execution; consequently, various whole program analyses will provide incomplete information.

  3. No abstract stack or symbol promotion:

    Further, the translated code retains all the assumptions of the original binary about the stack layout. They do not provide any methods for obtaining an abstract stack or promoting memory locations to symbols, which are essential for the application of several source-level analyses.
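
The first two points can be illustrated with a toy control-flow graph. The sketch below is a simplified model, not code from S2E, Revnic, or RevGen: the block labels, `dynamic_blocks`, and `static_blocks` are hypothetical names, and the graph is assumed to be fully known up front, which a real binary does not give you.

```python
# Toy control-flow graph: block label -> possible successors.
# "check" branches on some input, so one run sees only one side.
CFG = {
    "entry": ["check"],
    "check": ["fast", "slow"],
    "fast": ["exit"],
    "slow": ["exit"],
    "exit": [],
}


def dynamic_blocks(cfg, choose, start="entry", max_steps=1000):
    """Blocks that get translated when lifting happens along one execution.

    choose(block, successors) returns the successor taken in that run.
    max_steps is a guard so that a looping run still terminates here.
    """
    seen = []
    block = start
    for _ in range(max_steps):
        if block not in seen:
            seen.append(block)
        successors = cfg[block]
        if not successors:
            break
        block = choose(block, successors)
    return seen


def static_blocks(cfg, start="entry"):
    """Every block reachable from start, regardless of input."""
    seen, worklist = set(), [start]
    while worklist:
        block = worklist.pop()
        if block not in seen:
            seen.add(block)
            worklist.extend(cfg[block])
    return seen


if __name__ == "__main__":
    trace = dynamic_blocks(CFG, lambda block, succs: succs[0])
    print("translated in this run:", trace)
    print("never translated:", sorted(static_blocks(CFG) - set(trace)))
```

With the run that always takes the first successor, the "slow" block is never translated, so any whole-program analysis over the merged blocks would silently miss it.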

Salpingitis answered 16/4, 2013 at 7:6 Comment(0)
2

I doubt there will be a universal solution (think about indirect branches, etc.): LLVM IR is much "higher level" than any assembly language. It is possible, though, to translate on a per-basic-block basis. You might want to check out the llvm-qemu and libcpu projects, among others.
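
As a rough illustration of per-basic-block translation, and of why indirect branches get in the way, here is a minimal Python sketch that splits a hypothetical instruction list into basic blocks. The tuple format, the three branch mnemonics, and the name `split_blocks` are made up for the example; it uses no real disassembler and is not taken from llvm-qemu or libcpu.

```python
UNKNOWN = "?"  # successor that cannot be determined statically
BRANCHES = ("jmp", "jcc", "ret")


def split_blocks(instrs):
    """Split (addr, mnemonic, target) tuples into basic blocks.

    target is an int address for direct branches and None otherwise,
    including branches whose destination is only known at run time.
    Returns {leader_addr: {"instrs": [...], "succs": [...]}}.
    """
    addrs = [addr for addr, _, _ in instrs]
    # Leaders: the first instruction, every known branch target,
    # and every instruction that follows a branch.
    leaders = {addrs[0]}
    for i, (_, op, target) in enumerate(instrs):
        if op in BRANCHES:
            if target is not None:
                leaders.add(target)
            if i + 1 < len(instrs):
                leaders.add(addrs[i + 1])

    blocks = {}
    current = None
    for i, (addr, op, target) in enumerate(instrs):
        if addr in leaders:
            current = addr
            blocks[current] = {"instrs": [], "succs": []}
        blocks[current]["instrs"].append(op)
        following = addrs[i + 1] if i + 1 < len(instrs) else None
        succs = blocks[current]["succs"]
        known = target if target is not None else UNKNOWN
        if op == "jmp":
            succs.append(known)
        elif op == "jcc":
            succs.append(known)
            if following is not None:
                succs.append(following)
        elif op != "ret" and following in leaders:
            succs.append(following)  # plain fall-through into the next block
    return blocks


program = [
    (0, "cmp", None),
    (1, "jcc", 4),
    (2, "add", None),
    (3, "jmp", None),  # indirect jump, e.g. through a register
    (4, "mov", None),
    (5, "ret", None),
]

if __name__ == "__main__":
    for leader, block in split_blocks(program).items():
        print(leader, block["instrs"], "->", block["succs"])
```

Each block can be translated on its own, but the block ending in the indirect jump has `"?"` as its only successor: stitching the blocks into one function requires knowing where that jump can go, which is the part that resists a universal static solution.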

Rematch answered 14/8, 2011 at 11:41 Comment(2)
LLVM is able to capture high-level information, but IMHO that is not required. I believe a solution can exist, maybe not one universal approach, but still. Thanks for the great references: llvm-qemu and libcpu look interesting. :) - Jen
By the way, if LLVM-to-JavaScript translation is possible and actually implemented, translating another assembly language is possible as well ;). The question is who will do it, and when :). - Jen
1

Just posting a reference on translating ARM binaries to LLVM IR:

disarm - arm binary to llvm ir disassembler

https://code.google.com/p/disarm/

However, I have not tried it, so I'm not sure about its quality or stability. Can anyone else post additional information about this project?

Salpingitis answered 16/4, 2013 at 6:17 Comment(1)
It seems that the author has withdrawn it... Fortunately, I have a local svn copy as a backup of its code. - Salpingitis
1

There is a new project in its early phases, libbeauty: https://github.com/jcdutton/libbeauty

Article about the project: Libbeauty: Another Reverse-Engineering Tool, 24 December 2013, Michael Larabel - http://www.phoronix.com/scan.php?page=news_item&px=MTU1MTU

It currently supports only a subset of x86_64 as input. One of the project's goals is to be able to compile the generated LLVM IR back to assembly and get a binary with the same functionality.

Aromatize answered 18/1, 2014 at 11:8 Comment(2)
Project looks nice! Thanks! - Jen
Project is dead and removed from GitHub. - Topcoat
