JavaScript parser in Python [closed]
Asked Answered
L

5

69

There is a JavaScript parser at least in C and Java (Mozilla), in JavaScript (Mozilla again) and Ruby. Is there any currently out there for Python?

I don't need a JavaScript interpreter, per se, just a parser that's up to ECMA-262 standards.

A quick google search revealed no immediate answers, so I'm asking the SO community.

Licking answered 24/12, 2008 at 7:56 Comment(2)
I would suggest js2xml from scrapinghub: github.com/scrapinghub/js2xmlHydrothermal
Use Tree-sitter's JavaScript grammar through their Python bindingsThornie
O
24

ANTLR, ANother Tool for Language Recognition, is a language tool that provides a framework for constructing recognizers, interpreters, compilers, and translators from grammatical descriptions containing actions in a variety of target languages.

The ANTLR site provides many grammars, including one for JavaScript.

As it happens, there is a Python API available - so you can call the lexer (recognizer) generated from the grammar directly from Python (good luck).

Offshoot answered 24/12, 2008 at 9:18 Comment(1)
ANTLR 4 seems to be back alive.Pathoneurosis
D
50

Nowadays, there is at least one better tool, called slimit:

SlimIt is a JavaScript minifier written in Python. It compiles JavaScript into more compact code so that it downloads and runs faster.

SlimIt also provides a library that includes a JavaScript parser, lexer, pretty printer and a tree visitor.

Demo:

Imagine we have the following javascript code:

$.ajax({
    type: "POST",
    url: 'http://www.example.com',
    data: {
        email: '[email protected]',
        phone: '9999999999',
        name: 'XYZ'
    }
});

And now we need to get email, phone and name values from the data object.

The idea here would be to instantiate a slimit parser, visit all nodes, filter all assignments and put them into the dictionary:

from slimit import ast
from slimit.parser import Parser
from slimit.visitors import nodevisitor


data = """
$.ajax({
    type: "POST",
    url: 'http://www.example.com',
    data: {
        email: '[email protected]',
        phone: '9999999999',
        name: 'XYZ'
    }
});
"""

parser = Parser()
tree = parser.parse(data)
fields = {getattr(node.left, 'value', ''): getattr(node.right, 'value', '')
          for node in nodevisitor.visit(tree)
          if isinstance(node, ast.Assign)}

print fields

It prints:

{'name': "'XYZ'", 
 'url': "'http://www.example.com'", 
 'type': '"POST"', 
 'phone': "'9999999999'", 
 'data': '', 
 'email': "'[email protected]'"}
Dodwell answered 4/8, 2014 at 5:9 Comment(5)
This is just great! Also you can access each value as a Dictionary, like: print fields['url']Estop
This library from 2013 doesn't seem to be able to parse Javascript classes.Martell
Slimit is nice but only supports ES5 :(Standby
This appears to need yacc to be installed on the first run.Tympanum
slimit last updated in 2018, looks quite deadDamselfish
O
24

ANTLR, ANother Tool for Language Recognition, is a language tool that provides a framework for constructing recognizers, interpreters, compilers, and translators from grammatical descriptions containing actions in a variety of target languages.

The ANTLR site provides many grammars, including one for JavaScript.

As it happens, there is a Python API available - so you can call the lexer (recognizer) generated from the grammar directly from Python (good luck).

Offshoot answered 24/12, 2008 at 9:18 Comment(1)
ANTLR 4 seems to be back alive.Pathoneurosis
M
21

I have translated esprima.js to Python:

https://github.com/PiotrDabkowski/pyjsparser

>>> from pyjsparser import parse
>>> parse('var $ = "Hello!"')
{
"type": "Program",
"body": [
    {
        "type": "VariableDeclaration",
        "declarations": [
            {
                "type": "VariableDeclarator",
                "id": {
                    "type": "Identifier",
                    "name": "$"
                },
                "init": {
                    "type": "Literal",
                    "value": "Hello!",
                    "raw": '"Hello!"'
                }
            }
        ],
        "kind": "var"
    }
  ]
}

It's a manual translation so its very fast, takes about 1 second to parse angular.js file (so 100k characters per second). It supports whole ECMAScript 5.1 and parts of version 6 - for example Arrow functions, const, let.

If you need support for all the newest JS6 features you can translate esprima on the fly with Js2Py:

import js2py
esprima = js2py.require("[email protected]")
esprima.parse("a = () => {return 11};")
# {'body': [{'expression': {'left': {'name': 'a', 'type': 'Identifier'}, 'operator': '=', 'right': {'async': False, 'body': {'body': [{'argument': {'raw': '11', 'type': 'Literal', 'value': 11}, 'type': 'ReturnStatement'}], 'type': 'BlockStatement'}, 'expression': False, 'generator': False, 'id': None, 'params': [], 'type': 'ArrowFunctionExpression'}, 'type': 'AssignmentExpression'}, 'type': 'ExpressionStatement'}], 'sourceType': 'script', 'type': 'Program'}
Menton answered 15/12, 2014 at 0:37 Comment(2)
Tested and work pretty well. You can use it and reconstruct for example some JSON data from it, captured by crawlers.Alithia
good documentation & you can test some code online as well, esprima.org/demo/parse.htmlChanty
O
10

As pib mentioned, pynarcissus is a Javascript tokenizer written in Python. It seems to have some rough edges but so far has been working well for what I want to accomplish.

Updated: Took another crack at pynarcissus and below is a working direction for using PyNarcissus in a visitor pattern like system. Unfortunately my current client bought the next iteration of my experiments and have decided not to make it public source. A cleaner version of the code below is on gist here

from pynarcissus import jsparser
from collections import defaultdict

class Visitor(object):

    CHILD_ATTRS = ['thenPart', 'elsePart', 'expression', 'body', 'initializer']

def __init__(self, filepath):
    self.filepath = filepath
    #List of functions by line # and set of names
    self.functions = defaultdict(set)
    with open(filepath) as myFile:
        self.source = myFile.read()

    self.root = jsparser.parse(self.source, self.filepath)
    self.visit(self.root)


def look4Childen(self, node):
    for attr in self.CHILD_ATTRS:
        child = getattr(node, attr, None)
        if child:
            self.visit(child)

def visit_NOOP(self, node):
    pass

def visit_FUNCTION(self, node):
    # Named functions
    if node.type == "FUNCTION" and getattr(node, "name", None):
        print str(node.lineno) + " | function " + node.name + " | " + self.source[node.start:node.end]


def visit_IDENTIFIER(self, node):
    # Anonymous functions declared with var name = function() {};
    try:
        if node.type == "IDENTIFIER" and hasattr(node, "initializer") and node.initializer.type == "FUNCTION":
            print str(node.lineno) + " | function " + node.name + " | " + self.source[node.start:node.initializer.end]
    except Exception as e:
        pass

def visit_PROPERTY_INIT(self, node):

    # Anonymous functions declared as a property of an object
    try:
        if node.type == "PROPERTY_INIT" and node[1].type == "FUNCTION":
            print str(node.lineno) + " | function " + node[0].value + " | " + self.source[node.start:node[1].end]
    except Exception as e:
        pass


def visit(self, root):

    call = lambda n: getattr(self, "visit_%s" % n.type, self.visit_NOOP)(n)
    call(root)
    self.look4Childen(root)
    for node in root:
        self.visit(node)

filepath = r"C:\Users\dward\Dropbox\juggernaut2\juggernaut\parser\test\data\jasmine.js"
outerspace = Visitor(filepath)
Oribella answered 6/1, 2012 at 2:7 Comment(2)
This can't emit JavaScript from a parsed AST, can it? I wanted to modify the AST and emit new JavaScript, but it doesn't seem like this can do that.Montenegro
@gsingh2011 No, it kinda struggled with parsing so doing the other way is way past its capabilities.Oribella
R
2

You can try python-spidermonkey It is a wrapper over spidermonkey which is codename for Mozilla's C implementation of javascript.

Reassure answered 24/12, 2008 at 9:39 Comment(1)
Unfortunately spidermonkey is dead upstream.Tolle

© 2022 - 2024 — McMap. All rights reserved.