Converting Coq term in AST form to polish notation using Python
Asked Answered
B

2

26

Say I have an arbitrary Coq Term (in AST format using s-expressions/sexp) for example:

n = n + n

and I want to automatically convert it to:

= n + n n

by traversing the AST tree (which is simple a nested list of lists because of the sexp). Is there a standard library in Python that might be able to do this?

Right now if I were to write down the algorithm/pesudocode I would do (assuming I can convert the sexp to some actual tree object):

def ToPolish():
    '''
    "postfix" tree traversal
    '''
    text = ''
    for node in root.children:
        if node is atoms:
            text := text + node.text
        else:
            text := text + ToPolish(node,text)
    return text

which I think this is close but I think there is a small bug somewhere...


example AST:

 (ObjList
  ((CoqGoal
    ((fg_goals
      (((name 4)
        (ty
         (App
          (Ind
           (((Mutind (MPfile (DirPath ((Id Logic) (Id Init) (Id Coq))))
              (DirPath ()) (Id eq))
             0)
            (Instance ())))
          ((Ind
            (((Mutind (MPfile (DirPath ((Id Datatypes) (Id Init) (Id Coq))))
               (DirPath ()) (Id nat))
              0)
             (Instance ())))
           (App
            (Const
             ((Constant (MPfile (DirPath ((Id Nat) (Id Init) (Id Coq))))
               (DirPath ()) (Id add))
              (Instance ())))
            ((Construct
              ((((Mutind
                  (MPfile (DirPath ((Id Datatypes) (Id Init) (Id Coq))))
                  (DirPath ()) (Id nat))
                 0)
                1)
               (Instance ())))
             (Var (Id n))))
           (Var (Id n)))))
        (hyp
         ((((Id n)) ()
           (Ind
            (((Mutind (MPfile (DirPath ((Id Datatypes) (Id Init) (Id Coq))))
               (DirPath ()) (Id nat))
              0)
             (Instance ())))))))))
     (bg_goals ()) (shelved_goals ()) (given_up_goals ())))))

the above is simply:

 (ObjList
  ((CoqString  "\
              \n  n : nat\
              \n============================\
              \n0 + n = n"))))

or

= n + n n

Using sexp parser:

[Symbol('Answer'), 2, [Symbol('ObjList'), [[Symbol('CoqGoal'), [[Symbol('fg_goals'), [[[Symbol('name'), 4], [Symbol('ty'), [Symbol('App'), [Symbol('Ind'), [[[Symbol('Mutind'), [Symbol('MPfile'), [Symbol('DirPath'), [[Symbol('Id'), Symbol('Logic')], [Symbol('Id'), Symbol('Init')], [Symbol('Id'), Symbol('Coq')]]]], [Symbol('DirPath'), []], [Symbol('Id'), Symbol('eq')]], 0], [Symbol('Instance'), []]]], [[Symbol('Ind'), [[[Symbol('Mutind'), [Symbol('MPfile'), [Symbol('DirPath'), [[Symbol('Id'), Symbol('Datatypes')], [Symbol('Id'), Symbol('Init')], [Symbol('Id'), Symbol('Coq')]]]], [Symbol('DirPath'), []], [Symbol('Id'), Symbol('nat')]], 0], [Symbol('Instance'), []]]], [Symbol('App'), [Symbol('Const'), [[Symbol('Constant'), [Symbol('MPfile'), [Symbol('DirPath'), [[Symbol('Id'), Symbol('Nat')], [Symbol('Id'), Symbol('Init')], [Symbol('Id'), Symbol('Coq')]]]], [Symbol('DirPath'), []], [Symbol('Id'), Symbol('add')]], [Symbol('Instance'), []]]], [[Symbol('Construct'), [[[[Symbol('Mutind'), [Symbol('MPfile'), [Symbol('DirPath'), [[Symbol('Id'), Symbol('Datatypes')], [Symbol('Id'), Symbol('Init')], [Symbol('Id'), Symbol('Coq')]]]], [Symbol('DirPath'), []], [Symbol('Id'), Symbol('nat')]], 0], 1], [Symbol('Instance'), []]]], [Symbol('Var'), [Symbol('Id'), Symbol('n')]]]], [Symbol('Var'), [Symbol('Id'), Symbol('n')]]]]], [Symbol('hyp'), [[[[Symbol('Id'), Symbol('n')]], [], [Symbol('Ind'), [[[Symbol('Mutind'), [Symbol('MPfile'), [Symbol('DirPath'), [[Symbol('Id'), Symbol('Datatypes')], [Symbol('Id'), Symbol('Init')], [Symbol('Id'), Symbol('Coq')]]]], [Symbol('DirPath'), []], [Symbol('Id'), Symbol('nat')]], 0], [Symbol('Instance'), []]]]]]]]]], [Symbol('bg_goals'), []], [Symbol('shelved_goals'), []], [Symbol('given_up_goals'), []]]]]]]
Bitten answered 20/3, 2019 at 4:55 Comment(6)
Is there a reason why you want to use Python specifically? I ask because this would be much easier to do in OCaml, since you would have access to the AST manipulating functions in the Coq source code.Knack
@ArthurAzevedoDeAmorim I can get s-expressions from SerAPI (github.com/ejgallego/coq-serapi) easily and have them as a list of lists. I don't know ocaml and I know python.Bitten
If you're interested in python using lisp syntax, you should check out [Hy][docs.hylang.org/en/stable/index.html]. They've messed with the python AST module to create a new lisp syntax for python, that may be steer you in the right direction.Mvd
Your pseudo-code almost looks like python, but the assignment operator in python is denoted by =, not :=Interchange
In your pseudo-code, you write "post-fix tree traversal". However, polish notation is pre-fix notation. Polish and reverse-Polish are not the same. Which do you want? For reference pre-fix notation is +(n,n), in-fix notation is (n + n) and post-fix notation is (n,n)+. pre-fix notation is Polish notation. Post-fix is reverse polish notation.Interchange
I have looked at both the list returned by the sexp parser, and the original tree. I cannot tell which items are operators and which are operands. Clearly, for +(n, m) + is an operator and n and m are operands. The list simply has a bunch of instances of the Symbol class. It seems both operands and operators became symbols. Which symbols are operators (functions) and which symbols are operands (arguments)?Interchange
I
1

I have looked at your gigantic list.

At the top/root, we have:

`['Answer', 2, nested_list]`

Okay, it looks like 'Answer' is the operator. 'Answer' has two arguments, 2 and nested_list

Therefore, in English, it is saying "The answer to nested_list is 2"
What if we go deeper?

['Answer', 2, ['ObjList', 'nested_list']]

Well, now we have:

Answer(2, ObjList(nested_list))

Even deeper, we see...

['Answer',
    2,
    ['ObjList',
        [
            ['CoqGoal',                    
                 ['list', 'list', 'list', 'list']
            ]
        ]
    ]
]

It looks to me like your list of lists is already in pre-fix order. Be sure to leave me some comments if not. Maybe all that remains is to convert it to a string? Again, I'm not sure what you want.


In infix notation, operators (+, *, etc...) are also delimiters.

However, in prefix notation (aka "Polish notation") there is more of a need for delimiters.

The output = n + n n is highly ambiguous. This is compared to, =(n, +(n,n)), which is very clear.

I have written a program which uses the delimiters (, ), and ,. However, you can get rid of them by setting:

open_delim  = ""           
close_delim = "" 
comma_delim = ""

On a small trial run, we have:

TEST_DATA1 = ["+", 2, ["+", 1, 1, 1]  ]

# you can put **ALL** of the inputs to update_args
# inside of the constructor call. However, it is
# then very difficult to read.

loligag = Lol2Str(open_delim="(", close_delim=")")
loligag.update_args(comma_delim =",")
loligag.update_args(is_operaND= lambda x: isinstance(x, int))
loligag.update_args(is_operaTOR = lambda x: x == "+")

loligag(TEST_DATA1, file = sys.stdout)

The output is:

+(2+(1,1,1))

You can set the delimiters to be space characters. That is more like your original example of = n + n n instead of =(n, +(n, n))

loligag = Lol2Str(open_delim=" ", close_delim=" ")
loligag.update_args(comma_delim=" ")
loligag.update_args(is_operaND=lambda x: isinstance(x, int))
loligag.update_args(is_operaTOR=lambda x: x == "+")
loligag(TEST_DATA1, file=sys.stdout)

Output:

+ 2+ 1 1 1 

All of your lists elements are either symbols, like Symbol('ty') or integers, such as 4. If I am to fully answer your question, then I need to know which of things like Symbol('ty') are operators and which are operands.

If I run the code on your data...

loligag = Lol2Str(open_delim="(", close_delim=")")
loligag.update_args(comma_delim =",")
loligag.update_args(is_operaND= lambda x: isinstance(x, type(99)))
loligag.update_args(is_operaTOR = lambda x: isinstance(x, Symbol))
r = loligag(L, file = sys.stdout)
print("\nreturned: ", r)

The output is...

 Answer(2ObjList((CoqGoal(fg_goals(name(4)ty(App(Ind((Mutind(MPfile(DirPath((IdLogicId(Init)Id(Coq))))DirPath([])Id(eq)),0Instance([])))(Ind(Mutind(MPfile(DirPath((IdDatatypesId(Init)Id(Coq))))DirPath([])Id(nat)),0Instance([]))App(Const((ConstantMPfile(DirPath((IdNatId(Init)Id(Coq))))DirPath([])Id(add)Instance([])))(Construct((MutindMPfile(DirPath((IdDatatypesId(Init)Id(Coq))))DirPath([])Id(nat),0),1Instance([]))Var(Id(n))))Var(Id(n)))))hyp(((Idn)[]Ind((Mutind(MPfile(DirPath((IdDatatypesId(Init)Id(Coq))))DirPath([])Id(nat)),0Instance([]))))))bg_goals([])shelved_goals([])given_up_goals([])))))

... which is clearly wrong.

You've got a lot of empty lists. Clearly, [Symbol("+"), 1, 2, 3] makes sense. So does [Symbol("+"), 1, [Symbol("+"), 1, 1], 3]. However, what is [Symbol("+"), 1, [], 3] supposed to mean? is the [] is supposed to be an operand?

The code is given below. Feel free to toy with it. You can ignore the FileDescriptor class. Instead, I encourage you to do an edit>find on

  • __call___
  • attempt_write_operaTOR
  • write_operaNDs
class Symbol:
    def __init__(i, x):
        if isinstance(x, type(i)):
            raise NotImplementedError()
        elif isinstance(x, str):
            i._stryng = x.strip()
        else:
            raise NotImplementedError()
    def __str__(i):
        return i._stryng
    def __repr__(i):
        return repr(i._stryng)

import io
import itertools
import sys
import random
import string
import inspect

class FileDescriptor:
    """
        There is one descriptor instance
        for the one Lol2Str class.

        However, there are many instances
        of the Lol2Str class.

        when a new `Lol2Str` comes into existence
        i._master_table[id(instance)] is created

        Lol2Str has a custom `__del__` method
        this deletes id(instance) from the master
        table.

        `i._master_table[id(instance)]`
        is `True` if the Lol2Str instance
        owns the file and is responsible
        for closing it. The value is `False`
        if something else closes the file.
    """

    def __init__(i):
        i._master_table = dict()

    @classmethod
    def xkey2ikey(i, xkey):
        ikey = xkey
        if not isinstance(xkey, type(id(i))):
            ikey = id(xkey)
        return ikey

    def close(i, xkey):
        # master_table(True, file)
        #     ...... there is a file, we close it.
        #
        # master_table(False, file)
        #     ..... there is a file, we do **NOT** close it
        #
        # master_table(False, None)
        #     ..... there is no file
        #
        # master_table(True, None)
        #     ...... error. we close non-existant file.
        ikey = type(i).xkey2ikey(xkey)
        file = i._master_table[ikey][1]
        we_own = i._master_table[ikey][1]
        if we_own:
            file.close()
            i._master_table[ikey][0] = False
            i._master_table[ikey][1] = None
        else:
            msg = ""
            raise i.FileDescriptorException(msg)

    def _update_dead_kin(i, xkey):
        ikey = i.xkey2ikey(xkey)
        we_own_file = i._master_table[ikey][0]
        if we_own_file:
            file = i._master_table[ikey][0][1]
            file.close()
        del i._master_table[ikey]
        return

    @classmethod
    def verify_filestream(i, new_file):
        if not (hasattr(new_file, "write")):
            with io.StringIO() as string_stream:
                print(
                    object.__repr__(new_file)
                    ,
                    "has no `write` method"
                    ,
                    file=string_stream
                )
                msg = string_stream.getvalue()
            raise type(i).Lol2StrException(msg)
        if not callable(new_file.write):
            with io.StringIO() as string_stream:
                print(
                    object.__repr__(new_file)
                    ,
                    "has no `write` method"
                    ,
                    file=string_stream
                )
            msg = string_stream.getvalue()
            raise type(i).Lol2StrException(msg)

    def __get__(i, xkey, xkey_klass):
        if xkey:
            ikey = i.xkey2ikey(xkey)
            file_is_present = i._master_table[ikey][1]
            if not file_is_present:
                i.set_file_to_new_string_stream(ikey)
            return i.Shell(i, ikey)
        else:
            # called as class method instead of
            # as instance method
            return i

    class Shell:
        def __init__(i, descriptor, ikey):
            i._descriptor = descriptor
            i._ikey = ikey
        def write(i, material):
            return i._descriptor.write(i._ikey, material)
        def close(i):
            return i._descriptor.close(i._ikey)
        def getvalue(i):
            return i._descriptor.getvalue(i._ikey)

    def getvalue(i, xkey):
        ikey = i.xkey2ikey(xkey)
        file_is_str_stream = i._master_table[ikey][0]
        if file_is_str_stream:
            file = i._master_table[ikey][1]
            r = file.getvalue()
        else:
            with io.StringIO() as string_stream:
                print(
                    "`getvalue` is only allowed if",
                    "user did not input their own file",
                    file=string_stream
                )
                msg = string_stream.getvalue()
            raise i.FileDescriptorException(msg)
        return r

    def write(i, xkey, material):
        ikey = i.xkey2ikey(xkey)
        file = i._master_table[ikey][1]
        file.write(material)

    def set_file_to_new_string_stream(i, ikey):
        ikey = i.xkey2ikey(loligag)
        file = i._master_table[ikey][1]
        # if there is a file
        if file:
            msg = ''.join((
                "\n\nCannot create string stream when target file",
                " is already present.",
                "\nTarget file already set to be: ",
                object.__repr__(file)
            ))
            raise i.FileDescriptorException(msg)
        # end if
        file = io.StringIO()
        i._master_table[ikey][0] = True
        i._master_table[ikey][1] = file
        return

    def __set__(i, xkey, new_file):
        """
        `new_file = None` is allowed
        That means a different function
        was called with default arguments.

        The default argument for file is `None`
        We cannot simply set file to a new string
        stream as a default argument...
            func(lol, file=io.StringIO())
        ... then we don't know if owner of the
        file stream is the caller or callee.
        """
        ikey = i.xkey2ikey(xkey)
        if new_file:
            # new_file is not `None`
            type(i).verify_filestream(new_file)
            we_own_file = i._master_table[ikey][0]
            if we_own_file:
                old_file = i._master_table[ikey][1]
                stryng = old_file.getvalue()
                new_file.write(stryng)
                old_file.close()
            # end-if
            i._master_table[ikey][1] = new_file
            i._master_table[ikey][0] = False
        else:
            # `new file` is `None` or `False` or `0`
            #  or some other such non-sense
            i.set_file_to_new_string_stream(xkey)
        return

    class FileDescriptorException(Exception):
        pass

    def register(i, xkey):
        ikey = i.xkey2ikey(xkey)
        i._master_table[ikey] = [False, None]
        return
    # END FileDescriptorClass




class Lol2Str:
    """
    `lol`..............`list of lists`

    See doc string for `__call__` method    
    """

    ALLOWED_ARGS = frozenset((
        "open_delim",
        "close_delim",
        "comma_delim",
        "is_operaND",
        "is_operaTOR",
        "file"
    ))

    file = FileDescriptor()
    def __init__(i, **kwargs):
        """
        """
        type(i).file.register(i)
        i.update_args(**kwargs)
        # looks weird without a return statement,
        # but __init__ not allowed to return
    # END OF __init__

    class Lol2StrException(Exception):
        pass

    def update_args(i, **kwargs):
        for kw in kwargs.keys():
            if kw in type(i).ALLOWED_ARGS:
                kwarg = kwargs[kw]
                if not isinstance(kwarg, type(None)):
                # if not default argument:
                    setattr(i, kw, kwargs[kw])
                # else:
                #   leave arg as it's previous value
            else:
                with io.StringIO() as ss:
                    print(
                        kw, "is not an allowed argument.",
                        file=ss
                    )
                    msg = ss.getvalue()
                raise type(i).Lol2StrException(msg)

    def verify_ready_all(i):
        one_name = "something more"
        try:
            for one_name in type(i).ALLOWED_ARGS:
                i.verify_ready_one(one_name)
        finally:
            pass
        return

    def verify_ready_one(i, one_name):
        try:
            one = getattr(i, one_name)
        except AttributeError:
            with io.StringIO() as ss:
                print(
                    "You must set", repr(one_name), "before executing",
                    file=ss
                )
                msg = ss.getvalue()
            raise type(i).Lol2StrException(msg)
        except RecursionError:
            msg = "Infinite loop on " + one_name
            raise RecursionError(msg)

        if isinstance(one, type(None)):
            with io.StringIO() as string_stream:
                print(
                    object.__repr__(new_file), "is",
                    "is missing input argument", one_name,
                    file=string_stream
                )
                msg = string_stream.getvalue()
            raise type(i).Lol2StrException(msg)
        return

    def send_out(i, *args, sep=" ", end=""):
        return print(*args, end=end,file= i.file)

    def attempt_write_operaTOR(i, lol):
        try:
            elem = next(lol)
            elem_status = (
                True if i.is_operaTOR(elem) else False,
                True if i.is_operaND(elem) else False
            )

            if elem_status == (True, True):
                with io.StringIO() as ss:
                    print(
                        "A parsed item was determined to be both",
                        "an operAND and an operATOR. That is not allowed.",
                        file=ss
                    )
                    s = ss.getvalue()
                raise type(i).Lol2StrException(s)
            elif elem_status == (True, False):
                # Inside of this if-statement
                # `elem` is an operator, such as the plus sign in `2 + 3`
                i.send_out(elem, end="")
            elif elem_status == (False, True):
                # `elem` is an argument/operand
                # such as the `2` in `2 + 3`
                lol = itertools.chain((elem,), lol)
            else:
                # (operATOR, operAND) = (False, False)
                # Not operator and not operand either.
                if hasattr(elem, "__iter__"):
                    ###################################################
                    # Strings are a problem in python
                    #     iter('a') == iter(iter(iter(iter(iter('a')))))
                    #     'a'[0][0][0][0][0][0] = 'a'[0]
                    ##################################################
                    #
                    # redundantly nested args, such as `1 + (((2 + 3)))`
                    lol = itertools.chain(elem, lol)
                else:
                    # Not operator and not operand either.
                    # Not redundantly nested args
                    with io.StringIO() as ss:
                        print(
                            type(i).__name__,
                            "found something which is not an operator, operand,",
                            "or anything else it knows how to handle.",
                            "\nThe object was ", object.__repr__(elem),
                            file=ss
                        )
                        s = ss.getvalue()
                    raise type(i).Lol2StrException(s)
                # end if-else block
            # end if-else block
        finally:
            pass
        return lol

    def write_operaNDs(i, lol):
        try:
            comma_state = False
            i.send_out(i.open_delim)
            for elem in lol:
                if i.is_operaND(elem):
                    pass # You're golden. move on to the next step
                else:
                    # element contains nested elements
                    try:
                        x = i(elem)
                        if isinstance(x, type(None)):
                            elem = ""
                            comma_state = False
                        else:
                            elem = x

                    except StopIteration:
                        # element was an EMPTY LIST!!
                        # WHY AN EMPTY LIST?!? Who only knows...
                        elem = ""
                        comma_state = False
                # end if-else
                if comma_state:
                    i.send_out(i.comma_delim)
                i.send_out(elem)
                if not comma_state:
                    comma_state = True
            # end for-loop
            i.send_out(i.close_delim)
        finally:
            pass
        return lol

    def __call__(i, lol, **kwargs):
        """

        NOTES ON NOMENCLATURE:
            `delim` ..........`delimiter`
            `lol`.............`list of lists`

        INPUT ARGUMENTS:
            `file` ............ `file` has some sort of `write` method

                                Any outside file is NOT closed for you.
                                You must close it youri.

                                loligag = Lol2Str()
                                with open(p , mode = "w") as outf:
                                    loligag(lol, file = outf)
                                    loligag(lol, file = outf)
                                    loligag(lol, file = outf)
                                # `with` statement closes the file for you

                                Example1:   # send output to the console
                                    import sys
                                    loligag(lol, file=sys.stdout)

                                Example 2:   # also send output to the console
                                    file = type("_", tuple(), {"write": lambda i, x: print(x)})()
                                    loligag(lol, file=sys.stdout)

                                Example 3:
                                    with open("myfile.dat"), mode="w") as outf:
                                        lol_to_prefix_str(lol, file=outf)

                                If `file = None` then `instance.__call__`
                                returns a string

            `is_operaTOR`... the plus sign in `2 + 3` is an example of an operator
                            `is_operaTOR` is a callable receiving one argument.
                            `is_operaTOR(x)` is to return True if and only if `x`
                             is an operator. `is_operaTOR(x)` is to return False otherwise.

                                ###################################################
                                # Strings are a problem in python
                                #     iter('a') == iter(iter(iter(iter(iter('a')))))
                                #     'a'[0][0][0][0][0][0] = 'a'[0]
                                #
                                # do **NOT** set is_operaTOR = lambda x: hasattr(x, "__iter__")
                                ####################################################                                 

            `lol` ............. list of lists

            `open_delim` ...... opening delimeter, such as left parenthesis "("

        TODO: documentation for 
            close_delim 
            comma_delim
            is_operaND

        """
        return_val = None
        try:
            i.update_args(**kwargs)

            ############################################
            i.verify_ready_all()                       #
            ############################################
            # `open_delim`  is not `None`
            # `is_operaTOR` is not `None`
            # `close_delim` is not `None`
            # `comma_delim` is not `None`
            # `is_operaND`  is not `None`
            # `file`        is not `None`
            ############################################

            if hasattr(lol, "__iter__"):
                try:
                    lol = type(i).sanitize_lol(lol)
                    done = False
                except type(i).EmptyLol as exc:
                    # `lol` is empty
                    # assume empty list, empty tuple
                    # empty whatever it is, is an operand
                    i.send_out(lol)
                    done = True
                if not done:
                    # At this point, lol contains more than one item
                    lol = i.attempt_write_operaTOR(lol)
                    lol = i.write_operaNDs(lol)
            else:
                # ONLY ONE ITEM
                # must be an operand?
                i.send_out(lol)

            # if user originally failed to input a file:
            #     get string from string stream `i.file`
            #     close the string stream.
            #     return the string
            we_own_the_file = type(i).file._master_table[id(i)][0]
            if we_own_the_file:
                return_val = i.file.getvalue()
                i.file.close()
        finally:
            pass
        return return_val    

    @classmethod
    def sanitize_lol(cls, lol):
        try:
            lol = iter(lol)
            try:
                elem = next(lol)
                lol = itertools.chain([elem], lol)
            except StopIteration:
                with io.StringIO() as err_stream:
                    print(
                        "Unable to convert empty container",
                        "into prefix notation",
                        file=err_stream
                    )
                    msg = err_stream.getvalue()
                raise cls.EmptyLol(msg)
            finally:
                pass
        finally:
            pass
        return lol

    class EmptyLol(Exception):
         pass

    def __del__(i, *args, **kwargs):
        type(i).file._update_dead_kin(i)
        super_class = inspect.getmro(type(i))[1]
        if super_class != object:
            super_class.__del__(i, *args, **kwargs)
Interchange answered 4/11, 2019 at 2:53 Comment(0)
A
0

I don't know of any modules that do what you want I'm afraid, but if I've understood what you want to do correctly, and as you identified, you simply need to traverse the tree. Your recursive implementation is just slightly off. How about this?

class Node:
  def __init__(self, text, children=None):
    self.text = text
    self.children = children or []

  def is_atomic(self):
    return not self.children

def to_polish(root):
    ''' postfix tree traversal '''
    return root.text + ' ' + ''.join(to_polish(n) for n in root.children)

print(to_polish(
  Node('=', [
    Node('a', [
      Node('+', [
        Node('b'),
        Node('c')
      ])
    ])
  ])
))
Almuce answered 14/9, 2019 at 3:8 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.