Too many sections, assembler error, using boost::spirit

I'm in the process of writing a compiler for a subset of Java, using boost::spirit for lexing and parsing. During compilation of the lexer/parser phase, the compiler consumes 1.6GB of RAM (g++ (GCC) 4.8.1); this is not an issue, however, as there's plenty of memory on this machine.

What is an issue, however, is that when the compiler is done and the assembler starts running (GNU assembler (GNU Binutils) 2.23.52.20130604), it crashes with:

as: build/src/ast_generate.o: too many sections (33098)
/tmp/cc0ZyvKK.s: Assembler messages:
/tmp/cc0ZyvKK.s: Fatal error: can't write build/src/ast_generate.o: File too big
as: build/src/ast_generate.o: too many sections (33098)
/tmp/cc0ZyvKK.s: Fatal error: can't close build/src/ast_generate.o: File too big
scons: *** [build/src/ast_generate.o] Error 1

Adding '-Os' to my compiler flags allows the assembler to process the compiler output, but as I see it, it's only a matter of time until I hit the same issue, even with the size optimization flag.

Inspecting the size-optimized object file (ast_generate.o) using objdump tells me that I'm generating pe-x86-64, which is what I'd expect on Windows.

The 2358 generated sections, however, are a shock to me, mostly because it seems that a section has been generated for each part of boost::spirit:

 CONTENTS, ALLOC, LOAD, READONLY, DATA, LINK_ONCE_DISCARD
 ...
 60 .pdata$_ZNK5boost5lexer6detail8end_node9unique_idEv 0000000c  0000000000000000  0000000000000000  00030750  2**2
 61 .text$_ZNK5boost5lexer6detail8end_node11lexer_stateEv 00000010  0000000000000000  0000000000000000  0003075c  2**4
 ...

So my questions are:

  1. Is the number in the error (too many sections (X)) the number of sections to be generated, or is it an error code?
  2. Why is a section generated for each data type?
  3. What can I do to avoid having to pass '-Os' to my compiler? That is, how can I fix the issue rather than work around it?
  4. Would splitting the lexer and parser phases into two distinct phases (and compilation units), connected only through a lexer iterator, help me out?

Note: I'm compiling using cygwin64.

Unwept answered 9/9, 2013 at 8:46 Comment(21)
Well, boost::spirit generates an awful lot of code depending on how you use it. Try to reduce complexity somehow. – Sounder
@thokra: I've only written one fourth of my parser so far. So reducing complexity will likely not be the solution, as a lot more will be added in the future anyway. – Unwept
If you can share the code, I could have a look at somehow splitting this thing up. – Pellucid
@Unwept complexity isn't naturally linear in scope or volume. If it doesn't scale, another approach is required :) – Pellucid
@Skeen: To add to sehe's remark: the problem with TMP-based frameworks like boost::spirit is the number of template instantiations the compiler generates, not only due to the code you write but also due to the code that's generated, which in turn might use other generated code and so on. – Sounder
@Sounder It is my firm belief that the OP is aware of that, or he wouldn't be using Spirit. – Pellucid
@sehe: I should hope so. ;) – Sounder
@thokra: I am aware that the TMP framework generates lots and lots of code, and as I stated, the memory usage and compile time aren't my concern; the ability to compile the project at all, however, is. – Unwept
@sehe: While I don't really like sharing the code with the general public yet (it's quite messy and needs refactoring), it can be found here: sourceforge.net/p/joos2compiler/code-0/ci/default/tree/src/… – Unwept
@sehe: Please be aware that this is nowhere near release or production code. – Unwept
@Skeen: That's the point. It's not the memory usage or the compile time that's your problem, but the amount of generated code that lands in your object files. Therefore, trying to reduce the number of template instantiations is a viable way to go, if possible. – Sounder
@thokra: I wasn't aware that my tools were unable to handle 'huge' object files when I started using Spirit. I really just expected the tools to cope and use whatever time and space was required, which may have been a stupid assumption. – Unwept
@Skeen: Have you read through sourceware.org/binutils/docs/as? Maybe you'll get some clues there. Also, ELF might define a maximum permissible number of sections; I'm not finding anything on that right now. – Sounder
@Skeen: Oversight: not ELF, PE. – Sounder
@Unwept the most important things I can point out right away: stay away from semantic actions, prefer compile-time polymorphism (Inheritance Is The Base Class of Evil), and by all means use automatic attribute propagation. If you insist on allocating all these AST nodes dynamically, consider Customization Points. – Pellucid
@sehe: I was eventually going to look into removing the semantic actions; however, as I'm learning while doing this, I'd actually like to first write a parser that makes heavy use of semantic actions, and only then write one that doesn't. Is that possible at all? (Which, I guess, comes down to whether my grammar is splittable?) – Unwept
It is unrelated to it being splittable. Ah, case-insensitive file naming assumed. Arrrrggg. – Pellucid
@sehe: As I said, the code is still at a VERY early stage ;) – Unwept
@sehe: Here's a list of the external libraries used: sourceforge.net/p/joos2compiler/wiki/Home – Unwept
I'm moving to chat, see me there? – Pellucid
Jeehah. That was a big endeavour, but at least I arrived at something 'ok' enough that it compiles: https://mcmap.net/q/1079631/-too-many-sections-assembler-error-using-boost-spirit - Cheers – Pellucid

I've done some hacking here and refactored things a bit to show the non-runtime-polymorphic style:

I hope it doesn't increase compile times :) (I haven't actually gotten around to splitting the grammar up, but it got smaller).


Features:

  • no more heap-allocated AST nodes (not even for trees like expression and/or statement); hence no more explicit cloning and/or spurious const members.
  • I have replaced Maybe.hpp with

    #pragma once
    #include <boost/optional.hpp>
    
    template <typename T> using Maybe = boost::optional<T>;
    

    It's quick-and-dirty, but it all compiles.

  • I've replaced the open type-switching with a minor effort of my own (I couldn't get it to work; also, with boost::variant it's all built in):

    #include <boost/variant.hpp> // boost::static_visitor, used below
    
    namespace visitor_galore // this is my make-shift replacement for typeswitch (I couldn't find it/make it work)
    {
        template<typename T, class...Fs> struct visitor_t;
    
        template<typename T, class F1, class...Fs>
        struct visitor_t<T, F1, Fs...> : F1, visitor_t<T, Fs...>::type {
            typedef visitor_t type;
            visitor_t(F1 head, Fs...tail) : F1(head), visitor_t<T, Fs...>::type(tail...) {}
    
            using F1::operator();
            using visitor_t<T, Fs...>::type::operator();
        };
    
        template<typename T, class F> struct visitor_t<T, F> : F, boost::static_visitor<T> {
            typedef visitor_t type;
            visitor_t(F f) : F(f) {}
            using F::operator();
        };
    
        template<typename T=void, class...Fs>
        typename visitor_t<T, Fs...>::type make_visitor(Fs...x) { return {x...}; }
    }
    
    using visitor_galore::make_visitor;
    

    To see how this is used, have a look at e.g. ast_pp.cpp:

    void pretty_print(expression_incdec const& exp)
    {
         boost::apply_visitor(
                make_visitor(
                    [&exp](inc_dec_op_preinc const& op)  { std::cout << "++"; pretty_print(exp.variable); }, 
                    [&exp](inc_dec_op_predec const& op)  { std::cout << "--"; pretty_print(exp.variable); }, 
                    [&exp](inc_dec_op_postinc const& op) { pretty_print(exp.variable); std::cout << "++"; }, 
                    [&exp](inc_dec_op_postdec const& op) { pretty_print(exp.variable); std::cout << "--"; }
                    )
                , exp.operatur);
    }
    

    BONUS: If you don't care to list all types in the branches, e.g. because they all just call the same free function (or its overloads), you can use a polymorphic visitor:

    static const struct pretty_print_visitor_ : boost::static_visitor<>
    {
        template<typename T>
        void operator ()(T const& v) const { pretty_print(v); }
    } pretty_print_visitor;
    

    For example, you can now replace the 24 branches for expression&:

    boost::apply_visitor(
            make_visitor(
                [](expression_binop const& exp)              { pretty_print(exp); }, 
                [](expression_unop const& exp)               { pretty_print(exp); }, 
                [](expression_integer_constant const& exp)   { pretty_print(exp); }, 
                [](expression_character_constant const& exp) { pretty_print(exp); }, 
                [](expression_string_constant const& exp)    { pretty_print(exp); }, 
                [](expression_boolean_constant const& exp)   { pretty_print(exp); }, 
                [](expression_null const& exp)               { pretty_print(exp); }, 
                [](expression_this const& exp)               { pretty_print(exp); }, 
                [](expression_static_invoke const& exp)      { pretty_print(exp); }, 
                [](expression_non_static_invoke const& exp)  { pretty_print(exp); }, 
                [](expression_simple_invoke const& exp)      { pretty_print(exp); }, 
                [](expression_ambiguous_invoke const& exp)   { pretty_print(exp); }, 
                [](expression_new const& exp)                { pretty_print(exp); }, 
                [](expression_new_array const& exp)          { pretty_print(exp); }, 
                [](expression_lvalue const& exp)             { pretty_print(exp); }, 
                [](expression_assignment const& exp)         { pretty_print(exp); }, 
                [](expression_incdec const& exp)             { pretty_print(exp); }, 
                [](expression_cast const& exp)               { pretty_print(exp); }, 
                [](expression_ambiguous_cast const& exp)     { pretty_print(exp); }, 
                [](expression_instance_of const& exp)        { pretty_print(exp); }, 
                [](expression_parentheses const& exp)        { pretty_print(exp); },
                [](lvalue_non_static_field const& exp)       { pretty_print(exp); },
                [](lvalue_array const& exp)                  { pretty_print(exp); },
                [](lvalue_ambiguous_name const& exp)         { pretty_print(exp); }
           )
            , exp);
    

    by a simple

    boost::apply_visitor(pretty_print_visitor, exp);
    
  • Note a few places where I've put // TODO or // FIXME comments (notably with concat, which no longer wanted to compile for me).

  • Note that the Ast classes got noticeably simpler (in particular, more trivially correct regarding memory allocations).

  • Note that the Parser itself shrank due to the reduced need for semantic actions and Phoenix-adapted functions.

  • Note that I opted to forget about LexerPosition information for now (it used to be 'hidden' in the base classes, which are now gone). There is a compiler tutorial example that shows how to use qi::on_success(...) to very elegantly attach source location information to selected Ast nodes (non-intrusively).

  • Instead of the various predicates in ast_helpers, I'd anticipate that there could be a number of helpful traits-based predicates (e.g. is_lvalue or is_true_const). I've elected to "keep" the helpers more or less as-is (which may be totally wrong; I haven't tested anything).

  • I've pervasively tried to replace pass-by-value with pass-by-const& (compare e.g. ast_pp.hpp), but I'm aware I've left some spots behind because the task was big enough as it was.
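For readers on a C++17 compiler, the value-semantic style above can be sketched with std::variant and std::visit in place of boost::variant and boost::apply_visitor. This is a swapped-in technique, not the code from the answer. The three node types are hypothetical, heavily trimmed stand-ins for the real Ast classes; `overloaded` plays the role of make_visitor, the generic lambda in `is_lvalue_expression` plays the role of the polymorphic pretty_print_visitor, and `is_lvalue` is a hypothetical example of a traits-based predicate.

```cpp
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical, trimmed AST node types: plain values, no heap allocation.
struct expression_integer_constant { int value; };
struct expression_boolean_constant { bool value; };
struct lvalue_ambiguous_name { std::string name; };

using expression = std::variant<expression_integer_constant,
                                expression_boolean_constant,
                                lvalue_ambiguous_name>;

// C++17 counterpart of make_visitor: inherits operator() from every lambda.
template <class... Fs> struct overloaded : Fs... { using Fs::operator()...; };
template <class... Fs> overloaded(Fs...) -> overloaded<Fs...>;

// Hypothetical traits-based predicate: true only for lvalue node types.
template <typename T> struct is_lvalue : std::false_type {};
template <> struct is_lvalue<lvalue_ambiguous_name> : std::true_type {};

// A generic lambda handles every alternative; the trait decides the answer
// at compile time for each node type.
inline bool is_lvalue_expression(expression const& e) {
    return std::visit([](auto const& node) {
        return is_lvalue<std::decay_t<decltype(node)>>::value;
    }, e);
}

// One explicit branch per node type, in the style of the make_visitor example.
inline std::string describe(expression const& e) {
    return std::visit(overloaded{
        [](expression_integer_constant const& c) { return std::to_string(c.value); },
        [](expression_boolean_constant const& c) { return std::string(c.value ? "true" : "false"); },
        [](lvalue_ambiguous_name const& n)       { return n.name; },
    }, e);
}
```

Adding a node type to the variant without a matching branch in `describe` is a compile-time error, which is the same safety the boost::variant visitors give.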

GIANT DISCLAIMER: I've probably broken the parser in various ways. I haven't tried to parse anything with it. The edits are provided as is and without any claim to usefulness. I've solved similar problems in dissimilar ways (once a traits::transform_attribute<> specialization, once a largish semantic action with at_c<>, and some other approaches).

The goal was to show you what I had in mind when I suggested:

  • reducing dynamic polymorphism significantly,
  • avoiding semantic actions
  • embracing boost constructs where possible to get more 'automagic' integration with spirit
  • showing various ideas you can pick your favorite(s) from :/
Pellucid answered 11/9, 2013 at 2:54 Comment(1)
PS. I forgot to mention: the value semantics are probably already a net win, except for the string handling (identifiers). I'd suggest storing string 'atoms' by immutable refs (perhaps combined with the source location information!) to reduce the copying there. String atoms (or "interned" strings) should also prove highly beneficial during lookup in other compilation phases (hashing could just use the pointer value). – Pellucid
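A minimal sketch of the string-interning idea from the PS, assuming a single pool object that outlives every AST node referring to it. `string_pool`, `atom` and `atom_hash` are hypothetical names, and the suggested pairing with source location information is left out.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical interner: each distinct identifier is stored exactly once.
class string_pool {
public:
    // Equal strings always yield the same pointer.
    const std::string* intern(const std::string& s) {
        return &*pool_.insert(s).first;
    }
private:
    // Rehashing invalidates iterators but not pointers to elements,
    // so the returned pointers stay valid for the pool's lifetime.
    std::unordered_set<std::string> pool_;
};

// An identifier "atom" as an AST node would store it: just the pointer.
struct atom {
    const std::string* text;
    // Comparison is a pointer compare, no character-by-character work.
    bool operator==(atom other) const { return text == other.text; }
};

// Hashing uses the pointer value, as suggested in the PS.
struct atom_hash {
    std::size_t operator()(atom a) const {
        return std::hash<const std::string*>{}(a.text);
    }
};
```

A symbol table in a later phase could then be, for example, an `std::unordered_map<atom, some_declaration, atom_hash>`, where `some_declaration` stands for whatever that phase records.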

Try

  • splitting it up into different translation units
  • disabling debug information (frequently this is what trips things up with large file sizes, because debug information gets emitted just like other object data)
  • disabling RTTI (last resort)
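A minimal sketch of the translation-unit split, assuming the grammar is a class template over a single iterator type (as Spirit grammars typically are). The constructor only calls member functions, and each member function is defined and explicitly instantiated in its own .cpp file, so each object file carries only its share of the template instantiations. `big_grammar`, `define_expressions`, `define_statements` and `iterator_type` are hypothetical names, and a vector of strings stands in for the real qi::rule members so the pattern compiles on its own. Everything is shown as one listing; the comments mark where the file boundaries would go.

```cpp
#include <string>
#include <vector>

// ===== big_grammar.hpp (hypothetical header) =====
template <typename Iterator>
struct big_grammar {
    big_grammar() {
        // The constructor only chains the pieces together.
        define_expressions();
        define_statements();
    }

    // Declared here but defined in separate .cpp files. Files that include
    // this header cannot instantiate these implicitly; they just reference
    // the symbols emitted by the explicit instantiations.
    void define_expressions();
    void define_statements();

    // Stand-in for the real qi::rule<Iterator, ...> members.
    std::vector<std::string> rule_names;
};

// The one iterator type the project parses with (an assumption).
using iterator_type = std::string::const_iterator;

// ===== grammar_expressions.cpp (hypothetical) =====
template <typename Iterator>
void big_grammar<Iterator>::define_expressions() {
    // In the real grammar: expression = ...; primary = ...; and so on.
    rule_names.push_back("expression");
    rule_names.push_back("primary");
}
// Explicit instantiation definition: only this file emits this member's code.
template void big_grammar<iterator_type>::define_expressions();

// ===== grammar_statements.cpp (hypothetical) =====
template <typename Iterator>
void big_grammar<Iterator>::define_statements() {
    // In the real grammar: statement = ...; block = ...; and so on.
    rule_names.push_back("statement");
    rule_names.push_back("block");
}
template void big_grammar<iterator_type>::define_statements();
```

In a real build the rest of the compiler includes only the header, and the two .cpp files are compiled separately, which should also spread the sections over several object files instead of one.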
Pellucid answered 9/9, 2013 at 8:50 Comment(5)
I believe I need RTTI for a library used with the parser. Disabling debug information will ruin my ability to use gdb, but could be possible. As for splitting my parser into different translation units, I'm uncertain about doing this when the entire parser is defined inside a single constructor. Can one split the rule definitions into several constructors/methods and then chain/call these during initialization? – Unwept
@Unwept at this point, you mainly want to pinpoint the cause of the problem. I'd try them before declaring all the options "impossible" :) – Pellucid
As I've commented, I'm willing to take a look, but it's going to prove hard without the code! – Pellucid
@Sounder that's just nonsense, generally speaking. – Pellucid
@sehe: Correct, I wasn't thinking. Removed. – Sounder

Turning optimizations on (-O1 flag) solved the problem for me.

Astonishing answered 16/7, 2014 at 9:11 Comment(1)
Erm, yeah. Pretty elementary, but probably worth noting nonetheless. I've seen people forget stuff, and with meta-programming the cost is rather prohibitive. – Pellucid

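Try adding -Wa,-mbig-obj to your CXX_FLAGS. The -Wa, prefix passes the option through to the GNU assembler, and on PE/COFF targets -mbig-obj makes it use the big object file format, which allows more than 32768 sections (the error above reports 33098). This requires a new enough toolchain: the option is implemented by the assembler, so your binutils must support it, not just your gcc.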

Ettieettinger answered 8/2, 2017 at 1:12 Comment(0)
