I have an auto-generated file which looks something like this...
static void do_SomeFunc1(void* parameter)
{
// Do stuff.
}
// Continues on for another 4000 functions...
void dispatch(int id, void* parameter)
{
switch(id)
{
case ::SomeClass1::id: return do_SomeFunc1(parameter);
case ::SomeClass2::id: return do_SomeFunc2(parameter);
// This continues for the next 4000 cases...
}
}
When I build it like this, the build time is enormous. If I inline all the functions automagically into their respective cases using my script, the build time is cut in half. GCC 4.5.0 says ~50% of the build time is being taken up by "variable tracking" when I use -ftime-report. What does this mean and how can I speed compilation while still maintaining the superior cache locality of pulling out the functions from the switch?
EDIT: Interestingly enough, the build time has exploded only on debug builds, as per the following profiling information of the whole project (which isn't just the file in question, but still a good metric; the file in question takes the most time to build):
- Debug: 8 minutes 50 seconds
- Release: 4 minutes, 25 seconds
If you're curious, here are a few sample do_func's, context removed. As you can see, I simplified the problem definition a bit to only show the relevant parts. In case you're wondering, all the self->func calls are calls to boost::signal's.
static void do_Match_Login(Registry* self, const uint8_t* parameters, uint16_t length)
{
const uint8_t* paramPtr = parameters;
std::string p0 = extract_string(parameters, ¶mPtr, length);
std::string p1 = extract_string(parameters, ¶mPtr, length);
int32_t p2 = extract_int32(parameters, ¶mPtr, length);
uint32_t p3 = extract_uint32(parameters, ¶mPtr, length);
tuple<Buffer, size_t, size_t> p4 = extract_blob(parameters, ¶mPtr, length);
return self->Match_Login(p0, p1, p2, p3, p4);
}
static void do_Match_ResponseLogin(Registry* self, const uint8_t* parameters, uint16_t length)
{
const uint8_t* paramPtr = parameters;
int32_t p0 = extract_int32(parameters, ¶mPtr, length);
std::string p1 = extract_string(parameters, ¶mPtr, length);
array<uint16_t, 3> p2 = extract_vector(parameters, ¶mPtr, length);
std::string p3 = extract_string(parameters, ¶mPtr, length);
uint8_t p4 = extract_uint8(parameters, ¶mPtr, length);
uint8_t p5 = extract_uint8(parameters, ¶mPtr, length);
uint64_t p6 = extract_MUID(parameters, ¶mPtr, length);
bool p7 = extract_bool(parameters, ¶mPtr, length);
tuple<Buffer, size_t, size_t> p8 = extract_blob(parameters, ¶mPtr, length);
return self->Match_ResponseLogin(p0, p1, p2, p3, p4, p5, p6, p7, p8);
}
id->func
lookup table. – Tahitiid
to lookup and getfunc
then invoke it. Same number of stack frames as now, much lessswitch
branching. – Tahitido_someFunc()
s? – Rhodolitehash_map<>
or something. As I said, that's basically what the compiler will turn your enormous switch into anyway. – Figurestd::map<>
is the equivalent. Anyway, I'm not even sure manually doing it yourself over letting the compiler do it would even help :) – Figure