I only came across this recently but want to clarify a misconception.
The Apache Impala minimum memory recommendation is not a hard requirement - all functionality works fine with 4-8GB of memory (I run it that way every day). I would actually guess that, at least for the last few years, Impala has become more tolerant of low memory because it has a much more mature memory-management and spill-to-disk implementation.
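To make that concrete, here's a rough sketch of what I mean (Python with the impyla client; the hostname, port, cap, and table name are just placeholders, not a recommendation): cap per-query memory and let anything that exceeds it spill to disk rather than fail.

    # Sketch only: host/port, the 2g cap, and some_table are placeholders.
    from impala.dbapi import connect

    conn = connect(host='impalad.example.com', port=21050)  # HiveServer2 port on an impalad
    cur = conn.cursor()

    # MEM_LIMIT caps memory per query; work that doesn't fit under the cap
    # spills to the scratch directories configured on the daemons (--scratch_dirs)
    # instead of failing the query on a small-memory node.
    cur.execute("SET MEM_LIMIT=2g")
    cur.execute("SELECT COUNT(*) FROM some_table")
    print(cur.fetchall())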
The 128GB recommendation is based on our experience of what you would want for a heavily used production cluster with a demanding workload - one of the worst mistakes people make when planning a deployment is skimping on memory. It may be a little conservative, but we really don't want to recommend something that would be under-resourced and lead to a bad experience.
As for the architectural differences - the Impala dev team at Cloudera has been focused on building a product that works for our 1000s of customers, rather than building software for our own use. What I've learned is that it's actually harder to build things that scale to 1000s of customers than it is to build things that scale to 1000s of nodes in specific deployments.
That means every feature has to be built robustly and generally enough to handle being put through its paces by all of our customers - if there are any issues, they always come back to us. We like to say that our customers are going to "use it in anger" - i.e. they are going to push everything to the limit.
We also have a heavy focus on security features that are critical to enterprise customers - authentication, column-level authorization, auditing, etc.
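For the column-level piece, it looks roughly like this (a sketch - the role, database, and column names are made up, and it assumes an authorization provider like Ranger is enabled on the cluster):

    # Sketch only: role/table/column names are made up; assumes Ranger (or Sentry
    # on older releases) is configured as the authorization provider.
    from impala.dbapi import connect

    cur = connect(host='impalad.example.com', port=21050).cursor()

    # Grant an analyst role access to just the non-sensitive columns of a table.
    cur.execute("GRANT SELECT(customer_id, region) ON TABLE sales.customers TO ROLE analysts")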
I don't want to get too deep into benchmark debates, but I'll say that the MPP architecture and technologies like LLVM-based runtime code generation have always given Impala a performance edge, and I think we stack up well in any apples-to-apples comparison, particularly on concurrent workloads. I do hear, with some frequency, about migrations from Presto-based technologies to Impala leading to dramatic performance improvements.
One disadvantage Impala has had in benchmarks is that we focused more on CPU efficiency and horizontal scaling than on vertical scaling (i.e. using all of the CPUs on a node for a single query). That was the right call for many production workloads, but it hurts in some benchmarks. We've been addressing that over the last 8-9 months, and the upcoming Impala 4.0 includes multithreading improvements that give 2-4x speedups in query latency on standard benchmarks.
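If you want to play with the multithreaded execution yourself, the relevant knob is the MT_DOP query option - roughly like this (sketch only; the value of 4 and the table are just examples, tune it to how many cores you want a single query to use per node):

    # Sketch only: the MT_DOP value and the table are illustrative.
    from impala.dbapi import connect

    cur = connect(host='impalad.example.com', port=21050).cursor()

    # MT_DOP controls how many execution threads a single query uses per node.
    cur.execute("SET MT_DOP=4")
    cur.execute("SELECT l_returnflag, COUNT(*) FROM tpch.lineitem GROUP BY l_returnflag")
    print(cur.fetchall())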