Is there an (open-source) MDX-based in-memory OLAP server? [closed]

I was wondering if there is an in-memory OLAP server that can handle MDX.

I found the Wikipedia article, but it says nothing about in-memory functionality...

The only one I know is Mondrian, which is also one of the two open-source solutions mentioned in the Wikipedia article above.

So far I have only worked with Mondrian a bit... I don't think Mondrian is a true in-memory OLAP server.
Here's why:
It has an in-memory cache that holds query results, but the first execution of each query takes long because the data has to be loaded from the RDBMS (http://mondrian.pentaho.com/documentation/faq.php#Scalability).
I think the better way would be:
Load all the fact and dimension tables into the cache and then execute every query against this in-memory data.
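
Roughly what I have in mind, sketched here with Spark SQL purely as an illustration (the table names and the JDBC URL are made up):

// Sketch: load fact and dimension tables once, mark them for in-memory caching,
// and answer every subsequent query from the cached copies.
import org.apache.spark.sql.SparkSession

object InMemoryDatamart {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("in-memory-datamart").getOrCreate()

    def loadTable(name: String) =
      spark.read
        .format("jdbc")
        .option("url", "jdbc:postgresql://dwh-host/dwh")
        .option("dbtable", name)
        .load()

    // Cache each table (materialized on first use) and register it for SQL.
    Seq("sales_fact", "customer_dim", "time_dim").foreach { name =>
      loadTable(name).cache().createOrReplaceTempView(name)
    }

    // This query is answered from the cached data, not the RDBMS.
    spark.sql(
      """SELECT c.customer_name, SUM(f.revenue) AS revenue
        |FROM sales_fact f
        |JOIN customer_dim c ON f.customer_id = c.customer_id
        |GROUP BY c.customer_name""".stripMargin
    ).show()
  }
}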

AFAIK, Oracle will release, with the 12c Enterprise Edition this year, the option to keep some (or all) tables in memory. This would speed up OLAP servers that simply query the RDBMS fact tables with SQL.
...but the Enterprise Edition is very expensive...
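
Just to illustrate what I mean, here is my rough sketch of enabling that option via JDBC (the connection details and the table name are made up, and the in-memory option itself is a separately licensed Enterprise Edition feature):

// Sketch: mark a fact table for Oracle's in-memory column store via JDBC.
import java.sql.DriverManager

object EnableInMemoryTable {
  def main(args: Array[String]): Unit = {
    val conn = DriverManager.getConnection(
      "jdbc:oracle:thin:@//dwh-host:1521/ORCL", "dwh_user", "secret")
    val stmt = conn.createStatement()
    // Flag the table so Oracle populates it into the in-memory column store;
    // subsequent analytic scans are then served from RAM instead of disk.
    stmt.execute("ALTER TABLE sales_fact INMEMORY")
    stmt.close()
    conn.close()
  }
}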

I would like to hear some other opinions about this.

Best regards,
Dennis.

Tough answered 18/9, 2014 at 14:3 Comment(5)
You can pre-warm Mondrian's cache by running the queries you want to hold in memory upon ETL completion (a rough sketch of that idea follows these comments). However, when you say in-memory... clearly your fact table is not a multi-billion-row fact, otherwise it would be impossible to store everything in RAM. That's why Mondrian doesn't even try to load everything; it's supposed to handle multi-TB cubes, and trying to hold it ALL in memory is pointless. Up to a certain detail level... that's a different game altogether.Kuvasz
The pre-warm idea sounds like a good and easy-to-implement workaround, thanks. Some numbers on the cube size: I expect about 5 GB of data per month, so data from the last 3 years would mean roughly 180 GB of in-memory tables for an OLAP system. The data for the cube is mostly pre-aggregated, like customer revenue per day/month. So even if the DWH is a multi-TB one, the data marts and cubes don't have to be that large... Btw, how do you speed up Mondrian if your cubes are multi-TB large?Tough
There are way too many ways to improve performance; that in itself could be a book. A few general ideas: 1) switch to a high-performance column-oriented DB (Vertica, InfiniDB, MonetDB to name a few); 2) indexes, indexes, indexes (projections in Vertica; nothing in InfiniDB; indexes in MonetDB); 3) aggregation tables (Mondrian picks the smallest agg table that has the level of detail we asked for); 4) cache, lots of it, at multiple levels (DB cache, Mondrian cache, query results cache, etc.); 5) the network path to the DB; 6) simpler MDX queries, broken down into bits if they're too large. Etc...Kuvasz
Great advice by @nsousa. I will suggest one more: the IO bottleneck. On one project we used 512 GB of RAM as DWH storage. On every other DWH project I insist on SSD disks.Eward
No MDX interface and pretty stale development currently, but it did work and it is very fast, not just because it is in-memory but because of the underlying C engine that processes queries in the data.table package: github.com/jangorecki/data.cube I hope to reactivate this project some day, and eventually add MDX interface support.Rosabelle
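
A minimal sketch of the pre-warming idea from the first comment, assuming Mondrian's olap4j driver is on the classpath (the connection URL and the MDX query are placeholders):

// Sketch: run the most important MDX queries once, right after the ETL finishes,
// so their results are already in Mondrian's cache when users arrive.
import java.sql.DriverManager
import org.olap4j.OlapConnection

object WarmMondrianCache {
  def main(args: Array[String]): Unit = {
    Class.forName("mondrian.olap4j.MondrianOlap4jDriver")
    val connection = DriverManager
      .getConnection("jdbc:mondrian:Jdbc=jdbc:postgresql://dwh-host/dwh;Catalog=file:SalesSchema.xml")
      .unwrap(classOf[OlapConnection])
    val statement = connection.createStatement()

    val warmupQueries = Seq(
      "SELECT {[Measures].[Revenue]} ON COLUMNS, [Time].[Month].Members ON ROWS FROM [Sales]"
    )
    // Each execution pulls the needed aggregates from the RDBMS into Mondrian's cache.
    warmupQueries.foreach(mdx => statement.executeOlapQuery(mdx))

    connection.close()
  }
}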

At ActiveViam we have developed Sparkube, an extension to Apache Spark that exposes any Spark dataset as an OLAP cube that you can query with MDX from an Excel pivot table, Tableau Software, or ActiveUI (a web frontend directly embedded in Sparkube). Aggregations are distributed in the Spark cluster and performed on the fly, in memory.

http://activeviam.com/sparkube

The MDX engine in Sparkube is the one we developed for the commercial ActiveViam analytical platform, so it is used and tested by many.

Here is, for instance, how you mount the content of a CSV file in memory and expose it as a multidimensional cube:

// Load a file into a Spark dataset.
// Here we load a CSV file, get the column names from
// the CSV headers, and use Spark automatic type inference.
val ds = spark.read
  .format("csv")
  .option("header","true")
  .option("inferSchema","true")
  .load("path/to/file.csv")

// Import the sparkube library (the sparkube jar must be in the classpath)
import com.activeviam.sparkube._

// Expose the dataset as a multidimensional cube.
// You can start visualizing the cube right away at http://localhost:9090/ui
// or connect from Excel with the XMLA protocol at http://localhost:9090/xmla
new Sparkube().fromDataset(ds)
  .withName("My first cube")
  .expose()
Nnw answered 23/11, 2017 at 11:2 Comment(5)
You appear to have linked to your own project/application/service. If you do so, you need to disclose that it's your own. If you don't disclose that, it's often considered spam. See: What signifies "Good" self promotion? and How to not be a spammer.Filial
Whoa, very good idea! So it's like an OLAP driver for Spark, isn't it?Mingy
Yes, that's well said. Sparkube is like an OLAP driver for Apache Spark ;-) It implements the MDX query language and the XMLA protocol (the two standards of multidimensional OLAP) and delegates all the data processing to Spark.Nnw
Link is dead. @AntoineCHAMBILLE, what happened to the project?Ununa
Hello @BasilPeace, the Sparkube idea has grown into the Atoti project (atoti.io). With Atoti you can make OLAP cubes in-memory or on top of Databricks, Snowflake, BigQuery, ClickHouse, Redshift...Nnw
