We have a BI customer whose sales transactions generate about 40 million rows per month in their sales database tables. They want to build a Sales Data Mart with 5 years of historic data, meaning this fact table would hold roughly 2.4 billion rows (40 million x 12 months x 5 years).
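As a rough back-of-the-envelope check (the ~100-byte average row width is my assumption, not a measured figure), the raw volume works out to:

```python
# Rough size estimate for the fact table.
rows_per_month = 40_000_000
months = 12 * 5  # 5 years of history
total_rows = rows_per_month * months

assumed_row_bytes = 100  # hypothetical average row width, not measured
total_gb = total_rows * assumed_row_bytes / 1024**3

print(f"{total_rows:,} rows, ~{total_gb:.0f} GB uncompressed")
# 2,400,000,000 rows, ~224 GB uncompressed
```

Columnar engines typically compress well-structured sales data heavily, so the on-disk footprint would likely be a fraction of that raw figure.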
This is well-structured data.
This is the first time I have been confronted with this amount of data, which led me to look at columnar database tools like Infobright and others. But even with that kind of software, a simple query could take a very, very long time to run.
That took me to Hadoop, but after reading some articles I concluded that Hadoop is not the best option (even with Hive) for building a fact table, since, as I understand it, it is meant to work with unstructured data.
So, my questions are: What would be the best way to approach this challenge? Am I not looking at the right technology? What would be the best query response times I could get on such a big fact table? Or am I hitting a real wall here, where the only option is to build aggregated tables?
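For reference, by "aggregated tables" I mean pre-summarizing the fact rows at a coarser grain, so reports scan a small summary table instead of billions of raw rows. A minimal sketch of the idea (the column names here are illustrative, not from our actual schema):

```python
from collections import defaultdict

# Toy fact rows: (year_month, product_id, amount).
# Real rows would come from the sales database.
fact_rows = [
    ("2020-01", 1, 100.0),
    ("2020-01", 1, 50.0),
    ("2020-02", 2, 75.0),
]

# Roll up to one row per (month, product); queries that only need
# monthly totals then touch this small table, not the raw facts.
monthly_sales = defaultdict(float)
for month, product_id, amount in fact_rows:
    monthly_sales[(month, product_id)] += amount

print(dict(monthly_sales))
# {('2020-01', 1): 150.0, ('2020-02', 2): 75.0}
```

In practice this would be a materialized summary table refreshed by the nightly ETL, but the rollup logic is the same.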