I need to develop application that can be getting data from multiple data sources ( Oracle, Excel, Microsoft Sql Server
, and so on) using one SQL query
. For example:
SELECT o.employeeId, count(o.orderId)
FROM employees@excel e. customers@microsoftsql c, orders@oracle o
WHERE o.employeeId = e.employeeId and o.customerId = c.customerId
GROUP BY o.employeeId;
This sql and data sources must be changes dynamically by java program. My customers want to write and run sql-like query
from different database and storage in same time with group by, having, count, sum
and so on in web interface of my aplication. Other requirements is perfomance and light-weight.
I find this way to do it (and what drawbacks I see, please, fix me if I wrong):
Apache Spark (drawbacks: heavy solution, more better for BigData, slow if you need getting up-to-date informations without cached it in Spark),
Distributed queries in SQL server (Database link of Oracle, Linked server of Microsoft SQL Server, Power Query of Excel) - drawbacks: problem with change data sources dynamically by java program and problem with working with Excel,
Prestodb (drawbacks: heavy solution, more better for BigData),
Apache Drill (drawbacks: quite young solution, some problem with not latest odbc drivers and some bugs when working),
Apache Calcite (ligth framework that be used by Apache Drill, drawbacks: quite young solution yet),
Do join from data sources manually (drawbacks: a lot of work to develop correct join, "group by" in result set, find best execution plan and so on)
May be, do you know any other way (using free open-source solutions) or give me any advice from your experience about ways in above? Any help would be greatly appreciated.
SSIS
and gather data in central place(SQL Server) and then query it. – Tear