Summary:
Is it possible to:
- Import data into Hadoop with the «MongoDB Connector for Hadoop».
- Process it with Hadoop MapReduce.
- Export the results into MySQL with Sqoop, all as a single job.
I am building a web application with MongoDB. While MongoDB works well for most of the workload, in some parts I need stronger transactional guarantees, for which I use a MySQL database.
My problem is that I want to read a big MongoDB collection for data analysis, but the size of the collection means that the analytics job would take too long to run. Unfortunately, MongoDB's built-in map-reduce framework is not well suited to this job, so I would prefer to carry out the analysis with Apache Hadoop.
I understand that it is possible to read data from MongoDB into Hadoop by using the «MongoDB Connector for Hadoop», which reads data from MongoDB, processes it with MapReduce in Hadoop, and finally outputs the results back into a MongoDB database.
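As I understand it, the connector's input side is configured through job properties such as `mongo.input.uri`, while the job itself can be written to emit its output to HDFS instead of back to MongoDB. A rough sketch of launching such a job, where the jar name, driver class, host, database, collection, and output path are all placeholders of mine rather than anything from a real setup:

```shell
# Hypothetical invocation: my-analysis-job.jar and com.example.AnalysisJob
# are placeholders for a driver that uses the connector's MongoInputFormat
# to read documents, and a plain file output format to write to HDFS.
hadoop jar my-analysis-job.jar com.example.AnalysisJob \
  -D mongo.input.uri=mongodb://mongo-host:27017/mydb.mycollection \
  /user/hadoop/analysis-output
```

The key idea would be that only the input format comes from the MongoDB connector; the output lands in HDFS as ordinary delimited files, which is the shape Sqoop expects for an export.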
The problem is that I want the output of the MapReduce to go into a MySQL database, rather than MongoDB, because the results must be merged with other MySQL tables.
For this purpose I know that Sqoop can export the results of a Hadoop MapReduce job from HDFS into MySQL.
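If I'm reading the Sqoop documentation correctly, the export step would look something like the following, where the host, credentials, table name, and HDFS directory are placeholders of mine:

```shell
# Hypothetical export of MapReduce output from HDFS into a MySQL table.
# All connection details and names below are placeholders.
sqoop export \
  --connect jdbc:mysql://mysql-host/analytics \
  --username myuser -P \
  --table mapreduce_results \
  --export-dir /user/hadoop/analysis-output \
  --input-fields-terminated-by '\t'
```

One caveat relevant to my transactional requirements: a Sqoop export runs as multiple parallel tasks that each commit separately, so a failed export can leave the table partially updated; Sqoop's `--staging-table` option, which loads into an intermediate table and moves the rows to the target in a single final transaction, seems to be the way to approximate an all-or-nothing export.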
Ultimately, I want to read MongoDB data, process it with Hadoop, and finally output the results into a MySQL database.
Is this possible? Which tools are available to do this?