Apache Ignite is a high-performance, integrated and distributed in-memory platform for computing and transacting on large-scale data sets in real-time.Ignite is a data-source-agnostic platform and can distribute and cache data across multiple servers in RAM to deliver unprecedented processing speed and massive application scalability.
Apache Spark(cluster computing framework) is a fast, in-memory data processing engine with expressive development APIs to allow data workers to efficiently execute streaming, machine learning or SQL workloads that require fast iterative access to datasets.
By allowing user programs to load data into a cluster’s memory and query it repeatedly, Spark is well suited for high-performance computing and machine learning algorithms.
Some conceptual differences:
Spark doesn’t store data, it loads data for processing from other storages, usually disk-based, and then discards the data when the processing is finished. Ignite, on the other hand, provides a distributed in-memory key-value store (distributed cache or data grid) with ACID transactions and SQL querying capabilities.
Spark is for non-transactional, read-only data (RDDs don’t support in-place mutation), while Ignite supports both non-transactional (OLAP) payloads as well as fully ACID compliant transactions (OLTP)
Ignite fully supports pure computational payloads (HPC/MPP) that can be “dataless”. Spark is based on RDDs and works only on data-driven payloads.
Conclusion:
Ignite and Spark are both in-memory computing solutions but they target different use cases.
In many cases, they are used together to achieve superior results:
Ignite can provide shared storage, so the state can be passed from one Spark application or job to another.
Ignite can provide SQL with indexing so Spark SQL can be accelerated over 1,000x (spark doesn’t index the data)
When working with files instead of RDDs, the Apache Ignite In-Memory File System (IGFS) can also share state between Spark jobs and applications