Azure Data Explorer (ADX) vs Polybase vs Databricks

Question

Today I discovered another Azure service, Azure Data Explorer (ADX). Apologies for asking for a comparison of services; I have a good understanding of all of them except ADX. There seems to be a big functionality overlap, so I want to know the exact role of ADX in the Azure infrastructure.

What is the use case when ADX is significantly better than Synapse/Databricks?

My understanding of ADX

AFAIK, ADX is a cluster (billed per hour, like Databricks or Synapse, and unlike ADLA) that manages the database for you and is optimized for streaming ingestion and ad-hoc queries at scale. It also supports external tables, which perform worse but are cheaper (you pay for Blob/ADLS storage).

Details

I don't understand why we need ADX if:

  1. Azure Synapse has a similar pricing model (cluster, per hour) and also supports streaming ingestion and ad-hoc querying at scale. Azure Synapse supports querying Blob Storage/ADLS through Polybase external tables.
  2. Databricks is another service that is capable of doing this. Using Databricks Ingest and Delta Lake, you can ingest streaming data and consume it in both streaming and batch fashion. You can also have an interactive cluster that handles ad-hoc queries for you.
  3. If you want real-time analytics, you can use Azure Stream Analytics. If you want an Athena-like experience, you can use ADLA (although it still doesn't support ADLS Gen2).
Preventer answered 27/5, 2020 at 17:10 Comment(0)

Azure Data Explorer is focused on high-velocity, high-volume, high-variety data (the 3 Vs of big data). It provides very fast interactive queries over such data while it is streaming in. It supports JSON and text natively, including full-text search and indexing.

It is used in a broad set of scenarios associated with sensing activity and time series in a large set of verticals: IoT, API logs, transaction monitoring and ad hoc data exploration.

Microsoft offers ADX as a service because it is the main service Microsoft uses for its own telemetry. All the analytical solutions we offer as a service (security, operational monitoring, game analytics, product insights and usage analytics, IoT, connected vehicles) are built on ADX. You can find a full list in our docs. For clarity: SQL, Synapse and CosmosDB store their telemetry in Azure Data Explorer...

SQL DW (AKA Synapse SQL pool) is an excellent data warehouse and implements the modern data warehouse pattern: ETL -> curated data model -> load and serve via Analysis Services or Power BI.
ADX is for real-time analytics: it lets you apply schema on read (SOR) to data that is only seconds old.
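
To make "schema on read" concrete, here is a minimal sketch in plain Python. It is not ADX or KQL code; the event payloads, the field names and the `read_with_schema` helper are all hypothetical and only illustrate the idea. Raw records are stored exactly as they arrive, and types are imposed only when a query reads them.

```python
import json
from datetime import datetime, timezone

# Hypothetical raw events, kept exactly as they arrived (no upfront schema).
RAW_EVENTS = [
    '{"ts": "2020-06-01T15:57:00Z", "device": "sensor-1", "temp_c": "21.5"}',
    '{"ts": "2020-06-01T15:57:05Z", "device": "sensor-2", "temp_c": 19}',
    '{"ts": "2020-06-01T15:57:09Z", "device": "sensor-1", "status": "rebooting"}',
]


def parse_ts(text):
    """Parse an ISO-like UTC timestamp such as 2020-06-01T15:57:00Z."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def read_with_schema(raw_lines, schema):
    """Apply `schema` (field name -> converter) at read time.

    Fields that are missing or cannot be converted become None,
    so one odd record does not block ingestion or the query.
    """
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, convert in schema.items():
            value = record.get(field)
            try:
                row[field] = convert(value) if value is not None else None
            except (TypeError, ValueError):
                row[field] = None
        yield row


# The schema lives with the query, not with the stored data.
SCHEMA = {"ts": parse_ts, "device": str, "temp_c": float}

rows = list(read_with_schema(RAW_EVENTS, SCHEMA))
temps = [r["temp_c"] for r in rows if r["temp_c"] is not None]
average_temp = sum(temps) / len(temps)
print(average_temp)
```

The third event has no `temp_c` field, so it simply yields `None` for that column instead of being rejected at ingestion time. A different query could read the same raw lines with a different schema.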

Consider ADX as a fully managed platform when replacing Solr/Lucene-based variants used for logs, time series databases and more.

Try it out in large workloads and you will see it is dramatically cheaper than the alternatives and much more powerful and performant.

Reach out to me if you need help.

Mia answered 1/6, 2020 at 15:57 Comment(3)
I think Spark & Databricks also handle the 3 Vs very well. As far as I understand from your answer, ADX handles real-time (or near real-time?) for you. Could you please provide a use case where ADX is better than Databricks/Spark or Flink? Or maybe you have a reference to an article with architecture details of ADX? - Preventer
I have built near real-time pipelines with Spark and it worked really well. For real-time I suppose someone may use Flink or Kafka Streams. When and why is ADX superior to those tools? - Preventer
ADX is dramatically faster for interactive queries over large data sets. If you are doing batch processing, go for Spark. If you want to query fresh, large data sets really quickly, ADX is way faster and easier to use, even for non-programmers. - Mia

Azure Data Explorer, also known as Kusto, is focused on high-volume data ingestion and near real-time querying and analytics. It was developed at Microsoft for log and telemetry analytics, but can be used for other purposes, e.g. IoT, sensor data or web analytics. The same technology is used in Azure internal services like Azure Monitor and Log Analytics.

Similar capabilities could be built on Synapse, Databricks or HDInsight, but I see these as tools that fit much broader use cases. ADX has quite a narrow focus. ADX has its own query language (KQL) but very limited SQL support. It is good for append-only data, not for updates. It is not a data warehouse, database or data lake.

Microsoft material refers to the technology behind ADX by the name Kusto. More info on this at https://learn.microsoft.com/en-us/azure/data-explorer/kusto/concepts/. A good comparison of services can be found in this blog post: https://vincentlauzon.com/2020/02/19/azure-data-explorer-kusto

Happ answered 27/5, 2020 at 19:55 Comment(1)
Thank you for the answer! But when someone says tool X is like tool Y, only with a narrower use case, I would expect X to be extremely good at that use case (superior to Y). Could you please provide details or an example of why/when ADX should replace Nifi/Spark/Databricks/etc.? - Preventer
