Database vs DataMart vs Data Warehouse vs Data Lake
Asked Answered
E

1

7

Looking for the high-level differences/comparison among

  • Database
  • Data Mart (Top-down approach)
  • Data Warehouse
  • Data Lake
  • Data Lakehouse

Please use relative comparison when specifics are not available.

Etymologize answered 12/5, 2020 at 12:23 Comment(2)
Would it be practical to update this question to also consider the Data Lakehouse?Hillyer
@ShawnEary - Good one! I updated the question and the answer as well!Etymologize
E
9

Included below is a high-level comparison between the various data tiers mentioned. Please feel free to drop a comment if any of these need corrections.

Feature Database Data Mart Data Warehouse Data Lake Data Lakehouse
Source Single Single Multiple Multiple Multiple
Structure Structured Structured Structured Raw Structured, semi-structured, and unstructured
Purpose Determined Determined Determined Determined Determined
Storage Centralized Decentralized Centralized Centralized Centralized
Data Format Detailed Summarized Both detailed and summarized All All
Flexibility Low Medium Medium High High
Primary Use Transactional Reporting Analytics & Reporting Analytics Analytics
Cost Low Medium Medium High High
Data Volume Low Low Medium High High
Development Top-down Bottom-up Top-down All All
Design Time Medium Medium High Low Low
Volatility Medium Low None None None
Data Operations CRUD CR CRU CR CRUD
Subject Area Single Single Multiple Multiple Multiple
Design Schema Relational Multi-dimensional Relational No schema Hybrid

Notes:

  • The cost of a data lakehouse can be lower than a data warehouse if the data is stored in a cloud-based object storage system.
  • The data volume of a data lake can be much higher than a data warehouse or data mart.
  • The development time for a data lakehouse can be lower than a data warehouse if the data is already stored in a cloud-based object storage system.
  • The volatility of a data lake can be higher than a data warehouse or data mart because the data is not always structured and may change frequently.

Fit-gap

  • If you need a low-cost, flexible repository for structured data, then a database is a good option.
  • If you need a repository for summarized data for reporting purposes, then a data mart is a good option.
  • If you need a repository for detailed data for analytics and reporting purposes, then a data warehouse is a good option.
  • If you need a repository for all types of data for analytics and machine learning purposes, then a data lakehouse is a good option.
Etymologize answered 12/5, 2020 at 12:23 Comment(2)
Nice answer. You know you can just type a markdown table into your answer, rather than a code snippet with generation and screenshot of that? Anyway, could you please clarify if "design time" is the design of the database, or design of things which use the database?Berceuse
Thanks @falsePockets. Updated it to Markdown Table :). As for the design time - it's for the design of the datastore, it's infra, layout, clustering, replication, scaling, security, integration mechanisms, etc.Etymologize

© 2022 - 2024 — McMap. All rights reserved.