data-warehouse Questions

3

I am writing an ETL script in Python that gets data in CSV files, validates and sanitizes the data as well as categorizes or classifies each row according to some rules, and finally loads it into a...
Deedeeann asked 8/3, 2012 at 19:45

4

Solved

Background: I have a PostgreSQL (v8.3) database that is heavily optimized for OLTP. I need to extract data from it on a semi real-time basis (someone is bound to ask what semi real-time means a...

3

Solved

I am working on a data warehouse and looking for an ETL solution that uses Python. I have played with SnapLogic as an ETL, but I was wondering if there were any other solutions out there. This dat...
Haughay asked 21/9, 2010 at 16:4

3

Solved

Are staging tables used only in data warehouse projects or in any SSIS project? I would like to know what a staging table is. Can anyone give me some examples on how to use it and in what circumstan...
Unattended asked 28/3, 2015 at 12:29

5

Solved

I am trying to install Oracle 19c DB, but after extraction the setup.exe file doesn't execute. I have tried running it with admin privileges, and it still doesn't work.
Leandro asked 20/7, 2020 at 15:57

5

Solved

I am creating a calendar table for my warehouse. I will use this as a foreign key for all the date fields. The code shown below creates the table and populates it. I was able to figure out how to ...
Kylie asked 29/7, 2009 at 16:56

7

Solved

I have a Kimball-style DW (facts and dimensions in star models - no late-arriving facts rows or columns, no columns changing in dimensions except expiry as part of Type 2 slowly changing dimensions...
Fortis asked 18/6, 2009 at 20:22

11

Solved

What is the difference between fact tables and dimension tables? An example could be very helpful.
Quamash asked 17/11, 2013 at 22:16

3

Solved

Which data type should I choose for a unique key (the id of a user, for example) in a PostgreSQL table? Is bigint the right one? Thanks.
Spoonful asked 2/8, 2012 at 13:9

7

I heard a new term, Data Lake. I googled it and found that a data lake is a large-scale storage repository and processing engine. A data lake provides "massive storage for any kind of data, enormou...
Draconic asked 14/3, 2016 at 12:24

7

From wiki, Data lineage is defined as a data life cycle that includes the data's origins and where it moves over time. It describes what happens to data as it goes through diverse processes. It he...
Hooked asked 13/4, 2017 at 3:46

4

Solved

We are currently using a summary table that aggregates information for our users on an hourly basis in UTC time. The problem we are having is that this table is becoming too large and slowing our s...
Liebowitz asked 6/8, 2010 at 22:30

3

Solved

We are conducting a migration project, and looking to replace most Rowstore indexes with Clustered Columnstore indexes for large Data Warehouse. We are adding a unique index on the identity column....

4

Solved

I have heard a few references that pk is not required on fact table. I believe every single table should have a pk. How could a person understand a row in a fact table if there is no pk and 10+ f...
Karena asked 22/1, 2014 at 16:30

5

Solved

I know several small companies do not do testing on ETL process, but that seems to be suboptimal from the perspective of software engineering. How do people usually do testing/unit test/functional ...
Christianity asked 14/6, 2016 at 10:15

2

Solved

I have many questions about Spark + Delta. 1) Databricks proposes 3 layers (bronze, silver, gold), but which layer is recommended for Machine Learning, and why? I suppose they propose...

13

What is the difference between a database and a data warehouse? Aren't they the same thing, or at least written in the same thing (i.e., Oracle RDBMS)?
Ritualism asked 5/8, 2010 at 21:33

12

Solved

I was asked by a customer what the term "data warehouse" really means. I thought about ETL, details of the data model, differences to NoSQL, Clouds, 'normal' DBMS, MDM (Master Data Manageme...
Pistole asked 22/6, 2010 at 23:33

1

Solved

Looking for the high-level differences/comparison among: Database, Data Mart (Top-down approach), Data Warehouse, Data Lake, Data Lakehouse. Please use relative comparison when specifics are not availa...
Etymologize asked 12/5, 2020 at 12:23

2

Consider the following two DWH architectures: DWH with Raw Data Vault, layers: Source systems; Staging area (truncated on every load, exact schema of source tables); Raw Data Vault (modelled as Da...
Decca asked 20/1, 2020 at 20:28

1

I have an SSIS solution where depending on a parameter, it launches the extraction of different databases, each in a different file, and the name must contain the date of the extraction so we can't...
Duenna asked 19/3, 2020 at 11:40

5

I'm looking at the datekey column from the fact tables in AdventureWorksDW and they're all of type int. Is there a reason for this and not of type date? I understand that creating a clustered ind...
Warmth asked 5/7, 2017 at 19:4

2

Existing process: raw structured data is copied into a staging layer of Redshift. ETL tools such as Informatica or Talend are then used to do incremental loading into the Fact and Dimension tables of Datamart...
Merrymerryandrew asked 25/11, 2016 at 21:40

1

Solved

Has anyone seen this error before? If so, how did you fix it? I can't find anything on Google. Here is what I have done: I tried doing a Google search but practically nothing came up. I checked ...
Lithomarge asked 6/7, 2019 at 5:8

4

Solved

I have a table that records a row for each time a score for a location has changed. score_history: id int PK (uuid auto incrementing int); happened_at timestamp (when the score changed); location_...
Riddick asked 3/7, 2019 at 15:48
