how to check data exist in targeted table in pentaho pdi
Asked Answered
S

2

5

M using pentaho spoon first time, i have one input table and i written one sql query, and want that the resulting data of the input table to store in different targeted table, but before storing the data in targeted table check whether the data all ready exist in the table.

Shayneshays answered 11/7, 2018 at 7:51 Comment(0)
G
11

Although Cyrus answer is perfectly correct for your use case, you may also use the Merge Row (diff) step.

It compares an incoming flow with an existing table and tells if the record is New, Modified, Identical or Deleted. Then you can use a Switch/Case step to take the appropriate action.

This incremental load pattern is more efficient than the Insert/Update not only because it gives you more flexibility, but mainly because is not an efficient operation (you need 4 database access: look into the table, get the result, change it if needed and push the result).

Note that, with the PDI running all operations in parallel, you may inadvertently open the same table for reading and writing, which ends up in a dead lock. The usual trick is to sort the reference table in PDI, so it has to read all the records to be sure it has the last one. The sorting will cost you almost no time (and memory) if the flow is presorted in the Input table.enter image description here

Gauntlet answered 11/7, 2018 at 9:59 Comment(6)
This answer is better if performance and scaling are an issue. For a beginner the Insert/Update step is easier to configure and more foolproof.Secondbest
In merge rows (diff), its checks whole row (n column) of the table, can we check 3 columns out of 4 columns.Shayneshays
Yes Cyrus, I even upvoted your answer. However, I thought it was a good opportunity to introduce the incremental load pattern.Gauntlet
in merge rows (diff), is required that both the data is in sorted order and same level of rows?Shayneshays
Yes. If by the "same level of row" you mean the same keys in the same order. Additionally, the row must have exactly the same metadata (same fields, same name same type,...)Gauntlet
#51436263 @GauntletShayneshays
S
3

Use the Insert/Update step

This step does exactly what you want (check before inserting) and can update or skip existing records by checking "Don't perform any updates".

Be sure to configure the key field or fields that determine unique records.

Secondbest answered 11/7, 2018 at 9:18 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.