M using pentaho spoon first time, i have one input table and i written one sql query, and want that the resulting data of the input table to store in different targeted table, but before storing the data in targeted table check whether the data all ready exist in the table.
Although Cyrus answer is perfectly correct for your use case, you may also use the Merge Row (diff)
step.
It compares an incoming flow with an existing table and tells if the record is New, Modified, Identical or Deleted. Then you can use a Switch/Case
step to take the appropriate action.
This incremental load pattern is more efficient than the Insert/Update
not only because it gives you more flexibility, but mainly because is not an efficient operation (you need 4 database access: look into the table, get the result, change it if needed and push the result).
Note that, with the PDI running all operations in parallel, you may inadvertently open the same table for reading and writing, which ends up in a dead lock. The usual trick is to sort the reference table in PDI, so it has to read all the records to be sure it has the last one. The sorting will cost you almost no time (and memory) if the flow is presorted in the Input table
.
Use the Insert/Update step
This step does exactly what you want (check before inserting) and can update or skip existing records by checking "Don't perform any updates".
Be sure to configure the key field or fields that determine unique records.
© 2022 - 2024 — McMap. All rights reserved.