Passing Environment Variables to a Pentaho Kettle Transformation via script

I am required to parameterize all variables in my Kettle job and its transformation (the jobs will be run in AWS, and all parameters are passed in as environment variables).

My connections, paths and various other parameters in the job and its attendant transformation use the ${SOURCE_DB_PASSWORD}, ${OUTPUT_DIRECTORY} style.

When I set these as environment variables in the Data Integration UI, they all work and the job runs successfully in the UI tool. When I run them from a shell script:

#!/bin/sh
export SOURCE_DB_HOST=services.db.dev
export SOURCE_DB_PORT=3306

kitchen.sh -param:SOURCE_DB_PORT=$SOURCE_DB_PORT -param:SOURCE_DB_HOST=$SOURCE_DB_HOST -file MY_JOB.kjb

The job and the transformation it calls do not pick up the variables. The error is:

Cannot load connection class because of underlying exception: 'java.lang.NumberFormatException: For input string: "${SOURCE_DB_PORT}"'

So, without using JNDI files or kettle.properties, I need some way of mapping environment variables to parameters/variables inside PDI jobs and transformations.

[PDI version 8.1 on Mac OS X 10.13]

Kuwait answered 3/9, 2018 at 7:30 Comment(2)
Windows or Linux? – Housecarl
Sorry, the issue appears on Mac OS X 10.13, Java 8. – Kuwait

Using the -param:SOURCE_DB_HOST=value syntax on the command line and the ${SOURCE_DB_HOST} syntax inside jobs and transformations is the correct way to go.

What you need to do in transformations (but not, it appears, in jobs) is add the parameters explicitly to the transformation properties (Ctrl-T, or Cmd-T on a Mac, in the transformation workspace), under the Parameters tab.

Running the job or the transformation directly from a shell script will then work.
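
For example, once the parameters are declared in the transformation properties, the transformation can also be run on its own with pan.sh using the same -param syntax. This is only a sketch; MY_TRANSFORMATION.ktr is a placeholder file name:

#!/bin/sh
# Sketch: run the transformation directly with pan.sh (PDI's transformation
# runner), assuming SOURCE_DB_HOST and SOURCE_DB_PORT are declared under the
# transformation's Parameters tab. MY_TRANSFORMATION.ktr is a placeholder.
export SOURCE_DB_HOST=services.db.dev
export SOURCE_DB_PORT=3306

./pan.sh -param:SOURCE_DB_HOST="$SOURCE_DB_HOST" \
         -param:SOURCE_DB_PORT="$SOURCE_DB_PORT" \
         -file MY_TRANSFORMATION.ktr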


Kuwait answered 3/9, 2018 at 23:52 Comment(0)

I've just been trying to get this stuff working. The -param (or /param) command line flags only seem to work if you specify the parameters in the job spec, and then explicitly pass them down to any transformations that need them.

To pass in system properties that are universally accessible within the job, I've used:

set "OPT=-Dname1=value -Dname2=value"

before calling Kitchen. This puts the -D flags into the Java options. Hopefully this will work with database specs.
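
Note that the set "OPT=..." line above is Windows batch syntax. On macOS/Linux, a rough equivalent is a sketch along these lines; it assumes your spoon.sh (which kitchen.sh delegates to) reads PENTAHO_DI_JAVA_OPTIONS for extra JVM options, so check the script that ships with your PDI version. The property names below are placeholders:

#!/bin/sh
# Sketch: pass -D system properties to kitchen.sh on macOS/Linux via
# PENTAHO_DI_JAVA_OPTIONS. Setting this variable usually replaces the default
# memory options, so include -Xms/-Xmx as well. name1/name2 are placeholders.
export PENTAHO_DI_JAVA_OPTIONS="-Xms1024m -Xmx2048m -Dname1=value -Dname2=value"

./kitchen.sh -file MY_JOB.kjb

A property set this way should then be resolvable as ${name1} inside the job or transformation, since Kettle includes JVM system properties in its variable space.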

Of course you can put them in kettle.properties, but if you want to run against different environments that gets messy.

Magdala answered 13/3, 2019 at 16:17 Comment(0)

Maybe the issue is not caused by the parameters but by the Table input step. Can you check whether "Replace variables in script?" is checked?


Housecarl answered 3/9, 2018 at 8:11 Comment(1)
Yes, it is checked. My query is: SELECT * FROM ${SOURCE_TABLE} WHERE updated_at > "2018-05-10 00:00:00" and updated_at < "2018-05-11 00:00:00". SOURCE_TABLE is passed in... I edited the env vars in the question for clarity. As mentioned, the job runs successfully in the Data Integration UI tool. – Kuwait

With the Get variables step, you can turn the parameters into fields (columns) before the Table input step; don't forget to specify the "Insert data from step" option on the Table input.

As the issue comes from a number format error, you may want to see what PDI thinks the variable contains by using the Write to log step.

Tell me if the NumberFormatException persists.

Housecarl answered 3/9, 2018 at 10:25 Comment(0)

This is a typical error when Pentaho expects an integer for the port number but receives a string. Check whether there is any space after the port number in your shell script.
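
For instance, a quick way to spot a stray trailing space in the exported value (a sketch; add it to the shell script before calling kitchen.sh):

#!/bin/sh
# Print the value between brackets so trailing whitespace becomes visible,
# e.g. [3306 ] instead of [3306].
export SOURCE_DB_PORT=3306
printf '[%s]\n' "$SOURCE_DB_PORT"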

Fleshly answered 29/10, 2018 at 22:53 Comment(0)
