My workflow: Kafka -> Dataflow streaming -> BigQuery
Since low latency isn't important in my case, I use FILE_LOADS to reduce costs. I'm using BigQueryIO.Write with a DynamicDestinations (one new table every hour, with the current hour as a suffix).
This BigQueryIO.Write is configured like this:
.withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
.withMethod(Method.FILE_LOADS)
.withTriggeringFrequency(triggeringFrequency)
.withNumFileShards(100)
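For context, here is a minimal sketch of how the write is wired up. The DynamicDestinations implementation and the schema below are placeholders (my real pipeline provides the schema through .withJsonSchema()); the project, dataset, and table names match the error log below:

import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import com.google.common.collect.ImmutableList;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.Method;
import org.apache.beam.sdk.io.gcp.bigquery.DynamicDestinations;
import org.apache.beam.sdk.io.gcp.bigquery.TableDestination;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.ValueInSingleWindow;
import org.joda.time.Duration;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;

// Hourly suffix, e.g. "20180302_16", as seen in the failing table id.
private static final DateTimeFormatter HOUR_SUFFIX =
    DateTimeFormat.forPattern("yyyyMMdd_HH").withZoneUTC();

// Placeholder schema, standing in for the JSON schema used in the real pipeline.
private static final TableSchema SCHEMA =
    new TableSchema().setFields(ImmutableList.of(
        new TableFieldSchema().setName("message").setType("STRING")));

static void writeHourlyTables(PCollection<TableRow> rows, Duration triggeringFrequency) {
  rows.apply("WriteHourlyTables",
      BigQueryIO.writeTableRows()
          .to(new DynamicDestinations<TableRow, String>() {
            @Override
            public String getDestination(ValueInSingleWindow<TableRow> element) {
              // Route each element to the table for the hour of its timestamp.
              return "mytable_" + HOUR_SUFFIX.print(element.getTimestamp());
            }

            @Override
            public TableDestination getTable(String tableName) {
              return new TableDestination(
                  "myproject-id:dev_mydataset." + tableName,
                  "Hourly table " + tableName);
            }

            @Override
            public TableSchema getSchema(String tableName) {
              // Same schema for every hourly table.
              return SCHEMA;
            }
          })
          .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
          .withMethod(Method.FILE_LOADS)
          .withTriggeringFrequency(triggeringFrequency)
          .withNumFileShards(100));
}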
The first table is successfully created and written to, but the following tables are never created and I get these exceptions:
(99e5cd8c66414e7a): java.lang.RuntimeException: Failed to create load job with id prefix 5047f71312a94bf3a42ee5d67feede75_5295fbf25e1a7534f85e25dcaa9f4986_00001_00023, reached max retries: 3, last failed load job: {
"configuration" : {
"load" : {
"createDisposition" : "CREATE_NEVER",
"destinationTable" : {
"datasetId" : "dev_mydataset",
"projectId" : "myproject-id",
"tableId" : "mytable_20180302_16"
},
For the first table, the CreateDisposition used is CREATE_IF_NEEDED as specified, but for all subsequent tables this parameter is not taken into account and CREATE_NEVER is used instead.
I have also filed an issue on JIRA.
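In the meantime, a possible workaround (untested, and the schema below is a placeholder that must match what the pipeline writes) is to pre-create the hourly tables with the BigQuery client library, so the load jobs succeed even when they incorrectly fall back to CREATE_NEVER:

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Field;
import com.google.cloud.bigquery.LegacySQLTypeName;
import com.google.cloud.bigquery.Schema;
import com.google.cloud.bigquery.StandardTableDefinition;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.TableInfo;

// Create the table for a given hourly suffix (e.g. "20180302_16") if it is missing.
static void ensureHourlyTableExists(String hourSuffix) {
  BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
  TableId tableId = TableId.of("dev_mydataset", "mytable_" + hourSuffix);
  if (bigquery.getTable(tableId) == null) {
    // Placeholder schema; must match what the pipeline writes.
    Schema schema = Schema.of(Field.of("message", LegacySQLTypeName.STRING));
    bigquery.create(TableInfo.of(tableId, StandardTableDefinition.of(schema)));
  }
}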
I specify the schema with .withJsonSchema(), and it works fine to create the first table. The problem seems to be that CREATE_DISPOSITION is ignored for tables other than the ones in the first pane. Please see Jonas' comment on the JIRA issue for more details. – Precursory