Error using SpannerIO in apache beam
Asked Answered
S

2

4

This question is a follow-up to this one. I am trying to use apache beam to read data from a google spanner table (and then do some data processing). I wrote the following minimum example using the java SDK:

package com.google.cloud.dataflow.examples;
import java.io.IOException;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.io.gcp.spanner.SpannerIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;
import com.google.cloud.spanner.Struct;

public class backup {

  public static void main(String[] args) throws IOException {
    PipelineOptions options = PipelineOptionsFactory.create();

    Pipeline p = Pipeline.create(options);
    PCollection<Struct> rows = p.apply(
            SpannerIO.read()
                .withInstanceId("my_instance")
                .withDatabaseId("my_db")
                .withQuery("SELECT t.table_name FROM information_schema.tables AS t")
                );
    
    PipelineResult result = p.run();
    try {
      result.waitUntilFinish();
    } catch (Exception exc) {
      result.cancel();
    }
  }
}

When I try to execute the code using the DirectRunner, I get the following error message:

org.apache.beam.runners.direct.repackaged.com.google.common.util.concurrent.UncheckedExecutionException:

org.apache.beam.sdk.util.UserCodeException: java.lang.NoClassDefFoundError: Could not initialize class com.google.cloud.spanner.spi.v1.SpannerErrorInterceptor

[...] Caused by: org.apache.beam.sdk.util.UserCodeException: java.lang.NoClassDefFoundError: Could not initialize class com.google.cloud.spanner.spi.v1.SpannerErrorInterceptor

[...] Caused by: java.lang.NoClassDefFoundError: Could not initialize class com.google.cloud.spanner.spi.v1.SpannerErrorInterceptor

Or, using the DataflowRunner:

org.apache.beam.runners.direct.repackaged.com.google.common.util.concurrent.UncheckedExecutionException: org.apache.beam.sdk.util.UserCodeException: java.lang.NoSuchFieldError: internal_static_google_rpc_LocalizedMessage_fieldAccessorTable

[...] Caused by: org.apache.beam.sdk.util.UserCodeException: java.lang.NoSuchFieldError: internal_static_google_rpc_LocalizedMessage_fieldAccessorTable

[...] Caused by: java.lang.NoSuchFieldError: internal_static_google_rpc_LocalizedMessage_fieldAccessorTable

In both cases, the error message is rather cryptic, and I could not find any clear ideas as to what causes the error from a google search. I also could not find any example scripts using the SpannerIO module.

Is this error due to an obvious error in my code, or is it due to a bad installation of the google cloud tools ?

Swedenborgianism answered 11/10, 2017 at 9:1 Comment(2)
Argh, you are likely hitting dependency conflict issues.apache.org/jira/browse/BEAM-2837. It was fixed, but we need to wait for new version of beam. You can build beam binaries yourself from source or use this trick in your pom.xml gist.github.com/mairbek/0c770ff7b591e3db58936b0b9294215aAlti
Oh. Thanks ! I guess I'll try out the fix.Swedenborgianism
B
7

This issue is most likely caused by a dependency compatibility problem described here: BEAM-2837. Here's a quick workaround described in one of the comments in the JIRA issue:

<dependency>
    <groupId>com.google.api.grpc</groupId>
    <artifactId>grpc-google-common-protos</artifactId>
    <version>0.1.9</version>
</dependency>

<dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
    <version>${beam.version}</version>
    <exclusions>
        <exclusion>
            <groupId>com.google.api.grpc</groupId>
            <artifactId>grpc-google-common-protos</artifactId>
        </exclusion>
    </exclusions>
</dependency>

Explicitly define the required com.google.api.grpc dependency and exclude the version from org.apache.beam.

Backbone answered 16/2, 2018 at 23:41 Comment(1)
Thank you ! To be honest though, I ended up moving on to using the python SDK, and making a custom ParDo for reading/writing to spanner.Swedenborgianism
L
2

You need to specify the ProjectID:

    SpannerIO.read()
            .withProjectId("my_project")
            .withInstanceId("my_instance")
            .withDatabaseId("my_db")

And you need to set the credentials for your Spanner project. As the API of SpannerIO does not allow you to set any custom credentials, you must set Global Application Credentials using the environment variable GOOGLE_APPLICATION_CREDENTIALS.

You could also read (and write) to Cloud Spanner using JDBC. Reading is done like this:

        PCollection<KV<String, Long>> words = p2.apply(JdbcIO.<KV<String, Long>> read()
            .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create("nl.topicus.jdbc.CloudSpannerDriver",
                    "jdbc:cloudspanner://localhost;Project=my-project-id;Instance=instance-id;Database=database;PvtKeyPath=C:\\Users\\MyUserName\\Documents\\CloudSpannerKeys\\cloudspanner-key.json"))
            .withQuery("SELECT t.table_name FROM information_schema.tables AS t").withCoder(KvCoder.of(StringUtf8Coder.of(), BigEndianLongCoder.of()))
            .withRowMapper(new JdbcIO.RowMapper<KV<String, Long>>()
            {
                private static final long serialVersionUID = 1L;

                @Override
                public KV<String, Long> mapRow(ResultSet resultSet) throws Exception
                {
                    return KV.of(resultSet.getString(1), resultSet.getLong(2));
                }
            }));

This method also allows you to use custom credentials by setting the PvtKeyPath. You can also write to Google Cloud Spanner using JDBC. Have a look here for an example: http://www.googlecloudspanner.com/2017/10/google-cloud-spanner-with-apache-beam.html

Luciusluck answered 11/10, 2017 at 10:12 Comment(1)
I had indeed forgotten the "projectID" line, though adding it did not fix the error. In fact, I am using the Eclipse google cloud tools plugin, and am logged in on my google account. So this should take care of the credentials ? I may have to try the JDBC version.Swedenborgianism

© 2022 - 2024 — McMap. All rights reserved.