Uploading files to a dataset in CKAN / datahub.io through a Java client
I am testing the uploading of files to a dataset on CKAN / datahub.io through a Java client of the API.

public String uploadFile()
        throws CKANException {

    String returned_json = this._connection.MultiPartPost("", "");

    System.out.println("r: " + returned_json);
    return returned_json;
}

and

   protected String MultiPartPost(String path, String data)
            throws CKANException {
        URL url = null;

        try {
            url = new URL(this.m_host + ":" + this.m_port + path);
        } catch (MalformedURLException mue) {
            System.err.println(mue);
            return null;
        }

        String body = "";

        HttpClient httpclient = new DefaultHttpClient();
        try {
            String fileName = "D:\\test.jpg";

            FileBody bin = new FileBody(new File(fileName),"image/jpeg");
            StringBody comment = new StringBody("Filename: " + fileName);

            MultipartEntity reqEntity = new MultipartEntity();
            reqEntity.addPart("bin", bin);
            reqEntity.addPart("comment", comment);
            HttpPost postRequest = new HttpPost("http://datahub.io/api/storage/auth/form/2013-01-24T130158/test.jpg");
            postRequest.setEntity(reqEntity);
            postRequest.setHeader("X-CKAN-API-Key", this._apikey);
            HttpResponse response = httpclient.execute(postRequest);
            int statusCode = response.getStatusLine().getStatusCode();
            System.out.println("status code: " + statusCode);

            BufferedReader br = new BufferedReader(
                    new InputStreamReader((response.getEntity().getContent())));

            String line;
            while ((line = br.readLine()) != null) {
                body += line;
            }
            System.out.println("body: " + body);
        } catch (IOException ioe) {
            System.out.println(ioe);
        } finally {
            httpclient.getConnectionManager().shutdown();
        }

        return body;
    }

I get two responses to my POST request:

  • a 413 error ("request entity too large") when the JPEG I try to upload is 2.83 MB. This disappears when I shrink the file. Is there a limit on upload file size?

  • a 500 error ("internal server error"). This is where I am stuck. It might have to do with the fact that my dataset on datahub.io is not "datastore enabled"? (I see a disabled "Data API" button next to my resource files in the dataset, with a tooltip saying: "Data API is unavailable for this resource as DataStore is disabled".)

=> Is this a possible reason for the 500 error? If so, how could I enable it from the client side? (Pointers to Python code would be useful!)

Thx!
PS: the dataset I am using for testing purposes: http://datahub.io/dataset/testapi

Kingsley answered 26/1, 2013 at 9:21 Comment(2)
"Data API is unavailable" is a red herring - that refers to an API for querying your data file, which is separate from what you want, which is blob storage. - Ruggles
Have you tried running a CKAN instance locally and uploading a file to that? That would confirm whether your problem is specific to thedatahub. It would also give you better visibility into the CKAN logs. Get it working locally and then try it on thedatahub. - Satanic

Only someone with access to the exception log could tell you why the 500 is occurring.

However, I'd check that your request is the same as what you'd get from the Python client that was written alongside the datastore: https://github.com/okfn/ckanclient/blob/master/ckanclient/__init__.py#L546

You're sending the "bin" image buffer and "comment" file_key in your multipart request. Note the file_key must be changed for every upload, so add in a timestamp or something. And maybe you need to add in a Content-Type: for the binary.
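As a minimal sketch of the timestamp suggestion above: the helper below prefixes the file name with a UTC timestamp so that repeated uploads of the same file get different keys. The class and method names are hypothetical, and the timestamp layout is simply copied from the URL in the question; nothing here is taken from the CKAN API itself.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class UploadLabel {
    // Timestamp layout copied from the URL in the question (e.g. 2013-01-24T130158).
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HHmmss").withZone(ZoneOffset.UTC);

    /** Builds "<timestamp>/<fileName>" so that each upload gets its own key. */
    static String uniqueLabel(Instant when, String fileName) {
        return STAMP.format(when) + "/" + fileName;
    }

    public static void main(String[] args) {
        System.out.println(uniqueLabel(Instant.now(), "test.jpg"));
    }
}
```

Two uploads of the same file name within the same second would still collide, so a random suffix may be worth adding if that can happen.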

Ruggles answered 31/1, 2013 at 16:59 Comment(1)
Just for the record, I still can't get it to work. If a Java coder from the CKAN community is reading this, please get in touch! - Kingsley

I have been going through the same kind of troubles as the poster of this question. After quite a bit of trial and error, I came up with a solution to the problem. In my case, I had some control over the CKAN repository that I wanted to upload to. If you don't, your problem might be impossible to solve...

I assume you are using the 1.8 version of CKAN?

First of all, check whether the CKAN repository has been set up to allow file upload and if not, configure it to allow that. This can be done on the server using the steps posted here: http://docs.ckan.org/en/ckan-1.8/filestore.html#local-file-storage

The 413 error that you mentioned should be addressed next. This has to do with the general configuration of the server. In my case, CKAN was served through nginx. I added a "client_max_body_size 100M" line to the nginx.conf file. See this post for instance: http://recursive-design.com/blog/2009/11/18/nginx-error-413-request-entity-too-large/

Then there is only the 500 error left. At the time of this writing, the API documentation of CKAN is still a little immature... It does indeed say that you have to build a request like the one you made for file upload. However, that request only asks for permission to upload the file. If your credentials check out (not every user may be allowed to upload files), the response holds an object telling you where to send your file... Because the API documentation is unclear, you ended up merging these two requests.

The following scenario shows a sequence of two requests that handles the file upload. Some steps may work out differently in your case if the repository has been set up a little differently. If you get error messages, be sure to check the response body for clues!

Here is the authentication request that I used:

String body = "";
String generatedFilename=null;

HttpClient httpclient = new DefaultHttpClient();

try {

    // create new identifier for every file, use time
    SimpleDateFormat dateFormatGmt = new SimpleDateFormat("yyyyMMMddHHmmss");
    dateFormatGmt.setTimeZone(TimeZone.getTimeZone("GMT"));
    String date=dateFormatGmt.format(new Date());
    generatedFilename=date +"/"+filename;

    HttpGet getRequest = new HttpGet(this.CKANrepos+ "/api/storage/auth/form/"+generatedFilename);
    getRequest.setHeader(CKANapiHeader, this.CKANapi);

    HttpResponse response = httpclient.execute(getRequest);
    int statusCode = response.getStatusLine().getStatusCode();
    BufferedReader br = new BufferedReader(
             new InputStreamReader((response.getEntity().getContent())));

    String line;
    while ((line = br.readLine()) != null) {
         body += line;
    }
    if(statusCode!=200){
         throw new IllegalStateException("File reservation failed, server responded with code: "+statusCode+
          "\n\nThe message was: "+body);

    }
}finally {
     httpclient.getConnectionManager().shutdown();
}

Now, if all goes well, the server responds with a json object holding the parameters to use when doing the actual file upload. In my case, the object looked like:

{file_key:"some-filename-to-use-when-uploading"}

Be sure to check the JSON object though, as I'm given to understand that there may be custom CKAN repositories that require more or different parameters.

These responses can then be used in the actual file upload:

        File file = new File("/tmp/file.rdf");
        String body = "";

        HttpClient httpclient = new DefaultHttpClient();

        try {

            FileBody bin = new FileBody(file,"application/rdf+xml");

            MultipartEntity reqEntity = new MultipartEntity();
            reqEntity.addPart("file", bin);

            reqEntity.addPart("key", new StringBody(filename));


            HttpPost postRequest = new HttpPost(this.CKANrepos+"/storage/upload_handle");
            postRequest.setEntity(reqEntity);
            postRequest.setHeader(CKANapiHeader, this.CKANapi);
            HttpResponse response = httpclient.execute(postRequest);
            int statusCode = response.getStatusLine().getStatusCode();
            BufferedReader br = new BufferedReader(
                    new InputStreamReader((response.getEntity().getContent())));

            String line;
            while ((line = br.readLine()) != null) {
                body += line;
            }
            if(statusCode!=200){
                getWindow().showNotification("Upload Statuscode: "+statusCode,
                        body,
                        Window.Notification.TYPE_ERROR_MESSAGE);

            }
        }finally {
            httpclient.getConnectionManager().shutdown();
        }

As you can see, the file_key property has now turned into the simple 'key' property. I don't know why.
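To make the shape of that upload request more concrete, here is a rough sketch of a multipart/form-data body with a "key" text part and a "file" binary part, built with the JDK only. The two field names come from the code above; the class name, method name, and boundary string are hypothetical, and the exact bytes that HttpClient's MultipartEntity produces may differ in headers and ordering.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class MultipartSketch {
    /** Encodes a "key" text part and a "file" binary part as multipart/form-data. */
    static byte[] buildBody(String boundary, String key, String fileName,
                            String contentType, byte[] fileBytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Text part carrying the key, followed by the headers of the file part.
        String head = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"key\"\r\n\r\n"
                + key + "\r\n"
                + "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"file\"; filename=\""
                + fileName + "\"\r\n"
                + "Content-Type: " + contentType + "\r\n\r\n";
        byte[] headBytes = head.getBytes(StandardCharsets.UTF_8);
        out.write(headBytes, 0, headBytes.length);
        // Raw file content, then the closing boundary.
        out.write(fileBytes, 0, fileBytes.length);
        byte[] tail = ("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8);
        out.write(tail, 0, tail.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] body = buildBody("sketch-boundary", "2013-01-24T130158/test.jpg",
                "test.jpg", "image/jpeg", new byte[] {1, 2, 3});
        System.out.println(body.length + " bytes");
    }
}
```

Dumping a body like this next to the server's error response can help spot a missing or misnamed part.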

This will get your file uploaded. The response to this upload request will hold a JSON object telling you where the file got uploaded to. Edit: actually, it seems that my CKAN responded with a simple HTML page to tell me that the file got uploaded... I had to parse the page to confirm that the file was uploaded correctly :(

In my case, the file was at

this.CKANrepos +"/storage/f/"+location

where location is the filename returned in the authentication phase.
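A small sketch of that concatenation, assuming the "/storage/f/" path shown above applies to your repository as well. The class and method names are hypothetical; the helper only trims stray slashes so that the result has exactly one slash at each seam.

```java
public class StorageUrl {
    /** Joins the repository base URL and the upload label under /storage/f/. */
    static String fileUrl(String baseUrl, String location) {
        String base = baseUrl.replaceAll("/+$", "");   // drop trailing slashes
        String rest = location.replaceAll("^/+", "");  // drop leading slashes
        return base + "/storage/f/" + rest;
    }

    public static void main(String[] args) {
        System.out.println(fileUrl("http://datahub.io/", "2013-01-24T130158/test.jpg"));
    }
}
```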

In the previous code fragments:

//the base URL of your CKAN repository, without /api or a trailing slash (the fragments above append those paths themselves), e.g.
this.CKANrepos = "http://datahub.io";
this.CKANapiHeader="X-CKAN-API-Key";
this.CKANapi = "your ckan api key here";
Kairouan answered 24/2, 2013 at 19:47 Comment(0)