Import Freebase to Triplestore

I'm currently planning a big project containing big data.

I already searched, and every result tells me that it's not possible to import Freebase into a triplestore without third-party tools such as BaseKB or Freebase to RDF.

As far as I can see, the dump is already available as RDF, so what is the problem with importing the dump into my 4store triplestore and accessing the data via SPARQL?

Flexible answered 20/7, 2013 at 9:27 Comment(1)
Did you import the Freebase data into a triplestore? If so, how long did the process take, and what machine configuration did you use? I am also planning to import the data, so please let me know the details. Thanks - Handcar

For everybody having problems importing the Freebase dump:

1) Keep your RDF/Turtle parser up to date. (The latest version of Raptor 2 accepts the '.' inside names, e.g. in ns:common.topic.notable_for.example.)

2) The dump must be cleaned up before you can import it. I used this script: http://people.apache.org/~andy/Freebase20121223/ (fixit)

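3) The Turtle specification restricts which characters may appear inside a URI. Control characters, spaces and the characters < > " { } | ^ ` \ are not allowed:

IRIREF ::= '<' ([^#x00-#x20<>"{}|^`\] | UCHAR)* '>'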

So it's very important to add one line to the fixit script, at line 80, that strips these characters:

$X =~ s/\\>/%3E/g ;
$X =~ s/\\.//g ;

# Add this line: remove characters that are not allowed inside <...>
$X =~ s/[\x00-\x20\<\>\"\{\}\|\^\`]//g ;

$obj = "<".$X.">" ;

As a result, invalid syntax like this:

<http://www.wikipedia.org/object?key={invalid_braces}>

becomes

<http://www.wikipedia.org/object?key=invalid_braces>
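
For anyone who prefers not to patch the Perl script, the same clean-up can be sketched in Python. This is only an illustration of the idea, not part of fixit: the names `FORBIDDEN`, `clean_iri` and `clean_iriref` are made up for this example, and, like the line above, it leaves the backslash alone so that `\uXXXX` escapes survive.

```python
import re

# Characters that the Turtle IRIREF production forbids inside <...>:
# control characters and space (U+0000 to U+0020) plus < > " { } | ^ `
# The backslash is deliberately not stripped so \uXXXX escapes are kept.
FORBIDDEN = re.compile(r'[\x00-\x20<>"{}|^`]')


def clean_iri(iri: str) -> str:
    """Return the IRI body with the forbidden characters removed."""
    return FORBIDDEN.sub("", iri)


def clean_iriref(token: str) -> str:
    """Clean a token of the form <...>, keeping the outer angle brackets."""
    if token.startswith("<") and token.endswith(">"):
        return "<" + clean_iri(token[1:-1]) + ">"
    return token  # not an IRI reference, leave it untouched


print(clean_iriref("<http://www.wikipedia.org/object?key={invalid_braces}>"))
# prints <http://www.wikipedia.org/object?key=invalid_braces>
```

Deleting the characters mirrors what the example above shows; percent-encoding them instead would preserve more of the original URL, at the cost of a slightly longer replacement function.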
Flexible answered 24/7, 2013 at 8:25 Comment(0)

You are probably getting search results from at least two, if not three, different data sets:

  1. the old quad format dump
  2. the early RDF dumps
  3. (perhaps) the current RDF dump

The format in #1 is what required conversion. The early RDF dumps (#2) were syntactically invalid, so they wouldn't import into most tools. The RDF dump has been improving over time. I'm not sure whether it's still true that it won't import at all without preprocessing, but regardless, it will almost certainly be more useful if you preprocess it to remove redundancy, normalize it to the format that works best for your application, and so on.
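
What such preprocessing looks like depends on the application. As a rough sketch only, assuming one triple per line with whitespace-separated subject and predicate, the following keeps just the predicates you care about and drops immediately repeated lines. The function name `filter_triples`, the topic id `ns:m.0abc` and the sample lines are invented for illustration.

```python
from typing import Iterable, Iterator, Set


def filter_triples(lines: Iterable[str], wanted: Set[str]) -> Iterator[str]:
    """Yield directives plus the triples whose predicate is in `wanted`."""
    previous = None
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        if stripped.startswith("@"):
            yield stripped  # keep @prefix / @base directives
            continue
        parts = stripped.split(None, 2)  # subject, predicate, rest of line
        if len(parts) < 3 or parts[1] not in wanted:
            continue
        if stripped != previous:  # drop consecutive duplicates only
            yield stripped
        previous = stripped


# Hypothetical sample data, not taken from the real dump.
sample = [
    "@prefix ns: <http://rdf.freebase.com/ns/>.",
    'ns:m.0abc ns:type.object.name "Example"@en .',
    'ns:m.0abc ns:type.object.name "Example"@en .',
    'ns:m.0abc ns:type.object.key "/en/example" .',
]
for triple in filter_triples(sample, {"ns:type.object.name"}):
    print(triple)
```

Because it works line by line, the same generator can be fed from `gzip.open(path, "rt")` so the dump never has to be decompressed on disk.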

Did you try importing the current dump? What were your results?

Preteritive answered 20/7, 2013 at 16:0 Comment(3)
Thanks for your fast answer. Today I ordered a root server, installed 4store, downloaded the Freebase dump and split it into files of 10,000,000 triples each. Now I'm getting an error while importing: "URI file:///root/freebase/xaa:8 raptor error - syntax error". Is there a general problem with the Turtle syntax of the Freebase RDF? - Flexible
The first line causes a "syntax error": ns:american_football.football_historical_roster_position.number ns:type.property.expected_type ns:type.int. - Flexible
I fixed it. After updating Raptor2, the import now works with the help of this fix: people.apache.org/~andy/Freebase20121223 - Flexible

The problem with the Freebase Turtle dump is that it is not compliant with the W3C Turtle specification.

1) According to http://www.w3.org/TR/turtle/#sec-grammar, the character '.' can only appear at the end of a triple, but the Freebase dump has many '.' characters before the end of the triple. I read somewhere that "/" is not allowed outside a URI either, so they chose to use '.' instead.

The latest raptor2 library can cope with this ('.'), but older versions cannot.

2) I think the way blank nodes are emitted is also invalid, e.g. line 141567: ns:m.01000m1 ns:common.topic.notable_for .
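
If upgrading the parser is not an option, one possible workaround for point 1 is to rewrite the prefixed names as full IRIs, since '.' is permitted inside <...>. The sketch below is only an illustration: it assumes one triple per line, assumes the ns: prefix is bound to http://rdf.freebase.com/ns/, and the helper names `expand_term` and `expand_line` are made up for this example.

```python
# Assumption: this is the namespace the dump binds to the ns: prefix.
NS = "http://rdf.freebase.com/ns/"


def expand_term(term: str) -> str:
    """Turn ns:foo.bar into <http://rdf.freebase.com/ns/foo.bar>."""
    if term.startswith("ns:"):
        return "<" + NS + term[3:] + ">"
    return term  # literals and other terms are left unchanged


def expand_line(line: str) -> str:
    """Expand the ns: names in one 'subject predicate object .' line."""
    stripped = line.strip()
    if not stripped or stripped.startswith(("#", "@")):
        return stripped  # blank line, comment or directive
    parts = stripped.split(None, 2)
    if len(parts) < 3 or not parts[2].endswith("."):
        return stripped  # not a complete one-line triple, leave as is
    subject, predicate, rest = parts
    obj = rest[:-1].rstrip()  # cut off the terminating '.'
    return " ".join(
        [expand_term(subject), expand_term(predicate), expand_term(obj), "."]
    )


print(expand_line(
    "ns:american_football.football_historical_roster_position.number "
    "ns:type.property.expected_type ns:type.int."
))
```

Lines that do not fit the one-triple-per-line shape, such as the incomplete triple in point 2, are passed through untouched, so they still need separate handling.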

Cytolysis answered 24/7, 2013 at 7:21 Comment(0)
