I'm trying to create a distant supervision corpus. So far I've assembled the data and passed it through an NER system; an example is shown below.
Original data:
<p>
Myles Brand, the president of the National Collegiate Athletic Association, said in a telephone interview that he had not been approached about whether the N.C.A.A. might oversee a panel for the major bowl games similar to the one that chooses teams for the men's and women's basketball tournaments.
</p>
Processed with Stanford NER:
<p>
<PERSON>Myles Brand</PERSON>, the president of the <ORGANIZATION>National Collegiate Athletic Association</ORGANIZATION>, said in a telephone interview that he had not been approached about whether the <ORGANIZATION>N.C.A.A.</ORGANIZATION> might oversee a panel for the major bowl games similar to the one that chooses teams for the men's and women's basketball tournaments.
</p>
Now here is a sentence that contains the person Myles Brand and the organization National Collegiate Athletic Association.
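For context, this is roughly how I'm pulling candidate entity pairs out of the tagged text. It's only a minimal sketch in Python, assuming the NER output uses the inline tag format shown above; the function names `extract_entities` and `person_org_pairs` are my own, not part of Stanford NER.

```python
import re
from itertools import product

# Matches inline tags such as <PERSON>...</PERSON>; the tag set is an assumption
# based on the sample output above.
TAG_RE = re.compile(r"<(PERSON|ORGANIZATION|LOCATION)>(.*?)</\1>")

def extract_entities(tagged_sentence):
    """Return (entity_type, surface_text) tuples in order of appearance."""
    return [(m.group(1), m.group(2)) for m in TAG_RE.finditer(tagged_sentence)]

def person_org_pairs(tagged_sentence):
    """Pair every PERSON mention with every ORGANIZATION mention in the sentence."""
    entities = extract_entities(tagged_sentence)
    people = [text for kind, text in entities if kind == "PERSON"]
    orgs = [text for kind, text in entities if kind == "ORGANIZATION"]
    return list(product(people, orgs))

tagged = (
    "<PERSON>Myles Brand</PERSON>, the president of the "
    "<ORGANIZATION>National Collegiate Athletic Association</ORGANIZATION>, "
    "said he had not been approached about whether the "
    "<ORGANIZATION>N.C.A.A.</ORGANIZATION> might oversee a panel."
)
pairs = person_org_pairs(tagged)
for person, org in pairs:
    print(person, "|", org)
```

Each pair produced this way is a candidate to look up in Freebase.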
In Freebase, these two entities are linked by the relation President, as you can see:
Freebase Relationship:
Based on this question, one would think the following query would do the trick, but it doesn't, even though the picture above shows that Freebase records the relationship between these two entities. Am I doing something wrong?
I've been playing around with it here.
[{
  "type" : "/type/link",
  "source" : { "id" : "/en/myles_brand" },
  "master_property" : null,
  "target" : { "id" : "/en/national_collegiate_athletic_association" },
  "target_value" : null
}]
Moreover, I have many thousands of entity pairs. I guess I could write a short Java program using the Freebase Java API to look up the relationships for each pair in turn. Does anyone have an example of such a program that I could take a look at?
What I really want to know, though, is this: once I have the relationships, what is the best way to associate them with a distant supervision corpus? I'm confused about what it all looks like once it has been put together.
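To make that last question concrete, here is a rough sketch of what I imagine the final step looks like. It assumes the usual distant supervision heuristic: any sentence that mentions both entities of a known pair gets labeled with that pair's relation. The `KNOWN_RELATIONS` table is a hypothetical stand-in for whatever the Freebase lookups would return, and the output record layout is my own guess, not an established format.

```python
import re

# Inline NER tags as in the sample above (the tag set is an assumption).
TAG_RE = re.compile(r"<(PERSON|ORGANIZATION|LOCATION)>(.*?)</\1>")

# Hypothetical lookup table standing in for the results of the Freebase queries:
# (subject, object) -> relation label.
KNOWN_RELATIONS = {
    ("Myles Brand", "National Collegiate Athletic Association"): "president",
}

def label_sentence(tagged_sentence, known_relations):
    """Emit one training instance per entity pair that has a known relation."""
    entities = [m.group(2) for m in TAG_RE.finditer(tagged_sentence)]
    # Strip the NER tags so the stored sentence is plain text again.
    plain = TAG_RE.sub(r"\2", tagged_sentence)
    instances = []
    for subj in entities:
        for obj in entities:
            relation = known_relations.get((subj, obj))
            if relation is not None:
                instances.append({
                    "sentence": plain,
                    "subject": subj,
                    "object": obj,
                    "relation": relation,
                })
    return instances

TAGGED = (
    "<PERSON>Myles Brand</PERSON>, the president of the "
    "<ORGANIZATION>National Collegiate Athletic Association</ORGANIZATION>, "
    "said in a telephone interview that he had not been approached."
)
instances = label_sentence(TAGGED, KNOWN_RELATIONS)
for inst in instances:
    print(inst["relation"], "|", inst["subject"], "|", inst["object"])
```

Is a list of (sentence, subject, object, relation) records like this what a distant supervision corpus is supposed to look like, or is there a more standard layout?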