How to import XBRL data to MySQL? [closed]
I am working on a project involving processing a large volume of XBRL documents (> 1m separate files). I am totally new to XBRL and feeling quite lost at the moment.

I have data relating to those XBRL documents in a separate MySQL database and I would like to add the XBRL data into MySQL to store everything in one db.

What are the best methods to go about transferring data from the XBRL docs into MySQL?

Are there any bulk processing libraries available for it?

I've been looking for tutorials on these issues but couldn't find anything providing a basic introduction, just a lot of high-level information.

Buddha answered 30/6, 2016 at 11:50 Comment(2)
I don't think there is one; I tried to do the same thing about two years ago, except the destination was SQL Server. What types of files do you have?Scala
Instead of SQL databases, go for NoSQL databases from a performance and scalability perspectiveSulky
The natural paradigm in theory for storing XBRL in a database would be OLAP, because XBRL is about data cubes. OLAP on top of a relational database would be called ROLAP.

This is not a trivial problem, because facts taken from a large number of taxonomies can form a very large and sparse cube (for SEC filings it's 10k+ dimensions), and also because creating an SQL schema requires knowing the taxonomies before any import. If new taxonomies come up, one needs to re-ETL everything. This doesn't make relational databases suitable as a general solution.

If the filings share the same taxonomy and the taxonomy is very simple though (as in: not too many dimensions), it is possible to come up with an ad-hoc mapping to store all facts in a single table with many rows in the ROLAP sense (facts to rows, aspects to columns). Some vendors are specialized in storing non-dimensional XBRL facts, in which case traditional SQL (or "post-SQL" that scale with rows) offerings work well.
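
The "facts to rows, aspects to columns" mapping can be sketched roughly as follows (a minimal illustration, using sqlite3 as a stand-in for MySQL; the table and column names are made up for the example, not taken from any standard):

```python
import sqlite3

# One row per fact; one column per aspect (concept, entity, period, unit).
# This only works when all filings share one simple, mostly
# non-dimensional taxonomy, as described above.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE facts (
        concept    TEXT NOT NULL,   -- e.g. 'us-gaap:Assets'
        entity     TEXT NOT NULL,   -- entity identifier from the context
        period_end TEXT NOT NULL,   -- ISO date from the context period
        unit       TEXT,            -- e.g. 'iso4217:USD'
        value      TEXT             -- facts may be numeric or textual
    )
""")
conn.execute(
    "INSERT INTO facts VALUES (?, ?, ?, ?, ?)",
    ("us-gaap:Assets", "0000320193", "2016-06-30", "iso4217:USD", "305277000000"),
)
rows = conn.execute(
    "SELECT value FROM facts WHERE concept = ? AND period_end = ?",
    ("us-gaap:Assets", "2016-06-30"),
).fetchall()
print(rows)  # [('305277000000',)]
```

Note that as soon as facts carry dimensions that vary per taxonomy, this fixed column set breaks down, which is exactly the re-ETL problem mentioned above.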

Some vendors create a table for each XBRL hypercube in the taxonomy, with a schema derived from the definition network but different for each hypercube. This can lead to a lot of tables in the database, and requires a lot of joins for queries involving multiple hypercubes.

Some other vendors make assumptions about the underlying XBRL structure, or about the kind of queries that their users need to run. Restricting the scope of the problem allows finding specific architectures or SQL schemas that can also do the job for these specific needs.

Finally, to import large amounts of filings, it is possible to build generic mappings on top of NoSQL data stores rather than relational databases. Large numbers of facts with a varying number of dimensions fit in large collections of semi-structured documents, and networks fit well in a hierarchical format.
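
For instance, in a document store each fact can be one self-describing record, so filings with different dimension sets coexist in the same collection (a sketch only; the field names and values below are illustrative):

```python
import json

# Each fact carries its own aspects; the set of dimensions varies freely
# per fact, which a fixed relational schema cannot accommodate.
fact_a = {
    "concept": "us-gaap:Revenues",
    "entity": "0000320193",
    "period": {"start": "2016-01-01", "end": "2016-03-31"},
    "unit": "iso4217:USD",
    "value": "50557000000",
    "dimensions": {"us-gaap:StatementBusinessSegmentsAxis": "SegmentMember"},
}
fact_b = {
    "concept": "us-gaap:Assets",
    "entity": "0000789019",
    "period": {"end": "2016-06-30"},
    "unit": "iso4217:USD",
    "value": "193694000000",
    "dimensions": {},  # a non-dimensional fact lives in the same collection
}
print(json.dumps([fact_a, fact_b], indent=2))
```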

Camiecamila answered 1/7, 2016 at 12:44 Comment(0)
First you need to realize that XBRL documents (instances) can contain many different types of information. For example: daily pricing information for investment funds, but also quarterly VAT reports or credit-worthiness information. XBRL is a standard way of communicating, but the contents have their own (XBRL-standardized) taxonomies. For example: there is a Dutch Taxonomy, on which the Dutch revenue agency's taxonomy is built, on which in turn there is a specific taxonomy for filing VAT reports. These taxonomies are defined using XSD, XLink and linkbases. Think of it as the concept of a dictionary: the way dictionaries are built is the same everywhere (use each letter of the alphabet to make 'chapters', sort words alphabetically, etc.), but a Greek dictionary uses its own alphabet, its own words and its own language to explain the contents.

So if you only use one or a few different types of XBRL documents (that share the same taxonomies), you could create a mapping from these taxonomies to your own (database) objects. If you have a wider range of taxonomies, you'll have to create a more generic solution that can 'import' taxonomies. That will be quite a challenge (which is the reason there aren't many tools available on the market).

If you(r company) can afford it, I recommend looking into existing tools like Altova's MapForce. That way there's no need to learn XBRL, XSD, XLink and linkbases just to start developing your own tool to parse these files; you can leverage existing products to map the XBRL taxonomies to your database/application.

Atilt answered 15/7, 2016 at 11:57 Comment(9)
Do you know if MapForce supports iXBRL, in use by Companies House (UK) ?Stupendous
iXBRL is inline XBRL, which is basically an HTML file containing the XBRL data, creating a human readable document of the exact same data as is contained in the 'raw' XBRL. When I look at your link, it offers the data in both formats. Normally, an iXBRL is created by applying a template (often an XSLT) to the XBRL file. As to your question: MapForce is not meant to work with iXBRL... But a quick skim of their site shows that Altova does have tools to handle iXBRL: StyleVision, their reporting tool.Atilt
Thank you for your reply. I have found only around 1% of files are offered as XBRL (.xml) and the rest are in iXBRL (.html) and so MapForce was unable to process most of them. I might see if StyleVision is able to read iXBRL and output to raw XBRL so that MapForce can read it although I was hoping I could find one tool to do it all.Stupendous
Well, Altova promotes its products as complementing each other, to provide one big product suite... Maybe this is their way of cross-promoting their products...Atilt
Any further updates, are there any parsers that one could use to parse XML to an object or JSON?Himyaritic
@user3437721: XML to JSON is easy, since it's equivalent anyway. Look at Altova's offering. If you're talking about XBRL though, there's a lot more to do first (parsing the xlinks etc)Atilt
Oops I meant XBRL, I guess can you use XML parsers for it, or is a proprietary library needed...Himyaritic
Well as I stated in my original answer: Altova has an XBRL parser. Don't expect anything open source/freeware that can handle all the features from XBRL though... XBRL is a standard used for businesses, so people won't make tools for it for fun :)Atilt
Don't underestimate the powers of Arelle, a free open-source XBRL tool. It handles both taxonomies and instances (documents). It has a GUI and a command-line interface. It's written in Python and has room for plugins.Sidestep
I hope you know that MySQL is structured data storage, whereas XBRL is just a representation that maps business documents into a digital format. XBRL documents are XML-based, which implies they are only semi-structured: the data you require from a document may or may not occur in that specific document, and it can contain any additional information too. The XSD defines how the XML can be structured and how many times any tag can occur. Now to answer your question: you could use eXistDB, which I have also used in the past to store XBRL documents. However, it can be slow at times. If you require just some data from XBRL and need to store it in a MySQL database, you could use XPath. With the following simple Python code, you can grab the values of EquityTotalEndingBalance and ReservesTotalEndingBalance from this document.

from lxml import etree

root = etree.parse("file.xml").getroot()
# lxml's XPath rejects the default (None) prefix, so keep only named prefixes.
nsmap = {prefix: uri for prefix, uri in root.nsmap.items() if prefix}
data_one = root.xpath("//iascf-pfs:EquityTotalEndingBalance/text()", namespaces=nsmap)
data_two = root.xpath("//novartis:ReservesTotalEndingBalance/text()", namespaces=nsmap)
print(data_one)
print(data_two)

This code will print the values:

['37216000000', '36862000000', '42245000000']
['35903000000', '35558000000', '40971000000']

So how can you solve your problem then?

  1. Either choose a NoSQL-based XML document store like eXistDB and write XPath queries to get the particular data.

  2. Or parse the XBRL document manually as above, run the XPath directly and store the results.

The complexity arises if you need to consume all data from all types of documents. So you should limit what you consume from those documents.
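
Option 2 can be sketched end to end with only the standard library (ElementTree instead of lxml, sqlite3 standing in for MySQL; the element name is taken from the example above, while the namespace URI and values are invented for the demo):

```python
import sqlite3
import xml.etree.ElementTree as ET

# A tiny stand-in for an XBRL instance file (namespace URI is made up here).
xbrl = """<root xmlns:iascf-pfs="http://example.com/iascf-pfs">
  <iascf-pfs:EquityTotalEndingBalance>37216000000</iascf-pfs:EquityTotalEndingBalance>
  <iascf-pfs:EquityTotalEndingBalance>36862000000</iascf-pfs:EquityTotalEndingBalance>
</root>"""

ns = {"iascf-pfs": "http://example.com/iascf-pfs"}
root = ET.fromstring(xbrl)

# Extract the facts with a path query, then store them relationally.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (concept TEXT, value TEXT)")
for el in root.findall(".//iascf-pfs:EquityTotalEndingBalance", ns):
    conn.execute("INSERT INTO facts VALUES (?, ?)",
                 ("EquityTotalEndingBalance", el.text))
values = [v for (v,) in conn.execute("SELECT value FROM facts")]
print(values)  # ['37216000000', '36862000000']
```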

Atomize answered 16/7, 2016 at 15:31 Comment(5)
Could this method support different contextRef? (without having to manually define them) I've seen some use contextRef="current-mud" and others use contextRef="cfwd_31_12_2014" or contextRef="FY1" and I would like to get the current/most recent value.Stupendous
Yes, it is possible. You would have to modify the XPath to match on an attribute. Consider you want the value from <iascf-pfs:EquityTotalEndingBalance numericContext="Group1999AsOf">37216000000</iascf-pfs:EquityTotalEndingBalance>. In this case the XPath would be data_one = root.xpath("//iascf-pfs:EquityTotalEndingBalance[@numericContext='Group1999AsOf']/text()",namespaces=nsmap). This will just return '37216000000'. Hope this helps. Please accept the answer if this works for you. Thanks,Atomize
Thank you for your reply. Do you know how to do this dynamically? I am trying to process ~6,000 files where the context is unknown to me.Stupendous
XPath is a very good approach to read XBRL indeed. @Paul, I would add that the contextRef's value is a foreign key that needs to be resolved to an xbrli:context element that contains the actual context information. XBRL requires that contextRef values refer to a context element defined in the same file. In order to get the most recent value (assuming no dimensions), the XPath expression would need to retrieve the context of the candidate facts, take the one with the latest period (or period end) and return the corresponding fact.Camiecamila
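
The contextRef resolution described in the previous comment can be sketched like this (the xbrli namespace URI is the real one from the XBRL 2.1 specification; the instance snippet, concept name and values are invented, and real contexts carry more content than just a period):

```python
import xml.etree.ElementTree as ET

xbrl = """<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
                      xmlns:my="http://example.com/my">
  <xbrli:context id="FY0"><xbrli:period>
    <xbrli:instant>2015-12-31</xbrli:instant></xbrli:period></xbrli:context>
  <xbrli:context id="FY1"><xbrli:period>
    <xbrli:instant>2014-12-31</xbrli:instant></xbrli:period></xbrli:context>
  <my:Equity contextRef="FY0">42000</my:Equity>
  <my:Equity contextRef="FY1">40000</my:Equity>
</xbrli:xbrl>"""

ns = {"xbrli": "http://www.xbrl.org/2003/instance",
      "my": "http://example.com/my"}
root = ET.fromstring(xbrl)

# Resolve each context id to its period end (instant or endDate).
period_end = {}
for ctx in root.findall("xbrli:context", ns):
    date = ctx.find("xbrli:period/xbrli:instant", ns)
    if date is None:
        date = ctx.find("xbrli:period/xbrli:endDate", ns)
    period_end[ctx.get("id")] = date.text

# Pick the fact whose context has the latest period end
# (ISO dates sort correctly as strings).
facts = root.findall("my:Equity", ns)
latest = max(facts, key=lambda f: period_end[f.get("contextRef")])
print(latest.text)  # 42000
```
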
Well, you can modify your XPath as needed, but since XBRL is a loosely structured data format, trying to consume all data from all documents will be complex. As with the examples given above, you should first analyze what data you want to consume and whether it can occur in the XBRL document or not. Once you have listed some items, prepare the XPath and run it like I did above. What should you do if there are multiple results? You should consider this too. Once you have everything, inserting it into the db is not hard. The hard part is to come up with the items you want to consume and to write the XPath for them.Atomize
There are some commercial solutions out there,

for example

http://arelle.org/documentation/xbrl-database/

http://www.katelynnsreport.com/main/dataportal

Galliwasp answered 21/4, 2017 at 7:33 Comment(0)
