Schema file does not exist in XBRL Parse file

I have downloaded a zip file containing around 200,000 html files from Companies House.

Each file is in one of two formats: 1) inline XBRL format (.html file extension) or 2) XBRL format (.xml file extension). Looking at the most recent download available (6 December 2018) all the files seem to be the former format (.html file extensions).

I'm using the XBRL package in R to try and parse these files.

Question 1: Is the XBRL package meant to parse inline XBRL (.html) files, or does it only work on plain XBRL (.xml) files? If it cannot parse inline XBRL, can anyone tell me where to look for a way to parse those files? I'm not entirely sure what the difference is between inline and non-inline XBRL.

Assuming the XBRL package is meant to be able to parse inline XBRL format files, I'm hitting an error telling me that the xbrl.frc.org.uk/FRS-102/2014-09-01/FRS-102-2014-09-01.xsd file does not exist. Here's my code:

install.packages("XBRL")
library(XBRL)

inst <- "./rawdata/Prod224_0060_00000295_20171130.html" # manually unzipped
options(stringsAsFactors = FALSE)
xbrl.vars <- xbrlDoAll(inst, cache.dir = "XBRLcache", prefix.out = NULL, verbose = TRUE)

and the error:

Schema:  ./rawdata/https://xbrl.frc.org.uk/FRS-102/2014-09-01/FRS-102-2014-09-01.xsd 
Level: 1 ==> ./rawdata/https://xbrl.frc.org.uk/FRS-102/2014-09-01/FRS-102-2014-09-01.xsd 
Error in XBRL::xbrlParse(file) : 
  ./rawdata/https://xbrl.frc.org.uk/FRS-102/2014-09-01/FRS-102-2014-09-01.xsd does not exists. Aborting.

Question 2. Can someone explain what this means in basic terms for me? I'm new to XBRL. Do I need to go and find this xsd file and put it somewhere? It seems to be located here, but I have no idea what to do with it or where to put it.

Here's a similar question that doesn't seem fully answered and the links are all in Spanish and I don't know Spanish.

Once I've been able to parse a single inline XBRL (.html) file, my plan is to figure out how to parse all the XBRL files inside the multiple zip files from that website.

Enlarger answered 6/12, 2018 at 12:28 Comment(1)
I also tried copying what I think are the relevant schemas into a cache directory (files from frc.org.uk/accountants/accounting-and-reporting-policy/… and xbrl.org.uk/techguidance/taxonomies.html, on advice from gov.uk/government/publications/xbrl-guide-for-uk-businesses/…). - Chamois

I had exactly the same problem with the US SEC data.
I followed pdw's guidance exactly and it worked!

FYI, I replaced this line:

if (substr(file.name, 1, 5) != "http:") {

with:

if (!(substr(file.name, 1, 5) %in% c("http:", "https"))) {

I applied the change with trace('XBRL', edit=TRUE).

Gautier answered 2/10, 2019 at 0:4 Comment(0)

I'm not familiar with the XBRL package that you're using, but it seems clear that it's erroneously trying to resolve an absolute URL (https://...) as a local file.

A quick browse of the source code reveals the problem:

XBRL.R line 305:

fixFileName <- function(dname, file.name) {
  if (substr(file.name, 1, 5) != "http:") {
    [...]

That is, it decides whether a URL is absolute by checking whether it starts with "http:", and your URL starts with "https:". It's easy enough to hack in a fix that lets https URLs pass this test too, and I suspect that would fix your immediate problem. It would be far better, though, if this code used a URL library to decide whether a URL is absolute rather than guessing based on the protocol.
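As a rough illustration of a check that does not depend on one particular protocol, here is a minimal base R sketch. The helper name `is_absolute_url` is hypothetical and is not part of the XBRL package; it assumes that anything starting with a URI scheme followed by `://` is absolute, which covers http and https but is not a full URL parser.

```r
# Hypothetical helper (not part of the XBRL package): TRUE when a path
# starts with a URI scheme followed by "://", e.g. "http://" or "https://".
is_absolute_url <- function(x) {
  grepl("^[A-Za-z][A-Za-z0-9+.-]*://", x)
}

# The schema URL from the error message is absolute ...
is_absolute_url("https://xbrl.frc.org.uk/FRS-102/2014-09-01/FRS-102-2014-09-01.xsd")
# ... while the local instance path is not.
is_absolute_url("./rawdata/Prod224_0060_00000295_20171130.html")
```

A test like this could stand in for the `substr(file.name, 1, 5)` comparison, although I have not tried it inside the package itself.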

I'm not sure what the status is with respect to iXBRL documents. There's a note in the changelog saying "reported to work with inline XBRL documents", which I'm suspicious of. Whilst it might correctly find the taxonomy for an inline document, I can't see how it would correctly extract the facts without significant additional code, which I can't see any sign of.

You might want to take a look at the Arelle project as an alternative open source processor that definitely does support Inline XBRL.

Neptunian answered 7/12, 2018 at 9:10 Comment(1)
Thanks, I have asked the package creator if he can update the package to use a URL library (github.com/bergant/XBRLFiles/issues/2). - Enlarger

As pdw stated, the issue is that the package is hard-coded to look for "http:" and erroneously treats "https" paths as local paths. This matters because XBRL files can refer to external files for standard definitions of schemas, etc. In your example, the reference is on line 116 of Prod224_0081_00005017_20191231.html.

Several people have forked the XBRL package on GitHub and fixed this behavior. You can install one of the versions listed at https://github.com/cran/XBRL/network/members with devtools::install_git(), and that should work.

For example, using this fork the example Companies House statement is parsed.

# Install the patched fork first (uncomment to run):
# remotes::install_github("adamp83/XBRL")

library(XBRL)
x <- xbrlDoAll("https://raw.githubusercontent.com/stackoverQs/stackxbrlQ/main/Prod224_0081_00005017_20191231.html", cache.dir = "cache", verbose = TRUE)
Darya answered 21/6, 2021 at 17:39 Comment(1)
Thanks avdeluca; I was about to say that of course I had tried this, and that I had rebuilt my own patched package but it didn't work (as I wrote in the bounty description). Yet lo and behold, the example I added to your answer parses. So thanks. - Chamois

Here are a few more general explanations to give some context.

Inline XBRL vs. XBRL

An XBRL file, put simply, is just a flat list of facts.

Inline XBRL is a more modern version of an XBRL instance that, instead of storing these facts as a flat list, stores them within a human-readable document, "stamping" the values. From an abstract XBRL-processing perspective, both an XBRL file and an inline XBRL file are XBRL instances and are simply sets of facts.
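To make the "stamping" idea concrete, here is a small sketch that pulls one tagged number out of a made-up inline XBRL fragment. It assumes the xml2 package is installed; the `ex` taxonomy namespace, the concept name and the context and unit identifiers are invented for illustration. It is also deliberately simplified: real documents carry further attributes such as `scale`, `sign` and `format` that a proper processor has to honour, and this sketch ignores them.

```r
library(xml2)

# A tiny made-up inline XBRL fragment: one tagged number inside XHTML.
doc <- read_xml('<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ix="http://www.xbrl.org/2013/inlineXBRL"
      xmlns:ex="http://example.com/taxonomy">
  <body>
    <p>Cash at bank: <ix:nonFraction name="ex:Cash" contextRef="c1"
        unitRef="GBP" decimals="0">1,234</ix:nonFraction></p>
  </body>
</html>')

# Find every stamped numeric fact via the inline XBRL namespace.
ns <- c(ix = "http://www.xbrl.org/2013/inlineXBRL")
nodes <- xml_find_all(doc, ".//ix:nonFraction", ns)

# Turn the stamped values back into a flat list of facts.
# Simplification: only thousands separators are stripped from the text.
facts <- data.frame(
  concept = xml_attr(nodes, "name"),
  context = xml_attr(nodes, "contextRef"),
  unit    = xml_attr(nodes, "unitRef"),
  value   = as.numeric(gsub(",", "", xml_text(nodes)))
)
print(facts)
```

The resulting data frame is the kind of flat fact list that a non-inline XBRL instance would contain directly.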

DTS

An XBRL instance (either inline or not) is furthermore linked to a few, or a lot of, taxonomy files known to XBRL users as the DTS (Discoverable Taxonomy Set). These files are either XML Schema files (.xsd) containing the report elements (concepts, dimensions, etc.) or XML Link files (.xml) containing the linkbases (graphs of report elements, labels, etc.).

The machinery linking an XBRL instance to a DTS is a bit complex and heterogeneous: schema imports, schema includes, simple links pointing to other files, etc. It suffices to understand as a user that the DTS is made of all the files in the transitive closure of the instance via these links. It is the job of an XBRL processor (including the R package) to resolve the entire DTS.
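The transitive closure can be pictured with a toy example in base R. This is only a sketch: the file names and the `refs` list are made up, and a real processor discovers the references by parsing each file rather than reading them from a table.

```r
# Toy reference graph: each file lists the files it imports or links to.
# All file names here are made up for illustration.
refs <- list(
  "report.html" = c("entry.xsd"),
  "entry.xsd"   = c("core.xsd", "labels.xml"),
  "core.xsd"    = c("types.xsd"),
  "labels.xml"  = c("core.xsd"),
  "types.xsd"   = character(0)
)

# Breadth-first walk collecting every file reachable from the instance.
discover_dts <- function(start, refs) {
  seen <- character(0)
  queue <- start
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    if (current %in% seen) next      # already visited, skip
    seen <- c(seen, current)
    targets <- refs[[current]]       # files referenced by the current file
    queue <- c(queue, setdiff(targets, seen))
  }
  setdiff(seen, start)               # report the taxonomy files only
}

dts <- discover_dts("report.html", refs)
print(dts)
```

Note that `labels.xml` points back to `core.xsd`, which is already known; the `seen` check is what keeps the walk from visiting a file twice.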

Storage of DTS files

Typically, an XBRL instance points to a file (called the entry point) located on the server of the taxonomy provider, and that file may itself point to further files on the same or other servers.

However, many XBRL processors automatically cache these files locally in order to avoid overloading the servers, as is established practice. Normally, you do not need to do this yourself. It is very cumbersome to resolve the links oneself to download all files manually.

An alternate way is to download the entire DTS (as a zip file following a packaging standard) from the taxonomy provider's servers and use it locally. However, this also requires an XBRL processor to figure out the mapping between remote URLs and local files.
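The mapping between remote URLs and local files can be thought of as a prefix rewrite. Below is a minimal base R sketch of that idea; the `rewrites` table, the local directory names and the helper `to_local_path` are all hypothetical, and a real processor would read such mappings from the downloaded package rather than hard-code them.

```r
# Hypothetical prefix table: remote URL prefix -> local directory.
# The local paths are placeholders, not real locations.
rewrites <- c(
  "https://xbrl.frc.org.uk/" = "taxonomies/frc/",
  "http://www.xbrl.org/"     = "taxonomies/xbrl-org/"
)

# Map a remote URL to a local path using the longest matching prefix.
# Returns NA when no prefix applies (the file would have to be fetched).
to_local_path <- function(url, rewrites) {
  hits <- names(rewrites)[startsWith(url, names(rewrites))]
  if (length(hits) == 0) return(NA_character_)
  best <- hits[which.max(nchar(hits))]
  paste0(rewrites[[best]], substring(url, nchar(best) + 1))
}

to_local_path(
  "https://xbrl.frc.org.uk/FRS-102/2014-09-01/FRS-102-2014-09-01.xsd",
  rewrites
)
```

Choosing the longest matching prefix means a more specific mapping would take precedence over a general one if both were present.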

Janes answered 12/12, 2018 at 8:42 Comment(0)

I was able to solve it using what is discussed in https://github.com/sewardlee337/finreportr/issues/17

In particular, most of the URL-related errors are due to SEC EDGAR requiring requests to declare a user agent. It ran for me once I added

options(HTTPUserAgent = "myName [email protected]")

before calling xbrlDoAll().

Alabaster answered 13/8, 2023 at 21:1 Comment(0)
