How to set up local file references in python-jsonschema document?

I have a set of jsonschema compliant documents. Some documents contain references to other documents (via the $ref attribute). I do not wish to host these documents such that they are accessible at an HTTP URI. As such, all references are relative. All documents live in a local folder structure.

How can I make python-jsonschema load referenced documents from my local file system?


For instance, suppose I have a document named defs.json that contains some definitions, and I try to load a different document which references it, like:

{
  "allOf": [
    {"$ref":"defs.json#/definitions/basic_event"},
    {
      "type": "object",
      "properties": {
        "action": {
          "type": "string",
          "enum": ["page_load"]
        }
      },
      "required": ["action"]
    }
  ]
}

I get an error RefResolutionError: <urlopen error [Errno 2] No such file or directory: '/defs.json'>

It may be relevant that I'm on a Linux box.


(I'm writing this as a Q&A because I had a hard time figuring this out and observed other folks having trouble too.)

Residential answered 29/12, 2018 at 10:39 Comment(0)
N
23

I had the hardest time figuring out how to resolve against a set of schemas that $ref each other without going to the network. It turns out the key is to create the RefResolver with a store: a dict that maps each schema's URL to the schema itself.

import json
from jsonschema import RefResolver, Draft7Validator

address="""
{
  "$id": "https://example.com/schemas/address",

  "type": "object",
  "properties": {
    "street_address": { "type": "string" },
    "city": { "type": "string" },
    "state": { "type": "string" }
  },
  "required": ["street_address", "city", "state"],
  "additionalProperties": false
}
"""

customer="""
{
  "$id": "https://example.com/schemas/customer",
  "type": "object",
  "properties": {
    "first_name": { "type": "string" },
    "last_name": { "type": "string" },
    "shipping_address": { "$ref": "/schemas/address" },
    "billing_address": { "$ref": "/schemas/address" }
  },
  "required": ["first_name", "last_name", "shipping_address", "billing_address"],
  "additionalProperties": false
}
"""

data = """
{
  "first_name": "John",
  "last_name": "Doe",
  "shipping_address": {
    "street_address": "1600 Pennsylvania Avenue NW",
    "city": "Washington",
    "state": "DC"
  },
  "billing_address": {
    "street_address": "1st Street SE",
    "city": "Washington",
    "state": "DC"
  }
}
"""

address_schema = json.loads(address)
customer_schema = json.loads(customer)
schema_store = {
    address_schema['$id'] : address_schema,
    customer_schema['$id'] : customer_schema,
}

resolver = RefResolver.from_schema(customer_schema, store=schema_store)
validator = Draft7Validator(customer_schema, resolver=resolver)

jsonData = json.loads(data)
validator.validate(jsonData)

The above was built with jsonschema==4.9.1.
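
To see why the store keys need to be full URLs, it may help to look at how a relative $ref combines with the $id of the schema that contains it. The following is a minimal stdlib-only sketch, assuming the resolver joins references the way urllib.parse.urljoin does; it does not call jsonschema itself.

```python
from urllib.parse import urljoin

# $id of the schema that contains the reference (the customer schema above)
base_id = "https://example.com/schemas/customer"
# The relative reference used for shipping_address and billing_address
ref = "/schemas/address"

# Joining the two yields the absolute URL that has to exist as a key in the store
resolved = urljoin(base_id, ref)
print(resolved)
```

If the joined URL is not a key in the store, the resolver has nothing local to return and would fall back to fetching it.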

Naphtha answered 6/5, 2020 at 9:46 Comment(4)
I initialize the RefResolver like this: jsonschema.RefResolver(None, referrer=None, store=schema_store). And then the store has entries with an "$id" field like: "https://example.com/path/subpath/filename.json". (This doesn't require any network calls--unless you specify a schema not in the store--since the store contains a cache of any reference we need).Cockaigne
Why did you use the # sign at the end of the $ref, as in {"$ref": "base.schema.json#"}, instead of putting it as a prefix?Disorganization
the # sign (which delineates an URI fragment) is superfluous in the above example. In $ref URIs, the fragment refers to a path within a schema, so, in the above sample {"$ref": "base.schema.json#/properties/prop/type"} would resolve to "string".Naphtha
After learning a bunch about JSONSchemas over the past 2 years, I improved my sample quite a bit...Naphtha
R
10

You must build a custom jsonschema.RefResolver for each schema which uses a relative reference and ensure that your resolver knows where on the filesystem the given schema lives.

For example:

import os
import json
from jsonschema import Draft4Validator, RefResolver # We prefer Draft7, but jsonschema 3.0 is still in alpha as of this writing 


abs_path_to_schema = '/path/to/schema-doc-foobar.json'
with open(abs_path_to_schema, 'r') as fp:
  schema = json.load(fp)

resolver = RefResolver(
  # The key part is here where we build a custom RefResolver
  # and tell it where *this* schema lives in the filesystem.
  # RefResolver's first two parameters are `base_uri` and `referrer`.
  # Note that `file://` followed by an absolute path is for unix systems
  base_uri='file://{}'.format(abs_path_to_schema),
  referrer=schema
)
Draft4Validator.check_schema(schema) # Unnecessary but a good idea
validator = Draft4Validator(schema, resolver=resolver, format_checker=None)

# Then you can...
data_to_validate = {...}  # placeholder for your instance data
validator.validate(data_to_validate)
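
A portable way to build the base URI is pathlib's as_uri(). The sketch below uses only the standard library and assumes relative references are joined with urljoin-style rules; the path is the same placeholder as above, and PurePosixPath is used only so the output is the same on every platform.

```python
from pathlib import PurePosixPath
from urllib.parse import urljoin

# Hypothetical absolute path, matching the placeholder used above
abs_path_to_schema = PurePosixPath("/path/to/schema-doc-foobar.json")

# as_uri() produces a well-formed file: URI, including percent-encoding
base_uri = abs_path_to_schema.as_uri()

# A relative $ref then resolves against the directory of the base schema
resolved = urljoin(base_uri, "defs.json#/definitions/basic_event")
print(base_uri)
print(resolved)
```

On a real system, pathlib.Path(abs_path_to_schema).resolve().as_uri() should also cover Windows drive letters.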
Residential answered 29/12, 2018 at 10:39 Comment(3)
Is this because the JSON schema spec says it is a URI. And URI cannot be relative paths? So if we end up with a relative path, we are not writing a proper spec-compliant json schema?Statistician
My tests show that definitions is not necessary. One can just compose complete JSON schema documents without needing the #... part. I wonder if the definitions is just optional or convention.Statistician
jsonschema 3.0.1 with draft 7 as default is out now (per your comment in the example saying you prefer draft 7)Robledo
G
5

EDIT-1

Fixed a wrong reference ($ref) to base schema. Updated the example to use the one from the docs: https://json-schema.org/understanding-json-schema/structuring.html

EDIT-2

As pointed out in the comments, the code below uses the following imports:

from jsonschema import validate, RefResolver 
from jsonschema.validators import validator_for

This is just another version of @Daniel's answer, which was the one that worked for me. Basically, I decided to define the $schema in a base schema, which frees the other schemas from declaring it and makes for a clear call when instantiating the resolver.

  • It was not clear to me that RefResolver.from_schema() takes (1) some schema and also (2) a schema store, nor whether the order or the choice of that "some" schema mattered. Hence the structure you see below.

I have the following:

base.schema.json:

{
  "$schema": "http://json-schema.org/draft-07/schema#"
}

definitions.schema.json:

{
  "type": "object",
  "properties": {
    "street_address": { "type": "string" },
    "city":           { "type": "string" },
    "state":          { "type": "string" }
  },
  "required": ["street_address", "city", "state"]
}

address.schema.json:

{
  "type": "object",

  "properties": {
    "billing_address": { "$ref": "definitions.schema.json#" },
    "shipping_address": { "$ref": "definitions.schema.json#" }
  }
}

I like this setup for a few reasons:

  1. It makes for a cleaner call to RefResolver.from_schema():

    base = json.loads(open('base.schema.json').read())
    definitions = json.loads(open('definitions.schema.json').read())
    schema = json.loads(open('address.schema.json').read())
    
    schema_store = {
      base.get('$id','base.schema.json') : base,
      definitions.get('$id','definitions.schema.json') : definitions,
      schema.get('$id','address.schema.json') : schema,
    }
    
    resolver = RefResolver.from_schema(base, store=schema_store)
    
  2. Then I benefit from validator_for, the handy tool the library provides to pick the best validator for your schema (according to your $schema key):

    Validator = validator_for(base)
    
  3. And then I just put them together to instantiate the validator:

    validator = Validator(schema, resolver=resolver)
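
As a small illustration of step 2, the sketch below shows validator_for selecting a class from the $schema key, and falling back to a caller-supplied default when the key is absent. It assumes jsonschema is installed; the choice of Draft4Validator as the fallback is arbitrary and only there to make the fallback visible.

```python
from jsonschema import Draft4Validator, Draft7Validator
from jsonschema.validators import validator_for

# The base schema from above declares draft-07, so Draft7Validator is selected
base = {"$schema": "http://json-schema.org/draft-07/schema#"}
chosen = validator_for(base)

# A schema without "$schema" (like address.schema.json) gets the default instead
fallback = validator_for({"type": "object"}, default=Draft4Validator)

print(chosen.__name__, fallback.__name__)
```

This is why the answer passes base, not the address schema, to validator_for.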
    

Finally, you validate your data:

data = {
  "shipping_address": {
    "street_address": "1600 Pennsylvania Avenue NW",
    "city": "Washington",
    "state": "DC"   
  },
  "billing_address": {
    "street_address": "1st Street SE",
    "city": "Washington",
    "state": 32
  }
}
  • This one will fail validation because of "state": 32:
>>> validator.validate(data)

ValidationError: 32 is not of type 'string'

Failed validating 'type' in schema['properties']['billing_address']['properties']['state']:
    {'type': 'string'}

On instance['billing_address']['state']:
    32

Change that to "DC" and it will validate.
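
On the trailing # in the $ref values: everything before the # names a document, and everything after it is a pointer inside that document. A minimal stdlib sketch of the split follows, assuming the resolver separates the two parts the way urllib.parse.urldefrag does.

```python
from urllib.parse import urldefrag

refs = [
    "definitions.schema.json#",                  # whole document, empty pointer
    "definitions.schema.json#/properties/city",  # a location inside the document
    "#/definitions.schema.json",                 # pointer into the *current* document
]

# Map each reference to its (document, pointer) pair
parts = {ref: tuple(urldefrag(ref)) for ref in refs}
for ref, (document, pointer) in parts.items():
    print(ref, "->", repr(document), repr(pointer))
```

The document part is what has to match a key in schema_store, which is why a prefix form such as "#/definitions.schema.json" would not find the other file.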

Gardas answered 4/12, 2020 at 20:43 Comment(5)
This answer worked perfectly for me. Just want to point out the import dependencies for others also trying this out from jsonschema import validate, RefResolver from jsonschema.validators import validator_forEarn
Thank you @khuang834. I adjusted/added a note about that.Gardas
How we can validate Nested properties with this approach. Say I have a conditional property "Zipcode" in address.schema.json . And want to validate based on value of "city" in definitions.schema.json.Sex
@Sex maybe I didn't understand your question or it is ill posed. Nevertheless, here are my thoughts that hopefully will help you work through your problem: zipcode and city go hand-in-hand, they are both part of the/an address. AFAIU, you want to verify that a given zipcode is part of a city. IMHO this is external to schema validation: roughly, schema validation is about data types/formats. That being said, you could include a "conditional property" for a set of cities and associate zipcodes (enum) for each city in another "cities-definitions.schema.json", probably.Gardas
It works for me, but why did you add # at the end of the ref value, { "$ref": "definitions.schema.json#" }, instead of { "$ref": "#/definitions.schema.json" }?Disorganization
L
4

Following up on the answer @chris-w provided: I wanted to do this same thing with jsonschema 3.2.0, but his answer didn't quite cover it. I hope this answer helps those who are still coming to this question for help but are using a more recent version of the package.

To extend a JSON schema using the library, do the following:

  1. Create the base schema:
base.schema.json
{
  "$id": "base.schema.json",
  "type": "object",
  "properties": {
    "prop": {
      "type": "string"
    }
  },
  "required": ["prop"]
}
  2. Create the extension schema:
extend.schema.json
{
  "allOf": [
    {"$ref": "base.schema.json"},
    {
      "properties": {
        "extra": {
          "type": "boolean"
        }
      },
      "required": ["extra"]
    }
  ]
}
  3. Create the JSON file you want to test against the schema:
data.json
{
  "prop": "This is the property",
  "extra": true
}
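  4. Create your RefResolver and Validator for the base schema and use them to check the data:
import json
from jsonschema import RefResolver, Draft7Validator

# Set up schema, resolver, and validator on the base schema.
# baseSchemaJSON, relativeJSON, and dataJSON are assumed to hold the text of the three files above.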
baseSchema = json.loads(baseSchemaJSON) # Create a schema dictionary from the base JSON file
relativeSchema = json.loads(relativeJSON) # Create a schema dictionary from the relative JSON file
resolver = RefResolver.from_schema(baseSchema) # Creates your resolver, uses the "$id" element
validator = Draft7Validator(relativeSchema, resolver=resolver) # Create a validator against the extended schema (but resolving to the base schema!)

# Check validation!
data = json.loads(dataJSON) # Create a dictionary from the data JSON file
validator.validate(data)

You may need to make a few adjustments to the above, such as using a validator other than Draft7Validator. This should work for single-level references (children extending a base), but you will need to be careful with your schemas and with how you set up the RefResolver and Validator objects.

P.S. Here is a snippet that exercises the above. Try modifying the data string to remove one of the required attributes:

import json

from jsonschema import RefResolver, Draft7Validator

base = """
{
  "$id": "base.schema.json",
  "type": "object",
  "properties": {
    "prop": {
      "type": "string"
    }
  },
  "required": ["prop"]
}
"""

extend = """
{
  "allOf": [
    {"$ref": "base.schema.json"},
    {
      "properties": {
        "extra": {
          "type": "boolean"
        }
      },
      "required": ["extra"]
    }
  ]
}
"""

data = """
{
"prop": "This is the property string",
"extra": true
}
"""

schema = json.loads(base)
extendedSchema = json.loads(extend)
resolver = RefResolver.from_schema(schema)
validator = Draft7Validator(extendedSchema, resolver=resolver)

jsonData = json.loads(data)
validator.validate(jsonData)
Lockridge answered 7/12, 2019 at 1:49 Comment(0)
J
1

My approach is to preload all schema fragments to RefResolver cache. I created a gist that illustrates this: https://gist.github.com/mrtj/d59812a981da17fbaa67b7de98ac3d4b

Justiceship answered 15/7, 2020 at 13:13 Comment(0)
P
1

This is what I used to dynamically generate a schema_store from all schemas in a given directory:

base.schema.json

{
  "$id": "base.schema.json",
  "type": "object",
  "properties": {
    "prop": {
      "type": "string"
    }
  },
  "required": ["prop"]
}

extend.schema.json

{  
  "$id": "extend.schema.json",
  "allOf": [
    {"$ref": "base.schema.json"},
    {
      "properties": {
        "extra": {
          "type": "boolean"
        }
      },
      "required": ["extra"]
    }
  ]
}

instance.json

{
  "prop": "This is the property string",
  "extra": true
}

validator.py

import json

from pathlib import Path

from jsonschema import Draft7Validator, RefResolver
from jsonschema.exceptions import RefResolutionError

# Path.read_text() opens and closes each file, so no handles are left open
schemas = (json.loads(source.read_text()) for source in Path("schema/dir").iterdir())
schema_store = {schema["$id"]: schema for schema in schemas}

schema = json.loads(Path("schema/dir/extend.schema.json").read_text())
instance = json.loads(Path("instance/dir/instance.json").read_text())
resolver = RefResolver.from_schema(schema, store=schema_store)
validator = Draft7Validator(schema, resolver=resolver)

try:
    errors = sorted(validator.iter_errors(instance), key=lambda e: e.path)
except RefResolutionError as e:
    print(e)
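
A slightly more defensive variant of the store-building step is sketched below. It is stdlib-only; build_schema_store is a hypothetical helper name, and skipping files that lack "$id" is an assumption about what is wanted, since such files cannot be looked up by URI anyway.

```python
import json
import tempfile
from pathlib import Path


def build_schema_store(schema_dir):
    """Map each schema's $id to its parsed contents for every *.json file in schema_dir."""
    store = {}
    for source in sorted(Path(schema_dir).glob("*.json")):
        with source.open(encoding="utf-8") as fp:
            schema = json.load(fp)
        schema_id = schema.get("$id")
        if schema_id is None:
            # Assumption: a file without "$id" cannot be referenced by URI, so skip it
            continue
        store[schema_id] = schema
    return store


# Demonstration with a throwaway directory holding two small files
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "base.schema.json").write_text(
        json.dumps({"$id": "base.schema.json", "type": "object"}), encoding="utf-8")
    Path(tmp, "notes.json").write_text(json.dumps({"type": "string"}), encoding="utf-8")
    schema_store = build_schema_store(tmp)

print(sorted(schema_store))
```

The resulting dict can be passed as the store argument in the same way as schema_store above.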
Picador answered 21/1, 2022 at 7:6 Comment(0)
G
0

Answers using RefResolver work well, but as of jsonschema v4.18.0, RefResolver is deprecated and emits the following warning. Here's an example of using the recommended Registry to replace RefResolver.

DeprecationWarning: jsonschema.RefResolver is deprecated as of v4.18.0, in favor of the https://github.com/python-jsonschema/referencing library, which provides more compliant referencing behavior as well as more flexible APIs for customization. A future release will remove RefResolver. Please file a feature request (on referencing) if you are missing an API for the kind of customization you need.

from referencing import Registry, Specification
from jsonschema.validators import validator_for

root_schema = {
    "$id": "root.schema",  # Doesn't matter
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {
        "subschema1": {"$ref": "file://./subschemas/subschema1.schema.jsonc"},
        "subschema2": {"$ref": "file://./subschemas/subschema2.schema.jsonc"},
    },
}
subschema1 = {
    "$id": "file://./subschemas/subschema1.schema.jsonc",  # $id should match $ref in root schema
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "subschema1",
    "type": "string",
}
subschema2 = {
    "$id": "file://./subschemas/subschema2.schema.jsonc",
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "subschema2",
}
data_to_validate = {"subschema1": "1"}  # Changing the value to the integer 1 raises a ValidationError

registry = Registry().with_resources(
    [
        (schema["$id"], Specification.detect(schema).create_resource(schema))
        for schema in [subschema1, subschema2]
    ]
)
validator = validator_for(root_schema)(root_schema, registry=registry)
validator.validate(data_to_validate)

This example uses the $schema keyword to detect the dialect. It's recommended that all JSON Schemas have a $schema keyword to communicate to readers and tooling which specification version is intended.

Glynnis answered 18/4 at 15:46 Comment(0)
