Utils Docs

Public API of methods and functions to handle and verify the JSON schemas.

class inspire_schemas.utils.LocalRefResolver(base_uri, referrer, store=(), cache_remote=True, handlers=(), urljoin_cache=None, remote_cache=None)

Bases: jsonschema.validators.RefResolver

Simple resolver to handle non-uri relative paths.


Resolve a uri or relative path to a schema.

inspire_schemas.utils.author_id_normalize_and_schema(uid, schema=None)

Detect and normalize an author UID schema.

Parameters:
  • uid (string) – a UID string
  • schema (string) – the schema to try to resolve the UID to

Returns:a tuple (uid, schema) where:
  • uid: the UID normalized to comply with the id.json schema
  • schema: the schema of the UID, or None if not recognised

Return type:Tuple[string, string]

Raises:
  • UnknownUIDSchema – if the UID carries too little information to definitively guess the schema
  • SchemaUIDConflict – if the specified schema does not match the given UID
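The detect-and-normalize behaviour described above can be illustrated with a minimal sketch. The regular expressions, the two schema names, the exception classes and the helper name detect_uid_schema are all assumptions made for illustration; the library may recognise more schemas and apply different rules.

```python
import re

# Hypothetical patterns for two identifier formats (assumptions, not the
# library's own tables). Group 1 captures the normalized form of the UID.
_PATTERNS = {
    'ORCID': re.compile(r'^(?:https?://orcid\.org/)?(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$'),
    'INSPIRE ID': re.compile(r'^(INSPIRE-\d{8})$'),
}


class UnknownUIDSchema(ValueError):
    """Stand-in exception: the schema cannot be guessed from the UID."""


class SchemaUIDConflict(ValueError):
    """Stand-in exception: the requested schema does not match the UID."""


def detect_uid_schema(uid, schema=None):
    """Return (normalized_uid, schema) or raise one of the errors above."""
    uid = uid.strip()
    if schema is not None:
        pattern = _PATTERNS.get(schema)
        match = pattern.match(uid) if pattern else None
        if match is None:
            raise SchemaUIDConflict('{} is not a valid {}'.format(uid, schema))
        return match.group(1), schema

    # No schema given: try every known pattern and require a unique hit.
    hits = []
    for name, pattern in _PATTERNS.items():
        match = pattern.match(uid)
        if match:
            hits.append((match.group(1), name))
    if len(hits) != 1:
        raise UnknownUIDSchema(uid)
    return hits[0]
```

Requiring exactly one matching pattern is one way to model "too little information to guess": an ambiguous UID is rejected rather than silently assigned to the first candidate.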
inspire_schemas.utils.build_pubnote(title, volume, page_start, page_end, artid)

Build pubnote string from parts (reverse of split_pubnote).
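As a rough illustration of joining the parts into one string, the sketch below assumes a comma-separated layout (title, volume, page range, article ID). That layout, the optional arguments and the name build_pubnote_sketch are assumptions, not taken from the library.

```python
def build_pubnote_sketch(title, volume, page_start=None, page_end=None, artid=None):
    """Join journal parts into 'title,volume,pages[,artid]' (assumed layout)."""
    if not (title and volume):
        # Without a title and a volume there is nothing meaningful to build.
        return None
    parts = [title, volume]
    if page_start and page_end:
        parts.append('{}-{}'.format(page_start, page_end))
    elif page_start:
        parts.append(page_start)
    # Avoid repeating the article ID when it duplicates the start page.
    if artid and artid != page_start:
        parts.append(artid)
    return ','.join(parts)
```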


Normalize value to an Inspire category.

Parameters:value (str) – an Inspire category to properly case, or an arXiv category to translate to the corresponding Inspire category.
Returns:None if value is not a non-empty string, otherwise the corresponding Inspire category.
Return type:str
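A minimal sketch of the two behaviours described (fixing the case of an Inspire category, translating an arXiv category) might look as follows. The small lookup tables are an illustrative subset, the name classify_field_sketch is hypothetical, and returning None for unrecognised values is an assumption that may differ from the library.

```python
# Illustrative subset only; the library ships its own, much larger tables.
_INSPIRE_CATEGORIES = ['Theory-HEP', 'Experiment-HEP', 'Astrophysics']
_ARXIV_TO_INSPIRE = {
    'hep-th': 'Theory-HEP',
    'hep-ex': 'Experiment-HEP',
    'astro-ph': 'Astrophysics',
}


def classify_field_sketch(value):
    """Return the properly cased Inspire category for value, or None."""
    if not (isinstance(value, str) and value):
        return None
    lowered = value.lower()
    # First try a case-insensitive match against Inspire categories.
    for category in _INSPIRE_CATEGORIES:
        if category.lower() == lowered:
            return category
    # Otherwise treat the value as an arXiv category (assumption: unknown
    # values yield None here).
    return _ARXIV_TO_INSPIRE.get(lowered)
```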

Convert back a publication_info value from the new format to the old.

Does the inverse transformation of convert_old_publication_info_to_new(), to be used whenever we are sending back records from Labs to Legacy.

Parameters:publication_infos – a publication_info in the new format.
Returns:a publication_info in the old format.
Return type:list(dict)
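The sketch below illustrates the new-to-old direction under a simplifying assumption: a journal title ending in a dot followed by a single capital letter (for example 'Phys.Rev.D') carries a series letter that moves back to the front of the volume. The regular expression and the name new_to_old_sketch are hypothetical, and the library may decide which journals have series in a different way.

```python
import re

# Assumption: the series letter is one capital letter after the final dot.
_SERIES_SUFFIX = re.compile(r'^(?P<title>.+\.)(?P<letter>[A-Z])$')


def new_to_old_sketch(publication_infos):
    """Move a trailing series letter from the title to the volume."""
    result = []
    for info in publication_infos:
        converted = dict(info)  # leave the caller's dictionaries untouched
        match = _SERIES_SUFFIX.match(converted.get('journal_title', ''))
        if match and 'journal_volume' in converted:
            converted['journal_title'] = match.group('title')
            converted['journal_volume'] = (
                match.group('letter') + converted['journal_volume'])
        result.append(converted)
    return result
```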

Convert a publication_info value from the old format to the new.

On Legacy different series of the same journal were modeled by adding the letter part of the name to the journal volume. For example, a paper published in Physical Review D contained:

    'publication_info': [
        {
            'journal_title': 'Phys.Rev.',
            'journal_volume': 'D43',
        },
    ],

On Labs we instead represent each series with a different journal record. As a consequence, the above example becomes:

    'publication_info': [
        {
            'journal_title': 'Phys.Rev.D',
            'journal_volume': '43',
        },
    ],

This function handles the translation from the old format to the new. See the tests for the various edge cases it also covers.

Parameters:publication_infos – a publication_info in the old format.
Returns:a publication_info in the new format.
Return type:list(dict)
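The Phys.Rev. example above can be reproduced with a minimal sketch. It assumes that a volume made of one capital letter followed by digits always encodes a series letter, which is a simplification; the name old_to_new_sketch is hypothetical and the edge cases covered by the library's tests are not handled.

```python
import re

# Assumption: a volume such as 'D43' is a series letter plus a number.
_LETTER_VOLUME = re.compile(r'^(?P<letter>[A-Z])(?P<number>\d+)$')


def old_to_new_sketch(publication_infos):
    """Move a leading series letter from the volume to the title."""
    result = []
    for info in publication_infos:
        converted = dict(info)  # leave the caller's dictionaries untouched
        match = _LETTER_VOLUME.match(converted.get('journal_volume', ''))
        if match and 'journal_title' in converted:
            converted['journal_title'] += match.group('letter')
            converted['journal_volume'] = match.group('number')
        result.append(converted)
    return result
```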

Decorator that filters out empty parameters.

Parameters:func (function) – the function to wrap
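To show the general shape of such a decorator, here is a minimal sketch that drops empty keyword arguments before calling the wrapped function. What counts as "empty", and the names filter_empty_sketch and describe, are assumptions; the library's decorator may treat positional arguments and methods differently.

```python
import functools

# Assumption: these values are what "empty" means for this sketch.
_EMPTIES = (None, '', [], {})


def filter_empty_sketch(func):
    """Call func without the keyword arguments whose value is empty."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        kept = {key: value for key, value in kwargs.items()
                if value not in _EMPTIES}
        return func(*args, **kept)
    return wrapper


@filter_empty_sketch
def describe(name, **extra):
    """Hypothetical function used to demonstrate the decorator."""
    return name, sorted(extra)
```

Calling describe('a', title='', year=2020, tags=[]) passes only year through, because the other two keyword values are empty.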

Get the license abbreviation from a URL.

Parameters:url (str) – canonical url of the license.
Returns:the corresponding license abbreviation.
Return type:str
Raises:ValueError – when the url is not recognized
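As an illustration of deriving an abbreviation from a canonical URL, the sketch below handles Creative Commons license URLs only and raises ValueError for anything else. The path pattern and the name license_from_url_sketch are assumptions; the library may recognise other license providers as well.

```python
import re
from urllib.parse import urlsplit

# Assumed path layout: /licenses/<kind>/<version>/, e.g. /licenses/by/4.0/
_CC_PATH = re.compile(r'^/licenses/(?P<kind>[a-z-]+)(?:/(?P<version>\d+\.\d+))?/?')


def license_from_url_sketch(url):
    """Return an abbreviation such as 'CC BY 4.0' for a Creative Commons URL."""
    parts = urlsplit(url)
    if parts.netloc.lower() != 'creativecommons.org':
        raise ValueError('Unknown license URL: {}'.format(url))
    match = _CC_PATH.match(parts.path)
    if match is None:
        raise ValueError('Unknown license URL: {}'.format(url))
    pieces = ['CC', match.group('kind').upper()]
    if match.group('version'):
        pieces.append(match.group('version'))
    return ' '.join(pieces)
```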
inspire_schemas.utils.get_schema_path(schema, resolved=False)

Retrieve the installed path for the given schema.

Parameters:
  • schema (str) – relative or absolute URL of the schema to validate, for example ‘records/authors.json’ or ‘jobs.json’, or just the name of the schema, like ‘jobs’.
  • resolved (bool) – if True, the returned path points to a fully resolved schema, that is, to the schema with all $ref replaced by their targets.

Returns:path to the given schema name.

Return type:str

Raises:SchemaNotFound – if no schema could be found.
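The lookup from a bare name such as ‘jobs’ to a file on disk can be sketched as follows. The root argument, the ‘records’ subdirectory, the stand-in SchemaNotFound class and the name find_schema_path are assumptions for illustration; the installed layout and the handling of the resolved flag in the library may differ.

```python
import os


class SchemaNotFound(Exception):
    """Stand-in exception: no schema file matches the given name."""


def find_schema_path(schema, root):
    """Return the path of the schema file under root, or raise SchemaNotFound."""
    name = schema if schema.endswith('.json') else schema + '.json'
    # Assumption: schemas live either directly under root or under 'records'.
    candidates = [name, os.path.join('records', name)]
    for candidate in candidates:
        path = os.path.join(root, candidate)
        if os.path.exists(path):
            return path
    raise SchemaNotFound(schema)
```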

inspire_schemas.utils.load_schema(schema_name, resolved=False)

Load the given schema from wherever it’s installed.

Parameters:
  • schema_name (str) – name of the schema to load, for example ‘authors’.
  • resolved (bool) – if True, return the resolved schema, that is, the schema with all $ref replaced by their targets.

Returns:the schema with the given name.

Return type:dict
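To make "all $ref replaced by their targets" concrete, here is a minimal sketch that inlines references from an in-memory store. The store dictionary and the name resolve_refs_sketch are hypothetical; the library resolves references against installed schema files, and this sketch does not guard against circular references.

```python
def resolve_refs_sketch(node, store):
    """Return a copy of node with every {'$ref': key} replaced by store[key]."""
    if isinstance(node, dict):
        if '$ref' in node:
            # Resolve the target too, since it may contain further references.
            return resolve_refs_sketch(store[node['$ref']], store)
        return {key: resolve_refs_sketch(value, store)
                for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_refs_sketch(item, store) for item in node]
    return node
```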

Normalize arXiv category to be schema compliant.

This properly capitalizes the category and replaces the dash with a dot if needed. If the category is obsolete, it is also converted to its current equivalent.


>>> from inspire_schemas.utils import normalize_arxiv_category
>>> normalize_arxiv_category('funct-an')
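The three steps described (fixing the case, turning a dash into a dot, mapping obsolete names) can be sketched with small lookup tables. The tables below are an illustrative subset and the name normalize_category_sketch is hypothetical; the library works from its own complete list of categories.

```python
# Illustrative subset of current categories and obsolete aliases.
_CURRENT = ['hep-th', 'math.FA', 'physics.acc-ph']
_OBSOLETE = {'funct-an': 'math.FA', 'acc-phys': 'physics.acc-ph'}


def _key(category):
    """Comparison key that ignores case and treats '-' and '.' alike."""
    return category.lower().replace('-', '.')


_CURRENT_BY_KEY = {_key(name): name for name in _CURRENT}
_OBSOLETE_BY_KEY = {_key(old): new for old, new in _OBSOLETE.items()}


def normalize_category_sketch(category):
    """Return the canonical spelling, or the input if it is not recognised."""
    key = _key(category)
    if key in _OBSOLETE_BY_KEY:
        return _OBSOLETE_BY_KEY[key]
    return _CURRENT_BY_KEY.get(key, category)
```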

Normalize collaboration string.

Parameters:collaboration – a string containing collaboration(s) or None
Returns:List of extracted and normalized collaborations
Return type:list


>>> from inspire_schemas.utils import normalize_collaboration
>>> normalize_collaboration('for the CMS and ATLAS Collaborations')
['CMS', 'ATLAS']
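A rough sketch of how such a string could be split is shown below. It only strips a leading "for the" or "on behalf of the" and a trailing "Collaboration(s)", then splits on commas and "and"; these rules and the name normalize_collaboration_sketch are assumptions, and the library likely handles many more variants.

```python
import re


def normalize_collaboration_sketch(collaboration):
    """Return the list of collaboration names found in the string."""
    if not collaboration:
        return []
    text = collaboration.strip()
    # Drop a leading "for the" / "on behalf of the".
    text = re.sub(r'^(?:for|on behalf of)\s+the\s+', '', text, flags=re.IGNORECASE)
    # Drop a trailing "Collaboration" or "Collaborations".
    text = re.sub(r'\s+collaborations?$', '', text, flags=re.IGNORECASE)
    # Split on commas and on the word "and" surrounded by whitespace.
    parts = re.split(r'\s*,\s*|\s+and\s+', text)
    return [part.strip() for part in parts if part.strip()]
```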

Split page_artid into page_start/end and artid.
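One simple way to picture the split is sketched here: a value containing a single dash is read as a page range, and anything else is kept as an article ID. This rule and the name split_page_artid_sketch are assumptions; the library may use additional heuristics to tell pages from article IDs.

```python
def split_page_artid_sketch(page_artid):
    """Return (page_start, page_end, artid) under the simple rule above."""
    if not page_artid:
        return None, None, None
    if '-' in page_artid:
        # Tolerate a double dash between the two page numbers.
        pieces = page_artid.replace('--', '-').split('-')
        if len(pieces) == 2:
            return pieces[0], pieces[1], None
    # Not a clean range: keep the whole value as an article ID.
    return None, None, page_artid
```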


Split pubnote into journal information.


List of all arXiv categories that ever existed.


>>> from inspire_schemas.utils import valid_arxiv_categories
>>> 'funct-an' in valid_arxiv_categories()
inspire_schemas.utils.validate(data, schema=None)

Validate the given dictionary against the given schema.

Parameters:
  • data (dict) – record to validate.
  • schema (Union[dict, str]) – schema to validate against. If it is a string, it is interpreted as the name of the schema to load (e.g. authors or jobs). If it is None, the schema is taken from data['$schema']. If it is a dictionary, it is used directly.

Raises:
  • SchemaNotFound – if the given schema was not found.
  • SchemaKeyNotFound – if schema is None and no $schema key was found in data.
  • jsonschema.SchemaError – if the schema is invalid.
  • jsonschema.ValidationError – if the data is invalid.
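The three ways of specifying the schema can be summarised in a short sketch of the selection step alone. The load callable stands in for a schema loader, and the names pick_schema and SchemaKeyNotFound are defined here as hypothetical stand-ins; the validation itself is left out.

```python
class SchemaKeyNotFound(KeyError):
    """Stand-in exception: no schema given and no '$schema' key in data."""


def pick_schema(data, load, schema=None):
    """Return the schema dictionary to validate data against.

    load is a hypothetical callable that maps a schema name to a dictionary.
    """
    if schema is None:
        if '$schema' not in data:
            raise SchemaKeyNotFound('no $schema key in data')
        schema = data['$schema']
    if isinstance(schema, dict):
        # A dictionary is used directly.
        return schema
    # Anything else is treated as a name to load.
    return load(schema)
```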