Utils Docs
Public API of methods and functions to handle and verify the JSON schemas.
-
class inspire_schemas.utils.LocalRefResolver(base_uri, referrer, store=(), cache_remote=True, handlers=(), urljoin_cache=None, remote_cache=None)
Bases: jsonschema.validators.RefResolver
Simple resolver to handle non-URI relative paths.
Detect and normalize an author UID schema.
Parameters:
- uid (string) – a UID string
- schema (string) – try to resolve to schema
Returns: a tuple (uid, schema) where:
- uid: the UID normalized to comply with the id.json schema
- schema: a schema of the UID or None if not recognised
Return type: Tuple[string, string]
Raises:
- UnknownUIDSchema: if the UID does not carry enough information to guess the schema definitively
- SchemaUIDConflict: if the specified schema does not match the given UID
-
inspire_schemas.utils.build_pubnote(title, volume, page_start=None, page_end=None, artid=None)
Build a pubnote string from its parts (reverse of split_pubnote).
-
inspire_schemas.utils.classify_field(value)
Normalize value to an Inspire category.
Parameters: value (str) – an Inspire category to properly case, or an arXiv category to translate to the corresponding Inspire category.
Returns: None if value is not a non-empty string, otherwise the corresponding Inspire category.
Return type: str
-
inspire_schemas.utils.convert_new_publication_info_to_old(publication_infos)
Convert a publication_info value back from the new format to the old.
Does the inverse transformation of convert_old_publication_info_to_new(), to be used whenever we are sending back records from Labs to Legacy.
Parameters: publication_infos – a publication_info in the new format.
Returns: a publication_info in the old format.
Return type: list(dict)
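As a rough illustration of this inverse direction, the sketch below folds a trailing series letter of the journal title back into the journal volume. It is not the library's implementation: the helper name new_to_old_sketch and the regular expression are assumptions made for this example, and the real function covers edge cases that this sketch ignores.

```python
import re

# Assumption: a series appears as a single capital letter after the final dot
# of the journal title, for example 'Phys.Rev.D'.
_SERIES_IN_TITLE = re.compile(r'^(?P<title>.+\.)(?P<series>[A-Z])$')


def new_to_old_sketch(publication_infos):
    """Hypothetical helper: move the series letter from the title to the volume."""
    converted = []
    for info in publication_infos:
        info = dict(info)  # work on a copy and leave the input untouched
        match = _SERIES_IN_TITLE.match(info.get('journal_title') or '')
        volume = info.get('journal_volume')
        if match and volume:
            info['journal_title'] = match.group('title')
            info['journal_volume'] = match.group('series') + volume
        converted.append(info)
    return converted


print(new_to_old_sketch([{'journal_title': 'Phys.Rev.D', 'journal_volume': '43'}]))
```

Titles without a trailing series letter, such as 'JHEP', pass through this sketch unchanged.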
-
inspire_schemas.utils.convert_old_publication_info_to_new(publication_infos)
Convert a publication_info value from the old format to the new.
On Legacy, different series of the same journal were modeled by adding the letter part of the name to the journal volume. For example, a paper published in Physical Review D contained:
    {
        'publication_info': [
            {
                'journal_title': 'Phys.Rev.',
                'journal_volume': 'D43',
            },
        ],
    }
On Labs we instead represent each series with a different journal record. As a consequence, the above example becomes:
    {
        'publication_info': [
            {
                'journal_title': 'Phys.Rev.D',
                'journal_volume': '43',
            },
        ],
    }
This function handles this translation from the old format to the new. Please also see the tests for the various edge cases that it handles.
Parameters: publication_infos – a publication_info in the old format.
Returns: a publication_info in the new format.
Return type: list(dict)
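The Physical Review D example can be reproduced with a few lines of plain Python. This is a minimal sketch of the idea only: the helper name old_to_new_sketch and the assumption that a series is always one capital letter followed by digits are illustrative, and the library function handles edge cases that are not covered here.

```python
import re

# Assumption: the series letter is a single capital letter in front of a
# purely numeric volume, as in 'D43'.
_SERIES_IN_VOLUME = re.compile(r'^(?P<series>[A-Z])(?P<number>[0-9]+)$')


def old_to_new_sketch(publication_infos):
    """Hypothetical helper: move the series letter from the volume to the title."""
    converted = []
    for info in publication_infos:
        info = dict(info)  # work on a copy and leave the input untouched
        match = _SERIES_IN_VOLUME.match(info.get('journal_volume') or '')
        if match and info.get('journal_title'):
            info['journal_title'] += match.group('series')
            info['journal_volume'] = match.group('number')
        converted.append(info)
    return converted


print(old_to_new_sketch([{'journal_title': 'Phys.Rev.', 'journal_volume': 'D43'}]))
```

Entries whose volume carries no series letter are returned as they came in.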
-
inspire_schemas.utils.country_code_to_name(code)
The country’s name for the given code.
Parameters: code – must be a two-letter (alpha_2) country code.
-
inspire_schemas.utils.country_name_to_code(name)
The country’s code for the given name.
Parameters: name – must be an ISO 3166-1 or ISO 3166-3 country name.
-
inspire_schemas.utils.filter_empty_parameters(func)
Decorator that filters out empty parameters.
Parameters: func (function) – the function to wrap.
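The general pattern behind such a decorator can be sketched as follows. This is an illustration rather than the library's code: the name filter_empty_keyword_arguments, the choice of which values count as empty, and the decision to filter only keyword arguments are all assumptions.

```python
import functools

# Assumption: these values count as "empty"; the library may use another set.
EMPTY_VALUES = (None, '', [], {}, ())


def filter_empty_keyword_arguments(func):
    """Hypothetical decorator: drop empty keyword arguments before calling func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        kept = {key: value for key, value in kwargs.items()
                if value not in EMPTY_VALUES}
        return func(*args, **kept)
    return wrapper


@filter_empty_keyword_arguments
def received_fields(**fields):
    """Return the names of the keyword arguments that reached the function."""
    return sorted(fields)


print(received_fields(title='A paper', note=None, tags=[], count=0))
```

Note that 0 is kept, because it is a meaningful value and not part of EMPTY_VALUES.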
-
inspire_schemas.utils.fix_reference_url(url)
Parse an incorrect URL and try to fix it by correcting the most common errors. If the fixed URL is still incorrect, it returns None.
Returns: String containing the fixed URL, or the original one if it could not be fixed.
-
inspire_schemas.utils.fix_url_add_http_if_missing(string)
Add the starting http to a URL that is missing it.
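A minimal sketch of this kind of fix is shown below. The helper name add_http_if_missing_sketch and the rule used to decide that a scheme is missing are assumptions; the library may apply a narrower or broader test.

```python
import re

# Matches a leading URL scheme such as 'http://' or 'ftp://'.
_HAS_SCHEME = re.compile(r'^[A-Za-z][A-Za-z0-9+.-]*://')


def add_http_if_missing_sketch(url):
    """Hypothetical helper: prefix 'http://' when the URL has no scheme."""
    if _HAS_SCHEME.match(url):
        return url
    return 'http://' + url


print(add_http_if_missing_sketch('www.example.org/paper'))
```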
-
inspire_schemas.utils.fix_url_bars_instead_of_slashes(string)
A common error in URLs is that every / has been replaced by |; this function fixes that.
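The repair described here amounts to a single string replacement. The sketch below uses a hypothetical helper name and assumes that every bar in the input stands for a slash.

```python
def bars_to_slashes_sketch(url):
    """Hypothetical helper: turn every '|' back into '/'."""
    return url.replace('|', '/')


print(bars_to_slashes_sketch('http:||www.example.org|paper|1'))
```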
-
inspire_schemas.utils.fix_url_replace_tilde(string)
Replace Unicode characters with their working equivalents.
-
inspire_schemas.utils.get_license_from_url(url)
Get the license abbreviation from a URL.
Parameters: url (str) – canonical URL of the license.
Returns: the corresponding license abbreviation.
Return type: str
Raises: ValueError – when the URL is not recognized.
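To make the contract concrete (abbreviation on success, ValueError otherwise), here is a sketch for Creative Commons URLs only. The helper name, the set of recognized hosts and the exact abbreviation format ('CC BY-NC 4.0') are assumptions and may differ from what the library returns.

```python
from urllib.parse import urlparse

# Assumption: only these hosts are recognized in this sketch.
_CC_HOSTS = ('creativecommons.org', 'www.creativecommons.org')


def license_abbreviation_sketch(url):
    """Hypothetical helper: derive an abbreviation from a Creative Commons URL."""
    parsed = urlparse(url)
    parts = [part for part in parsed.path.split('/') if part]
    if parsed.netloc in _CC_HOSTS and len(parts) >= 3 and parts[0] == 'licenses':
        # parts looks like ['licenses', 'by-nc', '4.0']
        return 'CC {} {}'.format(parts[1].upper(), parts[2])
    raise ValueError('Unrecognized license URL: {}'.format(url))


print(license_abbreviation_sketch('https://creativecommons.org/licenses/by-nc/4.0/'))
```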
-
inspire_schemas.utils.get_refs_to_schemas(references={})
For every schema, return the path and index name of every referenced record.
Returns: index and path to the referenced record.
Return type: dict(list(tuple))
-
inspire_schemas.utils.get_schema_path(schema, resolved=False)
Retrieve the installed path for the given schema.
Parameters:
- schema (str) – relative or absolute URL of the schema to validate, for example ‘records/authors.json’ or ‘jobs.json’, or just the name of the schema, like ‘jobs’.
- resolved (bool) – if True, the returned path points to a fully resolved schema, that is, to the schema with all $ref replaced by their targets.
Returns: path to the given schema name.
Raises: SchemaNotFound – if no schema could be found.
-
inspire_schemas.utils.get_validation_errors(data, schema=None)
Validation errors for a given record.
Yields: jsonschema.exceptions.ValidationError – validation errors.
Raises:
- SchemaNotFound – if the given schema was not found.
- SchemaKeyNotFound – if schema is None and no $schema key was found in data.
- jsonschema.SchemaError – if the schema is invalid.
-
inspire_schemas.utils.is_arxiv(obj)
Return True if obj contains an arXiv identifier.
The idutils library’s is_arxiv function has been modified here to work with two regular expressions instead of three, and to add a check that accepts only valid arXiv categories.
-
inspire_schemas.utils.is_orcid(val)
Test if argument is an ORCID ID.
See http://support.orcid.org/knowledgebase/articles/116780-structure-of-the-orcid-identifier
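The ORCID identifier structure ends in an ISO 7064 MOD 11-2 check character, so a test of this kind can be sketched in plain Python. Whether is_orcid performs exactly these checks is an assumption; the helper names below are hypothetical.

```python
import re

# Four groups of four characters; the last character may be the check value 'X'.
_ORCID_SHAPE = re.compile(r'[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]')


def orcid_check_character(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return 'X' if result == 10 else str(result)


def looks_like_orcid(value):
    """Hypothetical helper: check the shape and the check character."""
    if not isinstance(value, str) or not _ORCID_SHAPE.fullmatch(value):
        return False
    digits = value.replace('-', '')
    return orcid_check_character(digits[:-1]) == digits[-1]


print(looks_like_orcid('0000-0002-1825-0097'))
```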
-
inspire_schemas.utils.load_schema(schema_name, resolved=False, _cache={...})
Load the given schema from wherever it’s installed.
Returns: the schema with the given name.
-
inspire_schemas.utils.normalize_arxiv_category(category)
Normalize arXiv category to be schema compliant.
This properly capitalizes the category and replaces the dash by a dot if needed. If the category is obsolete, it also gets converted to its current equivalent.
Example
>>> from inspire_schemas.utils import normalize_arxiv_category
>>> normalize_arxiv_category('funct-an')  # doctest: +SKIP
u'math.FA'
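The three steps described above (map obsolete names, fix the case, fix the separator) can be sketched with two small lookup tables. The tables below are tiny hand-picked samples and the helper names are hypothetical; the library works from its own, much longer lists.

```python
# Assumption: small samples for illustration only.
CURRENT_CATEGORIES = ['hep-th', 'math.FA', 'math.AG', 'cs.AI']
OBSOLETE_TO_CURRENT = {'funct-an': 'math.FA', 'alg-geom': 'math.AG'}


def _comparison_key(category):
    """Key that ignores case and treats '-' and '.' as the same separator."""
    return category.lower().replace('-', '.')


_BY_KEY = {_comparison_key(category): category for category in CURRENT_CATEGORIES}


def normalize_category_sketch(category):
    """Hypothetical helper: return the canonical spelling when it is known."""
    category = OBSOLETE_TO_CURRENT.get(category.lower(), category)
    return _BY_KEY.get(_comparison_key(category), category)


print(normalize_category_sketch('funct-an'))
```

Comparing through a separator-insensitive key is what lets 'math-fa' become 'math.FA' while 'HEP-TH' keeps its dash and becomes 'hep-th'.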
-
inspire_schemas.utils.normalize_collaboration(collaboration)
Normalize collaboration string.
Parameters: collaboration – a string containing collaboration(s) or None
Returns: List of extracted and normalized collaborations
Return type: list
Examples
>>> from inspire_schemas.utils import normalize_collaboration
>>> normalize_collaboration('for the CMS and ATLAS Collaborations')
['CMS', 'ATLAS']
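A simplified version of this extraction can be written with three regular expressions. This is a sketch under assumptions: the helper name is hypothetical, the patterns cover only phrasings like the one in the example, and returning an empty list for None is a choice made here, not a documented behavior.

```python
import re

_LEADING_WORDS = re.compile(r'^(?:for|on behalf of)\s+the\s+', re.IGNORECASE)
_TRAILING_WORDS = re.compile(r'\s*collaborations?\s*$', re.IGNORECASE)
_SEPARATORS = re.compile(r'\s*(?:,|\band\b)\s*')


def normalize_collaboration_sketch(collaboration):
    """Hypothetical helper: extract bare collaboration names from a string."""
    if not collaboration:
        return []
    text = _LEADING_WORDS.sub('', collaboration.strip())
    text = _TRAILING_WORDS.sub('', text)
    return [name.strip() for name in _SEPARATORS.split(text) if name.strip()]


print(normalize_collaboration_sketch('for the CMS and ATLAS Collaborations'))
```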
-
inspire_schemas.utils.normalize_isbn(isbn)
Normalize an ISBN in order to be schema-compliant.
-
inspire_schemas.utils.sanitize_html(text)
Sanitize HTML for use inside record fields.
This strips most of the tags and attributes, only allowing a safe whitelisted subset.
-
inspire_schemas.utils.split_page_artid(page_artid)
Split page_artid into page_start/end and artid.
-
inspire_schemas.utils.valid_arxiv_categories()
List of all arXiv categories that ever existed.
Example
>>> from inspire_schemas.utils import valid_arxiv_categories
>>> 'funct-an' in valid_arxiv_categories()
True
-
inspire_schemas.utils.validate(data, schema=None)
Validate the given dictionary against the given schema.
Raises:
- SchemaNotFound – if the given schema was not found.
- SchemaKeyNotFound – if schema is None and no $schema key was found in data.
- jsonschema.SchemaError – if the schema is invalid.
- jsonschema.ValidationError – if the data is invalid.