{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n# Defining Array Schema\n\nArray schemas are JSON documents that act as metadata describing the\nstructure of array-like data. The submodule provides Python bindings\nfor generating schemas in Python.\n\nSee\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Make a Registry\n\nFirst, make a registry to store the collection of schemas you want to\nuse. We will also import some example functions included in the\nrmellipse package, as well as the json package to format the\nJSON documents.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import rmellipse.arrschema as arrschema\nimport rmellipse.arrschema.examples as examples\nimport xarray as xr\nimport numpy as np\nimport json\n\nregistry = arrschema.ArrSchemaRegistry()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will define a basic schema, called \"float_zeros\", for an array of floats\nwith arbitrary shape. When the schema is made with\nthe arrschema function, a uid is automatically added. 
Then we\nadd it to the registry.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "float_zeros = arrschema.arrschema(\n\tname='float_zeros', shape=(...,), dims=(...,), dtype=float\n)\n\nprint(json.dumps(float_zeros, indent=True))\n\nregistry.add_schema(float_zeros)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Validation\n\nCurrently, validation is implemented only for xarray.DataArray objects\nand RMEmeas objects.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# the schema can be provided directly\nmy_data = xr.DataArray(np.zeros((4, 4), dtype=float))\narrschema.validate(my_data, schema=float_zeros)\n\n# referred to by a registry and uid\nmy_data = xr.DataArray(np.zeros((4, 4), dtype=float))\narrschema.validate(my_data, schema=float_zeros)\n\n# referred to by a registry and name\n# Names are not unique, this may fail if multiple\n# schemas share a name and is not recommended.\nmy_data = xr.DataArray(np.zeros((4, 4), dtype=float))\narrschema.validate(my_data, schema=float_zeros)\n\n# this will fail because the dtype isn't correct\nmy_data_fails = xr.DataArray(np.zeros((4, 4), dtype=complex))\ntry:\n\tarrschema.validate(my_data_fails, schema=float_zeros)\nexcept arrschema.ValidationError as e:\n\tprint(e)\n\n# validated datasets store the associated schema in the metadata\nprint(my_data.attrs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loaders and Savers\n\nArray schemas provide a system for organizing and invoking different\nencoding and decoding functions for various schemas. Often\nwe work with a single in-memory representation of a particular object,\nbut may need to read from and write to several different\nformats for storing that data on disk. 
We do this by associating\na loader or saver with a function, identified by a module spec, and with the related\nfile extensions.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "registry.add_loader(\n\t'rmellipse.arrschema.examples:load_group_saveable',\n\t['.h5', '.hdf5'],\n\tloader_type='group_saveable',\n\tschema=float_zeros,\n)\n\nregistry.add_saver(\n\t'rmellipse.arrschema.examples:save_group_saveable',\n\t['.h5', '.hdf5'],\n\tsaver_type='group_saveable',\n\tschema=float_zeros,\n)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Encoding and decoding functions are expected to have function signatures\n# that look like ``fun(path, data, *args, **kwargs)``\n\narrschema.save('example.h5', my_data, 'my-name', registry=registry)\nmy_data_read = arrschema.load(\n\t'example.h5', group='my-name', registry=registry, schema=float_zeros\n)\nprint(my_data_read)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If your data does not already carry its schema in its metadata, you will\nhave to explicitly declare what schema it corresponds to.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "arrschema.save('example.h5', my_data, 'my-name', registry=registry, schema=float_zeros)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conversion\n\nConverters can be assigned as well. 
Converters are functions that take in\na single data set with an associated schema and return a new data set\nwith an associated schema.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# define a new format we care about\n\nint_zeros = arrschema.arrschema(name='int_zeros', shape=(...,), dims=(...,), dtype=int)\n\nregistry.add_schema(int_zeros)\n\nregistry.add_converter(\n\t'rmellipse.arrschema.examples:convert_float_to_int',\n\tinput_schema=float_zeros,\n\toutput_schema=int_zeros,\n)\n\nconverted = arrschema.convert(my_data, registry=registry, output_schema=int_zeros)\narrschema.validate(converted, schema=int_zeros)\nprint(converted)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.3" } }, "nbformat": 4, "nbformat_minor": 0 }