nexusLIMS.extractors package

Extract metadata from various electron microscopy file types.

Extractors should return a dictionary containing the values to be displayed in NexusLIMS as a sub-dictionary under the key nx_meta. The remaining keys will be for the metadata as extracted. Under nx_meta, a few keys are expected (although not enforced):

  • 'Creation Time' - ISO format date and time as a string

  • 'Data Type' - a human-readable description of the data type, with words separated by underscores, e.g. “STEM_Imaging”, “TEM_EDS”, etc.

  • 'DatasetType' - determines the value of the Type attribute for the dataset (defined in the schema)

  • 'Data Dimensions' - dimensions of the dataset as a string, surrounded by parentheses and separated by commas, e.g. ‘(12, 1024, 1024)’

  • 'Instrument ID' - instrument PID pulled from the instrument database
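
As an illustration of the structure described above, the following is a minimal sketch of the dictionary an extractor might return. Every value is a placeholder chosen for this example, and the raw "ImageTags" key and the "Image" dataset type are assumptions rather than output from a real file.

```python
# Hypothetical extractor output; every value here is a placeholder.
example_metadata = {
    "nx_meta": {
        "Creation Time": "2021-03-04T10:15:30",  # ISO format string
        "Data Type": "STEM_Imaging",
        "DatasetType": "Image",  # assumed to be a Type value the schema allows
        # str() of a tuple gives the parenthesized, comma-separated form
        "Data Dimensions": str((12, 1024, 1024)),
        "Instrument ID": "FEI-Titan-STEM-630901",
    },
    # The remaining keys hold the metadata as extracted (invented here).
    "ImageTags": {"Microscope Info": {"Voltage": 300000.0}},
}

print(example_metadata["nx_meta"]["Data Dimensions"])  # (12, 1024, 1024)
```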

nexusLIMS.extractors.create_preview(fname: Path, *, overwrite: bool) Optional[Path][source]

Generate a preview image for a given file using one of a few different methods.

For most files, this method will try to load the file using HyperSpy and generate a preview using that library’s capabilities.

Parameters
  • fname – The filename from which to read data

  • overwrite – Whether to overwrite the .json metadata file and thumbnail image if either exists

Returns

preview_fname – The filename of the generated preview image; if None, a preview could not be successfully generated.

Return type

Optional[Path]

nexusLIMS.extractors.flatten_dict(_dict, parent_key='', separator=' ')[source]

Flatten a nested dictionary into a single level.

Utility method to take a nested dictionary structure and flatten it into a single level, separating the levels by a string as specified by separator.

Cribbed from: https://stackoverflow.com/a/6027615/1435788

Parameters
  • _dict (dict) – The dictionary to flatten

  • parent_key (str) – The “root” key to add to the existing keys

  • separator (str) – The string to use to separate values in the flattened keys (i.e. {‘a’: {‘b’: ‘c’}} would become {‘a’ + sep + ‘b’: ‘c’})

Returns

flattened_dict – The dictionary with depth one, with nested dictionaries flattened into root-level keys

Return type

dict
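
The flattening described above can be sketched as a short recursive function. This is an illustrative re-implementation following the description and the linked Stack Overflow answer, not necessarily the code NexusLIMS ships; the name flatten_dict_sketch is hypothetical and keys are assumed to be strings.

```python
def flatten_dict_sketch(_dict, parent_key="", separator=" "):
    """Flatten nested dictionaries into a single level (illustrative only)."""
    items = []
    for key, value in _dict.items():
        # Prefix the key with its parent's key, unless we are at the root.
        new_key = parent_key + separator + key if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dictionaries, carrying the key prefix along.
            items.extend(flatten_dict_sketch(value, new_key, separator).items())
        else:
            items.append((new_key, value))
    return dict(items)


nested = {"a": {"b": "c"}, "d": 1}
flat = flatten_dict_sketch(nested, separator=".")
print(flat)  # {'a.b': 'c', 'd': 1}
```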

nexusLIMS.extractors.parse_metadata(fname: Path, *, write_output: bool = True, generate_preview: bool = True, overwrite: bool = True) Tuple[Optional[Dict[str, Any]], Optional[Path]][source]

Parse metadata from a file and optionally generate a preview image.

Given an input filename, read the file, determine what “type” of file (i.e. what instrument it came from) it is, filter the metadata (if necessary) to what we are interested in, and return it as a dictionary (writing to the NexusLIMS directory as JSON by default). Also calls the preview generation method, if desired.

Parameters
  • fname – The filename from which to read data

  • write_output – Whether to write the metadata dictionary as a json file in the NexusLIMS folder structure

  • generate_preview – Whether to generate the thumbnail preview of this dataset (that operation is not performed in this method; it is only called from here so that it can be done at the same time)

  • overwrite – Whether to overwrite the .json metadata file and thumbnail image if either exists

Returns

  • nx_meta (dict or None) – The “relevant” metadata that is of use for NexusLIMS. If None, the file could not be opened

  • preview_fname (Path or None) – The file path of the generated preview image, or None if it was not requested

Submodules

nexusLIMS.extractors.basic_metadata module

Handle basic metadata extraction from files that do not have an extractor defined.

nexusLIMS.extractors.basic_metadata.get_basic_metadata(filename)[source]

Get basic metadata from a file.

Returns basic metadata from a file that’s not currently interpretable by NexusLIMS.

Parameters

filename (str) – path to a file saved in the harvested directory of the instrument

Returns

mdict – A description of the file in lieu of any metadata extracted from it.

Return type

dict

nexusLIMS.extractors.digital_micrograph module

Parse and extract metadata from files saved by Gatan’s DigitalMicrograph software.

nexusLIMS.extractors.digital_micrograph.get_dm3_metadata(filename: Path)[source]

Get metadata from a dm3 or dm4 file.

Returns the metadata from a .dm3 or .dm4 file saved by Digital Micrograph, with some irrelevant information stripped out, and instrument-specific metadata parsed and added by one of the instrument-specific parsers.

Parameters

filename (str) – path to a .dm3 file saved by Gatan’s Digital Micrograph

Returns

metadata – The extracted metadata of interest. If None, the file could not be opened

Return type

dict or None

nexusLIMS.extractors.digital_micrograph.get_pre_path(mdict: Dict) List[str][source]

Get the appropriate pre-path in the metadata tag structure for a given signal.

Get the path into a dictionary where the important DigitalMicrograph metadata is expected to be found. If the .dm3/.dm4 file contains a stack of images, the important metadata for NexusLIMS is not at its usual place and is instead under a plan info tag, so this method will determine if the stack metadata is present and return the correct path.

Parameters

mdict (dict) – A metadata dictionary as returned by get_dm3_metadata()

Returns

A list containing the subsequent keys that need to be traversed to get to the point in the mdict where the important metadata is stored

nexusLIMS.extractors.digital_micrograph.parse_642_jeol(mdict)[source]

Add/adjust metadata specific to the 642 JEOL JEM3010 TEM.

(’JEOL-JEM3010-TEM-565989 in *********’)

Parameters

mdict (dict) – “raw” metadata dictionary as parsed by get_dm3_metadata()

Returns

mdict – The original metadata dictionary with added information specific to files originating from this microscope with “important” values contained under the nx_meta key at the root level

Return type

dict

nexusLIMS.extractors.digital_micrograph.parse_642_titan(mdict)[source]

Add/adjust metadata specific to the 642 FEI Titan.

(’FEI-Titan-TEM-635816 in **********’)

Parameters

mdict (dict) – “raw” metadata dictionary as parsed by get_dm3_metadata()

Returns

mdict – The original metadata dictionary with added information specific to files originating from this microscope with “important” values contained under the nx_meta key at the root level

Return type

dict

nexusLIMS.extractors.digital_micrograph.parse_643_titan(mdict)[source]

Add/adjust metadata specific to the 643 FEI Titan.

(’FEI-Titan-STEM-630901 in *********’)

Parameters

mdict (dict) – “raw” metadata dictionary as parsed by get_dm3_metadata()

Returns

mdict – The original metadata dictionary with added information specific to files originating from this microscope with “important” values contained under the nx_meta key at the root level

Return type

dict

nexusLIMS.extractors.digital_micrograph.parse_dm3_eds_info(mdict)[source]

Parse EDS information from the dm3 metadata.

Parses metadata from the DigitalMicrograph tag structure that concerns any EDS acquisition or spectrometer settings, placing it in an EDS dictionary underneath the root-level nx_meta node. Metadata values that are commonly incorrect or may be placeholders are specified in a list under the nx_meta.warnings node.

Parameters

mdict (dict) – A metadata dictionary as returned by get_dm3_metadata()

Returns

mdict – The metadata dictionary with all the “EDS-specific” metadata added as sub-node under the nx_meta root level dictionary

Return type

dict

nexusLIMS.extractors.digital_micrograph.parse_dm3_eels_info(mdict)[source]

Parse EELS information from the metadata.

Parses metadata from the DigitalMicrograph tag structure that concerns any EELS acquisition or spectrometer settings, placing it in an EELS dictionary underneath the root-level nx_meta node.

Parameters

mdict (dict) – A metadata dictionary as returned by get_dm3_metadata()

Returns

mdict – The metadata dict with all the “EELS-specific” metadata added under nx_meta

Return type

dict

nexusLIMS.extractors.digital_micrograph.parse_dm3_microscope_info(mdict)[source]

Parse the “microscope info” metadata.

Parse the “important” metadata that is saved at specific places within the DM3 tag structure into a consistent place in the metadata dictionary returned by get_dm3_metadata(). Specifically looks at the “Microscope Info”, “Session Info”, and “Meta Data” nodes (these are not present on every microscope).

Parameters

mdict (dict) – A metadata dictionary as returned by get_dm3_metadata()

Returns

mdict – The same metadata dictionary with some values added under the root-level nx_meta key

Return type

dict

nexusLIMS.extractors.digital_micrograph.parse_dm3_spectrum_image_info(mdict)[source]

Parse “spectrum image” information from the metadata.

Parses metadata that concerns any spectrum imaging information (the “SI” tag) and places it in a “Spectrum Imaging” dictionary underneath the root-level nx_meta node. Metadata values that are commonly incorrect or may be placeholders are specified in a list under the nx_meta.warnings node.

Parameters

mdict (dict) – A metadata dictionary as returned by get_dm3_metadata()

Returns

mdict – The metadata dictionary with all the “spectrum image-specific” metadata added as a sub-node under the nx_meta root-level dictionary

Return type

dict

nexusLIMS.extractors.digital_micrograph.process_tecnai_microscope_info(microscope_info, delimiter='\u2028')[source]

Process the Microscope_Info metadata string into a dictionary of key-value pairs.

This method is only relevant for FEI Titan TEMs that write additional metadata into a Unicode-delimited string at a certain place in the DM3 tag structure.

Parameters
  • microscope_info (str) – The string of data obtained from the Tecnai.Microscope_Info leaf of the metadata

  • delimiter (str) – The value (a unicode string) used to split the microscope_info string.

Returns

info_dict – The information contained in the string, in a more easily-digestible form.

Return type

dict

nexusLIMS.extractors.edax module

Parse metadata from EDAX EDS spectra saved as .spc and .msa files.

nexusLIMS.extractors.edax.get_msa_metadata(filename: Path) Optional[Dict][source]

Return the metadata (as a dict) from an .msa spectrum file.

This file may be saved by a number of different EDS acquisition software packages, but most often is produced as an export from EDAX or Oxford software. This format is a standard, but vendors (such as EDAX) often add other values into the metadata header. See https://www.microscopy.org/resources/scientific_data/ for the formal specification.

Parameters

filename – path to a .msa file saved by various EDS software packages

Returns

metadata – The metadata of interest extracted from the file. If None, the file could not be opened

Return type

Optional[Dict]

nexusLIMS.extractors.edax.get_spc_metadata(filename: Path) Optional[Dict][source]

Return the metadata (as a dict) from a .spc file.

This type of file is produced by EDAX EDS software. It is read by HyperSpy’s file reader, and the relevant metadata is extracted and returned.

Parameters

filename – path to a .spc file saved by EDAX software (Genesis, TEAM, etc.)

Returns

metadata – The metadata of interest extracted from the file. If None, the file could not be opened

Return type

Optional[Dict]

nexusLIMS.extractors.fei_emi module

Parse and extract metadata from files saved by the TIA software.

Handles files saved by FEI’s (now Thermo Fisher Scientific) TIA (Tecnai Imaging and Analysis) software. This software package saves data in two types of files: .ser and .emi. The .emi file contains metadata about the data acquisition, while the (one or more) .ser files contain the actual collected data. Thus, access to both is required for full metadata extraction and preview generation.

nexusLIMS.extractors.fei_emi.get_emi_from_ser(ser_fname: Path) Path[source]

Get the accompanying .emi filename from a .ser filename.

This method assumes that the .ser file will have the same name as the .emi file, but with an underscore and a digit appended. For example, file.emi would result in .ser files named file_1.ser, file_2.ser, etc.

Parameters

ser_fname – The absolute path of an FEI TIA .ser data file

Returns

  • emi_fname – The absolute path of the accompanying .emi metadata file

  • index (int) – The number of this .ser file (i.e. 1, 2, 3, etc.)

Raises

FileNotFoundError – If the accompanying .emi file cannot be resolved to be a file
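
The naming convention above can be sketched with a regular expression on the file stem. This is only an illustration: the helper name emi_name_from_ser is hypothetical, it accepts one or more trailing digits (an assumption), it raises ValueError for names that do not match (also an assumption), and it skips the check for the .emi file's existence that leads to the documented FileNotFoundError.

```python
import re
from pathlib import Path
from typing import Tuple


def emi_name_from_ser(ser_fname: Path) -> Tuple[Path, int]:
    """Derive the .emi path and series index from a .ser path (sketch)."""
    # The stem is expected to look like "<base>_<number>", e.g. "file_2".
    match = re.fullmatch(r"(?P<base>.+)_(?P<index>\d+)", ser_fname.stem)
    if match is None:
        raise ValueError(f"{ser_fname.name} does not end in _<number>.ser")
    # Same directory, same base name, .emi extension.
    emi_fname = ser_fname.with_name(match.group("base") + ".emi")
    return emi_fname, int(match.group("index"))


emi, index = emi_name_from_ser(Path("/data/session/file_2.ser"))
print(emi.name, index)  # file.emi 2
```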

nexusLIMS.extractors.fei_emi.get_ser_metadata(filename: Path)[source]

Get metadata from a .ser file.

Returns metadata (as a dict) from an FEI .ser file + its associated .emi files, with some non-relevant information stripped.

Parameters

filename – Path to FEI .ser file

Returns

metadata – Metadata of interest which is extracted from the passed files. If files cannot be opened, at least basic metadata will be returned (creation time, etc.)

Return type

dict

nexusLIMS.extractors.fei_emi.map_keys(term_mapping, base, metadata)[source]

Map keys into NexusLIMS metadata structure.

Given a term mapping dictionary and a metadata dictionary, translate the input keys within the “raw” metadata into a parsed value in the “nx_meta” metadata structure.

Parameters
  • term_mapping (dict) – Dictionary where keys are tuples of strings (the input terms), and values are either a single string or a list of strings (the output terms).

  • base (list) – The ‘root’ path within the metadata dictionary of where to start applying the input terms

  • metadata (dict) – A metadata dictionary as returned by get_ser_metadata()

Returns

metadata – The same metadata dictionary with some values added under the root-level nx_meta key, as specified by term_mapping

Return type

dict

Notes

The term_mapping parameter should be a dictionary of the form:

{
    ('val1_1', 'val1_2') : 'output_val_1',
    ('val1_1', 'val2_2') : 'output_val_2',
    # etc.
}

Assuming base is ['ObjectInfo', 'AcquireInfo'], this would map the term present at ObjectInfo.AcquireInfo.val1_1.val1_2 into nx_meta.output_val_1, and ObjectInfo.AcquireInfo.val1_1.val2_2 into nx_meta.output_val_2, and so on. If one of the output terms is a list, the resulting metadata will be nested. e.g. ['output_val_1', 'output_val_2'] would get mapped to nx_meta.output_val_1.output_val_2.
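
The mapping described in these notes can be sketched as follows. This is an illustrative re-implementation, not the library's code: the name map_keys_sketch, the "Group" output key, and the choice to silently skip input terms that are missing from the raw metadata are all assumptions.

```python
def map_keys_sketch(term_mapping, base, metadata):
    """Copy selected raw values into metadata['nx_meta'] (illustrative only)."""
    nx_meta = metadata.setdefault("nx_meta", {})
    for in_terms, out_terms in term_mapping.items():
        # Walk down base + input terms; skip this mapping if any key is absent.
        node = metadata
        try:
            for key in list(base) + list(in_terms):
                node = node[key]
        except (KeyError, TypeError):
            continue
        # A single string is treated as a one-element output path.
        out_path = [out_terms] if isinstance(out_terms, str) else list(out_terms)
        target = nx_meta
        for key in out_path[:-1]:
            # Create intermediate dictionaries for nested output terms.
            target = target.setdefault(key, {})
        target[out_path[-1]] = node
    return metadata


raw = {"ObjectInfo": {"AcquireInfo": {"val1_1": {"val1_2": 5, "val2_2": 7}}}}
mapping = {
    ("val1_1", "val1_2"): "output_val_1",
    ("val1_1", "val2_2"): ["Group", "output_val_2"],
}
result = map_keys_sketch(mapping, ["ObjectInfo", "AcquireInfo"], raw)
print(result["nx_meta"])  # {'output_val_1': 5, 'Group': {'output_val_2': 7}}
```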

nexusLIMS.extractors.fei_emi.parse_acquire_info(metadata)[source]

Parse acquisition conditions.

Parse the metadata that is saved at specific places within the .emi tag structure into a consistent place in the metadata dictionary returned by get_ser_metadata(). Specifically looks at the “AcquireInfo” node of the metadata structure.

Parameters

metadata (dict) – A metadata dictionary as returned by get_ser_metadata()

Returns

metadata – The same metadata dictionary with some values added under the root-level nx_meta key

Return type

dict

nexusLIMS.extractors.fei_emi.parse_basic_info(metadata, shape, instrument)[source]

Parse basic metadata from file.

Parse the metadata that is saved at specific places within the .emi tag structure into a consistent place in the metadata dictionary returned by get_ser_metadata(). Specifically, this method handles the creation date, equipment manufacturer, and data shape/type.

Parameters
  • metadata (dict) – A metadata dictionary as returned by get_ser_metadata()

  • shape – The shape of the dataset

  • instrument (Instrument) – The instrument this file was collected on

Returns

metadata – The same metadata dictionary with some values added under the root-level nx_meta key

Return type

dict

nexusLIMS.extractors.fei_emi.parse_data_type(s, metadata)[source]

Parse the data type from the signal’s metadata.

Determine “Data Type” and “DatasetType” for the given .ser file based off of metadata and signal characteristics. This method is used to determine whether the image is TEM or STEM, Image or Diffraction, Spectrum or Spectrum Image, etc.

Due to lack of appropriate metadata written by the FEI software, a heuristic of axis limits and size is used to determine whether a spectrum’s data type is EELS or EDS. This may not be a perfect determination.

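Parameters
  • s (hyperspy.signal.BaseSignal (or subclass)) – The HyperSpy signal loaded from the .ser file

  • metadata (dict) – A metadata dictionary as returned by get_ser_metadata()

Returns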

  • data_type (str) – The string that should be stored at metadata[‘nx_meta’][‘Data Type’]

  • dataset_type (str) – The string that should be stored at metadata[‘nx_meta’][‘DatasetType’]

nexusLIMS.extractors.fei_emi.parse_experimental_conditions(metadata)[source]

Parse experimental conditions.

Parse the metadata that is saved at specific places within the .emi tag structure into a consistent place in the metadata dictionary returned by get_ser_metadata(). Specifically looks at the “ExperimentalConditions” node of the metadata structure.

Parameters

metadata (dict) – A metadata dictionary as returned by get_ser_metadata()

Returns

metadata – The same metadata dictionary with some values added under the root-level nx_meta key

Return type

dict

nexusLIMS.extractors.fei_emi.parse_experimental_description(metadata)[source]

Parse experimental description.

Parse the metadata that is saved at specific places within the .emi tag structure into a consistent place in the metadata dictionary returned by get_ser_metadata(). Specifically looks at the “ExperimentalDescription” node of the metadata structure.

Parameters

metadata (dict) – A metadata dictionary as returned by get_ser_metadata()

Returns

metadata – The same metadata dictionary with some values added under the root-level nx_meta key

Return type

dict

Notes

The terms to extract in this section were

nexusLIMS.extractors.fei_emi.split_fei_metadata_units(metadata_term)[source]

Split metadata into value and units.

If present, separate a metadata term into its value and units. In the FEI metadata structure, units are appended to the end of the term after an underscore. For example, High tension_kV indicates that the High tension metadata value has units of kV.

Parameters

metadata_term (str) – The metadata term read from the FEI tag structure

Returns

mdata_and_unit – A length-2 tuple with the metadata value name as the first item and the unit (if present) as the second item

Return type

tuple of str
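
A minimal sketch of this split, assuming the unit is whatever follows the last underscore. The helper name split_units_sketch and the "Magnification" term are hypothetical, and returning None when no unit is present is an assumption; the real function may signal a missing unit differently.

```python
def split_units_sketch(metadata_term):
    """Split 'name_unit' into (name, unit); unit is None if absent (assumed)."""
    # rpartition splits on the LAST underscore, so names may contain others.
    name, sep, unit = metadata_term.rpartition("_")
    if not sep:
        # No underscore found: the whole term is the name.
        return metadata_term, None
    return name, unit


print(split_units_sketch("High tension_kV"))  # ('High tension', 'kV')
print(split_units_sketch("Magnification"))    # ('Magnification', None)
```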

nexusLIMS.extractors.quanta_tif module

Parse metadata from FEI tif images (saved by FEI/Thermo Fisher FIBs and SEMs).

nexusLIMS.extractors.quanta_tif.get_quanta_metadata(filename: Path)[source]

Get metadata from a Quanta-style tif file.

Returns the metadata (as a dictionary) from a .tif file saved by the FEI Quanta SEM in the Nexus Microscopy Facility. Specific tags of interest are duplicated under the root-level nx_meta node in the dictionary.

Parameters

filename – path to a .tif file saved by the Quanta

Returns

mdict – The metadata text extracted from the file

Return type

dict

nexusLIMS.extractors.quanta_tif.parse_beam_info(mdict, beam_name)[source]

Parse the “Beam info” section of the metadata.

Parameters
  • mdict (dict) – A metadata dictionary as returned by get_quanta_metadata()

  • beam_name (str) – The “beam name” read from the root-level Beam node of the metadata dictionary

Returns

mdict – The same metadata dictionary with some values added under the root-level nx_meta key

Return type

dict

nexusLIMS.extractors.quanta_tif.parse_det_info(mdict, det_name)[source]

Parse the “Detector info” section of the metadata.

Parses the Detector portion of the metadata dictionary from the Quanta to get values such as brightness, contrast, signal, etc.

Parameters
  • mdict (dict) – A metadata dictionary as returned by get_quanta_metadata()

  • det_name (str) – The “detector name” read from the root-level Beam node of the metadata dictionary

Returns

mdict – The same metadata dictionary with some values added under the root-level nx_meta key

Return type

dict

nexusLIMS.extractors.quanta_tif.parse_image_info(mdict)[source]

Parse the “Image info” section of the metadata.

Parses the Image portion of the metadata dictionary from the Quanta to get values such as drift correction, image integration settings, etc.

Parameters

mdict (dict) – A metadata dictionary as returned by get_quanta_metadata()

Returns

mdict – The same metadata dictionary with some values added under the root-level nx_meta key

Return type

dict

nexusLIMS.extractors.quanta_tif.parse_nx_meta(mdict)[source]

Parse metadata into NexusLIMS format.

Parse the “important” metadata that is saved at specific places within the Quanta tag structure into a consistent place in the metadata dictionary returned by get_quanta_metadata().

The metadata contained in the XML section (if present) is not parsed, since it appears to only contain duplicates or slightly renamed metadata values compared to the typical config-style section that is always present.

Parameters

mdict (dict) – A metadata dictionary as returned by get_quanta_metadata()

Returns

mdict – The same metadata dictionary with some values added under the root-level nx_meta key

Return type

dict

nexusLIMS.extractors.quanta_tif.parse_scan_info(mdict, scan_name)[source]

Parse the “Scan info” section of the metadata.

Parses the Scan portion of the metadata dictionary (on a Quanta this is always “EScan”) to get values such as dwell time, field width, and pixel size.

Parameters
  • mdict (dict) – A metadata dictionary as returned by get_quanta_metadata()

  • scan_name (str) – The “scan name” read from the root-level Beam node of the metadata dictionary

Returns

mdict – The same metadata dictionary with some values added under the root-level nx_meta key

Return type

dict

nexusLIMS.extractors.quanta_tif.parse_system_info(mdict)[source]

Parse the “System info” section of the metadata.

Parses the System portion of the metadata dictionary from the Quanta to get values such as software version, chamber config, etc.

Parameters

mdict (dict) – A metadata dictionary as returned by get_quanta_metadata()

Returns

mdict – The same metadata dictionary with some values added under the root-level nx_meta key

Return type

dict

nexusLIMS.extractors.thumbnail_generator module

Generate preview images from various data files.

Data files are represented either as HyperSpy Signals or as raw data files (in the case of TIFF images).

nexusLIMS.extractors.thumbnail_generator.add_annotation_markers(s)[source]

Add annotation markers from a DM3/DM4 file to a HyperSpy signal.

Read annotations from a signal originating from DigitalMicrograph and convert those that can be converted into HyperSpy markers for plotting. Adapted from a currently (at the time of writing) open pull request in HyperSpy.

Parameters

s (hyperspy.signal.BaseSignal (or subclass)) – The HyperSpy signal for which a thumbnail should be generated

nexusLIMS.extractors.thumbnail_generator.down_sample_image(fname: Path, out_path: Path, output_size: Optional[Tuple[int, int]] = None, factor: Optional[int] = None)[source]

Load an image file from disk, down-sample it to the requested size, and save.

Sometimes the data doesn’t need to be loaded as a HyperSpy signal, and it’s better just to down-sample existing image data (such as for .tif files created by the Quanta SEM).

Parameters
  • fname – The filepath that will be resized. All formats supported by PIL.Image.open() can be used

  • out_path – A path to the desired thumbnail filename. All formats supported by PIL.Image.Image.save() can be used.

  • output_size – A tuple of ints specifying the width and height of the output image. Either this argument or factor should be provided (not both).

  • factor – The multiple of the image size to reduce by (i.e. a value of 2 results in an image that is 50% of each original dimension). Either this argument or output_size should be provided (not both).
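
How the two mutually exclusive arguments might resolve to a final size can be sketched without any imaging library. The helper target_size is hypothetical; raising ValueError when both or neither argument is given, and using floor division for factor, are assumptions about behavior the text does not spell out.

```python
from typing import Optional, Tuple


def target_size(
    original: Tuple[int, int],
    output_size: Optional[Tuple[int, int]] = None,
    factor: Optional[int] = None,
) -> Tuple[int, int]:
    """Resolve the output (width, height) from output_size or factor."""
    # Exactly one of the two arguments should be provided.
    if (output_size is None) == (factor is None):
        raise ValueError("Provide exactly one of output_size or factor")
    if output_size is not None:
        return output_size
    width, height = original
    # A factor of 2 halves each dimension (rounded down here).
    return width // factor, height // factor


print(target_size((1024, 884), factor=2))  # (512, 442)
```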

nexusLIMS.extractors.thumbnail_generator.image_to_square_thumbnail(f: Path, out_path: Path, output_size: int) bool[source]

Generate a preview thumbnail from a non-data image file.

Images of common filetypes will be transformed into 500 x 500 pixel images by first scaling the largest dimension to 500 pixels and then padding the resulting image to square.

Parameters
  • f – The string of the path of an image file for which a thumbnail should be generated.

  • out_path – A path to the desired thumbnail filename. All formats supported by save() can be used.

  • output_size – The desired resulting size of the thumbnail image.

Returns

Whether a preview was generated

Return type

bool
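
The scale-then-pad geometry described for this function can be worked through with plain arithmetic. The helper square_thumbnail_geometry is hypothetical, and centering the scaled image on the square canvas is an assumption; the text only states that the image is padded to square.

```python
def square_thumbnail_geometry(width, height, output_size=500):
    """Return the scaled (w, h) and the (x, y) paste offset on a square canvas."""
    # Scale so that the largest dimension becomes output_size.
    scale = output_size / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    # Center the scaled image; the rest of the canvas is padding.
    offset = ((output_size - new_w) // 2, (output_size - new_h) // 2)
    return (new_w, new_h), offset


print(square_thumbnail_geometry(1000, 400))  # ((500, 200), (0, 150))
```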

nexusLIMS.extractors.thumbnail_generator.sig_to_thumbnail(s, out_path: Path, dpi: int = 92)[source]

Generate a preview thumbnail from an arbitrary HyperSpy signal.

For a 2D signal, the signal from the first navigation position is used (most likely the top- and left-most position). For a 1D signal (i.e. a spectrum or spectrum image), the output depends on the number of navigation dimensions:

  • 0: Image of spectrum

  • 1: Image of linescan (a la DigitalMicrograph)

  • 2: Image of spectra sampled from navigation space

  • 2+: As for 2 dimensions

Parameters
  • s (hyperspy.signal.BaseSignal (or subclass)) – The HyperSpy signal for which a thumbnail should be generated

  • out_path – A path to the desired thumbnail filename. All formats supported by savefig() can be used.

  • dpi (int) – The “dots per inch” resolution for the outputted figure

Returns

f – Handle to a matplotlib Figure

Return type

matplotlib.figure.Figure

Notes

This method heavily utilizes HyperSpy’s existing plotting functions to figure out how to best display the image

nexusLIMS.extractors.thumbnail_generator.text_to_thumbnail(f: Path, out_path: Path, output_size: int = 500) Union[Figure, bool][source]

Generate a preview thumbnail from a text file.

For a text file, the contents will be formatted and written to a 500x500 pixel jpg image of size 5 in by 5 in.

If the text file has many newlines, it is probably data and the first 42 characters of each of the first 20 lines of the text file will be written to the image.

If the text file has a few (or fewer) newlines, it is probably a manually generated note and the text will be written to a 42 column, 18 row box until the space is exhausted.

Parameters
  • f – The path of a text file for which a thumbnail should be generated.

  • out_path – A path to the desired thumbnail filename. All formats supported by savefig() can be used.

  • output_size (int) – The pixel width (and height, since the image is padded to square) of the saved image file.

Returns

f – Handle to a matplotlib Figure, or the value False if a preview could not be generated

Return type

Union[Figure, bool]
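
The text-selection rule described for text_to_thumbnail can be sketched with the standard library alone. The helper preview_lines is hypothetical, and the threshold of 10 newlines separating "many" from "few" is an assumption made only for this example; the rendering to an image is not shown.

```python
import textwrap


def preview_lines(text, many_newlines=10):
    """Choose the lines of text to draw on the thumbnail (sketch)."""
    if text.count("\n") > many_newlines:
        # Probably data: first 42 characters of each of the first 20 lines.
        return [line[:42] for line in text.splitlines()[:20]]
    # Probably a note: wrap to 42 columns and keep at most 18 rows.
    return textwrap.wrap(text, width=42)[:18]


data = "\n".join(f"{i}\t{i * 0.5}" for i in range(100))
note = "Sample was tilted to the zone axis before acquiring the series. " * 3
print(len(preview_lines(data)))  # 20
for line in preview_lines(note):
    print(line)
```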

nexusLIMS.extractors.utils module

Methods (primarily intended to be private) that are used by the other extractors.