nexusLIMS.extractors package¶
Extract metadata from various electron microscopy file types.
Extractors should return a dictionary containing the values to be displayed in NexusLIMS as a sub-dictionary under the key nx_meta. The remaining keys will be for the metadata as extracted. Under nx_meta, a few keys are expected (although not enforced):
'Creation Time' – ISO format date and time as a string
'Data Type' – a human-readable description of the data type separated by underscores, e.g. “STEM_Imaging”, “TEM_EDS”, etc.
'DatasetType' – determines the value of the Type attribute for the dataset (defined in the schema)
'Data Dimensions' – dimensions of the dataset, surrounded by parentheses, separated by commas as a string, e.g. ‘(12, 1024, 1024)’
'Instrument ID' – instrument PID pulled from the instrument database
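As a rough illustration of the layout described above, the following sketch assembles a dictionary with an nx_meta sub-dictionary. The helper name, the "Image" value for 'DatasetType', the instrument PID string, and the raw_section key are all hypothetical placeholders, not values taken from NexusLIMS itself.

```python
from datetime import datetime


def build_example_metadata(shape, created):
    """Assemble a hypothetical extractor result with an nx_meta sub-dictionary."""
    return {
        "nx_meta": {
            # ISO format date and time as a string
            "Creation Time": created.isoformat(),
            # Human-readable description, separated by underscores
            "Data Type": "STEM_Imaging",
            # Assumed value; valid values are defined by the schema
            "DatasetType": "Image",
            # Parenthesized, comma-separated dimensions as a string
            "Data Dimensions": str(tuple(shape)),
            # Placeholder; a real PID comes from the instrument database
            "Instrument ID": "hypothetical-instrument-pid",
        },
        # Remaining keys hold the metadata as extracted (placeholder content)
        "raw_section": {"some_tag": "some_value"},
    }


example = build_example_metadata((12, 1024, 1024), datetime(2021, 3, 4, 10, 30, 0))
print(example["nx_meta"]["Data Dimensions"])
```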
- nexusLIMS.extractors.create_preview(fname: Path, *, overwrite: bool) Optional[Path] [source]¶
Generate a preview image for a given file using one of a few different methods.
For most files, this method will try to load the file using HyperSpy and generate a preview using that library’s capabilities.
- Parameters
fname – The filename from which to read data
overwrite – Whether to overwrite the .json metadata file and thumbnail image if either exists
- Returns
preview_fname – The filename of the generated preview image; if None, a preview could not be successfully generated.
- Return type
Optional[Path]
- nexusLIMS.extractors.flatten_dict(_dict, parent_key='', separator=' ')[source]¶
Flatten a nested dictionary into a single level.
Utility method to take a nested dictionary structure and flatten it into a single level, separating the levels by a string as specified by separator.
Cribbed from: https://stackoverflow.com/a/6027615/1435788
- Parameters
- Returns
flattened_dict – The dictionary with depth one, with nested dictionaries flattened into root-level keys
- Return type
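The flattening described here can be sketched as a small recursive function. This is a stand-alone approximation in the spirit of the linked Stack Overflow answer, not the package's actual code; the name flatten_dict_sketch and the sample keys are hypothetical, and string keys are assumed.

```python
from collections.abc import MutableMapping


def flatten_dict_sketch(_dict, parent_key="", separator=" "):
    """Flatten nested mappings, joining key levels with ``separator``."""
    items = []
    for key, value in _dict.items():
        # Prefix the key with its parent path unless we are at the root
        new_key = parent_key + separator + key if parent_key else key
        if isinstance(value, MutableMapping):
            # Recurse into nested dictionaries and collect their flattened items
            items.extend(flatten_dict_sketch(value, new_key, separator).items())
        else:
            items.append((new_key, value))
    return dict(items)


nested = {"Microscope": {"Voltage": 300, "Mode": {"Name": "STEM"}}, "Operator": "abc"}
flat = flatten_dict_sketch(nested)
print(flat)
```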
- nexusLIMS.extractors.parse_metadata(fname: Path, *, write_output: bool = True, generate_preview: bool = True, overwrite: bool = True) Tuple[Optional[Dict[str, Any]], Optional[Path]] [source]¶
Parse metadata from a file and optionally generate a preview image.
Given an input filename, read the file, determine what “type” of file (i.e. what instrument it came from) it is, filter the metadata (if necessary) to what we are interested in, and return it as a dictionary (writing to the NexusLIMS directory as JSON by default). Also calls the preview generation method, if desired.
- Parameters
fname – The filename from which to read data
write_output – Whether to write the metadata dictionary as a json file in the NexusLIMS folder structure
generate_preview – Whether to generate the thumbnail preview of this dataset (that operation is not done in this method, it is just called from here so it can be done at the same time)
overwrite – Whether to overwrite the .json metadata file and thumbnail image if either exists
- Returns
nx_meta (dict or None) – The “relevant” metadata that is of use for NexusLIMS. If None, the file could not be opened
preview_fname (Path or None) – The file path of the generated preview image, or None if it was not requested
Submodules¶
nexusLIMS.extractors.basic_metadata module¶
Handle basic metadata extraction from files that do not have an extractor defined.
nexusLIMS.extractors.digital_micrograph module¶
Parse and extract metadata from files saved by Gatan’s DigitalMicrograph software.
- nexusLIMS.extractors.digital_micrograph.get_dm3_metadata(filename: Path)[source]¶
Get metadata from a dm3 or dm4 file.
Returns the metadata from a .dm3 or .dm4 file saved by Digital Micrograph, with some non-relevant information stripped out, and instrument-specific metadata parsed and added by one of the instrument-specific parsers.
- nexusLIMS.extractors.digital_micrograph.get_pre_path(mdict: Dict) List[str] [source]¶
Get the appropriate pre-path in the metadata tag structure for a given signal.
Get the path into a dictionary where the important DigitalMicrograph metadata is expected to be found. If the .dm3/.dm4 file contains a stack of images, the important metadata for NexusLIMS is not at its usual place and is instead under a plan info tag, so this method will determine if the stack metadata is present and return the correct path.
- Parameters
mdict (dict) – A metadata dictionary as returned by get_dm3_metadata()
- Returns
A list containing the subsequent keys that need to be traversed to get to the point in the mdict where the important metadata is stored
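A key list like the one returned here can be applied to a nested dictionary with a simple fold. The sketch below assumes a made-up tag structure; the key names and the helper get_by_path are hypothetical and only show how such a pre-path might be consumed.

```python
from functools import reduce
from operator import getitem


def get_by_path(mdict, key_path):
    """Walk ``mdict`` one key at a time and return the node at the end of the path."""
    return reduce(getitem, key_path, mdict)


# Hypothetical tag structure and pre-path, for illustration only
mdict = {
    "ImageList": {
        "TagGroup0": {"ImageTags": {"Microscope Info": {"Voltage": 200000}}}
    }
}
pre_path = ["ImageList", "TagGroup0", "ImageTags"]

tags = get_by_path(mdict, pre_path)
print(tags["Microscope Info"]["Voltage"])
```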
- nexusLIMS.extractors.digital_micrograph.parse_642_jeol(mdict)[source]¶
Add/adjust metadata specific to the 642 JEOL JEM3010.
(’JEOL-JEM3010-TEM-565989 in *********’)
- Parameters
mdict (dict) – “raw” metadata dictionary as parsed by get_dm3_metadata()
- Returns
mdict – The original metadata dictionary with added information specific to files originating from this microscope with “important” values contained under the nx_meta key at the root level
- Return type
- nexusLIMS.extractors.digital_micrograph.parse_642_titan(mdict)[source]¶
Add/adjust metadata specific to the 642 FEI Titan.
(’FEI-Titan-TEM-635816 in **********’)
- Parameters
mdict (dict) – “raw” metadata dictionary as parsed by get_dm3_metadata()
- Returns
mdict – The original metadata dictionary with added information specific to files originating from this microscope with “important” values contained under the nx_meta key at the root level
- Return type
- nexusLIMS.extractors.digital_micrograph.parse_643_titan(mdict)[source]¶
Add/adjust metadata specific to the 643 FEI Titan.
(’FEI-Titan-STEM-630901 in *********’)
- Parameters
mdict (dict) – “raw” metadata dictionary as parsed by get_dm3_metadata()
- Returns
mdict – The original metadata dictionary with added information specific to files originating from this microscope with “important” values contained under the nx_meta key at the root level
- Return type
- nexusLIMS.extractors.digital_micrograph.parse_dm3_eds_info(mdict)[source]¶
Parse EDS information from the dm3 metadata.
Parses metadata from the DigitalMicrograph tag structure that concerns any EDS acquisition or spectrometer settings, placing it in an EDS dictionary underneath the root-level nx_meta node. Metadata values that are commonly incorrect or may be placeholders are specified in a list under the nx_meta.warnings node.
- Parameters
mdict (dict) – A metadata dictionary as returned by get_dm3_metadata()
- Returns
mdict – The metadata dictionary with all the “EDS-specific” metadata added as sub-node under the nx_meta root level dictionary
- Return type
- nexusLIMS.extractors.digital_micrograph.parse_dm3_eels_info(mdict)[source]¶
Parse EELS information from the metadata.
Parses metadata from the DigitalMicrograph tag structure that concerns any EELS acquisition or spectrometer settings, placing it in an EELS dictionary underneath the root-level nx_meta node.
- Parameters
mdict (dict) – A metadata dictionary as returned by get_dm3_metadata()
- Returns
mdict – The metadata dict with all the “EELS-specific” metadata added under nx_meta
- Return type
- nexusLIMS.extractors.digital_micrograph.parse_dm3_microscope_info(mdict)[source]¶
Parse the “microscope info” metadata.
Parse the “important” metadata that is saved at specific places within the DM3 tag structure into a consistent place in the metadata dictionary returned by get_dm3_metadata(). Specifically looks at the “Microscope Info”, “Session Info”, and “Meta Data” nodes (these are not present on every microscope).
- Parameters
mdict (dict) – A metadata dictionary as returned by get_dm3_metadata()
- Returns
mdict – The same metadata dictionary with some values added under the root-level nx_meta key
- Return type
- nexusLIMS.extractors.digital_micrograph.parse_dm3_spectrum_image_info(mdict)[source]¶
Parse “spectrum image” information from the metadata.
Parses metadata that concerns any spectrum imaging information (the “SI” tag) and places it in a “Spectrum Imaging” dictionary underneath the root-level nx_meta node. Metadata values that are commonly incorrect or may be placeholders are specified in a list under the nx_meta.warnings node.
- Parameters
mdict (dict) – A metadata dictionary as returned by get_dm3_metadata()
- Returns
mdict – The metadata dictionary with all the spectrum imaging-specific metadata added as a sub-node under the nx_meta root level dictionary
- Return type
- nexusLIMS.extractors.digital_micrograph.process_tecnai_microscope_info(microscope_info, delimiter='\u2028')[source]¶
Process the Microscope_Info metadata string into a dictionary of key-value pairs.
This method is only relevant for FEI Titan TEMs that write additional metadata into a unicode-delimited string at a certain place in the DM3 tag structure.
- Parameters
- Returns
info_dict – The information contained in the string, in a more easily-digestible form.
- Return type
nexusLIMS.extractors.edax module¶
Parse metadata from EDAX EDS spectra saved as .spc and .msa files.
- nexusLIMS.extractors.edax.get_msa_metadata(filename: Path) Optional[Dict] [source]¶
Return the metadata (as a dict) from an .msa spectrum file.
This file may be saved by a number of different EDS acquisition software packages, but most often is produced as an export from EDAX or Oxford software. This format is a standard, but vendors (such as EDAX) often add other values into the metadata header. See https://www.microscopy.org/resources/scientific_data/ for the formal specification.
- Parameters
filename – path to a .msa file saved by various EDS software packages
- Returns
metadata – The metadata of interest extracted from the file. If None, the file could not be opened
- Return type
Optional[Dict]
- nexusLIMS.extractors.edax.get_spc_metadata(filename: Path) Optional[Dict] [source]¶
Return the metadata (as a dict) from a .spc file.
This type of file is produced by EDAX EDS software. It is read by HyperSpy’s file reader, and the relevant metadata is extracted and returned.
- Parameters
filename – path to a .spc file saved by EDAX software (Genesis, TEAM, etc.)
- Returns
metadata – The metadata of interest extracted from the file. If None, the file could not be opened
- Return type
Optional[Dict]
nexusLIMS.extractors.fei_emi module¶
Parse and extract metadata from files saved by the TIA software.
Handles files saved by FEI’s (now Thermo Fisher Scientific) TIA (Tecnai Imaging and Analysis) software. This software package saves data in two types of files: .ser and .emi. The .emi file contains metadata about the data acquisition, while the (one or more) .ser files contain the actual collected data. Thus, access to both is required for full metadata extraction and preview generation.
- nexusLIMS.extractors.fei_emi.get_emi_from_ser(ser_fname: Path) Path [source]¶
Get the accompanying .emi filename from a .ser filename.
This method assumes that the .ser file will have the same name as the .emi file, but with an underscore and a digit appended, i.e. file.emi would result in .ser files named file_1.ser, file_2.ser, etc.
- Parameters
ser_fname – The absolute path of an FEI TIA .ser data file
- Returns
emi_fname – The absolute path of the accompanying .emi metadata file
index (int) – The number of this .ser file (i.e. 1, 2, 3, etc.)
- Raises
FileNotFoundError – If the accompanying .emi file cannot be resolved to be a file
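The naming convention above can be sketched with pathlib and a regular expression. This is only an approximation of the behavior described: the helper name emi_name_from_ser is hypothetical, it accepts one or more trailing digits, and it works purely on names, so the check that the .emi file exists (and the resulting FileNotFoundError) is deliberately left out.

```python
import re
from pathlib import Path


def emi_name_from_ser(ser_fname):
    """Derive the expected .emi path and the series index from a .ser path.

    Name-based only: no check is made that the .emi file exists on disk.
    """
    # Split "<stem>_<digits>" into the shared stem and the numeric suffix
    match = re.fullmatch(r"(?P<stem>.+)_(?P<index>\d+)", ser_fname.stem)
    if match is None:
        raise ValueError(f"{ser_fname.name} does not end in '_<number>.ser'")
    emi_fname = ser_fname.with_name(match.group("stem") + ".emi")
    return emi_fname, int(match.group("index"))


emi_fname, index = emi_name_from_ser(Path("/data/session/file_2.ser"))
print(emi_fname, index)
```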
- nexusLIMS.extractors.fei_emi.get_ser_metadata(filename: Path)[source]¶
Get metadata from a .ser file.
Returns metadata (as a dict) from an FEI .ser file + its associated .emi files, with some non-relevant information stripped.
- Parameters
filename – Path to FEI .ser file
- Returns
metadata – Metadata of interest which is extracted from the passed files. If files cannot be opened, at least basic metadata will be returned (creation time, etc.)
- Return type
- nexusLIMS.extractors.fei_emi.map_keys(term_mapping, base, metadata)[source]¶
Map keys into NexusLIMS metadata structure.
Given a term mapping dictionary and a metadata dictionary, translate the input keys within the “raw” metadata into a parsed value in the “nx_meta” metadata structure.
- Parameters
term_mapping (dict) – Dictionary where keys are tuples of strings (the input terms), and values are either a single string or a list of strings (the output terms).
base (list) – The ‘root’ path within the metadata dictionary of where to start applying the input terms
metadata (dict) – A metadata dictionary as returned by get_ser_metadata()
- Returns
metadata – The same metadata dictionary with some values added under the root-level nx_meta key, as specified by term_mapping
- Return type
Notes
The term_mapping parameter should be a dictionary of the form:
{ ('val1_1', 'val1_2') : 'output_val_1', ('val1_1', 'val2_2') : 'output_val_2', etc. }
Assuming base is ['ObjectInfo', 'AcquireInfo'], this would map the term present at ObjectInfo.AcquireInfo.val1_1.val1_2 into nx_meta.output_val_1, and ObjectInfo.AcquireInfo.val1_1.val2_2 into nx_meta.output_val_2, and so on. If one of the output terms is a list, the resulting metadata will be nested, e.g. ['output_val_1', 'output_val_2'] would get mapped to nx_meta.output_val_1.output_val_2.
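The mapping rules in these notes can be illustrated with a small stand-alone function. It is a sketch of the described behavior rather than the package's implementation; the name map_keys_sketch, the "group" output key, and the choice to silently skip input terms that are missing from the metadata are assumptions.

```python
def map_keys_sketch(term_mapping, base, metadata):
    """Copy values found under ``base`` + input terms into ``metadata['nx_meta']``."""
    nx_meta = metadata.setdefault("nx_meta", {})
    for in_terms, out_terms in term_mapping.items():
        # Walk down from the root through the base path and the input terms
        node = metadata
        try:
            for key in list(base) + list(in_terms):
                node = node[key]
        except (KeyError, TypeError):
            continue  # assumed behavior: skip terms that are not present
        # A single string output term is treated as a one-element path
        if isinstance(out_terms, str):
            out_terms = [out_terms]
        # Create nested dictionaries for every output term except the last
        target = nx_meta
        for key in out_terms[:-1]:
            target = target.setdefault(key, {})
        target[out_terms[-1]] = node
    return metadata


metadata = {"ObjectInfo": {"AcquireInfo": {"val1_1": {"val1_2": 5, "val2_2": "x"}}}}
mapping = {
    ("val1_1", "val1_2"): "output_val_1",
    ("val1_1", "val2_2"): ["group", "output_val_2"],
    ("missing", "term"): "never_written",
}
result = map_keys_sketch(mapping, ["ObjectInfo", "AcquireInfo"], metadata)
print(result["nx_meta"])
```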
- nexusLIMS.extractors.fei_emi.parse_acquire_info(metadata)[source]¶
Parse acquisition conditions.
Parse the metadata that is saved at specific places within the .emi tag structure into a consistent place in the metadata dictionary returned by get_ser_metadata(). Specifically looks at the “AcquireInfo” node of the metadata structure.
- Parameters
metadata (dict) – A metadata dictionary as returned by get_ser_metadata()
- Returns
metadata – The same metadata dictionary with some values added under the root-level nx_meta key
- Return type
- nexusLIMS.extractors.fei_emi.parse_basic_info(metadata, shape, instrument)[source]¶
Parse basic metadata from file.
Parse the metadata that is saved at specific places within the .emi tag structure into a consistent place in the metadata dictionary returned by get_ser_metadata(). Specifically, this method handles the creation date, equipment manufacturer, and data shape/type.
- Parameters
metadata (dict) – A metadata dictionary as returned by get_ser_metadata()
shape – The shape of the dataset
instrument (Instrument) – The instrument this file was collected on
- Returns
metadata – The same metadata dictionary with some values added under the root-level nx_meta key
- Return type
- nexusLIMS.extractors.fei_emi.parse_data_type(s, metadata)[source]¶
Parse the data type from the signal’s metadata.
Determine “Data Type” and “DatasetType” for the given .ser file based off of metadata and signal characteristics. This method is used to determine whether the image is TEM or STEM, Image or Diffraction, Spectrum or Spectrum Image, etc.
Due to lack of appropriate metadata written by the FEI software, a heuristic of axis limits and size is used to determine whether a spectrum’s data type is EELS or EDS. This may not be a perfect determination.
- Parameters
s (hyperspy.signal.BaseSignal (or subclass)) – The HyperSpy signal that contains the data of interest
metadata (dict) – A metadata dictionary as returned by get_ser_metadata()
- Returns
data_type (str) – The string that should be stored at metadata[‘nx_meta’][‘Data Type’]
dataset_type (str) – The string that should be stored at metadata[‘nx_meta’][‘DatasetType’]
- nexusLIMS.extractors.fei_emi.parse_experimental_conditions(metadata)[source]¶
Parse experimental conditions.
Parse the metadata that is saved at specific places within the .emi tag structure into a consistent place in the metadata dictionary returned by get_ser_metadata(). Specifically looks at the “ExperimentalConditions” node of the metadata structure.
- Parameters
metadata (dict) – A metadata dictionary as returned by get_ser_metadata()
- Returns
metadata – The same metadata dictionary with some values added under the root-level nx_meta key
- Return type
- nexusLIMS.extractors.fei_emi.parse_experimental_description(metadata)[source]¶
Parse experimental description.
Parse the metadata that is saved at specific places within the .emi tag structure into a consistent place in the metadata dictionary returned by get_ser_metadata(). Specifically looks at the “ExperimentalDescription” node of the metadata structure.
- Parameters
metadata (dict) – A metadata dictionary as returned by get_ser_metadata()
- Returns
metadata – The same metadata dictionary with some values added under the root-level nx_meta key
- Return type
Notes
The terms to extract in this section were
- nexusLIMS.extractors.fei_emi.split_fei_metadata_units(metadata_term)[source]¶
Split metadata into value and units.
If present, separate a metadata term into its value and units. In the FEI metadata structure, units are appended to the end of the term, separated by an underscore, i.e. High tension_kV indicates that the High tension metadata value has units of kV.
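The split described above might look like the following sketch. The function name and the return shape (a tuple of term and units, with None when no underscore is present) are assumptions, since the return value is not documented here; the sketch also treats whatever follows the last underscore as units.

```python
def split_units_sketch(metadata_term):
    """Split "Name_units" at the last underscore into (name, units)."""
    value, sep, units = metadata_term.rpartition("_")
    if not sep:
        # No underscore found, so the term carries no units
        return metadata_term, None
    return value, units


print(split_units_sketch("High tension_kV"))
print(split_units_sketch("Mode"))
```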
nexusLIMS.extractors.quanta_tif module¶
Parse metadata from FEI tif images (saved by FEI/Thermo Fisher FIBs and SEMs).
- nexusLIMS.extractors.quanta_tif.get_quanta_metadata(filename: Path)[source]¶
Get metadata from a Quanta-style tif file.
Returns the metadata (as a dictionary) from a .tif file saved by the FEI Quanta SEM in the Nexus Microscopy Facility. Specific tags of interest are duplicated under the root-level nx_meta node in the dictionary.
- Parameters
filename – path to a .tif file saved by the Quanta
- Returns
mdict – The metadata text extracted from the file
- Return type
- nexusLIMS.extractors.quanta_tif.parse_beam_info(mdict, beam_name)[source]¶
Parse the “Beam info” section of the metadata.
- Parameters
mdict (dict) – A metadata dictionary as returned by get_quanta_metadata()
beam_name (str) – The “beam name” read from the root-level Beam node of the metadata dictionary
- Returns
mdict – The same metadata dictionary with some values added under the root-level nx_meta key
- Return type
- nexusLIMS.extractors.quanta_tif.parse_det_info(mdict, det_name)[source]¶
Parse the “Detector info” section of the metadata.
Parses the Detector portion of the metadata dictionary from the Quanta to get values such as brightness, contrast, signal, etc.
- Parameters
mdict (dict) – A metadata dictionary as returned by get_quanta_metadata()
det_name (str) – The “detector name” read from the root-level Beam node of the metadata dictionary
- Returns
mdict – The same metadata dictionary with some values added under the root-level nx_meta key
- Return type
- nexusLIMS.extractors.quanta_tif.parse_image_info(mdict)[source]¶
Parse the “Image info” section of the metadata.
Parses the Image portion of the metadata dictionary from the Quanta to get values such as drift correction, image integration settings, etc.
- Parameters
mdict (dict) – A metadata dictionary as returned by get_quanta_metadata()
- Returns
mdict – The same metadata dictionary with some values added under the root-level nx_meta key
- Return type
- nexusLIMS.extractors.quanta_tif.parse_nx_meta(mdict)[source]¶
Parse metadata into NexusLIMS format.
Parse the “important” metadata that is saved at specific places within the Quanta tag structure into a consistent place in the metadata dictionary returned by get_quanta_metadata().
The metadata contained in the XML section (if present) is not parsed, since it appears to only contain duplicates or slightly renamed metadata values compared to the typical config-style section that is always present.
- Parameters
mdict (dict) – A metadata dictionary as returned by get_quanta_metadata()
- Returns
mdict – The same metadata dictionary with some values added under the root-level nx_meta key
- Return type
- nexusLIMS.extractors.quanta_tif.parse_scan_info(mdict, scan_name)[source]¶
Parse the “Scan info” section of the metadata.
Parses the Scan portion of the metadata dictionary (on a Quanta this is always “EScan”) to get values such as dwell time, field width, and pixel size.
- Parameters
mdict (dict) – A metadata dictionary as returned by get_quanta_metadata()
scan_name (str) – The “scan name” read from the root-level Beam node of the metadata dictionary
- Returns
mdict – The same metadata dictionary with some values added under the root-level nx_meta key
- Return type
- nexusLIMS.extractors.quanta_tif.parse_system_info(mdict)[source]¶
Parse the “System info” section of the metadata.
Parses the System portion of the metadata dictionary from the Quanta to get values such as software version, chamber config, etc.
- Parameters
mdict (dict) – A metadata dictionary as returned by get_quanta_metadata()
- Returns
mdict – The same metadata dictionary with some values added under the root-level nx_meta key
- Return type
nexusLIMS.extractors.thumbnail_generator module¶
Generate preview images from various data files.
Data files are represented as either HyperSpy Signals, or as raw data files (in the case of tiff images)
- nexusLIMS.extractors.thumbnail_generator.add_annotation_markers(s)[source]¶
Add annotation markers from a DM3/DM4 file to a HyperSpy signal.
Read annotations from a signal originating from DigitalMicrograph and convert the ones (that we can) into Hyperspy markers for plotting. Adapted from a currently (at the time of writing) open pull request in HyperSpy.
- Parameters
s (hyperspy.signal.BaseSignal (or subclass)) – The HyperSpy signal for which a thumbnail should be generated
- nexusLIMS.extractors.thumbnail_generator.down_sample_image(fname: Path, out_path: Path, output_size: Optional[Tuple[int, int]] = None, factor: Optional[int] = None)[source]¶
Load an image file from disk, down-sample it to the requested size, and save.
Sometimes the data doesn’t need to be loaded as a HyperSpy signal, and it’s better just to down-sample existing image data (such as for .tif files created by the Quanta SEM).
- Parameters
fname – The filepath that will be resized. All formats supported by PIL.Image.open() can be used
out_path – A path to the desired thumbnail filename. All formats supported by PIL.Image.Image.save() can be used.
output_size – A tuple of ints specifying the width and height of the output image. Either this argument or factor should be provided (not both).
factor – The multiple of the image size to reduce by (i.e. a value of 2 results in an image that is 50% of each original dimension). Either this argument or output_size should be provided (not both).
- nexusLIMS.extractors.thumbnail_generator.image_to_square_thumbnail(f: Path, out_path: Path, output_size: int) bool [source]¶
Generate a preview thumbnail from a non-data image file.
Images of common filetypes will be transformed into 500 x 500 pixel images by first scaling the largest dimension to 500 pixels and then padding the resulting image to square.
- Parameters
f – The string of the path of an image file for which a thumbnail should be generated.
out_path – A path to the desired thumbnail filename. All formats supported by save() can be used.
output_size – The desired resulting size of the thumbnail image.
- Returns
Whether a preview was generated
- Return type
bool
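The scale-then-pad step described for this function comes down to a little arithmetic, sketched below without any imaging library. The helper name is hypothetical, and centering the scaled image inside the square is an assumption; the description only says the image is padded to square.

```python
def square_thumbnail_geometry(width, height, output_size=500):
    """Return the scaled (width, height) and the paste offset inside the square."""
    # Scale so that the largest dimension becomes ``output_size`` pixels
    scale = output_size / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    # Assumed: split the padding evenly so the image sits in the center
    offset = ((output_size - new_w) // 2, (output_size - new_h) // 2)
    return (new_w, new_h), offset


size, offset = square_thumbnail_geometry(2048, 1024)
print(size, offset)
```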
- nexusLIMS.extractors.thumbnail_generator.sig_to_thumbnail(s, out_path: Path, dpi: int = 92)[source]¶
Generate a preview thumbnail from an arbitrary HyperSpy signal.
For a 2D signal, the signal from the first navigation position is used (most likely the top- and left-most position). For a 1D signal (i.e. a spectrum or spectrum image), the output depends on the number of navigation dimensions:
0: Image of spectrum
1: Image of linescan (a la DigitalMicrograph)
2: Image of spectra sampled from navigation space
2+: As for 2 dimensions
- Parameters
s (hyperspy.signal.BaseSignal (or subclass)) – The HyperSpy signal for which a thumbnail should be generated
out_path – A path to the desired thumbnail filename. All formats supported by savefig() can be used.
dpi (int) – The “dots per inch” resolution for the outputted figure
- Returns
f – Handle to a matplotlib Figure
- Return type
Notes
This method heavily utilizes HyperSpy’s existing plotting functions to figure out how to best display the image
- nexusLIMS.extractors.thumbnail_generator.text_to_thumbnail(f: Path, out_path: Path, output_size: int = 500) Union[Figure, bool] [source]¶
Generate a preview thumbnail from a text file.
For a text file, the contents will be formatted and written to a 500x500 pixel jpg image of size 5 in by 5 in.
If the text file has many newlines, it is probably data and the first 42 characters of each of the first 20 lines of the text file will be written to the image.
If the text file has a few (or fewer) newlines, it is probably a manually generated note and the text will be written to a 42 column, 18 row box until the space is exhausted.
- Parameters
- Returns
f – Handle to a matplotlib Figure, or the value False if a preview could not be generated
- Return type
Union[Figure, bool]
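The two text-layout rules described for this function can be sketched as below. Only the line selection is shown, not the rendering to an image. The function name is hypothetical, and so is newline_threshold: the description does not say how many newlines count as "many", so the value 10 is purely illustrative.

```python
import textwrap


def preview_text_sketch(text, newline_threshold=10):
    """Pick the lines of ``text`` that would be drawn on the thumbnail."""
    lines = text.splitlines()
    if len(lines) > newline_threshold:
        # Probably data: first 42 characters of each of the first 20 lines
        return [line[:42] for line in lines[:20]]
    # Probably a note: wrap into a 42-column box and keep at most 18 rows
    return textwrap.wrap(text, width=42)[:18]


data_like = "\n".join("x" * 50 + str(i) for i in range(30))
note_like = "word " * 200

data_preview = preview_text_sketch(data_like)
note_preview = preview_text_sketch(note_like)
print(len(data_preview), len(note_preview))
```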
nexusLIMS.extractors.utils module¶
Methods (primarily intended to be private) that are used by the other extractors.