Extractors#

NexusLIMS extracts metadata from various electron microscopy file formats to create comprehensive experimental records. This page documents the supported file types, extraction capabilities, and level of support for each format.

Quick Reference#

Instrument/Software	Extension	Support	Data Types	Key Features
Gatan DigitalMicrograph	.dm3, .dm4	✅ Full	TEM/STEM Imaging, EELS, EDS, Diffraction, Spectrum Imaging	Comprehensive metadata, instrument-specific parsers, automatic type detection
FEI/Thermo Fisher SEM/FIB	.tif	✅ Full	SEM Imaging	Beam settings, stage position, vacuum conditions, detector config
Zeiss Orion HIM / Fibics HIM	.tif	✅ Full	SEM/HIM Imaging	Helium ion beam settings, stage position, detector configuration, image metadata
Tescan (P)FIB/SEM	.tif	✅ Full	SEM Imaging	High-voltage settings, stage position, detector gain/offset, scan parameters, stigmator values
FEI TIA Software	.ser, .emi	✅ Full	TEM/STEM Imaging, Diffraction, EELS/EDS Spectra & SI	Multi-file support, experimental conditions, acquisition parameters
EDAX (Genesis, TEAM)	.spc	✅ Full	EDS Spectrum	Detector angles, energy calibration, element identification
EDAX & others (standard)	.msa	✅ Full	EDS Spectrum	EMSA/MAS standard format, vendor extensions supported
Tofwerk fibTOF pFIB-ToF-SIMS	.h5	✅ Full	pFIB-ToF-SIMS Spectrum Image	FIB parameters, mass range, TIC map, ion depth profiles, composite preview
Various (exported images)	.png, .jpg, .tiff, .bmp, .gif	⚠️ Preview	Unknown	Basic metadata, square thumbnail generation
Various (logs, notes)	.txt	⚠️ Preview	Unknown	Basic metadata, text-to-image preview
Unknown Files	others	❌ Minimal	Unknown	Timestamp only, placeholder preview

Legend: ✅ Full = Comprehensive metadata extraction
⚠️ Preview = Basic metadata + custom preview
❌ Minimal = Timestamp only

Overview#

The extraction system uses a plugin-based architecture for flexible and extensible metadata extraction. The system consists of three main components:

Extractor Plugins: Parse comprehensive metadata from supported file formats
Preview Generator Plugins: Create thumbnail images for visualization
Extractor Registry: Manages plugin discovery, selection, and execution

Plugin Architecture#

The plugin system provides:

Auto-discovery: Plugins are automatically discovered from nexusLIMS/extractors/plugins/
Priority-based selection: Multiple extractors can support the same file type, with higher priority extractors preferred
Content sniffing: Extractors can examine file contents beyond just extensions
Multi-signal support: Files containing multiple datasets (e.g., DM3/DM4) are automatically expanded
Defensive design: All extractors implement robust error handling with graceful fallbacks

Extraction runs automatically during record building. Each file is processed by the best available extractor, and both the metadata (saved as JSON) and the preview images (saved as PNG thumbnails) are written to a directory tree that parallels the original data files.

Fully Supported Formats#

These formats have dedicated extractors that parse comprehensive metadata specific to their structure.

Digital Micrograph Files (.dm3, .dm4)#

Support Level: ✅ Full

Description: Files saved by Gatan’s DigitalMicrograph (GMS) software, commonly used for TEM/STEM imaging, EELS, and EDS data.

Extractor Module: nexusLIMS.extractors.plugins.digital_micrograph

Key Metadata Extracted:

Microscope information (voltage, magnification, mode, illumination mode)
Stage position (X, Y, Z, α, β coordinates)
Acquisition device and camera settings (binning, exposure time)
Image processing settings
EELS spectrometer settings (if applicable)
- Acquisition parameters (exposure, integration time, number of frames)
- Experimental conditions (collection/convergence angles)
- Spectrometer configuration (dispersion, energy loss, slit settings)
- Processing information
EDS detector information (if applicable)
- Acquisition settings (dwell time, dispersion, energy range)
- Detector configuration (angles, window type, solid angle)
- Live/real time and count rates
Spectrum imaging parameters (if applicable)
- Pixel time, scan mode, spatial sampling
- Drift correction settings
- Acquisition duration

Instrument-Specific Parsing:

The extractor includes specialized parsers for specific instruments:

FEI Titan STEM (FEI-Titan-STEM): Custom handling for EFTEM diffraction mode detection
FEI Titan TEM (FEI-Titan-TEM): Parses Tecnai metadata tags including gun settings, lens strengths, apertures, filter settings, and stage positions
JEOL JEM 3010 (JEOL-JEM-TEM): Basic parsing with filename-based diffraction pattern detection

Data Types Detected:

TEM/STEM Imaging
TEM/STEM Diffraction
TEM/STEM EELS (Spectrum)
TEM/STEM EDS (Spectrum)
EELS/EDS Spectrum Imaging

Notes:

Automatically detects dataset type based on metadata (Image, Spectrum, SpectrumImage, Diffraction)
For stacked images, metadata is extracted from the first plane
Session info (Operator, Specimen, Detector) may be unreliable and is flagged in warnings

FEI/Thermo Fisher TIF Files (.tif)#

Support Level: ✅ Full

Description: TIFF images saved by FEI/Thermo Fisher FIB and SEM instruments (Quanta, Helios, etc.) with embedded metadata.

Extractor Module: nexusLIMS.extractors.plugins.fei_tif

Key Metadata Extracted:

Beam settings (voltage, emission current, spot size, field widths, working distance)
Beam positioning (beam shift, tilt, scan rotation)
Stage position (X, Y, Z, R, α, tilt angles)
Scan parameters (dwell time, frame time, pixel size, field of view)
Detector configuration (name, brightness, contrast, signal type, grid voltage)
System information (software version, chamber type, column type, vacuum pump)
Vacuum conditions (mode, chamber pressure)
Image settings (drift correction, frame integration, magnification mode)
Acquisition date and time
Specimen temperature (if available)
User/operator information

Special Features:

Handles both config-style and XML metadata sections
Supports MultiGIS gas injection system metadata
Converts SI values to display-friendly units (e.g., μm, μA)
Automatic detection and parsing of tilt correction settings

Data Types Detected:

SEM Imaging

Preview Generation:

Uses 2× downsampling for efficient thumbnail creation

Notes:

User/operator metadata is flagged as potentially unreliable (users may remain logged in)
Some instruments write duplicate metadata sections which are handled automatically
Works with both older config-style metadata and newer XML-based metadata

Zeiss Orion / Fibics HIM TIF Files (.tif)#

Support Level: ✅ Full

Description: TIFF images saved by Zeiss Orion and Fibics helium ion microscope (HIM) systems with embedded XML metadata in custom TIFF tags.

Extractor Module: nexusLIMS.extractors.plugins.orion_HIM_tif

File Format Details:

The extractor uses content sniffing to detect HIM TIFF files by checking for custom TIFF tags:

Zeiss Orion: TIFF tag 65000 contains <ImageTags> XML metadata
Fibics: TIFF tag 51023 contains <Fibics> XML metadata

This content-based detection allows proper identification even though the .tif extension is shared by multiple instruments.

Key Metadata Extracted:

Zeiss Orion Variant:

Helium ion beam settings (energy, current, spot size, aperture)
Beam positioning (shift, scan rotation)
Stage position (X, Y, Z, R, tilt)
Scan parameters (dwell time, frame time, pixel size, resolution)
Detector configuration (type, brightness, contrast)
System information (vacuum pressure, chamber type)
Image acquisition parameters (line averaging, digital gain, overlay settings)
Scan speed and resolution settings

Fibics Variant:

Helium ion beam parameters (energy, current, spot settings)
Stage coordinates and angles
Scan configuration (pixel dwell, acquisition time, resolution)
Detector settings and signal type
Image optimization parameters (contrast, brightness)
Data collection timestamps

Data Types Detected:

HIM Imaging

Special Features:

Content-based detection to differentiate from other TIFF formats (FEI, standard image files)
Priority 150 - checked before generic TIFF extractors to properly identify HIM files
Handles both Zeiss and Fibics XML metadata formats
Robust error handling for malformed or missing XML metadata
Supports .tif and .tiff extensions

Preview Generation:

Converts image to square thumbnail (500×500 px default)
Maintains aspect ratio with padding

Notes:

The extractor uses content sniffing rather than extension alone, ensuring correct identification even if .tif files from multiple instruments are present
If XML metadata is missing or corrupted, the extractor gracefully falls back to basic file information
Both Zeiss Orion and Fibics HIM variants store metadata as embedded XML, making extraction reliable across different software versions

Tescan PFIB/SEM TIF Files (.tif)#

Support Level: ✅ Full

Description: TIFF images saved by Tescan PFIB (Plasma Focused Ion Beam) and SEM instruments (e.g., AMBER X) with embedded INI-style metadata in custom TIFF tags or sidecar .hdr files.

Extractor Module: nexusLIMS.extractors.plugins.tescan_tif

File Format Details:

The extractor uses a three-tier strategy for metadata extraction:

Primary: Extracts metadata from embedded TIFF tag 50431 (custom Tescan metadata tag) containing INI-style metadata
Fallback: If embedded metadata fails or is incomplete, looks for a sidecar .hdr file with full metadata in INI format ([MAIN] and [SEM] sections)
Supplementary: Always extracts basic TIFF tags (tag 271 for Make, tag 305 for Software, tag 315 for Artist) to supplement or override other metadata

This multi-tier approach ensures complete metadata is available whether metadata is embedded in the TIFF or stored in a sidecar .hdr file.

Key Metadata Extracted:

From [MAIN] section:

Instrument identification (Device, Model, Serial Number)
User information (Operator name)
Acquisition timestamp (Date and Time)
Magnification
Software version

From [SEM] section:

Beam parameters (High Voltage, Spot Size, Emission Current)
Stage position (X, Y, Z coordinates and Rotation/Tilt angles)
Scan settings (Dwell time, Scan mode, Rotation)
Detector configuration (Name, Gain, Offset)
Vacuum conditions (Chamber pressure)
Stigmator values (X and Y corrections)
Gun type configuration
Working Distance
Session ID for traceability

Unit Conversions:

Magnification: Converted from raw values to kiloX (kX)
Voltages: Converted from millivolts to kilovolts (kV)
Distances: Converted from meters to millimeters (mm) or nanometers (nm) as appropriate
Currents: Converted from amperes to microamperes (μA)
Pressure: Converted to millipascals (mPa)
Pixel sizes: Calculated from image dimensions and field of view, converted to nanometers (nm)
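The following sketch shows how INI-style metadata with [MAIN] and [SEM] sections could be parsed and a few of the conversions above applied, using the standard-library configparser. The key names (Device, UserName, WD, EmissionCurrent, FieldOfView), the sample values, and the output labels are hypothetical and chosen for illustration only; they are not taken from the actual extractor or from a real Tescan file.

```python
import configparser

# Hypothetical sidecar content; real .hdr files contain many more keys.
SAMPLE_HDR = """\
[MAIN]
Device=EXAMPLE DEVICE
UserName=
[SEM]
WD=0.005
EmissionCurrent=0.00012
FieldOfView=0.0001024
"""


def parse_ini_metadata(text: str) -> dict:
    """Parse INI text into {section: {key: value}}, dropping empty values."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case exactly as written in the file
    parser.read_string(text)
    return {
        section: {k: v for k, v in parser[section].items() if v.strip()}
        for section in parser.sections()
    }


def convert_units(sem: dict, image_width_px: int) -> dict:
    """Convert a few SI values to display units (m -> mm, A -> uA, m -> nm)."""
    out = {}
    if "WD" in sem:
        out["Working Distance (mm)"] = float(sem["WD"]) * 1e3
    if "EmissionCurrent" in sem:
        out["Emission Current (uA)"] = float(sem["EmissionCurrent"]) * 1e6
    if "FieldOfView" in sem:
        # Pixel size = field of view divided by the image width in pixels
        out["Pixel Width (nm)"] = float(sem["FieldOfView"]) / image_width_px * 1e9
    return out


meta = parse_ini_metadata(SAMPLE_HDR)
converted = convert_units(meta["SEM"], image_width_px=1024)
print(meta["MAIN"])
print(converted)
```

Dropping empty values at parse time mirrors the empty-field exclusion described for this extractor: the blank UserName entry never reaches the output.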

Data Types Detected:

SEM Imaging

Special Features:

Priority 150 - Checked before generic TIFF extractors to properly identify Tescan files
Content-based detection via custom TIFF tags even if .hdr file is missing
Comprehensive stage position tracking (X, Y, Z, Rotation, Tilt)
Detector settings extraction (Gain and Offset values)
Automatic conversion of physical units to display-friendly formats
Empty field exclusion - Fields with empty values are not included in output
Session tracking with unique Session ID

Preview Generation:

Converts image to square thumbnail (500×500 px default)
Maintains aspect ratio with padding

Warnings:

The extractor flags the following fields as potentially unreliable:

Operator: May reflect a logged-in user rather than the actual operator who collected the data

Compatibility Notes:

Tescan AMBER X: Fully tested and verified
Other Tescan SEM/PFIB Instruments: Likely compatible due to consistent INI metadata format, but not yet tested
Both .tif and .tiff extensions are supported

Notes:

If .hdr file is present but cannot be read, the extractor falls back to embedded TIFF tag metadata
If both sidecar and embedded metadata are available, the sidecar is preferred (more reliable)
The extractor gracefully handles missing or incomplete metadata sections
Pixel size is calculated from magnification and field width when not directly available

FEI TIA Files (.ser, .emi)#

Support Level: ✅ Full

Description: Files saved by FEI’s TIA (Tecnai Imaging and Analysis) software. Data is stored in .ser files with accompanying .emi metadata files.

Extractor Module: nexusLIMS.extractors.plugins.fei_emi

File Relationship:

Each .emi file can reference multiple .ser data files (named as basename_1.ser, basename_2.ser, etc.)
Both files are required for complete metadata extraction
The extractor automatically locates the corresponding .emi file for a given .ser file
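The naming convention above (basename_1.ser, basename_2.ser, and so on, all pointing to basename.emi) can be sketched with a small path helper. This is an illustrative assumption about how the lookup might work, not the extractor's actual code; the function names emi_path_for and find_emi are invented for this example.

```python
import re
from pathlib import Path
from typing import Optional


def emi_path_for(ser_path: Path) -> Path:
    """Map basename_N.ser to basename.emi in the same directory."""
    stem = re.sub(r"_\d+$", "", ser_path.stem)  # strip the trailing _N index
    return ser_path.with_name(stem + ".emi")


def find_emi(ser_path: Path) -> Optional[Path]:
    """Return the matching .emi path if it exists on disk, else None."""
    candidate = emi_path_for(ser_path)
    return candidate if candidate.is_file() else None


print(emi_path_for(Path("session/scan_2.ser")))
```

Returning None when the .emi file is absent lets the caller fall back to the .ser file alone, which yields more limited metadata.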

Key Metadata Extracted:

Manufacturer and acquisition date
Microscope accelerating voltage and tilt settings
Acquisition mode and beam position
Camera settings (name, binning, dwell time, frame time)
Detector configuration (energy resolution, integration time)
Scan parameters (area, drift correction, spectra count)
Experimental conditions from TIA software

Data Types Detected:

TEM/STEM Imaging
TEM/STEM Diffraction
EELS/EDS Spectrum and Spectrum Imaging

Type Detection Logic:

Uses Mode metadata field (if present) to distinguish TEM/STEM and Image/Diffraction
Signal dimension determines Image vs. Spectrum
Navigation dimension presence indicates Spectrum Imaging
Heuristic analysis of axis values used to distinguish EELS vs. EDS when not explicitly labeled
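A simplified sketch of the first three rules is shown below. The function name, its arguments, and the way the Mode string is matched are assumptions made for illustration; the real extractor works from the loaded signal and its metadata, and the EELS vs. EDS heuristic is not reproduced here.

```python
from typing import Optional, Tuple


def classify_tia_signal(
    signal_dim: int, nav_dim: int, mode: Optional[str] = None
) -> Tuple[Optional[str], str]:
    """Return (modality, dataset_type); modality is None when Mode is absent."""
    upper = (mode or "").upper()
    # Check STEM before TEM because "STEM" contains the substring "TEM"
    if "STEM" in upper:
        modality = "STEM"
    elif "TEM" in upper:
        modality = "TEM"
    else:
        modality = None

    if signal_dim == 1:
        # A 1D signal is a spectrum; navigation axes make it a spectrum image
        dataset_type = "SpectrumImage" if nav_dim > 0 else "Spectrum"
    elif "DIFFRACTION" in upper:
        dataset_type = "Diffraction"
    else:
        dataset_type = "Image"
    return modality, dataset_type


print(classify_tia_signal(2, 0, "STEM Imaging"))
print(classify_tia_signal(1, 2, "STEM"))
```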

Notes:

If .emi file is missing, extractor falls back to .ser file only (limited metadata)
Multiple signals in one .emi file are handled; metadata is extracted from the appropriate index
Later signals in a multi-file series may have less metadata than the first

EDAX EDS Files (.spc, .msa)#

Support Level: ✅ Full

Description: EDS spectrum files saved by EDAX software (Genesis, TEAM, etc.) in proprietary (.spc) or standard EMSA (.msa) format.

Extractor Module: nexusLIMS.extractors.plugins.edax

.spc Files#

Key Metadata Extracted:

Azimuthal and elevation angles
Live time
Detector energy resolution
Accelerating voltage
Channel size and energy range
Number of spectrum channels
Stage tilt
Identified elements

Data Types Detected:

EDS Spectrum

.msa Files#

Description: MSA (EMSA/MAS) format is a standard spectral data format. See the Microscopy Society of America specification.

Key Metadata Extracted:

All standard MSA fields (version, format, data dimensions)
EDAX-specific extensions (angles, times, resolutions)
Analyzer and detector configuration
User-selected elements
Amplifier settings
FPGA version
Originating file information
Comments and title

Data Types Detected:

EDS Spectrum

Notes:

.msa files are vendor-agnostic and may be exported from various EDS software
EDAX adds custom fields beyond the MSA standard
Both formats are single-spectrum only (not spectrum images)

Tofwerk fibTOF pFIB-ToF-SIMS Files (.h5)#

Support Level: ✅ Full

Description: HDF5 files produced by the Tofwerk fibTOF time-of-flight secondary ion mass spectrometry (ToF-SIMS) system integrated with a Tescan plasma focused ion beam (pFIB). Two variants exist: raw files (acquired directly by TofDAQ, containing raw event lists) and pre-processed files (post-processed in Tofwerk software, containing integrated peak intensities).

Extractor Module: nexusLIMS.extractors.plugins.tofwerk_pfib

Preview Generator: nexusLIMS.extractors.plugins.preview_generators.tofwerk_pfib_preview

File Format Detection:

The extractor uses content sniffing to identify Tofwerk fibTOF files by checking for all of:

FullSpectra/SumSpectrum HDF5 dataset
FIBParams HDF5 group
FIBImages HDF5 group
TofDAQ Version root attribute

This ensures correct identification without relying on the .h5 extension alone (which is shared with other HDF5-based formats).

Two File Variants:

Raw (File Variant = "raw"): Contains FullSpectra/EventList (variable-length uint16 array of ion arrival times per pixel). No PeakData/PeakData dataset present. This is the file type written during acquisition.
Pre-processed (File Variant = "pre-processed"): Contains PeakData/PeakData (float32 array of integrated peak intensities, shape NbrWrites × NbrSegments × NbrX × NbrPeaks). Created by post-processing in the Tofwerk software. The raw EventList is not present.
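The detection criteria and the variant distinction above can be sketched as two small predicates. They are written against any object that supports path membership tests and exposes an attrs mapping, which is how an open h5py.File behaves; the FakeH5 class exists only so the example runs without h5py or a real data file. The function names are assumptions for this sketch and not part of the NexusLIMS API.

```python
from typing import Optional

REQUIRED_PATHS = ("FullSpectra/SumSpectrum", "FIBParams", "FIBImages")
REQUIRED_ROOT_ATTR = "TofDAQ Version"


def looks_like_fibtof(h5file) -> bool:
    """True only if every required dataset, group, and root attribute exists."""
    return (
        all(path in h5file for path in REQUIRED_PATHS)
        and REQUIRED_ROOT_ATTR in h5file.attrs
    )


def file_variant(h5file) -> Optional[str]:
    """Classify as 'pre-processed' or 'raw' from which dataset is present."""
    if "PeakData/PeakData" in h5file:
        return "pre-processed"
    if "FullSpectra/EventList" in h5file:
        return "raw"
    return None


class FakeH5:
    """Minimal stand-in for an open HDF5 file (demonstration only)."""

    def __init__(self, paths, attrs):
        self._paths = set(paths)
        self.attrs = dict(attrs)

    def __contains__(self, path):
        return path in self._paths


raw = FakeH5(
    [*REQUIRED_PATHS, "FullSpectra/EventList"], {"TofDAQ Version": "demo"}
)
other = FakeH5(["SomeOtherGroup"], {})
print(looks_like_fibtof(raw), file_variant(raw))
print(looks_like_fibtof(other), file_variant(other))
```

With h5py installed, the same functions could be called inside a `with h5py.File(path, "r") as f:` block.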

Key Metadata Extracted:

Acquisition creation time (from AcquisitionLog/Log[0]['timestring'], which includes timezone; falls back to HDF5 File Creation Time root attribute or file mtime)
FIB hardware vendor (e.g., Tescan)
Accelerating voltage (kV)
Beam current (A)
Field of view (mm, from FIBParams.ViewField)
Pixel size (µm/pixel, derived as ViewField_mm × 1e3 / NbrX)
Data dimensions (NbrWrites × NbrSegments × NbrX, corresponding to sputter depth × Y × X pixels)
Number of peaks in the peak table
Mass range minimum and maximum (Da, from FullSpectra/MassAxis)
Ion mode (positive or negative)
Chamber pressure (Pa, mean over all writes from FibParams/FibPressure/TwData)
Fiblys GUI version and TofDAQ DAQ version
File variant (raw vs. pre-processed)

All vendor-specific fields are stored in nx_meta["extensions"].

Data Types Detected:

pFIB-ToF-SIMS Spectrum Image (PFIB_TOFSIMS)

Preview Generation:

The preview generator produces a composite 1500×1500 px PNG with a layout that differs by file variant:

Raw file layout (2-row grid):

[ FIB SE image ]  [ TIC map ]     [ Depth profile ]
[     Sum mass spectrum (full width, 3 cols)       ]

Pre-processed file layout (2-row grid):

[ FIB SE image ]  [ TIC map ]     [ RGB composite (top 3 peaks) ]
[ Sum spectrum (2 cols) ]         [ Depth profiles (top 3 peaks) ]

FIB SE image: Secondary electron image from the first FIB scan in FIBImages/Image0000
TIC map: Total ion count map summed across all sputter writes; computed one write at a time to avoid loading the full ragged 4D event array into memory
Depth profile: Total ion signal vs. sputter write index
Sum mass spectrum: FullSpectra/SumSpectrum with ion species annotated (top-N peaks ≥ 2 Da apart, log y-scale, positive/negative ion tables)
RGB composite (pre-processed only): Top 3 peaks by total counts, displayed as R/G/B channels with percentile clipping

Notes:

The FIBParams.ViewField attribute is in millimeters (not meters); pixel size is derived as ViewField_mm × 1e3 / NbrX µm/pixel
Configuration File Contents contains ADC voltage range parameters (Ch*FullScale) which are not spatial dimensions and are not used for pixel size calculation
At harvest time, only the raw file is typically available; both variants are fully supported
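The pixel size derivation in the first note works out as follows. The input values are made up for the example (a 0.05 mm field of view scanned with 512 pixels per line), and the function name is an assumption.

```python
def pixel_size_um(view_field_mm: float, nbr_x: int) -> float:
    """Pixel size in µm/pixel from a field of view in mm and a pixel count."""
    if nbr_x <= 0:
        raise ValueError("NbrX must be a positive pixel count")
    return view_field_mm * 1e3 / nbr_x  # mm -> µm, then divide by pixels


# 0.05 mm is 50 µm; 50 µm over 512 pixels is roughly 0.098 µm per pixel
size = pixel_size_um(0.05, 512)
print(f"{size:.4f} µm/pixel")
```

Treating ViewField as meters instead of millimeters would inflate the result by a factor of 1000, which is why the unit is called out explicitly.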

Partially Supported Formats#

These formats receive basic metadata extraction and custom preview generation, but do not have dedicated metadata parsers.

Image Formats#

Support Level: ⚠️ Preview Only

Formats: .png, .tiff, .bmp, .gif, .jpg, .jpeg

Extractor Module: nexusLIMS.extractors.plugins.basic_metadata

Preview Generator: nexusLIMS.extractors.plugins.preview_generators.image_preview

Metadata Extracted:

File creation/modification time
Instrument ID (inferred from file path)

Preview Generation:

Converts image to square thumbnail (500×500 px default)
Maintains aspect ratio with padding
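The geometry behind a padded square thumbnail can be sketched without any imaging library: scale the longer side to the target size, then pad the shorter side. Centering the image within the padding is an assumption of this sketch, as is the function name; the actual preview generator may place or round things differently.

```python
from typing import Tuple


def fit_in_square(width: int, height: int, size: int = 500) -> Tuple[int, int, int, int]:
    """Return (new_width, new_height, x_offset, y_offset) for a size x size canvas."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = size / max(width, height)  # the longer side fills the canvas
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    # Split the leftover space evenly to center the scaled image
    return new_w, new_h, (size - new_w) // 2, (size - new_h) // 2


print(fit_in_square(1024, 512))
```

For a 1024×512 input, the image is scaled to 500×250 and placed 125 px from the top, leaving equal padding above and below.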

Notes:

These are typically auxiliary files (screenshots, exported images, etc.)
Marked as DatasetType: Unknown in records

Text Files (.txt)#

Support Level: ⚠️ Preview Only

Extractor Module: nexusLIMS.extractors.plugins.basic_metadata

Preview Generator: nexusLIMS.extractors.plugins.preview_generators.text_preview

Metadata Extracted:

File creation/modification time
Instrument ID (inferred from file path)

Preview Generation:

Renders first ~20 lines of text as image thumbnail
Uses monospace font for readability

Notes:

Common for log files, notes, and exported data
Marked as DatasetType: Unknown in records

Unsupported Formats#

Support Level: ❌ Minimal

Files with extensions not in the above lists receive minimal processing:

Metadata Extracted:

File creation/modification time only
Marked as DatasetType: Unknown and Data Type: Unknown

Preview Generation:

A placeholder image is used indicating extraction failed

Handling Strategy:

The system’s behavior for unsupported files depends on the NX_FILE_STRATEGY environment variable:

exclusive (default): Only files with full extractors are included in records
inclusive: All files are included, with basic metadata for unsupported types

How Extraction Works#

File Discovery and Strategy#

During record building, NexusLIMS finds files within the session time window using the configured strategy:

# Only include files with dedicated extractors
NX_FILE_STRATEGY=exclusive

# Include all files found
NX_FILE_STRATEGY=inclusive
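A minimal sketch of how the two strategies differ is shown below. It filters by extension only, which is a simplification: the real system also uses content sniffing, so a .tif without recognizable vendor metadata would not necessarily count as fully supported. The select_files function, the case-insensitive matching, and the extension set (copied from the Quick Reference table) are assumptions for illustration.

```python
import os
from pathlib import Path

# Extensions listed with full support in the Quick Reference table
FULL_EXTENSIONS = {".dm3", ".dm4", ".tif", ".ser", ".emi", ".spc", ".msa", ".h5"}


def select_files(paths, strategy=None):
    """Filter paths according to the exclusive/inclusive strategy."""
    strategy = (strategy or os.environ.get("NX_FILE_STRATEGY", "exclusive")).lower()
    if strategy not in {"exclusive", "inclusive"}:
        raise ValueError(f"unknown file strategy: {strategy!r}")
    if strategy == "inclusive":
        return list(paths)  # keep everything that was found
    return [p for p in paths if Path(p).suffix.lower() in FULL_EXTENSIONS]


found = ["image.dm3", "notes.txt", "export.png", "unknown.xyz"]
print(select_files(found, "exclusive"))
print(select_files(found, "inclusive"))
```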

Extraction Process#

For each discovered file:

Plugin Discovery: The extractor registry auto-discovers all available extractor plugins from nexusLIMS/extractors/plugins/
Extractor Selection:
- The registry uses priority-based selection with content sniffing support
- Extractors are tried in descending priority order (higher priority first)
- Each extractor’s supports() method is called to determine compatibility
- If no specialized extractor matches, the BasicFileInfoExtractor fallback is used
Metadata Parsing: The selected extractor reads the file and returns a list of metadata dictionaries:
- Each dictionary contains nx_meta with NexusLIMS-specific metadata (standardized keys)
- Additional keys contain format-specific “raw” metadata
- For multi-signal files, the list contains one entry per signal
- For single-signal files, the list contains one entry for consistency
Metadata Writing: JSON file(s) are written to a parallel path under NX_DATA_PATH
- Path: {NX_DATA_PATH}/{instrument}/{path/to/file}.json
- For multi-signal files: Multiple JSON files with _signalN suffixes
Preview Generation: Thumbnail PNG(s) are created
- Path: {NX_DATA_PATH}/{instrument}/{path/to/file}.thumb.png
- For multi-signal files: Multiple previews with _signalN.thumb.png suffixes
- Size: 500×500 px (default)
- Uses plugin-based preview generators with fallback to legacy methods
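The selection step can be pictured with the following sketch, which tries candidates in descending priority order and falls back to a catch-all when nothing matches. The SketchExtractor class, the pick_extractor function, the priority of 100 for the generic candidate, and the filename check standing in for content sniffing are all hypothetical; the real registry and extractor classes have their own interfaces.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterable


@dataclass
class SketchExtractor:
    """Hypothetical stand-in for an extractor plugin."""

    name: str
    priority: int
    supports: Callable[[Path], bool]


def pick_extractor(
    path: Path, extractors: Iterable[SketchExtractor], fallback: SketchExtractor
) -> SketchExtractor:
    """Return the highest-priority extractor whose supports() accepts the file."""
    for extractor in sorted(extractors, key=lambda e: e.priority, reverse=True):
        try:
            if extractor.supports(path):
                return extractor
        except Exception:
            continue  # assumption: a failing supports() is skipped, not fatal
    return fallback


# The name check below is a placeholder for inspecting TIFF tags
him = SketchExtractor(
    "him_tif", 150, lambda p: p.suffix == ".tif" and p.name.startswith("him")
)
generic = SketchExtractor("generic_tif", 100, lambda p: p.suffix == ".tif")
basic = SketchExtractor("basic_file_info", 0, lambda p: True)

for name in ("him_001.tif", "sem_001.tif", "notes.xyz"):
    print(name, "->", pick_extractor(Path(name), [generic, him], basic).name)
```

Sorting by priority before calling supports() is what lets a specialized extractor claim a .tif file ahead of a generic one, regardless of registration order.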

Expected Metadata Structure#

All extractors return a list of metadata dictionaries, with one entry per signal or dataset:

[
    {
        "nx_meta": {
            "Creation Time": "ISO 8601 timestamp with timezone",
            "Data Type": "Category_Modality_Technique",  # e.g., "STEM_EDS"
            "DatasetType": "Image|Spectrum|SpectrumImage|Diffraction|Misc|Unknown",
            "Data Dimensions": "(height, width)" or "(channels,)",  # Optional
            "Instrument ID": "instrument-name",  # Optional
            "warnings": ["field1", ["field2"]],  # Optional, list of field names flagged as unreliable
            # ... format-specific keys ...
        },
        # Additional format-specific metadata sections
        "ImageList": { ... },  # Example: DM3/DM4 files
        "ObjectInfo": { ... },  # Example: FEI .ser/.emi files
        # etc.
    },
    # For multi-signal files, additional entries follow the same structure
    {
        "nx_meta": { ... },
        # ... additional signal ...
    }
]

Return Format:

Single-signal files: Return a list with one element [{...}]
Multi-signal files: Return a list with multiple elements, one per signal [{...}, {...}]
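Because the return value is always a list, calling code can treat single-signal and multi-signal files the same way. The sketch below loops over a hand-written metadata list shaped like the structure above; the summarize function and the sample values are assumptions for illustration.

```python
def summarize(metadata_list):
    """Return (index, DatasetType, Data Type) for each signal in the list."""
    rows = []
    for index, entry in enumerate(metadata_list):
        nx = entry.get("nx_meta", {})
        rows.append(
            (index, nx.get("DatasetType", "Unknown"), nx.get("Data Type", "Unknown"))
        )
    return rows


# Hand-written stand-in for the output of a multi-signal file
example = [
    {"nx_meta": {"DatasetType": "Image", "Data Type": "STEM_Imaging"}},
    {"nx_meta": {"DatasetType": "Spectrum", "Data Type": "STEM_EDS"}},
]

for row in summarize(example):
    print(row)
```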

Validation Using Pydantic Models (New in v2.2.0 - Work in Progress):

The nx_meta section is validated against Pydantic schemas to improve data consistency and quality. Starting with v2.2.0, NexusLIMS provides basic metadata validation with EM Glossary integration and unit standardization. This validation system is still evolving as standardization efforts progress.

Schema Selection by DatasetType:

The validation schema is chosen automatically based on the DatasetType field:

DatasetType	Schema	Purpose	Type-Specific Fields
`Image`	`ImageMetadata`	TEM/SEM/STEM images	`magnification`, `pixel_size`, `working_distance`
`Spectrum`	`SpectrumMetadata`	EELS/EDS spectra	`channel_size`, `starting_energy`, `live_time`
`SpectrumImage`	`SpectrumImageMetadata`	Spectrum imaging	`pixel_time`, `magnification`, `channel_size`
`Diffraction`	`DiffractionMetadata`	Diffraction patterns	`camera_length`, `exposure_time`, `convergence_angle`
`Misc` or `Unknown`	`NexusMetadata`	Other data	Base fields only

Common Core Fields (All Schemas):

All schemas inherit these required fields from nexusLIMS.schemas.metadata.NexusMetadata:

creation_time (datetime with timezone): When the data was acquired
data_type (str): Classification string (e.g., "STEM_Imaging", "TEM_EDS")
dataset_type (DatasetType enum): One of Image, Spectrum, SpectrumImage, Diffraction, Misc, Unknown

Optional Core Fields:

data_dimensions (str): Shape of the data (e.g., "(1024, 1024)")
instrument_id (str): Identifier of the instrument used
warnings (list[str]): Field names flagged as potentially unreliable
extensions (dict): Additional vendor/instrument-specific metadata

EM Glossary Standardized Fields:

Type-specific schemas include fields standardized against the Electron Microscopy Glossary v2.0 where available. Note that the EM Glossary vocabulary is also still developing, so not all fields have complete standardized definitions yet:

Beam parameters: acceleration_voltage, beam_current, emission_current, convergence_angle
Stage/Sample: stage_x, stage_y, stage_z, tilt_alpha, tilt_beta
Detector: detector_type, detector_angle, detector_distance, detector_binning
Acquisition: dwell_time, acquisition_time, live_time, pixel_time
Optical: magnification, working_distance, camera_length, pixel_size_x, pixel_size_y
Spectrum: channel_size, starting_energy, takeoff_angle

These fields are defined using the emg_field() helper function, which automatically injects EM Glossary metadata (display names, definitions, URIs) for standardization and interoperability where EMG terms are available.

See EM Glossary Field Reference for the complete field mapping and Internal Metadata Schema System for details on the schema system.

Validation Example:

When an extractor returns metadata, NexusLIMS automatically validates it:

from datetime import datetime, timezone
from nexusLIMS.schemas.metadata import ImageMetadata
from pint import Quantity

# Valid ImageMetadata that passes validation
nx_meta = {
    "creation_time": datetime(2024, 1, 15, 10, 30, 0, tzinfo=timezone.utc),
    "data_type": "STEM_Imaging",
    "dataset_type": "Image",
    "data_dimensions": "(1024, 1024)",
    "instrument_id": "FEI-Titan-TEM-635816",
    "warnings": ["Operator"],  # Flag unreliable fields
    # EM Glossary standardized fields with Pint Quantities
    "acceleration_voltage": Quantity(200, "kV"),
    "magnification": Quantity(50000, "dimensionless"),
    "pixel_size_x": Quantity(2.5, "nm"),
    "working_distance": Quantity(5.2, "mm"),
    # Extensions for vendor-specific or non-standardized data
    "extensions": {
        "detector_brightness": 50.0,
        "scan_rotation": Quantity(45, "degree"),
    },
}
validated = ImageMetadata.model_validate(nx_meta)

# Invalid metadata raises ValidationError with details
from pydantic import ValidationError

bad_meta = {
    "creation_time": datetime(2024, 1, 15, 10, 30, 0),  # ❌ Missing timezone!
    "data_type": "STEM_Imaging",
    "dataset_type": "InvalidType",  # ❌ Not in allowed values!
}
try:
    ImageMetadata.model_validate(bad_meta)
except ValidationError as err:
    print(err)  # Lists each failing field and the reason

Unit Handling:

NexusLIMS automatically normalizes physical quantities to preferred units:

Voltages → kilovolts (kV)
Distances → nanometers (nm) or millimeters (mm) depending on scale
Times → seconds (s)
Energies → electron volts (eV)

You can provide quantities in any compatible unit, and validation will convert them automatically:

# These are all equivalent after validation:
{"acceleration_voltage": Quantity(200, "kV")}
{"acceleration_voltage": Quantity(200000, "V")}
{"acceleration_voltage": Quantity(0.2, "MV")}

See Internal Metadata Schema System for complete unit handling details.

The nx_meta section in each element contains standardized, human-readable metadata that is displayed in the experimental record. The additional sections contain the complete “raw” metadata tree for reference.

This consistent list-based approach combined with Pydantic validation ensures the Activity layer can automatically and safely expand multi-signal files into multiple datasets in the experimental record.

Extension Fields in Practice#

The extensions dictionary provides a flexible way to include vendor-specific or non-standardized metadata that doesn’t fit into the core schema fields. Here are real-world examples from NexusLIMS extractors:

FEI/Thermo Quanta TIF Extractor (nexusLIMS/extractors/plugins/fei_tif.py:428):

# Vendor-specific vacuum and detector settings
extensions = {
    "chamber_type": "LargeChamber",
    "column_type": "HighResolution",
    "vacuum_mode": "HighVacuum",
    "chamber_pressure": Quantity(5.2e-5, "Pa"),
    "detector_signal": "ETD",
    "detector_grid_voltage": Quantity(300, "V"),
    "drift_correction_enabled": True,
    "frame_integration": 4,
}

Digital Micrograph DM3/DM4 Extractor (nexusLIMS/extractors/plugins/digital_micrograph.py:337):

# EELS-specific detector and acquisition settings
extensions = {
    "eels_dispersion": Quantity(0.5, "eV/channel"),
    "eels_energy_loss": Quantity(150, "eV"),
    "eels_slit_width": Quantity(2.5, "eV"),
    "eels_collection_angle": Quantity(15, "mrad"),
    "spectrometer_name": "GIF Quantum 965",
    "drift_correction_enabled": True,
}

Tescan TIF Extractor (nexusLIMS/extractors/plugins/tescan_tif.py:285):

# Instrument-specific settings not covered by EM Glossary
extensions = {
    "stigmator_x": Quantity(12.5, "percent"),
    "stigmator_y": Quantity(-8.3, "percent"),
    "gun_type": "Schottky",
    "scan_mode": "FrameIntegration",
    "session_id": "abc123xyz789",
}

When to Use Extensions:

Use the extensions dictionary when:

Vendor-specific features: Settings unique to one manufacturer (e.g., FEI MultiGIS, Gatan GIF)
Non-standardized parameters: Values not yet in EM Glossary (e.g., drift correction flags)
Instrument-specific calibrations: Stigmator values, lens strengths, custom aperture settings
Session tracking: Unique IDs or timestamps for instrument logging
Software-specific metadata: Settings from acquisition software that don’t map to standard fields

Do NOT use extensions for:

Standard EM parameters covered by EM Glossary (voltage, magnification, stage position, etc.)
Required fields (creation_time, data_type, dataset_type)
Data that should be validated (use core fields for type safety)

See Helper Functions for the add_to_extensions() helper function and Writing Extractor Plugins for complete guidance on using extensions in your own extractors.

Command-Line Extraction#

The nexuslims extract command lets you extract metadata and generate preview thumbnails for individual files directly from the terminal, without writing any Python code and without a full NexusLIMS deployment.

Quick reference#

# Print extracted metadata as JSON
nexuslims extract /path/to/file.dm4

# Extract metadata only (skip preview generation)
nexuslims extract --no-preview /path/to/file.msa

# Generate a preview thumbnail only, saved to a specific path
nexuslims extract --no-metadata --preview-path /tmp/thumb.png /path/to/file.tif

# Also write the JSON sidecar to the NexusLIMS data directory
nexuslims extract --write /path/to/file.dm4

# Overwrite existing outputs
nexuslims extract --write --overwrite /path/to/file.dm4

# Pipe metadata into jq
nexuslims extract /path/to/file.dm4 | jq '.nx_meta.DatasetType'

By default:

Metadata is printed to stdout as pretty-printed JSON.
A preview thumbnail is generated and written alongside the input file as <filename>.thumb.png; its path is printed to stderr.
The metadata JSON is not written to disk unless --write / -w is given. With --write, the JSON is written alongside the input file as <filename>.json (or to the corresponding path under NX_DATA_PATH if the file is under NX_INSTRUMENT_DATA_PATH).
Use -p / --preview-path to control where the thumbnail is saved.

For full option documentation see nexuslims extract in the CLI reference.

Standalone Library Usage#

The NexusLIMS extractor system can be used as a standalone Python library without a full NexusLIMS deployment. No .env file, database, NEMO instance, or CDCS server is required. This is useful for:

Quickly inspecting microscopy file metadata in a Jupyter notebook
Batch-processing files on a workstation that isn’t connected to a NexusLIMS deployment
Prototyping new analysis workflows that read instrument metadata

Installation#

pip install nexusLIMS
# or, if using uv:
uv add nexusLIMS

No environment variables need to be set for basic extraction.

Quickstart#

from pathlib import Path
from nexusLIMS.extractors import parse_metadata

# Works with no .env, no database, no NEMO / CDCS
metadata_list, _ = parse_metadata(
    Path("my_data.dm3"),
    write_output=False,     # don't try to write a JSON sidecar
    generate_preview=False, # don't try to generate a thumbnail
)

nx = metadata_list[0]["nx_meta"]
print(nx["Creation Time"])       # ISO-8601 timestamp
print(nx["DatasetType"])         # "Image", "Spectrum", etc.
print(nx["Data Type"])           # e.g., "STEM_Imaging"
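
For batch processing, the same call can be wrapped in a small loop. The sketch below is illustrative only: summarize_files is a hypothetical helper, and the parser is passed in as an argument so the function can be exercised without NexusLIMS installed. It assumes each entry in the returned list carries an nx_meta dictionary with the keys used in the Quickstart.

```python
from pathlib import Path


def summarize_files(paths, parse):
    """Return one (file name, DatasetType, Data Type) row per extracted signal.

    `parse` is expected to behave like nexusLIMS.extractors.parse_metadata:
    it returns a (metadata_list, second_value) tuple, of which only the
    first element is used here.
    """
    rows = []
    for path in paths:
        path = Path(path)
        metadata_list, _ = parse(
            path,
            write_output=False,      # no JSON sidecar
            generate_preview=False,  # no thumbnail
        )
        for entry in metadata_list:
            nx = entry["nx_meta"]
            rows.append((path.name, nx.get("DatasetType"), nx.get("Data Type")))
    return rows


# Typical use, with NexusLIMS installed:
#   from nexusLIMS.extractors import parse_metadata
#   rows = summarize_files(Path("data").glob("*.dm3"), parse_metadata)
```

A file that yields several signals contributes one row per signal, which mirrors the list returned by parse_metadata.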

Default behaviour when called with `write_output=True`#

parse_metadata defaults to write_output=True and generate_preview=True so that the full record-building pipeline works without any code changes. When NexusLIMS configuration is not available, those two steps are skipped and a warning is logged, for example:

WARNING nexusLIMS.extractors: NexusLIMS config unavailable; skipping metadata
file write (pass write_output=False to suppress this warning)

Passing write_output=False, generate_preview=False suppresses the warnings and makes the intent explicit.
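
If the call site cannot be changed (for example, when parse_metadata is invoked by third-party code), the warnings can instead be filtered with the standard logging module. This is a sketch under one assumption: that the logger is named nexusLIMS.extractors, as the sample warning line above suggests. The context manager quiet_logger is a hypothetical helper, not part of NexusLIMS.

```python
import logging
from contextlib import contextmanager


@contextmanager
def quiet_logger(name="nexusLIMS.extractors", level=logging.ERROR):
    """Temporarily raise a logger's threshold and restore it afterwards."""
    logger = logging.getLogger(name)
    previous = logger.level
    logger.setLevel(level)  # WARNING records are dropped while active
    try:
        yield logger
    finally:
        logger.setLevel(previous)


# Usage:
#   with quiet_logger():
#       metadata_list, _ = parse_metadata(path)
```

Errors are still reported while the context manager is active; only records below the chosen level are dropped.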

Low-level API#

You can also use the extractor registry directly for more control:

from pathlib import Path
from nexusLIMS.extractors.base import ExtractionContext
from nexusLIMS.extractors.registry import get_registry

path = Path("my_data.dm3")
context = ExtractionContext(file_path=path, instrument=None)

registry = get_registry()
extractor = registry.get_extractor(context)

print(f"Selected extractor: {extractor.name}")

metadata_list = extractor.extract(context)
for i, m in enumerate(metadata_list):
    print(f"Signal {i}: {m['nx_meta']['DatasetType']}")

What degrades gracefully#

Feature	Without config	With config
Metadata extraction	✅ Full	✅ Full
Schema validation	✅ Full	✅ Full
Instrument identification from DB	⚠️ Returns `None`	✅ Looks up DB
JSON sidecar write (`write_output`)	⚠️ Skipped + warning	✅ Written
Preview generation (`generate_preview`)	⚠️ Skipped + warning	✅ Generated
Local instrument profiles	⚠️ Skipped (built-in profiles active)	✅ Loaded

Supported formats#

All formats listed in the Quick Reference table at the top of this page work in standalone mode. The extractor registry auto-discovers plugins on first use with no configuration needed.

Example notebook#

See Standalone Extractor Usage for a complete, runnable Jupyter notebook that downloads real microscopy test files from the project repository and walks through both the high-level and low-level APIs, including the graceful-degradation behaviour when no config is present.

Adding Support for New Formats#

See Writing Extractor Plugins for instructions on how to write a new extractor.

API Reference#

Extractor Registry Properties#

The nexusLIMS.extractors.registry.ExtractorRegistry class provides convenient properties for querying registered extractors:

extractors Property : Returns a dictionary mapping file extensions to lists of extractor classes, sorted by priority (descending). This property automatically triggers plugin discovery if not already performed.

from nexusLIMS.extractors.registry import get_registry

registry = get_registry()
extractors_by_ext = registry.extractors
# Returns: {
#   'dm3': [<class 'digital_micrograph.DM3Extractor'>],
#   'dm4': [<class 'digital_micrograph.DM3Extractor'>], 
#   'msa': [<class 'edax.MsaExtractor'>], 
#   'spc': [<class 'edax.SpcExtractor'>], 
#   ... 
# }

extractor_names Property : Returns a deduplicated, alphabetically-sorted list of all registered extractor class names. Includes both extension-specific and wildcard extractors. This property also triggers auto-discovery if needed.

registry = get_registry()
names = registry.extractor_names
# Returns: ["BasicFileInfoExtractor", "DM3Extractor", ..., "TescanTiffExtractor"]
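
The two properties can be combined into a per-extractor view of supported extensions. The sketch below inverts a mapping shaped like the one returned by registry.extractors; extensions_by_extractor is a hypothetical helper, and stand-in classes are used so the example runs without NexusLIMS installed.

```python
from collections import defaultdict


def extensions_by_extractor(extractors_by_ext):
    """Invert {extension: [classes]} into {class name: sorted extensions}."""
    inverted = defaultdict(set)
    for ext, classes in extractors_by_ext.items():
        for cls in classes:
            inverted[cls.__name__].add(ext)
    return {name: sorted(exts) for name, exts in sorted(inverted.items())}


# Stand-in classes for illustration only (not the real extractor classes)
class DM3Extractor:
    pass


class MsaExtractor:
    pass


demo = {"dm3": [DM3Extractor], "dm4": [DM3Extractor], "msa": [MsaExtractor]}
print(extensions_by_extractor(demo))
# {'DM3Extractor': ['dm3', 'dm4'], 'MsaExtractor': ['msa']}
```

With a real registry, the call would be extensions_by_extractor(get_registry().extractors).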

Extractor Modules#

For complete API documentation of the extractor modules, see:

nexusLIMS.extractors - Main extractor module
nexusLIMS.extractors.registry - Extractor registry and auto-discovery
nexusLIMS.extractors.plugins.digital_micrograph - DM3/DM4 file extractor
nexusLIMS.extractors.plugins.fei_tif - FEI/Thermo TIF file extractor
nexusLIMS.extractors.plugins.orion_HIM_tif - Zeiss Orion / Fibics HIM TIF file extractor
nexusLIMS.extractors.plugins.tescan_tif - Tescan PFIB/SEM TIF file extractor
nexusLIMS.extractors.plugins.fei_emi - FEI TIA .ser/.emi file extractor
nexusLIMS.extractors.plugins.edax - EDAX .spc/.msa file extractor
nexusLIMS.extractors.plugins.tofwerk_pfib - Tofwerk fibTOF pFIB-ToF-SIMS .h5 file extractor
nexusLIMS.extractors.plugins.basic_metadata - Basic metadata fallback extractor
nexusLIMS.extractors.plugins.preview_generators - Preview image generation utilities
