Internal Metadata Schema System#
Added in version 2.2.0: This document describes the metadata schema system introduced in NexusLIMS v2.2.0, which replaces the unstructured v1 metadata handling approach with a system featuring type safety, physical units, and standardized terminology.
Overview#
The schema system provides:
Type-specific validation - Different schemas for Image, Spectrum, SpectrumImage, and Diffraction datasets
Physical units with Pint - Machine-actionable quantities with automatic unit conversion
EM Glossary alignment - Standardized field names from the Electron Microscopy community
Clean XML serialization - Separate value and unit attributes in XML output
Flexible extensions - Support for instrument-specific metadata without polluting core fields
Key Changes from NexusLIMS version 1.X metadata handling#
1. Pint Quantities for Physical Values#
Before (v1):
nx_meta = {
"Voltage (kV)": 15, # Unit embedded in field name
"Working Distance (mm)": 5.2,
}
After (v2.2.0):
from decimal import Decimal
from nexusLIMS.schemas.units import ureg
nx_meta = {
# Pint Quantities
"acceleration_voltage": ureg.Quantity(Decimal("15"), "kilovolt"),
"working_distance": ureg.Quantity(Decimal("5.2"), "millimeter"),
}
Benefits:
Type safety: Invalid units caught at creation time
Automatic conversion: Units normalized during serialization
Machine-readable: Units separated from values in XML
Programmatic: Can convert between units in code
2. Centralized Field Names with EM Glossary links#
Field names now follow the Electron Microscopy Glossary (EM Glossary) for standardization:
Old Field Name (v1) |
New Field Name (v2.2.0) |
EM Glossary ID |
|---|---|---|
|
|
EMG_00000004 |
|
|
EMG_00000050 |
|
|
EMG_00000006 |
|
|
EMG_00000015 |
|
|
EMG_00000008 |
See EM Glossary Reference for complete mappings.
3. Type-Specific Schemas#
Metadata is validated against dataset-type-specific schemas:
ImageMetadata- For SEM/TEM/STEM imagesSpectrumMetadata- For EDS/EELS/CL spectraSpectrumImageMetadata- For spectrum imaging (STEM-EDS, STEM-EELS)DiffractionMetadata- For electron diffraction patterns
Each schema enforces appropriate fields for its dataset type while allowing additional fields via the extensions section.
4. Extensions Section#
Instrument-specific or vendor-specific metadata goes in the extensions section:
nx_meta = {
# Core fields (EM Glossary names, type-validated)
"acceleration_voltage": ureg.Quantity(Decimal("15"), "kilovolt"),
"working_distance": ureg.Quantity(5.2, "millimeter"),
# Extensions (flexible, instrument-specific)
"extensions": {
"facility": "Nexus Microscopy Lab",
"detector_brightness": 50.0,
"quanta_spot_size": 3.5,
}
}
Extensions Best Practices:
Core vs. Extensions Decision Guide:
✅ Use core fields for: Standard EM parameters with EM Glossary equivalents
✅ Use extensions for: Vendor-specific settings, facility metadata, experimental parameters
Naming Conventions:
Use snake_case for extension keys:
detector_brightness, notDetectorBrightnessBe descriptive:
fei_quanta_spot_sizeinstead ofspot_size(avoids conflicts)Prefix vendor-specific fields with vendor name:
zeiss_eht_mode,jeol_probe_mode
When to Use Extensions:
# ✅ GOOD: Vendor-specific feature add_to_extensions(nx_meta, "quanta_stage_tilt_correction", True) # ✅ GOOD: Facility metadata add_to_extensions(nx_meta, "facility", "Building 7 Lab 3") add_to_extensions(nx_meta, "operator_notes", "Sample slightly contaminated") # ❌ BAD: Standard parameter with EM Glossary equivalent add_to_extensions(nx_meta, "voltage", ureg.Quantity(Decimal("15"), "kV")) # Use acceleration_voltage instead!
Use Helper Function:
# ✅ GOOD: Using add_to_extensions() helper add_to_extensions(nx_meta, "facility", "Nexus Lab") add_to_extensions(nx_meta, "custom_field", value) # ❌ BAD: Directly manipulating dict (can overwrite) nx_meta["extensions"] = {"facility": "Nexus Lab"} # Replaces existing extensions!
This keeps core metadata clean and standardized while allowing flexibility for unique instrument features.
5. Improved XML Serialization#
Before (v1):
<meta name="Voltage (kV)">15</meta>
After (v2.2.0):
<meta name="Voltage" unit="kV">15</meta>
The new format:
Separates value from unit using existing XSD
unitattributeUses cleaner field names (no units in names)
Is more machine-readable (structured attributes vs. text parsing)
Leverages existing XSD infrastructure (no schema changes required)
XML Serialization Pipeline:
The complete transformation from extractor to XML follows these steps:
Extractor creates Pint Quantity:
voltage = ureg.Quantity(Decimal("15000"), "volt")
Add to nx_meta dict:
nx_meta["acceleration_voltage"] = voltage
Pydantic schema validates:
meta = ImageMetadata.model_validate(nx_meta) # Validates type, timezone, units, etc.
XML serialization converts to preferred unit:
from nexusLIMS.schemas.units import quantity_to_xml_parts name, value, unit = quantity_to_xml_parts("acceleration_voltage", voltage) # Returns: ("Acceleration Voltage", "15.0", "kV")
XML output uses unit attribute:
<meta name="Acceleration Voltage" unit="kV">15.0</meta>
Code Example:
from nexusLIMS.schemas.units import ureg, quantity_to_xml_parts
# Create quantity in any compatible unit
voltage = ureg.Quantity(Decimal("15000"), "volt") # 15 kV
distance = ureg.Quantity(Decimal("0.0052"), "meter") # 5.2 mm
# Convert to XML parts
v_name, v_value, v_unit = quantity_to_xml_parts("acceleration_voltage", voltage)
d_name, d_value, d_unit = quantity_to_xml_parts("working_distance", distance)
# Output:
# v_name="Acceleration Voltage", v_value="15.0", v_unit="kV"
# d_name="Working Distance", d_value="5.2", d_unit="mm"
# XML result:
# <meta name="Acceleration Voltage" unit="kV">15.0</meta>
# <meta name="Working Distance" unit="mm">5.2</meta>
This pipeline ensures:
Type safety at creation time (Pydantic validation)
Unit consistency across all metadata (preferred units)
Clean XML with proper structure (name/value/unit separation)
No data loss (original values preserved through conversion)
Helper Functions#
NexusLIMS provides helper functions to simplify working with schemas and extensions.
emg_field() - Create Fields with EM Glossary Metadata#
The emg_field() helper automatically adds EM Glossary metadata to Pydantic field definitions when creating new schemas:
from nexusLIMS.schemas.metadata import emg_field
from nexusLIMS.schemas.pint_types import PintQuantity
from pydantic import BaseModel
class MySchema(BaseModel):
dwell_time: PintQuantity | None = emg_field("dwell_time")
beam_current: PintQuantity | None = emg_field("beam_current")
What it does:
Automatically looks up display name from EM Glossary (“Pixel Dwell Time”, “Beam Current”)
Adds EM Glossary ID, URI, and label to JSON schema metadata
Pulls description from EM Glossary ontology
Supports custom descriptions and additional Pydantic validators
Signature:
emg_field(
field_name: str,
default: Any = None,
*,
description: str | None = None,
**kwargs: Any
) -> Field
Parameters:
field_name: Internal field name (e.g., “acceleration_voltage”)default: Default value (use...for required,Nonefor optional)description: Custom description (overrides EM Glossary description)**kwargs: Additional Pydantic Field arguments (gt, ge, examples, etc.)
Examples:
Basic usage:
acceleration_voltage: PintQuantity | None = emg_field("acceleration_voltage")
# Automatically gets: alias="Acceleration Voltage", description="Accelerating voltage...", emg_id="EMG_00000004"
With custom description:
beam_current: PintQuantity | None = emg_field(
"beam_current",
description="Probe current measured at the sample",
)
With Pydantic validators:
from pydantic import field_validator
class MySchema(BaseModel):
acceleration_voltage: PintQuantity | None = emg_field(
"acceleration_voltage",
gt=0, # Must be positive
examples=[ureg.Quantity(Decimal("10"), "kV"), ureg.Quantity(Decimal("200"), "kV")],
)
See the complete documentation in the Helper Functions section below.
add_to_extensions() - Safely Add Extension Fields#
The add_to_extensions() helper safely adds fields to the extensions dict without replacing existing content:
from nexusLIMS.extractors.utils import add_to_extensions
nx_meta = {"acceleration_voltage": ureg.Quantity(Decimal("15"), "kV")}
# Safely add extension field
add_to_extensions(nx_meta, "facility", "Nexus Lab")
add_to_extensions(nx_meta, "detector_brightness", 50.0)
# Result: nx_meta["extensions"] = {"facility": "Nexus Lab", "detector_brightness": 50.0}
Why use it:
Prevents accidentally replacing the entire extensions dict
Creates extensions dict if it doesn’t exist
Clear intent in code
When to use:
In extractors when adding instrument-specific metadata
In instrument profiles when adding static fields
Any time you need to add to extensions
See the Helper Functions section below for more examples.
Metadata Field Reference#
The table below shows the 25 most common metadata fields in NexusLIMS, organized by category. Some of these have matching EMG identifiers, but not all (the EM Glossary is an emerging standard that is not yet exhaustive).
The core common metadata parameters used throughout NexusLIMS are defined in the NEXUSLIMS_TO_EMG_MAPPINGS attribute of nexusLIMS.schemas.em_glossary module.
Beam Parameters#
Internal Name |
Display Name |
EMG ID |
Preferred Unit |
Typical Use |
|---|---|---|---|---|
|
Acceleration Voltage |
EMG_00000004 |
kilovolt (kV) |
Electron beam accelerating voltage |
|
Beam Current |
EMG_00000006 |
picoampere (pA) |
Probe current at sample |
|
Emission Current |
EMG_00000025 |
microampere (µA) |
Filament emission current |
|
Convergence Angle |
EMG_00000010 |
milliradian (mrad) |
Beam convergence angle |
Stage & Sample#
Internal Name |
Display Name |
EMG ID |
Preferred Unit |
Typical Use |
|---|---|---|---|---|
|
Stage X |
- |
micrometer (µm) |
Stage X position |
|
Stage Y |
- |
micrometer (µm) |
Stage Y position |
|
Stage Z |
- |
millimeter (mm) |
Stage Z position (height) |
|
Stage Alpha |
- |
degree (°) |
Primary tilt angle |
|
Stage Beta |
- |
degree (°) |
Secondary tilt angle (dual-tilt holders) |
Detector Parameters#
Internal Name |
Display Name |
EMG ID |
Preferred Unit |
Typical Use |
|---|---|---|---|---|
|
Detector |
- |
- (string) |
Detector name (e.g., “ETD”, “HAADF”) |
|
Working Distance |
EMG_00000050 |
millimeter (mm) |
Distance from lens to sample |
|
Energy Resolution |
- |
electronvolt (eV) |
EDS detector resolution |
Acquisition Parameters#
Internal Name |
Display Name |
EMG ID |
Preferred Unit |
Typical Use |
|---|---|---|---|---|
|
Pixel Dwell Time |
EMG_00000015 |
microsecond (µs) |
Time per pixel (scanning) |
|
Acquisition Time |
EMG_00000055 |
second (s) |
Total acquisition time |
|
Live Time |
- |
second (s) |
EDS live time (excludes dead time) |
|
Pixel Time |
- |
second (s) |
Time per pixel (spectrum imaging) |
Optical Parameters#
Internal Name |
Display Name |
EMG ID |
Preferred Unit |
Typical Use |
|---|---|---|---|---|
|
Magnification |
- |
- (dimensionless) |
Nominal magnification |
|
Camera Length |
EMG_00000008 |
millimeter (mm) |
Diffraction camera length |
|
Horizontal Field Width |
- |
micrometer (µm) |
Scan width |
|
Pixel Width |
- |
nanometer (nm) |
Physical pixel width |
|
Pixel Height |
- |
nanometer (nm) |
Physical pixel height |
Spectrum Parameters#
Internal Name |
Display Name |
EMG ID |
Preferred Unit |
Typical Use |
|---|---|---|---|---|
|
Channel Size |
- |
electronvolt (eV) |
Energy width per channel |
|
Starting Energy |
- |
kiloelectronvolt (keV) |
Spectrum starting energy |
|
Takeoff Angle |
- |
degree (°) |
EDS X-ray takeoff angle |
Notes:
Fields with “-” in EMG ID column have display names but no EM Glossary v2.0.0 equivalent
See EM Glossary Reference for complete list of all fields
Preferred units and conversion to the human-friendly display alias are automatically applied during XML serialization
Use
emg_field()helper to automatically populate EMG metadata
Using the schema values#
For Extractor Plugin Developers#
If you maintain a custom extractor plugin, it should use Pint Quantities with Decimal values:
Step 1: Import the unit registry
from nexusLIMS.schemas.units import ureg
Step 2: Create Quantities for fields with units
# Old approach (v1)
nx_meta["Voltage (kV)"] = float(voltage_v) / 1000
# New approach (v2.2.0)
nx_meta["acceleration_voltage"] = ureg.Quantity(Decimal(str(voltage_v)), "volt")
# Pint handles the conversion automatically
Step 3: Use NexusLIMS field names
Replace vendor-specific field names with NexusLIMS standard equivalents where available:
HVorVoltage→acceleration_voltageWD→working_distanceBeam Current→beam_currentPixel Dwell Time→dwell_time
Step 4: Use helper functions
from nexusLIMS.extractors.utils import add_to_extensions
nx_meta = {
"acceleration_voltage": ureg.Quantity(Decimal("15"), "kilovolt"),
}
# Add vendor-specific fields using helper
add_to_extensions(nx_meta, "vendor_specific_field", value)
add_to_extensions(nx_meta, "facility", "Nexus Lab")
See Writing Extractor Plugins for detailed guidance.
Schema Structure#
Base Schema (NexusMetadata)#
All datasets must include these core fields:
{
"DatasetType": str, # "Image" | "Spectrum" | "SpectrumImage" | "Diffraction" | "Misc"
"Data Type": str, # Descriptive string (e.g., "SEM_Imaging", "TEM_EDS")
"Creation Time": str, # ISO-8601 with timezone
"extensions": dict, # Optional: instrument-specific fields
}
Type-Specific Schemas#
ImageMetadata#
Typical fields (all optional unless specified):
acceleration_voltage- Quantity (kilovolt)working_distance- Quantity (millimeter)beam_current- Quantity (picoampere)emission_current- Quantity (microampere)magnification- float (dimensionless)dwell_time- Quantity (microsecond)field_of_view- Quantity (micrometer)scan_rotation- Quantity (degree)stage_position- StagePosition object with X, Y, Z, tilt, rotationdetector_type- str
SpectrumMetadata#
Typical fields:
acquisition_time- Quantity (second)live_time- Quantity (second)detector_energy_resolution- Quantity (electronvolt)channel_size- Quantity (electronvolt)starting_energy- Quantity (kiloelectronvolt)azimuthal_angle- Quantity (degree)elevation_angle- Quantity (degree)elements- list[str]
SpectrumImageMetadata#
Combines fields from both Image and Spectrum, plus:
pixel_time- Quantity (microsecond)scan_mode- str
DiffractionMetadata#
Typical fields:
camera_length- Quantity (millimeter)convergence_angle- Quantity (milliradian)diffraction_mode- str
See API documentation for complete field lists.
Preferred Units#
NexusLIMS defines preferred units for each field to ensure consistency across instruments, easier comparison of data, and alignment with scientific conventions.
Field |
Preferred Unit |
Rationale |
|---|---|---|
|
kilovolt (kV) |
Standard EM range (1-300 kV); avoids large numbers |
|
millimeter (mm) |
Common SEM/TEM range (1-50 mm) |
|
picoampere (pA) |
Typical probe currents (10-1000 pA) |
|
microampere (µA) |
Filament emission range (100-300 µA) |
|
microsecond (µs) |
Typical scan speeds (0.1-100 µs/pixel) |
|
micrometer (µm) |
Microscopy field widths (1-1000 µm) |
|
millimeter (mm) |
TEM diffraction range (100-1000 mm) |
|
second (s) |
Spectrum collection times (1-1000 s) |
|
electronvolt (eV) |
EDS resolution spec (130-150 eV) |
Automatic Normalization:
When you create a Quantity in any unit, it will be automatically converted to the preferred unit during XML serialization:
from nexusLIMS.schemas.units import ureg, normalize_quantity
# Create voltage in any unit
voltage = ureg.Quantity(Decimal("15000"), "volt") # Created in volts
# Normalize to preferred unit (kV)
normalized = normalize_quantity("acceleration_voltage", voltage)
print(normalized) # Output: 15.0 kilovolt
# XML output will be: <meta name="Voltage" unit="kV">15.0</meta>
This ensures all metadata uses consistent units in the final XML records, regardless of the units used in the source file or during extraction.
Validation Behavior#
Automatic Validation#
Metadata is validated automatically during record building:
Extractor returns
nx_metadictvalidate_nx_meta()checks against appropriate schema based onDatasetTypeValidation errors are logged with filename context
Invalid metadata prevents record generation
Manual Validation#
You can validate metadata manually in your extractor:
from nexusLIMS.schemas.metadata import ImageMetadata
from pydantic import ValidationError
# Validate manually
try:
validated = ImageMetadata.model_validate(nx_meta)
logger.info("Validation successful!")
except ValidationError as e:
logger.error("Validation failed: %s", e)
Common Validation Errors#
Missing Timezone in Timestamp#
# ❌ WRONG - Missing timezone
nx_meta = {
"creation_time": "2024-01-15T10:30:00", # No timezone!
"data_type": "SEM_Imaging",
"dataset_type": "Image",
}
# ValidationError: Timestamp must include timezone: 2024-01-15T10:30:00
# ✅ CORRECT - Include timezone
nx_meta = {
"creation_time": "2024-01-15T10:30:00-05:00", # EST timezone
# OR
"creation_time": "2024-01-15T15:30:00Z", # UTC timezone
"data_type": "SEM_Imaging",
"dataset_type": "Image",
}
Wrong DatasetType#
# ❌ WRONG - DatasetType doesn't match schema
try:
meta = ImageMetadata.model_validate({
"creation_time": "2024-01-15T10:30:00Z",
"data_type": "SEM_Imaging",
"dataset_type": "Spectrum", # Wrong! Should be "Image"
})
except ValidationError as e:
print(e)
# ValidationError: Input should be 'Image'
Invalid Quantity Units#
# ❌ WRONG - Incompatible units
voltage = ureg.Quantity(Decimal("10"), "meter") # Meters instead of volts!
try:
meta = ImageMetadata(
creation_time="2024-01-15T10:30:00Z",
data_type="SEM_Imaging",
dataset_type="Image",
acceleration_voltage=voltage,
)
except Exception as e:
# Pint will raise error: Cannot convert from 'meter' to 'kilovolt'
print(f"Unit error: {e}")
Validation Best Practices#
Always include timezone: Use ISO-8601 format with timezone offset
Use correct DatasetType: Match the schema to your data type
Test validation early: Validate in your extractor during development
Check unit compatibility: Ensure units are dimensionally correct (voltage in volts, distance in meters, etc.)
Use emg_field(): Automatically handles field metadata and descriptions
Backward Compatibility#
Important: The v2.2.0 schema is not backward compatible with the experimental v1 schema. This is intentional:
v1 was experimental and not used in production
Clean break enables better design
No legacy compatibility burden
All built-in extractors updated together
If you have custom extractors or local instrument profiles, you must update them to use the new schema system.
FAQ#
Q: Do I need to update existing XML records to use the new schema?#
No. The schema described here is totally internal to NexusLIMS and existing records in CDCS are not affected. The schema changes only apply to newly generated records.
Q: Can I still use plain numeric values?#
Yes, for dimensionless quantities (magnification, brightness, gain, etc.). Only use Pint Quantities (with Decimal values) for fields with physical units.
Q: What if my field doesn’t have an EM Glossary equivalent?#
Use a descriptive name and place it in the extensions section. We encourage contributing new terms to the EM Glossary project.
Q: How do I know what units to use?#
Use the preferred units defined in nexusLIMS.schemas.units.PREFERRED_UNITS. Pint will automatically convert to the preferred unit during XML serialization.
Implementation Details#
Three-Tier Architecture#
NexusLIMS uses a three-tier approach to metadata serialization:
Tier 1: Internal (Pint + QUDT/EMG mappings)
Pint Quantity objects with full unit information
Internal QUDT URI and EM Glossary ID mappings
Type safety and validation
Tier 2: XML with unit attribute (current)
Clean separation:
<meta name="Voltage" unit="kV">15</meta>Uses existing XSD attribute (no schema changes)
Machine-readable structure
Tier 3: Semantic Web integration (future)
Optional QUDT and EM Glossary URI attributes
Full semantic web compatibility
Requires XSD update (planned for future release)
QUDT Unit Ontology#
NexusLIMS internally maps Pint units to QUDT (Quantities, Units, Dimensions and Types) URIs for future semantic web integration (not currently visible in XML output). This mapping is implemented by nexusLIMS.schemas.units.get_qudt_uri(), which loads unit mappings from the QUDT ontology vocabulary file.
Pint Unit |
QUDT URI |
|---|---|
|
|
|
|
|
|
|
|
This mapping will enable future Tier 3 implementation with full semantic web support.
Tier 3 Conceptual Example (Future):
When Tier 3 is implemented, XML output will optionally include QUDT and EM Glossary URIs for full semantic web integration:
<!-- Current Tier 2 output -->
<meta name="Acceleration Voltage" unit="kV">15.0</meta>
<!-- Future Tier 3 output (conceptual) -->
<meta name="Acceleration Voltage"
unit="kV"
qudt:unit="http://qudt.org/vocab/unit/KiloV"
emg:term="https://purls.helmholtz-metadaten.de/emg/v2.0.0/EMG_00000004"
emg:label="Acceleration Voltage">
15.0
</meta>
Benefits of Tier 3:
Full semantic web compatibility for linked data applications
Direct linkage to QUDT and EM Glossary ontologies
Machine-readable term definitions
Interoperability with other metadata standards
Status: Tier 3 requires XSD schema updates and is planned for a future NexusLIMS release.
EM Glossary Integration#
The EM Glossary is parsed directly from the OWL ontology file using RDFLib:
Single source of truth (OWL file)
Auto-updates when EM Glossary is updated
Provides access to labels, definitions, and semantic structure
Resources#
EM Glossary Reference - Complete field mapping table
Writing Extractor Plugins - Developer guide with examples
EM Glossary Project - Community ontology
QUDT Ontology - Units and quantities standard
Pint Documentation - Python units library
Support#
For questions or issues:
Check the FAQ above
Review Writing Extractor Plugins
Report issues at GitHub Issues