nexusLIMS.extractors.xml_serialization

XML serialization utilities for NexusLIMS metadata schemas.

This module provides utilities for converting type-specific metadata schemas (using Pint Quantities and EM Glossary terminology) into XML format compatible with the Nexus Experiment schema.

Key Functions

Examples:

Convert a Pint Quantity to XML:

>>> from nexusLIMS.schemas.units import ureg
>>> qty = ureg.Quantity(10, "kilovolt")
>>> value, unit = serialize_quantity_to_xml(qty)
>>> value, unit
(10.0, 'kV')

Get human-readable field name for XML:

>>> get_xml_field_name("acceleration_voltage")
'Voltage'
>>> get_xml_field_name("working_distance")
'Working Distance'

Module Contents

Functions

serialize_quantity_to_xml

Convert a Pint Quantity to a numeric value and a unit string for XML serialization.

get_xml_field_name

Map an EM Glossary field name to a human-readable XML display name.

prepare_metadata_for_xml

Prepare rich metadata for XML serialization.

get_qudt_uri

Get the QUDT URI for a given field’s unit.

get_emg_id

Get the EM Glossary ID for a given field name.

Data

EM_GLOSSARY_TO_XML_DISPLAY_NAMES

Mapping from EM Glossary field names to human-readable XML display names. This maintains backward compatibility with existing XML field names.

API

nexusLIMS.extractors.xml_serialization.EM_GLOSSARY_TO_XML_DISPLAY_NAMES

Mapping from EM Glossary field names to human-readable XML display names. This maintains backward compatibility with existing XML field names.

nexusLIMS.extractors.xml_serialization.serialize_quantity_to_xml(qty: pint.Quantity) → tuple[float, str]

Convert a Pint Quantity to a numeric value and a unit string for XML serialization.

This function extracts the magnitude and unit from a Pint Quantity object and formats them for use in XML meta elements with the unit attribute.

Parameters:

qty (pint.Quantity) – The Pint Quantity object to serialize

Returns:

  • value (float) – The numeric magnitude of the quantity

  • unit (str) – The unit symbol in compact form (e.g., “kV”, “mm”, “pA”)

Examples:

>>> from nexusLIMS.schemas.units import ureg
>>> qty = ureg.Quantity(10, "kilovolt")
>>> value, unit = serialize_quantity_to_xml(qty)
>>> value
10.0
>>> unit
'kV'
>>> qty = ureg.Quantity(5.2, "millimeter")
>>> value, unit = serialize_quantity_to_xml(qty)
>>> value
5.2
>>> unit
'mm'

Notes:

The unit is formatted using Pint’s abbreviated unit format specifier (~), which produces short unit symbols (such as “kV” rather than “kilovolt”) suitable for use in XML attributes.
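As an illustration of how the returned pair might be used, the following minimal sketch writes a value and unit into an XML meta element with a unit attribute. It uses only the standard library; the helper name build_meta_element and the name attribute are assumptions made for this example and are not part of this module.

```python
import xml.etree.ElementTree as ET


def build_meta_element(name: str, value: float, unit: str) -> ET.Element:
    """Hypothetical helper: wrap a serialized value/unit pair in a <meta> element."""
    # The "unit" attribute follows the description above; "name" is an assumption.
    elem = ET.Element("meta", attrib={"name": name, "unit": unit})
    elem.text = str(value)
    return elem


# The pair (10.0, "kV") is what serialize_quantity_to_xml returns in the examples above.
elem = build_meta_element("Voltage", 10.0, "kV")
xml_text = ET.tostring(elem, encoding="unicode")
print(xml_text)
```

Keeping the magnitude in the element text and the unit in an attribute lets consumers parse the number without stripping a unit suffix.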

nexusLIMS.extractors.xml_serialization.get_xml_field_name(field_name: str) → str

Map an EM Glossary field name to a human-readable XML display name.

This function provides the translation layer between EM Glossary terminology (used internally in metadata schemas) and the human-readable field names used in XML output. It maintains backward compatibility with existing XML field names.

Parameters:

field_name (str) – The internal EM Glossary field name (e.g., “acceleration_voltage”)

Returns:

display_name – The human-readable display name for XML (e.g., “Voltage”)

Return type:

str

Examples:

>>> get_xml_field_name("acceleration_voltage")
'Voltage'
>>> get_xml_field_name("working_distance")
'Working Distance'
>>> get_xml_field_name("detector_type")
'Detector'

For unknown fields, returns the field name with underscores replaced by spaces and title-cased:

>>> get_xml_field_name("some_custom_field")
'Some Custom Field'

Notes:

This function prioritizes backward compatibility with existing XML field names. New fields should be added to EM_GLOSSARY_TO_XML_DISPLAY_NAMES to control their XML representation.
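The lookup-with-fallback behavior described above can be sketched as follows. This is an assumption about the general approach, not the module's actual implementation; the mapping DISPLAY_NAMES_SKETCH and the function name are hypothetical and contain only the entries shown in the examples.

```python
# Hypothetical subset of EM_GLOSSARY_TO_XML_DISPLAY_NAMES, taken from the examples above.
DISPLAY_NAMES_SKETCH = {
    "acceleration_voltage": "Voltage",
    "working_distance": "Working Distance",
    "detector_type": "Detector",
}


def xml_field_name_sketch(field_name: str) -> str:
    """Return the mapped display name, or a title-cased fallback for unknown fields."""
    fallback = field_name.replace("_", " ").title()
    return DISPLAY_NAMES_SKETCH.get(field_name, fallback)


print(xml_field_name_sketch("acceleration_voltage"))
print(xml_field_name_sketch("some_custom_field"))
```

A field such as acceleration_voltage needs an explicit entry because the title-cased fallback would produce "Acceleration Voltage" rather than the legacy name "Voltage".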

nexusLIMS.extractors.xml_serialization.prepare_metadata_for_xml(metadata: dict[str, Any]) → dict[str, str | float]

Prepare rich metadata for XML serialization.

Converts metadata from the new schema format (with Pint Quantities, nested structures, etc.) into a flat dictionary suitable for XML serialization. This includes:

  1. Converting Pint Quantity objects to separate value/unit entries

  2. Flattening nested structures (like StagePosition)

  3. Mapping EM Glossary field names to XML display names

  4. Preserving non-Quantity values as-is

Parameters:

metadata (dict[str, Any]) – Metadata dictionary from a type-specific schema (ImageMetadata, etc.). May contain Pint Quantities, nested dicts, or simple values.

Returns:

xml_metadata – Flat dictionary with XML-compatible field names and values. For Quantity fields, creates two entries:

  • "<field_name>": numeric value

  • "<field_name>_unit": unit string

Return type:

dict[str, str | float]

Examples:

>>> from nexusLIMS.schemas.units import ureg
>>> metadata = {
...     "acceleration_voltage": ureg.Quantity(10, "kilovolt"),
...     "magnification": 50000,
...     "detector_type": "ETD",
... }
>>> xml_dict = prepare_metadata_for_xml(metadata)
>>> xml_dict["Voltage"]
10.0
>>> xml_dict["Voltage_unit"]
'kV'
>>> xml_dict["Magnification"]
50000
>>> xml_dict["Detector"]
'ETD'

Notes:

This function is designed to work with both the new schema format and legacy metadata dicts for backward compatibility during migration.
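The four steps listed above can be sketched with the standard library alone. Everything here is an assumption for illustration: StubQuantity stands in for pint.Quantity, the mapping is a hypothetical subset, and joining nested keys with an underscore before mapping is a guess at how nested structures might be flattened, not the module's documented behavior.

```python
from typing import Any, NamedTuple


class StubQuantity(NamedTuple):
    """Hypothetical stand-in for pint.Quantity so the sketch has no dependencies."""
    magnitude: float
    unit: str


# Hypothetical subset of the display-name mapping.
DISPLAY_NAMES = {"acceleration_voltage": "Voltage", "detector_type": "Detector"}


def display_name(field_name: str) -> str:
    return DISPLAY_NAMES.get(field_name, field_name.replace("_", " ").title())


def flatten_for_xml(metadata: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    flat: dict[str, Any] = {}
    for key, value in metadata.items():
        full_key = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            # Step 2: recurse into nested structures (key joining is an assumption).
            flat.update(flatten_for_xml(value, prefix=full_key))
        elif isinstance(value, StubQuantity):
            # Steps 1 and 3: split into value/unit entries under the display name.
            name = display_name(full_key)
            flat[name] = float(value.magnitude)
            flat[f"{name}_unit"] = value.unit
        else:
            # Steps 3 and 4: map the name, keep the value unchanged.
            flat[display_name(full_key)] = value
    return flat


result = flatten_for_xml({
    "acceleration_voltage": StubQuantity(10, "kV"),
    "magnification": 50000,
    "stage_position": {"x": StubQuantity(1.5, "mm")},
})
print(result)
```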

nexusLIMS.extractors.xml_serialization.get_qudt_uri(field_name: str, unit: str) → str | None

Get the QUDT URI for a given field’s unit.

This function looks up the QUDT (Quantities, Units, Dimensions and Types) ontology URI for a given unit string. It is intended for Tier 3 semantic web integration (a future enhancement).

Parameters:
  • field_name (str) – The field name (currently unused, reserved for future context-aware lookups)

  • unit (str) – The unit string in compact form (e.g., “kV”, “mm”, “pA”)

Returns:

qudt_uri – The QUDT URI for this unit, or None if no mapping exists

Return type:

str or None

Examples:

>>> get_qudt_uri("acceleration_voltage", "kV")
'http://qudt.org/vocab/unit/KiloV'
>>> get_qudt_uri("working_distance", "mm")
'http://qudt.org/vocab/unit/MilliM'

Notes:

This function is currently a placeholder for Tier 3 implementation. It will use the QUDT mapping system from nexusLIMS.schemas.units when Tier 3 semantic attributes are added to the XML schema.

nexusLIMS.extractors.xml_serialization.get_emg_id(field_name: str) → str | None

Get the EM Glossary ID for a given field name.

This function looks up the EM Glossary term ID for a field name, if one exists. It is intended for Tier 3 semantic web integration (a future enhancement).

Parameters:

field_name (str) – The internal field name (e.g., “acceleration_voltage”)

Returns:

emg_id – The EM Glossary ID (e.g., “EMG_00000004”), or None if no mapping exists

Return type:

str or None

Examples:

>>> get_emg_id("acceleration_voltage")
'EMG_00000004'
>>> get_emg_id("working_distance")
'EMG_00000050'
>>> get_emg_id("some_custom_field")

Notes:

This function supports the Tier 3 implementation, in which EM Glossary IDs are added as XML attributes for semantic traceability.
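Since Tier 3 is described as a future enhancement, the following is only a speculative sketch of how such identifiers might be attached to a meta element. The attribute names emg_id and qudt_uri, the helper annotate_meta, and both lookup tables are hypothetical; the table entries are copied from the examples in this section.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical lookup tables, populated only with values shown in the examples above.
EMG_IDS = {"acceleration_voltage": "EMG_00000004", "working_distance": "EMG_00000050"}
QUDT_URIS = {
    "kV": "http://qudt.org/vocab/unit/KiloV",
    "mm": "http://qudt.org/vocab/unit/MilliM",
}


def annotate_meta(elem: ET.Element, field_name: str, unit: Optional[str]) -> ET.Element:
    """Add semantic attributes only when a mapping exists (attribute names are assumptions)."""
    emg_id = EMG_IDS.get(field_name)
    qudt_uri = QUDT_URIS.get(unit) if unit is not None else None
    if emg_id is not None:
        elem.set("emg_id", emg_id)
    if qudt_uri is not None:
        elem.set("qudt_uri", qudt_uri)
    return elem


known = annotate_meta(ET.Element("meta"), "acceleration_voltage", "kV")
unknown = annotate_meta(ET.Element("meta"), "some_custom_field", None)
print(known.attrib, unknown.attrib)
```

Skipping the attribute when a lookup returns None mirrors the str | None return types documented above and leaves unmapped fields unchanged.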