nexusLIMS.extractors.xml_serialization
XML serialization utilities for NexusLIMS metadata schemas.
This module provides utilities for converting type-specific metadata schemas (using Pint Quantities and EM Glossary terminology) into XML format compatible with the Nexus Experiment schema.
Key Functions
- serialize_quantity_to_xml(): Convert Pint Quantities to value/unit pairs for XML
- get_xml_field_name(): Map an EM Glossary field name to a human-friendly display name
- prepare_metadata_for_xml(): Convert rich metadata to an XML-compatible flat dict
Examples:
Convert a Pint Quantity to XML:
>>> from nexusLIMS.schemas.units import ureg
>>> qty = ureg.Quantity(10, "kilovolt")
>>> value, unit = serialize_quantity_to_xml(qty)
>>> value, unit
(10.0, 'kV')
Get human-readable field name for XML:
>>> get_xml_field_name("acceleration_voltage")
'Voltage'
>>> get_xml_field_name("working_distance")
'Working Distance'
Module Contents
Functions
- serialize_quantity_to_xml(): Convert a Pint Quantity to value and unit strings for XML serialization.
- get_xml_field_name(): Map an EM Glossary field name to a human-readable XML display name.
- prepare_metadata_for_xml(): Prepare rich metadata for XML serialization.
- get_qudt_uri(): Get the QUDT URI for a given field’s unit.
- get_emg_id(): Get the EM Glossary ID for a given field name.
Data
- EM_GLOSSARY_TO_XML_DISPLAY_NAMES: Mapping from EM Glossary field names to human-readable XML display names. This maintains backward compatibility with existing XML field names.
API
- nexusLIMS.extractors.xml_serialization.EM_GLOSSARY_TO_XML_DISPLAY_NAMES
Mapping from EM Glossary field names to human-readable XML display names. This maintains backward compatibility with existing XML field names.
- nexusLIMS.extractors.xml_serialization.serialize_quantity_to_xml(qty: pint.Quantity) -> tuple[float, str]
Convert a Pint Quantity to value and unit strings for XML serialization.
This function extracts the magnitude and unit from a Pint Quantity object and formats them for use in XML meta elements with the unit attribute.
- Parameters:
qty (pint.Quantity) – The Pint Quantity object to serialize
- Returns:
value (float) – The numeric magnitude of the quantity
unit (str) – The unit symbol in compact form (e.g., “kV”, “mm”, “pA”)
Examples:
>>> from nexusLIMS.schemas.units import ureg
>>> qty = ureg.Quantity(10, "kilovolt")
>>> value, unit = serialize_quantity_to_xml(qty)
>>> value
10.0
>>> unit
'kV'
>>> qty = ureg.Quantity(5.2, "millimeter")
>>> value, unit = serialize_quantity_to_xml(qty)
>>> value
5.2
>>> unit
'mm'
Notes:
The unit is formatted using Pint’s abbreviated unit format (the ~ modifier), which produces short unit symbols suitable for display in XML attributes.
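The value/unit pair described above is meant for an XML meta element that carries the unit as an attribute. A minimal sketch of that last step, using only the standard library, might look like the following. The helper name build_meta_element and the name attribute are assumptions made for illustration; the documentation above only confirms the meta element and its unit attribute.

```python
import xml.etree.ElementTree as ET


def build_meta_element(name: str, value: float, unit: str) -> ET.Element:
    """Build a <meta> element with the value as text and the unit as an attribute.

    Hypothetical helper: the "name" attribute is an assumption for
    illustration; only the "unit" attribute is documented above.
    """
    elem = ET.Element("meta", {"name": name, "unit": unit})
    elem.text = str(value)
    return elem


# The (10.0, "kV") pair mirrors the documented return value for 10 kilovolts.
meta = build_meta_element("Voltage", 10.0, "kV")
xml_text = ET.tostring(meta, encoding="unicode")
print(xml_text)
```

Keeping the magnitude as element text and the unit as an attribute lets a consumer read the number without parsing a combined string such as "10 kV".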
- nexusLIMS.extractors.xml_serialization.get_xml_field_name(field_name: str) -> str
Map an EM Glossary field name to a human-readable XML display name.
This function provides the translation layer between EM Glossary terminology (used internally in metadata schemas) and the human-readable field names used in XML output. It maintains backward compatibility with existing XML field names.
- Parameters:
field_name (str) – The internal EM Glossary field name (e.g., “acceleration_voltage”)
- Returns:
display_name – The human-readable display name for XML (e.g., “Voltage”)
- Return type:
str
Examples:
>>> get_xml_field_name("acceleration_voltage")
'Voltage'
>>> get_xml_field_name("working_distance")
'Working Distance'
>>> get_xml_field_name("detector_type")
'Detector'
For unknown fields, returns the field name with underscores replaced by spaces and title-cased:
>>> get_xml_field_name("some_custom_field")
'Some Custom Field'
Notes:
This function prioritizes backward compatibility with existing XML field names. New fields should be added to EM_GLOSSARY_TO_XML_DISPLAY_NAMES to control their XML representation.
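The lookup-with-fallback behavior described above can be sketched in a few lines. This is an illustration, not the module's actual code: DISPLAY_NAMES is a hypothetical stand-in for EM_GLOSSARY_TO_XML_DISPLAY_NAMES that holds only the three entries shown in the examples, and display_name_sketch is an invented name.

```python
# Hypothetical stand-in for EM_GLOSSARY_TO_XML_DISPLAY_NAMES, limited to
# the three mappings that appear in the documented examples.
DISPLAY_NAMES = {
    "acceleration_voltage": "Voltage",
    "working_distance": "Working Distance",
    "detector_type": "Detector",
}


def display_name_sketch(field_name: str) -> str:
    """Return the mapped display name, or a title-cased fallback."""
    if field_name in DISPLAY_NAMES:
        return DISPLAY_NAMES[field_name]
    # Unknown fields: underscores become spaces, then title-case the result.
    return field_name.replace("_", " ").title()


print(display_name_sketch("acceleration_voltage"))
print(display_name_sketch("some_custom_field"))
```

The explicit mapping takes priority so that legacy names such as "Voltage" survive even where the title-cased fallback would produce "Acceleration Voltage".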
- nexusLIMS.extractors.xml_serialization.prepare_metadata_for_xml(metadata: dict[str, Any]) -> dict[str, str | float]
Prepare rich metadata for XML serialization.
Converts metadata from the new schema format (with Pint Quantities, nested structures, etc.) into a flat dictionary suitable for XML serialization. This includes:
Converting Pint Quantity objects to separate value/unit entries
Flattening nested structures (like StagePosition)
Mapping EM Glossary field names to XML display names
Preserving non-Quantity values as-is
- Parameters:
metadata (dict[str, Any]) – Metadata dictionary from a type-specific schema (ImageMetadata, etc.). May contain Pint Quantities, nested dicts, or simple values.
- Returns:
xml_metadata – Flat dictionary with XML-compatible field names and values. For Quantity fields, creates two entries:
"<field_name>": numeric value
"<field_name>_unit": unit string
- Return type:
dict[str, str | float]
Examples:
>>> from nexusLIMS.schemas.units import ureg
>>> metadata = {
...     "acceleration_voltage": ureg.Quantity(10, "kilovolt"),
...     "magnification": 50000,
...     "detector_type": "ETD",
... }
>>> xml_dict = prepare_metadata_for_xml(metadata)
>>> xml_dict["Voltage"]
10.0
>>> xml_dict["Voltage_unit"]
'kV'
>>> xml_dict["Magnification"]
50000
>>> xml_dict["Detector"]
'ETD'
Notes:
This function is designed to work with both the new schema format and legacy metadata dicts for backward compatibility during migration.
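To make the value/unit splitting concrete, here is a simplified sketch of the flattening step. It is written under several assumptions: StandInQuantity is a hypothetical placeholder for a Pint Quantity so the snippet runs without Pint, the name table contains only entries from the documented examples, and nested structures such as StagePosition are left out because their flattened key names are not specified above.

```python
from typing import Any, Dict, NamedTuple


class StandInQuantity(NamedTuple):
    """Hypothetical placeholder for a Pint Quantity (magnitude plus unit symbol)."""

    magnitude: float
    unit_symbol: str


# Assumed subset of the display-name mapping, taken from the documented examples.
DISPLAY_NAMES = {"acceleration_voltage": "Voltage", "detector_type": "Detector"}


def flatten_for_xml(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Split quantities into value and unit entries; keep other values as-is."""
    flat: Dict[str, Any] = {}
    for field, value in metadata.items():
        name = DISPLAY_NAMES.get(field, field.replace("_", " ").title())
        if isinstance(value, StandInQuantity):
            flat[name] = float(value.magnitude)
            flat[f"{name}_unit"] = value.unit_symbol
        else:
            flat[name] = value
    return flat


result = flatten_for_xml(
    {
        "acceleration_voltage": StandInQuantity(10, "kV"),
        "magnification": 50000,
        "detector_type": "ETD",
    }
)
print(result)
```

Because non-quantity values pass through unchanged, a legacy dict that already holds plain numbers and strings goes through the same loop without special handling.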
- nexusLIMS.extractors.xml_serialization.get_qudt_uri(field_name: str, unit: str) -> str | None
Get the QUDT URI for a given field’s unit.
This function looks up the QUDT (Quantities, Units, Dimensions and Types) ontology URI for a given unit string. Used for Tier 3 semantic web integration (future enhancement).
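- Parameters:
field_name (str) – The internal field name (e.g., “acceleration_voltage”)
unit (str) – The unit string to look up (e.g., “kV”)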
- Returns:
qudt_uri – The QUDT URI for this unit, or None if no mapping exists
- Return type:
str or None
Examples:
>>> get_qudt_uri("acceleration_voltage", "kV")
'http://qudt.org/vocab/unit/KiloV'
>>> get_qudt_uri("working_distance", "mm")
'http://qudt.org/vocab/unit/MilliM'
Notes:
This function is currently a placeholder for Tier 3 implementation. It will use the QUDT mapping system from nexusLIMS.schemas.units when Tier 3 semantic attributes are added to the XML schema.
- nexusLIMS.extractors.xml_serialization.get_emg_id(field_name: str) -> str | None
Get the EM Glossary ID for a given field name.
This function looks up the EM Glossary term ID for a field name, if one exists. Used for Tier 3 semantic web integration (future enhancement).
- Parameters:
field_name (str) – The internal field name (e.g., “acceleration_voltage”)
- Returns:
emg_id – The EM Glossary ID (e.g., “EMG_00000004”), or None if no mapping exists
- Return type:
str or None
Examples:
>>> get_emg_id("acceleration_voltage")
'EMG_00000004'
>>> get_emg_id("working_distance")
'EMG_00000050'
>>> get_emg_id("some_custom_field")
Notes:
This function is used for Tier 3 implementation where EM Glossary IDs are added as XML attributes for semantic traceability.
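Both Tier 3 lookups documented in this module return None when no mapping exists. The sketch below illustrates that pattern with plain dictionaries. It is not the module's implementation: the tables QUDT_URIS and EMG_IDS and the helper lookup_semantic_ids are hypothetical, and the tables contain only the values that appear in the documented examples.

```python
from typing import Dict, Optional

# Hypothetical lookup tables, populated only from the documented examples.
QUDT_URIS = {
    "kV": "http://qudt.org/vocab/unit/KiloV",
    "mm": "http://qudt.org/vocab/unit/MilliM",
}
EMG_IDS = {
    "acceleration_voltage": "EMG_00000004",
    "working_distance": "EMG_00000050",
}


def lookup_semantic_ids(field_name: str, unit: str) -> Dict[str, Optional[str]]:
    """Return the EM Glossary ID and QUDT URI, each None when unmapped."""
    return {
        "emg_id": EMG_IDS.get(field_name),  # dict.get returns None on a miss
        "qudt_uri": QUDT_URIS.get(unit),
    }


known = lookup_semantic_ids("acceleration_voltage", "kV")
unknown = lookup_semantic_ids("some_custom_field", "furlong")
print(known)
print(unknown)
```

Returning None instead of raising lets a caller skip the semantic attributes for custom fields while still writing the value itself.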