nexusLIMS.extractors.registry

Extractor registry for plugin discovery and selection.

This module provides the central registry that discovers, manages, and selects extractors based on file type and context. It auto-discovers plugins by walking the plugins directory and chooses among the candidates for a file by priority.

Module Contents

Classes

ExtractorRegistry

Central registry for extractor plugins.

Functions

get_registry

Get the global extractor registry (singleton).

API

class nexusLIMS.extractors.registry.ExtractorRegistry

Central registry for extractor plugins.

Manages auto-discovery, registration, and selection of metadata extractors. Uses priority-based selection with content sniffing support.

This class is a singleton; use get_registry() to access the shared instance.

Features

  • Auto-discovers plugins by walking nexusLIMS/extractors/plugins/

  • Maintains priority-sorted lists per extension

  • Lazy instantiation for performance

  • Caches extractor instances

  • Never returns None (always has fallback extractor)
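The lazy instantiation and instance caching listed above could work roughly as follows. This is a minimal sketch under stated assumptions, not the NexusLIMS implementation: `LazyInstanceCache` and `DemoExtractor` are hypothetical names invented for illustration.

```python
class DemoExtractor:
    """Hypothetical extractor that counts how often it is constructed."""

    name = "demo"
    priority = 100
    instances_created = 0

    def __init__(self):
        type(self).instances_created += 1


class LazyInstanceCache:
    """Hypothetical helper: holds classes, builds instances on demand."""

    def __init__(self):
        self._instances = {}  # maps extractor class -> cached instance

    def get(self, extractor_class):
        # Instantiate only on the first request, then reuse the same object.
        if extractor_class not in self._instances:
            self._instances[extractor_class] = extractor_class()
        return self._instances[extractor_class]


cache = LazyInstanceCache()
created_before_use = DemoExtractor.instances_created  # nothing built yet
first = cache.get(DemoExtractor)
second = cache.get(DemoExtractor)
print(created_before_use, DemoExtractor.instances_created, first is second)
```

Storing classes and deferring construction keeps discovery cheap: an extractor with expensive setup costs nothing until a file actually needs it.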

Examples:

Get an extractor for a file:

>>> from nexusLIMS.extractors.registry import get_registry
>>> from nexusLIMS.extractors.base import ExtractionContext
>>> from pathlib import Path
>>>
>>> registry = get_registry()
>>> context = ExtractionContext(Path("data.dm3"), instrument=None)
>>> extractor = registry.get_extractor(context)
>>> metadata = extractor.extract(context)

Manual registration (for testing):

>>> class MyExtractor:
...     name = "my_extractor"
...     priority = 100
...     def supports(self, context): return True
...     def extract(self, context): return {"nx_meta": {}}
>>>
>>> registry = get_registry()
>>> registry.register_extractor(MyExtractor)

property extractors: dict[str, list[type[BaseExtractor]]]

Get the registered extractor classes, grouped by file extension.

Returns a dictionary mapping file extensions to lists of extractor classes, sorted by priority (descending).

Auto-discovers plugins if not already discovered.

Returns:

A mapping from file extension (without the leading dot) to a list of extractor classes

Return type:

dict[str, list[type[BaseExtractor]]]

Examples:

>>> registry = get_registry()
>>> extractors_by_ext = registry.extractors
>>> print(extractors_by_ext.get("dm3", []))

property extractor_names: list[str]

Get a deduplicated list of extractor names.

Returns extractor names sorted alphabetically, with duplicates removed.

Auto-discovers plugins if not already discovered.

Returns:

Sorted list of unique extractor names

Return type:

list[str]

Examples:

>>> registry = get_registry()
>>> names = registry.extractor_names
>>> print(names)
['BasicFileInfoExtractor', 'DM3Extractor', 'QuantaTiffExtractor', ...]

property all_extractors: list[BaseExtractor]

Get a deduplicated flat list of all registered extractor instances.

Returns one instance per unique extractor class (both extension-specific and wildcard extractors), sorted by priority descending.

Auto-discovers plugins if not already discovered.
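One way to picture the deduplication and ordering described here is the sketch below. It is illustrative only: the `by_extension` and `wildcard` containers and the `unique_instances` helper are hypothetical stand-ins, not the registry's actual internals.

```python
class Low:
    name = "low"
    priority = 10


class High:
    name = "high"
    priority = 200


def unique_instances(by_extension, wildcard):
    """Flatten all registered classes, drop repeats, sort by priority."""
    seen = []
    for classes in list(by_extension.values()) + [wildcard]:
        for cls in classes:
            if cls not in seen:  # a class may be listed under several keys
                seen.append(cls)
    seen.sort(key=lambda cls: cls.priority, reverse=True)
    return [cls() for cls in seen]


# High is registered for two extensions, Low also acts as a wildcard.
by_extension = {"tif": [High, Low], "tiff": [High]}
wildcard = [Low]
instances = unique_instances(by_extension, wildcard)
print([inst.name for inst in instances])  # ['high', 'low']
```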

Returns:

Unique extractor instances sorted by priority (descending)

Return type:

list[BaseExtractor]

Examples:

>>> registry = get_registry()
>>> for ext in registry.all_extractors:
...     print(f"{ext.name}: priority {ext.priority}")

discover_plugins() → None

Auto-discover extractor plugins by walking the plugins directory.

Walks nexusLIMS/extractors/plugins/, imports all Python modules, and registers any classes that implement the BaseExtractor protocol.

This is called automatically on first use, but can be called manually to force re-discovery.
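The general technique of walking a package, importing its modules, and collecting matching classes can be sketched with the standard library. This is not the NexusLIMS code: the `discover` and `looks_like_extractor` helpers and the throwaway `demo_plugins` package are hypothetical, and the duck-typed check is only an assumption based on the attributes used in this page's examples.

```python
import importlib
import inspect
import pkgutil
import sys
import tempfile
from pathlib import Path


def looks_like_extractor(obj):
    """Duck-typed check for the attributes shown in this page's examples."""
    return (
        inspect.isclass(obj)
        and hasattr(obj, "name")
        and hasattr(obj, "priority")
        and callable(getattr(obj, "supports", None))
        and callable(getattr(obj, "extract", None))
    )


def discover(package_name):
    """Import every module under a package and collect extractor classes."""
    package = importlib.import_module(package_name)
    found = []
    for info in pkgutil.walk_packages(package.__path__, package_name + "."):
        module = importlib.import_module(info.name)
        for _, obj in inspect.getmembers(module, looks_like_extractor):
            # Skip classes that were merely imported into this module.
            if obj.__module__ == module.__name__:
                found.append(obj)
    return found


# Build a throwaway package on disk so the sketch runs anywhere.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_plugins"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "alpha.py").write_text(
    "class AlphaExtractor:\n"
    "    name = 'alpha'\n"
    "    priority = 100\n"
    "    def supports(self, context):\n"
    "        return True\n"
    "    def extract(self, context):\n"
    "        return {'nx_meta': {}}\n"
)
(pkg / "helpers.py").write_text("VALUE = 1\n")  # no extractor in here

sys.path.insert(0, str(root))
importlib.invalidate_caches()  # make sure the new directory is visible
discovered = discover("demo_plugins")
print([cls.name for cls in discovered])  # ['alpha']
```

Modules without a matching class, such as `helpers.py` here, are imported but contribute nothing to the result.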

Examples:

>>> registry = get_registry()
>>> registry.discover_plugins()
>>> extractors = registry.get_extractors_for_extension("dm3")
>>> print(f"Found {len(extractors)} extractors for .dm3 files")

register_extractor(extractor_class: type[BaseExtractor]) → None

Manually register an extractor class.

This method is called automatically during plugin discovery, but can also be used to manually register extractors (useful for testing).

Parameters:

extractor_class – The extractor class to register (not an instance)
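To illustrate how registration might keep each per-extension list sorted by priority, here is a small standalone sketch. It assumes each class declares its extensions in a `supported_extensions` attribute; that attribute, the `register` helper, and the two demo classes are hypothetical and are not confirmed by this page.

```python
from collections import defaultdict


class TiffLow:
    name = "tiff_low"
    priority = 50
    supported_extensions = {"tif", "tiff"}  # assumed attribute, see above


class TiffHigh:
    name = "tiff_high"
    priority = 150
    supported_extensions = {"tif"}  # assumed attribute, see above


def register(table, extractor_class):
    """Add a class under each of its extensions, highest priority first."""
    for ext in extractor_class.supported_extensions:
        table[ext].append(extractor_class)
        # list.sort() is stable, so equal priorities keep registration order.
        table[ext].sort(key=lambda cls: cls.priority, reverse=True)


table = defaultdict(list)
register(table, TiffLow)
register(table, TiffHigh)
print([cls.name for cls in table["tif"]])   # ['tiff_high', 'tiff_low']
print([cls.name for cls in table["tiff"]])  # ['tiff_low']
```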

Examples:

>>> class MyExtractor:
...     name = "my_extractor"
...     priority = 100
...     def supports(self, context): return True
...     def extract(self, context): return {"nx_meta": {}}
>>>
>>> registry = get_registry()
>>> registry.register_extractor(MyExtractor)

get_extractor(context: ExtractionContext) → BaseExtractor

Get the best extractor for a given file context.

Selection algorithm:

  1. Auto-discover plugins if not already done

  2. Get extractors registered for this file’s extension

  3. Try each in priority order (high to low) until one’s supports() returns True

  4. If none match, try wildcard extractors

  5. If no wildcard extractor matches either, return the BasicMetadataExtractor fallback

This method never returns None; there is always a fallback.
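The selection steps can be modelled in a few lines. The sketch below is a simplified stand-in rather than the registry's code: a plain `Path` takes the place of `ExtractionContext`, and `select`, `SpectrumOnly`, `GenericDm3`, and `Fallback` are hypothetical names.

```python
from pathlib import Path


class SpectrumOnly:
    name = "spectrum_only"
    priority = 200

    def supports(self, path):
        # Stand-in for content sniffing: only accept "spectrum" files.
        return "spectrum" in path.stem


class GenericDm3:
    name = "generic_dm3"
    priority = 100

    def supports(self, path):
        return True


class Fallback:
    name = "fallback"
    priority = 0

    def supports(self, path):
        return True


def select(path, by_extension, wildcard, fallback):
    """Walk extension-specific, then wildcard candidates, then fall back."""
    ext = path.suffix.lower().lstrip(".")
    for candidate in by_extension.get(ext, []):  # already priority-sorted
        if candidate.supports(path):
            return candidate
    for candidate in wildcard:
        if candidate.supports(path):
            return candidate
    return fallback  # guarantees a non-None result


by_extension = {"dm3": [SpectrumOnly(), GenericDm3()]}
fallback = Fallback()
for filename in ("spectrum_01.dm3", "image.dm3", "notes.xyz"):
    chosen = select(Path(filename), by_extension, [], fallback)
    print(filename, "->", chosen.name)
```

The second file shows why priority alone does not decide the outcome: the highest-priority candidate declines through `supports()`, so the next one in line is used.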

Parameters:

context – Extraction context containing file path, instrument, etc.

Returns:

The best extractor for this file (never None)

Return type:

BaseExtractor

Examples:

>>> from nexusLIMS.extractors.base import ExtractionContext
>>> from pathlib import Path
>>>
>>> context = ExtractionContext(Path("data.dm3"), None)
>>> registry = get_registry()
>>> extractor = registry.get_extractor(context)
>>> print(f"Selected: {extractor.name}")

get_extractors_for_extension(extension: str) → list[BaseExtractor]

Get all extractors registered for a specific extension.

Parameters:

extension – File extension (with or without leading dot)
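Accepting the extension with or without a leading dot implies some normalization step before lookup. A possible version is sketched below; `normalize_extension` is a hypothetical helper, and the lowercasing is an assumption that this page does not confirm.

```python
def normalize_extension(extension):
    """Return the extension without leading dots, lowercased (assumed)."""
    return extension.strip().lower().lstrip(".")


print(normalize_extension(".dm3"))  # dm3
print(normalize_extension("dm3"))   # dm3
```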

Returns:

List of extractors, sorted by priority (descending)

Return type:

list[BaseExtractor]

Examples:

>>> registry = get_registry()
>>> extractors = registry.get_extractors_for_extension("dm3")
>>> for e in extractors:
...     print(f"{e.name}: priority {e.priority}")

get_supported_extensions(exclude_fallback: bool = False) → set[str]

Get all file extensions that have registered extractors.

Parameters:

exclude_fallback – If True, exclude extensions that only have the fallback extractor
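The effect of exclude_fallback can be illustrated with a toy lookup table. This sketch assumes the fallback extractor may itself appear in per-extension lists, which this page does not state; `extensions_with_extractors` and the demo classes are hypothetical.

```python
class Dm3Extractor:
    name = "dm3"
    priority = 100


class Fallback:
    name = "fallback"
    priority = 0


def extensions_with_extractors(table, fallback_class, exclude_fallback=False):
    """Collect extensions, optionally skipping fallback-only entries."""
    result = set()
    for ext, classes in table.items():
        if not classes:
            continue  # nothing registered for this extension
        only_fallback = all(cls is fallback_class for cls in classes)
        if exclude_fallback and only_fallback:
            continue
        result.add(ext)
    return result


table = {"dm3": [Dm3Extractor, Fallback], "txt": [Fallback], "log": []}
everything = extensions_with_extractors(table, Fallback)
specialized = extensions_with_extractors(table, Fallback, exclude_fallback=True)
print(sorted(everything))   # ['dm3', 'txt']
print(sorted(specialized))  # ['dm3']
```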

Returns:

Set of extensions (without dots)

Return type:

set[str]

Examples:

>>> registry = get_registry()
>>> extensions = registry.get_supported_extensions()
>>> print(f"Supported: {', '.join(sorted(extensions))}")
>>> specialized = registry.get_supported_extensions(exclude_fallback=True)
>>> print(f"Specialized: {', '.join(sorted(specialized))}")

clear() → None

Clear all registered extractors and reset discovery state.

Primarily used for testing.

Examples:

>>> registry = get_registry()
>>> registry.clear()
>>> # Will re-discover on next use

register_preview_generator(generator_class: type[PreviewGenerator]) → None

Manually register a preview generator class.

This method is called automatically during plugin discovery, but can also be used to manually register generators (useful for testing).

Parameters:

generator_class – The preview generator class to register (not an instance)

Examples:

>>> class MyGenerator:
...     name = "my_generator"
...     priority = 100
...     def supports(self, context): return True
...     def generate(self, context, output_path): return True
>>>
>>> registry = get_registry()
>>> registry.register_preview_generator(MyGenerator)

get_preview_generator(context: ExtractionContext) → PreviewGenerator | None

Get the best preview generator for a given file context.

Selection algorithm:

  1. Auto-discover plugins if not already done

  2. Get generators registered for this file’s extension

  3. Try each in priority order (high to low) until one’s supports() returns True

  4. If none match, return None
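Because there is no fallback here, callers have to handle a None result. A compact way to express "first supporting candidate, else None" is sketched below; a plain `Path` stands in for `ExtractionContext`, and `pick_generator` and `Dm3Preview` are hypothetical names.

```python
from pathlib import Path


class Dm3Preview:
    name = "dm3_preview"
    priority = 100

    def supports(self, path):
        return path.suffix.lower() == ".dm3"


def pick_generator(path, by_extension):
    """Return the first candidate whose supports() accepts the file."""
    ext = path.suffix.lower().lstrip(".")
    candidates = by_extension.get(ext, [])  # assumed priority-sorted
    return next((g for g in candidates if g.supports(path)), None)


generators = {"dm3": [Dm3Preview()]}
for filename in ("data.dm3", "notes.xyz"):
    generator = pick_generator(Path(filename), generators)
    # Guard against None before using the generator.
    print(filename, "->", generator.name if generator else "no preview")
```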

Parameters:

context – Extraction context containing file path, instrument, etc.

Returns:

The best preview generator for this file, or None if no generator found

Return type:

PreviewGenerator | None

Examples:

>>> from nexusLIMS.extractors.base import ExtractionContext
>>> from pathlib import Path
>>>
>>> context = ExtractionContext(Path("data.dm3"), None)
>>> registry = get_registry()
>>> generator = registry.get_preview_generator(context)
>>> if generator:
...     generator.generate(context, Path("preview.png"))

nexusLIMS.extractors.registry.get_registry() → ExtractorRegistry

Get the global extractor registry (singleton).

Returns:

The global registry instance

Return type:

ExtractorRegistry
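A module-level singleton accessor of this kind is commonly written with a lazily initialised global. The sketch below shows that general pattern only; `DemoRegistry`, `_demo_registry`, and `get_demo_registry` are hypothetical names, and the real module may be organised differently.

```python
class DemoRegistry:
    """Hypothetical stand-in for the registry class."""

    def __init__(self):
        self.extractors = {}


_demo_registry = None  # created on first access


def get_demo_registry():
    """Return the shared instance, creating it the first time."""
    global _demo_registry
    if _demo_registry is None:
        _demo_registry = DemoRegistry()
    return _demo_registry


a = get_demo_registry()
b = get_demo_registry()
print(a is b)  # True
```

Routing all access through one function means state such as registered classes and cached instances is shared by every caller in the process.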

Examples:

>>> from nexusLIMS.extractors.registry import get_registry
>>> registry = get_registry()
>>> # Always returns the same instance
>>> assert get_registry() is registry