NexusLIMS Database#

Overview#

NexusLIMS uses a SQLite database to track experimental sessions and instrument configuration. The database file location is specified by NX_DB_PATH.

Key features:

SQLModel ORM Migration#

Added in version 2.2.0: NexusLIMS now uses SQLModel for database operations instead of raw SQL queries. This provides type safety, automatic datetime handling, and cleaner code.

Notable Changes#

  • The db_query() function for manual database queries has been completely removed from nexusLIMS.db.session_handler

  • EventType and RecordStatus are now enums (EventType, RecordStatus) instead of string literals

  • SessionLog and Instrument are now SQLModel classes with real object-relational mapping

  • The Instrument dataclass has been replaced with a SQLModel class - use instrument_pid field instead of name

Migration Examples#

Creating session log entries:

# New approach: has not changed significantly other than using enums
from nexusLIMS.db.models import SessionLog
from nexusLIMS.db.enums import EventType, RecordStatus
from datetime import datetime as dt

log = SessionLog(
    session_identifier="abc",
    instrument="FEI-Titan-TEM",
    timestamp=dt.fromisoformat("2025-01-15T10:00:00"),
    event_type=EventType.START,
    record_status=RecordStatus.TO_BE_BUILT,
    user="alice"
)
log.insert_log()

Querying the database:

# Old approach (deprecated)
_, results = db_query("SELECT * FROM session_log WHERE record_status = ?", ("TO_BE_BUILT",))

# New approach
from sqlmodel import Session, select
from nexusLIMS.db.engine import get_engine
from nexusLIMS.db.models import SessionLog
from nexusLIMS.db.enums import RecordStatus

with Session(get_engine()) as session:
    results = session.exec(
        select(SessionLog).where(SessionLog.record_status == RecordStatus.TO_BE_BUILT)
    ).all()

Benefits of SQLModel:

  • Type-safe database operations with IDE autocomplete and type checking

  • Automatic datetime serialization/deserialization

  • Relationship navigation (e.g., session_log.instrument_obj)

  • ~50% less boilerplate code

  • Foundation for Alembic migrations (database schema version control)

Database Migrations with Alembic#

Added in version 2.2.0: NexusLIMS now uses Alembic for database schema version control and migrations.

Alembic provides a way to track and manage changes to the database schema over time, making it safe to upgrade existing installations and ensuring schema consistency across deployments.

For Existing Installations#

If you have an existing NexusLIMS database (created before version 2.2.0), you need to mark it as migrated to the baseline schema:

# Mark existing database as consistent with the baseline pre-2.0 schema
nexuslims db alembic stamp v1_4_3

This tells Alembic that your database already has the baseline schema structure and doesn’t need to be created from scratch.

Added in version 2.5.0: The nexuslims db command provides simple, user-friendly commands for common database operations. It automatically locates the migrations directory inside the installed package, making migrations work correctly after pip/uv installation. Advanced users can access the full Alembic CLI via nexuslims db alembic [COMMAND]. The uv run alembic command still works for development from source checkouts.

Common Migration Commands#

# Check current migration status
nexuslims db current

# Check if database has pending migrations
nexuslims db check

# View migration history
nexuslims db history

# Upgrade to latest schema version
nexuslims db upgrade

# Upgrade to specific revision
nexuslims db upgrade v2_5_0a

# Downgrade one migration
nexuslims db downgrade

# Generate a new migration (development only, requires source checkout)
nexuslims db alembic revision --autogenerate -m "Description"

For advanced Alembic operations, use nexuslims db alembic [COMMAND] to access all Alembic sub-commands. From a source checkout, you can also use uv run alembic directly.

Creating New Migrations#

When you modify the database schema (by changing SessionLog or Instrument, etc.), you should create a migration:

  1. Modify the SQLModel classes in nexusLIMS/db/models.py or add new enums in nexusLIMS/db/enums.py

  2. Determine the revision ID following the naming convention:

    • Format: v[MAJOR]_[MINOR]_[PATCH][letter]_d[escription] (e.g., v2_5_0a_add_external_user_identifiers)

    • Version number: Match the target release version where this schema change will land

    • Letter suffix: Use sequential letters (a, b, c, …) for multiple migrations within the same version

    • Description: Brief, snake_case description of the change

  3. Generate migration script (requires source checkout):

    nexuslims db alembic revision --autogenerate \
        -m "add external user identifiers" \
        --rev-id "v2_5_0a"
    

    The migration will be created at:

    nexusLIMS/db/migrations/versions/v2_5_0a_add_external_user_identifiers.py
    

    Note: The filename combines the revision ID with the sanitized message for readability

  4. Review and edit the generated script in nexusLIMS/db/migrations/versions/:

    • Verify the upgrade() function creates the expected schema changes

    • Verify the downgrade() function properly reverses those changes

    • For CHECK constraints or enums, you should use hardcoded values in the migration (preserves historical accuracy, migration won’t break if enum changes later)

    • Add docstring explaining the migration’s purpose if needed

  5. Test the migration thoroughly:

    # Apply migration
    nexuslims db upgrade
    
    # Test downgrade
    nexuslims db downgrade
    
    # Re-apply
    nexuslims db upgrade head
    
  6. Add integration tests in tests/integration/test_migrations.py:

    • Test that upgrade creates expected tables/columns

    • Test that downgrade removes them

    • Test offline mode if migration has conditional logic

  7. Create changelog fragment in docs/changes/:

    # Format: {issue_number}.{change_type}.md
    # Change types: feature, bugfix, doc, removal, misc
    docs/changes/48.feature.md
    
  8. Commit the migration script and tests to version control

Note

Revision ID Format

NexusLIMS uses version-based revision IDs with letter suffixes:

  • Format: vMAJOR_MINOR_PATCHletter (e.g., v2_5_0a, v2_5_0b, v2_4_0a)

  • Benefits: Clear association with release versions, sequential ordering within a version

  • Multiple migrations per version: Use sequential letters (a, b, c, …)

  • Manual specification: Use --rev-id flag when generating migrations

Migration Configuration#

Alembic configuration is stored in:

  • pyproject.toml under [tool.alembic] - Source code configuration (migration paths, etc.)

  • nexusLIMS/migrations/env.py - Migration environment setup (automatically reads NX_DB_PATH)

  • nexusLIMS/migrations/versions/ - Migration scripts directory (shipped in the package)

The database URL is automatically set from the NX_DB_PATH environment variable in env.py, so you don’t need to configure it separately. All Alembic configuration lives in pyproject.toml, eliminating the need for a separate alembic.ini file.

The nexuslims db CLI command automatically locates the migrations directory using importlib.resources, so migrations work correctly whether NexusLIMS is installed via pip, uv, or run from source.

Important Notes#

  • Always backup your database before running migrations on production data

  • Test migrations thoroughly in a development environment first

  • Never edit applied migrations - create a new migration to fix issues

  • The initial migration (v1_4_3_initial_schema_baseline.py) creates the basic database structure that serves as a basis for later migrations

Database Structure#

NexusLIMS Database Schema Diagram

NexusLIMS Database Schema (auto-generated from SQLModel metadata)#

Note

This diagram is automatically regenerated when you build the documentation, ensuring it always reflects the current database schema. Field descriptions are extracted from the SQLModel class docstrings.

For an interactive Mermaid diagram with detailed field descriptions and relationship documentation, see the Database Schema Diagram (also auto-generated).

The database contains four primary tables:

  1. session_log - Tracks experimental sessions and record building status

    • Records when users start/end experiments

    • Tracks record building attempts and completion

    • Populated by harvesters (e.g., nemo)

  2. instruments - Stores authoritative instrument configuration

    • Instrument names and PIDs

    • Reservation system URLs

    • Data storage paths

    • Harvester configuration

  3. upload_log - Tracks record uploads to CDCS and other export destinations

    • Session identifier and instrument linkage

    • CDCS record ID and PID

    • Upload timestamps and success status

    • Error tracking for failed uploads

  4. external_user_identifiers - Maps NexusLIMS usernames to external system IDs

    • Star-topology design with nexuslims_username as canonical identifier

    • Supports NEMO, LabArchives ELN, LabArchives Scheduler, CDCS

    • Bidirectional lookup between internal and external identities

    • Tracks creation and verification timestamps

The session_log Table#

Purpose#

The session_log table tracks experimental sessions from start to finish. Harvesters (like nemo) populate this table by parsing reservation system APIs and creating timestamped event logs.

How It Works#

  1. Session Events - Each row represents a timestamped event:

    • START - User begins experiment

    • END - User completes experiment

    • RECORD_GENERATION - Record building attempted

  2. Session Linking - Events are linked by session_identifier to represent a complete experimental session

  3. Record Building Workflow:

    • Harvester creates START and END events with status TO_BE_BUILT

    • Back-end polls for sessions with TO_BE_BUILT status

    • Record builder finds files created between start/end timestamps

    • Status updated to COMPLETED or ERROR to prevent duplicates

    • See record building for details

Table Schema#

The following columns define the session_log table structure:

Column

Data type

Description

id_session_log

INTEGER

The auto-incrementing primary key identifier for this table (just a generic number).

Checks: must not be NULL

session_identifier

VARCHAR(36)

A unique string (could be a UUID) that is consistent among a single record’s "START", "END", and "RECORD_GENERATION" events.

Checks: must not be NULL

instrument

VARCHAR(100)

The instrument PID associated with this session (this value is a foreign key reference to the instruments table).

Checks: value must be one of those from the instrument_pid column of the instruments table.

timestamp

DATETIME

The date and time of the logged event in ISO timestamp format.

Default: strftime('%Y-%m-%dT%H:%M:%f', 'now', 'localtime')

Checks: must not be NULL

event_type

TEXT

The type of log for this session.

Checks: must be one of "START", "END", or "RECORD_GENERATION".

record_status

TEXT

The status of the record associated with this session. This value will be updated after a record is built for a given session.

Default: "WAITING_FOR_END"

Checks: must be one of:
"WAITING_FOR_END" - session has a start event, but no end event
"TO_BE_BUILT" - session has ended, but record not yet built
"COMPLETED" - record has been built successfully
"ERROR" - some error happened during record generation
"NO_FILES_FOUND" - record generation occurred, but no files matched time span
"NO_CONSENT" - user did not consent to data harvesting
"NO_RESERVATION" - no matching reservation found for this session

user

VARCHAR(50)

A username associated with this session (if known) – this value is not currently used by the back-end since it is not reliable across different instruments.

The instruments Table#

This table serves as the authoritative data source for the NexusLIMS back-end regarding information about the instruments in the Nexus Facility. By locating this information in an external database, changes to instrument configuration (or addition of a new instrument) requires making adjustments to just one location, simplifying maintenance of the system.

Back-end implementation details

When the nexusLIMS module is imported, one of the “setup” tasks performed is to perform a basic object-relational mapping between rows of the instruments table from the database into Instrument objects. These objects are stored in a dictionary attribute named nexusLIMS.instruments.instrument_db. This is done by querying the database specified in the environment variable NX_DB_PATH and creating a dictionary of Instrument objects that contain information about all of the instruments specified in the database. These objects are used widely throughout the code so that the database is only queried once at initial import, rather than every time information is needed.

Column

Data type

Description

instrument_pid

VARCHAR(100)

The unique identifier for an instrument in the facility, typically built from the make, model, and type of instrument, plus a unique numeric code (e.g. Vendor-Model-Type-12345 )

api_url

TEXT

The calendar API endpoint url for this instrument’s scheduler. For NEMO, should be of the format https://<nemo_address>/api/tools/?id=<tool_id>

calendar_url

TEXT

The URL to this instrument’s web-accessible calendar

location

VARCHAR(100)

The physical location of this instrument

display_name

TEXT

The human-readable instrument name displayed in NexusLIMS records

property_tag

VARCHAR(20)

A unique numeric identifier for this instrument (not used by NexusLIMS, but for reference and potential future use)

filestore_path

TEXT

The path (relative to central storage location specified in NX_INSTRUMENT_DATA_PATH) where this instrument stores its data (e.g. ./instrument_name)

harvester

TEXT

The specific submodule within nexusLIMS.harvesters that should be used to harvest reservation information for this instrument. Possible values: nemo.

timezone

TEXT

The timezone in which this instrument is located, in the format of the IANA timezone database (e.g. America/New_York). This is used to properly localize dates and times when communicating with the harvester APIs.

The upload_log Table#

Added in version 2.4.0: The upload_log table was added to track record exports to external repositories.

Purpose#

The upload_log table tracks export attempts to destination repositories like CDCS and LabArchives. It enables multi-destination export with granular success/failure tracking per destination and session.

How It Works#

  1. Per-Destination Tracking - Each export attempt creates a new row, allowing one session to be exported to multiple destinations

  2. Success/Failure Recording - Stores whether export succeeded and captures error messages if it failed

  3. Record Linkage - Stores destination-specific record IDs and URLs for successful exports

  4. Retry Logic - Failed exports can be identified and retried based on success status

Table Schema#

Column

Data type

Description

id

INTEGER

The auto-incrementing primary key identifier for this table.

Checks: must not be NULL

session_identifier

VARCHAR(36)

Foreign key reference to session_log.session_identifier.

Checks: must not be NULL
Indexed: for efficient session lookup

destination_name

VARCHAR(100)

Name of the export destination (e.g., "cdcs", "labarchives").

Checks: must not be NULL
Indexed: for destination-specific queries

success

BOOLEAN

Whether the export succeeded (TRUE) or failed (FALSE).

Checks: must not be NULL

timestamp

DATETIME

When the export attempt occurred (timezone-aware).

Checks: must not be NULL

record_id

VARCHAR(255)

Destination-specific record identifier (e.g., CDCS document ID).

Optional: NULL if export failed

record_url

VARCHAR(500)

Direct URL to view the exported record in the destination system.

Optional: NULL if export failed

error_message

TEXT

Error message if export failed.

Optional: NULL if export succeeded

metadata_json

TEXT

JSON-serialized dict with destination-specific metadata.

Optional: for storing additional export context

The external_user_identifiers Table#

Added in version 2.5.0: The external_user_identifiers table was added to map NexusLIMS usernames to external system IDs.

Purpose#

The external_user_identifiers table maintains mappings between NexusLIMS usernames and user identifiers in external systems (NEMO, LabArchives ELN, LabArchives Scheduler, CDCS). This enables bidirectional user identity lookup and supports integration with external APIs.

How It Works#

  1. Star Topology Design - Uses nexuslims_username (from session_log.user) as the canonical identifier, with mappings to each external system

  2. Bidirectional Lookup - Supports both forward lookup (NexusLIMS → external ID) and reverse lookup (external ID → NexusLIMS)

  3. Uniqueness Constraints:

    • One external ID per system per NexusLIMS user

    • One NexusLIMS user per external ID per system

  4. Verification Tracking - Records when mappings are created and last verified for auditing

Helper Functions#

The nexusLIMS.db.models module provides convenience functions:

Table Schema#

Column

Data type

Description

id

INTEGER

The auto-incrementing primary key identifier for this table.

Checks: must not be NULL

nexuslims_username

VARCHAR

Canonical username in NexusLIMS (from session_log.user).

Checks: must not be NULL
Indexed: for efficient lookup
Unique Constraint: with external_system

external_system

TEXT

External system identifier.

Checks: must be one of "nemo", "labarchives_eln", "labarchives_scheduler", "cdcs"
Unique Constraints: with nexuslims_username and with external_id

external_id

TEXT

User ID/username in the external system.

Checks: must not be NULL
Unique Constraint: with external_system

email

VARCHAR

User’s email for verification/matching purposes.

Optional: may be NULL

created_at

DATETIME

When this mapping was created (timezone-aware, UTC).

Checks: must not be NULL
Default: current UTC timestamp

last_verified_at

DATETIME

Last time this mapping was verified (timezone-aware, UTC).

Optional: NULL until first verification

notes

TEXT

Additional notes about this mapping (e.g., "OAuth registration", "From NEMO harvester").

Optional: may be NULL