nexusLIMS.schemas.activity
The “Acquisition Activity” module.
Provides a class to represent and operate on an Acquisition Activity (as defined by the
NexusLIMS Experiment schema), as well as a helper method to cluster a list of
filenames by the files’ modification times.
Module Contents
Classes
AcquisitionActivity | A collection of files/metadata attributed to a physical acquisition activity.
Functions
cluster_filelist_mtimes | Cluster a list of files by modification time.
API
- nexusLIMS.schemas.activity.cluster_filelist_mtimes(filelist: List[str]) → List[float]
Cluster a list of files by modification time.
Perform a statistical clustering of the timestamps (mtime values) of a list of files to find “relatively” large gaps in acquisition time. The definition of relatively depends on the context of the entire list of files. For example, if many files are simultaneously acquired, the “inter-file” time spacing between these will be very small (near zero), meaning even fairly short gaps between files may be important. Conversely, if files are saved every 30 seconds or so, the tolerance for a “large gap” will need to be correspondingly larger.
The approach this method uses is to detect minima in the Kernel Density Estimation (KDE) of the file modification times. To determine the optimal bandwidth parameter to use in KDE, a grid search over possible appropriate bandwidths is performed, using Leave One Out cross-validation. This approach allows the method to determine the important gaps in file acquisition times with sensitivity controlled by the distribution of the data itself, rather than a pre-supposed optimum. The KDE minima approach was suggested here.
The sensitivity of the clustering can be controlled via the NX_CLUSTERING_SENSITIVITY environment variable:
- Values > 1.0 make clustering more sensitive to time gaps (more activities)
- Values < 1.0 make clustering less sensitive (fewer activities)
- Value of 0 disables clustering entirely (all files in one activity)
- Default is 1.0 (no adjustment to automatic clustering)
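As a minimal sketch of the settings listed above, the variable can be set from Python before the clustering code runs. This page does not say whether NexusLIMS reads the value at import time or at call time, so setting it before importing the package is a cautious assumption. The helper `describe_sensitivity` is hypothetical and only restates the documented meaning of each value range.

```python
import os

# Assumption: set the variable before importing nexusLIMS, in case the
# value is read only once when the package is configured.
os.environ["NX_CLUSTERING_SENSITIVITY"] = "2.0"


def describe_sensitivity(raw=None):
    """Hypothetical helper: restate the documented meaning of a setting."""
    if raw is None:
        # Fall back to the documented default of 1.0 when unset.
        raw = os.environ.get("NX_CLUSTERING_SENSITIVITY", "1.0")
    value = float(raw)
    if value == 0:
        return "clustering disabled (all files in one activity)"
    if value > 1.0:
        return "more sensitive to time gaps (more activities)"
    if value < 1.0:
        return "less sensitive to time gaps (fewer activities)"
    return "default (no adjustment to automatic clustering)"


print(describe_sensitivity())
```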
- Parameters:
filelist (List[str]) – The files (as a list) whose timestamps will be interrogated to find “relatively” large gaps in acquisition time (as a means to find the breaks between discrete Acquisition Activities)
- Returns:
aa_boundaries – A list of the mtime values that represent boundaries between discrete Acquisition Activities. Returns an empty list if clustering is disabled or only one file is provided.
- Return type:
List[float]
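The KDE-minima idea described for cluster_filelist_mtimes can be illustrated with a small standard-library sketch. It is not the NexusLIMS implementation: the function names (`gaussian_kde`, `loo_log_likelihood`, `local_minima`, `kde_minima`), the candidate bandwidth list, and the evaluation grid are all assumptions made for illustration, and the sketch returns positions of density minima on that grid rather than the mtime values of the files themselves.

```python
import math


def gaussian_kde(samples, bandwidth, x, skip=None):
    """Gaussian KDE evaluated at x, optionally leaving one sample out."""
    norm = bandwidth * math.sqrt(2 * math.pi)
    total, count = 0.0, 0
    for i, s in enumerate(samples):
        if i == skip:
            continue
        total += math.exp(-0.5 * ((x - s) / bandwidth) ** 2) / norm
        count += 1
    return total / count


def loo_log_likelihood(samples, bandwidth):
    """Leave-one-out score: log-density of each sample under a KDE of the rest."""
    score = 0.0
    for i, s in enumerate(samples):
        density = gaussian_kde(samples, bandwidth, s, skip=i)
        score += math.log(max(density, 1e-300))  # guard against log(0)
    return score


def local_minima(xs, ys):
    """Positions of local minima of ys, using the midpoint of flat valleys."""
    minima, i, n = [], 1, len(ys)
    while i < n - 1:
        if ys[i] < ys[i - 1]:
            j = i
            while j < n - 1 and ys[j + 1] == ys[j]:
                j += 1  # walk across a plateau of equal values
            if j < n - 1 and ys[j + 1] > ys[j]:
                minima.append(0.5 * (xs[i] + xs[j]))
            i = j + 1
        else:
            i += 1
    return minima


def kde_minima(mtimes, bandwidths, grid_points=500):
    """Grid-search the bandwidth by leave-one-out, then find KDE minima."""
    if len(mtimes) < 2:
        return []
    best_bw = max(bandwidths, key=lambda bw: loo_log_likelihood(mtimes, bw))
    lo, hi = min(mtimes), max(mtimes)
    step = (hi - lo) / (grid_points - 1)
    xs = [lo + i * step for i in range(grid_points)]
    ys = [gaussian_kde(mtimes, best_bw, x) for x in xs]
    return local_minima(xs, ys)


# Two bursts of four files each, separated by a long pause.
print(kde_minima([0, 1, 2, 3, 20, 21, 22, 23], [1.0, 2.0, 4.0]))
```

For the two bursts in the usage line, the sketch reports a single boundary near the middle of the gap between them. A single input value yields an empty list, mirroring the documented behavior for one file.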
- class nexusLIMS.schemas.activity.AcquisitionActivity
A collection of files/metadata attributed to a physical acquisition activity.
Instances of this class correspond to AcquisitionActivity nodes in the NexusLIMS schema.
- Parameters:
start (datetime) – The start point of this AcquisitionActivity
end (datetime) – The end point of this AcquisitionActivity
mode (str) – The microscope mode for this AcquisitionActivity (e.g. ‘IMAGING’, ‘DIFFRACTION’, ‘SCANNING’, etc.)
unique_params (set) – A set of dictionary keys that comprises all unique metadata keys contained within the files of this AcquisitionActivity
setup_params (dict) – A dictionary containing metadata about the data that is shared amongst all data files in this AcquisitionActivity
unique_meta (list) – A list of dictionaries (one for each file in this AcquisitionActivity) containing metadata key-value pairs that are unique to each file in files (i.e. those that could not be moved into setup_params)
files (list) – A list of filenames belonging to this AcquisitionActivity
previews (list) – A list of filenames pointing to the previews for each file in files
meta (list) – A list of dictionaries containing the “important” metadata for each file in files
warnings (list) – A list of metadata values that may be untrustworthy because of the software
- start
Type: datetime.datetime | None
- end
Type: datetime.datetime | None
- mode = <Multiline-String>
Type: str
- unique_params
Type: set | None
- setup_params
Type: dict | None
- unique_meta
Type: list | None
- files
‘field(…)’
Type: list
- previews
‘field(…)’
Type: list
- meta
‘field(…)’
Type: list
- warnings
‘field(…)’
Type: list
- add_file(fname: Path, *, generate_preview=True)
Add file to AcquisitionActivity.
Add a file to this activity’s file list, parse its metadata (storing a flattened copy of it to this activity), and, unless generate_preview is False, generate a preview thumbnail.
parse_metadata always returns a list of metadata dicts (one per signal). For files containing multiple signals (e.g., multi-signal DM3/DM4 files), this method adds one entry per signal to the parallel lists, repeating the filename for each signal but using different preview paths and metadata.
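The parallel-list behavior described for add_file can be sketched as follows. Everything here is a hypothetical stand-in, not NexusLIMS code: `fake_parse_metadata` pretends that every DM3/DM4 file holds two signals, `ActivitySketch` models only the files, previews, and meta lists, and the preview naming scheme is invented for illustration.

```python
from pathlib import Path


def fake_parse_metadata(fname):
    """Hypothetical parser stand-in: returns one metadata dict per signal."""
    # Assumption for illustration only: DM3/DM4 files hold two signals.
    n_signals = 2 if fname.suffix.lower() in {".dm3", ".dm4"} else 1
    return [{"Signal Index": i, "Source": fname.name} for i in range(n_signals)]


class ActivitySketch:
    """Hypothetical, simplified model of the parallel lists."""

    def __init__(self):
        self.files, self.previews, self.meta = [], [], []

    def add_file(self, fname, *, generate_preview=True):
        for i, signal_meta in enumerate(fake_parse_metadata(fname)):
            # The filename is repeated once per signal...
            self.files.append(str(fname))
            # ...while preview path and metadata differ per signal.
            preview = f"{fname.stem}_signal{i}.thumb.png" if generate_preview else None
            self.previews.append(preview)
            self.meta.append(signal_meta)


activity = ActivitySketch()
activity.add_file(Path("multi.dm4"))
activity.add_file(Path("single.tif"))
print(activity.files)
```

Keeping the three lists the same length means that index i always refers to one signal, so a consumer can iterate over them together with zip.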
- store_unique_params()
Store unique metadata keys.
Analyze the metadata keys contained in this AcquisitionActivity and store the unique keys in a set (self.unique_params).
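A minimal sketch of this step, assuming the per-file metadata is already a list of flat dictionaries: the unique keys are the union of every dictionary's keys. The function name `collect_unique_keys` and the sample metadata are hypothetical.

```python
def collect_unique_keys(meta):
    """Return the set of all keys that appear in any per-file dict."""
    unique = set()
    for file_meta in meta:
        unique.update(file_meta.keys())
    return unique


sample_meta = [
    {"Voltage": 200, "Magnification": 10000},
    {"Voltage": 200, "Camera Length": 0.5},
]
print(sorted(collect_unique_keys(sample_meta)))
```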
- store_setup_params(values_to_search=None)
Store common metadata keys as “setup parameters”.
Search the metadata of files in this AcquisitionActivity for those containing identical values over all files, which will then be defined as parameters attributed to experimental setup, rather than individual datasets.
Stores a dictionary containing the metadata keys and values that are consistent across all files in this AcquisitionActivity as an attribute (self.setup_params).
- Parameters:
values_to_search (list) – A list (or tuple, set, or other iterable type) containing values to search for in the metadata dictionary list. If None (default), all values contained in any file will be searched.
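The search for shared values might look like the sketch below. It assumes flat per-file dictionaries, treats the entries of values_to_search as metadata keys, and counts a key as a setup parameter only if it is present in every file with an equal value. Those are assumptions of this sketch, and `find_setup_params` is a hypothetical name.

```python
def find_setup_params(meta, values_to_search=None):
    """Return the key/value pairs that are identical across every file."""
    if not meta:
        return {}
    # With no explicit list, consider every key seen in any file.
    keys = values_to_search if values_to_search is not None else set().union(*meta)
    missing = object()  # sentinel that never equals a real metadata value
    setup = {}
    for key in keys:
        first = meta[0].get(key, missing)
        if first is not missing and all(m.get(key, missing) == first for m in meta):
            setup[key] = first
    return setup


sample_meta = [
    {"Voltage": 200, "Magnification": 10000, "Mode": "TEM"},
    {"Voltage": 200, "Magnification": 20000},
]
print(find_setup_params(sample_meta))
```

In this sample only Voltage qualifies: Magnification differs between the files, and Mode is missing from the second one.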
- store_unique_metadata()
Store unique metadata keys as unique to each file.
For each file in this AcquisitionActivity, stores the metadata that is unique rather than common to the entire AcquisitionActivity (the common values are kept in self.setup_params).
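One way to picture this split, assuming the shared values have already been collected into a dictionary, is to drop every shared key from each file's metadata. The function name `split_unique_metadata` and the sample data are hypothetical.

```python
def split_unique_metadata(meta, setup_params):
    """Return one dict per file holding only keys not in setup_params."""
    return [
        {key: value for key, value in file_meta.items() if key not in setup_params}
        for file_meta in meta
    ]


sample_meta = [
    {"Voltage": 200, "Magnification": 10000},
    {"Voltage": 200, "Magnification": 20000},
]
shared = {"Voltage": 200}
print(split_unique_metadata(sample_meta, shared))
```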
- as_xml(seqno, sample_id)
Translate AcquisitionActivity to an XML representation.
Build an XML (lxml) representation of this AcquisitionActivity (for use in instances of the NexusLIMS schema).
- Parameters:
- Returns:
activity_xml – A string representing this AcquisitionActivity (note: is not a properly-formed complete XML document since it does not have a header or namespace definitions)
- Return type: