pyXLMS.pipelines package

Module contents

Predefined data transformation pipelines for crosslink-spectrum-matches and crosslinks.

Examples

>>> from pyXLMS.pipelines import pipeline
>>> pr = pipeline(
...     "data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_CSMs.xlsx",
...     engine="MS Annika",
...     crosslinker="DSS",
...     unique=True,
...     validate={"fdr": 0.05, "formula": "(TD-DD)/TT"},
...     targets_only=True,
... )
Reading MS Annika CSMs...: 100%|██████████████████████████████████████████████████| 826/826 [00:00<00:00, 10337.98it/s]
---- Summary statistics before pipeline ----
Number of CSMs: 826.0
Number of unique CSMs: 826.0
Number of intra CSMs: 803.0
Number of inter CSMs: 23.0
Number of target-target CSMs: 786.0
Number of target-decoy CSMs: 39.0
Number of decoy-decoy CSMs: 1.0
Minimum CSM score: 1.11
Maximum CSM score: 452.99
Iterating over scores for FDR calculation...:   0%|                                            | 0/826 [00:00<?, ?it/s]
---- Summary statistics after pipeline ----
Number of CSMs: 786.0
Number of unique CSMs: 786.0
Number of intra CSMs: 774.0
Number of inter CSMs: 12.0
Number of target-target CSMs: 786.0
Number of target-decoy CSMs: 0.0
Number of decoy-decoy CSMs: 0.0
Minimum CSM score: 1.28
Maximum CSM score: 452.99
---- Performed pipeline steps ----
:: parser.read() ::
:: parser.read() :: params :: <params omitted>
:: transform.unique() ::
:: transform.unique() :: params :: by=peptide
:: transform.unique() :: params :: score=higher_better
:: transform.validate() ::
:: transform.validate() :: params :: fdr=0.05
:: transform.validate() :: params :: formula=(TD-DD)/TT
:: transform.validate() :: params :: score=higher_better
:: transform.validate() :: params :: separate_intra_inter=False
:: transform.validate() :: params :: ignore_missing_labels=False
:: transform.targets_only() ::
:: transform.targets_only() :: params :: no params
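As a rough cross-check of the numbers in the summary above, the formula string "(TD-DD)/TT" can be evaluated by hand from the target-target, target-decoy, and decoy-decoy counts reported before the pipeline ran. The helper name `estimate_fdr` is hypothetical and not part of pyXLMS, and the sketch only evaluates the formula on the whole data set; it does not reproduce how transform.validate() applies the threshold internally.

```python
def estimate_fdr(tt: int, td: int, dd: int) -> float:
    """Evaluate the (TD-DD)/TT formula for the given match counts.

    Hypothetical helper for illustration only, not a pyXLMS function.
    """
    if tt <= 0:
        raise ValueError("at least one target-target match is required")
    return (td - dd) / tt


# Counts copied from the "Summary statistics before pipeline" block.
fdr = estimate_fdr(tt=786, td=39, dd=1)
print(f"{fdr:.4f}")  # prints 0.0483
```

With these counts the estimate over all 826 CSMs is already below the requested 0.05, which is consistent with all 786 target-target CSMs remaining after the pipeline.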
pyXLMS.pipelines.pipeline(
files: str | List[str] | BinaryIO,
engine: Literal['Custom', 'MaxQuant', 'MaxLynx', 'MeroX', 'MS Annika', 'mzIdentML', 'pLink', 'Scout', 'xiSearch/xiFDR', 'xiNET/xiVIEW', 'XlinkX'],
crosslinker: str,
unique: bool | Dict[str, Any] | None = True,
validate: bool | Dict[str, Any] | None = True,
targets_only: bool | None = True,
**kwargs,
) → ParserResult

Runs a standard down-stream analysis pipeline for crosslinks and crosslink-spectrum-matches.

The pipeline first reads a result file. It then optionally filters the data for unique crosslinks and crosslink-spectrum-matches, optionally validates the data by false discovery rate estimation, and optionally returns only target-target matches. Internally, the pipeline calls parser.read(), transform.unique(), transform.validate(), and transform.targets_only(), in that order.
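The control flow described here can be sketched with toy stand-ins. Everything in the block below (`toy_unique`, `toy_validate`, `toy_targets_only`, `run_toy_pipeline`, and the record layout) is hypothetical and operates on plain dictionaries rather than on a ParserResult; it only illustrates how optional steps are chained in a fixed order, and the validation stand-in is one simple way to apply a (TD-DD)/TT cutoff, not necessarily the one pyXLMS uses.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def toy_unique(records: List[Record]) -> List[Record]:
    """Keep the highest-scoring record per peptide pair (stand-in step)."""
    best: Dict[str, Record] = {}
    for rec in records:
        key = rec["peptides"]
        if key not in best or rec["score"] > best[key]["score"]:
            best[key] = rec
    return list(best.values())


def toy_validate(records: List[Record], fdr: float = 0.05) -> List[Record]:
    """Keep the largest top-scoring set whose (TD-DD)/TT estimate is <= fdr."""
    ranked = sorted(records, key=lambda rec: rec["score"], reverse=True)
    counts = {"TT": 0, "TD": 0, "DD": 0}
    keep = 0
    for i, rec in enumerate(ranked, start=1):
        counts[rec["label"]] += 1
        if counts["TT"] > 0 and (counts["TD"] - counts["DD"]) / counts["TT"] <= fdr:
            keep = i
    return ranked[:keep]


def toy_targets_only(records: List[Record]) -> List[Record]:
    """Drop every record that involves a decoy (stand-in step)."""
    return [rec for rec in records if rec["label"] == "TT"]


def run_toy_pipeline(
    records: List[Record],
    unique: bool = True,
    validate: bool = True,
    targets_only: bool = True,
) -> Tuple[List[Record], List[str]]:
    """Apply the enabled steps in a fixed order and log which ones ran."""
    performed: List[str] = []
    if unique:
        records = toy_unique(records)
        performed.append("unique")
    if validate:
        records = toy_validate(records)
        performed.append("validate")
    if targets_only:
        records = toy_targets_only(records)
        performed.append("targets_only")
    return records, performed


records = [
    {"peptides": "A-B", "score": 90.0, "label": "TT"},
    {"peptides": "A-B", "score": 40.0, "label": "TT"},
    {"peptides": "C-D", "score": 70.0, "label": "TT"},
    {"peptides": "E-F", "score": 5.0, "label": "TD"},
    {"peptides": "G-H", "score": 3.0, "label": "TT"},
]

full, full_steps = run_toy_pipeline(records)
loose, loose_steps = run_toy_pipeline(records, validate=False)
print(full_steps, [rec["peptides"] for rec in full])
print(loose_steps, [rec["peptides"] for rec in loose])
```

Skipping the validation step keeps the low-scoring target-target record "G-H", which the cutoff would otherwise remove together with the decoy hit.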

Parameters:
  • files (str, list of str, or file stream) – The name/path of the result file(s) or a file-like object/stream.

  • engine ("Custom", "MaxQuant", "MaxLynx", "MeroX", "MS Annika", "mzIdentML", "pLink", "Scout", "xiSearch/xiFDR", "xiNET/xiVIEW", or "XlinkX") – Crosslink search engine or format of the result file.

  • crosslinker (str) – Name of the cross-linking reagent used, for example “DSSO”.

  • unique (dict of str, any, or bool, or None, default = True) – Whether transform.unique() should be run in the pipeline. If None or False, this step is omitted. If True, this step is run with default parameters. If a dictionary is given, it should contain parameters for transform.unique(); any parameter omitted from the dictionary falls back to its default value.

  • validate (dict of str, any, or bool, or None, default = True) – Whether transform.validate() should be run in the pipeline. If None or False, this step is omitted. If True, this step is run with default parameters. If a dictionary is given, it should contain parameters for transform.validate(); any parameter omitted from the dictionary falls back to its default value.

  • targets_only (bool, or None, default = True) – Whether transform.targets_only() should be run in the pipeline. If None or False, this step is omitted.

  • **kwargs – Any additional parameters will be passed to the specific result file parsers.
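The bool / dict / None convention used by unique and validate can be sketched as follows. The helper `resolve_step_params` is hypothetical and not part of pyXLMS. The default values for the unique step are the ones reported in the example output (by=peptide, score=higher_better), while the override value "lower_better" is only an illustrative assumption.

```python
from typing import Any, Dict, Optional, Union

StepOption = Union[bool, Dict[str, Any], None]


def resolve_step_params(
    option: StepOption, defaults: Dict[str, Any]
) -> Optional[Dict[str, Any]]:
    """Turn a step option into keyword arguments, or None to skip the step.

    Hypothetical helper that mirrors the documented behaviour:
    None/False -> skip, True -> defaults, dict -> defaults overridden by dict.
    """
    if option is None or option is False:
        return None
    if option is True:
        return dict(defaults)
    if isinstance(option, dict):
        return {**defaults, **option}
    raise TypeError(f"expected bool, dict or None, got {type(option).__name__}")


# Defaults as printed in the "Performed pipeline steps" log for unique=True.
UNIQUE_DEFAULTS = {"by": "peptide", "score": "higher_better"}

skipped = resolve_step_params(False, UNIQUE_DEFAULTS)
default_run = resolve_step_params(True, UNIQUE_DEFAULTS)
# "lower_better" is an assumed, illustrative override value.
custom_run = resolve_step_params({"score": "lower_better"}, UNIQUE_DEFAULTS)

print(skipped)
print(default_run)
print(custom_run)
```

Merging the dictionary over a copy of the defaults keeps the defaults themselves unchanged, so the same defaults can be reused across calls.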

Returns:

The transformed parser_result after all pipeline steps are completed.

Return type:

ParserResult

Raises:

TypeError – If any of the parameters do not have the correct type.

Notes

The pipeline also prints summary statistics before and after processing, as well as the performed steps and their parameters, to stdout.
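Because this information goes to stdout rather than into the return value, one way to keep it is to capture stdout around the call. The sketch below uses only the standard library. `fake_pipeline` is a hypothetical stand-in that prints a few lines in the format of the example output, and `parse_step_log` is likewise a hypothetical helper, not a pyXLMS function.

```python
import contextlib
import io
from typing import Dict


def fake_pipeline() -> None:
    """Stand-in that prints lines shaped like the pipeline's step log."""
    print("---- Performed pipeline steps ----")
    print(":: transform.unique() ::")
    print(":: transform.unique() :: params :: by=peptide")
    print(":: transform.unique() :: params :: score=higher_better")
    print(":: transform.targets_only() ::")
    print(":: transform.targets_only() :: params :: no params")


def parse_step_log(text: str) -> Dict[str, Dict[str, str]]:
    """Collect key=value parameters per step from ':: step :: params :: k=v' lines."""
    steps: Dict[str, Dict[str, str]] = {}
    for line in text.splitlines():
        if not line.startswith("::"):
            continue  # headers and summary statistics are ignored here
        parts = [part.strip() for part in line.split("::") if part.strip()]
        params = steps.setdefault(parts[0], {})
        if len(parts) == 3 and parts[1] == "params" and "=" in parts[2]:
            key, value = parts[2].split("=", 1)
            params[key] = value
    return steps


# Capture everything printed while the (stand-in) pipeline runs.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    fake_pipeline()

step_params = parse_step_log(buffer.getvalue())
print(step_params)
```

Entries such as "no params" contain no "=" and are therefore recorded as a step with an empty parameter dictionary.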
