pyXLMS.pipelines package
Module contents
Predefined data transformation pipelines for crosslink-spectrum-matches and crosslinks.
Examples
>>> from pyXLMS.pipelines import pipeline
>>> pr = pipeline(
... "data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_CSMs.xlsx",
... engine="MS Annika",
... crosslinker="DSS",
... unique=True,
... validate={"fdr": 0.05, "formula": "(TD-DD)/TT"},
... targets_only=True,
... )
Reading MS Annika CSMs...: 100%|██████████████████████████████████████████████████| 826/826 [00:00<00:00, 10337.98it/s]
---- Summary statistics before pipeline ----
Number of CSMs: 826.0
Number of unique CSMs: 826.0
Number of intra CSMs: 803.0
Number of inter CSMs: 23.0
Number of target-target CSMs: 786.0
Number of target-decoy CSMs: 39.0
Number of decoy-decoy CSMs: 1.0
Minimum CSM score: 1.11
Maximum CSM score: 452.99
Iterating over scores for FDR calculation...: 0%| | 0/826 [00:00<?, ?it/s]
---- Summary statistics after pipeline ----
Number of CSMs: 786.0
Number of unique CSMs: 786.0
Number of intra CSMs: 774.0
Number of inter CSMs: 12.0
Number of target-target CSMs: 786.0
Number of target-decoy CSMs: 0.0
Number of decoy-decoy CSMs: 0.0
Minimum CSM score: 1.28
Maximum CSM score: 452.99
---- Performed pipeline steps ----
:: parser.read() ::
:: parser.read() :: params :: <params omitted>
:: transform.unique() ::
:: transform.unique() :: params :: by=peptide
:: transform.unique() :: params :: score=higher_better
:: transform.validate() ::
:: transform.validate() :: params :: fdr=0.05
:: transform.validate() :: params :: formula=(TD-DD)/TT
:: transform.validate() :: params :: score=higher_better
:: transform.validate() :: params :: separate_intra_inter=False
:: transform.validate() :: params :: ignore_missing_labels=False
:: transform.targets_only() ::
:: transform.targets_only() :: params :: no params
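The counts in the summary before the pipeline are enough to check the validation step by hand. The short sketch below assumes that the formula string "(TD-DD)/TT" means target-decoy matches minus decoy-decoy matches, divided by target-target matches. This reading is an interpretation of the labels and is not spelled out on this page.

```python
# Counts copied from "Summary statistics before pipeline" above.
target_target = 786
target_decoy = 39
decoy_decoy = 1

# Assumed reading of the formula string "(TD-DD)/TT".
fdr_all = (target_decoy - decoy_decoy) / target_target

print(f"Estimated FDR over all input CSMs: {fdr_all:.4f}")
print(f"Below the requested threshold of 0.05: {fdr_all < 0.05}")
```

Under this reading the complete input already sits just below the 5% threshold, which is consistent with all 786 target-target CSMs remaining once the decoy matches are removed.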
pyXLMS.pipelines.pipeline(
    files: str | List[str] | BinaryIO,
    engine: Literal['Custom', 'MaxQuant', 'MaxLynx', 'MeroX', 'MS Annika', 'mzIdentML', 'pLink', 'Scout', 'xiSearch/xiFDR', 'xiNET/xiVIEW', 'XlinkX'],
    crosslinker: str,
    unique: bool | Dict[str, Any] | None = True,
    validate: bool | Dict[str, Any] | None = True,
    targets_only: bool | None = True,
    **kwargs,
)
Runs a standard downstream analysis pipeline for crosslinks and crosslink-spectrum-matches.
The pipeline first reads a result file. It then optionally filters the read data for unique crosslinks and crosslink-spectrum-matches, optionally validates the data by false discovery rate estimation, and optionally returns only target-target matches. Internally the pipeline calls parser.read(), transform.unique(), transform.validate(), and transform.targets_only().
- Parameters:
files (str, list of str, or file stream) – The name/path of the result file(s) or a file-like object/stream.
engine ("Custom", "MaxQuant", "MaxLynx", "MeroX", "MS Annika", "mzIdentML", "pLink", "Scout", "xiSearch/xiFDR", "xiNET/xiVIEW", or "XlinkX") – Crosslink search engine or format of the result file.
crosslinker (str) – Name of the used cross-linking reagent, for example “DSSO”.
unique (dict of str, any, or bool, or None, default = True) – If transform.unique() should be run in the pipeline. If None or False this step is omitted. If True this step is run with default parameters. If a dictionary is given it should contain parameters for running transform.unique(). Omitting a parameter in the dictionary will fall back to its default value.
validate (dict of str, any, or bool, or None, default = True) – If transform.validate() should be run in the pipeline. If None or False this step is omitted. If True this step is run with default parameters. If a dictionary is given it should contain parameters for running transform.validate(). Omitting a parameter in the dictionary will fall back to its default value.
targets_only (bool, or None, default = True) – If transform.targets_only() should be run in the pipeline. If None or False this step is omitted.
**kwargs – Any additional parameters will be passed to the specific result file parsers.
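The three accepted forms of the unique and validate parameters (None or False, True, or a dictionary) can be illustrated with a small sketch. The helper resolve_step_params below is hypothetical and not part of pyXLMS; it only mirrors the behaviour described in the parameter list. The defaults dictionary contains just the three values that the example log prints for parameters the caller did not pass, and is used here for illustration only.

```python
from typing import Any, Dict, Optional, Union

StepOption = Union[bool, Dict[str, Any], None]


def resolve_step_params(
    option: StepOption, defaults: Dict[str, Any]
) -> Optional[Dict[str, Any]]:
    """Return the parameters for a step, or None if the step is skipped.

    Hypothetical helper for illustration; not the pyXLMS implementation.
    """
    if option is None or option is False:
        return None  # step is omitted
    if option is True:
        return dict(defaults)  # step runs with default parameters
    if isinstance(option, dict):
        # Given keys win; omitted keys fall back to their defaults.
        return {**defaults, **option}
    raise TypeError(f"expected bool, dict or None, got {type(option).__name__}")


# Illustrative defaults, taken from the example log (assumption).
validate_defaults = {
    "score": "higher_better",
    "separate_intra_inter": False,
    "ignore_missing_labels": False,
}

params = resolve_step_params(
    {"fdr": 0.05, "formula": "(TD-DD)/TT"}, validate_defaults
)
for name, value in params.items():
    print(f"{name}={value}")
```

Passing a value that is neither a bool, a dictionary, nor None raises a TypeError in this sketch, in line with the Raises entry of the function.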
- Returns:
The transformed parser_result after all pipeline steps are completed.
- Return type:
- Raises:
TypeError – If any of the parameters do not have the correct type.
Notes
Various helpful pipeline information is also printed to stdout.
Examples
>>> from pyXLMS.pipelines import pipeline
>>> pr = pipeline(
... "data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_CSMs.xlsx",
... engine="MS Annika",
... crosslinker="DSS",
... unique=True,
... validate={"fdr": 0.05, "formula": "(TD-DD)/TT"},
... targets_only=True,
... )
Reading MS Annika CSMs...: 100%|██████████████████████████████████████████████████| 826/826 [00:00<00:00, 10337.98it/s]
---- Summary statistics before pipeline ----
Number of CSMs: 826.0
Number of unique CSMs: 826.0
Number of intra CSMs: 803.0
Number of inter CSMs: 23.0
Number of target-target CSMs: 786.0
Number of target-decoy CSMs: 39.0
Number of decoy-decoy CSMs: 1.0
Minimum CSM score: 1.11
Maximum CSM score: 452.99
Iterating over scores for FDR calculation...: 0%| | 0/826 [00:00<?, ?it/s]
---- Summary statistics after pipeline ----
Number of CSMs: 786.0
Number of unique CSMs: 786.0
Number of intra CSMs: 774.0
Number of inter CSMs: 12.0
Number of target-target CSMs: 786.0
Number of target-decoy CSMs: 0.0
Number of decoy-decoy CSMs: 0.0
Minimum CSM score: 1.28
Maximum CSM score: 452.99
---- Performed pipeline steps ----
:: parser.read() ::
:: parser.read() :: params :: <params omitted>
:: transform.unique() ::
:: transform.unique() :: params :: by=peptide
:: transform.unique() :: params :: score=higher_better
:: transform.validate() ::
:: transform.validate() :: params :: fdr=0.05
:: transform.validate() :: params :: formula=(TD-DD)/TT
:: transform.validate() :: params :: score=higher_better
:: transform.validate() :: params :: separate_intra_inter=False
:: transform.validate() :: params :: ignore_missing_labels=False
:: transform.targets_only() ::
:: transform.targets_only() :: params :: no params
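To make the order of the optional steps concrete, the following toy sketch chains the same three ideas (unique filtering, FDR-based validation, target-only filtering) on a handful of made-up records. Nothing here is pyXLMS code: the Match class, the function names, and the cutoff search are assumptions chosen for illustration, and the FDR estimate assumes the "(TD-DD)/TT" reading of the formula. Score ties are ignored for simplicity.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Match:
    """Toy stand-in for a crosslink-spectrum-match (hypothetical layout)."""
    peptide: str
    score: float
    label: str  # "TT", "TD" or "DD"


def keep_unique(matches: List[Match]) -> List[Match]:
    """Keep the highest-scoring match per peptide."""
    best: Dict[str, Match] = {}
    for m in matches:
        if m.peptide not in best or m.score > best[m.peptide].score:
            best[m.peptide] = m
    return list(best.values())


def estimate_fdr(matches: List[Match]) -> float:
    """Estimate the FDR as (TD - DD) / TT over the given matches."""
    tt = sum(m.label == "TT" for m in matches)
    td = sum(m.label == "TD" for m in matches)
    dd = sum(m.label == "DD" for m in matches)
    return (td - dd) / tt if tt else float("inf")


def validate(matches: List[Match], fdr: float) -> List[Match]:
    """Keep the largest top-scoring subset whose estimated FDR is <= fdr."""
    ranked = sorted(matches, key=lambda m: m.score, reverse=True)
    for n in range(len(ranked), 0, -1):
        if estimate_fdr(ranked[:n]) <= fdr:
            return ranked[:n]
    return []


def keep_targets(matches: List[Match]) -> List[Match]:
    """Keep only target-target matches."""
    return [m for m in matches if m.label == "TT"]


def toy_pipeline(
    matches: List[Match],
    unique: bool = True,
    fdr: Optional[float] = 0.05,
    targets_only: bool = True,
) -> List[Match]:
    """Run the optional steps in the order described for the pipeline."""
    if unique:
        matches = keep_unique(matches)
    if fdr is not None:
        matches = validate(matches, fdr)
    if targets_only:
        matches = keep_targets(matches)
    return matches


toy_data = [
    Match("PEPK", 90.0, "TT"),
    Match("PEPK", 40.0, "TT"),  # lower-scoring duplicate
    Match("KLMN", 80.0, "TT"),
    Match("QRSK", 70.0, "TT"),
    Match("DECK", 60.0, "TD"),
    Match("TUVK", 50.0, "TT"),
    Match("YEDK", 20.0, "TD"),
    Match("WXYK", 10.0, "TT"),
]

result = toy_pipeline(toy_data, unique=True, fdr=0.25, targets_only=True)
print([m.peptide for m in result])
```

With this toy data the unique step drops the lower-scoring PEPK duplicate, the validation step drops the two lowest-scoring entries, and the final step removes the remaining target-decoy entry.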