pyXLMS package
Subpackages
- pyXLMS.exporter package
- Submodules
- pyXLMS.exporter.to_impxfdr module
- pyXLMS.exporter.to_msannika module
- pyXLMS.exporter.to_pyxlinkviewer module
- pyXLMS.exporter.to_xifdr module
- pyXLMS.exporter.to_xinet module
- pyXLMS.exporter.to_xiview module
- pyXLMS.exporter.to_xlinkdb module
- pyXLMS.exporter.to_xlmstools module
- pyXLMS.exporter.to_xmas module
- pyXLMS.exporter.util module
- Module contents
- pyXLMS.parser package
- Submodules
- pyXLMS.parser.parser_xldbse_custom module
- pyXLMS.parser.parser_xldbse_maxquant module
- pyXLMS.parser.parser_xldbse_merox module
- pyXLMS.parser.parser_xldbse_msannika module
- pyXLMS.parser.parser_xldbse_mzid module
- pyXLMS.parser.parser_xldbse_plink module
- pyXLMS.parser.parser_xldbse_scout module
- pyXLMS.parser.parser_xldbse_xi module
- pyXLMS.parser.parser_xldbse_xlinkx module
- pyXLMS.parser.util module
- Module contents
detect_plink_filetype()
detect_scout_filetype()
detect_xi_filetype()
parse_modifications_from_maxquant_sequence()
parse_modifications_from_scout_sequence()
parse_modifications_from_xi_sequence()
parse_peptide()
parse_scan_nr_from_mzid()
parse_scan_nr_from_plink()
parse_spectrum_file_from_plink()
pyxlms_modification_str_parser()
read()
read_custom()
read_maxlynx()
read_maxquant()
read_merox()
read_msannika()
read_mzid()
read_plink()
read_scout()
read_xi()
read_xlinkx()
- pyXLMS.plotting package
- pyXLMS.transform package
- Submodules
- pyXLMS.transform.aggregate module
- pyXLMS.transform.filter module
- pyXLMS.transform.reannotate_positions module
- pyXLMS.transform.summary module
- pyXLMS.transform.targets_only module
- pyXLMS.transform.to_dataframe module
- pyXLMS.transform.to_proforma module
- pyXLMS.transform.util module
- pyXLMS.transform.validate module
- Module contents
Submodules
pyXLMS.constants module
- pyXLMS.constants.AMINO_ACIDS = {'A', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'Y'}
Set of valid amino acids.
Set of one-letter codes for all valid amino acids.
Examples
>>> from pyXLMS.constants import AMINO_ACIDS
>>> "A" in AMINO_ACIDS
True
>>> "B" in AMINO_ACIDS
False
- pyXLMS.constants.AMINO_ACIDS_1TO3 = {'A': 'ALA', 'C': 'CYS', 'D': 'ASP', 'E': 'GLU', 'F': 'PHE', 'G': 'GLY', 'H': 'HIS', 'I': 'ILE', 'K': 'LYS', 'L': 'LEU', 'M': 'MET', 'N': 'ASN', 'P': 'PRO', 'Q': 'GLN', 'R': 'ARG', 'S': 'SER', 'T': 'THR', 'V': 'VAL', 'W': 'TRP', 'Y': 'TYR'}
Mapping of amino acid 1-letter codes to their 3-letter codes.
Mapping of all amino acid 1-letter codes to their corresponding 3-letter codes.
Examples
>>> from pyXLMS.constants import AMINO_ACIDS_1TO3
>>> AMINO_ACIDS_1TO3["G"]
'GLY'
- pyXLMS.constants.AMINO_ACIDS_3TO1 = {'ALA': 'A', 'ARG': 'R', 'ASN': 'N', 'ASP': 'D', 'CYS': 'C', 'GLN': 'Q', 'GLU': 'E', 'GLY': 'G', 'HIS': 'H', 'ILE': 'I', 'LEU': 'L', 'LYS': 'K', 'MET': 'M', 'PHE': 'F', 'PRO': 'P', 'SER': 'S', 'THR': 'T', 'TRP': 'W', 'TYR': 'Y', 'VAL': 'V'}
Mapping of amino acid 3-letter codes to their 1-letter codes.
Mapping of all amino acid 3-letter codes to their corresponding 1-letter codes.
Examples
>>> from pyXLMS.constants import AMINO_ACIDS_3TO1
>>> AMINO_ACIDS_3TO1["GLY"]
'G'
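The three amino acid constants are not fully aligned: AMINO_ACIDS also accepts “J”, “O” and “U”, which have no entry in AMINO_ACIDS_1TO3 or AMINO_ACIDS_3TO1. The sketch below copies the documented values (so it runs without pyXLMS installed) and uses a hypothetical helper, to_three_letter, that converts a sequence and fails loudly on codes without a 3-letter mapping. It is an illustration, not part of the pyXLMS API.

```python
# Values copied from pyXLMS.constants as documented above.
AMINO_ACIDS = {"A", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M",
               "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "Y"}
AMINO_ACIDS_1TO3 = {"A": "ALA", "C": "CYS", "D": "ASP", "E": "GLU", "F": "PHE",
                    "G": "GLY", "H": "HIS", "I": "ILE", "K": "LYS", "L": "LEU",
                    "M": "MET", "N": "ASN", "P": "PRO", "Q": "GLN", "R": "ARG",
                    "S": "SER", "T": "THR", "V": "VAL", "W": "TRP", "Y": "TYR"}
# Inverted here; per the documented values this equals AMINO_ACIDS_3TO1.
AMINO_ACIDS_3TO1 = {three: one for one, three in AMINO_ACIDS_1TO3.items()}


def to_three_letter(sequence: str) -> list:
    """Hypothetical helper: convert a 1-letter sequence to 3-letter codes."""
    unmapped = sorted(set(sequence) - set(AMINO_ACIDS_1TO3))
    if unmapped:
        raise KeyError(f"no 3-letter code for: {', '.join(unmapped)}")
    return [AMINO_ACIDS_1TO3[aa] for aa in sequence]


# Valid one-letter codes that have no 3-letter mapping.
print(sorted(AMINO_ACIDS - set(AMINO_ACIDS_1TO3)))  # ['J', 'O', 'U']
print("-".join(to_three_letter("PEPTIDE")))  # PRO-GLU-PRO-THR-ILE-ASP-GLU
```

Raising instead of silently skipping unmapped residues keeps sequences containing “J”, “O” or “U” from being shortened without notice.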
- pyXLMS.constants.CROSSLINKERS = {'ADH': 138.09054635, 'BS3': 138.06808, 'DSBSO': 308.03883, 'DSBU': 196.08479231, 'DSS': 138.06808, 'DSSO': 158.00376, 'PhoX': 209.97181}
Dictionary of crosslinkers.
Dictionary of pre-defined crosslinkers that maps crosslinker names to crosslinker delta masses. Currently contains “BS3”, “DSS”, “DSSO”, “DSBU”, “ADH”, “DSBSO”, and “PhoX”.
Examples
>>> from pyXLMS.constants import CROSSLINKERS
>>> CROSSLINKERS["BS3"]
138.06808
- pyXLMS.constants.MEROX_MODIFICATION_MAPPING = {'B': {'Amino Acid': 'C', 'Modification': ('Carbamidomethyl', 57.021464)}, 'm': {'Amino Acid': 'M', 'Modification': ('Oxidation', 15.994915)}}
Dictionary that maps MeroX modification symbols to their corresponding amino acids and post-translational modifications.
Dictionary that maps MeroX modification symbols (e.g. “B”) to their corresponding amino acids and post-translational modifications (e.g. {"Amino Acid": "C", "Modification": ("Carbamidomethyl", 57.021464)}).
Examples
>>> from pyXLMS.constants import MEROX_MODIFICATION_MAPPING
>>> MEROX_MODIFICATION_MAPPING["B"]
{'Amino Acid': 'C', 'Modification': ('Carbamidomethyl', 57.021464)}
>>> MEROX_MODIFICATION_MAPPING["m"]
{'Amino Acid': 'M', 'Modification': ('Oxidation', 15.994915)}
>>> MEROX_MODIFICATION_MAPPING
{'B': {'Amino Acid': 'C', 'Modification': ('Carbamidomethyl', 57.021464)}, 'm': {'Amino Acid': 'M', 'Modification': ('Oxidation', 15.994915)}}
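Assuming that each MeroX symbol stands in for the residue it modifies (for example “B” in place of a carbamidomethylated “C”), the mapping is enough to recover both the unmodified sequence and a modifications dictionary in the format used by data.create_csm(), which maps 1-based positions to (name, delta mass) tuples. The helper decode_merox_sequence below is hypothetical. It is not the pyXLMS MeroX parser, which may handle additional cases such as terminal modifications.

```python
from typing import Dict, Tuple

# Copied from pyXLMS.constants.MEROX_MODIFICATION_MAPPING as documented above.
MEROX_MODIFICATION_MAPPING = {
    "B": {"Amino Acid": "C", "Modification": ("Carbamidomethyl", 57.021464)},
    "m": {"Amino Acid": "M", "Modification": ("Oxidation", 15.994915)},
}


def decode_merox_sequence(
    sequence: str,
) -> Tuple[str, Dict[int, Tuple[str, float]]]:
    """Hypothetical helper: split a MeroX sequence into the unmodified
    sequence and a {1-based position: (name, delta mass)} dictionary."""
    residues = []
    modifications: Dict[int, Tuple[str, float]] = {}
    for position, symbol in enumerate(sequence, start=1):
        entry = MEROX_MODIFICATION_MAPPING.get(symbol)
        if entry is None:
            # Ordinary residue: keep the symbol unchanged.
            residues.append(symbol)
        else:
            # Modified residue: restore the amino acid and record the PTM.
            residues.append(entry["Amino Acid"])
            modifications[position] = entry["Modification"]
    return "".join(residues), modifications


print(decode_merox_sequence("PEBTImDE"))
# ('PECTIMDE', {3: ('Carbamidomethyl', 57.021464), 6: ('Oxidation', 15.994915)})
```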
- pyXLMS.constants.MODIFICATIONS = {'ADH': 138.09054635, 'Acetyl': 42.010565, 'BS3': 138.06808, 'Carbamidomethyl': 57.021464, 'DSBSO': 308.03883, 'DSBU': 196.08479231, 'DSS': 138.06808, 'DSSO': 158.00376, 'Oxidation': 15.994915, 'PhoX': 209.97181, 'Phospho': 79.966331}
Dictionary of post-translational modifications.
Dictionary of pre-defined post-translational modifications that maps modification names to modification delta masses. Currently contains “Carbamidomethyl”, “Oxidation”, “Phospho”, “Acetyl”, and all crosslinkers.
Examples
>>> from pyXLMS.constants import MODIFICATIONS
>>> MODIFICATIONS["Carbamidomethyl"]
57.021464
>>> MODIFICATIONS["BS3"]
138.06808
- pyXLMS.constants.SCOUT_MODIFICATION_MAPPING = {'+15.994900': ('Oxidation', 15.994915), '+57.021460': ('Carbamidomethyl', 57.021464), 'ADH': ('ADH', 138.09054635), 'BS3': ('BS3', 138.06808), 'Carbamidomethyl': ('Carbamidomethyl', 57.021464), 'DSBSO': ('DSBSO', 308.03883), 'DSBU': ('DSBU', 196.08479231), 'DSS': ('DSS', 138.06808), 'DSSO': ('DSSO', 158.00376), 'Oxidation of Methionine': ('Oxidation', 15.994915), 'PhoX': ('PhoX', 209.97181)}
Dictionary that maps sequence elements and modifications from Scout to their corresponding post-translational modifications.
Dictionary that maps sequence elements (e.g. “+57.021460”) and modifications (e.g. “Carbamidomethyl”) from Scout to their corresponding post-translational modifications (e.g. (“Carbamidomethyl”, 57.021464)).
Examples
>>> from pyXLMS.constants import SCOUT_MODIFICATION_MAPPING
>>> SCOUT_MODIFICATION_MAPPING["+57.021460"]
('Carbamidomethyl', 57.021464)
>>> SCOUT_MODIFICATION_MAPPING["Carbamidomethyl"]
('Carbamidomethyl', 57.021464)
>>> SCOUT_MODIFICATION_MAPPING["Oxidation of Methionine"]
('Oxidation', 15.994915)
- pyXLMS.constants.XI_MODIFICATION_MAPPING = {'->': ('Substitution', nan), 'bs3_ami': ('BS3 Amidated', 155.094619105), 'bs3_hyd': ('BS3 Hydrolized', 156.0786347), 'bs3_tris': ('BS3 Tris', 259.141973), 'bs3loop': ('BS3 Looplink', 138.06808), 'bs3nh2': ('BS3 Amidated', 155.094619105), 'bs3oh': ('BS3 Hydrolized', 156.0786347), 'cm': ('Carbamidomethyl', 57.021464), 'dsbu_ami': ('DSBU Amidated', 213.111341), 'dsbu_hyd': ('DSBU Hydrolized', 214.095357), 'dsbu_loop': ('DSBU Looplink', 196.08479231), 'dsbu_tris': ('DSBU Tris', 317.158685), 'dsbuloop': ('DSBU Looplink', 196.08479231), 'dsso_ami': ('DSSO Amidated', 175.030313905), 'dsso_hyd': ('DSSO Hydrolized', 176.0143295), 'dsso_loop': ('DSSO Looplink', 158.00376), 'dsso_tris': ('DSSO Tris', 279.077658), 'dssoloop': ('DSSO Looplink', 158.00376), 'ox': ('Oxidation', 15.994915)}
Dictionary that maps sequence elements from xiSearch and xiFDR to their corresponding post-translational modifications.
Dictionary that maps sequence elements (e.g. “cm”) from xiSearch and xiFDR to their corresponding post-translational modifications (e.g. (“Carbamidomethyl”, 57.021464)).
Examples
>>> from pyXLMS.constants import XI_MODIFICATION_MAPPING
>>> XI_MODIFICATION_MAPPING["cm"]
('Carbamidomethyl', 57.021464)
>>> XI_MODIFICATION_MAPPING["ox"]
('Oxidation', 15.994915)
pyXLMS.data module
- pyXLMS.data.check_indexing(value: int | List[int]) -> bool
Checks that the given value is not 0-based.
- Parameters:
value (int, or list of int) – The value(s) to check.
- Returns:
True if all given values are valid, i.e. greater than or equal to one.
- Return type:
bool
- Raises:
ValueError – If any of the values are smaller than one.
Examples
>>> from pyXLMS.data import check_indexing
>>> check_indexing([1, 2, 3])
True
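The contract above (accept an int or a list of ints, raise ValueError if any value is below one, otherwise return True) can be sketched in a few lines. The function is named check_indexing_sketch to make clear it is an illustrative re-implementation of the documented behavior, not the pyXLMS source.

```python
from typing import List, Union


def check_indexing_sketch(value: Union[int, List[int]]) -> bool:
    """Illustrative re-implementation of the documented check_indexing() contract."""
    # Treat a single int like a one-element list so both cases share one path.
    values = value if isinstance(value, list) else [value]
    for v in values:
        if v < 1:
            # 0 (or a negative number) suggests a 0-based index slipped in.
            raise ValueError(f"Expected 1-based index, got {v}.")
    return True


print(check_indexing_sketch([1, 2, 3]))  # True
try:
    check_indexing_sketch(0)
except ValueError as err:
    print(err)  # Expected 1-based index, got 0.
```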
- pyXLMS.data.check_input(parameter: Any, parameter_name: str, supported_class: Any, supported_subclass: Any | None = None)
Checks if the given parameter is of the specified type.
Function that checks if a given parameter is of the specified type and, if it is iterable, that all elements are of the specified element type. This is mostly an input check function to catch errors arising from unsupported inputs early.
- Parameters:
parameter (any) – Parameter to check class of.
parameter_name (str) – Name of the parameter.
supported_class (any) – Class the parameter has to be of.
supported_subclass (any, or None, default = None) – Class of the values in case the parameter is a list or dict.
- Returns:
True if the given input is valid.
- Return type:
bool
- Raises:
TypeError – If the parameter is not of the given class.
Examples
>>> from pyXLMS.data import check_input
>>> check_input("PEPTIDE", "peptide_a", str)
True
>>> check_input([1, 2], "xl_position_proteins_a", list, int)
True
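A minimal sketch of the documented check: first the parameter's own type, then, if an element type is given, every list element or dict value. Checking dict values rather than keys is an assumption based on the phrase “class of the values”, and restricting the element check to lists and dicts (so strings are not iterated character by character) is a design choice of this illustration. check_input_sketch is hypothetical and not the pyXLMS source.

```python
from typing import Any, Optional


def check_input_sketch(
    parameter: Any,
    parameter_name: str,
    supported_class: Any,
    supported_subclass: Optional[Any] = None,
) -> bool:
    """Illustrative version of the documented check_input() contract."""
    # First check the type of the parameter itself.
    if not isinstance(parameter, supported_class):
        raise TypeError(
            f"Parameter {parameter_name} is not of type {supported_class}."
        )
    # Then check list elements or dict values, if an element type is given.
    if supported_subclass is not None and isinstance(parameter, (list, dict)):
        items = parameter.values() if isinstance(parameter, dict) else parameter
        for item in items:
            if not isinstance(item, supported_subclass):
                raise TypeError(
                    f"Not all elements of {parameter_name} are of type "
                    f"{supported_subclass}."
                )
    return True


print(check_input_sketch("PEPTIDE", "peptide_a", str))  # True
print(check_input_sketch([1, 2], "xl_position_proteins_a", list, int))  # True
```

Because isinstance treats bool as a subclass of int, this sketch would also accept [True, False] where a list of int is expected.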
- pyXLMS.data.check_input_multi(parameter: Any, parameter_name: str, supported_classes: List[Any], supported_subclass: Any | None = None)
Checks if the given parameter is of one of the specified types.
Function that checks if a given parameter is of one of the specified types and, if it is iterable, that all elements are of the specified element type. This is mostly an input check function to catch errors arising from unsupported inputs early.
- Parameters:
parameter (any) – Parameter to check class of.
parameter_name (str) – Name of the parameter.
supported_classes (list of any) – Classes the parameter has to be one of.
supported_subclass (any, or None, default = None) – Class of the values in case the parameter is a list or dict.
- Returns:
True if the given input is valid.
- Return type:
bool
- Raises:
TypeError – If the parameter is not of one of the given classes.
Examples
>>> from pyXLMS.data import check_input_multi
>>> check_input_multi("PEPTIDE", "peptide_a", [str, list])
True
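The “one of several types” check maps directly onto isinstance with a tuple of classes. The hypothetical check_input_multi_sketch below illustrates the documented behavior under the same assumptions as a single-type check (dict values are checked, and element checks apply only to lists and dicts). It is not the pyXLMS implementation.

```python
from typing import Any, List, Optional


def check_input_multi_sketch(
    parameter: Any,
    parameter_name: str,
    supported_classes: List[Any],
    supported_subclass: Optional[Any] = None,
) -> bool:
    """Illustrative version of the documented check_input_multi() contract."""
    # isinstance() accepts a tuple of classes and succeeds if any one matches.
    if not isinstance(parameter, tuple(supported_classes)):
        raise TypeError(
            f"Parameter {parameter_name} is not of any type in {supported_classes}."
        )
    # Element check only applies to lists and dicts (dict values are checked).
    if supported_subclass is not None and isinstance(parameter, (list, dict)):
        items = parameter.values() if isinstance(parameter, dict) else parameter
        if not all(isinstance(item, supported_subclass) for item in items):
            raise TypeError(
                f"Not all elements of {parameter_name} are of type "
                f"{supported_subclass}."
            )
    return True


print(check_input_multi_sketch("PEPTIDE", "peptide_a", [str, list]))  # True
print(check_input_multi_sketch(["PEP", "TIDE"], "peptide_a", [str, list], str))  # True
```

Limiting the element check to lists and dicts means a plain string passes even when supported_subclass is given, instead of being checked character by character.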
- pyXLMS.data.create_crosslink(
    peptide_a: str,
    xl_position_peptide_a: int,
    proteins_a: List[str] | None,
    xl_position_proteins_a: List[int] | None,
    decoy_a: bool | None,
    peptide_b: str,
    xl_position_peptide_b: int,
    proteins_b: List[str] | None,
    xl_position_proteins_b: List[int] | None,
    decoy_b: bool | None,
    score: float | None,
    additional_information: Dict[str, Any] | None = None,
)
Creates a crosslink data structure.
Contains minimal data necessary for representing a single crosslink. The returned crosslink data structure is a dictionary with keys as detailed in the return section.
- Parameters:
peptide_a (str) – The unmodified amino acid sequence of the first peptide.
xl_position_peptide_a (int) – The position of the crosslinker in the sequence of the first peptide (1-based).
proteins_a (list of str, or None) – The accessions of proteins that the first peptide is associated with.
xl_position_proteins_a (list of int, or None) – Positions of the crosslink in the proteins of the first peptide (1-based).
decoy_a (bool, or None) – Whether the first peptide is from the decoy database or not.
peptide_b (str) – The unmodified amino acid sequence of the second peptide.
xl_position_peptide_b (int) – The position of the crosslinker in the sequence of the second peptide (1-based).
proteins_b (list of str, or None) – The accessions of proteins that the second peptide is associated with.
xl_position_proteins_b (list of int, or None) – Positions of the crosslink in the proteins of the second peptide (1-based).
decoy_b (bool, or None) – Whether the second peptide is from the decoy database or not.
score (float, or None) – Score of the crosslink.
additional_information (dict with str keys, or None, default = None) – A dictionary with additional information associated with the crosslink.
- Returns:
The dictionary representing the crosslink with keys data_type, completeness, alpha_peptide, alpha_peptide_crosslink_position, alpha_proteins, alpha_proteins_crosslink_positions, alpha_decoy, beta_peptide, beta_peptide_crosslink_position, beta_proteins, beta_proteins_crosslink_positions, beta_decoy, crosslink_type, score, and additional_information. Alpha and beta are assigned based on peptide sequence: the peptide that comes first alphabetically is assigned to alpha.
- Return type:
dict
- Raises:
TypeError – If any parameter is not of the expected type.
ValueError – If the length of crosslink positions is not equal to the length of proteins.
Notes
The minimum required data for creating a crosslink is:
- peptide_a: The unmodified amino acid sequence of the first peptide.
- peptide_b: The unmodified amino acid sequence of the second peptide.
- xl_position_peptide_a: The position of the crosslinker in the sequence of the first peptide (1-based).
- xl_position_peptide_b: The position of the crosslinker in the sequence of the second peptide (1-based).
Examples
>>> from pyXLMS.data import create_crosslink
>>> minimal_crosslink = create_crosslink("PEPTIDEA", 1, None, None, None, "PEPTIDEB", 5, None, None, None, None)
>>> crosslink = create_crosslink("PEPTIDEA", 1, ["PROTEINA"], [1], False, "PEPTIDEB", 5, ["PROTEINB"], [3], False, 34.5)
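Because alpha and beta are assigned by sequence, the first peptide passed in does not necessarily become alpha, and its crosslink position, proteins and decoy flag have to move with it. The hypothetical helper assign_alpha_beta below sketches that rule on simple per-peptide dictionaries. How pyXLMS breaks ties between identical sequences is not documented here, so keeping the input order on ties is an assumption of this sketch.

```python
from typing import Any, Dict, Tuple


def assign_alpha_beta(
    a: Dict[str, Any], b: Dict[str, Any]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper: return (alpha, beta) ordered by peptide sequence.

    On identical sequences the input order is kept (an assumption; the
    documentation does not specify a tie-break)."""
    if b["peptide"] < a["peptide"]:
        return b, a
    return a, b


# All attributes of a peptide travel together when the order is swapped.
first = {"peptide": "PEPTIDEB", "xl_position": 5}
second = {"peptide": "PEPTIDEA", "xl_position": 1}
alpha, beta = assign_alpha_beta(first, second)
print(alpha["peptide"], alpha["xl_position"])  # PEPTIDEA 1
print(beta["peptide"], beta["xl_position"])  # PEPTIDEB 5
```

Python compares strings by code point, which matches alphabetical order for uppercase one-letter sequences.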
- pyXLMS.data.create_crosslink_from_csm(csm: Dict[str, Any])
Creates a crosslink data structure from a crosslink-spectrum-match.
Creates a crosslink data structure from a crosslink-spectrum-match. The returned crosslink data structure is a dictionary with keys as detailed in the return section.
- Parameters:
csm (dict of str) – The crosslink-spectrum-match item to be converted to a crosslink item.
- Returns:
The dictionary representing the crosslink with keys data_type, completeness, alpha_peptide, alpha_peptide_crosslink_position, alpha_proteins, alpha_proteins_crosslink_positions, alpha_decoy, beta_peptide, beta_peptide_crosslink_position, beta_proteins, beta_proteins_crosslink_positions, beta_decoy, crosslink_type, score, and additional_information. Alpha and beta are assigned based on peptide sequence: the peptide that comes first alphabetically is assigned to alpha.
- Return type:
dict
- Raises:
TypeError – If parameter csm is not a valid crosslink-spectrum-match.
Notes
See also: data.create_crosslink().
Examples
>>> from pyXLMS.data import create_csm_min, create_crosslink_from_csm
>>> csm = create_csm_min("PEPTIDEA", 1, "PEPTIDEB", 5, "RUN_1", 1)
>>> crosslink = create_crosslink_from_csm(csm)
- pyXLMS.data.create_crosslink_min(
    peptide_a: str,
    xl_position_peptide_a: int,
    peptide_b: str,
    xl_position_peptide_b: int,
    **kwargs,
)
Creates a crosslink data structure from minimal input.
Contains minimal data necessary for representing a single crosslink. This is a convenience alias for data.create_crosslink() that sets all optional parameters to None. The returned crosslink data structure is a dictionary with keys as detailed in the return section.
- Parameters:
peptide_a (str) – The unmodified amino acid sequence of the first peptide.
xl_position_peptide_a (int) – The position of the crosslinker in the sequence of the first peptide (1-based).
peptide_b (str) – The unmodified amino acid sequence of the second peptide.
xl_position_peptide_b (int) – The position of the crosslinker in the sequence of the second peptide (1-based).
**kwargs – Any additional parameters will be passed to data.create_crosslink().
- Returns:
The dictionary representing the crosslink with keys data_type, completeness, alpha_peptide, alpha_peptide_crosslink_position, alpha_proteins, alpha_proteins_crosslink_positions, alpha_decoy, beta_peptide, beta_peptide_crosslink_position, beta_proteins, beta_proteins_crosslink_positions, beta_decoy, crosslink_type, score, and additional_information. Alpha and beta are assigned based on peptide sequence: the peptide that comes first alphabetically is assigned to alpha.
- Return type:
dict
Notes
See also: data.create_crosslink().
Examples
>>> from pyXLMS.data import create_crosslink_min
>>> minimal_crosslink = create_crosslink_min("PEPTIDEA", 1, "PEPTIDEB", 5)
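The “minimal” constructors work by pre-filling every optional argument of the full constructor with None and letting **kwargs override individual values. The sketch below illustrates that forwarding pattern. create_crosslink_stub is a hypothetical stand-in that only echoes its inputs, and create_crosslink_min_sketch is an illustration, not the pyXLMS source.

```python
from typing import Any, Dict


def create_crosslink_stub(**params: Any) -> Dict[str, Any]:
    """Hypothetical stand-in for data.create_crosslink(); echoes its inputs."""
    return params


def create_crosslink_min_sketch(
    peptide_a: str,
    xl_position_peptide_a: int,
    peptide_b: str,
    xl_position_peptide_b: int,
    **kwargs: Any,
) -> Dict[str, Any]:
    """Illustrative wrapper: optional arguments default to None, kwargs win."""
    params: Dict[str, Any] = {
        "proteins_a": None,
        "xl_position_proteins_a": None,
        "decoy_a": None,
        "proteins_b": None,
        "xl_position_proteins_b": None,
        "decoy_b": None,
        "score": None,
    }
    # Caller-supplied keyword arguments replace the None defaults.
    params.update(kwargs)
    return create_crosslink_stub(
        peptide_a=peptide_a,
        xl_position_peptide_a=xl_position_peptide_a,
        peptide_b=peptide_b,
        xl_position_peptide_b=xl_position_peptide_b,
        **params,
    )


result = create_crosslink_min_sketch("PEPTIDEA", 1, "PEPTIDEB", 5, score=34.5)
print(result["score"], result["decoy_a"])  # 34.5 None
```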
- pyXLMS.data.create_csm(
    peptide_a: str,
    modifications_a: Dict[int, Tuple[str, float]] | None,
    xl_position_peptide_a: int,
    proteins_a: List[str] | None,
    xl_position_proteins_a: List[int] | None,
    pep_position_proteins_a: List[int] | None,
    score_a: float | None,
    decoy_a: bool | None,
    peptide_b: str,
    modifications_b: Dict[int, Tuple[str, float]] | None,
    xl_position_peptide_b: int,
    proteins_b: List[str] | None,
    xl_position_proteins_b: List[int] | None,
    pep_position_proteins_b: List[int] | None,
    score_b: float | None,
    decoy_b: bool | None,
    score: float | None,
    spectrum_file: str,
    scan_nr: int,
    charge: int | None,
    rt: float | None,
    im_cv: float | None,
    additional_information: Dict[str, Any] | None = None,
)
Creates a crosslink-spectrum-match data structure.
Contains minimal data necessary for representing a single crosslink-spectrum-match. The returned crosslink-spectrum-match data structure is a dictionary with keys as detailed in the return section.
- Parameters:
peptide_a (str) – The unmodified amino acid sequence of the first peptide.
modifications_a (dict of [int, tuple], or None) – The modifications of the first peptide given as a dictionary that maps peptide position (1-based) to modification given as a tuple of modification name and modification delta mass. N-terminal modifications should be denoted with position 0. C-terminal modifications should be denoted with position len(peptide) + 1. If the peptide is not modified, an empty dictionary should be given.
xl_position_peptide_a (int) – The position of the crosslinker in the sequence of the first peptide (1-based).
proteins_a (list of str, or None) – The accessions of proteins that the first peptide is associated with.
xl_position_proteins_a (list of int, or None) – Positions of the crosslink in the proteins of the first peptide (1-based).
pep_position_proteins_a (list of int, or None) – Positions of the first peptide in the corresponding proteins (1-based).
score_a (float, or None) – Identification score of the first peptide.
decoy_a (bool, or None) – Whether the first peptide is from the decoy database or not.
peptide_b (str) – The unmodified amino acid sequence of the second peptide.
modifications_b (dict of [int, tuple], or None) – The modifications of the second peptide given as a dictionary that maps peptide position (1-based) to modification given as a tuple of modification name and modification delta mass. N-terminal modifications should be denoted with position 0. C-terminal modifications should be denoted with position len(peptide) + 1. If the peptide is not modified, an empty dictionary should be given.
xl_position_peptide_b (int) – The position of the crosslinker in the sequence of the second peptide (1-based).
proteins_b (list of str, or None) – The accessions of proteins that the second peptide is associated with.
xl_position_proteins_b (list of int, or None) – Positions of the crosslink in the proteins of the second peptide (1-based).
pep_position_proteins_b (list of int, or None) – Positions of the second peptide in the corresponding proteins (1-based).
score_b (float, or None) – Identification score of the second peptide.
decoy_b (bool, or None) – Whether the second peptide is from the decoy database or not.
score (float, or None) – Score of the crosslink-spectrum-match.
spectrum_file (str) – Name of the spectrum file the crosslink-spectrum-match was identified in.
scan_nr (int) – The corresponding scan number of the crosslink-spectrum-match.
charge (int, or None) – The precursor charge of the corresponding mass spectrum of the crosslink-spectrum-match.
rt (float, or None) – The retention time of the corresponding mass spectrum of the crosslink-spectrum-match in seconds.
im_cv (float, or None) – The ion mobility or compensation voltage of the corresponding mass spectrum of the crosslink-spectrum-match.
additional_information (dict with str keys, or None, default = None) – A dictionary with additional information associated with the crosslink-spectrum-match.
- Returns:
The dictionary representing the crosslink-spectrum-match with keys data_type, completeness, alpha_peptide, alpha_modifications, alpha_peptide_crosslink_position, alpha_proteins, alpha_proteins_crosslink_positions, alpha_proteins_peptide_positions, alpha_score, alpha_decoy, beta_peptide, beta_modifications, beta_peptide_crosslink_position, beta_proteins, beta_proteins_crosslink_positions, beta_proteins_peptide_positions, beta_score, beta_decoy, crosslink_type, score, spectrum_file, scan_nr, retention_time, ion_mobility, and additional_information. Alpha and beta are assigned based on peptide sequence: the peptide that comes first alphabetically is assigned to alpha.
- Return type:
dict
- Raises:
TypeError – If the parameter is not of the given class.
ValueError – If the length of crosslink positions or peptide positions is not equal to the length of proteins.
Notes
The minimum required data for creating a crosslink-spectrum-match is:
- peptide_a: The unmodified amino acid sequence of the first peptide.
- peptide_b: The unmodified amino acid sequence of the second peptide.
- xl_position_peptide_a: The position of the crosslinker in the sequence of the first peptide (1-based).
- xl_position_peptide_b: The position of the crosslinker in the sequence of the second peptide (1-based).
- spectrum_file: Name of the spectrum file the crosslink-spectrum-match was identified in.
- scan_nr: The corresponding scan number of the crosslink-spectrum-match.
Examples
>>> from pyXLMS.data import create_csm
>>> minimal_csm = create_csm("PEPTIDEA", {}, 1, None, None, None, None, None, "PEPTIDEB", {}, 5, None, None, None, None, None, None, "MS_EXP1", 1, None, None, None)
>>> csm = create_csm("PEPTIDEA", {1: ("Oxidation", 15.994915)}, 1, ["PROTEINA"], [1], [1], 20.1, False, "PEPTIDEB", {}, 5, ["PROTEINB"], [3], [1], 33.7, False, 20.1, "MS_EXP1", 1, 3, 13.5, -50)
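The position convention for modifications_a and modifications_b reserves 0 for the N-terminus, 1 to len(peptide) for residues and len(peptide) + 1 for the C-terminus. The two hypothetical helpers below check a modifications dictionary against that convention and translate positions into readable sites. They are illustrations of the documented convention, not part of the pyXLMS API.

```python
from typing import Dict, Tuple


def check_modification_positions(
    peptide: str, modifications: Dict[int, Tuple[str, float]]
) -> bool:
    """Hypothetical check for the documented position convention:
    0 = N-terminus, 1..len(peptide) = residues, len(peptide) + 1 = C-terminus."""
    allowed = range(0, len(peptide) + 2)
    for position, (name, delta_mass) in modifications.items():
        if position not in allowed:
            raise ValueError(
                f"{name} ({delta_mass}) at position {position} is outside "
                f"0..{len(peptide) + 1} for peptide {peptide}."
            )
    return True


def describe_site(peptide: str, position: int) -> str:
    """Translate a modification position into a human-readable site."""
    if position == 0:
        return "N-term"
    if position == len(peptide) + 1:
        return "C-term"
    # Residue positions are 1-based, Python string indices are 0-based.
    return f"{peptide[position - 1]}{position}"


peptide = "PEPMTIDE"
mods = {0: ("Acetyl", 42.010565), 4: ("Oxidation", 15.994915)}
print(check_modification_positions(peptide, mods))  # True
print([describe_site(peptide, p) for p in sorted(mods)])  # ['N-term', 'M4']
print(describe_site(peptide, 9))  # C-term
```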
- pyXLMS.data.create_csm_min(
    peptide_a: str,
    xl_position_peptide_a: int,
    peptide_b: str,
    xl_position_peptide_b: int,
    spectrum_file: str,
    scan_nr: int,
    **kwargs,
)
Creates a crosslink-spectrum-match data structure from minimal input.
Contains minimal data necessary for representing a single crosslink-spectrum-match. This is a convenience alias for data.create_csm() that sets all optional parameters to None. The returned crosslink-spectrum-match data structure is a dictionary with keys as detailed in the return section.
- Parameters:
peptide_a (str) – The unmodified amino acid sequence of the first peptide.
xl_position_peptide_a (int) – The position of the crosslinker in the sequence of the first peptide (1-based).
peptide_b (str) – The unmodified amino acid sequence of the second peptide.
xl_position_peptide_b (int) – The position of the crosslinker in the sequence of the second peptide (1-based).
spectrum_file (str) – Name of the spectrum file the crosslink-spectrum-match was identified in.
scan_nr (int) – The corresponding scan number of the crosslink-spectrum-match.
**kwargs – Any additional parameters will be passed to data.create_csm().
- Returns:
The dictionary representing the crosslink-spectrum-match with keys data_type, completeness, alpha_peptide, alpha_modifications, alpha_peptide_crosslink_position, alpha_proteins, alpha_proteins_crosslink_positions, alpha_proteins_peptide_positions, alpha_score, alpha_decoy, beta_peptide, beta_modifications, beta_peptide_crosslink_position, beta_proteins, beta_proteins_crosslink_positions, beta_proteins_peptide_positions, beta_score, beta_decoy, crosslink_type, score, spectrum_file, scan_nr, retention_time, ion_mobility, and additional_information. Alpha and beta are assigned based on peptide sequence: the peptide that comes first alphabetically is assigned to alpha.
- Return type:
dict
Notes
See also: data.create_csm().
Examples
>>> from pyXLMS.data import create_csm_min
>>> minimal_csm = create_csm_min("PEPTIDEA", 1, "PEPTIDEB", 5, "MS_EXP1", 1)
- pyXLMS.data.create_parser_result(search_engine: str, csms: List[Dict[str, Any]] | None, crosslinks: List[Dict[str, Any]] | None)
Creates a parser result data structure.
Contains all necessary data elements that should be contained in a result returned by a crosslink search engine result parser.
- Parameters:
search_engine (str) – Name of the identifying crosslink search engine.
csms (list of dict, or None) – List of crosslink-spectrum-matches as created by data.create_csm().
crosslinks (list of dict, or None) – List of crosslinks as created by data.create_crosslink().
- Returns:
The parser result data structure, which is a dictionary with keys data_type, completeness, search_engine, crosslink-spectrum-matches, and crosslinks.
- Return type:
dict
Examples
>>> from pyXLMS.data import create_parser_result
>>> result = create_parser_result("MS Annika", None, None)
>>> result["data_type"]
'parser_result'
>>> result["completeness"]
'empty'
>>> result["search_engine"]
'MS Annika'
pyXLMS.pipelines module
- pyXLMS.pipelines.pipeline(
    files: str | List[str] | BinaryIO,
    engine: Literal['Custom', 'MaxQuant', 'MaxLynx', 'MeroX', 'MS Annika', 'mzIdentML', 'pLink', 'Scout', 'xiSearch/xiFDR', 'XlinkX'],
    crosslinker: str,
    unique: bool | Dict[str, Any] | None = True,
    validate: bool | Dict[str, Any] | None = True,
    targets_only: bool | None = True,
    **kwargs,
)
Runs a standard downstream analysis pipeline for crosslinks and crosslink-spectrum-matches.
Runs a standard downstream analysis pipeline for crosslinks and crosslink-spectrum-matches. The pipeline first reads a result file and then optionally filters the read data for unique crosslinks and crosslink-spectrum-matches, optionally validates the data by false discovery rate estimation, and, also optionally, returns only target-target matches. Internally the pipeline calls parser.read(), transform.unique(), transform.validate(), and transform.targets_only().
- Parameters:
files (str, list of str, or file stream) – The name/path of the result file(s) or a file-like object/stream.
engine ("Custom", "MaxQuant", "MaxLynx", "MeroX", "MS Annika", "mzIdentML", "pLink", "Scout", "xiSearch/xiFDR", or "XlinkX") – Crosslink search engine or format of the result file.
crosslinker (str) – Name of the used cross-linking reagent, for example “DSSO”.
unique (dict of str, any, or bool, or None, default = True) – Whether transform.unique() should be run in the pipeline. If None or False, this step is omitted. If True, this step is run with default parameters. If a dictionary is given, it should contain parameters for running transform.unique(). Omitting a parameter in the dictionary will fall back to its default value.
validate (dict of str, any, or bool, or None, default = True) – Whether transform.validate() should be run in the pipeline. If None or False, this step is omitted. If True, this step is run with default parameters. If a dictionary is given, it should contain parameters for running transform.validate(). Omitting a parameter in the dictionary will fall back to its default value.
targets_only (bool, or None, default = True) – Whether transform.targets_only() should be run in the pipeline. If None or False, this step is omitted.
**kwargs – Any additional parameters will be passed to the specific result file parsers.
- Returns:
The transformed parser_result after all pipeline steps are completed.
- Return type:
dict of str, any
- Raises:
TypeError – If any of the parameters do not have the correct type.
Notes
Progress bars, summary statistics before and after the pipeline, and the performed pipeline steps are also printed to stdout.
Examples
>>> from pyXLMS.pipelines import pipeline
>>> pr = pipeline("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_CSMs.xlsx",
...               engine="MS Annika",
...               crosslinker="DSS",
...               unique=True,
...               validate={"fdr": 0.05, "formula": "(TD-DD)/TT"},
...               targets_only=True)
Reading MS Annika CSMs...: 100%|██████████████████████████████████████████████████| 826/826 [00:00<00:00, 10337.98it/s]
---- Summary statistics before pipeline ----
Number of CSMs: 826.0
Number of unique CSMs: 826.0
Number of intra CSMs: 803.0
Number of inter CSMs: 23.0
Number of target-target CSMs: 786.0
Number of target-decoy CSMs: 39.0
Number of decoy-decoy CSMs: 1.0
Minimum CSM score: 1.11
Maximum CSM score: 452.99
Iterating over scores for FDR calculation...: 0%| | 0/826 [00:00<?, ?it/s]
---- Summary statistics after pipeline ----
Number of CSMs: 786.0
Number of unique CSMs: 786.0
Number of intra CSMs: 774.0
Number of inter CSMs: 12.0
Number of target-target CSMs: 786.0
Number of target-decoy CSMs: 0.0
Number of decoy-decoy CSMs: 0.0
Minimum CSM score: 1.28
Maximum CSM score: 452.99
---- Performed pipeline steps ----
:: parser.read() ::
:: parser.read() :: params :: <params omitted>
:: transform.unique() ::
:: transform.unique() :: params :: by=peptide
:: transform.unique() :: params :: score=higher_better
:: transform.validate() ::
:: transform.validate() :: params :: fdr=0.05
:: transform.validate() :: params :: formula=(TD-DD)/TT
:: transform.validate() :: params :: score=higher_better
:: transform.validate() :: params :: separate_intra_inter=False
:: transform.validate() :: params :: ignore_missing_labels=False
:: transform.targets_only() ::
:: transform.targets_only() :: params :: no params
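The unique and validate options accept three kinds of values: None or False skips the step, True runs it with defaults, and a dictionary overrides individual parameters while omitted ones fall back to their defaults. The hypothetical helper resolve_step_params below sketches that resolution. The parameter values used in the usage lines are taken from the “Performed pipeline steps” output of the example above, where by, score, separate_intra_inter and ignore_missing_labels were not set by the caller and so presumably reflect defaults. The actual defaults live in pyXLMS and may differ.

```python
from typing import Any, Dict, Optional, Union


def resolve_step_params(
    option: Union[bool, Dict[str, Any], None], defaults: Dict[str, Any]
) -> Optional[Dict[str, Any]]:
    """Hypothetical helper mirroring the documented option semantics.

    Returns None if the step should be skipped, otherwise the keyword
    arguments to run the step with."""
    if option is None or option is False:
        return None  # step omitted
    if option is True:
        return dict(defaults)  # run with default parameters (copied)
    if isinstance(option, dict):
        # Omitted keys fall back to their defaults; given keys override them.
        return {**defaults, **option}
    raise TypeError("Expected bool, dict or None.")


# Values as printed in the example output above (assumed to be defaults).
UNIQUE_DEFAULTS = {"by": "peptide", "score": "higher_better"}
VALIDATE_DEFAULTS = {"score": "higher_better", "separate_intra_inter": False,
                     "ignore_missing_labels": False}

print(resolve_step_params(True, UNIQUE_DEFAULTS))
# {'by': 'peptide', 'score': 'higher_better'}
params = resolve_step_params({"fdr": 0.05, "formula": "(TD-DD)/TT"}, VALIDATE_DEFAULTS)
print(params["fdr"], params["ignore_missing_labels"])  # 0.05 False
print(resolve_step_params(None, UNIQUE_DEFAULTS))  # None
```

Merging with {**defaults, **option} lets a caller change a single setting, such as fdr, without restating every other parameter of the step.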