pyXLMS.data package

Module contents

Core data structures and data type validation functions.

Examples

>>> from pyXLMS.data import CrosslinkSpectrumMatch as CSM
>>> csm = CSM(
...     alpha_peptide="PEKP",
...     alpha_peptide_crosslink_position=3,
...     beta_peptide="TKIDE",
...     beta_peptide_crosslink_position=2,
...     spectrum_file="dsso.mzML",
...     scan_nr=1,
... )

>>> from pyXLMS.data import Crosslink
>>> xl = Crosslink(
...     alpha_peptide="PEKP",
...     alpha_peptide_crosslink_position=3,
...     beta_peptide="TKIDE",
...     beta_peptide_crosslink_position=2,
... )

>>> from pyXLMS.data import Crosslink
>>> from pyXLMS.data import ParserResult
>>> xl = Crosslink(
...     alpha_peptide="PEKP",
...     alpha_peptide_crosslink_position=3,
...     beta_peptide="TKIDE",
...     beta_peptide_crosslink_position=2,
... )
>>> pr = ParserResult(search_engine="My Search Engine", crosslinks=[xl])

class pyXLMS.data.Crosslink(*, alpha_peptide: str, alpha_peptide_crosslink_position: int, beta_peptide: str, beta_peptide_crosslink_position: int, alpha_proteins: List[str] | None = None, alpha_proteins_crosslink_positions: List[int] | None = None, alpha_decoy: bool | None = None, beta_proteins: List[str] | None = None, beta_proteins_crosslink_positions: List[int] | None = None, beta_decoy: bool | None = None, score: float | None = None, additional_information: Dict[str, Any] | None = None)

Bases: BaseModel

Core data structure representing a single crosslink.

Crosslinks represent two crosslinked peptides. Crosslinks can be unique peptide pairs or unique residue pairs, depending on their grouping.

Attributes Summary

Here is a short summary of the crosslink attributes. For details on the specific Pydantic validation requirements, please refer to the corresponding attributes themselves.

Required

The following attributes are required:

alpha_peptide (str) – The unmodified amino acid sequence of the first peptide. Amino acids should be in upper case. Modifications should not be included in the sequence.
alpha_peptide_crosslink_position (int) – The position of the crosslinker in the sequence of the first peptide (1-based).
beta_peptide (str) – The unmodified amino acid sequence of the second peptide. Amino acids should be in upper case. Modifications should not be included in the sequence.
beta_peptide_crosslink_position (int) – The position of the crosslinker in the sequence of the second peptide (1-based).

Optional

The following attributes are optional:

alpha_proteins (list of str, or None, default = None) – The accessions of proteins that the first peptide is associated with.
alpha_proteins_crosslink_positions (list of int, or None, default = None) – Positions of the crosslink in the proteins of the first peptide (1-based). If given, the list should have the same length as alpha_proteins, and the crosslink position at list index i should correspond to the protein at list index i in alpha_proteins.
alpha_decoy (bool, or None, default = None) – Whether the first peptide is from the decoy database (True) or not (False).
beta_proteins (list of str, or None, default = None) – The accessions of proteins that the second peptide is associated with.
beta_proteins_crosslink_positions (list of int, or None, default = None) – Positions of the crosslink in the proteins of the second peptide (1-based). If given, the list should have the same length as beta_proteins, and the crosslink position at list index i should correspond to the protein at list index i in beta_proteins.
beta_decoy (bool, or None, default = None) – Whether the second peptide is from the decoy database (True) or not (False).
score (float, or None, default = None) – Score of the crosslink.
additional_information (dict of str, any, or None, default = None) – A dictionary with additional information associated with the crosslink.
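The index correspondence between alpha_proteins and alpha_proteins_crosslink_positions can be illustrated with plain Python. This is a minimal sketch that does not use pyXLMS; the helper name pair_proteins_with_positions is hypothetical, and raising a ValueError on a length mismatch is an assumption of the sketch, not documented library behaviour.

```python
from typing import List, Optional, Tuple


def pair_proteins_with_positions(
    proteins: Optional[List[str]],
    positions: Optional[List[int]],
) -> List[Tuple[str, int]]:
    """Pair each protein accession with its 1-based crosslink position."""
    if proteins is None or positions is None:
        return []
    # Index i in one list must describe the same protein as index i in the other.
    if len(proteins) != len(positions):
        raise ValueError("proteins and positions must have the same length")
    return list(zip(proteins, positions))


pairs = pair_proteins_with_positions(["Cas9", "PROT"], [779, 12])
print(pairs)  # [('Cas9', 779), ('PROT', 12)]
```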

Notes

Alpha and beta assignment is internally decided by whichever peptide’s sequence is alphabetically first. If the beta_peptide’s sequence comes alphabetically first it will be assigned to alpha_peptide and the original alpha_peptide will be assigned to beta_peptide (and the same happens for all other corresponding alpha and beta values).
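The reassignment described above can be sketched in a few lines of plain Python. This is an illustration only, not the pyXLMS implementation; the function name order_peptides is hypothetical, and how identical sequences are handled is not stated here, so the sketch simply keeps the input order in that case.

```python
from typing import Tuple


def order_peptides(
    peptide_a: str, position_a: int, peptide_b: str, position_b: int
) -> Tuple[str, int, str, int]:
    """Return (alpha, alpha_position, beta, beta_position), alpha alphabetically first."""
    if peptide_b < peptide_a:
        # Swap the peptides together with their associated values.
        return peptide_b, position_b, peptide_a, position_a
    return peptide_a, position_a, peptide_b, position_b


ordered = order_peptides("PEPKTIDE", 4, "KPEPTIDE", 1)
print(ordered)  # ('KPEPTIDE', 1, 'PEPKTIDE', 4)
```

In practice this means that the attribute values read back from a crosslink may be swapped relative to the arguments that were passed in.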

Examples

>>> from pyXLMS.data import Crosslink
>>> xl = Crosslink(
...     alpha_peptide="PEKP",
...     alpha_peptide_crosslink_position=3,
...     beta_peptide="TKIDE",
...     beta_peptide_crosslink_position=2,
... )

additional_information: Annotated[Dict[str, Any] | None, Field(frozen=False, description='A dictionary with additional information associated with the crosslink.')]#: A dictionary with additional information associated with the crosslink.

alpha_decoy: Annotated[bool | None, Field(frozen=True, description='Whether the alpha peptide is from the decoy database or not.')]#: Whether the first peptide is from the decoy database (True) or not (False).

alpha_peptide: Annotated[str, Field(frozen=True, description='The unmodified amino acid sequence of the first peptide.')]#: The unmodified amino acid sequence of the first peptide. Amino acids should be in upper case. Modifications should not be included in the sequence.

alpha_peptide_crosslink_position: Annotated[int, Field(frozen=True, description='The position of the crosslinker in the sequence of the first peptide (1-based).')]#: The position of the crosslinker in the sequence of the first peptide (1-based).

alpha_proteins: Annotated[List[str] | None, Field(frozen=True, description='The accessions of proteins that the first peptide is associated with.')]#: The accessions of proteins that the first peptide is associated with.

alpha_proteins_crosslink_positions: Annotated[List[int] | None, Field(frozen=True, description='Positions of the crosslink in the proteins of the first peptide (1-based).')]#: Positions of the crosslink in the proteins of the first peptide (1-based). If given the list should be of the same length as alpha_proteins and crosslink position at list index i should correspond to the protein at list index i in alpha_proteins.

beta_decoy: Annotated[bool | None, Field(frozen=True, description='Whether the beta peptide is from the decoy database or not.')]#: Whether the second peptide is from the decoy database (True) or not (False).

beta_peptide: Annotated[str, Field(frozen=True, description='The unmodified amino acid sequence of the second peptide.')]#: The unmodified amino acid sequence of the second peptide. Amino acids should be in upper case. Modifications should not be included in the sequence.

beta_peptide_crosslink_position: Annotated[int, Field(frozen=True, description='The position of the crosslinker in the sequence of the second peptide (1-based).')]#: The position of the crosslinker in the sequence of the second peptide (1-based).

beta_proteins: Annotated[List[str] | None, Field(frozen=True, description='The accessions of proteins that the second peptide is associated with.')]#: The accessions of proteins that the second peptide is associated with.

beta_proteins_crosslink_positions: Annotated[List[int] | None, Field(frozen=True, description='Positions of the crosslink in the proteins of the second peptide (1-based).')]#: Positions of the crosslink in the proteins of the second peptide (1-based). If given the list should be of the same length as beta_proteins and crosslink position at list index i should correspond to the protein at list index i in beta_proteins.

property completeness: Literal['full', 'partial']#: Completeness of the crosslink, i.e. "full" if none of the attributes is None, otherwise "partial".

copy_with_update(update: Dict[str, Any] = {}) → Crosslink

Creates a deep copy of the crosslink with optional attribute updates.

Parameters: update (dict of str, any, default = empty dict) – Dictionary mapping attribute names (str) to their updated values. The default (empty dict) will create a deep copy with the original attribute values.
Returns: New crosslink with optionally updated attributes.
Return type: Crosslink

Examples

>>> from pyXLMS.data import Crosslink
>>> xl = Crosslink(
...     alpha_peptide="PEKP",
...     alpha_peptide_crosslink_position=3,
...     alpha_proteins=["PROT"],
...     beta_peptide="PEKP",
...     beta_peptide_crosslink_position=3,
...     beta_proteins=["PROT"],
... )
>>> xl_copy = xl.copy_with_update(
...     update={"additional_information": {"homomeric": True}}
... )
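What "deep copy with optional updates" means can be shown without pyXLMS. The sketch below uses a hypothetical dataclass named Record as a stand-in for a crosslink and the standard library's copy.deepcopy and dataclasses.replace; it illustrates the semantics only and is not how the library implements the method.

```python
import copy
from dataclasses import dataclass, replace
from typing import Any, Dict, Optional


@dataclass
class Record:  # hypothetical stand-in, not a pyXLMS class
    peptide: str
    additional_information: Optional[Dict[str, Any]] = None


def copy_with_update(obj: Record, update: Optional[Dict[str, Any]] = None) -> Record:
    """Deep copy obj, then override the attributes named in update."""
    duplicate = copy.deepcopy(obj)
    return replace(duplicate, **(update or {}))


original = Record("PEKP", {"tags": ["a"]})
clone = copy_with_update(original)
clone.additional_information["tags"].append("b")  # does not touch the original
updated = copy_with_update(original, {"peptide": "TKIDE"})

print(original.additional_information)  # {'tags': ['a']}
print(updated.peptide)  # TKIDE
```

The sketch uses None rather than an empty dict as the default argument, which is the usual way to avoid sharing a mutable default between calls.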

property crosslink_type: Literal['intra', 'inter']#: Link type of the crosslink, i.e. "intra" if the proteins in alpha_proteins and beta_proteins overlap, otherwise "inter".
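The overlap rule can be sketched with a set intersection. This is a minimal illustration in plain Python, not the library code; how pyXLMS treats protein lists that are None is not stated here, so the sketch requires both lists to be given.

```python
from typing import List


def crosslink_type(alpha_proteins: List[str], beta_proteins: List[str]) -> str:
    """Return "intra" if both peptides share at least one protein, else "inter"."""
    shared = set(alpha_proteins) & set(beta_proteins)
    return "intra" if shared else "inter"


print(crosslink_type(["Cas9"], ["Cas9"]))  # intra
print(crosslink_type(["Cas9", "P1"], ["P2"]))  # inter
```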

property data_type: Literal['crosslink']#: Data type of the object.

display(show_additional_information: bool = False, return_str: bool = False) → None | str

Pretty prints the crosslink.

Parameters:

show_additional_information (bool, default = False) – Also display the data in additional_information.
return_str (bool, default = False) – Whether the display string should be returned.

Returns:

The display string of the crosslink if return_str = True, otherwise None.

Return type:

None, or str

Examples

>>> from pyXLMS import parser
>>> pr = parser.read(
...     "data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1.pdResult",
...     engine="MS Annika",
...     crosslinker="DSS",
... )
>>> xls = pr["crosslinks"]
>>> xls[0].display()
Data Type:                          crosslink
Completeness:                       full
Alpha Peptide:                      GQKNSR
Alpha Peptide Crosslink Position:   3
Alpha Proteins:                     ['Cas9']
Alpha Proteins Crosslink Positions: [779]
Alpha Decoy:                        False
Beta Peptide:                       GQKNSR
Beta Peptide Crosslink Position:    3
Beta Proteins:                      ['Cas9']
Beta Proteins Crosslink Positions:  [779]
Beta Decoy:                         False
Crosslink Type:                     intra
Crosslink Score:                    119.82547820493929

items() → List[Tuple[str, Any]]

Support for dict-like read access for backward compatibility.

Returns: A list of (attribute name, attribute value) tuples.
Return type: list of tuple of str, any

Notes

This internally just calls self.model_dump(mode="python").items(). See model_dump.

keys() → List[str]

Support for dict-like read access for backward compatibility.

Returns: A list of attribute names.
Return type: list of str

Notes

This internally just calls self.model_dump(mode="python").keys(). See model_dump.

model_config = {'str_strip_whitespace': True, 'strict': True, 'validate_assignment': True}#: Pydantic configuration for the underlying validation model.

model_post_init(context: Any = None) → None

Performs extra validation and post init functions.

Notes

Warning

This method should not be called manually!

score: Annotated[float | None, Field(frozen=True, description='Score of the crosslink.')]#: Score of the crosslink.

to_proforma(crosslinker: str | float | None = None) → str

Returns the ProForma string for the crosslink.

Parameters: crosslinker (str, or float, or None, default = None) – Optional name or mass of the crosslink reagent. If the name is given, it should be a valid name from XLMOD.
Returns: The ProForma string of the crosslink.
Return type: str

Notes

If no crosslinker is given, the unmodified peptide ProForma string will be returned.

Examples

>>> from pyXLMS.data import create_crosslink_min
>>> xl = create_crosslink_min("PEPKTIDE", 4, "KPEPTIDE", 1)
>>> xl.to_proforma()
'KPEPTIDE//PEPKTIDE'

>>> from pyXLMS.data import create_crosslink_min
>>> xl = create_crosslink_min("PEPKTIDE", 4, "KPEPTIDE", 1)
>>> xl.to_proforma(crosslinker="Xlink:DSSO")
'K[Xlink:DSSO]PEPTIDE//PEPK[Xlink:DSSO]TIDE'
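The string layout seen in the two examples above can be reproduced with a short, library-independent sketch. The helper names annotate and to_proforma are hypothetical, the peptides are assumed to be passed in their final alpha and beta order, and only a crosslinker given by name is covered; how pyXLMS formats a crosslinker given as a mass is not modelled.

```python
from typing import Optional


def annotate(peptide: str, position: int, crosslinker: Optional[str]) -> str:
    """Insert "[crosslinker]" after the residue at the 1-based position."""
    if crosslinker is None:
        return peptide
    return f"{peptide[:position]}[{crosslinker}]{peptide[position:]}"


def to_proforma(
    alpha: str,
    alpha_position: int,
    beta: str,
    beta_position: int,
    crosslinker: Optional[str] = None,
) -> str:
    """Join both annotated peptides with "//"."""
    alpha_part = annotate(alpha, alpha_position, crosslinker)
    beta_part = annotate(beta, beta_position, crosslinker)
    return f"{alpha_part}//{beta_part}"


print(to_proforma("KPEPTIDE", 1, "PEPKTIDE", 4))
# KPEPTIDE//PEPKTIDE
print(to_proforma("KPEPTIDE", 1, "PEPKTIDE", 4, "Xlink:DSSO"))
# K[Xlink:DSSO]PEPTIDE//PEPK[Xlink:DSSO]TIDE
```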

values() → List[Any]

Support for dict-like read access for backward compatibility.

Returns: A list of attribute values.
Return type: list of any

Notes

This internally just calls self.model_dump(mode="python").values(). See model_dump.

class pyXLMS.data.CrosslinkSpectrumMatch(*, alpha_peptide: str, alpha_peptide_crosslink_position: int, beta_peptide: str, beta_peptide_crosslink_position: int, spectrum_file: str, scan_nr: int, alpha_modifications: Dict[int, Tuple[str, float]] | None = None, alpha_proteins: List[str] | None = None, alpha_proteins_crosslink_positions: List[int] | None = None, alpha_proteins_peptide_positions: List[int] | None = None, alpha_score: float | None = None, alpha_decoy: bool | None = None, beta_modifications: Dict[int, Tuple[str, float]] | None = None, beta_proteins: List[str] | None = None, beta_proteins_crosslink_positions: List[int] | None = None, beta_proteins_peptide_positions: List[int] | None = None, beta_score: float | None = None, beta_decoy: bool | None = None, score: float | None = None, charge: int | None = None, retention_time: float | None = None, ion_mobility: float | None = None, additional_information: Dict[str, Any] | None = None)

Bases: BaseModel

Core data structure representing a single crosslink-spectrum-match.

Crosslink-spectrum-matches associate two crosslinked peptides with a specific mass spectrum. They contain spectrum-level information in addition to crosslink information.

Attributes Summary

Here is a short summary of the crosslink-spectrum-match attributes. For details on the specific Pydantic validation requirements, please refer to the corresponding attributes themselves.

Required

The following attributes are required:

alpha_peptide (str) – The unmodified amino acid sequence of the first peptide. Amino acids should be in upper case. Modifications should not be included in the sequence.
alpha_peptide_crosslink_position (int) – The position of the crosslinker in the sequence of the first peptide (1-based).
beta_peptide (str) – The unmodified amino acid sequence of the second peptide. Amino acids should be in upper case. Modifications should not be included in the sequence.
beta_peptide_crosslink_position (int) – The position of the crosslinker in the sequence of the second peptide (1-based).
spectrum_file (str) – Name of the spectrum file the crosslink-spectrum-match was identified in.
scan_nr (int) – The corresponding scan number of the crosslink-spectrum-match. If the scan number is not available, the spectrum index should be provided.

Optional

The following attributes are optional:

alpha_modifications (dict of int, tuple of str, float, or None, default = None) – The modifications of the first peptide, given as a dictionary that maps peptide position (1-based) to the modification, given as a tuple of modification name and modification delta mass. N-terminal modifications should be denoted with position 0. C-terminal modifications should be denoted with position len(peptide) + 1. If the peptide is not modified, an empty dictionary should be given.
alpha_proteins (list of str, or None, default = None) – The accessions of proteins that the first peptide is associated with.
alpha_proteins_crosslink_positions (list of int, or None, default = None) – Positions of the crosslink in the proteins of the first peptide (1-based). If given, the list should have the same length as alpha_proteins, and the crosslink position at list index i should correspond to the protein at list index i in alpha_proteins.
alpha_proteins_peptide_positions (list of int, or None, default = None) – Positions of the first peptide in the corresponding proteins (1-based). If given, the list should have the same length as alpha_proteins, and the peptide position at list index i should correspond to the protein at list index i in alpha_proteins.
alpha_score (float, or None, default = None) – Identification score of the first peptide.
alpha_decoy (bool, or None, default = None) – Whether the first peptide is from the decoy database (True) or not (False).
beta_modifications (dict of int, tuple of str, float, or None, default = None) – The modifications of the second peptide, given as a dictionary that maps peptide position (1-based) to the modification, given as a tuple of modification name and modification delta mass. N-terminal modifications should be denoted with position 0. C-terminal modifications should be denoted with position len(peptide) + 1. If the peptide is not modified, an empty dictionary should be given.
beta_proteins (list of str, or None, default = None) – The accessions of proteins that the second peptide is associated with.
beta_proteins_crosslink_positions (list of int, or None, default = None) – Positions of the crosslink in the proteins of the second peptide (1-based). If given, the list should have the same length as beta_proteins, and the crosslink position at list index i should correspond to the protein at list index i in beta_proteins.
beta_proteins_peptide_positions (list of int, or None, default = None) – Positions of the second peptide in the corresponding proteins (1-based). If given, the list should have the same length as beta_proteins, and the peptide position at list index i should correspond to the protein at list index i in beta_proteins.
beta_score (float, or None, default = None) – Identification score of the second peptide.
beta_decoy (bool, or None, default = None) – Whether the second peptide is from the decoy database (True) or not (False).
score (float, or None, default = None) – Score of the crosslink-spectrum-match.
charge (int, or None, default = None) – The precursor charge of the corresponding mass spectrum of the crosslink-spectrum-match.
retention_time (float, or None, default = None) – The retention time of the corresponding mass spectrum of the crosslink-spectrum-match in seconds.
ion_mobility (float, or None, default = None) – The ion mobility or compensation voltage of the corresponding mass spectrum of the crosslink-spectrum-match.
additional_information (dict of str, any, or None, default = None) – A dictionary with additional information associated with the crosslink-spectrum-match.
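The position convention of alpha_modifications and beta_modifications (0 for the N-terminus, 1 to len(peptide) for residues, len(peptide) + 1 for the C-terminus) can be made concrete with a small sketch. It uses plain Python only; the helper name describe_modifications and the "Acetyl" entry are illustrative assumptions, and rejecting out-of-range positions with a ValueError is a choice of the sketch rather than documented behaviour.

```python
from typing import Dict, Tuple

Modifications = Dict[int, Tuple[str, float]]


def describe_modifications(peptide: str, modifications: Modifications) -> Dict[int, str]:
    """Map each modification position to the site it refers to."""
    sites: Dict[int, str] = {}
    for position in sorted(modifications):
        if position == 0:
            sites[position] = "N-term"
        elif position == len(peptide) + 1:
            sites[position] = "C-term"
        elif 1 <= position <= len(peptide):
            # Positions are 1-based, Python indices are 0-based.
            sites[position] = peptide[position - 1]
        else:
            raise ValueError(f"position {position} is outside the peptide")
    return sites


mods = {0: ("Acetyl", 42.010565), 3: ("Oxidation", 15.994915)}
sites = describe_modifications("KPMEPTIDE", mods)
print(sites)  # {0: 'N-term', 3: 'M'}
```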

Examples

>>> from pyXLMS.data import CrosslinkSpectrumMatch as CSM
>>> csm = CSM(
...     alpha_peptide="PEKP",
...     alpha_peptide_crosslink_position=3,
...     beta_peptide="TKIDE",
...     beta_peptide_crosslink_position=2,
...     spectrum_file="dsso.mzML",
...     scan_nr=1,
... )

additional_information: Annotated[Dict[str, Any] | None, Field(frozen=False, description='A dictionary with additional information associated with the crosslink-spectrum-match.')]#: A dictionary with additional information associated with the crosslink-spectrum-match.

alpha_decoy: Annotated[bool | None, Field(frozen=True, description='Whether the first peptide is from the decoy database or not.')]#: Whether the first peptide is from the decoy database (True) or not (False).

alpha_modifications: Annotated[Dict[int, Tuple[str, float]] | None, Field(frozen=True, description='The modifications of the first peptide.')]#: The modifications of the first peptide given as a dictionary that maps peptide position (1-based) to modification given as a tuple of modification name and modification delta mass. N-terminal modifications should be denoted with position 0. C-terminal modifications should be denoted with position len(peptide) + 1. If the peptide is not modified an empty dictionary should be given.

alpha_peptide: Annotated[str, Field(frozen=True, description='The unmodified amino acid sequence of the first peptide.')]#: The unmodified amino acid sequence of the first peptide. Amino acids should be in upper case. Modifications should not be included in the sequence.

alpha_peptide_crosslink_position: Annotated[int, Field(frozen=True, description='The position of the crosslinker in the sequence of the first peptide (1-based).')]#: The position of the crosslinker in the sequence of the first peptide (1-based).

alpha_proteins: Annotated[List[str] | None, Field(frozen=True, description='The accessions of proteins that the first peptide is associated with.')]#: The accessions of proteins that the first peptide is associated with.

alpha_proteins_crosslink_positions: Annotated[List[int] | None, Field(frozen=True, description='Positions of the crosslink in the proteins of the first peptide (1-based).')]#: Positions of the crosslink in the proteins of the first peptide (1-based). If given the list should be of the same length as alpha_proteins and crosslink position at list index i should correspond to the protein at list index i in alpha_proteins.

alpha_proteins_peptide_positions: Annotated[List[int] | None, Field(frozen=True, description='Positions of the first peptide in the corresponding proteins (1-based).')]#: Positions of the first peptide in the corresponding proteins (1-based). If given the list should be of the same length as alpha_proteins and peptide position at list index i should correspond to the protein at list index i in alpha_proteins.

alpha_score: Annotated[float | None, Field(frozen=True, description='Identification score of the first peptide.')]#: Identification score of the first peptide.

beta_decoy: Annotated[bool | None, Field(frozen=True, description='Whether the beta peptide is from the decoy database or not.')]#: Whether the second peptide is from the decoy database (True) or not (False).

beta_modifications: Annotated[Dict[int, Tuple[str, float]] | None, Field(frozen=True, description='The modifications of the second peptide.')]#: The modifications of the second peptide given as a dictionary that maps peptide position (1-based) to modification given as a tuple of modification name and modification delta mass. N-terminal modifications should be denoted with position 0. C-terminal modifications should be denoted with position len(peptide) + 1. If the peptide is not modified an empty dictionary should be given.

beta_peptide: Annotated[str, Field(frozen=True, description='The unmodified amino acid sequence of the second peptide.')]#: The unmodified amino acid sequence of the second peptide. Amino acids should be in upper case. Modifications should not be included in the sequence.

beta_peptide_crosslink_position: Annotated[int, Field(frozen=True, description='The position of the crosslinker in the sequence of the second peptide (1-based).')]#: The position of the crosslinker in the sequence of the second peptide (1-based).

beta_proteins: Annotated[List[str] | None, Field(frozen=True, description='The accessions of proteins that the second peptide is associated with.')]#: The accessions of proteins that the second peptide is associated with.

beta_proteins_crosslink_positions: Annotated[List[int] | None, Field(frozen=True, description='Positions of the crosslink in the proteins of the second peptide (1-based).')]#: Positions of the crosslink in the proteins of the second peptide (1-based). If given the list should be of the same length as beta_proteins and crosslink position at list index i should correspond to the protein at list index i in beta_proteins.

beta_proteins_peptide_positions: Annotated[List[int] | None, Field(frozen=True, description='Positions of the second peptide in the corresponding proteins (1-based).')]#: Positions of the second peptide in the corresponding proteins (1-based). If given the list should be of the same length as beta_proteins and peptide position at list index i should correspond to the protein at list index i in beta_proteins.

beta_score: Annotated[float | None, Field(frozen=True, description='Identification score of the second peptide.')]#: Identification score of the second peptide.

charge: Annotated[int | None, Field(frozen=True, description='The precursor charge of the corresponding mass spectrum of the crosslink-spectrum-match.')]#: The precursor charge of the corresponding mass spectrum of the crosslink-spectrum-match.

property completeness: Literal['full', 'partial']#: Completeness of the crosslink-spectrum-match, i.e. "full" if none of the attributes is None, otherwise "partial".

copy_with_update(update: Dict[str, Any] = {}) → CrosslinkSpectrumMatch

Creates a deep copy of the crosslink-spectrum-match with optional attribute updates.

Parameters: update (dict of str, any, default = empty dict) – Dictionary mapping attribute names (str) to their updated values. The default (empty dict) will create a deep copy with the original attribute values.
Returns: New crosslink-spectrum-match with optionally updated attributes.
Return type: CrosslinkSpectrumMatch

Examples

>>> from pyXLMS.data import CrosslinkSpectrumMatch as CSM
>>> csm = CSM(
...     alpha_peptide="PEKP",
...     alpha_peptide_crosslink_position=3,
...     beta_peptide="TKIDE",
...     beta_peptide_crosslink_position=2,
...     spectrum_file="dsso.mzML",
...     scan_nr=1,
... )
>>> csm_copy = csm.copy_with_update(update={"scan_nr": 2})

property crosslink_type: Literal['intra', 'inter']#: Link type of the crosslink-spectrum-match, i.e. "intra" if the proteins in alpha_proteins and beta_proteins overlap, otherwise "inter".

property data_type: Literal['crosslink-spectrum-match']#: Data type of the object.

display(show_additional_information: bool = False, return_str: bool = False) → None | str

Pretty prints the crosslink-spectrum-match.

Parameters:

show_additional_information (bool, default = False) – Also display the data in additional_information.
return_str (bool, default = False) – Whether the display string should be returned.

Returns:

The display string of the crosslink-spectrum-match if return_str = True, otherwise None.

Return type:

None, or str

Examples

>>> from pyXLMS import parser
>>> pr = parser.read(
...     "data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1.pdResult",
...     engine="MS Annika",
...     crosslinker="DSS",
... )
>>> csms = pr["crosslink-spectrum-matches"]
>>> csms[0].display()
Data Type:                          crosslink-spectrum-match
Completeness:                       full
Alpha Peptide:                      GQKNSR
Alpha Modifications:                {3: ('DSS', 138.06808)}
Alpha Peptide Crosslink Position:   3
Alpha Proteins:                     ['Cas9']
Alpha Proteins Crosslink Positions: [779]
Alpha Proteins Peptide Positions:   [777]
Alpha Peptide Score:                119.82548987540834
Alpha Decoy:                        False
Beta Peptide:                       GQKNSR
Beta Modifications:                 {3: ('DSS', 138.06808)}
Beta Peptide Crosslink Position:    3
Beta Proteins:                      ['Cas9']
Beta Proteins Crosslink Positions:  [779]
Beta Proteins Peptide Positions:    [777]
Beta Peptide Score:                 119.82547820493929
Beta Decoy:                         False
Crosslink Type:                     intra
CSM Score:                          119.82547820493929
Spectrum File:                      XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw
Scan Number:                        2257
Precursor Charge:                   3
Retention Time:                     733.1895599999999
Ion Mobility/FAIMS CV:              0.0

ion_mobility: Annotated[float | None, Field(frozen=True, description='The ion mobility or compensation voltage of the corresponding mass spectrum of the crosslink-spectrum-match.')]#: The ion mobility or compensation voltage of the corresponding mass spectrum of the crosslink-spectrum-match.

items() → List[Tuple[str, Any]]

Support for dict-like read access for backward compatibility.

Returns: A list of (attribute name, attribute value) tuples.
Return type: list of tuple of str, any

Notes

This internally just calls self.model_dump(mode="python").items(). See model_dump.

keys() → List[str]

Support for dict-like read access for backward compatibility.

Returns: A list of attribute names.
Return type: list of str

Notes

This internally just calls self.model_dump(mode="python").keys(). See model_dump.

model_config = {'str_strip_whitespace': True, 'strict': True, 'validate_assignment': True}#: Pydantic configuration for the underlying validation model.

model_post_init(context: Any = None) → None

Performs extra validation and post init functions.

Notes

Warning

This method should not be called manually!

retention_time: Annotated[float | None, Field(frozen=True, description='The retention time of the corresponding mass spectrum of the crosslink-spectrum-match in seconds.')]#: The retention time of the corresponding mass spectrum of the crosslink-spectrum-match in seconds.

scan_nr: Annotated[int, Field(frozen=True, description='The corresponding scan number of the crosslink-spectrum-match.')]#: The corresponding scan number of the crosslink-spectrum-match. If the scan number is not available the spectrum index should be provided.

score: Annotated[float | None, Field(frozen=True, description='Score of the crosslink-spectrum-match.')]#: Score of the crosslink-spectrum-match.

spectrum_file: Annotated[str, Field(frozen=True, description='Name of the spectrum file the crosslink-spectrum-match was identified in.')]#: Name of the spectrum file the crosslink-spectrum-match was identified in.

to_crosslink() → Crosslink

Creates a crosslink from the crosslink-spectrum-match.

Returns: The corresponding crosslink created from the crosslink-spectrum-match.
Return type: Crosslink
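Which crosslink-spectrum-match attributes have a counterpart on a crosslink can be read off the two class signatures. The sketch below works on plain dictionaries and filters by attribute name; the helper name crosslink_fields_of is hypothetical, and whether to_crosslink actually carries over every shared attribute (for example score or additional_information) is not stated here, so this only illustrates the overlap of names.

```python
from typing import Any, Dict

# Attribute names copied from the Crosslink signature documented above.
CROSSLINK_FIELDS = (
    "alpha_peptide",
    "alpha_peptide_crosslink_position",
    "alpha_proteins",
    "alpha_proteins_crosslink_positions",
    "alpha_decoy",
    "beta_peptide",
    "beta_peptide_crosslink_position",
    "beta_proteins",
    "beta_proteins_crosslink_positions",
    "beta_decoy",
    "score",
    "additional_information",
)


def crosslink_fields_of(csm: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the entries whose names also exist on a crosslink."""
    return {name: csm[name] for name in CROSSLINK_FIELDS if name in csm}


csm = {
    "alpha_peptide": "PEKP",
    "alpha_peptide_crosslink_position": 3,
    "beta_peptide": "TKIDE",
    "beta_peptide_crosslink_position": 2,
    "spectrum_file": "dsso.mzML",
    "scan_nr": 1,
    "charge": 3,
}
shared = crosslink_fields_of(csm)
print(sorted(shared))
```

Spectrum-level entries such as spectrum_file, scan_nr and charge have no crosslink counterpart and are therefore dropped by the filter.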

to_proforma(crosslinker: str | float | None = None) → str

Returns the ProForma string for the crosslink-spectrum-match.

Parameters: crosslinker (str, or float, or None, default = None) – Optional name or mass of the crosslink reagent. If the name is given, it should be a valid name from XLMOD.
Returns: The ProForma string of the crosslink-spectrum-match.
Return type: str

Notes

Modifications with unknown mass are skipped.
If no modifications are given, only the crosslink modification will be encoded in the ProForma string.
If no modifications are given and no crosslinker is given, the unmodified peptide ProForma string will be returned.

Examples

>>> from pyXLMS.data import create_csm_min
>>> csm = create_csm_min("PEPKTIDE", 4, "KPEPTIDE", 1, "RUN_1", 1)
>>> csm.to_proforma()
'KPEPTIDE//PEPKTIDE'

>>> from pyXLMS.data import create_csm_min
>>> csm = create_csm_min("PEPKTIDE", 4, "KPEPTIDE", 1, "RUN_1", 1)
>>> csm.to_proforma(crosslinker="Xlink:DSSO")
'K[Xlink:DSSO]PEPTIDE//PEPK[Xlink:DSSO]TIDE'

>>> from pyXLMS.data import create_csm_min
>>> csm = create_csm_min(
...     "PEPKTIDE",
...     4,
...     "KPMEPTIDE",
...     1,
...     "RUN_1",
...     1,
...     modifications_b={3: ("Oxidation", 15.994915)},
... )
>>> csm.to_proforma(crosslinker="Xlink:DSSO")
'K[Xlink:DSSO]PM[+15.994915]EPTIDE//PEPK[Xlink:DSSO]TIDE'

>>> from pyXLMS.data import create_csm_min
>>> csm = create_csm_min(
...     "PEPKTIDE",
...     4,
...     "KPMEPTIDE",
...     1,
...     "RUN_1",
...     1,
...     modifications_b={3: ("Oxidation", 15.994915)},
...     charge=3,
... )
>>> csm.to_proforma(crosslinker="Xlink:DSSO")
'K[Xlink:DSSO]PM[+15.994915]EPTIDE//PEPK[Xlink:DSSO]TIDE/3'

>>> from pyXLMS.data import create_csm_min
>>> csm = create_csm_min(
...     "PEPKTIDE",
...     4,
...     "KPMEPTIDE",
...     1,
...     "RUN_1",
...     1,
...     modifications_a={4: ("DSSO", 158.00376)},
...     modifications_b={1: ("DSSO", 158.00376), 3: ("Oxidation", 15.994915)},
...     charge=3,
... )
>>> csm.to_proforma()
'K[+158.00376]PM[+15.994915]EPTIDE//PEPK[+158.00376]TIDE/3'

>>> from pyXLMS.data import create_csm_min
>>> csm = create_csm_min(
...     "PEPKTIDE",
...     4,
...     "KPMEPTIDE",
...     1,
...     "RUN_1",
...     1,
...     modifications_a={4: ("DSSO", 158.00376)},
...     modifications_b={1: ("DSSO", 158.00376), 3: ("Oxidation", 15.994915)},
...     charge=3,
... )
>>> csm.to_proforma(crosslinker="Xlink:DSSO")
'K[+158.00376]PM[+15.994915]EPTIDE//PEPK[+158.00376]TIDE/3'
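The output pattern of the examples above (signed delta masses for listed modifications, the crosslinker name only where the crosslink site carries no listed modification, and an optional "/charge" suffix) can be reproduced with a short sketch. It is written in plain Python and is not the pyXLMS implementation; the helper names are hypothetical, the peptides are assumed to be passed in their final alpha and beta order, and terminal modifications, modifications with unknown mass, and crosslinkers given as a mass are not modelled.

```python
from typing import Dict, Optional, Tuple

Modifications = Dict[int, Tuple[str, float]]


def annotate(
    peptide: str,
    crosslink_position: int,
    crosslinker: Optional[str],
    modifications: Optional[Modifications],
) -> str:
    """Append a bracketed tag to every modified residue (1-based positions)."""
    # Listed modifications are written as signed delta masses.
    tags = {pos: f"[{mass:+}]" for pos, (_, mass) in (modifications or {}).items()}
    # The crosslinker name is only used if the site has no listed modification.
    if crosslinker is not None and crosslink_position not in tags:
        tags[crosslink_position] = f"[{crosslinker}]"
    return "".join(
        residue + tags.get(index, "")
        for index, residue in enumerate(peptide, start=1)
    )


def to_proforma(
    alpha: str,
    alpha_position: int,
    beta: str,
    beta_position: int,
    crosslinker: Optional[str] = None,
    alpha_modifications: Optional[Modifications] = None,
    beta_modifications: Optional[Modifications] = None,
    charge: Optional[int] = None,
) -> str:
    """Join both annotated peptides with "//" and append "/charge" if known."""
    joined = "//".join(
        [
            annotate(alpha, alpha_position, crosslinker, alpha_modifications),
            annotate(beta, beta_position, crosslinker, beta_modifications),
        ]
    )
    return joined if charge is None else f"{joined}/{charge}"


oxidation = {3: ("Oxidation", 15.994915)}
result = to_proforma("KPMEPTIDE", 1, "PEPKTIDE", 4, "Xlink:DSSO", oxidation, None, 3)
print(result)
# K[Xlink:DSSO]PM[+15.994915]EPTIDE//PEPK[Xlink:DSSO]TIDE/3
```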

values() → List[Any]

Support for dict-like read access for backward compatibility.

Returns: A list of attribute values.
Return type: list of any

Notes

This internally just calls self.model_dump(mode="python").values(). See model_dump.

class pyXLMS.data.ParserResult(*, search_engine: str, crosslink_spectrum_matches: List[CrosslinkSpectrumMatch] | None = None, crosslinks: List[Crosslink] | None = None)

Bases: BaseModel

Core data structure for parser results.

Data structure returned by any (parser) function that reads crosslink-spectrum-matches and/or crosslinks.

Attributes Summary

Here is a short summary of the parser result attributes. For more details on the specific Pydantic validation requirements, please refer to the corresponding attributes themselves.

Required

The following attributes are required:

search_engine (str): The name of the identifying crosslink search engine.

Optional

The following attributes are optional:

crosslink_spectrum_matches (list of CrosslinkSpectrumMatch, or None, default = None): List of parsed crosslink-spectrum-matches.
crosslinks (list of Crosslink, or None, default = None): List of parsed crosslinks.

Examples

>>> from pyXLMS.data import Crosslink
>>> from pyXLMS.data import ParserResult
>>> xl = Crosslink(
...     alpha_peptide="PEKP",
...     alpha_peptide_crosslink_position=3,
...     beta_peptide="TKIDE",
...     beta_peptide_crosslink_position=2,
... )
>>> pr = ParserResult(search_engine="My Search Engine", crosslinks=[xl])

property completeness: Literal['full', 'partial', 'empty']

Completeness of the parser result: "full" if all attributes are not None, "empty" if both crosslink-spectrum-matches and crosslinks are None, and "partial" otherwise.
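
The completeness rule can be sketched in plain Python. This is only an illustration of the documented behaviour, not pyXLMS source code; the helper name completeness_sketch is hypothetical, and it assumes that only the two optional attributes can be None.

```python
from typing import List, Literal, Optional


def completeness_sketch(
    crosslink_spectrum_matches: Optional[List[object]],
    crosslinks: Optional[List[object]],
) -> Literal["full", "partial", "empty"]:
    """Hypothetical helper mirroring the documented completeness rule."""
    present = [crosslink_spectrum_matches is not None, crosslinks is not None]
    if all(present):
        return "full"
    if not any(present):
        return "empty"
    return "partial"


print(completeness_sketch(None, None))  # empty
print(completeness_sketch(None, ["xl"]))  # partial
print(completeness_sketch(["csm"], ["xl"]))  # full
```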

copy_with_update(update: Dict[str, Any] = {}) → ParserResult

Creates a deep copy of the parser result with optional attribute updates.

Parameters: update (dict of str, any, default = empty dict) – Dictionary mapping attribute names (str) to their updated values. The default (empty dict) will create a deep copy with the original attribute values.
Returns: New parser result with optionally updated attributes.
Return type: ParserResult

Examples

>>> from pyXLMS.data import Crosslink
>>> from pyXLMS.data import ParserResult
>>> pr = ParserResult(search_engine="My Search Engine")
>>> xl = Crosslink(
...     alpha_peptide="PEKP",
...     alpha_peptide_crosslink_position=3,
...     beta_peptide="TKIDE",
...     beta_peptide_crosslink_position=2,
... )
>>> pr_copy = pr.copy_with_update(update={"crosslinks": [xl]})
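
Because the attributes of a parser result are declared frozen, changes go through a copy rather than through assignment. The following is a minimal sketch of that copy-then-update pattern using only the standard library. ResultSketch is a hypothetical stand-in built on dataclasses, not the Pydantic-based ParserResult, and it does not reproduce any validation.

```python
import copy
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class ResultSketch:
    """Hypothetical stand-in for a parser result; not the pyXLMS class."""

    search_engine: str
    crosslinks: Optional[List[str]] = None

    def copy_with_update(
        self, update: Optional[Dict[str, Any]] = None
    ) -> "ResultSketch":
        # Deep copy every current attribute, then overlay the requested updates.
        values = copy.deepcopy(vars(self))
        values.update(update or {})
        return ResultSketch(**values)


original = ResultSketch(search_engine="My Search Engine", crosslinks=["xl_1"])
updated = original.copy_with_update({"crosslinks": ["xl_1", "xl_2"]})
clone = original.copy_with_update()
clone.crosslinks.append("xl_3")  # mutating the copy's list ...
print(original.crosslinks)  # ... leaves the original untouched: ['xl_1']
print(updated.crosslinks)  # ['xl_1', 'xl_2']
```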

crosslink_spectrum_matches: Annotated[List[CrosslinkSpectrumMatch] | None, Field(frozen=True, description='List of parsed crosslink-spectrum-matches.')]

List of parsed crosslink-spectrum-matches.

crosslinks: Annotated[List[Crosslink] | None, Field(frozen=True, description='List of parsed crosslinks.')]

List of parsed crosslinks.

csms(create_copy: bool = True) → List[CrosslinkSpectrumMatch] | None

Shorthand function to retrieve crosslink-spectrum-matches.

Parameters: create_copy (bool, default = True) – Whether a deep copy of the crosslink-spectrum-matches should be returned (default) or self.crosslink_spectrum_matches directly.
Returns: (A deep copy of) self.crosslink_spectrum_matches.
Return type: list of CrosslinkSpectrumMatch, or None

Notes

Please be aware that by default this explicitly creates a deep copy of the underlying data!
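
The practical difference between the two settings of create_copy can be illustrated with a small stand-alone sketch. HolderSketch and its get method are hypothetical names used only for this illustration; they mimic the documented semantics with copy.deepcopy and are not part of pyXLMS.

```python
import copy
from typing import List, Optional


class HolderSketch:
    """Hypothetical container; stands in for a parser result."""

    def __init__(self, items: Optional[List[dict]]) -> None:
        self.items = items

    def get(self, create_copy: bool = True) -> Optional[List[dict]]:
        # Return an independent deep copy by default, or the stored list itself.
        return copy.deepcopy(self.items) if create_copy else self.items


holder = HolderSketch([{"scan_nr": 1}])
safe = holder.get()
safe[0]["scan_nr"] = 99  # only the copy changes
shared = holder.get(create_copy=False)
print(holder.items[0]["scan_nr"])  # 1
print(shared is holder.items)  # True
```

Under these assumptions, the default is the safer choice when the returned list will be modified, while create_copy=False avoids the cost of copying at the price of sharing the underlying objects.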

property data_type: Literal['parser_result']

Data type of the object.

display(show_additional_information: bool = False, return_str: bool = False) → None | str

Pretty prints the parser result.

Parameters:

show_additional_information (bool, default = False) – Whether to also display the data stored in additional_information.
return_str (bool, default = False) – Whether the display string should be returned.

Returns:

The display string of the parser result if return_str = True, otherwise None.

Return type:

None, or str

Examples

>>> from pyXLMS import parser
>>> pr = parser.read(
...     "data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1.pdResult",
...     engine="MS Annika",
...     crosslinker="DSS",
... )
>>> pr.display()
Data Type:                            parser_result
Completeness:                         full
Identifying Search Engine:            MS Annika
Number of Crosslink-Spectrum-Matches: 826
Number of Crosslinks:                 300

items() → List[Tuple[str, Any]]

Support for dict-like read access for backward compatibility.

Returns: A list of (attribute name, attribute value) tuples.
Return type: list of tuple of str, any

Notes

This internally just calls self.model_dump(mode="python").items(). See model_dump.

keys() → List[str]

Support for dict-like read access for backward compatibility.

Returns: A list of attribute names.
Return type: list of str

Notes

This internally just calls self.model_dump(mode="python").keys(). See model_dump.

model_config = {'str_strip_whitespace': True, 'strict': True, 'validate_assignment': True}

Pydantic configuration for the underlying validation model.

search_engine: Annotated[str, Field(frozen=True, description='The name of the identifying crosslink search engine.')]

The name of the identifying crosslink search engine.

values() → List[Any]

Support for dict-like read access for backward compatibility.

Returns: A list of attribute values.
Return type: list of any

Notes

This internally just calls self.model_dump(mode="python").values(). See model_dump.
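
The dict-like helpers keys(), items(), and values() all follow the same delegation pattern: dump the object to a plain dictionary and forward the call. Below is a minimal sketch of that pattern using only the standard library. RecordSketch is a hypothetical stand-in, and dataclasses.asdict plays the role that model_dump plays in the Pydantic-based classes.

```python
from dataclasses import asdict, dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class RecordSketch:
    """Hypothetical stand-in; asdict() plays the role of model_dump()."""

    search_engine: str
    crosslinks: Optional[List[str]] = None

    def keys(self) -> List[str]:
        return list(asdict(self).keys())

    def values(self) -> List[Any]:
        return list(asdict(self).values())

    def items(self) -> List[Tuple[str, Any]]:
        return list(asdict(self).items())


record = RecordSketch(search_engine="My Search Engine")
print(record.keys())  # ['search_engine', 'crosslinks']
print(dict(record.items()))
```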

xls(create_copy: bool = True) → List[Crosslink] | None

Shorthand function to retrieve crosslinks.

Parameters: create_copy (bool, default = True) – Whether a deep copy of the crosslinks should be returned (default) or self.crosslinks directly.
Returns: (A deep copy of) self.crosslinks.
Return type: list of Crosslink, or None

Notes

Please be aware that by default this explicitly creates a deep copy of the underlying data!

pyXLMS.data.check_indexing(value: int | List[int]) → bool

Checks that the given value is not 0-based.

Parameters: value (int, or list of int) – The value(s) to check.
Returns: Whether the given value(s) is/are okay.
Return type: bool
Raises: ValueError – If any of the values are smaller than one.

Examples

>>> from pyXLMS.data import check_indexing
>>> check_indexing([1, 2, 3])
True
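
To show the failing case as well, here is a rough re-implementation of the documented check. The function name check_indexing_sketch and the error message are hypothetical; only the behaviour (return True, or raise ValueError for values smaller than one) is taken from the description above.

```python
from typing import List, Union


def check_indexing_sketch(value: Union[int, List[int]]) -> bool:
    """Hypothetical re-implementation of the documented 1-based check."""
    values = value if isinstance(value, list) else [value]
    for v in values:
        if v < 1:
            raise ValueError(f"Expected 1-based positions, got {v}.")
    return True


print(check_indexing_sketch([1, 2, 3]))  # True
try:
    check_indexing_sketch([1, 0])
except ValueError as error:
    print(f"rejected: {error}")
```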

pyXLMS.data.check_input(parameter: Any, parameter_name: str, supported_class: Any, supported_subclass: Any | None = None) → bool

Checks if the given parameter is of the specified type.

Function that checks if a given parameter is of the specified type and, if it is iterable, that all of its elements are of the specified element type. This is mostly an input check function to catch errors arising from unsupported inputs early.

Parameters:

parameter (any) – The parameter whose class is checked.
parameter_name (str) – Name of the parameter.
supported_class (any) – Class the parameter has to be of.
supported_subclass (any, or None, default = None) – Class of the values in case the parameter is a list or dict.

Returns:

Whether the given input is okay.

Return type:

bool

Raises:

TypeError – If the parameter is not of the given class.

Examples

>>> from pyXLMS.data import check_input
>>> check_input("PEPTIDE", "peptide_a", str)
True

>>> from pyXLMS.data import check_input
>>> check_input([1, 2], "xl_position_proteins_a", list, int)
True
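
The two-level check described above (container class first, then element class) might look roughly like the following. This is a sketch under stated assumptions, not the pyXLMS implementation: the name check_input_sketch and the error messages are hypothetical, and checking dict values rather than keys follows the parameter description.

```python
from typing import Any, Optional


def check_input_sketch(
    parameter: Any,
    parameter_name: str,
    supported_class: Any,
    supported_subclass: Optional[Any] = None,
) -> bool:
    """Hypothetical re-implementation of the documented type check."""
    if not isinstance(parameter, supported_class):
        raise TypeError(
            f"{parameter_name} must be {supported_class.__name__}, "
            f"got {type(parameter).__name__}."
        )
    if supported_subclass is not None:
        # For dicts the values are checked, for lists the elements.
        elements = parameter.values() if isinstance(parameter, dict) else parameter
        for element in elements:
            if not isinstance(element, supported_subclass):
                raise TypeError(
                    f"Elements of {parameter_name} must be "
                    f"{supported_subclass.__name__}."
                )
    return True


print(check_input_sketch([1, 2], "xl_position_proteins_a", list, int))  # True
try:
    check_input_sketch([1, "2"], "xl_position_proteins_a", list, int)
except TypeError as error:
    print(f"rejected: {error}")
```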

pyXLMS.data.check_input_multi(parameter: Any, parameter_name: str, supported_classes: List[Any], supported_subclass: Any | None = None) → bool

Checks if the given parameter is of one of the specified types.

Function that checks if a given parameter is of one of the specified types and, if it is iterable, that all of its elements are of the specified element type. This is mostly an input check function to catch errors arising from unsupported inputs early.

Parameters:

parameter (any) – The parameter whose class is checked.
parameter_name (str) – Name of the parameter.
supported_classes (list of any) – Allowed classes; the parameter has to be of one of them.
supported_subclass (any, or None, default = None) – Class of the values in case the parameter is a list or dict.

Returns:

Whether the given input is okay.

Return type:

bool

Raises:

TypeError – If the parameter is not of one of the given classes.

Examples

>>> from pyXLMS.data import check_input_multi
>>> check_input_multi("PEPTIDE", "peptide_a", [str, list])
True
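
The "one of several classes" part of this check maps naturally onto isinstance with a tuple of classes. The sketch below shows only that part and deliberately leaves out the element check; check_input_multi_sketch and its error message are hypothetical and not taken from pyXLMS.

```python
from typing import Any, List


def check_input_multi_sketch(
    parameter: Any, parameter_name: str, supported_classes: List[Any]
) -> bool:
    """Hypothetical sketch: accept the parameter if it matches any listed class."""
    # isinstance() accepts a tuple of classes and succeeds if any of them matches.
    if not isinstance(parameter, tuple(supported_classes)):
        names = ", ".join(cls.__name__ for cls in supported_classes)
        raise TypeError(f"{parameter_name} must be one of: {names}.")
    return True


print(check_input_multi_sketch("PEPTIDE", "peptide_a", [str, list]))  # True
try:
    check_input_multi_sketch(42, "peptide_a", [str, list])
except TypeError as error:
    print(f"rejected: {error}")
```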

pyXLMS.data.create_crosslink(peptide_a: str, xl_position_peptide_a: int, proteins_a: List[str] | None, xl_position_proteins_a: List[int] | None, decoy_a: bool | None, peptide_b: str, xl_position_peptide_b: int, proteins_b: List[str] | None, xl_position_proteins_b: List[int] | None, decoy_b: bool | None, score: float | None, additional_information: Dict[str, Any] | None = None) → Crosslink

Creates a crosslink data structure.

Contains the minimal data necessary for representing a single crosslink. The returned crosslink data structure is a Crosslink object with attributes as detailed in the return section.

Parameters:

peptide_a (str) – The unmodified amino acid sequence of the first peptide.
xl_position_peptide_a (int) – The position of the crosslinker in the sequence of the first peptide (1-based).
proteins_a (list of str, or None) – The accessions of proteins that the first peptide is associated with.
xl_position_proteins_a (list of int, or None) – Positions of the crosslink in the proteins of the first peptide (1-based).
decoy_a (bool, or None) – Whether the first peptide is from the decoy database or not.
peptide_b (str) – The unmodified amino acid sequence of the second peptide.
xl_position_peptide_b (int) – The position of the crosslinker in the sequence of the second peptide (1-based).
proteins_b (list of str, or None) – The accessions of proteins that the second peptide is associated with.
xl_position_proteins_b (list of int, or None) – Positions of the crosslink in the proteins of the second peptide (1-based).
decoy_b (bool, or None) – Whether the second peptide is from the decoy database or not.
score (float, or None) – Score of the crosslink.
additional_information (dict with str keys, or None, default = None) – A dictionary with additional information associated with the crosslink.

Returns:

The Crosslink object representing the crosslink, with the attributes data_type, completeness, alpha_peptide, alpha_peptide_crosslink_position, alpha_proteins, alpha_proteins_crosslink_positions, alpha_decoy, beta_peptide, beta_peptide_crosslink_position, beta_proteins, beta_proteins_crosslink_positions, beta_decoy, crosslink_type, score, and additional_information. Alpha and beta are assigned based on peptide sequence: the peptide whose sequence comes first alphabetically is assigned to alpha.

Return type:

Crosslink

Raises:

TypeError – If a parameter is not of the expected class.
ValueError – If the number of crosslink positions is not equal to the number of proteins.

Notes

The minimum required data for creating a crosslink is:

peptide_a: The unmodified amino acid sequence of the first peptide.
peptide_b: The unmodified amino acid sequence of the second peptide.
xl_position_peptide_a: The position of the crosslinker in the sequence of the first peptide (1-based).
xl_position_peptide_b: The position of the crosslinker in the sequence of the second peptide (1-based).

Examples

>>> from pyXLMS.data import create_crosslink
>>> minimal_crosslink = create_crosslink(
...     peptide_a="PEPTIDEA",
...     xl_position_peptide_a=1,
...     proteins_a=None,
...     xl_position_proteins_a=None,
...     decoy_a=None,
...     peptide_b="PEPTIDEB",
...     xl_position_peptide_b=5,
...     proteins_b=None,
...     xl_position_proteins_b=None,
...     decoy_b=None,
...     score=None,
... )

>>> from pyXLMS.data import create_crosslink
>>> crosslink = create_crosslink(
...     peptide_a="PEPTIDEA",
...     xl_position_peptide_a=1,
...     proteins_a=["PROTEINA"],
...     xl_position_proteins_a=[1],
...     decoy_a=False,
...     peptide_b="PEPTIDEB",
...     xl_position_peptide_b=5,
...     proteins_b=["PROTEINB"],
...     xl_position_proteins_b=[3],
...     decoy_b=False,
...     score=34.5,
... )
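
Two rules from the description above lend themselves to a short illustration: the alphabetical assignment of alpha and beta, and the requirement that each protein has exactly one crosslink position. The helpers below are hypothetical sketches, not pyXLMS code. In particular, breaking ties between identical sequences by position is an assumption of this sketch, and plain string comparison is only alphabetical for upper-case sequences.

```python
from typing import List, Optional, Tuple


def order_peptides_sketch(
    peptide_a: str, position_a: int, peptide_b: str, position_b: int
) -> Tuple[Tuple[str, int], Tuple[str, int]]:
    """Hypothetical helper: return (alpha, beta) as (sequence, position) pairs."""
    first, second = (peptide_a, position_a), (peptide_b, position_b)
    # Tuples compare by sequence first; the position only breaks ties, which is
    # an assumption of this sketch and not documented behaviour.
    return (first, second) if first <= second else (second, first)


def check_lengths_sketch(
    proteins: Optional[List[str]], positions: Optional[List[int]]
) -> None:
    """Hypothetical helper: protein and position lists must pair up one-to-one."""
    if proteins is not None and positions is not None:
        if len(proteins) != len(positions):
            raise ValueError("Need exactly one crosslink position per protein.")


alpha, beta = order_peptides_sketch("TKIDE", 2, "PEKP", 3)
print(alpha, beta)  # ('PEKP', 3) ('TKIDE', 2)
check_lengths_sketch(["PROTEINA"], [1])  # passes silently
```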

pyXLMS.data.create_crosslink_from_csm(csm: CrosslinkSpectrumMatch) → Crosslink

Creates a crosslink data structure from a crosslink-spectrum-match.

The returned crosslink data structure is a Crosslink object with attributes as detailed in the return section.

Parameters: csm (CrosslinkSpectrumMatch) – The crosslink-spectrum-match to be converted to a crosslink.
Returns: The Crosslink object representing the crosslink, with the attributes data_type, completeness, alpha_peptide, alpha_peptide_crosslink_position, alpha_proteins, alpha_proteins_crosslink_positions, alpha_decoy, beta_peptide, beta_peptide_crosslink_position, beta_proteins, beta_proteins_crosslink_positions, beta_decoy, crosslink_type, score, and additional_information. Alpha and beta are assigned based on peptide sequence: the peptide whose sequence comes first alphabetically is assigned to alpha.
Return type: Crosslink
Raises: TypeError – If parameter csm is not a valid crosslink-spectrum-match.

Notes