pyXLMS.parser package
Submodules
pyXLMS.parser.parser_xldbse_custom module
- pyXLMS.parser.parser_xldbse_custom.pyxlms_modification_str_parser(
- modifications: str,
- )
Parse a pyXLMS modification string.
Parses a pyXLMS modification string and returns the pyXLMS specific modification object, a dictionary that maps positions to their modifications.
- Parameters:
modifications (str) – The pyXLMS modification string.
- Returns:
The pyXLMS specific modification object, a dictionary that maps positions (1-based) to their respective modifications given as tuples of modification name and modification delta mass.
- Return type:
dict of int, tuple
- Raises:
RuntimeError – If multiple modifications on the same residue are parsed.
Examples
>>> from pyXLMS.parser import pyxlms_modification_str_parser
>>> modification_str = "(1:[DSS|138.06808])"
>>> pyxlms_modification_str_parser(modification_str)
{1: ('DSS', 138.06808)}

>>> from pyXLMS.parser import pyxlms_modification_str_parser
>>> modification_str = "(1:[DSS|138.06808]);(7:[Oxidation|15.994915])"
>>> pyxlms_modification_str_parser(modification_str)
{1: ('DSS', 138.06808), 7: ('Oxidation', 15.994915)}
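The string format used in the examples above can be illustrated with a small stand-alone sketch. This is not the pyXLMS implementation: the function name `parse_modification_str_sketch` and the regular expression are assumptions, inferred only from the two example strings, and the real parser may accept or reject inputs differently.

```python
import re
from typing import Dict, Tuple

# Hypothetical pattern for one "(position:[name|mass])" entry, inferred
# from the documented examples only.
_ENTRY = re.compile(r"\((\d+):\[([^|\]]+)\|([^\]]+)\]\)")


def parse_modification_str_sketch(modifications: str) -> Dict[int, Tuple[str, float]]:
    """Map 1-based positions to (modification name, delta mass) tuples."""
    parsed: Dict[int, Tuple[str, float]] = {}
    # Entries are separated by ";"; empty fragments are skipped.
    for entry in filter(None, modifications.split(";")):
        match = _ENTRY.fullmatch(entry.strip())
        if match is None:
            raise ValueError(f"Unrecognized modification entry: {entry!r}")
        position = int(match.group(1))
        if position in parsed:
            # Mirrors the documented RuntimeError for duplicate positions.
            raise RuntimeError(f"Multiple modifications at position {position}.")
        parsed[position] = (match.group(2), float(match.group(3)))
    return parsed


print(parse_modification_str_sketch("(1:[DSS|138.06808]);(7:[Oxidation|15.994915])"))
```

Any function with this shape (a string in, a dictionary of positions to name and mass tuples out) matches the type documented for the `modification_parser` parameter of `read_custom()`.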
- pyXLMS.parser.parser_xldbse_custom.read_custom(
- files: str | List[str] | BinaryIO,
- column_mapping: Dict[str, str] | None = None,
- parse_modifications: bool = True,
- modification_parser: Callable[[str], Dict[int, Tuple[str, float]]] | None = None,
- decoy_prefix: str = 'REV_',
- format: Literal['auto', 'csv', 'txt', 'tsv', 'xlsx'] = 'auto',
- sep: str = ',',
- decimal: str = '.',
- )
Read a custom or pyXLMS result file.
Reads a custom or pyXLMS crosslink-spectrum-matches result file or crosslink result file in .csv or .xlsx format, and returns a parser_result.
The minimum required columns for a crosslink-spectrum-matches result file are:
“Alpha Peptide”: The unmodified amino acid sequence of the first peptide.
“Alpha Peptide Crosslink Position”: The position of the crosslinker in the sequence of the first peptide (1-based).
“Beta Peptide”: The unmodified amino acid sequence of the second peptide.
“Beta Peptide Crosslink Position”: The position of the crosslinker in the sequence of the second peptide (1-based).
“Spectrum File”: Name of the spectrum file the crosslink-spectrum-match was identified in.
“Scan Nr”: The corresponding scan number of the crosslink-spectrum-match.
The minimum required columns for a crosslink result file are:
“Alpha Peptide”: The unmodified amino acid sequence of the first peptide.
“Alpha Peptide Crosslink Position”: The position of the crosslinker in the sequence of the first peptide (1-based).
“Beta Peptide”: The unmodified amino acid sequence of the second peptide.
“Beta Peptide Crosslink Position”: The position of the crosslinker in the sequence of the second peptide (1-based).
A full specification of columns that can be parsed can be found in the docs.
- Parameters:
files (str, list of str, or file stream) – The name/path of the result file(s) or a file-like object/stream.
column_mapping (dict of str, str) – A dictionary that maps the result file columns to the required pyXLMS column names.
parse_modifications (bool, default = True) – Whether or not post-translational-modifications should be parsed for crosslink-spectrum-matches. Requires correct specification of the ‘modification_parser’ parameter.
modification_parser (callable, or None) – A function that parses modification strings and returns the pyXLMS specific modifications object. If None, the function pyxlms_modification_str_parser() is used. If no modification columns are given this parameter is ignored.
decoy_prefix (str, default = "REV_") – The prefix that indicates that a protein is from the decoy database.
format ("auto", "csv", "tsv", "txt", or "xlsx", default = "auto") – The format of the result file. "auto" is only available if the name/path to the result file is given.
sep (str, default = ",") – Separator used in the .csv or .tsv file. Parameter is ignored if the file is in .xlsx format.
decimal (str, default = ".") – Character to recognize as decimal point. Parameter is ignored if the file is in .xlsx format.
- Returns:
The parser_result object containing all parsed information.
- Return type:
dict
- Raises:
ValueError – If the input format is not supported or cannot be inferred.
TypeError – If one of the values could not be parsed.
RuntimeError – If the file(s) could not be read or if the file(s) contain no crosslinks or crosslink-spectrum-matches.
Examples
>>> from pyXLMS.parser import read_custom
>>> csms_from_pyxlms = read_custom("data/pyxlms/csm.txt")

>>> from pyXLMS.parser import read_custom
>>> crosslinks_from_pyxlms = read_custom("data/pyxlms/xl.txt")
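As a rough illustration of the `column_mapping` parameter, the sketch below builds a tiny crosslink-spectrum-matches table with non-standard column names and a mapping onto the six minimum required column names listed above. The source column names (`pep_a`, `pos_a`, ...), the row values, and the file name `my_csms.csv` are all made up for this example, and the final `read_custom()` call is left as a comment because it needs pyXLMS and a file on disk.

```python
import csv
import io

# One made-up crosslink-spectrum-match with hypothetical column names.
rows = [
    {
        "pep_a": "KIECFDSVEISGVEDR",
        "pos_a": 1,
        "pep_b": "MLASAGELQKGNELALPSK",
        "pos_b": 10,
        "raw_file": "run1.raw",
        "scan": 20588,
    },
]

# Maps the hypothetical column names to the minimum required pyXLMS
# column names for a crosslink-spectrum-matches result file.
column_mapping = {
    "pep_a": "Alpha Peptide",
    "pos_a": "Alpha Peptide Crosslink Position",
    "pep_b": "Beta Peptide",
    "pos_b": "Beta Peptide Crosslink Position",
    "raw_file": "Spectrum File",
    "scan": "Scan Nr",
}

# Serialize the rows as comma-separated text (the default sep=",").
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# With pyXLMS installed and csv_text saved as "my_csms.csv", the call would
# presumably look like this (not executed here):
# from pyXLMS.parser import read_custom
# result = read_custom("my_csms.csv", column_mapping=column_mapping)
```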
pyXLMS.parser.parser_xldbse_maxquant module
- pyXLMS.parser.parser_xldbse_maxquant.parse_modifications_from_maxquant_sequence(
- seq: str,
- crosslink_position: int,
- crosslinker: str,
- crosslinker_mass: float,
- modifications: Dict[str, float] = {'ADH': 138.09054635, 'Acetyl': 42.010565, 'BS3': 138.06808, 'Carbamidomethyl': 57.021464, 'DSBSO': 308.03883, 'DSBU': 196.08479231, 'DSS': 138.06808, 'DSSO': 158.00376, 'Oxidation': 15.994915, 'PhoX': 209.97181, 'Phospho': 79.966331},
- )
Parse post-translational-modifications from a MaxQuant peptide sequence.
Parses post-translational-modifications (PTMs) from a MaxQuant peptide sequence, for example “_VVDELVKVM(Oxidation (M))GR_”.
- Parameters:
seq (str) – The MaxQuant sequence string.
crosslink_position (int) – Position of the crosslinker in the sequence (1-based).
crosslinker (str) – Name of the used cross-linking reagent, for example “DSSO”.
crosslinker_mass (float) – Monoisotopic delta mass of the crosslink modification.
modifications (dict of str, float, default = constants.MODIFICATIONS) – Mapping of modification names to modification masses.
- Returns:
The pyXLMS specific modifications object, a dictionary that maps positions to their corresponding modifications and their monoisotopic masses.
- Return type:
dict of int, tuple
- Raises:
RuntimeError – If the sequence could not be parsed because it is not in MaxQuant format.
RuntimeError – If multiple modifications on the same residue are parsed.
KeyError – If an unknown modification is encountered.
Examples
>>> from pyXLMS.parser import parse_modifications_from_maxquant_sequence
>>> seq = "_VVDELVKVM(Oxidation (M))GR_"
>>> parse_modifications_from_maxquant_sequence(seq, 2, "DSS", 138.06808)
{2: ('DSS', 138.06808), 9: ('Oxidation', 15.994915)}

>>> from pyXLMS.parser import parse_modifications_from_maxquant_sequence
>>> seq = "_VVDELVKVM(Oxidation (M))GRM(Oxidation (M))_"
>>> parse_modifications_from_maxquant_sequence(seq, 2, "DSS", 138.06808)
{2: ('DSS', 138.06808), 9: ('Oxidation', 15.994915), 12: ('Oxidation', 15.994915)}

>>> from pyXLMS.parser import parse_modifications_from_maxquant_sequence
>>> seq = "_M(Oxidation (M))VVDELVKVM(Oxidation (M))GRM(Oxidation (M))_"
>>> parse_modifications_from_maxquant_sequence(seq, 2, "DSS", 138.06808)
{2: ('DSS', 138.06808), 1: ('Oxidation', 15.994915), 10: ('Oxidation', 15.994915), 13: ('Oxidation', 15.994915)}
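How positions are counted in such a sequence can be sketched without pyXLMS. The following is an illustrative re-implementation under stated assumptions, not the library's code: the name `maxquant_sequence_sketch` is hypothetical, the mass table is a subset of the defaults shown in the signature, and the handling of annotations is inferred from the three examples above (the modification name is taken as the text before the first " (" inside the annotation).

```python
from typing import Dict, Tuple

# Subset of the default modification masses shown in the signature above.
MASSES = {"Oxidation": 15.994915, "Carbamidomethyl": 57.021464, "Acetyl": 42.010565}


def maxquant_sequence_sketch(
    seq: str,
    crosslink_position: int,
    crosslinker: str,
    crosslinker_mass: float,
    masses: Dict[str, float] = MASSES,
) -> Dict[int, Tuple[str, float]]:
    """Illustrative only: map 1-based positions to (name, mass) tuples."""
    if len(seq) < 2 or not (seq.startswith("_") and seq.endswith("_")):
        raise RuntimeError("Sequence is not in MaxQuant format.")
    parsed = {crosslink_position: (crosslinker, crosslinker_mass)}
    body = seq[1:-1]
    position = 0  # number of residues read so far
    i = 0
    while i < len(body):
        if body[i] != "(":
            position += 1
            i += 1
            continue
        # Scan to the parenthesis that closes this (nested) annotation.
        depth, j = 1, i + 1
        while j < len(body) and depth > 0:
            depth += {"(": 1, ")": -1}.get(body[j], 0)
            j += 1
        if depth != 0:
            raise RuntimeError("Unbalanced parentheses in sequence.")
        name = body[i + 1 : j - 1].split(" (")[0]  # "Oxidation (M)" -> "Oxidation"
        if position in parsed:
            raise RuntimeError(f"Multiple modifications at position {position}.")
        parsed[position] = (name, masses[name])  # KeyError for unknown names
        i = j
    return parsed


print(maxquant_sequence_sketch("_VVDELVKVM(Oxidation (M))GR_", 2, "DSS", 138.06808))
```

The annotation follows the residue it modifies, so the modification is assigned to the count of residues read so far.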
- pyXLMS.parser.parser_xldbse_maxquant.read_maxlynx(
- files: str | List[str] | BinaryIO,
- crosslinker: str,
- crosslinker_mass: float | None = None,
- decoy_prefix: str = 'REV__',
- parse_modifications: bool = True,
- modifications: Dict[str, float] = {'ADH': 138.09054635, 'Acetyl': 42.010565, 'BS3': 138.06808, 'Carbamidomethyl': 57.021464, 'DSBSO': 308.03883, 'DSBU': 196.08479231, 'DSS': 138.06808, 'DSSO': 158.00376, 'Oxidation': 15.994915, 'PhoX': 209.97181, 'Phospho': 79.966331},
- sep: str = '\t',
- decimal: str = '.',
- )
Read a MaxLynx result file.
Reads a MaxLynx crosslink-spectrum-matches result file “crosslinkMsms.txt” in .txt (tab delimited) format and returns a parser_result. This is an alias for the MaxQuant reader.
- Parameters:
files (str, list of str, or file stream) – The name/path of the MaxLynx result file(s) or a file-like object/stream.
crosslinker (str) – Name of the used cross-linking reagent, for example “DSSO”.
crosslinker_mass (float, or None, default = None) – Monoisotopic delta mass of the crosslink modification. If the crosslinker is defined in parameter “modifications” this can be omitted.
decoy_prefix (str, default = "REV__") – The prefix that indicates that a protein is from the decoy database.
parse_modifications (bool, default = True) – Whether or not post-translational-modifications should be parsed for crosslink-spectrum-matches. Requires correct specification of the ‘modifications’ parameter.
modifications (dict of str, float, default = constants.MODIFICATIONS) – Mapping of modification names to modification masses.
sep (str, default = "\t") – Separator used in the .txt file.
decimal (str, default = ".") – Character to recognize as decimal point.
- Returns:
The parser_result object containing all parsed information.
- Return type:
dict
- Raises:
RuntimeError – If the file(s) could not be read or if the file(s) contain no crosslink-spectrum-matches.
KeyError – If the specified crosslinker could not be found/mapped.
Warning
MaxLynx/MaxQuant only reports a single protein crosslink position per peptide; for ambiguous peptides only the crosslink position of the first matching protein is reported. All matching proteins can be retrieved via additional_information, but not their corresponding crosslink positions. For this reason it is recommended to use transform.reannotate_positions() to correctly annotate all crosslink positions for all peptides if that is important for downstream analysis.
Examples
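>>> from pyXLMS.parser import read_maxlynx
>>> # crosslinker is a required argument; "DSS" is a placeholder for the reagent actually used
>>> csms = read_maxlynx("data/maxquant/run1/crosslinkMsms.txt", crosslinker="DSS")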
- pyXLMS.parser.parser_xldbse_maxquant.read_maxquant(
- files: str | List[str] | BinaryIO,
- crosslinker: str,
- crosslinker_mass: float | None = None,
- decoy_prefix: str = 'REV__',
- parse_modifications: bool = True,
- modifications: Dict[str, float] = {'ADH': 138.09054635, 'Acetyl': 42.010565, 'BS3': 138.06808, 'Carbamidomethyl': 57.021464, 'DSBSO': 308.03883, 'DSBU': 196.08479231, 'DSS': 138.06808, 'DSSO': 158.00376, 'Oxidation': 15.994915, 'PhoX': 209.97181, 'Phospho': 79.966331},
- sep: str = '\t',
- decimal: str = '.',
- )
Read a MaxQuant result file.
Reads a MaxQuant crosslink-spectrum-matches result file “crosslinkMsms.txt” in .txt (tab delimited) format and returns a parser_result.
- Parameters:
files (str, list of str, or file stream) – The name/path of the MaxQuant result file(s) or a file-like object/stream.
crosslinker (str) – Name of the used cross-linking reagent, for example “DSSO”.
crosslinker_mass (float, or None, default = None) – Monoisotopic delta mass of the crosslink modification. If the crosslinker is defined in parameter “modifications” this can be omitted.
decoy_prefix (str, default = "REV__") – The prefix that indicates that a protein is from the decoy database.
parse_modifications (bool, default = True) – Whether or not post-translational-modifications should be parsed for crosslink-spectrum-matches. Requires correct specification of the ‘modifications’ parameter.
modifications (dict of str, float, default = constants.MODIFICATIONS) – Mapping of modification names to modification masses.
sep (str, default = "\t") – Separator used in the .txt file.
decimal (str, default = ".") – Character to recognize as decimal point.
- Returns:
The parser_result object containing all parsed information.
- Return type:
dict
- Raises:
RuntimeError – If the file(s) could not be read or if the file(s) contain no crosslink-spectrum-matches.
KeyError – If the specified crosslinker could not be found/mapped.
Warning
MaxLynx/MaxQuant only reports a single protein crosslink position per peptide; for ambiguous peptides only the crosslink position of the first matching protein is reported. All matching proteins can be retrieved via additional_information, but not their corresponding crosslink positions. For this reason it is recommended to use transform.reannotate_positions() to correctly annotate all crosslink positions for all peptides if that is important for downstream analysis.
Examples
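>>> from pyXLMS.parser import read_maxquant
>>> # crosslinker is a required argument; "DSS" is a placeholder for the reagent actually used
>>> csms = read_maxquant("data/maxquant/run1/crosslinkMsms.txt", crosslinker="DSS")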
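The relationship between `crosslinker`, `crosslinker_mass`, and `modifications` described in the parameter list can be sketched as a small lookup. This is a guess at the logic, not the library's code: the helper name `resolve_crosslinker_mass_sketch` is hypothetical, and the assumption that an explicitly given mass takes precedence over the table is mine.

```python
from typing import Dict, Optional


def resolve_crosslinker_mass_sketch(
    crosslinker: str,
    crosslinker_mass: Optional[float],
    modifications: Dict[str, float],
) -> float:
    """Return the crosslinker delta mass, looking it up by name if omitted."""
    if crosslinker_mass is not None:
        # Assumption: an explicitly given mass is used as-is.
        return crosslinker_mass
    if crosslinker not in modifications:
        # Mirrors the documented KeyError for an unmapped crosslinker.
        raise KeyError(f"Crosslinker {crosslinker!r} not found in modifications.")
    return modifications[crosslinker]


# Two entries copied from the default mapping shown in the signature.
defaults = {"DSS": 138.06808, "DSSO": 158.00376}
print(resolve_crosslinker_mass_sketch("DSSO", None, defaults))
```

For a reagent that is not in the mapping, either pass `crosslinker_mass` or add the reagent to the dictionary given as `modifications`.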
pyXLMS.parser.parser_xldbse_merox module
- pyXLMS.parser.parser_xldbse_merox.read_merox(
- files: str | List[str] | BinaryIO,
- crosslinker: str,
- crosslinker_mass: float | None = None,
- decoy_prefix: str = 'REV__',
- parse_modifications: bool = True,
- modifications: Dict[str, Dict[str, Any]] = {'B': {'Amino Acid': 'C', 'Modification': ('Carbamidomethyl', 57.021464)}, 'm': {'Amino Acid': 'M', 'Modification': ('Oxidation', 15.994915)}},
- sep: str = ';',
- decimal: str = '.',
- )
Read a MeroX result file.
Reads a MeroX crosslink-spectrum-matches result file in .csv or .zhrm format and returns a parser_result.
- Parameters:
files (str, list of str, or file stream) – The name/path of the MeroX result file(s) or a file-like object/stream.
crosslinker (str) – Name of the used cross-linking reagent, for example “DSSO”.
crosslinker_mass (float, or None, default = None) – Monoisotopic delta mass of the crosslink modification. If the crosslinker is defined in constants.MODIFICATIONS this can be omitted.
decoy_prefix (str, default = "REV__") – The prefix that indicates that a protein is from the decoy database.
parse_modifications (bool, default = True) – Whether or not post-translational-modifications should be parsed for crosslink-spectrum-matches. Requires correct specification of the ‘modifications’ parameter.
modifications (dict of str, dict of str, any, default = constants.MEROX_MODIFICATION_MAPPING) – Mapping of modification symbols to their amino acids and modifications. Please refer to constants.MEROX_MODIFICATION_MAPPING for examples.
sep (str, default = ";") – Separator used in the .csv or .zhrm file.
decimal (str, default = ".") – Character to recognize as decimal point.
- Returns:
The parser_result object containing all parsed information.
- Return type:
dict
- Raises:
RuntimeError – If the file(s) could not be read or if the file(s) contain no crosslink-spectrum-matches.
KeyError – If the specified crosslinker could not be found/mapped.
Warning
MeroX only reports a single protein crosslink position per peptide; for ambiguous peptides only the crosslink position of the first matching protein is reported. All matching proteins can be retrieved via additional_information, but not their corresponding crosslink positions. For this reason it is recommended to use transform.reannotate_positions() to correctly annotate all crosslink positions for all peptides if that is important for downstream analysis. Additionally, please note that target and decoy information is derived from the protein accession and the parameter decoy_prefix. By default, MeroX only reports target matches that are above the desired FDR.
Examples
>>> from pyXLMS.parser import read_merox
>>> csms_from_csv = read_merox("data/merox/XLpeplib_Beveridge_QEx-HFX_DSS_R1.csv", crosslinker="DSS")

>>> from pyXLMS.parser import read_merox
>>> csms_from_zhrm = read_merox("data/merox/XLpeplib_Beveridge_QEx-HFX_DSS_R1.zhrm", crosslinker="DSS")
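The structure of the default `modifications` mapping (symbol to amino acid and modification) suggests how a symbol-encoded sequence could be translated. The sketch below is only a guess at that idea: the function `apply_merox_mapping_sketch` and the input strings are hypothetical, and real MeroX sequences may carry additional markers that this sketch does not handle.

```python
from typing import Any, Dict, Tuple

# Copy of the default mapping shown in the signature above.
MEROX_MAPPING: Dict[str, Dict[str, Any]] = {
    "B": {"Amino Acid": "C", "Modification": ("Carbamidomethyl", 57.021464)},
    "m": {"Amino Acid": "M", "Modification": ("Oxidation", 15.994915)},
}


def apply_merox_mapping_sketch(
    seq: str, mapping: Dict[str, Dict[str, Any]] = MEROX_MAPPING
) -> Tuple[str, Dict[int, Tuple[str, float]]]:
    """Return the unmodified sequence and a position -> modification dict."""
    residues = []
    modifications: Dict[int, Tuple[str, float]] = {}
    for position, symbol in enumerate(seq, start=1):
        if symbol in mapping:
            # Replace the symbol by its amino acid and record the modification.
            residues.append(mapping[symbol]["Amino Acid"])
            modifications[position] = mapping[symbol]["Modification"]
        else:
            residues.append(symbol)
    return "".join(residues), modifications


# Hypothetical symbol-encoded sequence: "B" stands for a modified cysteine.
print(apply_merox_mapping_sketch("KIEBFDSVEISGVEDR"))
```

A custom mapping for further symbols would follow the same two-key layout, "Amino Acid" and "Modification".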
pyXLMS.parser.parser_xldbse_msannika module
- pyXLMS.parser.parser_xldbse_msannika.read_msannika(
- files: str | List[str] | BinaryIO,
- parse_modifications: bool = True,
- modifications: Dict[str, float] = {'ADH': 138.09054635, 'Acetyl': 42.010565, 'BS3': 138.06808, 'Carbamidomethyl': 57.021464, 'DSBSO': 308.03883, 'DSBU': 196.08479231, 'DSS': 138.06808, 'DSSO': 158.00376, 'Oxidation': 15.994915, 'PhoX': 209.97181, 'Phospho': 79.966331},
- format: Literal['auto', 'csv', 'txt', 'tsv', 'xlsx', 'pdresult'] = 'auto',
- sep: str = '\t',
- decimal: str = '.',
- unsafe: bool = False,
- verbose: Literal[0, 1, 2] = 1,
- )
Read an MS Annika result file.
Reads an MS Annika crosslink-spectrum-matches result file or crosslink result file in .csv or .xlsx format, or both from a .pdResult file from Proteome Discoverer, and returns a parser_result.
- Parameters:
files (str, list of str, or file stream) – The name/path of the MS Annika result file(s) or a file-like object/stream.
parse_modifications (bool, default = True) – Whether or not post-translational-modifications should be parsed for crosslink-spectrum-matches. Requires correct specification of the ‘modifications’ parameter.
modifications (dict of str, float, default = constants.MODIFICATIONS) – Mapping of modification names to modification masses.
format ("auto", "csv", "tsv", "txt", "xlsx", or "pdresult", default = "auto") – The format of the result file. "auto" is only available if the name/path to the MS Annika result file is given.
sep (str, default = "\t") – Separator used in the .csv or .tsv file. Parameter is ignored if the file is in .xlsx or .pdResult format.
decimal (str, default = ".") – Character to recognize as decimal point. Parameter is ignored if the file is in .xlsx or .pdResult format.
unsafe (bool, default = False) – If True, allows reading of negative peptide and crosslink positions but replaces their values with None. Negative values occur when peptides can’t be matched to proteins because of ‘X’ in protein sequences. Reannotation might be possible with transform.reannotate_positions().
verbose (0, 1, or 2, default = 1) –
0: All warnings are ignored.
1: Warnings are printed to stdout.
2: Warnings are treated as errors.
- Returns:
The parser_result object containing all parsed information.
- Return type:
dict
- Raises:
ValueError – If the input format is not supported or cannot be inferred.
TypeError – If the pdResult file is provided in the wrong format.
TypeError – If parameter verbose was not set correctly.
RuntimeError – If one of the crosslinks or crosslink-spectrum-matches contains unknown crosslink or peptide positions. This occurs when peptides can’t be matched to proteins because of ‘X’ in protein sequences. Selecting ‘unsafe = True’ will ignore these errors and return None type positions. Reannotation might be possible with transform.reannotate_positions().
RuntimeError – If the file(s) could not be read or if the file(s) contain no crosslinks or crosslink-spectrum-matches.
KeyError – If one of the found post-translational-modifications could not be found/mapped.
Warning
MS Annika does not report if the individual peptides in a crosslink are from the target or decoy database. The parser assumes that both peptides from a target crosslink are from the target database, and vice versa, that both peptides are from the decoy database if it is a decoy crosslink. This leads to only TT and DD matches, which needs to be considered for FDR estimation. This also only applies to crosslinks and not crosslink-spectrum-matches, where this information is correctly reported and parsed.
Examples
>>> from pyXLMS.parser import read_msannika
>>> csms_from_xlsx = read_msannika("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_CSMs.xlsx")

>>> from pyXLMS.parser import read_msannika
>>> crosslinks_from_xlsx = read_msannika("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_Crosslinks.xlsx")

>>> from pyXLMS.parser import read_msannika
>>> csms_from_tsv = read_msannika("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_CSMs.txt")

>>> from pyXLMS.parser import read_msannika
>>> crosslinks_from_tsv = read_msannika("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_Crosslinks.txt")

>>> from pyXLMS.parser import read_msannika
>>> csms_and_crosslinks_from_pdresult = read_msannika("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1.pdResult")
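The documented effect of the `unsafe` parameter on negative positions can be pictured with a minimal sketch. The helper `sanitize_position_sketch` is hypothetical and only restates the behaviour described above (negative values raise an error by default and become None with `unsafe = True`); it is not taken from the pyXLMS source.

```python
from typing import Optional


def sanitize_position_sketch(position: int, unsafe: bool = False) -> Optional[int]:
    """Return a known position unchanged; handle an unknown (negative) one."""
    if position >= 0:
        return position
    if unsafe:
        # Documented behaviour: negative values are replaced with None.
        return None
    raise RuntimeError(
        f"Unknown position {position}; consider unsafe=True and later reannotation."
    )


print([sanitize_position_sketch(p, unsafe=True) for p in (12, -1, 7)])
```

Positions that come back as None would then be candidates for reannotation with `transform.reannotate_positions()`, as noted in the parameter description.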
pyXLMS.parser.parser_xldbse_mzid module
- pyXLMS.parser.parser_xldbse_mzid.parse_scan_nr_from_mzid(spectrum_id: str) -> int
Parse the scan number from a ‘spectrumID’ of a mzIdentML file.
- Parameters:
spectrum_id (str) – The ‘spectrumID’ of the mass spectrum from an mzIdentML file read with pyteomics.
- Returns:
The scan number.
- Return type:
int
Examples
>>> from pyXLMS.parser import parse_scan_nr_from_mzid
>>> parse_scan_nr_from_mzid("scan=5321")
5321
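If the spectrumIDs in a file carry more than the bare `scan=` field, a custom parser with the same shape (string in, integer out) might be needed. The sketch below is one possible approach under assumptions: the function name `parse_scan_nr_sketch` is hypothetical, and the longer ID string is just an illustrative input, not one taken from the pyXLMS test data.

```python
import re


def parse_scan_nr_sketch(spectrum_id: str) -> int:
    """Extract the number following 'scan=' anywhere in a spectrumID."""
    match = re.search(r"scan=(\d+)", spectrum_id)
    if match is None:
        raise ValueError(f"No scan number found in {spectrum_id!r}.")
    return int(match.group(1))


print(parse_scan_nr_sketch("controllerType=0 controllerNumber=1 scan=5321"))
```

A function like this matches the `Callable[[str], int]` type that `read_mzid()` documents for its `scan_nr_parser` parameter.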
- pyXLMS.parser.parser_xldbse_mzid.read_mzid(
- files: str | List[str] | BinaryIO,
- scan_nr_parser: Callable[[str], int] | None = None,
- decoy: bool | None = None,
- crosslinkers: Dict[str, float] = {'ADH': 138.09054635, 'BS3': 138.06808, 'DSBSO': 308.03883, 'DSBU': 196.08479231, 'DSS': 138.06808, 'DSSO': 158.00376, 'PhoX': 209.97181},
- verbose: Literal[0, 1, 2] = 1,
- )
Read a mzIdentML (mzid) file.
Reads crosslink-spectrum-matches from a mzIdentML (mzid) file and returns a parser_result.
- Parameters:
files (str, list of str, or file stream) – The name/path of the mzIdentML (mzid) file(s) or a file-like object/stream.
scan_nr_parser (callable, or None, default = None) – A function that parses the scan number from mzid spectrumIDs. If None (default) the function parse_scan_nr_from_mzid() is used.
decoy (bool, or None, default = None) – Whether the mzid file contains decoy CSMs (True) or target CSMs (False).
crosslinkers (dict of str, float, default = constants.CROSSLINKERS) – Mapping of crosslinker names to crosslinker delta masses.
verbose (0, 1, or 2, default = 1) –
0: All warnings are ignored.
1: Warnings are printed to stdout.
2: Warnings are treated as errors.
- Returns:
The parser_result object containing all parsed information.
- Return type:
dict
- Raises:
RuntimeError – If the file(s) could not be read or if the file(s) contain no crosslink-spectrum-matches.
RuntimeError – If parser is used with verbose = 2.
RuntimeError – If there are warnings while reading the mzIdentML file (only for verbose = 2).
TypeError – If parameter verbose was not set correctly.
TypeError – If one of the values necessary to create a crosslink-spectrum-match could not be parsed correctly.
Notes
This parser is experimental, as I don’t know if the mzIdentML structure is consistent across different crosslink search engines. This parser was tested with mzIdentML files from MS Annika and XlinkX.
Warning
This parser only parses minimal data because most information is not available from the mzIdentML file. The available data is:
alpha_peptide
alpha_peptide_crosslink_position
beta_peptide
beta_peptide_crosslink_position
spectrum_file
scan_nr
Examples
>>> from pyXLMS.parser import read_mzid
>>> csms = read_mzid("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1.mzid")
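The three `verbose` levels documented for this and other parsers can be summarized in a short sketch. The helper `report_warning_sketch` and the message format are hypothetical; the sketch only restates the documented semantics (0 ignores, 1 prints to stdout, 2 raises, anything else is a TypeError) and is not the library's code.

```python
def report_warning_sketch(message: str, verbose: int = 1) -> None:
    """Handle a parser warning according to the documented verbose levels."""
    if verbose not in (0, 1, 2):
        raise TypeError("Parameter verbose has to be 0, 1, or 2.")
    if verbose == 2:
        # Level 2: warnings are treated as errors.
        raise RuntimeError(message)
    if verbose == 1:
        # Level 1: warnings are printed to stdout.
        print(f"Warning: {message}")
    # Level 0: the warning is ignored.


report_warning_sketch("mzIdentML parsing is experimental.", verbose=1)
```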
pyXLMS.parser.parser_xldbse_plink module
- pyXLMS.parser.parser_xldbse_plink.detect_plink_filetype(
- file: str | BinaryIO,
- sep: str = ',',
- decimal: str = '.',
- )
Detects the pLink-related file type of the data.
Detects whether the input data is a pLink “*cross-linked_peptides.csv” file or a pLink “*cross-linked_spectra.csv” file.
- Parameters:
file (str, or BinaryIO) – The name/path of the pLink result file or a file-like object/stream.
sep (str, default = ",") – Separator used in the .csv file.
decimal (str, default = ".") – Character to recognize as decimal point.
- Returns:
Returns “crosslinks” if file is a “*cross-linked_peptides.csv” or “crosslink-spectrum-matches” if file is a “*cross-linked_spectra.csv”.
- Return type:
str
- Raises:
RuntimeError – If the file could not be parsed.
RuntimeError – If the file does not contain any data.
ValueError – If the file does not match any of the supported pLink input files.
Examples
>>> from pyXLMS.parser import detect_plink_filetype
>>> detect_plink_filetype("data/plink2/Cas9_plus10_2024.06.20.filtered_cross-linked_peptides.csv")
'crosslinks'

>>> from pyXLMS.parser import detect_plink_filetype
>>> detect_plink_filetype("data/plink2/Cas9_plus10_2024.06.20.filtered_cross-linked_spectra.csv")
'crosslink-spectrum-matches'
- pyXLMS.parser.parser_xldbse_plink.parse_scan_nr_from_plink(title: str) -> int
Parse the scan number from a spectrum title.
- Parameters:
title (str) – The spectrum title.
- Returns:
The scan number.
- Return type:
int
Examples
>>> from pyXLMS.parser import parse_scan_nr_from_plink
>>> parse_scan_nr_from_plink("XLpeplib_Beveridge_QEx-HFX_DSS_R1.20588.20588.3.0.dta")
20588
- pyXLMS.parser.parser_xldbse_plink.parse_spectrum_file_from_plink(title: str) -> str
Parse the spectrum file name from a spectrum title.
- Parameters:
title (str) – The spectrum title.
- Returns:
The spectrum file name.
- Return type:
str
Examples
>>> from pyXLMS.parser import parse_spectrum_file_from_plink
>>> parse_spectrum_file_from_plink("XLpeplib_Beveridge_QEx-HFX_DSS_R1.20588.20588.3.0.dta")
'XLpeplib_Beveridge_QEx-HFX_DSS_R1'
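Both title parsers above work on the same kind of string, so their combined effect can be sketched in one helper. This assumes the title layout `<file>.<scan>.<scan>.<charge>.<index>.dta`, which is inferred only from the single example title; the function name `split_plink_title_sketch` is hypothetical and the library may parse titles differently.

```python
from typing import Tuple


def split_plink_title_sketch(title: str) -> Tuple[str, int]:
    """Split a spectrum title into (spectrum file name, scan number)."""
    # Split from the right so that dots inside the file name are preserved.
    parts = title.rsplit(".", 5)
    if len(parts) != 6:
        raise ValueError(f"Unexpected spectrum title: {title!r}")
    return parts[0], int(parts[1])


print(split_plink_title_sketch("XLpeplib_Beveridge_QEx-HFX_DSS_R1.20588.20588.3.0.dta"))
```

Titles that follow another layout would need their own functions, which could be passed as `spectrum_file_parser` and `scan_nr_parser` to `read_plink()`.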
- pyXLMS.parser.parser_xldbse_plink.read_plink(
- files: str | List[str] | BinaryIO,
- spectrum_file_parser: Callable[[str], str] | None = None,
- scan_nr_parser: Callable[[str], int] | None = None,
- decoy_prefix: str = 'REV_',
- parse_modifications: bool = True,
- modifications: Dict[str, float] = {'ADH': 138.09054635, 'Acetyl': 42.010565, 'BS3': 138.06808, 'Carbamidomethyl': 57.021464, 'DSBSO': 308.03883, 'DSBU': 196.08479231, 'DSS': 138.06808, 'DSSO': 158.00376, 'Oxidation': 15.994915, 'PhoX': 209.97181, 'Phospho': 79.966331},
- sep: str = ',',
- decimal: str = '.',
- verbose: Literal[0, 1, 2] = 1,
- )
Read a pLink result file.
Reads a pLink crosslink-spectrum-matches result file “*cross-linked_spectra.csv” in .csv (comma delimited) format or pLink crosslinks result file “*cross-linked_peptides.csv” in .csv (comma delimited) format and returns a parser_result.
- Parameters:
files (str, list of str, or file stream) – The name/path of the pLink result file(s) or a file-like object/stream.
spectrum_file_parser (callable, or None, default = None) – A function that parses the spectrum file name from spectrum titles. If None (default) the function parse_spectrum_file_from_plink() is used.
scan_nr_parser (callable, or None, default = None) – A function that parses the scan number from spectrum titles. If None (default) the function parse_scan_nr_from_plink() is used.
decoy_prefix (str, default = "REV_") – The prefix that indicates that a protein is from the decoy database.
parse_modifications (bool, default = True) – Whether or not post-translational-modifications should be parsed for crosslink-spectrum-matches. Requires correct specification of the ‘modifications’ parameter.
modifications (dict of str, float, default = constants.MODIFICATIONS) – Mapping of modification names to modification masses.
sep (str, default = ",") – Separator used in the .csv file.
decimal (str, default = ".") – Character to recognize as decimal point.
verbose (0, 1, or 2, default = 1) –
0: All warnings are ignored.
1: Warnings are printed to stdout.
2: Warnings are treated as errors.
- Returns:
The parser_result object containing all parsed information.
- Return type:
dict
- Raises:
RuntimeError – If the file(s) could not be read or if the file(s) contain no crosslink-spectrum-matches.
TypeError – If parameter verbose was not set correctly.
Warning
Target and decoy information is derived from the protein accession and the parameter decoy_prefix. By default, pLink only reports target matches that are above the desired FDR.
Examples
>>> from pyXLMS.parser import read_plink
>>> csms = read_plink("data/plink2/Cas9_plus10_2024.06.20.filtered_cross-linked_spectra.csv")

>>> from pyXLMS.parser import read_plink
>>> crosslinks = read_plink("data/plink2/Cas9_plus10_2024.06.20.filtered_cross-linked_peptides.csv")
pyXLMS.parser.parser_xldbse_scout module
- pyXLMS.parser.parser_xldbse_scout.detect_scout_filetype(
- data: DataFrame,
- )
Detects the Scout-related source of the data.
Detects whether the input data is unfiltered crosslink-spectrum-matches, filtered crosslink-spectrum-matches, or crosslinks from Scout.
- Parameters:
data (pd.DataFrame) – The input data originating from Scout.
- Returns:
“scout_csms_unfiltered” if a Scout unfiltered CSMs file was read, “scout_csms_filtered” if a Scout filtered CSMs file was read, “scout_xl” if a Scout crosslink/residue pair result file was read.
- Return type:
str
- Raises:
ValueError – If the data source could not be determined.
Examples
>>> from pyXLMS.parser import detect_scout_filetype
>>> import pandas as pd
>>> df1 = pd.read_csv("data/scout/Cas9_Unfiltered_CSMs.csv")
>>> detect_scout_filetype(df1)
'scout_csms_unfiltered'

>>> from pyXLMS.parser import detect_scout_filetype
>>> import pandas as pd
>>> df2 = pd.read_csv("data/scout/Cas9_Filtered_CSMs.csv")
>>> detect_scout_filetype(df2)
'scout_csms_filtered'

>>> from pyXLMS.parser import detect_scout_filetype
>>> import pandas as pd
>>> df3 = pd.read_csv("data/scout/Cas9_Residue_Pairs.csv")
>>> detect_scout_filetype(df3)
'scout_xl'
- pyXLMS.parser.parser_xldbse_scout.parse_modifications_from_scout_sequence(
- seq: str,
- crosslink_position: int,
- crosslinker: str,
- crosslinker_mass: float,
- modifications: Dict[str, Tuple[str, float]] = {'+15.994900': ('Oxidation', 15.994915), '+57.021460': ('Carbamidomethyl', 57.021464), 'ADH': ('ADH', 138.09054635), 'BS3': ('BS3', 138.06808), 'Carbamidomethyl': ('Carbamidomethyl', 57.021464), 'DSBSO': ('DSBSO', 308.03883), 'DSBU': ('DSBU', 196.08479231), 'DSS': ('DSS', 138.06808), 'DSSO': ('DSSO', 158.00376), 'Oxidation of Methionine': ('Oxidation', 15.994915), 'PhoX': ('PhoX', 209.97181)},
- verbose: Literal[0, 1, 2] = 1,
- )
Parse post-translational-modifications from a Scout peptide sequence.
Parses post-translational-modifications (PTMs) from a Scout peptide sequence, for example “M(+15.994900)LASAGELQKGNELALPSK”.
- Parameters:
seq (str) – The Scout sequence string.
crosslink_position (int) – Position of the crosslinker in the sequence (1-based).
crosslinker (str) – Name of the used cross-linking reagent, for example “DSSO”.
crosslinker_mass (float) – Monoisotopic delta mass of the crosslink modification.
modifications (dict of str, tuple, default = constants.SCOUT_MODIFICATION_MAPPING) – Mapping of Scout sequence elements and modification names to their modifications, given as tuples of modification name and modification mass.
verbose (0, 1, or 2, default = 1) –
0: All warnings are ignored.
1: Warnings are printed to stdout.
2: Warnings are treated as errors.
- Returns:
The pyXLMS specific modifications object, a dictionary that maps positions to their corresponding modifications and their monoisotopic masses.
- Return type:
dict of int, tuple
- Raises:
RuntimeError – If multiple modifications on the same residue are parsed (only if verbose = 2).
KeyError – If an unknown modification is encountered.
Examples
>>> from pyXLMS.parser import parse_modifications_from_scout_sequence
>>> seq = "M(+15.994900)LASAGELQKGNELALPSK"
>>> parse_modifications_from_scout_sequence(seq, 10, "DSS", 138.06808)
{10: ('DSS', 138.06808), 1: ('Oxidation', 15.994915)}

>>> from pyXLMS.parser import parse_modifications_from_scout_sequence
>>> seq = "KIEC(+57.021460)FDSVEISGVEDR"
>>> parse_modifications_from_scout_sequence(seq, 1, "DSS", 138.06808)
{1: ('DSS', 138.06808), 4: ('Carbamidomethyl', 57.021464)}
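The way a Scout sequence element such as "+15.994900" is looked up in the mapping can be sketched with the standard library alone. This is an illustrative approximation, not the pyXLMS implementation: the name `scout_sequence_sketch` is hypothetical, the mapping is a two-entry excerpt of the default shown in the signature, the `verbose` handling is left out (duplicates always raise here), and characters other than uppercase letters and parenthesized annotations are silently skipped.

```python
import re
from typing import Dict, Tuple

# Two entries copied from the default mapping shown in the signature above.
SCOUT_MAPPING: Dict[str, Tuple[str, float]] = {
    "+15.994900": ("Oxidation", 15.994915),
    "+57.021460": ("Carbamidomethyl", 57.021464),
}


def scout_sequence_sketch(
    seq: str,
    crosslink_position: int,
    crosslinker: str,
    crosslinker_mass: float,
    mapping: Dict[str, Tuple[str, float]] = SCOUT_MAPPING,
) -> Dict[int, Tuple[str, float]]:
    """Illustrative only: map 1-based positions to (name, mass) tuples."""
    parsed = {crosslink_position: (crosslinker, crosslinker_mass)}
    position = 0  # number of residues read so far
    # Each token is either one "(...)" annotation or one residue letter.
    for token in re.findall(r"\([^)]*\)|[A-Z]", seq):
        if not token.startswith("("):
            position += 1
            continue
        if position in parsed:
            raise RuntimeError(f"Multiple modifications at position {position}.")
        parsed[position] = mapping[token[1:-1]]  # KeyError for unknown elements
    return parsed


print(scout_sequence_sketch("M(+15.994900)LASAGELQKGNELALPSK", 10, "DSS", 138.06808))
```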
- pyXLMS.parser.parser_xldbse_scout.read_scout(
- files: str | List[str] | BinaryIO,
- crosslinker: str,
- crosslinker_mass: float | None = None,
- parse_modifications: bool = True,
- modifications: Dict[str, Tuple[str, float]] = {'+15.994900': ('Oxidation', 15.994915), '+57.021460': ('Carbamidomethyl', 57.021464), 'ADH': ('ADH', 138.09054635), 'BS3': ('BS3', 138.06808), 'Carbamidomethyl': ('Carbamidomethyl', 57.021464), 'DSBSO': ('DSBSO', 308.03883), 'DSBU': ('DSBU', 196.08479231), 'DSS': ('DSS', 138.06808), 'DSSO': ('DSSO', 158.00376), 'Oxidation of Methionine': ('Oxidation', 15.994915), 'PhoX': ('PhoX', 209.97181)},
- sep: str = ',',
- decimal: str = '.',
- verbose: Literal[0, 1, 2] = 1,
- )
Read a Scout result file.
Reads a Scout filtered or unfiltered crosslink-spectrum-matches result file or crosslink/residue pair result file in .csv format and returns a parser_result.
- Parameters:
files (str, list of str, or file stream) – The name/path of the Scout result file(s) or a file-like object/stream.
crosslinker (str) – Name of the used cross-linking reagent, for example “DSSO”.
crosslinker_mass (float, or None, default = None) – Monoisotopic delta mass of the crosslink modification. If the crosslinker is defined in parameter “modifications” this can be omitted.
parse_modifications (bool, default = True) – Whether or not post-translational-modifications should be parsed for crosslink-spectrum-matches. Requires correct specification of the ‘modifications’ parameter.
modifications (dict of str, tuple, default = constants.SCOUT_MODIFICATION_MAPPING) – Mapping of Scout sequence elements (e.g. "+15.994900") and modifications (e.g. "Oxidation of Methionine") to their modifications (e.g. ("Oxidation", 15.994915)).
sep (str, default = ",") – Separator used in the .csv file.
decimal (str, default = ".") – Character to recognize as decimal point.
verbose (0, 1, or 2, default = 1) –
0: All warnings are ignored.
1: Warnings are printed to stdout.
2: Warnings are treated as errors.
- Returns:
The parser_result object containing all parsed information.
- Return type:
dict
- Raises:
RuntimeError – If the file(s) could not be read or if the file(s) contain no crosslinks or crosslink-spectrum-matches.
KeyError – If the specified crosslinker could not be found/mapped.
TypeError – If parameter verbose was not set correctly.
Warning
When reading unfiltered crosslink-spectrum-matches, no protein crosslink positions or protein peptide positions are available, as these are not reported. If needed they should be annotated with transform.reannotate_positions().
When reading filtered crosslink-spectrum-matches, Scout does not report if the individual peptides in a crosslink are from the target or decoy database. The parser assumes that both peptides from a target crosslink-spectrum-match are from the target database, and vice versa, that both peptides are from the decoy database if it is a decoy crosslink-spectrum-match. This leads to only TT and DD matches, which needs to be considered for FDR estimation.
When reading crosslinks / residue pairs, Scout does not report if the individual peptides in a crosslink are from the target or decoy database. The parser assumes that both peptides from a target crosslink are from the target database, and vice versa, that both peptides are from the decoy database if it is a decoy crosslink. This leads to only TT and DD matches, which needs to be considered for FDR estimation.
Examples
>>> from pyXLMS.parser import read_scout
>>> csms_unfiltered = read_scout("data/scout/Cas9_Unfiltered_CSMs.csv")
>>> from pyXLMS.parser import read_scout
>>> csms_filtered = read_scout("data/scout/Cas9_Filtered_CSMs.csv")
>>> from pyXLMS.parser import read_scout
>>> crosslinks = read_scout("data/scout/Cas9_Residue_Pairs.csv")
pyXLMS.parser.parser_xldbse_xi module#
- pyXLMS.parser.parser_xldbse_xi.detect_xi_filetype(
- data: DataFrame,
Detects the xi-related source (application) of the data.
Detects whether the input data is originating from xiSearch or xiFDR, and if xiFDR which type of data is being read (crosslink-spectrum-matches or crosslinks).
- Parameters:
data (pd.DataFrame) – The input data originating from xiSearch or xiFDR.
- Returns:
“xisearch” if a xiSearch result file was read, “xifdr_csms” if CSMs from xiFDR were read, “xifdr_crosslinks” if crosslinks from xiFDR were read.
- Return type:
str
- Raises:
ValueError – If the data source could not be determined.
Examples
>>> from pyXLMS.parser import detect_xi_filetype
>>> import pandas as pd
>>> df1 = pd.read_csv("data/xi/r1_Xi1.7.6.7.csv")
>>> detect_xi_filetype(df1)
'xisearch'
>>> from pyXLMS.parser import detect_xi_filetype
>>> import pandas as pd
>>> df2 = pd.read_csv("data/xi/1perc_xl_boost_CSM_xiFDR2.2.1.csv")
>>> detect_xi_filetype(df2)
'xifdr_csms'
>>> from pyXLMS.parser import detect_xi_filetype
>>> import pandas as pd
>>> df3 = pd.read_csv("data/xi/1perc_xl_boost_Links_xiFDR2.2.1.csv")
>>> detect_xi_filetype(df3)
'xifdr_crosslinks'
- pyXLMS.parser.parser_xldbse_xi.parse_modifications_from_xi_sequence(sequence: str) Dict[int, str] [source]#
Parses all post-translational-modifications from a peptide sequence as reported by xiFDR.
Parses all post-translational-modifications from a peptide sequence as reported by xiFDR. This assumes that amino acids are given in upper case letters and post-translational-modifications in lower case letters. The parsed modifications are returned as a dictionary that maps their position in the sequence (1-based) to their xiFDR annotation (SYMBOLEXT), for example "cm" or "ox".
- Parameters:
sequence (str) – The peptide sequence as given by xiFDR.
- Returns:
Dictionary that maps positions in the peptide sequence (1-based, keys) to their respective modifications (values). The modifications are given in xiFDR annotation style (SYMBOLEXT), which is the lower-case modification code, for example "cm" for carbamidomethylation.
- Return type:
dict of int, str
- Raises:
RuntimeError – If multiple modifications on the same residue are parsed.
Examples
>>> from pyXLMS.parser import parse_modifications_from_xi_sequence
>>> seq1 = "KIECcmFDSVEISGVEDR"
>>> parse_modifications_from_xi_sequence(seq1)
{4: 'cm'}
>>> from pyXLMS.parser import parse_modifications_from_xi_sequence
>>> seq2 = "KIECcmFDSVEMoxISGVEDR"
>>> parse_modifications_from_xi_sequence(seq2)
{4: 'cm', 10: 'ox'}
>>> from pyXLMS.parser import parse_modifications_from_xi_sequence
>>> seq3 = "KIECcmFDSVEISGVEDRMox"
>>> parse_modifications_from_xi_sequence(seq3)
{4: 'cm', 17: 'ox'}
>>> from pyXLMS.parser import parse_modifications_from_xi_sequence
>>> seq4 = "CcmKIECcmFDSVEISGVEDRMox"
>>> parse_modifications_from_xi_sequence(seq4)
{1: 'cm', 5: 'cm', 18: 'ox'}
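The upper-case/lower-case convention described for xiFDR sequences can be illustrated with a small stand-alone sketch. The function name sketch_parse_xi_modifications is hypothetical and this is not the pyXLMS implementation; it only assumes that every upper-case letter is a residue and every run of lower-case letters belongs to the residue before it.

```python
from typing import Dict


def sketch_parse_xi_modifications(sequence: str) -> Dict[int, str]:
    """Illustrative only: map 1-based residue positions to lower-case codes."""
    modifications: Dict[int, str] = {}
    position = 0  # 1-based index of the most recent upper-case residue
    for char in sequence:
        if char.isupper():
            position += 1
        elif char.islower():
            # Lower-case letters extend the code attached to the current residue.
            modifications[position] = modifications.get(position, "") + char
    return modifications


print(sketch_parse_xi_modifications("CcmKIECcmFDSVEISGVEDRMox"))
```

Unlike the documented function, this sketch cannot tell two stacked modifications apart (it would concatenate their codes), so it never raises a RuntimeError.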
- pyXLMS.parser.parser_xldbse_xi.parse_peptide(sequence: str, term_char: str = '.') str [source]#
Parses the peptide sequence from a sequence string including flanking amino acids.
Parses the peptide sequence from a sequence string including flanking amino acids, for example "K.KKMoxKLS.S". The returned peptide sequence for this example would be "KKMoxKLS".
- Parameters:
sequence (str) – The sequence string containing the peptide sequence and flanking amino acids.
term_char (str (single character), default = ".") – The character used to denote N-terminal and C-terminal.
- Returns:
The parsed peptide sequence without flanking amino acids.
- Return type:
str
- Raises:
RuntimeError – If (one of) the peptide sequence(s) could not be parsed.
Examples
>>> from pyXLMS.parser import parse_peptide
>>> parse_peptide("K.KKMoxKLS.S")
'KKMoxKLS'
>>> from pyXLMS.parser import parse_peptide
>>> parse_peptide("-.CcmCcmPSR.T")
'CcmCcmPSR'
>>> from pyXLMS.parser import parse_peptide
>>> parse_peptide("CCPSR")
'CCPSR'
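A minimal sketch of the flanking-residue handling shown in the examples, assuming the string either has no terminal character at all or exactly two of them (one on each side of the peptide). The name sketch_parse_peptide is hypothetical, and the real function may accept more layouts than this.

```python
def sketch_parse_peptide(sequence: str, term_char: str = ".") -> str:
    """Illustrative only: strip flanking residues such as 'K.' and '.S'."""
    parts = sequence.split(term_char)
    if len(parts) == 1:
        # No flanking residues given, the input already is the peptide.
        return parts[0]
    if len(parts) == 3:
        # Layout <flank>.<peptide>.<flank>: keep the middle part.
        return parts[1]
    raise RuntimeError(f"Could not parse peptide from sequence: {sequence!r}")


print(sketch_parse_peptide("K.KKMoxKLS.S"))
```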
- pyXLMS.parser.parser_xldbse_xi.read_xi(
- files: str | List[str] | BinaryIO,
- decoy_prefix: str | None = 'auto',
- parse_modifications: bool = True,
- modifications: Dict[str, Tuple[str, float]] = {'->': ('Substitution', nan), 'bs3_ami': ('BS3 Amidated', 155.094619105), 'bs3_hyd': ('BS3 Hydrolized', 156.0786347), 'bs3_tris': ('BS3 Tris', 259.141973), 'bs3loop': ('BS3 Looplink', 138.06808), 'bs3nh2': ('BS3 Amidated', 155.094619105), 'bs3oh': ('BS3 Hydrolized', 156.0786347), 'cm': ('Carbamidomethyl', 57.021464), 'dsbu_ami': ('DSBU Amidated', 213.111341), 'dsbu_hyd': ('DSBU Hydrolized', 214.095357), 'dsbu_loop': ('DSBU Looplink', 196.08479231), 'dsbu_tris': ('DSBU Tris', 317.158685), 'dsbuloop': ('DSBU Looplink', 196.08479231), 'dsso_ami': ('DSSO Amidated', 175.030313905), 'dsso_hyd': ('DSSO Hydrolized', 176.0143295), 'dsso_loop': ('DSSO Looplink', 158.00376), 'dsso_tris': ('DSSO Tris', 279.077658), 'dssoloop': ('DSSO Looplink', 158.00376), 'ox': ('Oxidation', 15.994915)},
- sep: str = ',',
- decimal: str = '.',
- ignore_errors: bool = False,
- verbose: Literal[0, 1, 2] = 1,
Read a xiSearch/xiFDR result file.
Reads a xiSearch crosslink-spectrum-matches result file or a xiFDR crosslink-spectrum-matches result file or crosslink result file in .csv format and returns a parser_result.
- Parameters:
files (str, list of str, or file stream) – The name/path of the xiSearch/xiFDR result file(s) or a file-like object/stream.
decoy_prefix (str, or None, default = "auto") – The prefix that indicates that a protein is from the decoy database. If “auto” or None it will use the default for each xi file type.
parse_modifications (bool, default = True) – Whether or not post-translational-modifications should be parsed for crosslink-spectrum-matches. Requires correct specification of the ‘modifications’ parameter.
modifications (dict of str, tuple, default = constants.XI_MODIFICATION_MAPPING) – Mapping of xi sequence elements (e.g. "cm") to their modifications (e.g. ("Carbamidomethyl", 57.021464)). This corresponds to the SYMBOLEXT field, or the SYMBOL field minus the amino acid in the xiSearch config.
sep (str, default = ",") – Separator used in the .csv file.
decimal (str, default = ".") – Character to recognize as decimal point.
ignore_errors (bool, default = False) – Whether modifications that are not given in parameter ‘modifications’ should be tolerated instead of raising an error. By default an error is raised if an unknown modification is encountered. If True, unknown modifications are encoded with the xi shortcode (SYMBOLEXT) and a modification mass of float("nan").
verbose (0, 1, or 2, default = 1) –
0: All warnings are ignored.
1: Warnings are printed to stdout.
2: Warnings are treated as errors.
- Returns:
The parser_result object containing all parsed information.
- Return type:
dict
- Raises:
RuntimeError – If the file(s) contain no crosslinks or crosslink-spectrum-matches.
TypeError – If parameter verbose was not set correctly.
Examples
>>> from pyXLMS.parser import read_xi
>>> csms_from_xiSearch = read_xi("data/xi/r1_Xi1.7.6.7.csv")
>>> from pyXLMS.parser import read_xi
>>> csms_from_xiFDR = read_xi("data/xi/1perc_xl_boost_CSM_xiFDR2.2.1.csv")
>>> from pyXLMS.parser import read_xi
>>> crosslinks_from_xiFDR = read_xi("data/xi/1perc_xl_boost_Links_xiFDR2.2.1.csv")
pyXLMS.parser.parser_xldbse_xlinkx module#
- pyXLMS.parser.parser_xldbse_xlinkx.read_xlinkx(
- files: str | List[str] | BinaryIO,
- decoy: bool | None = None,
- parse_modifications: bool = True,
- modifications: Dict[str, float] = {'ADH': 138.09054635, 'Acetyl': 42.010565, 'BS3': 138.06808, 'Carbamidomethyl': 57.021464, 'DSBSO': 308.03883, 'DSBU': 196.08479231, 'DSS': 138.06808, 'DSSO': 158.00376, 'Oxidation': 15.994915, 'PhoX': 209.97181, 'Phospho': 79.966331},
- format: Literal['auto', 'csv', 'txt', 'tsv', 'xlsx', 'pdresult'] = 'auto',
- sep: str = '\t',
- decimal: str = '.',
- ignore_errors: bool = False,
- verbose: Literal[0, 1, 2] = 1,
Read an XlinkX result file.
Reads an XlinkX crosslink-spectrum-matches result file or crosslink result file in .csv or .xlsx format, or both from a .pdResult file from Proteome Discoverer, and returns a parser_result.
- Parameters:
files (str, list of str, or file stream) – The name/path of the XlinkX result file(s) or a file-like object/stream.
decoy (bool, or None) – Default decoy value to use if no decoy value is found. Only used if the “Is Decoy” column is not found in the supplied data.
parse_modifications (bool, default = True) – Whether or not post-translational-modifications should be parsed for crosslink-spectrum-matches. Requires correct specification of the ‘modifications’ parameter.
modifications (dict of str, float, default = constants.MODIFICATIONS) – Mapping of modification names to modification masses.
format ("auto", "csv", "tsv", "txt", "xlsx", or "pdresult", default = "auto") – The format of the result file. "auto" is only available if the name/path to the XlinkX result file is given.
sep (str, default = "\t") – Separator used in the .csv or .tsv file. Parameter is ignored if the file is in .xlsx or .pdResult format.
decimal (str, default = ".") – Character to recognize as decimal point. Parameter is ignored if the file is in .xlsx or .pdResult format.
ignore_errors (bool, default = False) – Whether missing crosslink positions should be tolerated instead of raising an error. Setting this to True will suppress the RuntimeError that is raised when the crosslink position cannot be parsed for at least one of the crosslinks. For these cases the crosslink position will be set to 100 000.
verbose (0, 1, or 2, default = 1) –
0: All warnings are ignored.
1: Warnings are printed to stdout.
2: Warnings are treated as errors.
- Returns:
The parser_result object containing all parsed information.
- Return type:
dict
- Raises:
ValueError – If the input format is not supported or cannot be inferred.
TypeError – If parameter verbose was not set correctly.
TypeError – If the pdResult file is provided in the wrong format.
RuntimeError – If the crosslink position could not be parsed for at least one of the crosslinks.
RuntimeError – If the file(s) could not be read or if the file(s) contain no crosslinks or crosslink-spectrum-matches.
KeyError – If one of the found post-translational-modifications could not be found/mapped.
Warning
XlinkX does not report if the individual peptides in a crosslink are from the target or decoy database. The parser assumes that both peptides from a target crosslink are from the target database, and vice versa, that both peptides are from the decoy database if it is a decoy crosslink. This leads to only TT and DD matches, which needs to be considered for FDR estimation. This applies to both crosslinks and crosslink-spectrum-matches.
Examples
>>> from pyXLMS.parser import read_xlinkx
>>> csms_from_xlsx = read_xlinkx("data/xlinkx/XLpeplib_Beveridge_Lumos_DSSO_MS3_CSMs.xlsx")
>>> from pyXLMS.parser import read_xlinkx
>>> crosslinks_from_xlsx = read_xlinkx("data/xlinkx/XLpeplib_Beveridge_Lumos_DSSO_MS3_Crosslinks.xlsx")
>>> from pyXLMS.parser import read_xlinkx
>>> csms_from_tsv = read_xlinkx("data/xlinkx/XLpeplib_Beveridge_Lumos_DSSO_MS3_CSMs.txt")
>>> from pyXLMS.parser import read_xlinkx
>>> crosslinks_from_tsv = read_xlinkx("data/xlinkx/XLpeplib_Beveridge_Lumos_DSSO_MS3_Crosslinks.txt")
>>> from pyXLMS.parser import read_xlinkx
>>> csms_and_crosslinks_from_pdresult = read_xlinkx("data/xlinkx/XLpeplib_Beveridge_Lumos_DSSO_MS3.pdResult")
pyXLMS.parser.util module#
- pyXLMS.parser.util.format_sequence(
- sequence: str,
- remove_non_aa: bool = True,
- remove_lower: bool = True,
Formats the given amino acid sequence into a common representation.
The given amino acid sequence is re-formatted by converting all amino acids to upper case and optionally removing non-encoding and lower case characters.
- Parameters:
sequence (str) – The amino acid sequence that should be formatted. Post-translational-modifications can be included in lower case but will be removed.
remove_non_aa (bool, default = True) – Whether or not to remove characters that do not encode amino acids.
remove_lower (bool, default = True) – Whether or not to remove lower case characters, this should be true if the amino acid sequence encodes post-translational-modifications in lower case.
- Returns:
The formatted sequence.
- Return type:
str
Examples
>>> from pyXLMS.parser_util import format_sequence
>>> format_sequence("PEP[K]TIDE")
'PEPKTIDE'
>>> from pyXLMS.parser_util import format_sequence
>>> format_sequence("PEPKdssoTIDE")
'PEPKTIDE'
>>> from pyXLMS.parser_util import format_sequence
>>> format_sequence("peptide", remove_lower = False)
'PEPTIDE'
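The interplay of the two flags can be sketched as follows. This is an illustration under stated assumptions, not the library code: the name sketch_format_sequence is hypothetical, lower-case characters are dropped before upper-casing, and any letter A to Z is treated as a valid amino acid code.

```python
import string

# Assumption: every upper-case ASCII letter counts as an amino acid code.
SKETCH_AMINO_ACIDS = set(string.ascii_uppercase)


def sketch_format_sequence(
    sequence: str, remove_non_aa: bool = True, remove_lower: bool = True
) -> str:
    """Illustrative only: normalise a peptide sequence to upper case."""
    formatted = []
    for char in sequence:
        if remove_lower and char.islower():
            continue  # lower case is assumed to encode a modification
        char = char.upper()
        if remove_non_aa and char not in SKETCH_AMINO_ACIDS:
            continue  # brackets, digits, and similar characters are dropped
        formatted.append(char)
    return "".join(formatted)


print(sketch_format_sequence("PEPKdssoTIDE"))
```

With remove_lower = False, a sequence written entirely in lower case is upper-cased rather than emptied, which matches the third example above.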
- pyXLMS.parser.util.get_bool_from_value(value: Any) bool [source]#
Parse a bool value from the given input.
Tries to parse a boolean value from the given input object. If the object is of instance bool it will return the object, if it is of instance int it will return True if the object is 1 or False if the object is 0, any other number will raise a ValueError. If the object is of instance str it will return True if the lower case version contains the letter t and otherwise False. If the object is none of these types a ValueError will be raised.
- Parameters:
value (Any) – The value to parse from.
- Returns:
The parsed boolean value.
- Return type:
bool
- Raises:
ValueError – If the object could not be parsed to bool.
Examples
>>> from pyXLMS.parser_util import get_bool_from_value
>>> get_bool_from_value(0)
False
>>> from pyXLMS.parser_util import get_bool_from_value
>>> get_bool_from_value("T")
True
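The rules stated in the description can be written out as a short sketch. The name sketch_get_bool_from_value is hypothetical and the code is a reading of the documented behaviour, not the library source.

```python
from typing import Any


def sketch_get_bool_from_value(value: Any) -> bool:
    """Illustrative only: follow the bool / int / str rules described above."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return value
    if isinstance(value, int):
        if value in (0, 1):
            return bool(value)
        raise ValueError(f"Cannot parse bool from integer {value}.")
    if isinstance(value, str):
        # True for any string containing the letter "t", e.g. "T" or "true".
        return "t" in value.lower()
    raise ValueError(f"Cannot parse bool from type {type(value).__name__}.")


print(sketch_get_bool_from_value("T"))
```

One consequence of the string rule as documented is that any text containing the letter t parses as True, so column values should be limited to clear forms such as "True"/"False" or "T"/"F".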
Module contents#
- pyXLMS.parser.detect_plink_filetype(
- file: str | BinaryIO,
- sep: str = ',',
- decimal: str = '.',
Detects the pLink-related file type of the data.
Detects whether the input data is a pLink “*cross-linked_peptides.csv” file or a pLink “*cross-linked_spectra.csv” file.
- Parameters:
file (str, or BinaryIO) – The name/path of the pLink result file or a file-like object/stream.
sep (str, default = ",") – Separator used in the .csv file.
decimal (str, default = ".") – Character to recognize as decimal point.
- Returns:
Returns “crosslinks” if file is a “*cross-linked_peptides.csv” or “crosslink-spectrum-matches” if file is a “*cross-linked_spectra.csv”.
- Return type:
str
- Raises:
RuntimeError – If the file could not be parsed.
RuntimeError – If the file does not contain any data.
ValueError – If the file does not match any of the supported pLink input files.
Examples
>>> from pyXLMS.parser import detect_plink_filetype
>>> detect_plink_filetype("data/plink2/Cas9_plus10_2024.06.20.filtered_cross-linked_peptides.csv")
'crosslinks'
>>> from pyXLMS.parser import detect_plink_filetype
>>> detect_plink_filetype("data/plink2/Cas9_plus10_2024.06.20.filtered_cross-linked_spectra.csv")
'crosslink-spectrum-matches'
- pyXLMS.parser.detect_scout_filetype(
- data: DataFrame,
Detects the Scout-related source of the data.
Detects whether the input data is unfiltered crosslink-spectrum-matches, filtered crosslink-spectrum-matches, or crosslinks from Scout.
- Parameters:
data (pd.DataFrame) – The input data originating from Scout.
- Returns:
“scout_csms_unfiltered” if a Scout unfiltered CSMs file was read, “scout_csms_filtered” if a Scout filtered CSMs file was read, “scout_xl” if a Scout crosslink/residue pair result file was read.
- Return type:
str
- Raises:
ValueError – If the data source could not be determined.
Examples
>>> from pyXLMS.parser import detect_scout_filetype
>>> import pandas as pd
>>> df1 = pd.read_csv("data/scout/Cas9_Unfiltered_CSMs.csv")
>>> detect_scout_filetype(df1)
'scout_csms_unfiltered'
>>> from pyXLMS.parser import detect_scout_filetype
>>> import pandas as pd
>>> df2 = pd.read_csv("data/scout/Cas9_Filtered_CSMs.csv")
>>> detect_scout_filetype(df2)
'scout_csms_filtered'
>>> from pyXLMS.parser import detect_scout_filetype
>>> import pandas as pd
>>> df3 = pd.read_csv("data/scout/Cas9_Residue_Pairs.csv")
>>> detect_scout_filetype(df3)
'scout_xl'
- pyXLMS.parser.detect_xi_filetype(
- data: DataFrame,
Detects the xi-related source (application) of the data.
Detects whether the input data is originating from xiSearch or xiFDR, and if xiFDR which type of data is being read (crosslink-spectrum-matches or crosslinks).
- Parameters:
data (pd.DataFrame) – The input data originating from xiSearch or xiFDR.
- Returns:
“xisearch” if a xiSearch result file was read, “xifdr_csms” if CSMs from xiFDR were read, “xifdr_crosslinks” if crosslinks from xiFDR were read.
- Return type:
str
- Raises:
ValueError – If the data source could not be determined.
Examples
>>> from pyXLMS.parser import detect_xi_filetype
>>> import pandas as pd
>>> df1 = pd.read_csv("data/xi/r1_Xi1.7.6.7.csv")
>>> detect_xi_filetype(df1)
'xisearch'
>>> from pyXLMS.parser import detect_xi_filetype
>>> import pandas as pd
>>> df2 = pd.read_csv("data/xi/1perc_xl_boost_CSM_xiFDR2.2.1.csv")
>>> detect_xi_filetype(df2)
'xifdr_csms'
>>> from pyXLMS.parser import detect_xi_filetype
>>> import pandas as pd
>>> df3 = pd.read_csv("data/xi/1perc_xl_boost_Links_xiFDR2.2.1.csv")
>>> detect_xi_filetype(df3)
'xifdr_crosslinks'
- pyXLMS.parser.parse_modifications_from_maxquant_sequence(
- seq: str,
- crosslink_position: int,
- crosslinker: str,
- crosslinker_mass: float,
- modifications: Dict[str, float] = {'ADH': 138.09054635, 'Acetyl': 42.010565, 'BS3': 138.06808, 'Carbamidomethyl': 57.021464, 'DSBSO': 308.03883, 'DSBU': 196.08479231, 'DSS': 138.06808, 'DSSO': 158.00376, 'Oxidation': 15.994915, 'PhoX': 209.97181, 'Phospho': 79.966331},
Parse post-translational-modifications from a MaxQuant peptide sequence.
Parses post-translational-modifications (PTMs) from a MaxQuant peptide sequence, for example “_VVDELVKVM(Oxidation (M))GR_”.
- Parameters:
seq (str) – The MaxQuant sequence string.
crosslink_position (int) – Position of the crosslinker in the sequence (1-based).
crosslinker (str) – Name of the used cross-linking reagent, for example “DSSO”.
crosslinker_mass (float) – Monoisotopic delta mass of the crosslink modification.
modifications (dict of str, float, default = constants.MODIFICATIONS) – Mapping of modification names to modification masses.
- Returns:
The pyXLMS specific modifications object, a dictionary that maps positions to their corresponding modifications and their monoisotopic masses.
- Return type:
dict of int, tuple
- Raises:
RuntimeError – If the sequence could not be parsed because it is not in MaxQuant format.
RuntimeError – If multiple modifications on the same residue are parsed.
KeyError – If an unknown modification is encountered.
Examples
>>> from pyXLMS.parser import parse_modifications_from_maxquant_sequence
>>> seq = "_VVDELVKVM(Oxidation (M))GR_"
>>> parse_modifications_from_maxquant_sequence(seq, 2, "DSS", 138.06808)
{2: ('DSS', 138.06808), 9: ('Oxidation', 15.994915)}
>>> from pyXLMS.parser import parse_modifications_from_maxquant_sequence
>>> seq = "_VVDELVKVM(Oxidation (M))GRM(Oxidation (M))_"
>>> parse_modifications_from_maxquant_sequence(seq, 2, "DSS", 138.06808)
{2: ('DSS', 138.06808), 9: ('Oxidation', 15.994915), 12: ('Oxidation', 15.994915)}
>>> from pyXLMS.parser import parse_modifications_from_maxquant_sequence
>>> seq = "_M(Oxidation (M))VVDELVKVM(Oxidation (M))GRM(Oxidation (M))_"
>>> parse_modifications_from_maxquant_sequence(seq, 2, "DSS", 138.06808)
{2: ('DSS', 138.06808), 1: ('Oxidation', 15.994915), 10: ('Oxidation', 15.994915), 13: ('Oxidation', 15.994915)}
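MaxQuant annotations such as "(Oxidation (M))" contain nested parentheses, so a plain regular expression is awkward here. The sketch below tracks the parenthesis depth instead. It is an illustration only: the function name and the reduced SKETCH_MODIFICATIONS mapping are hypothetical, and it assumes the modification name is everything before the first " (" inside the annotation.

```python
from typing import Dict, Tuple

# Reduced mapping for illustration, values taken from the documented default.
SKETCH_MODIFICATIONS = {"Oxidation": 15.994915, "Carbamidomethyl": 57.021464}


def sketch_parse_maxquant_modifications(
    seq: str,
    crosslink_position: int,
    crosslinker: str,
    crosslinker_mass: float,
    modifications: Dict[str, float] = SKETCH_MODIFICATIONS,
) -> Dict[int, Tuple[str, float]]:
    """Illustrative only: depth-counting parser for '_PEPM(Oxidation (M))_'."""
    result = {crosslink_position: (crosslinker, crosslinker_mass)}
    position, depth, buffer = 0, 0, []
    for char in seq.strip("_"):
        if char == "(":
            if depth > 0:
                buffer.append(char)  # nested parenthesis is part of the text
            depth += 1
        elif char == ")":
            depth -= 1
            if depth > 0:
                buffer.append(char)
                continue
            # Annotation closed: "Oxidation (M)" -> "Oxidation"
            name = "".join(buffer).split(" (")[0].strip()
            buffer = []
            if position in result:
                raise RuntimeError(f"Multiple modifications at position {position}.")
            result[position] = (name, modifications[name])  # KeyError if unknown
        elif depth > 0:
            buffer.append(char)
        else:
            position += 1  # a residue outside of any annotation
    return result


print(sketch_parse_maxquant_modifications("_VVDELVKVM(Oxidation (M))GR_", 2, "DSS", 138.06808))
```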
- pyXLMS.parser.parse_modifications_from_scout_sequence(
- seq: str,
- crosslink_position: int,
- crosslinker: str,
- crosslinker_mass: float,
- modifications: Dict[str, Tuple[str, float]] = {'+15.994900': ('Oxidation', 15.994915), '+57.021460': ('Carbamidomethyl', 57.021464), 'ADH': ('ADH', 138.09054635), 'BS3': ('BS3', 138.06808), 'Carbamidomethyl': ('Carbamidomethyl', 57.021464), 'DSBSO': ('DSBSO', 308.03883), 'DSBU': ('DSBU', 196.08479231), 'DSS': ('DSS', 138.06808), 'DSSO': ('DSSO', 158.00376), 'Oxidation of Methionine': ('Oxidation', 15.994915), 'PhoX': ('PhoX', 209.97181)},
- verbose: Literal[0, 1, 2] = 1,
Parse post-translational-modifications from a Scout peptide sequence.
Parses post-translational-modifications (PTMs) from a Scout peptide sequence, for example “M(+15.994900)LASAGELQKGNELALPSK”.
- Parameters:
seq (str) – The Scout sequence string.
crosslink_position (int) – Position of the crosslinker in the sequence (1-based).
crosslinker (str) – Name of the used cross-linking reagent, for example “DSSO”.
crosslinker_mass (float) – Monoisotopic delta mass of the crosslink modification.
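modifications (dict of str, tuple, default = constants.SCOUT_MODIFICATION_MAPPING) – Mapping of Scout sequence elements (e.g. "+15.994900") and modifications (e.g. "Oxidation of Methionine") to their modifications (e.g. ("Oxidation", 15.994915)).
verbose (0, 1, or 2, default = 1) –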
0: All warnings are ignored.
1: Warnings are printed to stdout.
2: Warnings are treated as errors.
- Returns:
The pyXLMS specific modifications object, a dictionary that maps positions to their corresponding modifications and their monoisotopic masses.
- Return type:
dict of int, tuple
- Raises:
RuntimeError – If multiple modifications on the same residue are parsed (only if verbose = 2).
KeyError – If an unknown modification is encountered.
Examples
>>> from pyXLMS.parser import parse_modifications_from_scout_sequence
>>> seq = "M(+15.994900)LASAGELQKGNELALPSK"
>>> parse_modifications_from_scout_sequence(seq, 10, "DSS", 138.06808)
{10: ('DSS', 138.06808), 1: ('Oxidation', 15.994915)}
>>> from pyXLMS.parser import parse_modifications_from_scout_sequence
>>> seq = "KIEC(+57.021460)FDSVEISGVEDR"
>>> parse_modifications_from_scout_sequence(seq, 1, "DSS", 138.06808)
{1: ('DSS', 138.06808), 4: ('Carbamidomethyl', 57.021464)}
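The lookup from Scout sequence elements to modification tuples can be sketched with a small tokenizer. This is an illustration, not the library code: the function name and the two-entry SKETCH_SCOUT_MAPPING (copied from the documented default) are hypothetical, and the sketch assumes annotations are not nested.

```python
import re
from typing import Dict, Tuple

# Two entries copied from the documented default mapping, for illustration.
SKETCH_SCOUT_MAPPING = {
    "+15.994900": ("Oxidation", 15.994915),
    "+57.021460": ("Carbamidomethyl", 57.021464),
}


def sketch_parse_scout_modifications(
    seq: str,
    crosslink_position: int,
    crosslinker: str,
    crosslinker_mass: float,
    modifications: Dict[str, Tuple[str, float]] = SKETCH_SCOUT_MAPPING,
) -> Dict[int, Tuple[str, float]]:
    """Illustrative only: parse 'M(+15.994900)LASAGELQK' style sequences."""
    result = {crosslink_position: (crosslinker, crosslinker_mass)}
    position = 0
    # Each token is either one residue letter or one parenthesised annotation.
    for residue, annotation in re.findall(r"([A-Z])|\(([^)]*)\)", seq):
        if residue:
            position += 1
        else:
            result[position] = modifications[annotation]  # KeyError if unknown
    return result


print(sketch_parse_scout_modifications("KIEC(+57.021460)FDSVEISGVEDR", 1, "DSS", 138.06808))
```

The sketch silently overwrites an entry when a residue carries both the crosslinker and another modification, whereas the documented function reports this case according to the verbose level.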
- pyXLMS.parser.parse_modifications_from_xi_sequence(sequence: str) Dict[int, str] [source]#
Parses all post-translational-modifications from a peptide sequence as reported by xiFDR.
Parses all post-translational-modifications from a peptide sequence as reported by xiFDR. This assumes that amino acids are given in upper case letters and post-translational-modifications in lower case letters. The parsed modifications are returned as a dictionary that maps their position in the sequence (1-based) to their xiFDR annotation (SYMBOLEXT), for example "cm" or "ox".
- Parameters:
sequence (str) – The peptide sequence as given by xiFDR.
- Returns:
Dictionary that maps positions in the peptide sequence (1-based, keys) to their respective modifications (values). The modifications are given in xiFDR annotation style (SYMBOLEXT), which is the lower-case modification code, for example "cm" for carbamidomethylation.
- Return type:
dict of int, str
- Raises:
RuntimeError – If multiple modifications on the same residue are parsed.
Examples
>>> from pyXLMS.parser import parse_modifications_from_xi_sequence
>>> seq1 = "KIECcmFDSVEISGVEDR"
>>> parse_modifications_from_xi_sequence(seq1)
{4: 'cm'}
>>> from pyXLMS.parser import parse_modifications_from_xi_sequence
>>> seq2 = "KIECcmFDSVEMoxISGVEDR"
>>> parse_modifications_from_xi_sequence(seq2)
{4: 'cm', 10: 'ox'}
>>> from pyXLMS.parser import parse_modifications_from_xi_sequence
>>> seq3 = "KIECcmFDSVEISGVEDRMox"
>>> parse_modifications_from_xi_sequence(seq3)
{4: 'cm', 17: 'ox'}
>>> from pyXLMS.parser import parse_modifications_from_xi_sequence
>>> seq4 = "CcmKIECcmFDSVEISGVEDRMox"
>>> parse_modifications_from_xi_sequence(seq4)
{1: 'cm', 5: 'cm', 18: 'ox'}
- pyXLMS.parser.parse_peptide(sequence: str, term_char: str = '.') str [source]#
Parses the peptide sequence from a sequence string including flanking amino acids.
Parses the peptide sequence from a sequence string including flanking amino acids, for example "K.KKMoxKLS.S". The returned peptide sequence for this example would be "KKMoxKLS".
- Parameters:
sequence (str) – The sequence string containing the peptide sequence and flanking amino acids.
term_char (str (single character), default = ".") – The character used to denote N-terminal and C-terminal.
- Returns:
The parsed peptide sequence without flanking amino acids.
- Return type:
str
- Raises:
RuntimeError – If (one of) the peptide sequence(s) could not be parsed.
Examples
>>> from pyXLMS.parser import parse_peptide
>>> parse_peptide("K.KKMoxKLS.S")
'KKMoxKLS'
>>> from pyXLMS.parser import parse_peptide
>>> parse_peptide("-.CcmCcmPSR.T")
'CcmCcmPSR'
>>> from pyXLMS.parser import parse_peptide
>>> parse_peptide("CCPSR")
'CCPSR'
- pyXLMS.parser.parse_scan_nr_from_mzid(spectrum_id: str) int [source]#
Parse the scan number from a ‘spectrumID’ of a mzIdentML file.
- Parameters:
spectrum_id (str) – The ‘spectrumID’ of the mass spectrum from an mzIdentML file read with pyteomics.
- Returns:
The scan number.
- Return type:
int
Examples
>>> from pyXLMS.parser import parse_scan_nr_from_mzid
>>> parse_scan_nr_from_mzid("scan=5321")
5321
- pyXLMS.parser.parse_scan_nr_from_plink(title: str) int [source]#
Parse the scan number from a spectrum title.
- Parameters:
title (str) – The spectrum title.
- Returns:
The scan number.
- Return type:
int
Examples
>>> from pyXLMS.parser import parse_scan_nr_from_plink
>>> parse_scan_nr_from_plink("XLpeplib_Beveridge_QEx-HFX_DSS_R1.20588.20588.3.0.dta")
20588
- pyXLMS.parser.parse_spectrum_file_from_plink(title: str) str [source]#
Parse the spectrum file name from a spectrum title.
- Parameters:
title (str) – The spectrum title.
- Returns:
The spectrum file name.
- Return type:
str
Examples
>>> from pyXLMS.parser import parse_spectrum_file_from_plink
>>> parse_spectrum_file_from_plink("XLpeplib_Beveridge_QEx-HFX_DSS_R1.20588.20588.3.0.dta")
'XLpeplib_Beveridge_QEx-HFX_DSS_R1'
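Both pLink title helpers read fields out of the same dot-separated spectrum title. The sketch below shows one way to do this, assuming the layout <file>.<scan>.<scan>.<charge>.<index>.dta that the example title suggests; that layout is inferred from a single example and the name sketch_split_plink_title is hypothetical.

```python
from typing import Tuple


def sketch_split_plink_title(title: str) -> Tuple[str, int]:
    """Illustrative only: return (spectrum file name, scan number)."""
    # Split from the right so that dots inside the file name are preserved.
    parts = title.rsplit(".", 5)
    if len(parts) != 6:
        raise ValueError(f"Unexpected spectrum title layout: {title!r}")
    spectrum_file, first_scan = parts[0], parts[1]
    return spectrum_file, int(first_scan)


print(sketch_split_plink_title("XLpeplib_Beveridge_QEx-HFX_DSS_R1.20588.20588.3.0.dta"))
```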
- pyXLMS.parser.pyxlms_modification_str_parser(
- modifications: str,
Parse a pyXLMS modification string.
Parses a pyXLMS modification string and returns the pyXLMS specific modification object, a dictionary that maps positions to their modifications.
- Parameters:
modifications (str) – The pyXLMS modification string.
- Returns:
The pyXLMS specific modification object, a dictionary that maps positions (1-based) to their respective modifications given as tuples of modification name and modification delta mass.
- Return type:
dict of int, tuple
- Raises:
RuntimeError – If multiple modifications on the same residue are parsed.
Examples
>>> from pyXLMS.parser import pyxlms_modification_str_parser
>>> modification_str = "(1:[DSS|138.06808])"
>>> pyxlms_modification_str_parser(modification_str)
{1: ('DSS', 138.06808)}
>>> from pyXLMS.parser import pyxlms_modification_str_parser
>>> modification_str = "(1:[DSS|138.06808]);(7:[Oxidation|15.994915])"
>>> pyxlms_modification_str_parser(modification_str)
{1: ('DSS', 138.06808), 7: ('Oxidation', 15.994915)}
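The modification string format used in the examples, "(position:[name|mass])" entries joined by ";", can be read with a single regular expression. The following is a sketch under that assumption, not the pyXLMS implementation, and sketch_parse_modification_str is a hypothetical name. A function with this signature is also the kind of callable that read_custom() accepts as modification_parser.

```python
import re
from typing import Dict, Tuple

# One entry looks like "(7:[Oxidation|15.994915])".
SKETCH_ENTRY = re.compile(r"\((\d+):\[([^|\]]+)\|([^\]]+)\]\)")


def sketch_parse_modification_str(modifications: str) -> Dict[int, Tuple[str, float]]:
    """Illustrative only: parse '(1:[DSS|138.06808]);(7:[Oxidation|15.994915])'."""
    result: Dict[int, Tuple[str, float]] = {}
    for position_str, name, mass_str in SKETCH_ENTRY.findall(modifications):
        position = int(position_str)
        if position in result:
            raise RuntimeError(f"Multiple modifications at position {position}.")
        result[position] = (name, float(mass_str))
    return result


print(sketch_parse_modification_str("(1:[DSS|138.06808]);(7:[Oxidation|15.994915])"))
```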
- pyXLMS.parser.read(
- files: str | List[str] | BinaryIO,
- engine: Literal['Custom', 'MaxQuant', 'MaxLynx', 'MeroX', 'MS Annika', 'mzIdentML', 'pLink', 'Scout', 'xiSearch/xiFDR', 'XlinkX'],
- crosslinker: str,
- parse_modifications: bool = True,
- ignore_errors: bool = False,
- verbose: Literal[0, 1, 2] = 1,
- **kwargs,
Read a crosslink result file.
Reads a crosslink or crosslink-spectrum-match result file from any of the supported crosslink search engines or formats. Currently supports result files from MaxLynx/MaxQuant, MeroX, MS Annika, pLink 2 and pLink 3, Scout, xiSearch and xiFDR, XlinkX, and the mzIdentML format. Additionally supports parsing from custom .csv files in pyXLMS format; see parser.read_custom() and the docs for more about the custom format.
- Parameters:
files (str, list of str, or file stream) – The name/path of the result file(s) or a file-like object/stream.
engine ("Custom", "MaxQuant", "MaxLynx", "MeroX", "MS Annika", "mzIdentML", "pLink", "Scout", "xiSearch/xiFDR", or "XlinkX") – Crosslink search engine or format of the result file.
crosslinker (str) – Name of the used cross-linking reagent, for example “DSSO”.
parse_modifications (bool, default = True) – Whether or not post-translational-modifications should be parsed for crosslink-spectrum-matches. Requires correct specification of the ‘modifications’ parameter for every parser. Defaults are selected for every parser if ‘modifications’ is not passed via **kwargs.
ignore_errors (bool, default = False) – Ignore errors when mapping modifications. Used in parser.read_xi() and parser.read_xlinkx().
verbose (0, 1, or 2, default = 1) –
0: All warnings are ignored.
1: Warnings are printed to stdout.
2: Warnings are treated as errors.
**kwargs – Any additional parameters will be passed to the specific parsers.
- Returns:
The parser_result object containing all parsed information.
- Return type:
dict
- Raises:
ValueError – If the value entered for parameter engine is not supported.
Examples
>>> from pyXLMS.parser import read
>>> csms_from_xiSearch = read("data/xi/r1_Xi1.7.6.7.csv", engine="xiSearch/xiFDR", crosslinker="DSS")
>>> from pyXLMS.parser import read
>>> csms_from_MaxQuant = read("data/maxquant/run1/crosslinkMsms.txt", engine="MaxQuant", crosslinker="DSS")
- pyXLMS.parser.read_custom(
- files: str | List[str] | BinaryIO,
- column_mapping: Dict[str, str] | None = None,
- parse_modifications: bool = True,
- modification_parser: Callable[[str], Dict[int, Tuple[str, float]]] | None = None,
- decoy_prefix: str = 'REV_',
- format: Literal['auto', 'csv', 'txt', 'tsv', 'xlsx'] = 'auto',
- sep: str = ',',
- decimal: str = '.',
Read a custom or pyXLMS result file.
Reads a custom or pyXLMS crosslink-spectrum-matches result file or crosslink result file in .csv or .xlsx format, and returns a parser_result.
The minimum required columns for a crosslink-spectrum-matches result file are:
“Alpha Peptide”: The unmodified amino acid sequence of the first peptide.
“Alpha Peptide Crosslink Position”: The position of the crosslinker in the sequence of the first peptide (1-based).
“Beta Peptide”: The unmodified amino acid sequence of the second peptide.
“Beta Peptide Crosslink Position”: The position of the crosslinker in the sequence of the second peptide (1-based).
“Spectrum File”: Name of the spectrum file the crosslink-spectrum-match was identified in.
“Scan Nr”: The corresponding scan number of the crosslink-spectrum-match.
The minimum required columns for a crosslink result file are:
“Alpha Peptide”: The unmodified amino acid sequence of the first peptide.
“Alpha Peptide Crosslink Position”: The position of the crosslinker in the sequence of the first peptide (1-based).
“Beta Peptide”: The unmodified amino acid sequence of the second peptide.
“Beta Peptide Crosslink Position”: The position of the crosslinker in the sequence of the second peptide (1-based).
A full specification of columns that can be parsed can be found in the docs.
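As an illustration of how a column_mapping for read_custom() might be prepared, the sketch below maps the columns of an imaginary third-party result file to the required pyXLMS column names listed above and checks that none are missing. The source column names ("pep1", "link1", and so on) and the helper missing_required_columns are hypothetical.

```python
from typing import Dict, List

# Required pyXLMS column names for a crosslink-spectrum-matches file (see list above).
REQUIRED_CSM_COLUMNS: List[str] = [
    "Alpha Peptide",
    "Alpha Peptide Crosslink Position",
    "Beta Peptide",
    "Beta Peptide Crosslink Position",
    "Spectrum File",
    "Scan Nr",
]

# Keys are the (hypothetical) column names in the result file,
# values are the pyXLMS column names they should be mapped to.
column_mapping: Dict[str, str] = {
    "pep1": "Alpha Peptide",
    "link1": "Alpha Peptide Crosslink Position",
    "pep2": "Beta Peptide",
    "link2": "Beta Peptide Crosslink Position",
    "raw_file": "Spectrum File",
    "scan": "Scan Nr",
}


def missing_required_columns(mapping: Dict[str, str], required: List[str]) -> List[str]:
    """Return the required pyXLMS columns that the mapping does not cover."""
    mapped = set(mapping.values())
    return [column for column in required if column not in mapped]


print(missing_required_columns(column_mapping, REQUIRED_CSM_COLUMNS))
```

A mapping prepared this way would then be passed as read_custom(..., column_mapping=column_mapping).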
- Parameters:
files (str, list of str, or file stream) – The name/path of the result file(s) or a file-like object/stream.
column_mapping (dict of str, str) – A dictionary that maps the result file columns to the required pyXLMS column names.
parse_modifications (bool, default = True) – Whether or not post-translational-modifications should be parsed for crosslink-spectrum-matches. Requires correct specification of the ‘modification_parser’ parameter.
modification_parser (callable, or None) – A function that parses modification strings and returns the pyXLMS specific modifications object. If None, the function pyxlms_modification_str_parser() is used. If no modification columns are given this parameter is ignored.
decoy_prefix (str, default = "REV_") – The prefix that indicates that a protein is from the decoy database.
format ("auto", "csv", "tsv", "txt", or "xlsx", default = "auto") – The format of the result file. "auto" is only available if the name/path to the result file is given.
sep (str, default = ",") – Separator used in the .csv or .tsv file. Parameter is ignored if the file is in .xlsx format.
decimal (str, default = ".") – Character to recognize as decimal point. Parameter is ignored if the file is in .xlsx format.
- Returns:
The parser_result object containing all parsed information.
- Return type:
dict
- Raises:
ValueError – If the input format is not supported or cannot be inferred.
TypeError – If one of the values could not be parsed.
RuntimeError – If the file(s) could not be read or if the file(s) contain no crosslinks or crosslink-spectrum-matches.
Examples
>>> from pyXLMS.parser import read_custom
>>> csms_from_pyxlms = read_custom("data/pyxlms/csm.txt")
>>> from pyXLMS.parser import read_custom
>>> crosslinks_from_pyxlms = read_custom("data/pyxlms/xl.txt")
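A custom modification_parser only has to accept a string and return a dictionary that maps 1-based positions to tuples of modification name and delta mass. The following is a minimal sketch for an imaginary "Name@position" notation; the notation, the mass table, and the function name are assumptions made for illustration, and raising RuntimeError on duplicate positions simply mirrors the behaviour documented for pyxlms_modification_str_parser().

```python
from typing import Dict, Tuple

# Delta masses taken from the default modification table shown in this document.
KNOWN_MASSES: Dict[str, float] = {
    "DSS": 138.06808,
    "Oxidation": 15.994915,
    "Carbamidomethyl": 57.021464,
}


def name_at_position_parser(modifications: str) -> Dict[int, Tuple[str, float]]:
    """Parse strings such as 'DSS@1;Oxidation@7' (hypothetical notation)."""
    parsed: Dict[int, Tuple[str, float]] = {}
    for entry in modifications.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty strings and trailing separators
        name, position = entry.rsplit("@", 1)
        name = name.strip()
        pos = int(position)
        if pos in parsed:
            raise RuntimeError(f"Multiple modifications at position {pos}.")
        parsed[pos] = (name, KNOWN_MASSES[name])
    return parsed


print(name_at_position_parser("DSS@1;Oxidation@7"))
```

Such a function could be handed to the reader as read_custom(..., modification_parser=name_at_position_parser).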
- pyXLMS.parser.read_maxlynx(
- files: str | List[str] | BinaryIO,
- crosslinker: str,
- crosslinker_mass: float | None = None,
- decoy_prefix: str = 'REV__',
- parse_modifications: bool = True,
- modifications: Dict[str, float] = {'ADH': 138.09054635, 'Acetyl': 42.010565, 'BS3': 138.06808, 'Carbamidomethyl': 57.021464, 'DSBSO': 308.03883, 'DSBU': 196.08479231, 'DSS': 138.06808, 'DSSO': 158.00376, 'Oxidation': 15.994915, 'PhoX': 209.97181, 'Phospho': 79.966331},
- sep: str = '\t',
- decimal: str = '.',
Read a MaxLynx result file.
Reads a MaxLynx crosslink-spectrum-matches result file “crosslinkMsms.txt” in .txt (tab delimited) format and returns a parser_result. This is an alias for the MaxQuant reader.
- Parameters:
files (str, list of str, or file stream) – The name/path of the MaxLynx result file(s) or a file-like object/stream.
crosslinker (str) – Name of the used cross-linking reagent, for example “DSSO”.
crosslinker_mass (float, or None, default = None) – Monoisotopic delta mass of the crosslink modification. If the crosslinker is defined in parameter “modifications” this can be omitted.
decoy_prefix (str, default = "REV__") – The prefix that indicates that a protein is from the decoy database.
parse_modifications (bool, default = True) – Whether or not post-translational-modifications should be parsed for crosslink-spectrum-matches. Requires correct specification of the ‘modifications’ parameter.
modifications (dict of str, float, default = constants.MODIFICATIONS) – Mapping of modification names to modification masses.
sep (str, default = "\t") – Separator used in the .txt file.
decimal (str, default = ".") – Character to recognize as decimal point.
- Returns:
The parser_result object containing all parsed information.
- Return type:
dict
- Raises:
RuntimeError – If the file(s) could not be read or if the file(s) contain no crosslink-spectrum-matches.
KeyError – If the specified crosslinker could not be found/mapped.
Warning
MaxLynx/MaxQuant only reports a single protein crosslink position per peptide; for ambiguous peptides, only the crosslink position of the first matching protein is reported. All matching proteins can be retrieved via additional_information, but not their corresponding crosslink positions. For this reason it is recommended to use transform.reannotate_positions() to correctly annotate all crosslink positions for all peptides if that is important for downstream analysis.
Examples
>>> from pyXLMS.parser import read_maxlynx
>>> csms = read_maxlynx("data/maxquant/run1/crosslinkMsms.txt", crosslinker="DSS")
- pyXLMS.parser.read_maxquant(
- files: str | List[str] | BinaryIO,
- crosslinker: str,
- crosslinker_mass: float | None = None,
- decoy_prefix: str = 'REV__',
- parse_modifications: bool = True,
- modifications: Dict[str, float] = {'ADH': 138.09054635, 'Acetyl': 42.010565, 'BS3': 138.06808, 'Carbamidomethyl': 57.021464, 'DSBSO': 308.03883, 'DSBU': 196.08479231, 'DSS': 138.06808, 'DSSO': 158.00376, 'Oxidation': 15.994915, 'PhoX': 209.97181, 'Phospho': 79.966331},
- sep: str = '\t',
- decimal: str = '.',
Read a MaxQuant result file.
Reads a MaxQuant crosslink-spectrum-matches result file “crosslinkMsms.txt” in .txt (tab delimited) format and returns a parser_result.
- Parameters:
files (str, list of str, or file stream) – The name/path of the MaxQuant result file(s) or a file-like object/stream.
crosslinker (str) – Name of the used cross-linking reagent, for example “DSSO”.
crosslinker_mass (float, or None, default = None) – Monoisotopic delta mass of the crosslink modification. If the crosslinker is defined in parameter “modifications” this can be omitted.
decoy_prefix (str, default = "REV__") – The prefix that indicates that a protein is from the decoy database.
parse_modifications (bool, default = True) – Whether or not post-translational-modifications should be parsed for crosslink-spectrum-matches. Requires correct specification of the ‘modifications’ parameter.
modifications (dict of str, float, default = constants.MODIFICATIONS) – Mapping of modification names to modification masses.
sep (str, default = "\t") – Separator used in the .txt file.
decimal (str, default = ".") – Character to recognize as decimal point.
- Returns:
The parser_result object containing all parsed information.
- Return type:
dict
- Raises:
RuntimeError – If the file(s) could not be read or if the file(s) contain no crosslink-spectrum-matches.
KeyError – If the specified crosslinker could not be found/mapped.
Warning
MaxLynx/MaxQuant only reports a single protein crosslink position per peptide; for ambiguous peptides, only the crosslink position of the first matching protein is reported. All matching proteins can be retrieved via additional_information, but not their corresponding crosslink positions. For this reason it is recommended to use transform.reannotate_positions() to correctly annotate all crosslink positions for all peptides if that is important for downstream analysis.
Examples
>>> from pyXLMS.parser import read_maxquant
>>> csms = read_maxquant("data/maxquant/run1/crosslinkMsms.txt", crosslinker="DSS")
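The interplay between crosslinker, crosslinker_mass and modifications described above can be sketched as a simple lookup. This is not the pyXLMS implementation: the helper name resolve_crosslinker_mass is hypothetical, and giving an explicitly passed mass precedence over the table is an assumption.

```python
from typing import Dict, Optional

# Subset of the default modification masses shown in the signature above.
MODIFICATIONS: Dict[str, float] = {
    "DSS": 138.06808,
    "DSSO": 158.00376,
    "Oxidation": 15.994915,
}


def resolve_crosslinker_mass(
    crosslinker: str,
    crosslinker_mass: Optional[float] = None,
    modifications: Dict[str, float] = MODIFICATIONS,
) -> float:
    """Return the delta mass to use for the given crosslinker."""
    if crosslinker_mass is not None:
        return crosslinker_mass  # explicit mass wins (assumption)
    if crosslinker not in modifications:
        # Corresponds to the documented KeyError for an unmappable crosslinker.
        raise KeyError(f"Crosslinker {crosslinker!r} not found; pass crosslinker_mass.")
    return modifications[crosslinker]


print(resolve_crosslinker_mass("DSS"))
print(resolve_crosslinker_mass("MyLinker", crosslinker_mass=100.0))
```

In practice this means a crosslinker that is missing from the modifications table needs either an extra table entry or an explicit crosslinker_mass.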
- pyXLMS.parser.read_merox(
- files: str | List[str] | BinaryIO,
- crosslinker: str,
- crosslinker_mass: float | None = None,
- decoy_prefix: str = 'REV__',
- parse_modifications: bool = True,
- modifications: Dict[str, Dict[str, Any]] = {'B': {'Amino Acid': 'C', 'Modification': ('Carbamidomethyl', 57.021464)}, 'm': {'Amino Acid': 'M', 'Modification': ('Oxidation', 15.994915)}},
- sep: str = ';',
- decimal: str = '.',
Read a MeroX result file.
Reads a MeroX crosslink-spectrum-matches result file in .csv or .zhrm format and returns a parser_result.
- Parameters:
files (str, list of str, or file stream) – The name/path of the MeroX result file(s) or a file-like object/stream.
crosslinker (str) – Name of the used cross-linking reagent, for example “DSSO”.
crosslinker_mass (float, or None, default = None) – Monoisotopic delta mass of the crosslink modification. If the crosslinker is defined in constants.MODIFICATIONS this can be omitted.
decoy_prefix (str, default = "REV__") – The prefix that indicates that a protein is from the decoy database.
parse_modifications (bool, default = True) – Whether or not post-translational-modifications should be parsed for crosslink-spectrum-matches. Requires correct specification of the ‘modifications’ parameter.
modifications (dict of str, dict of str, any, default = constants.MEROX_MODIFICATION_MAPPING) – Mapping of modification symbols to their amino acids and modifications. Please refer to constants.MEROX_MODIFICATION_MAPPING for examples.
sep (str, default = ";") – Separator used in the .csv or .zhrm file.
decimal (str, default = ".") – Character to recognize as decimal point.
- Returns:
The parser_result object containing all parsed information.
- Return type:
dict
- Raises:
RuntimeError – If the file(s) could not be read or if the file(s) contain no crosslink-spectrum-matches.
KeyError – If the specified crosslinker could not be found/mapped.
Warning
MeroX only reports a single protein crosslink position per peptide; for ambiguous peptides, only the crosslink position of the first matching protein is reported. All matching proteins can be retrieved via additional_information, but not their corresponding crosslink positions. For this reason it is recommended to use transform.reannotate_positions() to correctly annotate all crosslink positions for all peptides if that is important for downstream analysis. Additionally, please note that target and decoy information is derived from the protein accession and parameter decoy_prefix. By default, MeroX only reports target matches that are above the desired FDR.
Examples
>>> from pyXLMS.parser import read_merox
>>> csms_from_csv = read_merox("data/merox/XLpeplib_Beveridge_QEx-HFX_DSS_R1.csv", crosslinker="DSS")
>>> from pyXLMS.parser import read_merox
>>> csms_from_zhrm = read_merox("data/merox/XLpeplib_Beveridge_QEx-HFX_DSS_R1.zhrm", crosslinker="DSS")
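To make the structure of the MeroX modifications mapping more concrete, the sketch below applies the default mapping from the signature to an encoded sequence and recovers the plain sequence plus a position-to-modification dictionary. The helper decode_merox_sequence is hypothetical and assumes the input consists of residue symbols only; it is not how pyXLMS necessarily processes MeroX output.

```python
from typing import Any, Dict, Tuple

# Default mapping as shown in the read_merox() signature above.
MEROX_MODIFICATION_MAPPING: Dict[str, Dict[str, Any]] = {
    "B": {"Amino Acid": "C", "Modification": ("Carbamidomethyl", 57.021464)},
    "m": {"Amino Acid": "M", "Modification": ("Oxidation", 15.994915)},
}


def decode_merox_sequence(
    sequence: str,
    mapping: Dict[str, Dict[str, Any]] = MEROX_MODIFICATION_MAPPING,
) -> Tuple[str, Dict[int, Tuple[str, float]]]:
    """Translate modification symbols into amino acids and collect modifications."""
    plain = []
    modifications: Dict[int, Tuple[str, float]] = {}
    for position, symbol in enumerate(sequence, start=1):  # positions are 1-based
        if symbol in mapping:
            plain.append(mapping[symbol]["Amino Acid"])
            modifications[position] = mapping[symbol]["Modification"]
        else:
            plain.append(symbol)
    return "".join(plain), modifications


print(decode_merox_sequence("KBmR"))
```

Additional symbols used in a MeroX configuration can be supported by adding entries of the same shape to the mapping before passing it via the modifications parameter.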
- pyXLMS.parser.read_msannika(
- files: str | List[str] | BinaryIO,
- parse_modifications: bool = True,
- modifications: Dict[str, float] = {'ADH': 138.09054635, 'Acetyl': 42.010565, 'BS3': 138.06808, 'Carbamidomethyl': 57.021464, 'DSBSO': 308.03883, 'DSBU': 196.08479231, 'DSS': 138.06808, 'DSSO': 158.00376, 'Oxidation': 15.994915, 'PhoX': 209.97181, 'Phospho': 79.966331},
- format: Literal['auto', 'csv', 'txt', 'tsv', 'xlsx', 'pdresult'] = 'auto',
- sep: str = '\t',
- decimal: str = '.',
- unsafe: bool = False,
- verbose: Literal[0, 1, 2] = 1,
Read an MS Annika result file.
Reads an MS Annika crosslink-spectrum-matches result file or crosslink result file in .csv or .xlsx format, or both from a .pdResult file from Proteome Discoverer, and returns a parser_result.
- Parameters:
files (str, list of str, or file stream) – The name/path of the MS Annika result file(s) or a file-like object/stream.
parse_modifications (bool, default = True) – Whether or not post-translational-modifications should be parsed for crosslink-spectrum-matches. Requires correct specification of the ‘modifications’ parameter.
modifications (dict of str, float, default = constants.MODIFICATIONS) – Mapping of modification names to modification masses.
format ("auto", "csv", "tsv", "txt", "xlsx", or "pdresult", default = "auto") – The format of the result file. "auto" is only available if the name/path to the MS Annika result file is given.
sep (str, default = "\t") – Separator used in the .csv or .tsv file. Parameter is ignored if the file is in .xlsx or .pdResult format.
decimal (str, default = ".") – Character to recognize as decimal point. Parameter is ignored if the file is in .xlsx or .pdResult format.
unsafe (bool, default = False) – If True, allows reading of negative peptide and crosslink positions but replaces their values with None. Negative values occur when peptides can’t be matched to proteins because of ‘X’ in protein sequences. Reannotation might be possible with transform.reannotate_positions().
verbose (0, 1, or 2, default = 1) –
0: All warnings are ignored.
1: Warnings are printed to stdout.
2: Warnings are treated as errors.
- Returns:
The parser_result object containing all parsed information.
- Return type:
dict
- Raises:
ValueError – If the input format is not supported or cannot be inferred.
TypeError – If the pdResult file is provided in the wrong format.
TypeError – If parameter verbose was not set correctly.
RuntimeError – If one of the crosslinks or crosslink-spectrum-matches contains unknown crosslink or peptide positions. This occurs when peptides can’t be matched to proteins because of ‘X’ in protein sequences. Selecting ‘unsafe = True’ will ignore these errors and return None type positions. Reannotation might be possible with transform.reannotate_positions().
RuntimeError – If the file(s) could not be read or if the file(s) contain no crosslinks or crosslink-spectrum-matches.
KeyError – If one of the found post-translational-modifications could not be found/mapped.
Warning
MS Annika does not report if the individual peptides in a crosslink are from the target or decoy database. The parser assumes that both peptides from a target crosslink are from the target database, and vice versa, that both peptides are from the decoy database if it is a decoy crosslink. This leads to only TT and DD matches, which needs to be considered for FDR estimation. This also only applies to crosslinks and not crosslink-spectrum-matches, where this information is correctly reported and parsed.
Examples
>>> from pyXLMS.parser import read_msannika
>>> csms_from_xlsx = read_msannika("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_CSMs.xlsx")
>>> from pyXLMS.parser import read_msannika
>>> crosslinks_from_xlsx = read_msannika("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_Crosslinks.xlsx")
>>> from pyXLMS.parser import read_msannika
>>> csms_from_tsv = read_msannika("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_CSMs.txt")
>>> from pyXLMS.parser import read_msannika
>>> crosslinks_from_tsv = read_msannika("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_Crosslinks.txt")
>>> from pyXLMS.parser import read_msannika
>>> csms_and_crosslinks_from_pdresult = read_msannika("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1.pdResult")
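The effect of the unsafe parameter on negative positions can be illustrated with a stand-alone sketch. The function sanitize_positions is hypothetical and only mimics the documented behaviour (error by default, None when unsafe is True); it is not taken from pyXLMS.

```python
from typing import List, Optional


def sanitize_positions(positions: List[int], unsafe: bool = False) -> List[Optional[int]]:
    """Replace negative positions with None if unsafe, otherwise raise."""
    sanitized: List[Optional[int]] = []
    for position in positions:
        if position < 0:
            if not unsafe:
                # Corresponds to the documented RuntimeError for unknown positions.
                raise RuntimeError(
                    f"Unknown position {position}; consider unsafe=True and reannotation."
                )
            sanitized.append(None)
        else:
            sanitized.append(position)
    return sanitized


print(sanitize_positions([12, -1, 40], unsafe=True))
```

Positions that come back as None are the ones that might be recoverable with transform.reannotate_positions().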
- pyXLMS.parser.read_mzid(
- files: str | List[str] | BinaryIO,
- scan_nr_parser: Callable[[str], int] | None = None,
- decoy: bool | None = None,
- crosslinkers: Dict[str, float] = {'ADH': 138.09054635, 'BS3': 138.06808, 'DSBSO': 308.03883, 'DSBU': 196.08479231, 'DSS': 138.06808, 'DSSO': 158.00376, 'PhoX': 209.97181},
- verbose: Literal[0, 1, 2] = 1,
Read a mzIdentML (mzid) file.
Reads crosslink-spectrum-matches from a mzIdentML (mzid) file and returns a parser_result.
- Parameters:
files (str, list of str, or file stream) – The name/path of the mzIdentML (mzid) file(s) or a file-like object/stream.
scan_nr_parser (callable, or None, default = None) – A function that parses the scan number from mzid spectrumIDs. If None (default) the function parse_scan_nr_from_mzid() is used.
decoy (bool, or None, default = None) – Whether the mzid file contains decoy CSMs (True) or target CSMs (False).
crosslinkers (dict of str, float, default = constants.CROSSLINKERS) – Mapping of crosslinker names to crosslinker delta masses.
verbose (0, 1, or 2, default = 1) –
0: All warnings are ignored.
1: Warnings are printed to stdout.
2: Warnings are treated as errors.
- Returns:
The parser_result object containing all parsed information.
- Return type:
dict
- Raises:
RuntimeError – If the file(s) could not be read or if the file(s) contain no crosslink-spectrum-matches.
RuntimeError – If parser is used with verbose = 2.
RuntimeError – If there are warnings while reading the mzIdentML file (only for verbose = 2).
TypeError – If parameter verbose was not set correctly.
TypeError – If one of the values necessary to create a crosslink-spectrum-match could not be parsed correctly.
Notes
This parser is experimental, as I don’t know if the mzIdentML structure is consistent across different crosslink search engines. This parser was tested with mzIdentML files from MS Annika and XlinkX.
Warning
This parser only parses minimal data because most information is not available from the mzIdentML file. The available data is:
alpha_peptide
alpha_peptide_crosslink_position
beta_peptide
beta_peptide_crosslink_position
spectrum_file
scan_nr
Examples
>>> from pyXLMS.parser import read_mzid
>>> csms = read_mzid("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1.mzid")
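A custom scan_nr_parser takes a spectrumID string and returns an integer scan number. The sketch below assumes spectrumIDs that contain a "scan=<number>" token, as in Thermo-style native IDs; other search engines may write different IDs, so both the pattern and the function name are assumptions.

```python
import re


def scan_nr_from_spectrum_id(spectrum_id: str) -> int:
    """Extract the scan number from IDs such as '... scan=20588' (assumed format)."""
    match = re.search(r"scan=(\d+)", spectrum_id)
    if match is None:
        raise ValueError(f"No scan number found in spectrumID {spectrum_id!r}.")
    return int(match.group(1))


print(scan_nr_from_spectrum_id("controllerType=0 controllerNumber=1 scan=20588"))
```

A function like this could be supplied as read_mzid(..., scan_nr_parser=scan_nr_from_spectrum_id) when the default parser does not match the IDs in a file.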
- pyXLMS.parser.read_plink(
- files: str | List[str] | BinaryIO,
- spectrum_file_parser: Callable[[str], str] | None = None,
- scan_nr_parser: Callable[[str], int] | None = None,
- decoy_prefix: str = 'REV_',
- parse_modifications: bool = True,
- modifications: Dict[str, float] = {'ADH': 138.09054635, 'Acetyl': 42.010565, 'BS3': 138.06808, 'Carbamidomethyl': 57.021464, 'DSBSO': 308.03883, 'DSBU': 196.08479231, 'DSS': 138.06808, 'DSSO': 158.00376, 'Oxidation': 15.994915, 'PhoX': 209.97181, 'Phospho': 79.966331},
- sep: str = ',',
- decimal: str = '.',
- verbose: Literal[0, 1, 2] = 1,
Read a pLink result file.
Reads a pLink crosslink-spectrum-matches result file “*cross-linked_spectra.csv” in .csv (comma delimited) format or pLink crosslinks result file “*cross-linked_peptides.csv” in .csv (comma delimited) format and returns a parser_result.
- Parameters:
files (str, list of str, or file stream) – The name/path of the pLink result file(s) or a file-like object/stream.
spectrum_file_parser (callable, or None, default = None) – A function that parses the spectrum file name from spectrum titles. If None (default) the function parse_spectrum_file_from_plink() is used.
scan_nr_parser (callable, or None, default = None) – A function that parses the scan number from spectrum titles. If None (default) the function parse_scan_nr_from_plink() is used.
decoy_prefix (str, default = "REV_") – The prefix that indicates that a protein is from the decoy database.
parse_modifications (bool, default = True) – Whether or not post-translational-modifications should be parsed for crosslink-spectrum-matches. Requires correct specification of the ‘modifications’ parameter.
modifications (dict of str, float, default = constants.MODIFICATIONS) – Mapping of modification names to modification masses.
sep (str, default = ",") – Separator used in the .csv file.
decimal (str, default = ".") – Character to recognize as decimal point.
verbose (0, 1, or 2, default = 1) –
0: All warnings are ignored.
1: Warnings are printed to stdout.
2: Warnings are treated as errors.
- Returns:
The parser_result object containing all parsed information.
- Return type:
dict
- Raises:
RuntimeError – If the file(s) could not be read or if the file(s) contain no crosslink-spectrum-matches.
TypeError – If parameter verbose was not set correctly.
Warning
Target and decoy information is derived from the protein accession and parameter decoy_prefix. By default, pLink only reports target matches that are above the desired FDR.
Examples
>>> from pyXLMS.parser import read_plink
>>> csms = read_plink("data/plink2/Cas9_plus10_2024.06.20.filtered_cross-linked_spectra.csv")
>>> from pyXLMS.parser import read_plink
>>> crosslinks = read_plink("data/plink2/Cas9_plus10_2024.06.20.filtered_cross-linked_peptides.csv")
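Custom spectrum_file_parser and scan_nr_parser functions both receive a spectrum title. The sketch below assumes titles of the form "<file name>.<scan>.<scan>.<charge>.<index>.dta"; this layout, the example title, and the function names are assumptions for illustration and should be checked against the titles in the actual result file.

```python
from typing import List


def _split_title(title: str) -> List[str]:
    """Split an assumed '<file>.<scan>.<scan>.<charge>.<index>.dta' title."""
    parts = title.rsplit(".", 5)  # split from the right so file names may contain dots
    if len(parts) != 6:
        raise ValueError(f"Unexpected spectrum title format: {title!r}")
    return parts


def spectrum_file_from_title(title: str) -> str:
    """Return the spectrum file name part of the title."""
    return _split_title(title)[0]


def scan_nr_from_title(title: str) -> int:
    """Return the (first) scan number of the title as an integer."""
    return int(_split_title(title)[1])


example_title = "Cas9_run.20588.20588.3.0.dta"  # hypothetical title
print(spectrum_file_from_title(example_title), scan_nr_from_title(example_title))
```

Both functions match the expected callable signatures (str to str and str to int) and could be passed via spectrum_file_parser and scan_nr_parser.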
- pyXLMS.parser.read_scout(
- files: str | List[str] | BinaryIO,
- crosslinker: str,
- crosslinker_mass: float | None = None,
- parse_modifications: bool = True,
- modifications: Dict[str, Tuple[str, float]] = {'+15.994900': ('Oxidation', 15.994915), '+57.021460': ('Carbamidomethyl', 57.021464), 'ADH': ('ADH', 138.09054635), 'BS3': ('BS3', 138.06808), 'Carbamidomethyl': ('Carbamidomethyl', 57.021464), 'DSBSO': ('DSBSO', 308.03883), 'DSBU': ('DSBU', 196.08479231), 'DSS': ('DSS', 138.06808), 'DSSO': ('DSSO', 158.00376), 'Oxidation of Methionine': ('Oxidation', 15.994915), 'PhoX': ('PhoX', 209.97181)},
- sep: str = ',',
- decimal: str = '.',
- verbose: Literal[0, 1, 2] = 1,
Read a Scout result file.
Reads a Scout filtered or unfiltered crosslink-spectrum-matches result file or crosslink/residue pair result file in .csv format and returns a parser_result.
- Parameters:
files (str, list of str, or file stream) – The name/path of the Scout result file(s) or a file-like object/stream.
crosslinker (str) – Name of the used cross-linking reagent, for example “DSSO”.
crosslinker_mass (float, or None, default = None) – Monoisotopic delta mass of the crosslink modification. If the crosslinker is defined in parameter “modifications” this can be omitted.
parse_modifications (bool, default = True) – Whether or not post-translational-modifications should be parsed for crosslink-spectrum-matches. Requires correct specification of the ‘modifications’ parameter.
modifications (dict of str, tuple, default = constants.SCOUT_MODIFICATION_MAPPING) – Mapping of Scout sequence elements (e.g. "+15.994900") and modifications (e.g. "Oxidation of Methionine") to their modifications (e.g. ("Oxidation", 15.994915)).
sep (str, default = ",") – Separator used in the .csv file.
decimal (str, default = ".") – Character to recognize as decimal point.
verbose (0, 1, or 2, default = 1) –
0: All warnings are ignored.
1: Warnings are printed to stdout.
2: Warnings are treated as errors.
- Returns:
The parser_result object containing all parsed information.
- Return type:
dict
- Raises:
RuntimeError – If the file(s) could not be read or if the file(s) contain no crosslinks or crosslink-spectrum-matches.
KeyError – If the specified crosslinker could not be found/mapped.
TypeError – If parameter verbose was not set correctly.
Warning
When reading unfiltered crosslink-spectrum-matches, no protein crosslink positions or protein peptide positions are available, as these are not reported. If needed they should be annotated with transform.reannotate_positions().
When reading filtered crosslink-spectrum-matches, Scout does not report if the individual peptides in a crosslink are from the target or decoy database. The parser assumes that both peptides from a target crosslink-spectrum-match are from the target database, and vice versa, that both peptides are from the decoy database if it is a decoy crosslink-spectrum-match. This leads to only TT and DD matches, which needs to be considered for FDR estimation.
When reading crosslinks / residue pairs, Scout does not report if the individual peptides in a crosslink are from the target or decoy database. The parser assumes that both peptides from a target crosslink are from the target database, and vice versa, that both peptides are from the decoy database if it is a decoy crosslink. This leads to only TT and DD matches, which needs to be considered for FDR estimation.
Examples
>>> from pyXLMS.parser import read_scout
>>> csms_unfiltered = read_scout("data/scout/Cas9_Unfiltered_CSMs.csv")
>>> from pyXLMS.parser import read_scout
>>> csms_filtered = read_scout("data/scout/Cas9_Filtered_CSMs.csv")
>>> from pyXLMS.parser import read_scout
>>> crosslinks = read_scout("data/scout/Cas9_Residue_Pairs.csv")
- pyXLMS.parser.read_xi(
- files: str | List[str] | BinaryIO,
- decoy_prefix: str | None = 'auto',
- parse_modifications: bool = True,
- modifications: Dict[str, Tuple[str, float]] = {'->': ('Substitution', nan), 'bs3_ami': ('BS3 Amidated', 155.094619105), 'bs3_hyd': ('BS3 Hydrolized', 156.0786347), 'bs3_tris': ('BS3 Tris', 259.141973), 'bs3loop': ('BS3 Looplink', 138.06808), 'bs3nh2': ('BS3 Amidated', 155.094619105), 'bs3oh': ('BS3 Hydrolized', 156.0786347), 'cm': ('Carbamidomethyl', 57.021464), 'dsbu_ami': ('DSBU Amidated', 213.111341), 'dsbu_hyd': ('DSBU Hydrolized', 214.095357), 'dsbu_loop': ('DSBU Looplink', 196.08479231), 'dsbu_tris': ('DSBU Tris', 317.158685), 'dsbuloop': ('DSBU Looplink', 196.08479231), 'dsso_ami': ('DSSO Amidated', 175.030313905), 'dsso_hyd': ('DSSO Hydrolized', 176.0143295), 'dsso_loop': ('DSSO Looplink', 158.00376), 'dsso_tris': ('DSSO Tris', 279.077658), 'dssoloop': ('DSSO Looplink', 158.00376), 'ox': ('Oxidation', 15.994915)},
- sep: str = ',',
- decimal: str = '.',
- ignore_errors: bool = False,
- verbose: Literal[0, 1, 2] = 1,
Read a xiSearch/xiFDR result file.
Reads a xiSearch crosslink-spectrum-matches result file or a xiFDR crosslink-spectrum-matches result file or crosslink result file in .csv format and returns a parser_result.
- Parameters:
files (str, list of str, or file stream) – The name/path of the xiSearch/xiFDR result file(s) or a file-like object/stream.
decoy_prefix (str, or None, default = "auto") – The prefix that indicates that a protein is from the decoy database. If “auto” or None it will use the default for each xi file type.
parse_modifications (bool, default = True) – Whether or not post-translational-modifications should be parsed for crosslink-spectrum-matches. Requires correct specification of the ‘modifications’ parameter.
modifications (dict of str, tuple, default = constants.XI_MODIFICATION_MAPPING) – Mapping of xi sequence elements (e.g. "cm") to their modifications (e.g. ("Carbamidomethyl", 57.021464)). This corresponds to the SYMBOLEXT field, or the SYMBOL field minus the amino acid in the xiSearch config.
sep (str, default = ",") – Separator used in the .csv file.
decimal (str, default = ".") – Character to recognize as decimal point.
ignore_errors (bool, default = False) – Whether modifications that are not given in parameter ‘modifications’ should be tolerated instead of raising an error. By default an error is raised if an unknown modification is encountered. If True, unknown modifications are encoded with the xi shortcode (SYMBOLEXT) and a float("nan") modification mass.
verbose (0, 1, or 2, default = 1) –
0: All warnings are ignored.
1: Warnings are printed to stdout.
2: Warnings are treated as errors.
- Returns:
The parser_result object containing all parsed information.
- Return type:
dict
- Raises:
RuntimeError – If the file(s) contain no crosslinks or crosslink-spectrum-matches.
TypeError – If parameter verbose was not set correctly.
Examples
>>> from pyXLMS.parser import read_xi
>>> csms_from_xiSearch = read_xi("data/xi/r1_Xi1.7.6.7.csv")
>>> from pyXLMS.parser import read_xi
>>> csms_from_xiFDR = read_xi("data/xi/1perc_xl_boost_CSM_xiFDR2.2.1.csv")
>>> from pyXLMS.parser import read_xi
>>> crosslinks_from_xiFDR = read_xi("data/xi/1perc_xl_boost_Links_xiFDR2.2.1.csv")
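The documented effect of ignore_errors on unknown xi shortcodes can be sketched as follows. The helper map_xi_modification is hypothetical, the mapping is a two-entry excerpt of the default shown in the signature, and KeyError is an assumed exception type since the documentation only states that an error is raised.

```python
import math
from typing import Dict, Tuple

# Excerpt of the default xi modification mapping shown in the signature above.
XI_MAPPING: Dict[str, Tuple[str, float]] = {
    "cm": ("Carbamidomethyl", 57.021464),
    "ox": ("Oxidation", 15.994915),
}


def map_xi_modification(
    shortcode: str,
    mapping: Dict[str, Tuple[str, float]] = XI_MAPPING,
    ignore_errors: bool = False,
) -> Tuple[str, float]:
    """Map a xi shortcode (SYMBOLEXT) to a (name, delta mass) tuple."""
    if shortcode in mapping:
        return mapping[shortcode]
    if ignore_errors:
        # Unknown modifications keep their shortcode and get a NaN mass.
        return (shortcode, float("nan"))
    raise KeyError(f"Unknown xi modification {shortcode!r}.")  # assumed exception type


name, mass = map_xi_modification("phos", ignore_errors=True)
print(name, math.isnan(mass))
```

Because NaN never compares equal to itself, downstream code should test such masses with math.isnan() rather than with ==.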
- pyXLMS.parser.read_xlinkx(
- files: str | List[str] | BinaryIO,
- decoy: bool | None = None,
- parse_modifications: bool = True,
- modifications: Dict[str, float] = {'ADH': 138.09054635, 'Acetyl': 42.010565, 'BS3': 138.06808, 'Carbamidomethyl': 57.021464, 'DSBSO': 308.03883, 'DSBU': 196.08479231, 'DSS': 138.06808, 'DSSO': 158.00376, 'Oxidation': 15.994915, 'PhoX': 209.97181, 'Phospho': 79.966331},
- format: Literal['auto', 'csv', 'txt', 'tsv', 'xlsx', 'pdresult'] = 'auto',
- sep: str = '\t',
- decimal: str = '.',
- ignore_errors: bool = False,
- verbose: Literal[0, 1, 2] = 1,
Read an XlinkX result file.
Reads an XlinkX crosslink-spectrum-matches result file or crosslink result file in .csv or .xlsx format, or both from a .pdResult file from Proteome Discoverer, and returns a parser_result.
- Parameters:
files (str, list of str, or file stream) – The name/path of the XlinkX result file(s) or a file-like object/stream.
decoy (bool, or None) – Default decoy value to use if no decoy value is found. Only used if the “Is Decoy” column is not found in the supplied data.
parse_modifications (bool, default = True) – Whether or not post-translational-modifications should be parsed for crosslink-spectrum-matches. Requires correct specification of the ‘modifications’ parameter.
modifications (dict of str, float, default = constants.MODIFICATIONS) – Mapping of modification names to modification masses.
format ("auto", "csv", "tsv", "txt", "xlsx", or "pdresult", default = "auto") – The format of the result file. "auto" is only available if the name/path to the XlinkX result file is given.
sep (str, default = "\t") – Separator used in the .csv or .tsv file. Parameter is ignored if the file is in .xlsx or .pdResult format.
decimal (str, default = ".") – Character to recognize as decimal point. Parameter is ignored if the file is in .xlsx or .pdResult format.
ignore_errors (bool, default = False) – Whether missing crosslink positions should be tolerated instead of raising an error. Setting this to True suppresses the RuntimeError that is raised when the crosslink position cannot be parsed for at least one of the crosslinks. In these cases the crosslink position is set to 100 000.
verbose (0, 1, or 2, default = 1) –
0: All warnings are ignored.
1: Warnings are printed to stdout.
2: Warnings are treated as errors.
- Returns:
The parser_result object containing all parsed information.
- Return type:
dict
- Raises:
ValueError – If the input format is not supported or cannot be inferred.
TypeError – If parameter verbose was not set correctly.
TypeError – If the pdResult file is provided in the wrong format.
RuntimeError – If the crosslink position could not be parsed for at least one of the crosslinks.
RuntimeError – If the file(s) could not be read or if the file(s) contain no crosslinks or crosslink-spectrum-matches.
KeyError – If one of the found post-translational-modifications could not be found/mapped.
Warning
XlinkX does not report if the individual peptides in a crosslink are from the target or decoy database. The parser assumes that both peptides from a target crosslink are from the target database, and vice versa, that both peptides are from the decoy database if it is a decoy crosslink. This leads to only TT and DD matches, which needs to be considered for FDR estimation. This applies to both crosslinks and crosslink-spectrum-matches.
Examples
>>> from pyXLMS.parser import read_xlinkx
>>> csms_from_xlsx = read_xlinkx("data/xlinkx/XLpeplib_Beveridge_Lumos_DSSO_MS3_CSMs.xlsx")
>>> from pyXLMS.parser import read_xlinkx
>>> crosslinks_from_xlsx = read_xlinkx("data/xlinkx/XLpeplib_Beveridge_Lumos_DSSO_MS3_Crosslinks.xlsx")
>>> from pyXLMS.parser import read_xlinkx
>>> csms_from_tsv = read_xlinkx("data/xlinkx/XLpeplib_Beveridge_Lumos_DSSO_MS3_CSMs.txt")
>>> from pyXLMS.parser import read_xlinkx
>>> crosslinks_from_tsv = read_xlinkx("data/xlinkx/XLpeplib_Beveridge_Lumos_DSSO_MS3_Crosslinks.txt")
>>> from pyXLMS.parser import read_xlinkx
>>> csms_and_crosslinks_from_pdresult = read_xlinkx("data/xlinkx/XLpeplib_Beveridge_Lumos_DSSO_MS3.pdResult")
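When ignore_errors is set to True, unparsable crosslink positions are reported as 100 000, so it can be worth separating those entries before any downstream analysis. The sketch below works on plain dictionaries with a hypothetical "position" key; the record layout and the helper name are assumptions and do not reflect the actual pyXLMS data structures.

```python
from typing import Dict, List, Tuple

PLACEHOLDER_POSITION = 100_000  # value documented for ignore_errors=True

Record = Dict[str, int]


def split_placeholder_positions(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Split records into those with a parsed position and those with the placeholder."""
    parsed = [r for r in records if r["position"] != PLACEHOLDER_POSITION]
    unparsed = [r for r in records if r["position"] == PLACEHOLDER_POSITION]
    return parsed, unparsed


records = [{"position": 17}, {"position": 100_000}, {"position": 3}]  # toy data
parsed, unparsed = split_placeholder_positions(records)
print(len(parsed), len(unparsed))
```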