pyXLMS.exporter package
Submodules
pyXLMS.exporter.to_impxfdr module
- pyXLMS.exporter.to_impxfdr.to_impxfdr(data: List[Dict[str, Any]], filename: str | None, targets_only: bool = True) -> pd.DataFrame
Exports a list of crosslinks or crosslink-spectrum-matches to IMP-X-FDR format.
Exports a list of crosslinks or crosslink-spectrum-matches to IMP-X-FDR format for benchmarking purposes. The tool IMP-X-FDR is available from github.com/vbc-proteomics-org/imp-x-fdr. We recommend using version 1.1.0 and selecting “MS Annika” as the input file format for the exported file. A slightly modified version is available from github.com/hgb-bin-proteomics/MSAnnika_NC_Results. This version contains a few bug fixes and was used for the MS Annika 2.0 and MS Annika 3.0 publications. Requires that the alpha_proteins, beta_proteins, alpha_proteins_crosslink_positions and beta_proteins_crosslink_positions fields are set for crosslinks and crosslink-spectrum-matches.
- Parameters:
data (list of dict of str, any) – A list of crosslinks or crosslink-spectrum-matches.
filename (str, or None, default = None) – If not None, the exported data will be written to a file with the specified filename. The filename should end in “.xlsx” as the file is exported to Microsoft Excel file format.
targets_only (bool, default = True) – Whether or not only target crosslinks or crosslink-spectrum-matches should be exported. For benchmarking purposes this is usually the case. If the crosslinks or crosslink-spectrum-matches do not contain target-decoy labels this should be set to False.
- Returns:
A pandas DataFrame containing crosslinks or crosslink-spectrum-matches in IMP-X-FDR format.
- Return type:
pd.DataFrame
- Raises:
TypeError – If a wrong data type is provided.
TypeError – If data contains elements of mixed data type.
ValueError – If the provided data contains no elements or if none of the data has target-decoy labels and parameter ‘targets_only’ is set to True.
RuntimeError – If not all of the required information is present in the input data.
Examples
>>> from pyXLMS.exporter import to_impxfdr
>>> from pyXLMS.parser import read
>>> pr = read("data/xi/1perc_xl_boost_Links_xiFDR2.2.1.csv", engine="xiSearch/xiFDR", crosslinker="DSS")
>>> crosslinks = pr["crosslinks"]
>>> to_impxfdr(crosslinks, filename="crosslinks.xlsx")
     Crosslink Type             Sequence A  Position A  Accession A  In protein A  ...  Position B  Accession B  In protein B  Best CSM Score  Decoy
  0           Intra          VVDELV[K]VMGR           7         Cas9           753  ...           7         Cas9           753          40.679  False
  1           Intra  MLASAGELQ[K]GNELALPSK          10         Cas9           753  ...           7         Cas9          1226          40.231  False
  2           Intra        MDGTEELLV[K]LNR          10         Cas9           396  ...          10         Cas9           396          39.582  False
  3           Intra         MTNFD[K]NLPNEK           6         Cas9           965  ...           2         Cas9           504          35.880  False
  4           Intra             DFQFY[K]VR           6         Cas9           978  ...           4         Cas9          1028          35.281  False
 ..             ...                    ...         ...          ...           ...  ...         ...          ...           ...             ...    ...
220           Intra        LP[K]YSLFELENGR           3         Cas9           866  ...           3         Cas9          1204           9.877  False
221           Intra               D[K]QSGK           2         Cas9           677  ...           2         Cas9           677           9.702  False
222           Intra               AGFI[K]R           5         Cas9           922  ...          11         Cas9           881           9.666  False
223           Intra                E[K]IEK           2         Cas9           443  ...           1         Cas9           562           9.656  False
224           Intra                LS[K]SR           3         Cas9           222  ...           3         Cas9           222           9.619  False
[225 rows x 11 columns]

>>> from pyXLMS.exporter import to_impxfdr
>>> from pyXLMS.parser import read
>>> pr = read("data/xi/1perc_xl_boost_CSM_xiFDR2.2.1.csv", engine="xiSearch/xiFDR", crosslinker="DSS")
>>> csms = pr["crosslink-spectrum-matches"]
>>> to_impxfdr(csms, filename="csms.xlsx")
     Crosslink Type          Sequence A  Position A  Accession A  In protein A  ...  Position B  Accession B  In protein B  Best CSM Score  Decoy
  0           Intra  [K]IECFDSVEISGVEDR           1         Cas9           575  ...           1         Cas9           575          27.268  False
  1           Intra       LVDSTD[K]ADLR           7         Cas9           152  ...          11         Cas9           881          26.437  False
  2           Intra     GGLSELD[K]AGFIK           8         Cas9           917  ...           8         Cas9           917          26.134  False
  3           Intra       LVDSTD[K]ADLR           7         Cas9           152  ...           7         Cas9           152          25.804  False
  4           Intra       VVDELV[K]VMGR           7         Cas9           753  ...           7         Cas9           753          24.861  False
 ..             ...                 ...         ...          ...           ...  ...         ...          ...           ...             ...    ...
406           Intra          [K]GILQTVK           1         Cas9           739  ...           3         Cas9           222           6.977  False
407           Intra          QQLPE[K]YK           6         Cas9           350  ...           6         Cas9           350           6.919  False
408           Intra           ESILP[K]R           6         Cas9          1117  ...           7         Cas9          1035           6.853  False
409           Intra             LS[K]SR           3         Cas9           222  ...           2         Cas9           884           6.809  False
410           Intra     QIT[K]HVAQILDSR           4         Cas9           933  ...           6         Cas9           350           6.808  False
[411 rows x 11 columns]
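Because the exporter raises a RuntimeError when required information is missing, it can help to check the input beforehand. The following is a minimal sketch, not part of pyXLMS: the helper find_incomplete is hypothetical, the field names are the ones listed in the description above, and the toy dictionaries only stand in for real crosslink objects.

```python
from typing import Any, Dict, List

# Field names taken from the requirement stated in the description above.
REQUIRED_FIELDS = (
    "alpha_proteins",
    "beta_proteins",
    "alpha_proteins_crosslink_positions",
    "beta_proteins_crosslink_positions",
)


def find_incomplete(data: List[Dict[str, Any]]) -> List[int]:
    """Hypothetical helper: indices of items where a required field is missing or None."""
    return [
        i
        for i, item in enumerate(data)
        if any(item.get(field) is None for field in REQUIRED_FIELDS)
    ]


# Toy dictionaries for illustration only, not real pyXLMS crosslink objects.
complete = {
    "alpha_proteins": ["Cas9"],
    "beta_proteins": ["Cas9"],
    "alpha_proteins_crosslink_positions": [753],
    "beta_proteins_crosslink_positions": [753],
}
incomplete = {"alpha_proteins": ["Cas9"], "beta_proteins": None}

print(find_incomplete([complete, incomplete]))  # prints [1]
```

Items flagged this way could be filtered out or annotated before calling to_impxfdr().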
pyXLMS.exporter.to_msannika module
- pyXLMS.exporter.to_msannika.get_msannika_crosslink_sequence(peptide: str, crosslink_position: int) -> str
Returns the crosslinked peptide sequence in MS Annika format.
Returns the crosslinked peptide sequence in MS Annika format, which is the peptide amino acid sequence with the crosslinked residue in square brackets (see examples).
- Parameters:
peptide (str) – The (unmodified) amino acid sequence of the peptide.
crosslink_position (int) – Position of the crosslinker in the peptide sequence (1-based).
- Returns:
The crosslinked peptide sequence in MS Annika format.
- Return type:
str
- Raises:
ValueError – If the crosslink position is outside the peptide’s length.
Examples
>>> from pyXLMS.exporter import get_msannika_crosslink_sequence
>>> get_msannika_crosslink_sequence("PEPKTIDE", 4)
'PEP[K]TIDE'

>>> from pyXLMS.exporter import get_msannika_crosslink_sequence
>>> get_msannika_crosslink_sequence("KPEPTIDE", 1)
'[K]PEPTIDE'

>>> from pyXLMS.exporter import get_msannika_crosslink_sequence
>>> get_msannika_crosslink_sequence("PEPTIDEK", 8)
'PEPTIDE[K]'
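For readers who want to see the bracket notation spelled out, here is a minimal sketch that reproduces the documented behaviour in plain Python. The function bracket_crosslink_residue is a hypothetical stand-in written for illustration; it is not the pyXLMS implementation.

```python
def bracket_crosslink_residue(peptide: str, crosslink_position: int) -> str:
    """Hypothetical re-implementation: wrap the crosslinked residue in square brackets."""
    if not 1 <= crosslink_position <= len(peptide):
        raise ValueError("crosslink position is outside the peptide's length")
    i = crosslink_position - 1  # convert the 1-based position to a 0-based index
    return f"{peptide[:i]}[{peptide[i]}]{peptide[i + 1:]}"


print(bracket_crosslink_residue("PEPKTIDE", 4))  # prints PEP[K]TIDE
```

The only subtlety is the 1-based position, which is why the index is shifted by one before slicing.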
- pyXLMS.exporter.to_msannika.to_msannika(data: List[Dict[str, Any]], filename: str | None = None, format: Literal['csv', 'tsv', 'xlsx'] = 'csv') -> pd.DataFrame
Exports a list of crosslinks or crosslink-spectrum-matches to MS Annika format.
Exports a list of crosslinks or crosslink-spectrum-matches to MS Annika format. This might be useful for tools that support MS Annika input but are not supported by pyXLMS (yet).
- Parameters:
data (list of dict of str, any) – A list of crosslinks or crosslink-spectrum-matches.
filename (str, or None, default = None) – If not None, the exported data will be written to a file with the specified filename.
format (str, one of "csv", "tsv", or "xlsx", default = "csv") – File format of the exported file if filename is not None.
- Returns:
A pandas DataFrame containing crosslinks or crosslink-spectrum-matches in MS Annika format.
- Return type:
pd.DataFrame
- Raises:
TypeError – If a wrong data type is provided.
TypeError – If data contains elements of mixed data type.
TypeError – If parameter format is not one of ‘csv’, ‘tsv’ or ‘xlsx’.
ValueError – If the provided data contains no elements.
Warning
The MS Annika exporter will not check whether all necessary information is available for the exported crosslinks or crosslink-spectrum-matches. If a value is not available, it will be denoted as a missing value in the dataframe and the exported file. Please make sure all necessary information is available before using the exported file with another tool! Please also note that modifications are not exported; for downstream analysis of modifications, please refer to transform.to_proforma() or transform.to_dataframe()!
Examples
>>> from pyXLMS.exporter import to_msannika
>>> from pyXLMS.data import create_crosslink_min
>>> xl1 = create_crosslink_min("KPEPTIDE", 1, "PKEPTIDE", 2)
>>> xl2 = create_crosslink_min("PEKPTIDE", 3, "PEPKTIDE", 4)
>>> crosslinks = [xl1, xl2]
>>> to_msannika(crosslinks)
   Crosslink Type  Sequence A  Position A  Accession A  In protein A  Sequence B  Position B  Accession B  In protein B  Best CSM Score  Decoy
0           Inter  [K]PEPTIDE           1         None          None  P[K]EPTIDE           2         None          None            None   None
1           Inter  PE[K]PTIDE           3         None          None  PEP[K]TIDE           4         None          None            None   None

>>> from pyXLMS.exporter import to_msannika
>>> from pyXLMS.data import create_crosslink_min
>>> xl1 = create_crosslink_min("KPEPTIDE", 1, "PKEPTIDE", 2)
>>> xl2 = create_crosslink_min("PEKPTIDE", 3, "PEPKTIDE", 4)
>>> crosslinks = [xl1, xl2]
>>> df = to_msannika(crosslinks, filename="crosslinks.csv", format="csv")

>>> from pyXLMS.exporter import to_msannika
>>> from pyXLMS.data import create_csm_min
>>> csm1 = create_csm_min("KPEPTIDE", 1, "PKEPTIDE", 2, "RUN_1", 1)
>>> csm2 = create_csm_min("PEKPTIDE", 3, "PEPKTIDE", 4, "RUN_1", 2)
>>> csms = [csm1, csm2]
>>> to_msannika(csms)
            Sequence  Crosslink Type  Sequence A  Crosslinker Position A  ...  First Scan  Charge  RT [min]  Compensation Voltage
0  KPEPTIDE-PKEPTIDE           Inter    KPEPTIDE                       1  ...           1    None      None                  None
1  PEKPTIDE-PEPKTIDE           Inter    PEKPTIDE                       3  ...           2    None      None                  None
[2 rows x 20 columns]

>>> from pyXLMS.exporter import to_msannika
>>> from pyXLMS.data import create_csm_min
>>> csm1 = create_csm_min("KPEPTIDE", 1, "PKEPTIDE", 2, "RUN_1", 1)
>>> csm2 = create_csm_min("PEKPTIDE", 3, "PEPKTIDE", 4, "RUN_1", 2)
>>> csms = [csm1, csm2]
>>> df = to_msannika(csms, filename="csms.csv", format="csv")
pyXLMS.exporter.to_pyxlinkviewer module
- pyXLMS.exporter.to_pyxlinkviewer.to_pyxlinkviewer(crosslinks: List[Dict[str, Any]], pdb_file: str | BinaryIO, gap_open: int | float = -10.0, gap_extension: int | float = -1.0, min_sequence_identity: float = 0.8, allow_site_mismatch: bool = False, ignore_chains: List[str] = [], filename_prefix: str | None = None) -> Dict[str, Any]
Exports a list of crosslinks to PyXlinkViewer format.
Exports a list of crosslinks to PyXlinkViewer format for visualization in PyMOL. The tool PyXlinkViewer is available from github.com/BobSchiffrin/PyXlinkViewer. This exporter performs basic local sequence alignment to align crosslinked peptides to a protein structure in PDB format. Gap open and gap extension penalties can be chosen, as well as a sequence identity threshold that must be satisfied for a match to be reported. Additionally, the alignment is checked to see whether the supposedly crosslinked residue can be modified with a crosslinker in the protein structure. Because of alignment shifts, the aligned amino acid might differ from the one in the peptide, so a crosslink could end up at a position that cannot react with the crosslinker. Optionally, these positions can still be reported.
- Parameters:
crosslinks (list of dict of str, any) – A list of crosslinks.
pdb_file (str, or file stream) – The name/path of the PDB file or a file-like object/stream. If a string is provided but no file is found locally, it’s assumed to be an identifier and the file is fetched from the PDB.
gap_open (int, or float, default = -10.0) – Gap open penalty for sequence alignment.
gap_extension (int, or float, default = -1.0) – Gap extension penalty for sequence alignment.
min_sequence_identity (float, default = 0.8) – Minimum sequence identity to consider an aligned crosslinked peptide a match with its corresponding position in the protein structure. Should be given as a fraction between 0 and 1, e.g. the default of 0.8 corresponds to a minimum of 80% sequence identity.
allow_site_mismatch (bool, default = False) – Whether the crosslink position should still be reported if, after alignment, it is not a reactive amino acid in the protein structure. By default such cases are not reported.
ignore_chains (list of str, default = empty list) – A list of chains to ignore in the protein structure.
filename_prefix (str, or None, default = None) – If not None, the exported data will be written to files with the specified filename prefix. The full list of written files can be accessed via the returned dictionary.
- Returns:
A dictionary with the following keys:
PyXlinkViewer – The formatted text for PyXlinkViewer.
PyXlinkViewer DataFrame – The same information as PyXlinkViewer, but as a pandas DataFrame.
Number of mapped crosslinks – The total number of mapped crosslinks.
Mapping – A string that logs how crosslinks were mapped to the protein structure.
Parsed PDB sequence – The protein sequence that was parsed from the PDB file.
Parsed PDB chains – The parsed chains from the PDB file.
Parsed PDB residue numbers – The parsed residue numbers from the PDB file.
Exported files – A list of filenames of all files that were written to disk.
- Return type:
dict of str, any
- Raises:
TypeError – If a wrong data type is provided.
TypeError – If data contains elements of mixed data type.
ValueError – If parameter min_sequence_identity is out of bounds.
ValueError – If the provided data contains no elements.
Examples
>>> from pyXLMS.exporter import to_pyxlinkviewer
>>> from pyXLMS.parser import read_custom
>>> pr = read_custom("data/_test/exporter/pyxlinkviewer/unique_links_all_pyxlms.csv")
>>> crosslinks = pr["crosslinks"]
>>> pyxlinkviewer_result = to_pyxlinkviewer(crosslinks, pdb_file="6YHU", filename_prefix="6YHU")
>>> pyxlinkviewer_output_file_str = pyxlinkviewer_result["PyXlinkViewer"]
>>> pyxlinkviewer_dataframe = pyxlinkviewer_result["PyXlinkViewer DataFrame"]
>>> nr_mapped_crosslinks = pyxlinkviewer_result["Number of mapped crosslinks"]
>>> crosslink_mapping = pyxlinkviewer_result["Mapping"]
>>> parsed_pdb_sequence = pyxlinkviewer_result["Parsed PDB sequence"]
>>> parsed_pdb_chains = pyxlinkviewer_result["Parsed PDB chains"]
>>> parsed_pdb_residue_numbers = pyxlinkviewer_result["Parsed PDB residue numbers"]
>>> exported_files = pyxlinkviewer_result["Exported files"]
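To make the min_sequence_identity threshold more concrete, the sketch below computes a sequence identity for two already-aligned strings and compares it to the threshold. This is an illustration under assumptions, not the pyXLMS alignment code: the helpers sequence_identity and is_match are hypothetical, and the choice to divide by the alignment length and to count gaps as mismatches is one common convention that the source does not confirm.

```python
def sequence_identity(aligned_a: str, aligned_b: str) -> float:
    """Fraction of aligned columns with identical residues (assumed convention).

    Gap characters ("-") never count as a match.
    """
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matches = sum(a == b and a != "-" for a, b in zip(aligned_a, aligned_b))
    return matches / len(aligned_a)


def is_match(aligned_peptide: str, aligned_structure: str,
             min_sequence_identity: float = 0.8) -> bool:
    """Hypothetical check mirroring the documented threshold semantics."""
    if not 0.0 <= min_sequence_identity <= 1.0:
        raise ValueError("min_sequence_identity must be between 0 and 1")
    return sequence_identity(aligned_peptide, aligned_structure) >= min_sequence_identity


print(sequence_identity("PEPKTIDE", "PEPKTLDE"))  # prints 0.875 (7 of 8 columns match)
print(is_match("PEPKTIDE", "PEPKTLDE"))           # prints True
print(is_match("PEPKTIDE", "PAPKTLDA"))           # prints False (5 of 8 columns match)
```

With the default of 0.8, a peptide of eight residues tolerates a single mismatch but not two.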
pyXLMS.exporter.to_xifdr module
- pyXLMS.exporter.to_xifdr.to_xifdr(csms: List[Dict[str, Any]], filename: str | None) -> pd.DataFrame
Exports a list of crosslink-spectrum-matches to xiFDR format.
Exports a list of crosslink-spectrum-matches to xiFDR format. The tool xiFDR is accessible via the link rappsilberlab.org/software/xifdr. Requires that the alpha_proteins, beta_proteins, alpha_proteins_peptide_positions, beta_proteins_peptide_positions, alpha_decoy, beta_decoy, charge and score fields are set for all crosslink-spectrum-matches.
- Parameters:
csms (list of dict of str, any) – A list of crosslink-spectrum-matches.
filename (str, or None) – If not None, the exported data will be written to a file with the specified filename.
- Returns:
A pandas DataFrame containing crosslink-spectrum-matches in xiFDR format.
- Return type:
pd.DataFrame
- Raises:
TypeError – If a wrong data type is provided.
TypeError – If ‘csms’ parameter contains elements of mixed data type.
ValueError – If the provided ‘csms’ parameter contains no elements.
RuntimeError – If not all of the required information is present in the input data.
Examples
>>> from pyXLMS.exporter import to_xifdr
>>> from pyXLMS.parser import read
>>> pr = read("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_CSMs.xlsx", engine="MS Annika", crosslinker="DSS")
>>> csms = pr["crosslink-spectrum-matches"]
>>> to_xifdr(csms, filename="msannika_xiFDR.csv")
                                       run   scan          peptide1  ...  peptide position 1  peptide position 2   score
  0  XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw   2257            GQKNSR  ...                 777                 777  119.83
  1  XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw   2448            GQKNSR  ...                 777                 693   13.91
  2  XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw   2561             SDKNR  ...                 864                 864  114.43
  3  XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw   2719            DKQSGK  ...                 676                 676  200.98
  4  XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw   2792            DKQSGK  ...                 676                  45   94.47
 ..                                    ...    ...               ...  ...                 ...                 ...     ...
821  XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw  23297     MDGTEELLVKLNR  ...                 387                 387  286.05
822  XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw  23454  KIECFDSVEISGVEDR  ...                 575                 682  376.15
823  XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw  23581    SSFEKNPIDFLEAK  ...                1176                1176  412.44
824  XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw  23683    SSFEKNPIDFLEAK  ...                1176                1176  437.10
825  XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw  27087    MEDESKLHKFKDFK  ...                  99                1176   15.89
[826 rows x 14 columns]

>>> from pyXLMS.exporter import to_xifdr
>>> from pyXLMS.parser import read
>>> pr = read("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_CSMs.xlsx", engine="MS Annika", crosslinker="DSS")
>>> csms = pr["crosslink-spectrum-matches"]
>>> df = to_xifdr(csms, filename=None)
pyXLMS.exporter.to_xinet module
- pyXLMS.exporter.to_xinet.to_xinet(crosslinks: List[Dict[str, Any]], filename: str | None) -> pd.DataFrame
Exports a list of crosslinks to xiNET format.
Exports a list of crosslinks to xiNET format. The tool xiNET is accessible via the link crosslinkviewer.org. Requires that the alpha_proteins, beta_proteins, alpha_proteins_crosslink_positions and beta_proteins_crosslink_positions fields are set for all crosslinks.
- Parameters:
crosslinks (list of dict of str, any) – A list of crosslinks.
filename (str, or None) – If not None, the exported data will be written to a file with the specified filename.
- Returns:
A pandas DataFrame containing crosslinks in xiNET format.
- Return type:
pd.DataFrame
- Raises:
TypeError – If a wrong data type is provided.
TypeError – If ‘crosslinks’ parameter contains elements of mixed data type.
ValueError – If the provided ‘crosslinks’ parameter contains no elements.
RuntimeError – If not all of the required information is present in the input data.
Notes
The optional Score column in the xiNET table will only be available if all crosslinks have assigned scores.
Examples
>>> from pyXLMS.exporter import to_xinet
>>> from pyXLMS.parser import read
>>> from pyXLMS.transform import targets_only
>>> from pyXLMS.transform import filter_proteins
>>> pr = read("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_Crosslinks.xlsx", engine="MS Annika", crosslinker="DSS")
>>> crosslinks = targets_only(pr)["crosslinks"]
>>> cas9 = filter_proteins(crosslinks, proteins=["Cas9"])["Both"]
>>> to_xinet(cas9, filename="crosslinks_xiNET.csv")
     Protein1  PepPos1           PepSeq1  LinkPos1  Protein2  PepPos2         PepSeq2  LinkPos2   Score   Id
  0      Cas9      777            GQKNSR         3      Cas9      777          GQKNSR         3  119.83    1
  1      Cas9      864             SDKNR         3      Cas9      864           SDKNR         3  114.43    2
  2      Cas9      676            DKQSGK         2      Cas9      676          DKQSGK         2  200.98    3
  3      Cas9      676            DKQSGK         2      Cas9       45           HSIKK         4   94.47    4
  4      Cas9       31             VPSKK         4      Cas9       31           VPSKK         4  110.48    5
 ..       ...      ...               ...       ...       ...      ...             ...       ...     ...  ...
248      Cas9      387     MDGTEELLVKLNR        10      Cas9      387   MDGTEELLVKLNR        10  305.63  249
249      Cas9      682    TILDFLKSDGFANR         7      Cas9      947       YDENDKLIR         6  110.46  250
250      Cas9      788    IEEGIKELGSQILK         6      Cas9     1176  SSFEKNPIDFLEAK         5  288.36  251
251      Cas9      575  KIECFDSVEISGVEDR         1      Cas9      682  TILDFLKSDGFANR         7  376.15  252
252      Cas9     1176    SSFEKNPIDFLEAK         5      Cas9     1176  SSFEKNPIDFLEAK         5  437.10  253
[253 rows x 10 columns]

>>> from pyXLMS.exporter import to_xinet
>>> from pyXLMS.parser import read
>>> from pyXLMS.transform import targets_only
>>> from pyXLMS.transform import filter_proteins
>>> pr = read("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_Crosslinks.xlsx", engine="MS Annika", crosslinker="DSS")
>>> crosslinks = targets_only(pr)["crosslinks"]
>>> cas9 = filter_proteins(crosslinks, proteins=["Cas9"])["Both"]
>>> df = to_xinet(cas9, filename=None)
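The rule for the optional Score column can be pictured as an all-or-nothing check over the input. The sketch below is only an illustration of that rule: the helper xinet_columns is hypothetical, the dictionary key "score" is an assumption about how scores are stored on a crosslink, and the assumption that the Id column remains when Score is dropped is not confirmed by the source. The column names themselves are taken from the example output above.

```python
from typing import Any, Dict, List

# Column names as they appear in the example output above.
BASE_COLUMNS = [
    "Protein1", "PepPos1", "PepSeq1", "LinkPos1",
    "Protein2", "PepPos2", "PepSeq2", "LinkPos2",
]


def xinet_columns(crosslinks: List[Dict[str, Any]]) -> List[str]:
    """Hypothetical helper: include "Score" only if every crosslink has a score."""
    if not crosslinks:
        raise ValueError("no crosslinks provided")
    # Assumption: scores live under the key "score" and are None when absent.
    has_scores = all(xl.get("score") is not None for xl in crosslinks)
    return BASE_COLUMNS + (["Score", "Id"] if has_scores else ["Id"])


scored = [{"score": 119.83}, {"score": 114.43}]
partly_scored = [{"score": 119.83}, {"score": None}]

print("Score" in xinet_columns(scored))         # prints True
print("Score" in xinet_columns(partly_scored))  # prints False
```

A single crosslink without a score is therefore enough to drop the column for the whole table.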
pyXLMS.exporter.to_xiview module
- pyXLMS.exporter.to_xiview.to_xiview(crosslinks: List[Dict[str, Any]], filename: str | None, minimal: bool = True) -> pd.DataFrame
Exports a list of crosslinks to xiVIEW format.
Exports a list of crosslinks to xiVIEW format. The tool xiVIEW is accessible via the link xiview.org/. Requires that the alpha_proteins, beta_proteins, alpha_proteins_crosslink_positions and beta_proteins_crosslink_positions fields are set for all crosslinks.
- Parameters:
crosslinks (list of dict of str, any) – A list of crosslinks.
filename (str, or None) – If not None, the exported data will be written to a file with the specified filename.
minimal (bool, default = True) – Which xiVIEW format to return. If minimal = True, the minimal xiVIEW format is returned. Otherwise the “CSV without peak lists” format is returned (internally this just calls exporter.to_xinet()). For more information on the xiVIEW formats, please refer to the xiVIEW specification.
- Returns:
A pandas DataFrame containing crosslinks in xiVIEW format.
- Return type:
pd.DataFrame
- Raises:
TypeError – If a wrong data type is provided.
TypeError – If ‘crosslinks’ parameter contains elements of mixed data type.
ValueError – If the provided ‘crosslinks’ parameter contains no elements.
RuntimeError – If not all of the required information is present in the input data.
Notes
The optional Score column in the xiVIEW table will only be available if all crosslinks have assigned scores. The optional Decoy* columns will only be available if all crosslinks have assigned target and decoy labels.
Examples
>>> from pyXLMS.exporter import to_xiview
>>> from pyXLMS.parser import read
>>> from pyXLMS.transform import targets_only
>>> from pyXLMS.transform import filter_proteins
>>> pr = read("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_Crosslinks.xlsx", engine="MS Annika", crosslinker="DSS")
>>> crosslinks = targets_only(pr)["crosslinks"]
>>> cas9 = filter_proteins(crosslinks, proteins=["Cas9"])["Both"]
>>> to_xiview(cas9, filename="crosslinks_xiVIEW.csv")
     AbsPos1  AbsPos2  Protein1  Protein2  Decoy1  Decoy2   Score
  0      779      779      Cas9      Cas9   FALSE   FALSE  119.83
  1      866      866      Cas9      Cas9   FALSE   FALSE  114.43
  2      677      677      Cas9      Cas9   FALSE   FALSE  200.98
  3      677       48      Cas9      Cas9   FALSE   FALSE   94.47
  4       34       34      Cas9      Cas9   FALSE   FALSE  110.48
 ..      ...      ...       ...       ...     ...     ...     ...
248      396      396      Cas9      Cas9   FALSE   FALSE  305.63
249      688      952      Cas9      Cas9   FALSE   FALSE  110.46
250      793     1180      Cas9      Cas9   FALSE   FALSE  288.36
251      575      688      Cas9      Cas9   FALSE   FALSE  376.15
252     1180     1180      Cas9      Cas9   FALSE   FALSE  437.10
[253 rows x 7 columns]

>>> from pyXLMS.exporter import to_xiview
>>> from pyXLMS.parser import read
>>> from pyXLMS.transform import targets_only
>>> from pyXLMS.transform import filter_proteins
>>> pr = read("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_Crosslinks.xlsx", engine="MS Annika", crosslinker="DSS")
>>> crosslinks = targets_only(pr)["crosslinks"]
>>> cas9 = filter_proteins(crosslinks, proteins=["Cas9"])["Both"]
>>> df = to_xiview(cas9, filename=None)

>>> from pyXLMS.exporter import to_xiview
>>> from pyXLMS.parser import read
>>> from pyXLMS.transform import targets_only
>>> from pyXLMS.transform import filter_proteins
>>> pr = read("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_Crosslinks.xlsx", engine="MS Annika", crosslinker="DSS")
>>> crosslinks = targets_only(pr)["crosslinks"]
>>> cas9 = filter_proteins(crosslinks, proteins=["Cas9"])["Both"]
>>> to_xiview(cas9, filename="crosslinks_xiVIEW.csv", minimal=False)
     Protein1  PepPos1           PepSeq1  LinkPos1  Protein2  PepPos2         PepSeq2  LinkPos2   Score   Id
  0      Cas9      777            GQKNSR         3      Cas9      777          GQKNSR         3  119.83    1
  1      Cas9      864             SDKNR         3      Cas9      864           SDKNR         3  114.43    2
  2      Cas9      676            DKQSGK         2      Cas9      676          DKQSGK         2  200.98    3
  3      Cas9      676            DKQSGK         2      Cas9       45           HSIKK         4   94.47    4
  4      Cas9       31             VPSKK         4      Cas9       31           VPSKK         4  110.48    5
 ..       ...      ...               ...       ...       ...      ...             ...       ...     ...  ...
248      Cas9      387     MDGTEELLVKLNR        10      Cas9      387   MDGTEELLVKLNR        10  305.63  249
249      Cas9      682    TILDFLKSDGFANR         7      Cas9      947       YDENDKLIR         6  110.46  250
250      Cas9      788    IEEGIKELGSQILK         6      Cas9     1176  SSFEKNPIDFLEAK         5  288.36  251
251      Cas9      575  KIECFDSVEISGVEDR         1      Cas9      682  TILDFLKSDGFANR         7  376.15  252
252      Cas9     1176    SSFEKNPIDFLEAK         5      Cas9     1176  SSFEKNPIDFLEAK         5  437.10  253
[253 rows x 10 columns]
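Comparing the two example outputs above, the AbsPos values of the minimal format are consistent with adding the 1-based link position within the peptide to the peptide's start position in the protein and subtracting one. The sketch below checks this relation against a few of the rows shown; the function absolute_position is a hypothetical helper, and the formula is inferred from the examples rather than taken from the pyXLMS source.

```python
def absolute_position(pep_pos: int, link_pos: int) -> int:
    """Inferred relation: protein-level position of the crosslinked residue.

    pep_pos  -- start position of the peptide in the protein (1-based)
    link_pos -- position of the crosslink within the peptide (1-based)
    """
    return pep_pos + link_pos - 1


# (PepPos, LinkPos, AbsPos) triples read off the example outputs above.
examples = [
    (777, 3, 779),    # GQKNSR
    (45, 4, 48),      # HSIKK
    (387, 10, 396),   # MDGTEELLVKLNR
    (1176, 5, 1180),  # SSFEKNPIDFLEAK
]

for pep_pos, link_pos, abs_pos in examples:
    assert absolute_position(pep_pos, link_pos) == abs_pos
```

The minimal format therefore carries the same site information as the PepPos and LinkPos pair, collapsed into a single protein-level coordinate.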
pyXLMS.exporter.to_xlinkdb module
- pyXLMS.exporter.to_xlinkdb.to_xlinkdb(crosslinks: List[Dict[str, Any]], filename: str | None) -> pd.DataFrame
Exports a list of crosslinks to XlinkDB format.
Exports a list of crosslinks to XlinkDB format. The tool XlinkDB is accessible via the link xlinkdb.gs.washington.edu/xlinkdb. Requires that the alpha_proteins and beta_proteins fields are set for all crosslinks.
- Parameters:
crosslinks (list of dict of str, any) – A list of crosslinks.
filename (str, or None) – If not None, the exported data will be written to a file with the specified filename. The filename should not contain a file extension and should consist only of alphanumeric characters (a-z, A-Z, 0-9).
- Returns:
A pandas DataFrame containing crosslinks in XlinkDB format.
- Return type:
pd.DataFrame
- Raises:
TypeError – If a wrong data type is provided.
TypeError – If ‘crosslinks’ parameter contains elements of mixed data type.
ValueError – If the filename contains any non-alpha-numeric characters.
ValueError – If the provided ‘crosslinks’ parameter contains no elements.
RuntimeError – If not all of the required information is present in the input data.
Notes
The XlinkDB input format requires a column with probabilities that the crosslinks are correct. Since that is not available from most crosslink search engines, it is simply set to a constant 1.
Examples
>>> from pyXLMS.exporter import to_xlinkdb
>>> from pyXLMS.parser import read
>>> pr = read("data/xi/1perc_xl_boost_Links_xiFDR2.2.1.csv", engine="xiSearch/xiFDR", crosslinker="DSS")
>>> crosslinks = pr["crosslinks"]
>>> to_xlinkdb(crosslinks, filename="crosslinksForXlinkDB")
               Peptide A  Protein A  Labeled Position A      Peptide B  Protein B  Labeled Position B  Probability
  0          VVDELVKVMGR       Cas9                   6    VVDELVKVMGR       Cas9                   6            1
  1  MLASAGELQKGNELALPSK       Cas9                   9    VVDELVKVMGR       Cas9                   6            1
  2        MDGTEELLVKLNR       Cas9                   9  MDGTEELLVKLNR       Cas9                   9            1
  3         MTNFDKNLPNEK       Cas9                   5       SKLVSDFR       Cas9                   1            1
  4             DFQFYKVR       Cas9                   5    MIAKSEQEIGK       Cas9                   3            1
 ..                  ...        ...                 ...            ...        ...                 ...          ...
222        LPKYSLFELENGR       Cas9                   2          SDKNR       Cas9                   2            1
223               DKQSGK       Cas9                   1         DKQSGK       Cas9                   1            1
224               AGFIKR       Cas9                   4   SDNVPSEEVVKK       Cas9                  10            1
225                EKIEK       Cas9                   1          KVTVK       Cas9                   0            1
226                LSKSR       Cas9                   2          LSKSR       Cas9                   2            1
[227 rows x 7 columns]

>>> from pyXLMS.exporter import to_xlinkdb
>>> from pyXLMS.parser import read
>>> pr = read("data/xi/1perc_xl_boost_Links_xiFDR2.2.1.csv", engine="xiSearch/xiFDR", crosslinker="DSS")
>>> crosslinks = pr["crosslinks"]
>>> df = to_xlinkdb(crosslinks, filename=None)
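Since the exporter raises a ValueError for filenames containing non-alphanumeric characters, a filename can be checked up front. The sketch below shows one possible check; the helper is_valid_xlinkdb_filename is hypothetical, and whether pyXLMS uses exactly this rule internally is an assumption. A regular expression is used instead of str.isalnum() because the latter also accepts non-ASCII letters and digits.

```python
import re


def is_valid_xlinkdb_filename(filename: str) -> bool:
    """Hypothetical check: only ASCII letters and digits, no extension, not empty."""
    return re.fullmatch(r"[A-Za-z0-9]+", filename) is not None


print(is_valid_xlinkdb_filename("crosslinksForXlinkDB"))  # prints True
print(is_valid_xlinkdb_filename("crosslinks.csv"))        # prints False (contains ".")
print(is_valid_xlinkdb_filename("crosslinks_xlinkdb"))    # prints False (contains "_")
```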
pyXLMS.exporter.to_xlmstools module
- pyXLMS.exporter.to_xlmstools.to_xlmstools(crosslinks: List[Dict[str, Any]], pdb_file: str | BinaryIO, gap_open: int | float = -10.0, gap_extension: int | float = -1.0, min_sequence_identity: float = 0.8, allow_site_mismatch: bool = False, ignore_chains: List[str] = [], filename_prefix: str | None = None) -> Dict[str, Any]
Exports a list of crosslinks to xlms-tools format.
Exports a list of crosslinks to xlms-tools format for protein structure analysis. The Python package xlms-tools is available from gitlab.com/topf-lab/xlms-tools. This exporter performs basic local sequence alignment to align crosslinked peptides to a protein structure in PDB format. Gap open and gap extension penalties can be chosen, as well as a sequence identity threshold that must be satisfied for a match to be reported. Additionally, the alignment is checked to see whether the supposedly crosslinked residue can be modified with a crosslinker in the protein structure. Because of alignment shifts, the aligned amino acid might differ from the one in the peptide, so a crosslink could end up at a position that cannot react with the crosslinker. Optionally, these positions can still be reported.
- Parameters:
crosslinks (list of dict of str, any) – A list of crosslinks.
pdb_file (str, or file stream) – The name/path of the PDB file or a file-like object/stream. If a string is provided but no file is found locally, it’s assumed to be an identifier and the file is fetched from the PDB.
gap_open (int, or float, default = -10.0) – Gap open penalty for sequence alignment.
gap_extension (int, or float, default = -1.0) – Gap extension penalty for sequence alignment.
min_sequence_identity (float, default = 0.8) – Minimum sequence identity to consider an aligned crosslinked peptide a match with its corresponding position in the protein structure. Should be given as a fraction between 0 and 1, e.g. the default of 0.8 corresponds to a minimum of 80% sequence identity.
allow_site_mismatch (bool, default = False) – Whether the crosslink position should still be reported if, after alignment, it is not a reactive amino acid in the protein structure. By default such cases are not reported.
ignore_chains (list of str, default = empty list) – A list of chains to ignore in the protein structure.
filename_prefix (str, or None, default = None) – If not None, the exported data will be written to files with the specified filename prefix. The full list of written files can be accessed via the returned dictionary.
- Returns:
A dictionary with the following keys:
xlms-tools – The formatted text for xlms-tools.
xlms-tools DataFrame – The same information as xlms-tools, but as a pandas DataFrame.
Number of mapped crosslinks – The total number of mapped crosslinks.
Mapping – A string that logs how crosslinks were mapped to the protein structure.
Parsed PDB sequence – The protein sequence that was parsed from the PDB file.
Parsed PDB chains – The parsed chains from the PDB file.
Parsed PDB residue numbers – The parsed residue numbers from the PDB file.
Exported files – A list of filenames of all files that were written to disk.
- Return type:
dict of str, any
- Raises:
TypeError – If a wrong data type is provided.
TypeError – If data contains elements of mixed data type.
ValueError – If parameter min_sequence_identity is out of bounds.
ValueError – If the provided data contains no elements.
Notes
Internally this exporter just calls exporter.to_pyxlinkviewer() and re-writes some of the files, since the two tools share the same input file structure.
Examples
>>> from pyXLMS.exporter import to_xlmstools
>>> from pyXLMS.parser import read_custom
>>> pr = read_custom("data/_test/exporter/xlms-tools/unique_links_all_pyxlms.csv")
>>> crosslinks = pr["crosslinks"]
>>> xlmstools_result = to_xlmstools(crosslinks, pdb_file="6YHU", filename_prefix="6YHU")
>>> xlmstools_output_file_str = xlmstools_result["xlms-tools"]
>>> xlmstools_dataframe = xlmstools_result["xlms-tools DataFrame"]
>>> nr_mapped_crosslinks = xlmstools_result["Number of mapped crosslinks"]
>>> crosslink_mapping = xlmstools_result["Mapping"]
>>> parsed_pdb_sequence = xlmstools_result["Parsed PDB sequence"]
>>> parsed_pdb_chains = xlmstools_result["Parsed PDB chains"]
>>> parsed_pdb_residue_numbers = xlmstools_result["Parsed PDB residue numbers"]
>>> exported_files = xlmstools_result["Exported files"]
pyXLMS.exporter.to_xmas module
- pyXLMS.exporter.to_xmas.to_xmas(crosslinks: List[Dict[str, Any]], filename: str | None) -> pd.DataFrame
Exports a list of crosslinks to XMAS format.
Exports a list of crosslinks to XMAS format for visualization in ChimeraX. The tool XMAS is available from github.com/ScheltemaLab/ChimeraX_XMAS_bundle.
- Parameters:
crosslinks (list of dict of str, any) – A list of crosslinks.
filename (str, or None) – If not None, the exported data will be written to a file with the specified filename.
- Returns:
A pandas DataFrame containing crosslinks in XMAS format.
- Return type:
pd.DataFrame
- Raises:
TypeError – If a wrong data type is provided.
TypeError – If ‘crosslinks’ parameter contains elements of mixed data type.
ValueError – If the provided ‘crosslinks’ parameter contains no elements.
Examples
>>> from pyXLMS.exporter import to_xmas
>>> from pyXLMS.data import create_crosslink_min
>>> xl1 = create_crosslink_min("KPEPTIDE", 1, "PKEPTIDE", 2)
>>> xl2 = create_crosslink_min("PEKPTIDE", 3, "PEPKTIDE", 4)
>>> crosslinks = [xl1, xl2]
>>> to_xmas(crosslinks, filename="crosslinks_xmas.xlsx")
   Sequence A  Sequence B
0  [K]PEPTIDE  P[K]EPTIDE
1  PE[K]PTIDE  PEP[K]TIDE

>>> from pyXLMS.exporter import to_xmas
>>> from pyXLMS.data import create_crosslink_min
>>> xl1 = create_crosslink_min("KPEPTIDE", 1, "PKEPTIDE", 2)
>>> xl2 = create_crosslink_min("PEKPTIDE", 3, "PEPKTIDE", 4)
>>> crosslinks = [xl1, xl2]
>>> to_xmas(crosslinks, filename=None)
   Sequence A  Sequence B
0  [K]PEPTIDE  P[K]EPTIDE
1  PE[K]PTIDE  PEP[K]TIDE
pyXLMS.exporter.util module
Module contents
- pyXLMS.exporter.get_msannika_crosslink_sequence(peptide: str, crosslink_position: int) -> str
Returns the crosslinked peptide sequence in MS Annika format.
Returns the crosslinked peptide sequence in MS Annika format, which is the peptide amino acid sequence with the crosslinked residue in square brackets (see examples).
- Parameters:
peptide (str) – The (unmodified) amino acid sequence of the peptide.
crosslink_position (int) – Position of the crosslinker in the peptide sequence (1-based).
- Returns:
The crosslinked peptide sequence in MS Annika format.
- Return type:
str
- Raises:
ValueError – If the crosslink position is outside the peptide’s length.
Examples
>>> from pyXLMS.exporter import get_msannika_crosslink_sequence
>>> get_msannika_crosslink_sequence("PEPKTIDE", 4)
'PEP[K]TIDE'

>>> from pyXLMS.exporter import get_msannika_crosslink_sequence
>>> get_msannika_crosslink_sequence("KPEPTIDE", 1)
'[K]PEPTIDE'

>>> from pyXLMS.exporter import get_msannika_crosslink_sequence
>>> get_msannika_crosslink_sequence("PEPTIDEK", 8)
'PEPTIDE[K]'
- pyXLMS.exporter.to_impxfdr(data: List[Dict[str, Any]], filename: str | None, targets_only: bool = True) → pd.DataFrame
Exports a list of crosslinks or crosslink-spectrum-matches to IMP-X-FDR format.
Exports a list of crosslinks or crosslink-spectrum-matches to IMP-X-FDR format for benchmarking purposes. The tool IMP-X-FDR is available from github.com/vbc-proteomics-org/imp-x-fdr. We recommend using version 1.1.0 and selecting “MS Annika” as the input file format for the exported file. A slightly modified version is available from github.com/hgb-bin-proteomics/MSAnnika_NC_Results. This version contains a few bug fixes and was used for the MS Annika 2.0 and MS Annika 3.0 publications. Requires that the alpha_proteins, beta_proteins, alpha_proteins_crosslink_positions and beta_proteins_crosslink_positions fields are set for crosslinks and crosslink-spectrum-matches.
- Parameters:
data (list of dict of str, any) – A list of crosslinks or crosslink-spectrum-matches.
filename (str, or None, default = None) – If not None, the exported data will be written to a file with the specified filename. The filename should end in “.xlsx” as the file is exported to Microsoft Excel file format.
targets_only (bool, default = True) – Whether or not only target crosslinks or crosslink-spectrum-matches should be exported. For benchmarking purposes this is usually the case. If the crosslinks or crosslink-spectrum-matches do not contain target-decoy labels this should be set to False.
- Returns:
A pandas DataFrame containing crosslinks or crosslink-spectrum-matches in IMP-X-FDR format.
- Return type:
pd.DataFrame
- Raises:
TypeError – If a wrong data type is provided.
TypeError – If data contains elements of mixed data type.
ValueError – If the provided data contains no elements or if none of the data has target-decoy labels and parameter ‘targets_only’ is set to True.
RuntimeError – If not all of the required information is present in the input data.
Examples
>>> from pyXLMS.exporter import to_impxfdr
>>> from pyXLMS.parser import read
>>> pr = read("data/xi/1perc_xl_boost_Links_xiFDR2.2.1.csv", engine="xiSearch/xiFDR", crosslinker="DSS")
>>> crosslinks = pr["crosslinks"]
>>> to_impxfdr(crosslinks, filename="crosslinks.xlsx")
     Crosslink Type             Sequence A  Position A  Accession A  In protein A  ...  Position B  Accession B  In protein B  Best CSM Score  Decoy
  0           Intra          VVDELV[K]VMGR           7         Cas9           753  ...           7         Cas9           753          40.679  False
  1           Intra  MLASAGELQ[K]GNELALPSK          10         Cas9           753  ...           7         Cas9          1226          40.231  False
  2           Intra        MDGTEELLV[K]LNR          10         Cas9           396  ...          10         Cas9           396          39.582  False
  3           Intra         MTNFD[K]NLPNEK           6         Cas9           965  ...           2         Cas9           504          35.880  False
  4           Intra             DFQFY[K]VR           6         Cas9           978  ...           4         Cas9          1028          35.281  False
 ..             ...                    ...         ...          ...           ...  ...         ...          ...           ...             ...    ...
220           Intra        LP[K]YSLFELENGR           3         Cas9           866  ...           3         Cas9          1204           9.877  False
221           Intra               D[K]QSGK           2         Cas9           677  ...           2         Cas9           677           9.702  False
222           Intra               AGFI[K]R           5         Cas9           922  ...          11         Cas9           881           9.666  False
223           Intra                E[K]IEK           2         Cas9           443  ...           1         Cas9           562           9.656  False
224           Intra                LS[K]SR           3         Cas9           222  ...           3         Cas9           222           9.619  False
[225 rows x 11 columns]

>>> from pyXLMS.exporter import to_impxfdr
>>> from pyXLMS.parser import read
>>> pr = read("data/xi/1perc_xl_boost_CSM_xiFDR2.2.1.csv", engine="xiSearch/xiFDR", crosslinker="DSS")
>>> csms = pr["crosslink-spectrum-matches"]
>>> to_impxfdr(csms, filename="csms.xlsx")
     Crosslink Type          Sequence A  Position A  Accession A  In protein A  ...  Position B  Accession B  In protein B  Best CSM Score  Decoy
  0           Intra  [K]IECFDSVEISGVEDR           1         Cas9           575  ...           1         Cas9           575          27.268  False
  1           Intra       LVDSTD[K]ADLR           7         Cas9           152  ...          11         Cas9           881          26.437  False
  2           Intra     GGLSELD[K]AGFIK           8         Cas9           917  ...           8         Cas9           917          26.134  False
  3           Intra       LVDSTD[K]ADLR           7         Cas9           152  ...           7         Cas9           152          25.804  False
  4           Intra       VVDELV[K]VMGR           7         Cas9           753  ...           7         Cas9           753          24.861  False
 ..             ...                 ...         ...          ...           ...  ...         ...          ...           ...             ...    ...
406           Intra          [K]GILQTVK           1         Cas9           739  ...           3         Cas9           222           6.977  False
407           Intra          QQLPE[K]YK           6         Cas9           350  ...           6         Cas9           350           6.919  False
408           Intra           ESILP[K]R           6         Cas9          1117  ...           7         Cas9          1035           6.853  False
409           Intra             LS[K]SR           3         Cas9           222  ...           2         Cas9           884           6.809  False
410           Intra     QIT[K]HVAQILDSR           4         Cas9           933  ...           6         Cas9           350           6.808  False
[411 rows x 11 columns]
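Because the exporter raises a RuntimeError when required information is missing, it can help to check the input beforehand. The following is a minimal sketch under stated assumptions: `find_incomplete` is a hypothetical helper (not part of pyXLMS), the field names are the four listed in the description above, and an unset field is assumed to be either absent or None. The example dictionaries are toy data; real crosslink objects carry more keys.

```python
from typing import Any, Dict, List

# Field names taken from the requirements stated in the description above.
REQUIRED_FIELDS = (
    "alpha_proteins",
    "beta_proteins",
    "alpha_proteins_crosslink_positions",
    "beta_proteins_crosslink_positions",
)


def find_incomplete(data: List[Dict[str, Any]]) -> Dict[int, List[str]]:
    """Map the index of each incomplete item to its missing field names (hypothetical helper)."""
    incomplete: Dict[int, List[str]] = {}
    for index, item in enumerate(data):
        # Assumption: an unset field is either absent from the dict or set to None.
        missing = [field for field in REQUIRED_FIELDS if item.get(field) is None]
        if missing:
            incomplete[index] = missing
    return incomplete


# Toy dictionaries for illustration only.
example = [
    {
        "alpha_proteins": ["Cas9"],
        "beta_proteins": ["Cas9"],
        "alpha_proteins_crosslink_positions": [753],
        "beta_proteins_crosslink_positions": [753],
    },
    {
        "alpha_proteins": ["Cas9"],
        "beta_proteins": None,
        "alpha_proteins_crosslink_positions": [396],
        "beta_proteins_crosslink_positions": None,
    },
]
print(find_incomplete(example))
# {1: ['beta_proteins', 'beta_proteins_crosslink_positions']}
```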
- pyXLMS.exporter.to_msannika(data: List[Dict[str, Any]], filename: str | None = None, format: Literal['csv', 'tsv', 'xlsx'] = 'csv') → pd.DataFrame
Exports a list of crosslinks or crosslink-spectrum-matches to MS Annika format.
Exports a list of crosslinks or crosslink-spectrum-matches to MS Annika format. This might be useful for tools that support MS Annika input but are not supported by pyXLMS (yet).
- Parameters:
data (list of dict of str, any) – A list of crosslinks or crosslink-spectrum-matches.
filename (str, or None, default = None) – If not None, the exported data will be written to a file with the specified filename.
format (str, one of "csv", "tsv", or "xlsx", default = "csv") – File format of the exported file if filename is not None.
- Returns:
A pandas DataFrame containing crosslinks or crosslink-spectrum-matches in MS Annika format.
- Return type:
pd.DataFrame
- Raises:
TypeError – If a wrong data type is provided.
TypeError – If data contains elements of mixed data type.
TypeError – If parameter format is not one of ‘csv’, ‘tsv’ or ‘xlsx’.
ValueError – If the provided data contains no elements.
Warning
The MS Annika exporter will not check if all necessary information is available for the exported crosslinks or crosslink-spectrum-matches. If a value is not available, it will be denoted as a missing value in the dataframe and exported file. Please make sure all necessary information is available before using the exported file with another tool! Please also note that modifications are not exported; for downstream analysis of modifications please refer to transform.to_proforma() or transform.to_dataframe().
Examples
>>> from pyXLMS.exporter import to_msannika
>>> from pyXLMS.data import create_crosslink_min
>>> xl1 = create_crosslink_min("KPEPTIDE", 1, "PKEPTIDE", 2)
>>> xl2 = create_crosslink_min("PEKPTIDE", 3, "PEPKTIDE", 4)
>>> crosslinks = [xl1, xl2]
>>> to_msannika(crosslinks)
   Crosslink Type  Sequence A  Position A  Accession A  In protein A  Sequence B  Position B  Accession B  In protein B  Best CSM Score  Decoy
0           Inter  [K]PEPTIDE           1         None          None  P[K]EPTIDE           2         None          None            None   None
1           Inter  PE[K]PTIDE           3         None          None  PEP[K]TIDE           4         None          None            None   None

>>> from pyXLMS.exporter import to_msannika
>>> from pyXLMS.data import create_crosslink_min
>>> xl1 = create_crosslink_min("KPEPTIDE", 1, "PKEPTIDE", 2)
>>> xl2 = create_crosslink_min("PEKPTIDE", 3, "PEPKTIDE", 4)
>>> crosslinks = [xl1, xl2]
>>> df = to_msannika(crosslinks, filename="crosslinks.csv", format="csv")

>>> from pyXLMS.exporter import to_msannika
>>> from pyXLMS.data import create_csm_min
>>> csm1 = create_csm_min("KPEPTIDE", 1, "PKEPTIDE", 2, "RUN_1", 1)
>>> csm2 = create_csm_min("PEKPTIDE", 3, "PEPKTIDE", 4, "RUN_1", 2)
>>> csms = [csm1, csm2]
>>> to_msannika(csms)
            Sequence  Crosslink Type  Sequence A  Crosslinker Position A  ...  First Scan  Charge  RT [min]  Compensation Voltage
0  KPEPTIDE-PKEPTIDE           Inter    KPEPTIDE                       1  ...           1    None      None                  None
1  PEKPTIDE-PEPKTIDE           Inter    PEKPTIDE                       3  ...           2    None      None                  None
[2 rows x 20 columns]

>>> from pyXLMS.exporter import to_msannika
>>> from pyXLMS.data import create_csm_min
>>> csm1 = create_csm_min("KPEPTIDE", 1, "PKEPTIDE", 2, "RUN_1", 1)
>>> csm2 = create_csm_min("PEKPTIDE", 3, "PEPKTIDE", 4, "RUN_1", 2)
>>> csms = [csm1, csm2]
>>> df = to_msannika(csms, filename="csms.csv", format="csv")
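Since the MS Annika exporter does not check for missing information, a quick scan of the exported table before passing it to another tool can be useful. The sketch below is an illustration only: `columns_with_missing_values` is a hypothetical helper, and it assumes the table is available as a list of row dictionaries (for example as produced by pandas `DataFrame.to_dict("records")`) in which missing values appear as None. Depending on how the DataFrame was built, missing values may instead be NaN, which this sketch does not cover.

```python
from typing import Any, Dict, List


def columns_with_missing_values(rows: List[Dict[str, Any]]) -> Dict[str, int]:
    """Count None entries per column in a list of row dictionaries (hypothetical helper)."""
    counts: Dict[str, int] = {}
    for row in rows:
        for column, value in row.items():
            if value is None:
                counts[column] = counts.get(column, 0) + 1
    return counts


# Toy rows using a subset of the column names shown in the first example above.
rows = [
    {"Sequence A": "[K]PEPTIDE", "Position A": 1, "Accession A": None, "In protein A": None},
    {"Sequence A": "PE[K]PTIDE", "Position A": 3, "Accession A": None, "In protein A": None},
]
print(columns_with_missing_values(rows))
# {'Accession A': 2, 'In protein A': 2}
```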
- pyXLMS.exporter.to_pyxlinkviewer(crosslinks: List[Dict[str, Any]], pdb_file: str | BinaryIO, gap_open: int | float = -10.0, gap_extension: int | float = -1.0, min_sequence_identity: float = 0.8, allow_site_mismatch: bool = False, ignore_chains: List[str] = [], filename_prefix: str | None = None) → Dict[str, Any]
Exports a list of crosslinks to PyXlinkViewer format.
Exports a list of crosslinks to PyXlinkViewer format for visualization in PyMOL. The tool PyXlinkViewer is available from github.com/BobSchiffrin/PyXlinkViewer. This exporter performs basic local sequence alignment to align crosslinked peptides to a protein structure in PDB format. Gap open and gap extension penalties can be chosen, as well as a sequence identity threshold that must be satisfied for a match to be reported. Additionally, the alignment is checked to determine whether the supposedly crosslinked residue can be modified with a crosslinker in the protein structure: because of alignment shifts, amino acids might change, and a crosslink might be reported at a position that is not able to react with the crosslinker. Optionally, these positions can still be reported.
- Parameters:
crosslinks (list of dict of str, any) – A list of crosslinks.
pdb_file (str, or file stream) – The name/path of the PDB file or a file-like object/stream. If a string is provided but no file is found locally, it’s assumed to be an identifier and the file is fetched from the PDB.
gap_open (int, or float, default = -10.0) – Gap open penalty for sequence alignment.
gap_extension (int, or float, default = -1.0) – Gap extension penalty for sequence alignment.
min_sequence_identity (float, default = 0.8) – Minimum sequence identity to consider an aligned crosslinked peptide a match with its corresponding position in the protein structure. Should be given as a fraction between 0 and 1, e.g. the default of 0.8 corresponds to a minimum of 80% sequence identity.
allow_site_mismatch (bool, default = False) – Whether the position should still be reported if the crosslink position after alignment is not a reactive amino acid in the protein structure. By default such cases are not reported.
ignore_chains (list of str, default = empty list) – A list of chains to ignore in the protein structure.
filename_prefix (str, or None, default = None) – If not None, the exported data will be written to files with the specified filename prefix. The full list of written files can be accessed via the returned dictionary.
- Returns:
Returns a dictionary with key "PyXlinkViewer" containing the formatted text for PyXlinkViewer, with key "PyXlinkViewer DataFrame" containing the information from "PyXlinkViewer" but as a pandas DataFrame, with key "Number of mapped crosslinks" containing the total number of mapped crosslinks, with key "Mapping" containing a string that logs how crosslinks were mapped to the protein structure, with key "Parsed PDB sequence" containing the protein sequence that was parsed from the PDB file, with key "Parsed PDB chains" containing the parsed chains from the PDB file, with key "Parsed PDB residue numbers" containing the parsed residue numbers from the PDB file, and with key "Exported files" containing a list of filenames of all files that were written to disk.
- Return type:
dict of str, any
- Raises:
TypeError – If a wrong data type is provided.
TypeError – If data contains elements of mixed data type.
ValueError – If parameter min_sequence_identity is out of bounds.
ValueError – If the provided data contains no elements.
Examples
>>> from pyXLMS.exporter import to_pyxlinkviewer
>>> from pyXLMS.parser import read_custom
>>> pr = read_custom("data/_test/exporter/pyxlinkviewer/unique_links_all_pyxlms.csv")
>>> crosslinks = pr["crosslinks"]
>>> pyxlinkviewer_result = to_pyxlinkviewer(crosslinks, pdb_file="6YHU", filename_prefix="6YHU")
>>> pyxlinkviewer_output_file_str = pyxlinkviewer_result["PyXlinkViewer"]
>>> pyxlinkviewer_dataframe = pyxlinkviewer_result["PyXlinkViewer DataFrame"]
>>> nr_mapped_crosslinks = pyxlinkviewer_result["Number of mapped crosslinks"]
>>> crosslink_mapping = pyxlinkviewer_result["Mapping"]
>>> parsed_pdb_sequence = pyxlinkviewer_result["Parsed PDB sequence"]
>>> parsed_pdb_chains = pyxlinkviewer_result["Parsed PDB chains"]
>>> parsed_pdb_residue_numbers = pyxlinkviewer_result["Parsed PDB residue numbers"]
>>> exported_files = pyxlinkviewer_result["Exported files"]
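The interplay of min_sequence_identity and allow_site_mismatch can be illustrated with a small, self-contained sketch. It is not the exporter's implementation: both function names are hypothetical, sequence identity is assumed to be the number of identical non-gap positions divided by the alignment length, the alignment itself is taken as given (two equal-length strings with "-" for gaps), and lysine ("K") is assumed to be the only reactive residue.

```python
def sequence_identity(aligned_peptide: str, aligned_structure: str) -> float:
    """Fraction of alignment columns with identical, non-gap residues (assumed definition)."""
    if len(aligned_peptide) != len(aligned_structure) or not aligned_peptide:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matches = sum(
        1
        for pep_res, struct_res in zip(aligned_peptide, aligned_structure)
        if pep_res == struct_res and pep_res != "-"
    )
    return matches / len(aligned_peptide)


def should_report(
    aligned_peptide: str,
    aligned_structure: str,
    crosslink_index: int,  # 0-based column of the crosslinked residue in the alignment
    min_sequence_identity: float = 0.8,
    allow_site_mismatch: bool = False,
    reactive_residues: str = "K",  # assumption: only lysine reacts with the crosslinker
) -> bool:
    """Decide whether an aligned crosslink position would be reported (hypothetical helper)."""
    if not 0.0 <= min_sequence_identity <= 1.0:
        raise ValueError("min_sequence_identity must be between 0 and 1")
    if sequence_identity(aligned_peptide, aligned_structure) < min_sequence_identity:
        return False
    site_is_reactive = aligned_structure[crosslink_index] in reactive_residues
    return site_is_reactive or allow_site_mismatch


# Perfect match with a reactive site.
print(should_report("PEPKTIDE", "PEPKTIDE", 3))
# 7 of 8 residues match (0.875), but the structure has R at the crosslink site.
print(should_report("PEPKTIDE", "PEPRTIDE", 3))
print(should_report("PEPKTIDE", "PEPRTIDE", 3, allow_site_mismatch=True))
```

With the default threshold of 0.8, the second call is rejected only because of the site mismatch, which is the case that allow_site_mismatch=True opts back in.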
- pyXLMS.exporter.to_xifdr(csms: List[Dict[str, Any]], filename: str | None) → pd.DataFrame
Exports a list of crosslink-spectrum-matches to xiFDR format.
Exports a list of crosslink-spectrum-matches to xiFDR format. The tool xiFDR is accessible via the link rappsilberlab.org/software/xifdr. Requires that the alpha_proteins, beta_proteins, alpha_proteins_peptide_positions, beta_proteins_peptide_positions, alpha_decoy, beta_decoy, charge and score fields are set for all crosslink-spectrum-matches.
- Parameters:
csms (list of dict of str, any) – A list of crosslink-spectrum-matches.
filename (str, or None) – If not None, the exported data will be written to a file with the specified filename.
- Returns:
A pandas DataFrame containing crosslink-spectrum-matches in xiFDR format.
- Return type:
pd.DataFrame
- Raises:
TypeError – If a wrong data type is provided.
TypeError – If ‘csms’ parameter contains elements of mixed data type.
ValueError – If the provided ‘csms’ parameter contains no elements.
RuntimeError – If not all of the required information is present in the input data.
Examples
>>> from pyXLMS.exporter import to_xifdr
>>> from pyXLMS.parser import read
>>> pr = read("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_CSMs.xlsx", engine="MS Annika", crosslinker="DSS")
>>> csms = pr["crosslink-spectrum-matches"]
>>> to_xifdr(csms, filename="msannika_xiFDR.csv")
                                       run   scan          peptide1  ...  peptide position 1  peptide position 2   score
  0  XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw   2257            GQKNSR  ...                 777                 777  119.83
  1  XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw   2448            GQKNSR  ...                 777                 693   13.91
  2  XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw   2561             SDKNR  ...                 864                 864  114.43
  3  XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw   2719            DKQSGK  ...                 676                 676  200.98
  4  XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw   2792            DKQSGK  ...                 676                  45   94.47
 ..                                    ...    ...               ...  ...                 ...                 ...     ...
821  XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw  23297     MDGTEELLVKLNR  ...                 387                 387  286.05
822  XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw  23454  KIECFDSVEISGVEDR  ...                 575                 682  376.15
823  XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw  23581    SSFEKNPIDFLEAK  ...                1176                1176  412.44
824  XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw  23683    SSFEKNPIDFLEAK  ...                1176                1176  437.10
825  XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw  27087    MEDESKLHKFKDFK  ...                  99                1176   15.89
[826 rows x 14 columns]

>>> from pyXLMS.exporter import to_xifdr
>>> from pyXLMS.parser import read
>>> pr = read("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_CSMs.xlsx", engine="MS Annika", crosslinker="DSS")
>>> csms = pr["crosslink-spectrum-matches"]
>>> df = to_xifdr(csms, filename=None)
- pyXLMS.exporter.to_xinet(crosslinks: List[Dict[str, Any]], filename: str | None) → pd.DataFrame
Exports a list of crosslinks to xiNET format.
Exports a list of crosslinks to xiNET format. The tool xiNET is accessible via the link crosslinkviewer.org. Requires that the alpha_proteins, beta_proteins, alpha_proteins_crosslink_positions and beta_proteins_crosslink_positions fields are set for all crosslinks.
- Parameters:
crosslinks (list of dict of str, any) – A list of crosslinks.
filename (str, or None) – If not None, the exported data will be written to a file with the specified filename.
- Returns:
A pandas DataFrame containing crosslinks in xiNET format.
- Return type:
pd.DataFrame
- Raises:
TypeError – If a wrong data type is provided.
TypeError – If ‘crosslinks’ parameter contains elements of mixed data type.
ValueError – If the provided ‘crosslinks’ parameter contains no elements.
RuntimeError – If not all of the required information is present in the input data.
Notes
The optional Score column in the xiNET table will only be available if all crosslinks have assigned scores.
Examples
>>> from pyXLMS.exporter import to_xinet
>>> from pyXLMS.parser import read
>>> from pyXLMS.transform import targets_only
>>> from pyXLMS.transform import filter_proteins
>>> pr = read("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_Crosslinks.xlsx", engine="MS Annika", crosslinker="DSS")
>>> crosslinks = targets_only(pr)["crosslinks"]
>>> cas9 = filter_proteins(crosslinks, proteins=["Cas9"])["Both"]
>>> to_xinet(cas9, filename="crosslinks_xiNET.csv")
     Protein1  PepPos1           PepSeq1  LinkPos1  Protein2  PepPos2         PepSeq2  LinkPos2   Score   Id
  0      Cas9      777            GQKNSR         3      Cas9      777          GQKNSR         3  119.83    1
  1      Cas9      864             SDKNR         3      Cas9      864           SDKNR         3  114.43    2
  2      Cas9      676            DKQSGK         2      Cas9      676          DKQSGK         2  200.98    3
  3      Cas9      676            DKQSGK         2      Cas9       45           HSIKK         4   94.47    4
  4      Cas9       31             VPSKK         4      Cas9       31           VPSKK         4  110.48    5
 ..       ...      ...               ...       ...       ...      ...             ...       ...     ...  ...
248      Cas9      387     MDGTEELLVKLNR        10      Cas9      387   MDGTEELLVKLNR        10  305.63  249
249      Cas9      682    TILDFLKSDGFANR         7      Cas9      947       YDENDKLIR         6  110.46  250
250      Cas9      788    IEEGIKELGSQILK         6      Cas9     1176  SSFEKNPIDFLEAK         5  288.36  251
251      Cas9      575  KIECFDSVEISGVEDR         1      Cas9      682  TILDFLKSDGFANR         7  376.15  252
252      Cas9     1176    SSFEKNPIDFLEAK         5      Cas9     1176  SSFEKNPIDFLEAK         5  437.10  253
[253 rows x 10 columns]

>>> from pyXLMS.exporter import to_xinet
>>> from pyXLMS.parser import read
>>> from pyXLMS.transform import targets_only
>>> from pyXLMS.transform import filter_proteins
>>> pr = read("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_Crosslinks.xlsx", engine="MS Annika", crosslinker="DSS")
>>> crosslinks = targets_only(pr)["crosslinks"]
>>> cas9 = filter_proteins(crosslinks, proteins=["Cas9"])["Both"]
>>> df = to_xinet(cas9, filename=None)
- pyXLMS.exporter.to_xiview(crosslinks: List[Dict[str, Any]], filename: str | None, minimal: bool = True) → pd.DataFrame
Exports a list of crosslinks to xiVIEW format.
Exports a list of crosslinks to xiVIEW format. The tool xiVIEW is accessible via the link xiview.org/. Requires that the alpha_proteins, beta_proteins, alpha_proteins_crosslink_positions and beta_proteins_crosslink_positions fields are set for all crosslinks.
- Parameters:
crosslinks (list of dict of str, any) – A list of crosslinks.
filename (str, or None) – If not None, the exported data will be written to a file with the specified filename.
minimal (bool, default = True) – Which xiVIEW format to return: if minimal = True, the minimal xiVIEW format is returned. Otherwise the “CSV without peak lists” format is returned (internally this just calls exporter.to_xinet()). For more information on the xiVIEW formats please refer to the xiVIEW specification.
- Returns:
A pandas DataFrame containing crosslinks in xiVIEW format.
- Return type:
pd.DataFrame
- Raises:
TypeError – If a wrong data type is provided.
TypeError – If ‘crosslinks’ parameter contains elements of mixed data type.
ValueError – If the provided ‘crosslinks’ parameter contains no elements.
RuntimeError – If not all of the required information is present in the input data.
Notes
The optional Score column in the xiVIEW table will only be available if all crosslinks have assigned scores; the optional Decoy* columns will only be available if all crosslinks have assigned target and decoy labels.
Examples
>>> from pyXLMS.exporter import to_xiview
>>> from pyXLMS.parser import read
>>> from pyXLMS.transform import targets_only
>>> from pyXLMS.transform import filter_proteins
>>> pr = read("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_Crosslinks.xlsx", engine="MS Annika", crosslinker="DSS")
>>> crosslinks = targets_only(pr)["crosslinks"]
>>> cas9 = filter_proteins(crosslinks, proteins=["Cas9"])["Both"]
>>> to_xiview(cas9, filename="crosslinks_xiVIEW.csv")
     AbsPos1  AbsPos2  Protein1  Protein2  Decoy1  Decoy2   Score
  0      779      779      Cas9      Cas9   FALSE   FALSE  119.83
  1      866      866      Cas9      Cas9   FALSE   FALSE  114.43
  2      677      677      Cas9      Cas9   FALSE   FALSE  200.98
  3      677       48      Cas9      Cas9   FALSE   FALSE   94.47
  4       34       34      Cas9      Cas9   FALSE   FALSE  110.48
 ..      ...      ...       ...       ...     ...     ...     ...
248      396      396      Cas9      Cas9   FALSE   FALSE  305.63
249      688      952      Cas9      Cas9   FALSE   FALSE  110.46
250      793     1180      Cas9      Cas9   FALSE   FALSE  288.36
251      575      688      Cas9      Cas9   FALSE   FALSE  376.15
252     1180     1180      Cas9      Cas9   FALSE   FALSE  437.10
[253 rows x 7 columns]

>>> from pyXLMS.exporter import to_xiview
>>> from pyXLMS.parser import read
>>> from pyXLMS.transform import targets_only
>>> from pyXLMS.transform import filter_proteins
>>> pr = read("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_Crosslinks.xlsx", engine="MS Annika", crosslinker="DSS")
>>> crosslinks = targets_only(pr)["crosslinks"]
>>> cas9 = filter_proteins(crosslinks, proteins=["Cas9"])["Both"]
>>> df = to_xiview(cas9, filename=None)

>>> from pyXLMS.exporter import to_xiview
>>> from pyXLMS.parser import read
>>> from pyXLMS.transform import targets_only
>>> from pyXLMS.transform import filter_proteins
>>> pr = read("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_Crosslinks.xlsx", engine="MS Annika", crosslinker="DSS")
>>> crosslinks = targets_only(pr)["crosslinks"]
>>> cas9 = filter_proteins(crosslinks, proteins=["Cas9"])["Both"]
>>> to_xiview(cas9, filename="crosslinks_xiVIEW.csv", minimal=False)
     Protein1  PepPos1           PepSeq1  LinkPos1  Protein2  PepPos2         PepSeq2  LinkPos2   Score   Id
  0      Cas9      777            GQKNSR         3      Cas9      777          GQKNSR         3  119.83    1
  1      Cas9      864             SDKNR         3      Cas9      864           SDKNR         3  114.43    2
  2      Cas9      676            DKQSGK         2      Cas9      676          DKQSGK         2  200.98    3
  3      Cas9      676            DKQSGK         2      Cas9       45           HSIKK         4   94.47    4
  4      Cas9       31             VPSKK         4      Cas9       31           VPSKK         4  110.48    5
 ..       ...      ...               ...       ...       ...      ...             ...       ...     ...  ...
248      Cas9      387     MDGTEELLVKLNR        10      Cas9      387   MDGTEELLVKLNR        10  305.63  249
249      Cas9      682    TILDFLKSDGFANR         7      Cas9      947       YDENDKLIR         6  110.46  250
250      Cas9      788    IEEGIKELGSQILK         6      Cas9     1176  SSFEKNPIDFLEAK         5  288.36  251
251      Cas9      575  KIECFDSVEISGVEDR         1      Cas9      682  TILDFLKSDGFANR         7  376.15  252
252      Cas9     1176    SSFEKNPIDFLEAK         5      Cas9     1176  SSFEKNPIDFLEAK         5  437.10  253
[253 rows x 10 columns]
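Comparing the minimal output (AbsPos1, AbsPos2) with the minimal=False output (PepPos, LinkPos) for the same rows suggests that the absolute position equals the peptide position plus the link position minus one. The sketch below merely checks this arithmetic against values copied from the example outputs above; `absolute_position` is a hypothetical helper, and the relation is an observation from these examples rather than a documented guarantee.

```python
def absolute_position(peptide_position: int, link_position: int) -> int:
    """Protein-level position from peptide start and 1-based link position (observed relation)."""
    return peptide_position + link_position - 1


# ((PepPos1, LinkPos1, PepPos2, LinkPos2), (AbsPos1, AbsPos2)) copied from rows
# 0, 3, 249 and 251 of the two example outputs above.
rows = [
    ((777, 3, 777, 3), (779, 779)),
    ((676, 2, 45, 4), (677, 48)),
    ((682, 7, 947, 6), (688, 952)),
    ((575, 1, 682, 7), (575, 688)),
]
for (pep1, link1, pep2, link2), expected in rows:
    computed = (absolute_position(pep1, link1), absolute_position(pep2, link2))
    print(computed, computed == expected)
```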
- pyXLMS.exporter.to_xlinkdb(crosslinks: List[Dict[str, Any]], filename: str | None) → pd.DataFrame
Exports a list of crosslinks to XlinkDB format.
Exports a list of crosslinks to XlinkDB format. The tool XlinkDB is accessible via the link xlinkdb.gs.washington.edu/xlinkdb. Requires that the alpha_proteins and beta_proteins fields are set for all crosslinks.
- Parameters:
crosslinks (list of dict of str, any) – A list of crosslinks.
filename (str, or None) – If not None, the exported data will be written to a file with the specified filename. The filename should not contain a file extension and should consist only of alphanumeric characters (a-z, A-Z, 0-9).
- Returns:
A pandas DataFrame containing crosslinks in XlinkDB format.
- Return type:
pd.DataFrame
- Raises:
TypeError – If a wrong data type is provided.
TypeError – If ‘crosslinks’ parameter contains elements of mixed data type.
ValueError – If the filename contains any non-alpha-numeric characters.
ValueError – If the provided ‘crosslinks’ parameter contains no elements.
RuntimeError – If not all of the required information is present in the input data.
Notes
The XlinkDB input format requires a column with probabilities that the crosslinks are correct. Since that is not available from most crosslink search engines, this is simply set to a constant 1.
Examples
>>> from pyXLMS.exporter import to_xlinkdb
>>> from pyXLMS.parser import read
>>> pr = read("data/xi/1perc_xl_boost_Links_xiFDR2.2.1.csv", engine="xiSearch/xiFDR", crosslinker="DSS")
>>> crosslinks = pr["crosslinks"]
>>> to_xlinkdb(crosslinks, filename="crosslinksForXlinkDB")
               Peptide A  Protein A  Labeled Position A      Peptide B  Protein B  Labeled Position B  Probability
  0          VVDELVKVMGR       Cas9                   6    VVDELVKVMGR       Cas9                   6            1
  1  MLASAGELQKGNELALPSK       Cas9                   9    VVDELVKVMGR       Cas9                   6            1
  2        MDGTEELLVKLNR       Cas9                   9  MDGTEELLVKLNR       Cas9                   9            1
  3         MTNFDKNLPNEK       Cas9                   5       SKLVSDFR       Cas9                   1            1
  4             DFQFYKVR       Cas9                   5    MIAKSEQEIGK       Cas9                   3            1
 ..                  ...        ...                 ...            ...        ...                 ...          ...
222        LPKYSLFELENGR       Cas9                   2          SDKNR       Cas9                   2            1
223               DKQSGK       Cas9                   1         DKQSGK       Cas9                   1            1
224               AGFIKR       Cas9                   4   SDNVPSEEVVKK       Cas9                  10            1
225                EKIEK       Cas9                   1          KVTVK       Cas9                   0            1
226                LSKSR       Cas9                   2          LSKSR       Cas9                   2            1
[227 rows x 7 columns]

>>> from pyXLMS.exporter import to_xlinkdb
>>> from pyXLMS.parser import read
>>> pr = read("data/xi/1perc_xl_boost_Links_xiFDR2.2.1.csv", engine="xiSearch/xiFDR", crosslinker="DSS")
>>> crosslinks = pr["crosslinks"]
>>> df = to_xlinkdb(crosslinks, filename=None)
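The filename restriction for XlinkDB (no file extension, alphanumeric characters only) can be checked up front. The following is a minimal sketch, not the exporter's own validation: `check_xlinkdb_filename` is a hypothetical helper, and "alphanumeric" is assumed to mean ASCII letters and digits only.

```python
import re

# Assumption: only ASCII letters and digits are allowed, so no dots, underscores or paths.
_ALPHANUMERIC = re.compile(r"[A-Za-z0-9]+")


def check_xlinkdb_filename(filename: str) -> str:
    """Return the filename unchanged if it is purely alphanumeric, else raise ValueError."""
    if _ALPHANUMERIC.fullmatch(filename) is None:
        raise ValueError(f"filename must be alphanumeric without extension: {filename!r}")
    return filename


print(check_xlinkdb_filename("crosslinksForXlinkDB"))
try:
    check_xlinkdb_filename("crosslinks.csv")
except ValueError as error:
    print(error)
```

A regular expression is used here instead of `str.isalnum()` because the latter also accepts non-ASCII letters and digits.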
- pyXLMS.exporter.to_xlmstools(crosslinks: List[Dict[str, Any]], pdb_file: str | BinaryIO, gap_open: int | float = -10.0, gap_extension: int | float = -1.0, min_sequence_identity: float = 0.8, allow_site_mismatch: bool = False, ignore_chains: List[str] = [], filename_prefix: str | None = None) → Dict[str, Any]
Exports a list of crosslinks to xlms-tools format.
Exports a list of crosslinks to xlms-tools format for protein structure analysis. The Python package xlms-tools is available from gitlab.com/topf-lab/xlms-tools. This exporter performs basic local sequence alignment to align crosslinked peptides to a protein structure in PDB format. Gap open and gap extension penalties can be chosen, as well as a sequence identity threshold that must be satisfied for a match to be reported. Additionally, the alignment is checked to determine whether the supposedly crosslinked residue can be modified with a crosslinker in the protein structure: because of alignment shifts, amino acids might change, and a crosslink might be reported at a position that is not able to react with the crosslinker. Optionally, these positions can still be reported.
- Parameters:
crosslinks (list of dict of str, any) – A list of crosslinks.
pdb_file (str, or file stream) – The name/path of the PDB file or a file-like object/stream. If a string is provided but no file is found locally, it’s assumed to be an identifier and the file is fetched from the PDB.
gap_open (int, or float, default = -10.0) – Gap open penalty for sequence alignment.
gap_extension (int, or float, default = -1.0) – Gap extension penalty for sequence alignment.
min_sequence_identity (float, default = 0.8) – Minimum sequence identity to consider an aligned crosslinked peptide a match with its corresponding position in the protein structure. Should be given as a fraction between 0 and 1, e.g. the default of 0.8 corresponds to a minimum of 80% sequence identity.
allow_site_mismatch (bool, default = False) – Whether the position should still be reported if the crosslink position after alignment is not a reactive amino acid in the protein structure. By default such cases are not reported.
ignore_chains (list of str, default = empty list) – A list of chains to ignore in the protein structure.
filename_prefix (str, or None, default = None) – If not None, the exported data will be written to files with the specified filename prefix. The full list of written files can be accessed via the returned dictionary.
- Returns:
Returns a dictionary with key "xlms-tools" containing the formatted text for xlms-tools, with key "xlms-tools DataFrame" containing the information from "xlms-tools" but as a pandas DataFrame, with key "Number of mapped crosslinks" containing the total number of mapped crosslinks, with key "Mapping" containing a string that logs how crosslinks were mapped to the protein structure, with key "Parsed PDB sequence" containing the protein sequence that was parsed from the PDB file, with key "Parsed PDB chains" containing the parsed chains from the PDB file, with key "Parsed PDB residue numbers" containing the parsed residue numbers from the PDB file, and with key "Exported files" containing a list of filenames of all files that were written to disk.
- Return type:
dict of str, any
- Raises:
TypeError – If a wrong data type is provided.
TypeError – If data contains elements of mixed data type.
ValueError – If parameter min_sequence_identity is out of bounds.
ValueError – If the provided data contains no elements.
Notes
Internally this exporter just calls exporter.to_pyxlinkviewer() and rewrites some of the files, since the two tools share the same input file structure.
Examples
>>> from pyXLMS.exporter import to_xlmstools
>>> from pyXLMS.parser import read_custom
>>> pr = read_custom("data/_test/exporter/xlms-tools/unique_links_all_pyxlms.csv")
>>> crosslinks = pr["crosslinks"]
>>> xlmstools_result = to_xlmstools(crosslinks, pdb_file="6YHU", filename_prefix="6YHU")
>>> xlmstools_output_file_str = xlmstools_result["xlms-tools"]
>>> xlmstools_dataframe = xlmstools_result["xlms-tools DataFrame"]
>>> nr_mapped_crosslinks = xlmstools_result["Number of mapped crosslinks"]
>>> crosslink_mapping = xlmstools_result["Mapping"]
>>> parsed_pdb_sequence = xlmstools_result["Parsed PDB sequence"]
>>> parsed_pdb_chains = xlmstools_result["Parsed PDB chains"]
>>> parsed_pdb_residue_numbers = xlmstools_result["Parsed PDB residue numbers"]
>>> exported_files = xlmstools_result["Exported files"]
- pyXLMS.exporter.to_xmas(crosslinks: List[Dict[str, Any]], filename: str | None) → pd.DataFrame
Exports a list of crosslinks to XMAS format.
Exports a list of crosslinks to XMAS format for visualization in ChimeraX. The tool XMAS is available from github.com/ScheltemaLab/ChimeraX_XMAS_bundle.
- Parameters:
crosslinks (list of dict of str, any) – A list of crosslinks.
filename (str, or None) – If not None, the exported data will be written to a file with the specified filename.
- Returns:
A pandas DataFrame containing crosslinks in XMAS format.
- Return type:
pd.DataFrame
- Raises:
TypeError – If a wrong data type is provided.
TypeError – If ‘crosslinks’ parameter contains elements of mixed data type.
ValueError – If the provided ‘crosslinks’ parameter contains no elements.
Examples
>>> from pyXLMS.exporter import to_xmas
>>> from pyXLMS.data import create_crosslink_min
>>> xl1 = create_crosslink_min("KPEPTIDE", 1, "PKEPTIDE", 2)
>>> xl2 = create_crosslink_min("PEKPTIDE", 3, "PEPKTIDE", 4)
>>> crosslinks = [xl1, xl2]
>>> to_xmas(crosslinks, filename="crosslinks_xmas.xlsx")
   Sequence A  Sequence B
0  [K]PEPTIDE  P[K]EPTIDE
1  PE[K]PTIDE  PEP[K]TIDE

>>> from pyXLMS.exporter import to_xmas
>>> from pyXLMS.data import create_crosslink_min
>>> xl1 = create_crosslink_min("KPEPTIDE", 1, "PKEPTIDE", 2)
>>> xl2 = create_crosslink_min("PEKPTIDE", 3, "PEPKTIDE", 4)
>>> crosslinks = [xl1, xl2]
>>> to_xmas(crosslinks, filename=None)
   Sequence A  Sequence B
0  [K]PEPTIDE  P[K]EPTIDE
1  PE[K]PTIDE  PEP[K]TIDE