pyXLMS.exporter package

Submodules

pyXLMS.exporter.to_impxfdr module

pyXLMS.exporter.to_impxfdr.to_impxfdr(
data: List[Dict[str, Any]],
filename: str | None,
targets_only: bool = True,
) → DataFrame

Exports a list of crosslinks or crosslink-spectrum-matches to IMP-X-FDR format.

Exports a list of crosslinks or crosslink-spectrum-matches to IMP-X-FDR format for benchmarking purposes. The tool IMP-X-FDR is available from github.com/vbc-proteomics-org/imp-x-fdr. We recommend using version 1.1.0 and selecting “MS Annika” as the input file format for the exported file. A slightly modified version is available from github.com/hgb-bin-proteomics/MSAnnika_NC_Results. This version contains a few bug fixes and was used for the MS Annika 2.0 and MS Annika 3.0 publications. Requires that the alpha_proteins, beta_proteins, alpha_proteins_crosslink_positions and beta_proteins_crosslink_positions fields are set for all crosslinks or crosslink-spectrum-matches.

Parameters:
  • data (list of dict of str, any) – A list of crosslinks or crosslink-spectrum-matches.

  • filename (str, or None, default = None) – If not None, the exported data will be written to a file with the specified filename. The filename should end in “.xlsx” as the file is exported to Microsoft Excel file format.

  • targets_only (bool, default = True) – Whether or not only target crosslinks or crosslink-spectrum-matches should be exported. For benchmarking purposes this is usually the case. If the crosslinks or crosslink-spectrum-matches do not contain target-decoy labels this should be set to False.

Returns:

A pandas DataFrame containing crosslinks or crosslink-spectrum-matches in IMP-X-FDR format.

Return type:

pd.DataFrame

Raises:
  • TypeError – If a wrong data type is provided.

  • TypeError – If data contains elements of mixed data type.

  • ValueError – If the provided data contains no elements or if none of the data has target-decoy labels and parameter ‘targets_only’ is set to True.

  • RuntimeError – If not all of the required information is present in the input data.
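
The required fields listed above can be checked before exporting, so that incomplete input is found before a RuntimeError is raised. The following is a minimal sketch, assuming that crosslinks and crosslink-spectrum-matches are plain dictionaries keyed by those field names and that a field counts as set when it is present and not None. The helper find_missing_fields is hypothetical and not part of pyXLMS.

```python
from typing import Any, Dict, List, Sequence

# Field names taken from the description of to_impxfdr above.
REQUIRED_FIELDS = (
    "alpha_proteins",
    "beta_proteins",
    "alpha_proteins_crosslink_positions",
    "beta_proteins_crosslink_positions",
)


def find_missing_fields(
    data: List[Dict[str, Any]], required: Sequence[str] = REQUIRED_FIELDS
) -> Dict[int, List[str]]:
    """Map the index of each incomplete item to the names of its missing fields.

    Assumption: a field is "set" when the key exists and its value is not None.
    """
    missing: Dict[int, List[str]] = {}
    for index, item in enumerate(data):
        absent = [field for field in required if item.get(field) is None]
        if absent:
            missing[index] = absent
    return missing


# Two hand-written items for illustration (not real search engine output).
complete = {
    "alpha_proteins": ["Cas9"],
    "beta_proteins": ["Cas9"],
    "alpha_proteins_crosslink_positions": [753],
    "beta_proteins_crosslink_positions": [753],
}
incomplete = {"alpha_proteins": ["Cas9"], "beta_proteins": None}

report = find_missing_fields([complete, incomplete])
print(report)
```

An empty report suggests that the input carries the fields named above; it does not guarantee that the export succeeds.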

Examples

>>> from pyXLMS.exporter import to_impxfdr
>>> from pyXLMS.parser import read
>>> pr = read("data/xi/1perc_xl_boost_Links_xiFDR2.2.1.csv", engine="xiSearch/xiFDR", crosslinker="DSS")
>>> crosslinks = pr["crosslinks"]
>>> to_impxfdr(crosslinks, filename="crosslinks.xlsx")
    Crosslink Type             Sequence A  Position A Accession A In protein A  ... Position B  Accession B In protein B Best CSM Score  Decoy
0            Intra          VVDELV[K]VMGR           7        Cas9          753  ...          7         Cas9          753         40.679  False
1            Intra  MLASAGELQ[K]GNELALPSK          10        Cas9          753  ...          7         Cas9         1226         40.231  False
2            Intra        MDGTEELLV[K]LNR          10        Cas9          396  ...         10         Cas9          396         39.582  False
3            Intra         MTNFD[K]NLPNEK           6        Cas9          965  ...          2         Cas9          504         35.880  False
4            Intra             DFQFY[K]VR           6        Cas9          978  ...          4         Cas9         1028         35.281  False
..             ...                    ...         ...         ...          ...  ...        ...          ...          ...            ...    ...
220          Intra        LP[K]YSLFELENGR           3        Cas9          866  ...          3         Cas9         1204          9.877  False
221          Intra               D[K]QSGK           2        Cas9          677  ...          2         Cas9          677          9.702  False
222          Intra               AGFI[K]R           5        Cas9          922  ...         11         Cas9          881          9.666  False
223          Intra                E[K]IEK           2        Cas9          443  ...          1         Cas9          562          9.656  False
224          Intra                LS[K]SR           3        Cas9          222  ...          3         Cas9          222          9.619  False
[225 rows x 11 columns]
>>> from pyXLMS.exporter import to_impxfdr
>>> from pyXLMS.parser import read
>>> pr = read("data/xi/1perc_xl_boost_CSM_xiFDR2.2.1.csv", engine="xiSearch/xiFDR", crosslinker="DSS")
>>> csms = pr["crosslink-spectrum-matches"]
>>> to_impxfdr(csms, filename="csms.xlsx")
    Crosslink Type          Sequence A  Position A Accession A In protein A  ... Position B  Accession B In protein B Best CSM Score  Decoy
0            Intra  [K]IECFDSVEISGVEDR           1        Cas9          575  ...          1         Cas9          575         27.268  False
1            Intra       LVDSTD[K]ADLR           7        Cas9          152  ...         11         Cas9          881         26.437  False
2            Intra     GGLSELD[K]AGFIK           8        Cas9          917  ...          8         Cas9          917         26.134  False
3            Intra       LVDSTD[K]ADLR           7        Cas9          152  ...          7         Cas9          152         25.804  False
4            Intra       VVDELV[K]VMGR           7        Cas9          753  ...          7         Cas9          753         24.861  False
..             ...                 ...         ...         ...          ...  ...        ...          ...          ...            ...    ...
406          Intra          [K]GILQTVK           1        Cas9          739  ...          3         Cas9          222          6.977  False
407          Intra          QQLPE[K]YK           6        Cas9          350  ...          6         Cas9          350          6.919  False
408          Intra           ESILP[K]R           6        Cas9         1117  ...          7         Cas9         1035          6.853  False
409          Intra             LS[K]SR           3        Cas9          222  ...          2         Cas9          884          6.809  False
410          Intra     QIT[K]HVAQILDSR           4        Cas9          933  ...          6         Cas9          350          6.808  False
[411 rows x 11 columns]

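pyXLMS.exporter.to_msannika module

pyXLMS.exporter.to_msannika.get_msannika_crosslink_sequence(
peptide: str,
crosslink_position: int,
) → str
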
Returns the crosslinked peptide sequence in MS Annika format.

Returns the crosslinked peptide sequence in MS Annika format, which is the peptide amino acid sequence with the crosslinked residue in square brackets (see examples).

Parameters:
  • peptide (str) – The (unmodified) amino acid sequence of the peptide.

  • crosslink_position (int) – Position of the crosslinker in the peptide sequence (1-based).

Returns:

The crosslinked peptide sequence in MS Annika format.

Return type:

str

Raises:

ValueError – If the crosslink position is outside the peptide’s length.
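
The bracket notation described above can be illustrated in a few lines. This is a minimal sketch of the idea, not the pyXLMS implementation; the function name bracket_crosslinked_residue is hypothetical.

```python
def bracket_crosslinked_residue(peptide: str, crosslink_position: int) -> str:
    """Wrap the residue at the 1-based crosslink position in square brackets."""
    if not 1 <= crosslink_position <= len(peptide):
        raise ValueError(
            f"Crosslink position {crosslink_position} is outside of peptide {peptide!r}."
        )
    index = crosslink_position - 1  # convert the 1-based position to a 0-based index
    return f"{peptide[:index]}[{peptide[index]}]{peptide[index + 1:]}"


print(bracket_crosslinked_residue("PEPKTIDE", 4))
```

Converting the position to a 0-based index once keeps the slicing readable and makes the bounds check the only place where the 1-based convention matters.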

Examples

>>> from pyXLMS.exporter import get_msannika_crosslink_sequence
>>> get_msannika_crosslink_sequence("PEPKTIDE", 4)
'PEP[K]TIDE'
>>> from pyXLMS.exporter import get_msannika_crosslink_sequence
>>> get_msannika_crosslink_sequence("KPEPTIDE", 1)
'[K]PEPTIDE'
>>> from pyXLMS.exporter import get_msannika_crosslink_sequence
>>> get_msannika_crosslink_sequence("PEPTIDEK", 8)
'PEPTIDE[K]'

pyXLMS.exporter.to_msannika.to_msannika(
data: List[Dict[str, Any]],
filename: str | None = None,
format: Literal['csv', 'tsv', 'xlsx'] = 'csv',
) → DataFrame

Exports a list of crosslinks or crosslink-spectrum-matches to MS Annika format.

Exports a list of crosslinks or crosslink-spectrum-matches to MS Annika format. This might be useful for tools that support MS Annika input but are not supported by pyXLMS (yet).

Parameters:
  • data (list of dict of str, any) – A list of crosslinks or crosslink-spectrum-matches.

  • filename (str, or None, default = None) – If not None, the exported data will be written to a file with the specified filename.

  • format (str, one of "csv", "tsv", or "xlsx", default = "csv") – File format of the exported file if filename is not None.

Returns:

A pandas DataFrame containing crosslinks or crosslink-spectrum-matches in MS Annika format.

Return type:

pd.DataFrame

Raises:
  • TypeError – If a wrong data type is provided.

  • TypeError – If data contains elements of mixed data type.

  • TypeError – If parameter format is not one of ‘csv’, ‘tsv’ or ‘xlsx’.

  • ValueError – If the provided data contains no elements.

Warning

The MS Annika exporter will not check if all necessary information is available for the exported crosslinks or crosslink-spectrum-matches. If a value is not available, it will be denoted as a missing value in the dataframe and exported file. Please make sure all necessary information is available before using the exported file with another tool! Please also note that modifications are not exported. For downstream analysis of modifications, please refer to transform.to_proforma() or transform.to_dataframe()!
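
Since the exporter does not check for missing values, it can help to inspect the exported table before passing it on. The sketch below works on a list of row dictionaries, which is the structure that df.to_dict("records") produces for a pandas DataFrame. It assumes that missing values appear as None or NaN. The helper names are hypothetical and not part of pyXLMS.

```python
import math
from typing import Any, Dict, List


def is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def columns_with_missing_values(rows: List[Dict[str, Any]]) -> List[str]:
    """Return the columns that hold at least one missing value, in first-seen order."""
    flagged: List[str] = []
    for row in rows:
        for column, value in row.items():
            if is_missing(value) and column not in flagged:
                flagged.append(column)
    return flagged


# Hand-written rows shaped like an exported crosslink table (illustration only).
rows = [
    {"Sequence A": "[K]PEPTIDE", "Position A": 1, "Accession A": None, "Decoy": None},
    {"Sequence A": "PE[K]PTIDE", "Position A": 3, "Accession A": None, "Decoy": float("nan")},
]
print(columns_with_missing_values(rows))
```

Whether a flagged column is a problem depends on which columns the downstream tool actually requires.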

Examples

>>> from pyXLMS.exporter import to_msannika
>>> from pyXLMS.data import create_crosslink_min
>>> xl1 = create_crosslink_min("KPEPTIDE", 1, "PKEPTIDE", 2)
>>> xl2 = create_crosslink_min("PEKPTIDE", 3, "PEPKTIDE", 4)
>>> crosslinks = [xl1, xl2]
>>> to_msannika(crosslinks)
  Crosslink Type  Sequence A  Position A Accession A In protein A  Sequence B  Position B Accession B In protein B Best CSM Score Decoy
0          Inter  [K]PEPTIDE           1        None         None  P[K]EPTIDE           2        None         None           None  None
1          Inter  PE[K]PTIDE           3        None         None  PEP[K]TIDE           4        None         None           None  None
>>> from pyXLMS.exporter import to_msannika
>>> from pyXLMS.data import create_crosslink_min
>>> xl1 = create_crosslink_min("KPEPTIDE", 1, "PKEPTIDE", 2)
>>> xl2 = create_crosslink_min("PEKPTIDE", 3, "PEPKTIDE", 4)
>>> crosslinks = [xl1, xl2]
>>> df = to_msannika(crosslinks, filename = "crosslinks.csv", format = "csv")
>>> from pyXLMS.exporter import to_msannika
>>> from pyXLMS.data import create_csm_min
>>> csm1 = create_csm_min("KPEPTIDE", 1, "PKEPTIDE", 2, "RUN_1", 1)
>>> csm2 = create_csm_min("PEKPTIDE", 3, "PEPKTIDE", 4, "RUN_1", 2)
>>> csms = [csm1, csm2]
>>> to_msannika(csms)
            Sequence Crosslink Type Sequence A  Crosslinker Position A  ... First Scan Charge RT [min] Compensation Voltage
0  KPEPTIDE-PKEPTIDE          Inter   KPEPTIDE                       1  ...          1   None     None                 None
1  PEKPTIDE-PEPKTIDE          Inter   PEKPTIDE                       3  ...          2   None     None                 None
[2 rows x 20 columns]
>>> from pyXLMS.exporter import to_msannika
>>> from pyXLMS.data import create_csm_min
>>> csm1 = create_csm_min("KPEPTIDE", 1, "PKEPTIDE", 2, "RUN_1", 1)
>>> csm2 = create_csm_min("PEKPTIDE", 3, "PEPKTIDE", 4, "RUN_1", 2)
>>> csms = [csm1, csm2]
>>> df = to_msannika(csms, filename = "csms.csv", format = "csv")

pyXLMS.exporter.to_pyxlinkviewer module

pyXLMS.exporter.to_pyxlinkviewer.to_pyxlinkviewer(
crosslinks: List[Dict[str, Any]],
pdb_file: str | BinaryIO,
gap_open: int | float = -10.0,
gap_extension: int | float = -1.0,
min_sequence_identity: float = 0.8,
allow_site_mismatch: bool = False,
ignore_chains: List[str] = [],
filename_prefix: str | None = None,
) → Dict[str, Any]

Exports a list of crosslinks to PyXlinkViewer format.

Exports a list of crosslinks to PyXlinkViewer format for visualization in PyMOL. The tool PyXlinkViewer is available from github.com/BobSchiffrin/PyXlinkViewer. This exporter performs a basic local sequence alignment to align crosslinked peptides to a protein structure in PDB format. Gap open and gap extension penalties can be chosen, as well as a sequence identity threshold that must be satisfied for a match to be reported. Additionally, the alignment is checked for whether the supposedly crosslinked residue can be modified by a crosslinker in the protein structure. Because amino acids can differ between the peptide and the aligned region of the structure, a crosslink may map to a position that is not able to react with the crosslinker. Optionally, these positions can still be reported.

Parameters:
  • crosslinks (list of dict of str, any) – A list of crosslinks.

  • pdb_file (str, or file stream) – The name/path of the PDB file or a file-like object/stream. If a string is provided but no file is found locally, it’s assumed to be an identifier and the file is fetched from the PDB.

  • gap_open (int, or float, default = -10.0) – Gap open penalty for sequence alignment.

  • gap_extension (int, or float, default = -1.0) – Gap extension penalty for sequence alignment.

  • min_sequence_identity (float, default = 0.8) – Minimum sequence identity to consider an aligned crosslinked peptide a match with its corresponding position in the protein structure. Should be given as a fraction between 0 and 1, e.g. the default of 0.8 corresponds to a minimum of 80% sequence identity.

  • allow_site_mismatch (bool, default = False) – Whether the position should still be reported if the crosslink position after alignment is not a reactive amino acid in the protein structure. By default such cases are not reported.

  • ignore_chains (list of str, default = empty list) – A list of chains to ignore in the protein structure.

  • filename_prefix (str, or None, default = None) – If not None, the exported data will be written to files with the specified filename prefix. The full list of written files can be accessed via the returned dictionary.

Returns:

A dictionary with the following keys:
  • PyXlinkViewer – The formatted text for PyXlinkViewer.

  • PyXlinkViewer DataFrame – The information from PyXlinkViewer, but as a pandas DataFrame.

  • Number of mapped crosslinks – The total number of mapped crosslinks.

  • Mapping – A string that logs how crosslinks were mapped to the protein structure.

  • Parsed PDB sequence – The protein sequence that was parsed from the PDB file.

  • Parsed PDB chains – The parsed chains from the PDB file.

  • Parsed PDB residue numbers – The parsed residue numbers from the PDB file.

  • Exported files – A list of filenames of all files that were written to disk.

Return type:

dict of str, any

Raises:
  • TypeError – If a wrong data type is provided.

  • TypeError – If data contains elements of mixed data type.

  • ValueError – If parameter min_sequence_identity is out of bounds.

  • ValueError – If the provided data contains no elements.
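
The interplay of min_sequence_identity and allow_site_mismatch can be illustrated with a deliberately simplified mapping routine. This sketch is not the pyXLMS implementation: it slides the peptide along the structure sequence without gaps (so the gap penalties play no role), defines sequence identity as the fraction of identical residues in the best window, and assumes lysine (K) as the only reactive residue. The function name map_crosslink_site and all of its parameters other than min_sequence_identity and allow_site_mismatch are hypothetical.

```python
from typing import Optional


def map_crosslink_site(
    structure_sequence: str,
    peptide: str,
    crosslink_position: int,
    min_sequence_identity: float = 0.8,
    allow_site_mismatch: bool = False,
    reactive_residues: str = "K",  # assumption: lysine-reactive crosslinker
) -> Optional[int]:
    """Return the 1-based position of the crosslinked residue in the structure, or None.

    Simplification: ungapped alignment, the peptide is compared to every window
    of the same length and the window with the most identical residues wins.
    """
    if not 0.0 <= min_sequence_identity <= 1.0:
        raise ValueError("min_sequence_identity must be between 0 and 1.")
    if not 1 <= crosslink_position <= len(peptide):
        raise ValueError("crosslink_position is outside of the peptide.")
    best_offset, best_matches = -1, -1
    for offset in range(len(structure_sequence) - len(peptide) + 1):
        window = structure_sequence[offset:offset + len(peptide)]
        matches = sum(a == b for a, b in zip(peptide, window))
        if matches > best_matches:
            best_offset, best_matches = offset, matches
    if best_offset < 0:
        return None  # the peptide is longer than the structure sequence
    if best_matches / len(peptide) < min_sequence_identity:
        return None  # best window does not reach the identity threshold
    site_index = best_offset + crosslink_position - 1
    site_is_reactive = structure_sequence[site_index] in reactive_residues
    if not site_is_reactive and not allow_site_mismatch:
        return None  # aligned residue cannot react with the crosslinker
    return site_index + 1


# Made-up sequences for illustration; in the second one the lysine is mutated to R.
structure = "MKTAYIAKQRQISFVK"
mutated = "MKTAYIARQRQISFVK"
print(map_crosslink_site(structure, "AYIAKQR", 5))
print(map_crosslink_site(mutated, "AYIAKQR", 5))
print(map_crosslink_site(mutated, "AYIAKQR", 5, allow_site_mismatch=True))
```

In the mutated sequence the peptide still aligns with six of seven identical residues, which passes the default threshold of 0.8, but the aligned residue is no longer a lysine. The position is therefore only reported when allow_site_mismatch is set to True.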

Examples

>>> from pyXLMS.exporter import to_pyxlinkviewer
>>> from pyXLMS.parser import read_custom
>>> pr = read_custom("data/_test/exporter/pyxlinkviewer/unique_links_all_pyxlms.csv")
>>> crosslinks = pr["crosslinks"]
>>> pyxlinkviewer_result = to_pyxlinkviewer(crosslinks, pdb_file="6YHU", filename_prefix="6YHU")
>>> pyxlinkviewer_output_file_str = pyxlinkviewer_result["PyXlinkViewer"]
>>> pyxlinkviewer_dataframe = pyxlinkviewer_result["PyXlinkViewer DataFrame"]
>>> nr_mapped_crosslinks = pyxlinkviewer_result["Number of mapped crosslinks"]
>>> crosslink_mapping = pyxlinkviewer_result["Mapping"]
>>> parsed_pdb_sequence = pyxlinkviewer_result["Parsed PDB sequence"]
>>> parsed_pdb_chains = pyxlinkviewer_result["Parsed PDB chains"]
>>> parsed_pdb_residue_numbers = pyxlinkviewer_result["Parsed PDB residue numbers"]
>>> exported_files = pyxlinkviewer_result["Exported files"]

pyXLMS.exporter.to_xifdr module

pyXLMS.exporter.to_xifdr.to_xifdr(
csms: List[Dict[str, Any]],
filename: str | None,
) → DataFrame

Exports a list of crosslink-spectrum-matches to xiFDR format.

Exports a list of crosslink-spectrum-matches to xiFDR format. The tool xiFDR is accessible via the link rappsilberlab.org/software/xifdr. Requires that alpha_proteins, beta_proteins, alpha_proteins_peptide_positions, beta_proteins_peptide_positions, alpha_decoy, beta_decoy, charge and score fields are set for all crosslink-spectrum-matches.

Parameters:
  • csms (list of dict of str, any) – A list of crosslink-spectrum-matches.

  • filename (str, or None) – If not None, the exported data will be written to a file with the specified filename.

Returns:

A pandas DataFrame containing crosslink-spectrum-matches in xiFDR format.

Return type:

pd.DataFrame

Raises:
  • TypeError – If a wrong data type is provided.

  • TypeError – If ‘csms’ parameter contains elements of mixed data type.

  • ValueError – If the provided ‘csms’ parameter contains no elements.

  • RuntimeError – If not all of the required information is present in the input data.

Examples

>>> from pyXLMS.exporter import to_xifdr
>>> from pyXLMS.parser import read
>>> pr = read("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_CSMs.xlsx", engine="MS Annika", crosslinker="DSS")
>>> csms = pr["crosslink-spectrum-matches"]
>>> to_xifdr(csms, filename="msannika_xiFDR.csv")
                                       run   scan          peptide1  ... peptide position 1  peptide position 2   score
0    XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw   2257            GQKNSR  ...                777                 777  119.83
1    XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw   2448            GQKNSR  ...                777                 693   13.91
2    XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw   2561             SDKNR  ...                864                 864  114.43
3    XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw   2719            DKQSGK  ...                676                 676  200.98
4    XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw   2792            DKQSGK  ...                676                  45   94.47
..                                     ...    ...               ...  ...                ...                 ...     ...
821  XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw  23297     MDGTEELLVKLNR  ...                387                 387  286.05
822  XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw  23454  KIECFDSVEISGVEDR  ...                575                 682  376.15
823  XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw  23581    SSFEKNPIDFLEAK  ...               1176                1176  412.44
824  XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw  23683    SSFEKNPIDFLEAK  ...               1176                1176  437.10
825  XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw  27087    MEDESKLHKFKDFK  ...                 99                1176   15.89
[826 rows x 14 columns]
>>> from pyXLMS.exporter import to_xifdr
>>> from pyXLMS.parser import read
>>> pr = read("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_CSMs.xlsx", engine="MS Annika", crosslinker="DSS")
>>> csms = pr["crosslink-spectrum-matches"]
>>> df = to_xifdr(csms, filename=None)

pyXLMS.exporter.to_xinet module

pyXLMS.exporter.to_xinet.to_xinet(
crosslinks: List[Dict[str, Any]],
filename: str | None,
) → DataFrame

Exports a list of crosslinks to xiNET format.

Exports a list of crosslinks to xiNET format. The tool xiNET is accessible via the link crosslinkviewer.org. Requires that alpha_proteins, beta_proteins, alpha_proteins_crosslink_positions and beta_proteins_crosslink_positions fields are set for all crosslinks.

Parameters:
  • crosslinks (list of dict of str, any) – A list of crosslinks.

  • filename (str, or None) – If not None, the exported data will be written to a file with the specified filename.

Returns:

A pandas DataFrame containing crosslinks in xiNET format.

Return type:

pd.DataFrame

Raises:
  • TypeError – If a wrong data type is provided.

  • TypeError – If ‘crosslinks’ parameter contains elements of mixed data type.

  • ValueError – If the provided ‘crosslinks’ parameter contains no elements.

  • RuntimeError – If not all of the required information is present in the input data.

Notes

The optional Score column in the xiNET table will only be available if all crosslinks have assigned scores.

Examples

>>> from pyXLMS.exporter import to_xinet
>>> from pyXLMS.parser import read
>>> from pyXLMS.transform import targets_only
>>> from pyXLMS.transform import filter_proteins
>>> pr = read("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_Crosslinks.xlsx", engine="MS Annika", crosslinker="DSS")
>>> crosslinks = targets_only(pr)["crosslinks"]
>>> cas9 = filter_proteins(crosslinks, proteins=["Cas9"])["Both"]
>>> to_xinet(cas9, filename="crosslinks_xiNET.csv")
    Protein1 PepPos1           PepSeq1  LinkPos1 Protein2 PepPos2         PepSeq2  LinkPos2   Score   Id
0       Cas9     777            GQKNSR         3     Cas9     777          GQKNSR         3  119.83    1
1       Cas9     864             SDKNR         3     Cas9     864           SDKNR         3  114.43    2
2       Cas9     676            DKQSGK         2     Cas9     676          DKQSGK         2  200.98    3
3       Cas9     676            DKQSGK         2     Cas9      45           HSIKK         4   94.47    4
4       Cas9      31             VPSKK         4     Cas9      31           VPSKK         4  110.48    5
..       ...     ...               ...       ...      ...     ...             ...       ...     ...  ...
248     Cas9     387     MDGTEELLVKLNR        10     Cas9     387   MDGTEELLVKLNR        10  305.63  249
249     Cas9     682    TILDFLKSDGFANR         7     Cas9     947       YDENDKLIR         6  110.46  250
250     Cas9     788    IEEGIKELGSQILK         6     Cas9    1176  SSFEKNPIDFLEAK         5  288.36  251
251     Cas9     575  KIECFDSVEISGVEDR         1     Cas9     682  TILDFLKSDGFANR         7  376.15  252
252     Cas9    1176    SSFEKNPIDFLEAK         5     Cas9    1176  SSFEKNPIDFLEAK         5  437.10  253
[253 rows x 10 columns]
>>> from pyXLMS.exporter import to_xinet
>>> from pyXLMS.parser import read
>>> from pyXLMS.transform import targets_only
>>> from pyXLMS.transform import filter_proteins
>>> pr = read("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_Crosslinks.xlsx", engine="MS Annika", crosslinker="DSS")
>>> crosslinks = targets_only(pr)["crosslinks"]
>>> cas9 = filter_proteins(crosslinks, proteins=["Cas9"])["Both"]
>>> df = to_xinet(cas9, filename=None)

pyXLMS.exporter.to_xiview module

pyXLMS.exporter.to_xiview.to_xiview(
crosslinks: List[Dict[str, Any]],
filename: str | None,
minimal: bool = True,
) → DataFrame

Exports a list of crosslinks to xiVIEW format.

Exports a list of crosslinks to xiVIEW format. The tool xiVIEW is accessible via the link xiview.org/. Requires that alpha_proteins, beta_proteins, alpha_proteins_crosslink_positions and beta_proteins_crosslink_positions fields are set for all crosslinks.

Parameters:
  • crosslinks (list of dict of str, any) – A list of crosslinks.

  • filename (str, or None) – If not None, the exported data will be written to a file with the specified filename.

  • minimal (bool, default = True) – Which xiVIEW format to return. If minimal = True, the minimal xiVIEW format is returned; otherwise the “CSV without peak lists” format is returned (internally this just calls exporter.to_xinet()). For more information on the xiVIEW formats please refer to the xiVIEW specification.

Returns:

A pandas DataFrame containing crosslinks in xiVIEW format.

Return type:

pd.DataFrame

Raises:
  • TypeError – If a wrong data type is provided.

  • TypeError – If ‘crosslinks’ parameter contains elements of mixed data type.

  • ValueError – If the provided ‘crosslinks’ parameter contains no elements.

  • RuntimeError – If not all of the required information is present in the input data.

Notes

The optional Score column in the xiVIEW table will only be available if all crosslinks have assigned scores. The optional Decoy* columns will only be available if all crosslinks have assigned target and decoy labels.
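
The all-or-nothing rule for the optional columns can be sketched as follows. The field names score, alpha_decoy and beta_decoy are an assumption borrowed from the fields named in the to_xifdr description, and the column names are taken from the example output in this section. The helper optional_xiview_columns is hypothetical and not part of pyXLMS.

```python
from typing import Any, Dict, List


def optional_xiview_columns(crosslinks: List[Dict[str, Any]]) -> List[str]:
    """List the optional columns that every crosslink can provide a value for.

    Assumption: a value is "assigned" when the key exists and is not None.
    Empty input is not handled here, the exporter rejects it with a ValueError.
    """
    has_decoy_labels = all(
        xl.get("alpha_decoy") is not None and xl.get("beta_decoy") is not None
        for xl in crosslinks
    )
    has_scores = all(xl.get("score") is not None for xl in crosslinks)
    columns: List[str] = []
    if has_decoy_labels:
        columns += ["Decoy1", "Decoy2"]
    if has_scores:
        columns.append("Score")
    return columns


# Hand-written crosslinks for illustration: the second one has no score.
scored = {"alpha_decoy": False, "beta_decoy": False, "score": 119.83}
unscored = {"alpha_decoy": False, "beta_decoy": False, "score": None}
print(optional_xiview_columns([scored, scored]))
print(optional_xiview_columns([scored, unscored]))
```

A single crosslink without a score is enough to drop the Score column for the whole table, which keeps every exported column free of partial data.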

Examples

>>> from pyXLMS.exporter import to_xiview
>>> from pyXLMS.parser import read
>>> from pyXLMS.transform import targets_only
>>> from pyXLMS.transform import filter_proteins
>>> pr = read("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_Crosslinks.xlsx", engine="MS Annika", crosslinker="DSS")
>>> crosslinks = targets_only(pr)["crosslinks"]
>>> cas9 = filter_proteins(crosslinks, proteins=["Cas9"])["Both"]
>>> to_xiview(cas9, filename="crosslinks_xiVIEW.csv")
    AbsPos1 AbsPos2 Protein1 Protein2 Decoy1 Decoy2   Score
0       779     779     Cas9     Cas9  FALSE  FALSE  119.83
1       866     866     Cas9     Cas9  FALSE  FALSE  114.43
2       677     677     Cas9     Cas9  FALSE  FALSE  200.98
3       677      48     Cas9     Cas9  FALSE  FALSE   94.47
4        34      34     Cas9     Cas9  FALSE  FALSE  110.48
..      ...     ...      ...      ...    ...    ...     ...
248     396     396     Cas9     Cas9  FALSE  FALSE  305.63
249     688     952     Cas9     Cas9  FALSE  FALSE  110.46
250     793    1180     Cas9     Cas9  FALSE  FALSE  288.36
251     575     688     Cas9     Cas9  FALSE  FALSE  376.15
252    1180    1180     Cas9     Cas9  FALSE  FALSE  437.10
[253 rows x 7 columns]
>>> from pyXLMS.exporter import to_xiview
>>> from pyXLMS.parser import read
>>> from pyXLMS.transform import targets_only
>>> from pyXLMS.transform import filter_proteins
>>> pr = read("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_Crosslinks.xlsx", engine="MS Annika", crosslinker="DSS")
>>> crosslinks = targets_only(pr)["crosslinks"]
>>> cas9 = filter_proteins(crosslinks, proteins=["Cas9"])["Both"]
>>> df = to_xiview(cas9, filename=None)
>>> from pyXLMS.exporter import to_xiview
>>> from pyXLMS.parser import read
>>> from pyXLMS.transform import targets_only
>>> from pyXLMS.transform import filter_proteins
>>> pr = read("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_Crosslinks.xlsx", engine="MS Annika", crosslinker="DSS")
>>> crosslinks = targets_only(pr)["crosslinks"]
>>> cas9 = filter_proteins(crosslinks, proteins=["Cas9"])["Both"]
>>> to_xiview(cas9, filename="crosslinks_xiVIEW.csv", minimal=False)
    Protein1 PepPos1           PepSeq1  LinkPos1 Protein2 PepPos2         PepSeq2  LinkPos2   Score   Id
0       Cas9     777            GQKNSR         3     Cas9     777          GQKNSR         3  119.83    1
1       Cas9     864             SDKNR         3     Cas9     864           SDKNR         3  114.43    2
2       Cas9     676            DKQSGK         2     Cas9     676          DKQSGK         2  200.98    3
3       Cas9     676            DKQSGK         2     Cas9      45           HSIKK         4   94.47    4
4       Cas9      31             VPSKK         4     Cas9      31           VPSKK         4  110.48    5
..       ...     ...               ...       ...      ...     ...             ...       ...     ...  ...
248     Cas9     387     MDGTEELLVKLNR        10     Cas9     387   MDGTEELLVKLNR        10  305.63  249
249     Cas9     682    TILDFLKSDGFANR         7     Cas9     947       YDENDKLIR         6  110.46  250
250     Cas9     788    IEEGIKELGSQILK         6     Cas9    1176  SSFEKNPIDFLEAK         5  288.36  251
251     Cas9     575  KIECFDSVEISGVEDR         1     Cas9     682  TILDFLKSDGFANR         7  376.15  252
252     Cas9    1176    SSFEKNPIDFLEAK         5     Cas9    1176  SSFEKNPIDFLEAK         5  437.10  253
[253 rows x 10 columns]
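
Comparing the minimal output with the “CSV without peak lists” output above suggests how the columns relate: AbsPos appears to equal PepPos + LinkPos - 1, with all positions being 1-based. This relation is inferred from the example rows only and is not stated by the documentation. The helper absolute_position is hypothetical.

```python
def absolute_position(peptide_position: int, link_position: int) -> int:
    """Residue number of the crosslinked residue in the protein (all values 1-based)."""
    return peptide_position + link_position - 1


# (PepPos1, LinkPos1, AbsPos1) copied from rows 0, 3 and 250 of the example tables.
example_rows = [(777, 3, 779), (676, 2, 677), (788, 6, 793)]
for pep_pos, link_pos, abs_pos in example_rows:
    print(pep_pos, link_pos, absolute_position(pep_pos, link_pos), abs_pos)
```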

pyXLMS.exporter.to_xlinkdb module

pyXLMS.exporter.to_xlinkdb.to_xlinkdb(
crosslinks: List[Dict[str, Any]],
filename: str | None,
) → DataFrame

Exports a list of crosslinks to XlinkDB format.

Exports a list of crosslinks to XlinkDB format. The tool XlinkDB is accessible via the link xlinkdb.gs.washington.edu/xlinkdb. Requires that alpha_proteins and beta_proteins fields are set for all crosslinks.

Parameters:
  • crosslinks (list of dict of str, any) – A list of crosslinks.

  • filename (str, or None) – If not None, the exported data will be written to a file with the specified filename. The filename should not contain a file extension and should consist only of alpha-numeric characters (a-z, A-Z, 0-9).

Returns:

A pandas DataFrame containing crosslinks in XlinkDB format.

Return type:

pd.DataFrame

Raises:
  • TypeError – If a wrong data type is provided.

  • TypeError – If ‘crosslinks’ parameter contains elements of mixed data type.

  • ValueError – If the filename contains any non-alpha-numeric characters.

  • ValueError – If the provided ‘crosslinks’ parameter contains no elements.

  • RuntimeError – If not all of the required information is present in the input data.
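
The filename restriction can be checked up front with a regular expression. This is a minimal sketch, assuming that only ASCII letters and digits are allowed and that an empty filename is rejected as well. The helper validate_xlinkdb_filename is hypothetical and not part of pyXLMS.

```python
import re

# Assumption: "alpha-numeric" means ASCII letters and digits only.
_ALPHANUMERIC = re.compile(r"[A-Za-z0-9]+")


def validate_xlinkdb_filename(filename: str) -> str:
    """Return the filename unchanged if it is purely alpha-numeric, else raise."""
    if not _ALPHANUMERIC.fullmatch(filename):
        raise ValueError(
            f"Filename {filename!r} must only contain alpha-numeric characters."
        )
    return filename


print(validate_xlinkdb_filename("crosslinksForXlinkDB"))
```

An explicit character class is used instead of str.isalnum() because the latter also accepts non-ASCII letters and digits.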

Notes

XlinkDB input format requires a column with probabilities that the crosslinks are correct. Since that is not available from most crosslink search engines, this is simply set to a constant 1.

Examples

>>> from pyXLMS.exporter import to_xlinkdb
>>> from pyXLMS.parser import read
>>> pr = read("data/xi/1perc_xl_boost_Links_xiFDR2.2.1.csv", engine="xiSearch/xiFDR", crosslinker="DSS")
>>> crosslinks = pr["crosslinks"]
>>> to_xlinkdb(crosslinks, filename="crosslinksForXlinkDB")
               Peptide A Protein A  Labeled Position A      Peptide B Protein B  Labeled Position B  Probability
0            VVDELVKVMGR      Cas9                   6    VVDELVKVMGR      Cas9                   6            1
1    MLASAGELQKGNELALPSK      Cas9                   9    VVDELVKVMGR      Cas9                   6            1
2          MDGTEELLVKLNR      Cas9                   9  MDGTEELLVKLNR      Cas9                   9            1
3           MTNFDKNLPNEK      Cas9                   5       SKLVSDFR      Cas9                   1            1
4               DFQFYKVR      Cas9                   5    MIAKSEQEIGK      Cas9                   3            1
..                   ...       ...                 ...            ...       ...                 ...          ...
222        LPKYSLFELENGR      Cas9                   2          SDKNR      Cas9                   2            1
223               DKQSGK      Cas9                   1         DKQSGK      Cas9                   1            1
224               AGFIKR      Cas9                   4   SDNVPSEEVVKK      Cas9                  10            1
225                EKIEK      Cas9                   1          KVTVK      Cas9                   0            1
226                LSKSR      Cas9                   2          LSKSR      Cas9                   2            1
[227 rows x 7 columns]
>>> from pyXLMS.exporter import to_xlinkdb
>>> from pyXLMS.parser import read
>>> pr = read("data/xi/1perc_xl_boost_Links_xiFDR2.2.1.csv", engine="xiSearch/xiFDR", crosslinker="DSS")
>>> crosslinks = pr["crosslinks"]
>>> df = to_xlinkdb(crosslinks, filename=None)

pyXLMS.exporter.to_xlmstools module

pyXLMS.exporter.to_xlmstools.to_xlmstools(
crosslinks: List[Dict[str, Any]],
pdb_file: str | BinaryIO,
gap_open: int | float = -10.0,
gap_extension: int | float = -1.0,
min_sequence_identity: float = 0.8,
allow_site_mismatch: bool = False,
ignore_chains: List[str] = [],
filename_prefix: str | None = None,
) → Dict[str, Any]

Exports a list of crosslinks to xlms-tools format.

Exports a list of crosslinks to xlms-tools format for protein structure analysis. The Python package xlms-tools is available from gitlab.com/topf-lab/xlms-tools. This exporter performs a basic local sequence alignment to align crosslinked peptides to a protein structure in PDB format. Gap open and gap extension penalties can be chosen, as well as a sequence identity threshold that must be satisfied for a match to be reported. Additionally, the alignment is checked for whether the supposedly crosslinked residue can be modified by a crosslinker in the protein structure. Because amino acids can differ between the peptide and the aligned region of the structure, a crosslink may map to a position that is not able to react with the crosslinker. Optionally, these positions can still be reported.

Parameters:
  • crosslinks (list of dict of str, any) – A list of crosslinks.

  • pdb_file (str, or file stream) – The name/path of the PDB file or a file-like object/stream. If a string is provided but no file is found locally, it’s assumed to be an identifier and the file is fetched from the PDB.

  • gap_open (int, or float, default = -10.0) – Gap open penalty for sequence alignment.

  • gap_extension (int, or float, default = -1.0) – Gap extension penalty for sequence alignment.

  • min_sequence_identity (float, default = 0.8) – Minimum sequence identity to consider an aligned crosslinked peptide a match with its corresponding position in the protein structure. Should be given as a fraction between 0 and 1, e.g. the default of 0.8 corresponds to a minimum of 80% sequence identity.

  • allow_site_mismatch (bool, default = False) – Whether a crosslink position should still be reported if, after alignment, it does not correspond to a reactive amino acid in the protein structure. By default such cases are not reported.

  • ignore_chains (list of str, default = empty list) – A list of chains to ignore in the protein structure.

  • filename_prefix (str, or None, default = None) – If not None, the exported data will be written to files with the specified filename prefix. The full list of written files can be accessed via the returned dictionary.

Returns:

A dictionary with the following keys:

  • xlms-tools – The formatted text for xlms-tools.

  • xlms-tools DataFrame – The information from xlms-tools, but as a pandas DataFrame.

  • Number of mapped crosslinks – The total number of mapped crosslinks.

  • Mapping – A string that logs how crosslinks were mapped to the protein structure.

  • Parsed PDB sequence – The protein sequence that was parsed from the PDB file.

  • Parsed PDB chains – The parsed chains from the PDB file.

  • Parsed PDB residue numbers – The parsed residue numbers from the PDB file.

  • Exported files – A list of filenames of all files that were written to disk.

Return type:

dict of str, any

Raises:
  • TypeError – If a wrong data type is provided.

  • TypeError – If data contains elements of mixed data type.

  • ValueError – If parameter min_sequence_identity is out of bounds.

  • ValueError – If the provided data contains no elements.

Notes

Internally this exporter just calls exporter.to_pyxlinkviewer() and re-writes some of the files since the two tools share the same input file structure.

Examples

>>> from pyXLMS.exporter import to_xlmstools
>>> from pyXLMS.parser import read_custom
>>> pr = read_custom("data/_test/exporter/xlms-tools/unique_links_all_pyxlms.csv")
>>> crosslinks = pr["crosslinks"]
>>> xlmstools_result = to_xlmstools(crosslinks, pdb_file="6YHU", filename_prefix="6YHU")
>>> xlmstools_output_file_str = xlmstools_result["xlms-tools"]
>>> xlmstools_dataframe = xlmstools_result["xlms-tools DataFrame"]
>>> nr_mapped_crosslinks = xlmstools_result["Number of mapped crosslinks"]
>>> crosslink_mapping = xlmstools_result["Mapping"]
>>> parsed_pdb_sequence = xlmstools_result["Parsed PDB sequence"]
>>> parsed_pdb_chains = xlmstools_result["Parsed PDB chains"]
>>> parsed_pdb_residue_numbers = xlmstools_result["Parsed PDB residue numbers"]
>>> exported_files = xlmstools_result["Exported files"]
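
The min_sequence_identity threshold can be pictured with a small, self-contained sketch. The helper sequence_identity below is hypothetical and not part of pyXLMS. It assumes that identity is the fraction of peptide residues that match the structure in a gap-free alignment, which may differ from how the exporter actually computes it.

```python
def sequence_identity(peptide: str, structure_segment: str) -> float:
    """Fraction of positions at which the peptide matches the aligned segment.

    Hypothetical helper for illustration only; assumes a gap-free alignment
    in which both strings have the same length.
    """
    if len(peptide) != len(structure_segment) or not peptide:
        raise ValueError("expected two non-empty sequences of equal length")
    matches = sum(a == b for a, b in zip(peptide, structure_segment))
    return matches / len(peptide)


min_sequence_identity = 0.8  # same default as in the signature above

# 9 of 10 residues match, so the identity is 0.9 and passes the threshold.
identity = sequence_identity("PEPTIDEKAR", "PEPTIDEKAG")
is_match = identity >= min_sequence_identity
```

With the default threshold, a 10-residue peptide therefore tolerates at most two mismatching residues under this simplified definition.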

pyXLMS.exporter.to_xmas module#

pyXLMS.exporter.to_xmas.to_xmas(
crosslinks: List[Dict[str, Any]],
filename: str | None,
) DataFrame[source]#

Exports a list of crosslinks to XMAS format.

Exports a list of crosslinks to XMAS format for visualization in ChimeraX. The tool XMAS is available from github.com/ScheltemaLab/ChimeraX_XMAS_bundle.

Parameters:
  • crosslinks (list of dict of str, any) – A list of crosslinks.

  • filename (str, or None) – If not None, the exported data will be written to a file with the specified filename.

Returns:

A pandas DataFrame containing crosslinks in XMAS format.

Return type:

pd.DataFrame

Raises:
  • TypeError – If a wrong data type is provided.

  • TypeError – If ‘crosslinks’ parameter contains elements of mixed data type.

  • ValueError – If the provided ‘crosslinks’ parameter contains no elements.

Examples

>>> from pyXLMS.exporter import to_xmas
>>> from pyXLMS.data import create_crosslink_min
>>> xl1 = create_crosslink_min("KPEPTIDE", 1, "PKEPTIDE", 2)
>>> xl2 = create_crosslink_min("PEKPTIDE", 3, "PEPKTIDE", 4)
>>> crosslinks = [xl1, xl2]
>>> to_xmas(crosslinks, filename="crosslinks_xmas.xlsx")
   Sequence A  Sequence B
0  [K]PEPTIDE  P[K]EPTIDE
1  PE[K]PTIDE  PEP[K]TIDE
>>> from pyXLMS.exporter import to_xmas
>>> from pyXLMS.data import create_crosslink_min
>>> xl1 = create_crosslink_min("KPEPTIDE", 1, "PKEPTIDE", 2)
>>> xl2 = create_crosslink_min("PEKPTIDE", 3, "PEPKTIDE", 4)
>>> crosslinks = [xl1, xl2]
>>> to_xmas(crosslinks, filename=None)
   Sequence A  Sequence B
0  [K]PEPTIDE  P[K]EPTIDE
1  PE[K]PTIDE  PEP[K]TIDE

pyXLMS.exporter.util module#

Module contents#

pyXLMS.exporter.get_msannika_crosslink_sequence(
peptide: str,
crosslink_position: int,
) str[source]#

Returns the crosslinked peptide sequence in MS Annika format.

Returns the crosslinked peptide sequence in MS Annika format, which is the peptide amino acid sequence with the crosslinked residue in square brackets (see examples).

Parameters:
  • peptide (str) – The (unmodified) amino acid sequence of the peptide.

  • crosslink_position (int) – Position of the crosslinker in the peptide sequence (1-based).

Returns:

The crosslinked peptide sequence in MS Annika format.

Return type:

str

Raises:

ValueError – If the crosslink position is outside the peptide’s length.

Examples

>>> from pyXLMS.exporter import get_msannika_crosslink_sequence
>>> get_msannika_crosslink_sequence("PEPKTIDE", 4)
'PEP[K]TIDE'
>>> from pyXLMS.exporter import get_msannika_crosslink_sequence
>>> get_msannika_crosslink_sequence("KPEPTIDE", 1)
'[K]PEPTIDE'
>>> from pyXLMS.exporter import get_msannika_crosslink_sequence
>>> get_msannika_crosslink_sequence("PEPTIDEK", 8)
'PEPTIDE[K]'
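
As a rough illustration of the format, the bracket notation can be reproduced in a few lines of plain Python. The function bracket_residue is a hypothetical stand-in written for this example, not the library's implementation.

```python
def bracket_residue(peptide: str, crosslink_position: int) -> str:
    """Wrap the residue at a 1-based position in square brackets."""
    if not 1 <= crosslink_position <= len(peptide):
        raise ValueError("crosslink position is outside the peptide's length")
    i = crosslink_position - 1  # convert to a 0-based index
    return f"{peptide[:i]}[{peptide[i]}]{peptide[i + 1:]}"


first = bracket_residue("KPEPTIDE", 1)
middle = bracket_residue("PEPKTIDE", 4)
last = bracket_residue("PEPTIDEK", 8)
```

The three calls mirror the documented examples: a crosslink at the first residue, in the middle of the peptide, and at the last residue.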
pyXLMS.exporter.to_impxfdr(
data: List[Dict[str, Any]],
filename: str | None,
targets_only: bool = True,
) DataFrame[source]#

Exports a list of crosslinks or crosslink-spectrum-matches to IMP-X-FDR format.

Exports a list of crosslinks or crosslink-spectrum-matches to IMP-X-FDR format for benchmarking purposes. The tool IMP-X-FDR is available from github.com/vbc-proteomics-org/imp-x-fdr. We recommend using version 1.1.0 and selecting “MS Annika” as the input file format for the exported file. A slightly modified version is available from github.com/hgb-bin-proteomics/MSAnnika_NC_Results. This version contains a few bug fixes and was used for the MS Annika 2.0 and MS Annika 3.0 publications. Requires that the alpha_proteins, beta_proteins, alpha_proteins_crosslink_positions and beta_proteins_crosslink_positions fields are set for all crosslinks and crosslink-spectrum-matches.

Parameters:
  • data (list of dict of str, any) – A list of crosslinks or crosslink-spectrum-matches.

  • filename (str, or None, default = None) – If not None, the exported data will be written to a file with the specified filename. The filename should end in “.xlsx” as the file is exported to Microsoft Excel file format.

  • targets_only (bool, default = True) – Whether or not only target crosslinks or crosslink-spectrum-matches should be exported. For benchmarking purposes this is usually the case. If the crosslinks or crosslink-spectrum-matches do not contain target-decoy labels this should be set to False.

Returns:

A pandas DataFrame containing crosslinks or crosslink-spectrum-matches in IMP-X-FDR format.

Return type:

pd.DataFrame

Raises:
  • TypeError – If a wrong data type is provided.

  • TypeError – If data contains elements of mixed data type.

  • ValueError – If the provided data contains no elements or if none of the data has target-decoy labels and parameter ‘targets_only’ is set to True.

  • RuntimeError – If not all of the required information is present in the input data.

Examples

>>> from pyXLMS.exporter import to_impxfdr
>>> from pyXLMS.parser import read
>>> pr = read("data/xi/1perc_xl_boost_Links_xiFDR2.2.1.csv", engine="xiSearch/xiFDR", crosslinker="DSS")
>>> crosslinks = pr["crosslinks"]
>>> to_impxfdr(crosslinks, filename="crosslinks.xlsx")
    Crosslink Type             Sequence A  Position A Accession A In protein A  ... Position B  Accession B In protein B Best CSM Score  Decoy
0            Intra          VVDELV[K]VMGR           7        Cas9          753  ...          7         Cas9          753         40.679  False
1            Intra  MLASAGELQ[K]GNELALPSK          10        Cas9          753  ...          7         Cas9         1226         40.231  False
2            Intra        MDGTEELLV[K]LNR          10        Cas9          396  ...         10         Cas9          396         39.582  False
3            Intra         MTNFD[K]NLPNEK           6        Cas9          965  ...          2         Cas9          504         35.880  False
4            Intra             DFQFY[K]VR           6        Cas9          978  ...          4         Cas9         1028         35.281  False
..             ...                    ...         ...         ...          ...  ...        ...          ...          ...            ...    ...
220          Intra        LP[K]YSLFELENGR           3        Cas9          866  ...          3         Cas9         1204          9.877  False
221          Intra               D[K]QSGK           2        Cas9          677  ...          2         Cas9          677          9.702  False
222          Intra               AGFI[K]R           5        Cas9          922  ...         11         Cas9          881          9.666  False
223          Intra                E[K]IEK           2        Cas9          443  ...          1         Cas9          562          9.656  False
224          Intra                LS[K]SR           3        Cas9          222  ...          3         Cas9          222          9.619  False
[225 rows x 11 columns]
>>> from pyXLMS.exporter import to_impxfdr
>>> from pyXLMS.parser import read
>>> pr = read("data/xi/1perc_xl_boost_CSM_xiFDR2.2.1.csv", engine="xiSearch/xiFDR", crosslinker="DSS")
>>> csms = pr["crosslink-spectrum-matches"]
>>> to_impxfdr(csms, filename="csms.xlsx")
    Crosslink Type          Sequence A  Position A Accession A In protein A  ... Position B  Accession B In protein B Best CSM Score  Decoy
0            Intra  [K]IECFDSVEISGVEDR           1        Cas9          575  ...          1         Cas9          575         27.268  False
1            Intra       LVDSTD[K]ADLR           7        Cas9          152  ...         11         Cas9          881         26.437  False
2            Intra     GGLSELD[K]AGFIK           8        Cas9          917  ...          8         Cas9          917         26.134  False
3            Intra       LVDSTD[K]ADLR           7        Cas9          152  ...          7         Cas9          152         25.804  False
4            Intra       VVDELV[K]VMGR           7        Cas9          753  ...          7         Cas9          753         24.861  False
..             ...                 ...         ...         ...          ...  ...        ...          ...          ...            ...    ...
406          Intra          [K]GILQTVK           1        Cas9          739  ...          3         Cas9          222          6.977  False
407          Intra          QQLPE[K]YK           6        Cas9          350  ...          6         Cas9          350          6.919  False
408          Intra           ESILP[K]R           6        Cas9         1117  ...          7         Cas9         1035          6.853  False
409          Intra             LS[K]SR           3        Cas9          222  ...          2         Cas9          884          6.809  False
410          Intra     QIT[K]HVAQILDSR           4        Cas9          933  ...          6         Cas9          350          6.808  False
[411 rows x 11 columns]
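
The effect of targets_only can be sketched with plain dictionaries. This is a simplified illustration, not the exporter's code. It assumes that each item carries alpha_decoy and beta_decoy fields that are True, False, or None when no label is available, and that a target is an item for which both are False. The helper keep_targets is hypothetical.

```python
from typing import Any, Dict, List


def keep_targets(data: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep items whose alpha and beta peptides are both labelled as targets.

    Assumes `alpha_decoy` and `beta_decoy` are True, False, or None (no label).
    """
    if all(d.get("alpha_decoy") is None or d.get("beta_decoy") is None for d in data):
        raise ValueError("no target-decoy labels found; set targets_only=False")
    return [
        d for d in data
        if d.get("alpha_decoy") is False and d.get("beta_decoy") is False
    ]


example = [
    {"id": 1, "alpha_decoy": False, "beta_decoy": False},  # target-target
    {"id": 2, "alpha_decoy": False, "beta_decoy": True},   # target-decoy
    {"id": 3, "alpha_decoy": None, "beta_decoy": None},    # unlabelled
]
targets = keep_targets(example)
```

If none of the items has labels, the sketch raises a ValueError, which mirrors the documented behaviour for targets_only = True.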
pyXLMS.exporter.to_msannika(
data: List[Dict[str, Any]],
filename: str | None = None,
format: Literal['csv', 'tsv', 'xlsx'] = 'csv',
) DataFrame[source]#

Exports a list of crosslinks or crosslink-spectrum-matches to MS Annika format.

Exports a list of crosslinks or crosslink-spectrum-matches to MS Annika format. This might be useful for tools that support MS Annika input but are not supported by pyXLMS (yet).

Parameters:
  • data (list of dict of str, any) – A list of crosslinks or crosslink-spectrum-matches.

  • filename (str, or None, default = None) – If not None, the exported data will be written to a file with the specified filename.

  • format (str, one of "csv", "tsv", or "xlsx", default = "csv") – File format of the exported file if filename is not None.

Returns:

A pandas DataFrame containing crosslinks or crosslink-spectrum-matches in MS Annika format.

Return type:

pd.DataFrame

Raises:
  • TypeError – If a wrong data type is provided.

  • TypeError – If data contains elements of mixed data type.

  • TypeError – If parameter format is not one of ‘csv’, ‘tsv’ or ‘xlsx’.

  • ValueError – If the provided data contains no elements.

Warning

The MS Annika exporter will not check whether all necessary information is available for the exported crosslinks or crosslink-spectrum-matches. If a value is not available, it will be denoted as a missing value in the dataframe and the exported file. Please make sure all necessary information is available before using the exported file with another tool! Please also note that modifications are not exported. For downstream analysis of modifications, please refer to transform.to_proforma() or transform.to_dataframe()!

Examples

>>> from pyXLMS.exporter import to_msannika
>>> from pyXLMS.data import create_crosslink_min
>>> xl1 = create_crosslink_min("KPEPTIDE", 1, "PKEPTIDE", 2)
>>> xl2 = create_crosslink_min("PEKPTIDE", 3, "PEPKTIDE", 4)
>>> crosslinks = [xl1, xl2]
>>> to_msannika(crosslinks)
  Crosslink Type  Sequence A  Position A Accession A In protein A  Sequence B  Position B Accession B In protein B Best CSM Score Decoy
0          Inter  [K]PEPTIDE           1        None         None  P[K]EPTIDE           2        None         None           None  None
1          Inter  PE[K]PTIDE           3        None         None  PEP[K]TIDE           4        None         None           None  None
>>> from pyXLMS.exporter import to_msannika
>>> from pyXLMS.data import create_crosslink_min
>>> xl1 = create_crosslink_min("KPEPTIDE", 1, "PKEPTIDE", 2)
>>> xl2 = create_crosslink_min("PEKPTIDE", 3, "PEPKTIDE", 4)
>>> crosslinks = [xl1, xl2]
>>> df = to_msannika(crosslinks, filename = "crosslinks.csv", format = "csv")
>>> from pyXLMS.exporter import to_msannika
>>> from pyXLMS.data import create_csm_min
>>> csm1 = create_csm_min("KPEPTIDE", 1, "PKEPTIDE", 2, "RUN_1", 1)
>>> csm2 = create_csm_min("PEKPTIDE", 3, "PEPKTIDE", 4, "RUN_1", 2)
>>> csms = [csm1, csm2]
>>> to_msannika(csms)
            Sequence Crosslink Type Sequence A  Crosslinker Position A  ... First Scan Charge RT [min] Compensation Voltage
0  KPEPTIDE-PKEPTIDE          Inter   KPEPTIDE                       1  ...          1   None     None                 None
1  PEKPTIDE-PEPKTIDE          Inter   PEKPTIDE                       3  ...          2   None     None                 None
[2 rows x 20 columns]
>>> from pyXLMS.exporter import to_msannika
>>> from pyXLMS.data import create_csm_min
>>> csm1 = create_csm_min("KPEPTIDE", 1, "PKEPTIDE", 2, "RUN_1", 1)
>>> csm2 = create_csm_min("PEKPTIDE", 3, "PEPKTIDE", 4, "RUN_1", 2)
>>> csms = [csm1, csm2]
>>> df = to_msannika(csms, filename = "csms.csv", format = "csv")
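
Because the exporter does not validate its output, it can help to check for missing values before handing the file to another tool. The sketch below works on a list of row dictionaries, such as the one pandas returns from df.to_dict("records"). The helper columns_with_missing_values is hypothetical, and it assumes that missing values appear as None, as in the output above. NaN values would need an additional check.

```python
from typing import Any, Dict, List


def columns_with_missing_values(records: List[Dict[str, Any]]) -> List[str]:
    """Return the column names that hold None in at least one record."""
    missing: List[str] = []
    for record in records:
        for column, value in record.items():
            if value is None and column not in missing:
                missing.append(column)
    return missing


# Two rows shaped like the first example output above (columns shortened).
records = [
    {"Sequence A": "[K]PEPTIDE", "Position A": 1, "Accession A": None, "Decoy": None},
    {"Sequence A": "PE[K]PTIDE", "Position A": 3, "Accession A": None, "Decoy": None},
]
incomplete = columns_with_missing_values(records)
```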
pyXLMS.exporter.to_pyxlinkviewer(
crosslinks: List[Dict[str, Any]],
pdb_file: str | BinaryIO,
gap_open: int | float = -10.0,
gap_extension: int | float = -1.0,
min_sequence_identity: float = 0.8,
allow_site_mismatch: bool = False,
ignore_chains: List[str] = [],
filename_prefix: str | None = None,
) Dict[str, Any][source]#

Exports a list of crosslinks to PyXlinkViewer format.

Exports a list of crosslinks to PyXlinkViewer format for visualization in PyMOL. The tool PyXlinkViewer is available from github.com/BobSchiffrin/PyXlinkViewer. This exporter performs a basic local sequence alignment to align crosslinked peptides to a protein structure in PDB format. Gap open and gap extension penalties can be chosen, as well as a sequence identity threshold that must be satisfied for a match to be reported. Additionally, the alignment is checked to verify that the supposedly crosslinked residue in the protein structure can actually be modified by the crosslinker. Because the alignment is not necessarily exact, the amino acid at the aligned position might differ from the one in the peptide, and a crosslink could be mapped to a position that is not able to react with the crosslinker. Optionally, these positions can still be reported.

Parameters:
  • crosslinks (list of dict of str, any) – A list of crosslinks.

  • pdb_file (str, or file stream) – The name/path of the PDB file or a file-like object/stream. If a string is provided but no file is found locally, it’s assumed to be an identifier and the file is fetched from the PDB.

  • gap_open (int, or float, default = -10.0) – Gap open penalty for sequence alignment.

  • gap_extension (int, or float, default = -1.0) – Gap extension penalty for sequence alignment.

  • min_sequence_identity (float, default = 0.8) – Minimum sequence identity to consider an aligned crosslinked peptide a match with its corresponding position in the protein structure. Should be given as a fraction between 0 and 1, e.g. the default of 0.8 corresponds to a minimum of 80% sequence identity.

  • allow_site_mismatch (bool, default = False) – Whether a crosslink position should still be reported if, after alignment, it does not correspond to a reactive amino acid in the protein structure. By default such cases are not reported.

  • ignore_chains (list of str, default = empty list) – A list of chains to ignore in the protein structure.

  • filename_prefix (str, or None, default = None) – If not None, the exported data will be written to files with the specified filename prefix. The full list of written files can be accessed via the returned dictionary.

Returns:

A dictionary with the following keys:

  • PyXlinkViewer – The formatted text for PyXlinkViewer.

  • PyXlinkViewer DataFrame – The information from PyXlinkViewer, but as a pandas DataFrame.

  • Number of mapped crosslinks – The total number of mapped crosslinks.

  • Mapping – A string that logs how crosslinks were mapped to the protein structure.

  • Parsed PDB sequence – The protein sequence that was parsed from the PDB file.

  • Parsed PDB chains – The parsed chains from the PDB file.

  • Parsed PDB residue numbers – The parsed residue numbers from the PDB file.

  • Exported files – A list of filenames of all files that were written to disk.

Return type:

dict of str, any

Raises:
  • TypeError – If a wrong data type is provided.

  • TypeError – If data contains elements of mixed data type.

  • ValueError – If parameter min_sequence_identity is out of bounds.

  • ValueError – If the provided data contains no elements.

Examples

>>> from pyXLMS.exporter import to_pyxlinkviewer
>>> from pyXLMS.parser import read_custom
>>> pr = read_custom("data/_test/exporter/pyxlinkviewer/unique_links_all_pyxlms.csv")
>>> crosslinks = pr["crosslinks"]
>>> pyxlinkviewer_result = to_pyxlinkviewer(crosslinks, pdb_file="6YHU", filename_prefix="6YHU")
>>> pyxlinkviewer_output_file_str = pyxlinkviewer_result["PyXlinkViewer"]
>>> pyxlinkviewer_dataframe = pyxlinkviewer_result["PyXlinkViewer DataFrame"]
>>> nr_mapped_crosslinks = pyxlinkviewer_result["Number of mapped crosslinks"]
>>> crosslink_mapping = pyxlinkviewer_result["Mapping"]
>>> parsed_pdb_sequence = pyxlinkviewer_result["Parsed PDB sequence"]
>>> parsed_pdb_chains = pyxlinkviewer_result["Parsed PDB chains"]
>>> parsed_pdb_residue_numbers = pyxlinkviewer_result["Parsed PDB residue numbers"]
>>> exported_files = pyxlinkviewer_result["Exported files"]
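
The role of allow_site_mismatch can be illustrated with a minimal decision function. The helper site_is_reported is hypothetical, and the set of reactive residues is an assumption made for this sketch (lysine only). The exporter's actual rules depend on the crosslinker and may be more involved.

```python
from typing import Set


def site_is_reported(structure_residue: str,
                     reactive_residues: Set[str],
                     allow_site_mismatch: bool = False) -> bool:
    """Decide whether an aligned crosslink position would be reported."""
    return allow_site_mismatch or structure_residue in reactive_residues


# Assumption for this sketch: the crosslinker reacts with lysine (K) only.
reactive = {"K"}

reported_default = site_is_reported("R", reactive)  # site mismatch, dropped
reported_allowed = site_is_reported("R", reactive, allow_site_mismatch=True)
reported_lysine = site_is_reported("K", reactive)
```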
pyXLMS.exporter.to_xifdr(
csms: List[Dict[str, Any]],
filename: str | None,
) DataFrame[source]#

Exports a list of crosslink-spectrum-matches to xiFDR format.

Exports a list of crosslink-spectrum-matches to xiFDR format. The tool xiFDR is accessible via the link rappsilberlab.org/software/xifdr. Requires that the alpha_proteins, beta_proteins, alpha_proteins_peptide_positions, beta_proteins_peptide_positions, alpha_decoy, beta_decoy, charge and score fields are set for all crosslink-spectrum-matches.

Parameters:
  • csms (list of dict of str, any) – A list of crosslink-spectrum-matches.

  • filename (str, or None) – If not None, the exported data will be written to a file with the specified filename.

Returns:

A pandas DataFrame containing crosslink-spectrum-matches in xiFDR format.

Return type:

pd.DataFrame

Raises:
  • TypeError – If a wrong data type is provided.

  • TypeError – If ‘csms’ parameter contains elements of mixed data type.

  • ValueError – If the provided ‘csms’ parameter contains no elements.

  • RuntimeError – If not all of the required information is present in the input data.

Examples

>>> from pyXLMS.exporter import to_xifdr
>>> from pyXLMS.parser import read
>>> pr = read("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_CSMs.xlsx", engine="MS Annika", crosslinker="DSS")
>>> csms = pr["crosslink-spectrum-matches"]
>>> to_xifdr(csms, filename="msannika_xiFDR.csv")
                                       run   scan          peptide1  ... peptide position 1  peptide position 2   score
0    XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw   2257            GQKNSR  ...                777                 777  119.83
1    XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw   2448            GQKNSR  ...                777                 693   13.91
2    XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw   2561             SDKNR  ...                864                 864  114.43
3    XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw   2719            DKQSGK  ...                676                 676  200.98
4    XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw   2792            DKQSGK  ...                676                  45   94.47
..                                     ...    ...               ...  ...                ...                 ...     ...
821  XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw  23297     MDGTEELLVKLNR  ...                387                 387  286.05
822  XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw  23454  KIECFDSVEISGVEDR  ...                575                 682  376.15
823  XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw  23581    SSFEKNPIDFLEAK  ...               1176                1176  412.44
824  XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw  23683    SSFEKNPIDFLEAK  ...               1176                1176  437.10
825  XLpeplib_Beveridge_QEx-HFX_DSS_R1.raw  27087    MEDESKLHKFKDFK  ...                 99                1176   15.89
[826 rows x 14 columns]
>>> from pyXLMS.exporter import to_xifdr
>>> from pyXLMS.parser import read
>>> pr = read("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_CSMs.xlsx", engine="MS Annika", crosslinker="DSS")
>>> csms = pr["crosslink-spectrum-matches"]
>>> df = to_xifdr(csms, filename=None)
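
Since a RuntimeError is raised when required information is missing, it can be useful to check the input beforehand. The following sketch lists which of the required fields named above are absent or None in a single match. The helper missing_fields is hypothetical, and the dictionary is hand-written for illustration rather than a complete pyXLMS data object.

```python
from typing import Any, Dict, List

# Field names as listed in the description above.
REQUIRED_XIFDR_FIELDS = [
    "alpha_proteins", "beta_proteins",
    "alpha_proteins_peptide_positions", "beta_proteins_peptide_positions",
    "alpha_decoy", "beta_decoy", "charge", "score",
]


def missing_fields(csm: Dict[str, Any], required: List[str]) -> List[str]:
    """Return the required fields that are absent or None in one match."""
    return [field for field in required if csm.get(field) is None]


# Hand-written dictionary for illustration, not a full pyXLMS data object.
csm = {
    "alpha_proteins": ["Cas9"], "beta_proteins": ["Cas9"],
    "alpha_proteins_peptide_positions": [777],
    "beta_proteins_peptide_positions": [777],
    "alpha_decoy": False, "beta_decoy": False,
    "charge": 3, "score": None,
}
problems = missing_fields(csm, REQUIRED_XIFDR_FIELDS)
```

Note that the check compares against None explicitly, so a decoy label of False still counts as set.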
pyXLMS.exporter.to_xinet(
crosslinks: List[Dict[str, Any]],
filename: str | None,
) DataFrame[source]#

Exports a list of crosslinks to xiNET format.

Exports a list of crosslinks to xiNET format. The tool xiNET is accessible via the link crosslinkviewer.org. Requires that alpha_proteins, beta_proteins, alpha_proteins_crosslink_positions and beta_proteins_crosslink_positions fields are set for all crosslinks.

Parameters:
  • crosslinks (list of dict of str, any) – A list of crosslinks.

  • filename (str, or None) – If not None, the exported data will be written to a file with the specified filename.

Returns:

A pandas DataFrame containing crosslinks in xiNET format.

Return type:

pd.DataFrame

Raises:
  • TypeError – If a wrong data type is provided.

  • TypeError – If ‘crosslinks’ parameter contains elements of mixed data type.

  • ValueError – If the provided ‘crosslinks’ parameter contains no elements.

  • RuntimeError – If not all of the required information is present in the input data.

Notes

The optional Score column in the xiNET table will only be available if all crosslinks have assigned scores.

Examples

>>> from pyXLMS.exporter import to_xinet
>>> from pyXLMS.parser import read
>>> from pyXLMS.transform import targets_only
>>> from pyXLMS.transform import filter_proteins
>>> pr = read("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_Crosslinks.xlsx", engine="MS Annika", crosslinker="DSS")
>>> crosslinks = targets_only(pr)["crosslinks"]
>>> cas9 = filter_proteins(crosslinks, proteins=["Cas9"])["Both"]
>>> to_xinet(cas9, filename="crosslinks_xiNET.csv")
    Protein1 PepPos1           PepSeq1  LinkPos1 Protein2 PepPos2         PepSeq2  LinkPos2   Score   Id
0       Cas9     777            GQKNSR         3     Cas9     777          GQKNSR         3  119.83    1
1       Cas9     864             SDKNR         3     Cas9     864           SDKNR         3  114.43    2
2       Cas9     676            DKQSGK         2     Cas9     676          DKQSGK         2  200.98    3
3       Cas9     676            DKQSGK         2     Cas9      45           HSIKK         4   94.47    4
4       Cas9      31             VPSKK         4     Cas9      31           VPSKK         4  110.48    5
..       ...     ...               ...       ...      ...     ...             ...       ...     ...  ...
248     Cas9     387     MDGTEELLVKLNR        10     Cas9     387   MDGTEELLVKLNR        10  305.63  249
249     Cas9     682    TILDFLKSDGFANR         7     Cas9     947       YDENDKLIR         6  110.46  250
250     Cas9     788    IEEGIKELGSQILK         6     Cas9    1176  SSFEKNPIDFLEAK         5  288.36  251
251     Cas9     575  KIECFDSVEISGVEDR         1     Cas9     682  TILDFLKSDGFANR         7  376.15  252
252     Cas9    1176    SSFEKNPIDFLEAK         5     Cas9    1176  SSFEKNPIDFLEAK         5  437.10  253
[253 rows x 10 columns]
>>> from pyXLMS.exporter import to_xinet
>>> from pyXLMS.parser import read
>>> from pyXLMS.transform import targets_only
>>> from pyXLMS.transform import filter_proteins
>>> pr = read("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_Crosslinks.xlsx", engine="MS Annika", crosslinker="DSS")
>>> crosslinks = targets_only(pr)["crosslinks"]
>>> cas9 = filter_proteins(crosslinks, proteins=["Cas9"])["Both"]
>>> df = to_xinet(cas9, filename=None)
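
The note on the optional Score column amounts to an all-or-nothing rule, which can be sketched as follows. The helper has_score_column is hypothetical, and the field name score for crosslinks is an assumption of this example.

```python
from typing import Any, Dict, List


def has_score_column(crosslinks: List[Dict[str, Any]]) -> bool:
    """True only if every crosslink carries a score (assumed field: `score`)."""
    return bool(crosslinks) and all(
        xl.get("score") is not None for xl in crosslinks
    )


scored = [{"score": 119.83}, {"score": 114.43}]
partly_scored = [{"score": 119.83}, {"score": None}]

with_score = has_score_column(scored)
without_score = has_score_column(partly_scored)
```

A single crosslink without a score is enough to drop the column under this rule.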
pyXLMS.exporter.to_xiview(
crosslinks: List[Dict[str, Any]],
filename: str | None,
minimal: bool = True,
) DataFrame[source]#

Exports a list of crosslinks to xiVIEW format.

Exports a list of crosslinks to xiVIEW format. The tool xiVIEW is accessible via the link xiview.org/. Requires that alpha_proteins, beta_proteins, alpha_proteins_crosslink_positions and beta_proteins_crosslink_positions fields are set for all crosslinks.

Parameters:
  • crosslinks (list of dict of str, any) – A list of crosslinks.

  • filename (str, or None) – If not None, the exported data will be written to a file with the specified filename.

  • minimal (bool, default = True) –

    Which xiVIEW format to return. If minimal = True, the minimal xiVIEW format is returned. Otherwise the “CSV without peak lists” format is returned (internally this just calls exporter.to_xinet()). For more information on the xiVIEW formats please refer to the xiVIEW specification.

Returns:

A pandas DataFrame containing crosslinks in xiVIEW format.

Return type:

pd.DataFrame

Raises:
  • TypeError – If a wrong data type is provided.

  • TypeError – If ‘crosslinks’ parameter contains elements of mixed data type.

  • ValueError – If the provided ‘crosslinks’ parameter contains no elements.

  • RuntimeError – If not all of the required information is present in the input data.

Notes

The optional Score column in the xiVIEW table will only be available if all crosslinks have assigned scores. The optional Decoy* columns will only be available if all crosslinks have assigned target and decoy labels.

Examples

>>> from pyXLMS.exporter import to_xiview
>>> from pyXLMS.parser import read
>>> from pyXLMS.transform import targets_only
>>> from pyXLMS.transform import filter_proteins
>>> pr = read("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_Crosslinks.xlsx", engine="MS Annika", crosslinker="DSS")
>>> crosslinks = targets_only(pr)["crosslinks"]
>>> cas9 = filter_proteins(crosslinks, proteins=["Cas9"])["Both"]
>>> to_xiview(cas9, filename="crosslinks_xiVIEW.csv")
    AbsPos1 AbsPos2 Protein1 Protein2 Decoy1 Decoy2   Score
0       779     779     Cas9     Cas9  FALSE  FALSE  119.83
1       866     866     Cas9     Cas9  FALSE  FALSE  114.43
2       677     677     Cas9     Cas9  FALSE  FALSE  200.98
3       677      48     Cas9     Cas9  FALSE  FALSE   94.47
4        34      34     Cas9     Cas9  FALSE  FALSE  110.48
..      ...     ...      ...      ...    ...    ...     ...
248     396     396     Cas9     Cas9  FALSE  FALSE  305.63
249     688     952     Cas9     Cas9  FALSE  FALSE  110.46
250     793    1180     Cas9     Cas9  FALSE  FALSE  288.36
251     575     688     Cas9     Cas9  FALSE  FALSE  376.15
252    1180    1180     Cas9     Cas9  FALSE  FALSE  437.10
[253 rows x 7 columns]
>>> from pyXLMS.exporter import to_xiview
>>> from pyXLMS.parser import read
>>> from pyXLMS.transform import targets_only
>>> from pyXLMS.transform import filter_proteins
>>> pr = read("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_Crosslinks.xlsx", engine="MS Annika", crosslinker="DSS")
>>> crosslinks = targets_only(pr)["crosslinks"]
>>> cas9 = filter_proteins(crosslinks, proteins=["Cas9"])["Both"]
>>> df = to_xiview(cas9, filename=None)
>>> from pyXLMS.exporter import to_xiview
>>> from pyXLMS.parser import read
>>> from pyXLMS.transform import targets_only
>>> from pyXLMS.transform import filter_proteins
>>> pr = read("data/ms_annika/XLpeplib_Beveridge_QEx-HFX_DSS_R1_Crosslinks.xlsx", engine="MS Annika", crosslinker="DSS")
>>> crosslinks = targets_only(pr)["crosslinks"]
>>> cas9 = filter_proteins(crosslinks, proteins=["Cas9"])["Both"]
>>> to_xiview(cas9, filename="crosslinks_xiVIEW.csv", minimal=False)
    Protein1 PepPos1           PepSeq1  LinkPos1 Protein2 PepPos2         PepSeq2  LinkPos2   Score   Id
0       Cas9     777            GQKNSR         3     Cas9     777          GQKNSR         3  119.83    1
1       Cas9     864             SDKNR         3     Cas9     864           SDKNR         3  114.43    2
2       Cas9     676            DKQSGK         2     Cas9     676          DKQSGK         2  200.98    3
3       Cas9     676            DKQSGK         2     Cas9      45           HSIKK         4   94.47    4
4       Cas9      31             VPSKK         4     Cas9      31           VPSKK         4  110.48    5
..       ...     ...               ...       ...      ...     ...             ...       ...     ...  ...
248     Cas9     387     MDGTEELLVKLNR        10     Cas9     387   MDGTEELLVKLNR        10  305.63  249
249     Cas9     682    TILDFLKSDGFANR         7     Cas9     947       YDENDKLIR         6  110.46  250
250     Cas9     788    IEEGIKELGSQILK         6     Cas9    1176  SSFEKNPIDFLEAK         5  288.36  251
251     Cas9     575  KIECFDSVEISGVEDR         1     Cas9     682  TILDFLKSDGFANR         7  376.15  252
252     Cas9    1176    SSFEKNPIDFLEAK         5     Cas9    1176  SSFEKNPIDFLEAK         5  437.10  253
[253 rows x 10 columns]
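
Comparing the two example outputs above suggests how the formats relate: each AbsPos value in the minimal format equals PepPos + LinkPos - 1 from the “CSV without peak lists” format. The sketch below checks this for three of the rows shown. The helper absolute_position is hypothetical, and the relation is inferred from the example output rather than taken from the exporter's source.

```python
def absolute_position(pep_pos: int, link_pos: int) -> int:
    """Protein position of the crosslinked residue.

    Assumes `pep_pos` is the 1-based protein position of the peptide's first
    residue and `link_pos` the 1-based crosslink position in the peptide.
    """
    return pep_pos + link_pos - 1


# (PepPos1, LinkPos1, PepPos2, LinkPos2) for rows 0, 3 and 250 shown above.
rows = [(777, 3, 777, 3), (676, 2, 45, 4), (788, 6, 1176, 5)]
abs_positions = [
    (absolute_position(p1, l1), absolute_position(p2, l2))
    for p1, l1, p2, l2 in rows
]
```

The resulting pairs match the AbsPos1 and AbsPos2 values of rows 0, 3 and 250 in the minimal output.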
pyXLMS.exporter.to_xlinkdb(
crosslinks: List[Dict[str, Any]],
filename: str | None,
) DataFrame[source]#

Exports a list of crosslinks to XlinkDB format.

Exports a list of crosslinks to XlinkDB format. The tool XlinkDB is accessible via the link xlinkdb.gs.washington.edu/xlinkdb. Requires that alpha_proteins and beta_proteins fields are set for all crosslinks.

Parameters:
  • crosslinks (list of dict of str, any) – A list of crosslinks.

  • filename (str, or None) – If not None, the exported data will be written to a file with the specified filename. The filename should not contain a file extension and should consist only of alphanumeric characters (a-z, A-Z, 0-9).

Returns:

A pandas DataFrame containing crosslinks in XlinkDB format.

Return type:

pd.DataFrame

Raises:
  • TypeError – If a wrong data type is provided.

  • TypeError – If ‘crosslinks’ parameter contains elements of mixed data type.

  • ValueError – If the filename contains any non-alpha-numeric characters.

  • ValueError – If the provided ‘crosslinks’ parameter contains no elements.

  • RuntimeError – If not all of the required information is present in the input data.

Notes

XlinkDB input format requires a column with probabilities that the crosslinks are correct. Since that is not available from most crosslink search engines, this is simply set to a constant 1.

Examples

>>> from pyXLMS.exporter import to_xlinkdb
>>> from pyXLMS.parser import read
>>> pr = read("data/xi/1perc_xl_boost_Links_xiFDR2.2.1.csv", engine="xiSearch/xiFDR", crosslinker="DSS")
>>> crosslinks = pr["crosslinks"]
>>> to_xlinkdb(crosslinks, filename="crosslinksForXlinkDB")
               Peptide A Protein A  Labeled Position A      Peptide B Protein B  Labeled Position B  Probability
0            VVDELVKVMGR      Cas9                   6    VVDELVKVMGR      Cas9                   6            1
1    MLASAGELQKGNELALPSK      Cas9                   9    VVDELVKVMGR      Cas9                   6            1
2          MDGTEELLVKLNR      Cas9                   9  MDGTEELLVKLNR      Cas9                   9            1
3           MTNFDKNLPNEK      Cas9                   5       SKLVSDFR      Cas9                   1            1
4               DFQFYKVR      Cas9                   5    MIAKSEQEIGK      Cas9                   3            1
..                   ...       ...                 ...            ...       ...                 ...          ...
222        LPKYSLFELENGR      Cas9                   2          SDKNR      Cas9                   2            1
223               DKQSGK      Cas9                   1         DKQSGK      Cas9                   1            1
224               AGFIKR      Cas9                   4   SDNVPSEEVVKK      Cas9                  10            1
225                EKIEK      Cas9                   1          KVTVK      Cas9                   0            1
226                LSKSR      Cas9                   2          LSKSR      Cas9                   2            1
[227 rows x 7 columns]
>>> from pyXLMS.exporter import to_xlinkdb
>>> from pyXLMS.parser import read
>>> pr = read("data/xi/1perc_xl_boost_Links_xiFDR2.2.1.csv", engine="xiSearch/xiFDR", crosslinker="DSS")
>>> crosslinks = pr["crosslinks"]
>>> df = to_xlinkdb(crosslinks, filename=None)
pyXLMS.exporter.to_xlmstools(
crosslinks: List[Dict[str, Any]],
pdb_file: str | BinaryIO,
gap_open: int | float = -10.0,
gap_extension: int | float = -1.0,
min_sequence_identity: float = 0.8,
allow_site_mismatch: bool = False,
ignore_chains: List[str] = [],
filename_prefix: str | None = None,
) Dict[str, Any][source]#

Exports a list of crosslinks to xlms-tools format.

Exports a list of crosslinks to xlms-tools format for protein structure analysis. The python package xlms-tools is available from gitlab.com/topf-lab/xlms-tools. This exporter performs basic local sequence alignment to align crosslinked peptides to a protein structure in PDB format. Gap open and gap extension penalties can be chosen, as well as a sequence identity threshold that must be met for a match to be reported. Additionally, the exporter checks whether the supposedly crosslinked residue in the protein structure can actually react with the crosslinker. Because of alignment shifts, the amino acid at the aligned position may differ, so a crosslink could be reported at a position that is not able to react with the crosslinker. Optionally, these positions can still be reported.

Parameters:
  • crosslinks (list of dict of str, any) – A list of crosslinks.

  • pdb_file (str, or file stream) – The name/path of the PDB file or a file-like object/stream. If a string is provided but no file is found locally, it’s assumed to be an identifier and the file is fetched from the PDB.

  • gap_open (int, or float, default = -10.0) – Gap open penalty for sequence alignment.

  • gap_extension (int, or float, default = -1.0) – Gap extension penalty for sequence alignment.

  • min_sequence_identity (float, default = 0.8) – Minimum sequence identity to consider an aligned crosslinked peptide a match with its corresponding position in the protein structure. Should be given as a fraction between 0 and 1, e.g. the default of 0.8 corresponds to a minimum of 80% sequence identity.

  • allow_site_mismatch (bool, default = False) – Whether the position should still be reported if the crosslink position after alignment is not a reactive amino acid in the protein structure. By default such cases are not reported.

  • ignore_chains (list of str, default = empty list) – A list of chains to ignore in the protein structure.

  • filename_prefix (str, or None, default = None) – If not None, the exported data will be written to files with the specified filename prefix. The full list of written files can be accessed via the returned dictionary.
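
How min_sequence_identity and allow_site_mismatch interact can be illustrated with a small sketch. It assumes that the alignment has already been computed and is given as two gapped strings of equal length. It also assumes that identity is the number of identical, non-gap columns divided by the alignment length, and that lysine (K) is the only reactive residue. The functions sequence_identity and keep_match are hypothetical and do not reflect the exporter's actual implementation.

```python
from typing import FrozenSet


def sequence_identity(aligned_peptide: str, aligned_structure: str) -> float:
    """Fraction of alignment columns with identical, non-gap residues.

    Assumption: identity is computed over the full alignment length.
    """
    if len(aligned_peptide) != len(aligned_structure):
        raise ValueError("Aligned sequences must have equal length.")
    if not aligned_peptide:
        return 0.0
    matches = sum(
        1 for a, b in zip(aligned_peptide, aligned_structure) if a == b and a != "-"
    )
    return matches / len(aligned_peptide)


def keep_match(
    aligned_peptide: str,
    aligned_structure: str,
    site_index: int,  # 0-based column of the crosslinked residue
    min_sequence_identity: float = 0.8,
    allow_site_mismatch: bool = False,
    reactive_residues: FrozenSet[str] = frozenset({"K"}),  # assumption
) -> bool:
    """Decide whether an aligned crosslink position would be reported."""
    if not 0.0 <= min_sequence_identity <= 1.0:
        raise ValueError("min_sequence_identity must be between 0 and 1.")
    identity = sequence_identity(aligned_peptide, aligned_structure)
    if identity < min_sequence_identity:
        return False
    # The residue in the structure must be able to react with the crosslinker,
    # unless site mismatches are explicitly allowed.
    site_is_reactive = aligned_structure[site_index] in reactive_residues
    return site_is_reactive or allow_site_mismatch


# 7 of 8 residues match (0.875 identity), but the structure has R at the site.
print(keep_match("KPEPTIDE", "RPEPTIDE", 0))                            # False
print(keep_match("KPEPTIDE", "RPEPTIDE", 0, allow_site_mismatch=True))  # True
```

In this sketch the identity threshold is always enforced, while allow_site_mismatch only relaxes the check on the crosslinked residue itself.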

Returns:

A dictionary with the following keys:
  • xlms-tools – The formatted text for xlms-tools.

  • xlms-tools DataFrame – The information from xlms-tools as a pandas DataFrame.

  • Number of mapped crosslinks – The total number of mapped crosslinks.

  • Mapping – A string that logs how crosslinks were mapped to the protein structure.

  • Parsed PDB sequence – The protein sequence that was parsed from the PDB file.

  • Parsed PDB chains – The parsed chains from the PDB file.

  • Parsed PDB residue numbers – The parsed residue numbers from the PDB file.

  • Exported files – A list of filenames of all files that were written to disk.

Return type:

dict of str, any

Raises:
  • TypeError – If a wrong data type is provided.

  • TypeError – If data contains elements of mixed data type.

  • ValueError – If parameter min_sequence_identity is out of bounds.

  • ValueError – If the provided data contains no elements.

Notes

Internally this exporter just calls exporter.to_pyxlinkviewer() and re-writes some of the files since the two tools share the same input file structure.

Examples

>>> from pyXLMS.exporter import to_xlmstools
>>> from pyXLMS.parser import read_custom
>>> pr = read_custom("data/_test/exporter/xlms-tools/unique_links_all_pyxlms.csv")
>>> crosslinks = pr["crosslinks"]
>>> xlmstools_result = to_xlmstools(crosslinks, pdb_file="6YHU", filename_prefix="6YHU")
>>> xlmstools_output_file_str = xlmstools_result["xlms-tools"]
>>> xlmstools_dataframe = xlmstools_result["xlms-tools DataFrame"]
>>> nr_mapped_crosslinks = xlmstools_result["Number of mapped crosslinks"]
>>> crosslink_mapping = xlmstools_result["Mapping"]
>>> parsed_pdb_sequence = xlmstools_result["Parsed PDB sequence"]
>>> parsed_pdb_chains = xlmstools_result["Parsed PDB chains"]
>>> parsed_pdb_residue_numbers = xlmstools_result["Parsed PDB residue numbers"]
>>> exported_files = xlmstools_result["Exported files"]
pyXLMS.exporter.to_xmas(
crosslinks: List[Dict[str, Any]],
filename: str | None,
) DataFrame[source]#

Exports a list of crosslinks to XMAS format.

Exports a list of crosslinks to XMAS format for visualization in ChimeraX. The tool XMAS is available from github.com/ScheltemaLab/ChimeraX_XMAS_bundle.

Parameters:
  • crosslinks (list of dict of str, any) – A list of crosslinks.

  • filename (str, or None) – If not None, the exported data will be written to a file with the specified filename.

Returns:

A pandas DataFrame containing crosslinks in XMAS format.

Return type:

pd.DataFrame

Raises:
  • TypeError – If a wrong data type is provided.

  • TypeError – If ‘crosslinks’ parameter contains elements of mixed data type.

  • ValueError – If the provided ‘crosslinks’ parameter contains no elements.

Examples

>>> from pyXLMS.exporter import to_xmas
>>> from pyXLMS.data import create_crosslink_min
>>> xl1 = create_crosslink_min("KPEPTIDE", 1, "PKEPTIDE", 2)
>>> xl2 = create_crosslink_min("PEKPTIDE", 3, "PEPKTIDE", 4)
>>> crosslinks = [xl1, xl2]
>>> to_xmas(crosslinks, filename="crosslinks_xmas.xlsx")
   Sequence A  Sequence B
0  [K]PEPTIDE  P[K]EPTIDE
1  PE[K]PTIDE  PEP[K]TIDE
>>> from pyXLMS.exporter import to_xmas
>>> from pyXLMS.data import create_crosslink_min
>>> xl1 = create_crosslink_min("KPEPTIDE", 1, "PKEPTIDE", 2)
>>> xl2 = create_crosslink_min("PEKPTIDE", 3, "PEPKTIDE", 4)
>>> crosslinks = [xl1, xl2]
>>> to_xmas(crosslinks, filename=None)
   Sequence A  Sequence B
0  [K]PEPTIDE  P[K]EPTIDE
1  PE[K]PTIDE  PEP[K]TIDE
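
In the outputs above, the Sequence A and Sequence B columns mark the crosslinked residue with square brackets. The sketch below reproduces this notation from a peptide sequence and a 1-based crosslink position, as passed to create_crosslink_min in the examples. The helper mark_crosslink_site is hypothetical and only illustrates the format. It is not the function that to_xmas uses internally.

```python
def mark_crosslink_site(peptide: str, position: int) -> str:
    """Wrap the residue at the 1-based position in square brackets."""
    if not 1 <= position <= len(peptide):
        raise ValueError("position must be within the peptide sequence.")
    i = position - 1  # convert to a 0-based index
    return f"{peptide[:i]}[{peptide[i]}]{peptide[i + 1:]}"


print(mark_crosslink_site("KPEPTIDE", 1))  # [K]PEPTIDE
print(mark_crosslink_site("PEPKTIDE", 4))  # PEP[K]TIDE
```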