Protein-Ligand Benchmarks for Free Energy Calculations

Installing the Protein Ligand Benchmark Set

The Protein Ligand Benchmark Set is currently only installable from source.

Installation from Source

The repository uses git-lfs (Git Large File Storage) to store all data files. Ideally, git-lfs is installed before the repository is cloned.

conda create -n plbenchmark python=3.7 git-lfs
conda activate plbenchmark
git lfs clone https://github.com/openforcefield/protein-ligand-benchmark.git
cd protein-ligand-benchmark
conda env update --file environment.yml
pip install -e .

Example Notebook: protein-ligand-benchmark

[1]:
from plbenchmark import targets
from IPython.core.display import HTML
Warning: Unable to load toolkit 'OpenEye Toolkit'. The Open Force Field Toolkit does not require the OpenEye Toolkits, and can use RDKit/AmberTools instead. However, if you have a valid license for the OpenEye Toolkits, consider installing them for faster performance and additional file format support: https://docs.eyesopen.com/toolkits/python/quickstart-python/linuxosx.html OpenEye offers free Toolkit licenses for academics: https://www.eyesopen.com/academic-licensing

Get the whole set of targets in the dataset

[2]:
# it is initialized from the `plbenchmark/sample_data/targets.yml` file
target_set = targets.TargetSet()
# to see which targets are available, one can get a list of names
target_set.get_names()
[2]:
['mcl1_sample']

The TargetSet is a dict, but it can be converted to a pandas.DataFrame or an HTML string via TargetSet.get_dataframe(columns=None) or TargetSet.get_html(columns=None). The default columns=None means that all columns are returned. A subset of columns can also be passed as a list:

[3]:
HTML(target_set.get_html(columns=['name', 'fullname', 'pdb', 'references', 'numLigands', 'minDG', 'maxDG', 'associated_sets']))
/home/dhahn3/miniconda3/envs/plbenchmark/lib/python3.9/site-packages/pandas/core/dtypes/cast.py:1638: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
  result[:] = values
[3]:
name fullname pdb references numLigands minDG maxDG associated_sets
0 mcl1_sample Induced myeloid leukemia cell differentiation protein Mcl-1 4HW3 {'calculation': ['10.1021/ja512751q', '10.1021/acs.jcim.9b00105'], 'measurement': None} 15 -9.0 kilocalorie / mole -6.1 kilocalorie / mole [Schrodinger JACS]
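
The pattern used by TargetSet, a dict subclass that can also return its records with an optional column subset, can be sketched in plain Python. This is only an illustration of the idea: MiniTargetSet and get_rows are hypothetical names that do not exist in plbenchmark, and the single record reuses a few values from the sample target above.

```python
class MiniTargetSet(dict):
    """Hypothetical stand-in for a dict-based set of targets (not the plbenchmark class)."""

    def get_names(self):
        # The keys of the dict are the target names.
        return list(self.keys())

    def get_rows(self, columns=None):
        # columns=None means "all columns", mirroring the convention described above.
        rows = []
        for record in self.values():
            selected = list(record) if columns is None else columns
            rows.append({column: record[column] for column in selected})
        return rows


target_set = MiniTargetSet(
    mcl1_sample={"name": "mcl1_sample", "pdb": "4HW3", "numLigands": 15}
)
names = target_set.get_names()
subset = target_set.get_rows(columns=["name", "pdb"])
```

Because the class inherits from dict, item access such as target_set['mcl1_sample'] works without any extra code.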

A target can be accessed by its name in two ways:

[4]:
mcl1 = target_set['mcl1_sample']
mcl1_2 = target_set.get_target('mcl1_sample')

The Target class

The Target class contains all the available information about one target of plbenchmark. It also has two member variables, _ligand_set and _edge_set, which contain the information about the available ligands and edges of the respective target. A Target can either be accessed from the TargetSet (see the previous cell) or initialized using its name:

[5]:
mcl1 = targets.Target('mcl1_sample')
# The data in the column is stored in a pandas.Series and can be accessed via
mcl1.get_dataframe(columns=None)
[5]:
associated_sets                                   [Schrodinger JACS]
comments            hydrophobic interactions contributing to binding
date                                                      2020-08-21
fullname           Induced myeloid leukemia cell differentiation ...
id                                                                99
ligands            [lig_23, lig_26, lig_27, lig_28, lig_29, lig_3...
name                                                     mcl1_sample
netcharge                                                         xx
pdb                                                             4HW3
references         {'calculation': ['10.1021/ja512751q', '10.1021...
numLigands                                                        15
maxDG                                        -6.1 kilocalorie / mole
minDG                                        -9.0 kilocalorie / mole
std(DG)                                       0.9 kilocalorie / mole
calculation        REP1http://dx.doi.org/10.1021/ja512751qREP2Wan...
pdblinks           REP1http://www.rcsb.org/structure/4HW3REP24HW3...
dtype: object

The LigandSet and EdgeSet of a target can be accessed in different formats:

[6]:
mcl1_ligands = mcl1.get_ligand_set()
mcl1_ligands_df = mcl1.get_ligand_set_dataframe()
HTML(mcl1.get_ligand_set_html(columns = ['name', 'ROMol', 'measurement', 'DerivedMeasurement']))
[6]:
name ROMol measurement DerivedMeasurement
comment error type unit value Reference type value error unit
0 lig_23 Mol dtype: object Table 2, entry 23 30 nanomolar ki nanomolar 370 nanomolar Friberg et al., J. Med. Chem. 2013 dg -8.83 kilocalorie / mole 0.05 kilocalorie / mole None
1 lig_26 Mol dtype: object Table 2, entry 26 0.44 micromolar ki micromolar 1.0 micromolar Friberg et al., J. Med. Chem. 2013 dg -8.24 kilocalorie / mole 0.26 kilocalorie / mole None
2 lig_27 Mol dtype: object Table 3, entry 27 0.0071 millimolar ki millimolar 0.035 millimolar Friberg et al., J. Med. Chem. 2013 dg -6.12 kilocalorie / mole 0.12 kilocalorie / mole None
3 lig_28 Mol dtype: object Table 3, entry 28, manually converted 0.03 kilocalorie / mole dg kilocalorie / mole -6.62 kilocalorie / mole Friberg et al., J. Med. Chem. 2013 dg -6.62 kilocalorie / mole 0.03 kilocalorie / mole None
4 lig_29 Mol dtype: object Table 3, entry 29, manually converted 120.0 calorie / mole dg calories / mole -6940.0 calorie / mole Friberg et al., J. Med. Chem. 2013 dg -6.94 kilocalorie / mole 0.12 kilocalorie / mole None
5 lig_30 Mol dtype: object Table 3, entry 30, manually converted 0.6 micromolar ic50 micromolar 1.9 micromolar Friberg et al., J. Med. Chem. 2013 dg -7.85 kilocalorie / mole 0.19 kilocalorie / mole None
6 lig_31 Mol dtype: object Table 3, entry 31, manually converted 80 nanomolar ic50 nanomolar 1700 nanomolar Friberg et al., J. Med. Chem. 2013 dg -7.92 kilocalorie / mole 0.03 kilocalorie / mole None
7 lig_32 Mol dtype: object Table 3, entry 32, manually converted 0.08 dimensionless pic50 dimensionless 4.8 dimensionless Friberg et al., J. Med. Chem. 2013 dg -6.59 kilocalorie / mole 0.11 kilocalorie / mole None
8 lig_33 Mol dtype: object Table 3, entry 33, manually converted 0.75 kilojoule / mole dg kilojoules / mole -28.79 kilojoule / mole Friberg et al., J. Med. Chem. 2013 dg -6.880975143403441 kilocalorie / mole 0.17925430210325047 kilocalorie / mole None
9 lig_34 Mol dtype: object Table 3, entry 34 3.2 micromolar ki micromolar 9.9 micromolar Friberg et al., J. Med. Chem. 2013 dg -6.87 kilocalorie / mole 0.19 kilocalorie / mole None
10 lig_35 Mol dtype: object Table 3, entry 35 0.14 micromolar ki micromolar 0.38 micromolar Friberg et al., J. Med. Chem. 2013 dg -8.81 kilocalorie / mole 0.22 kilocalorie / mole None
11 lig_36 Mol dtype: object Table 3, entry 36 0.1 micromolar ki micromolar 1.1 micromolar Friberg et al., J. Med. Chem. 2013 dg -8.18 kilocalorie / mole 0.05 kilocalorie / mole None
12 lig_37 Mol dtype: object Table 3, entry 37 0.15 micromolar ki micromolar 0.3 micromolar Friberg et al., J. Med. Chem. 2013 dg -8.95 kilocalorie / mole 0.3 kilocalorie / mole None
13 lig_38 Mol dtype: object Table 3, entry 38 2.1 micromolar ki micromolar 7.7 micromolar Friberg et al., J. Med. Chem. 2013 dg -7.02 kilocalorie / mole 0.16 kilocalorie / mole None
14 lig_39 Mol dtype: object Table 3, entry 39 0.7 micromolar ki micromolar 7.6 micromolar Friberg et al., J. Med. Chem. 2013 dg -7.03 kilocalorie / mole 0.05 kilocalorie / mole None
[7]:
mcl1_edges = mcl1.get_edge_set()
mcl1_edges_df = mcl1.get_edge_set_dataframe()
HTML(mcl1.get_edge_set_html())
[7]:
ligand_a ligand_b name Mol1 Smiles1 Mol2 Smiles2 exp. DeltaG [kcal/mol] exp. Error [kcal/mol]
0 lig_28 lig_35 edge_28_35 Mol [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3C([H])([H])[H])[H])[H])[H])[H])[H])[H] Mol [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C([H])([H])[H])Cl)[H])[H])[H])[H] -2.19 kilocalorie / mole 0.22 kilocalorie / mole
1 lig_30 lig_27 edge_30_27 Mol [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C([H])([H])[H])[H])[H])[H])[H])[H] Mol [H]c1c(c(c(c(c1[H])[H])OC([H])([H])C([H])([H])C([H])([H])C2=C(N(c3c2c(c(c(c3[H])[H])[H])[H])[H])C(=O)[O-])[H])[H] 1.73 kilocalorie / mole 0.22 kilocalorie / mole
2 lig_31 lig_35 edge_31_35 Mol [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C(F)(F)F)[H])[H])[H])[H])[H] Mol [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C([H])([H])[H])Cl)[H])[H])[H])[H] -0.89 kilocalorie / mole 0.22 kilocalorie / mole
3 lig_33 lig_27 edge_33_27 Mol [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])[H])Cl)[H])[H])[H])[H] Mol [H]c1c(c(c(c(c1[H])[H])OC([H])([H])C([H])([H])C([H])([H])C2=C(N(c3c2c(c(c(c3[H])[H])[H])[H])[H])C(=O)[O-])[H])[H] 0.76 kilocalorie / mole 0.22 kilocalorie / mole
4 lig_35 lig_33 edge_35_33 Mol [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C([H])([H])[H])Cl)[H])[H])[H])[H] Mol [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])[H])Cl)[H])[H])[H])[H] 1.93 kilocalorie / mole 0.28 kilocalorie / mole
5 lig_35 lig_37 edge_35_37 Mol [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C([H])([H])[H])Cl)[H])[H])[H])[H] Mol [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C([H])([H])[H])Cl)C([H])([H])[H])[H])[H])[H] -0.14 kilocalorie / mole 0.37 kilocalorie / mole
6 lig_39 lig_32 edge_39_32 Mol [H]c1c(c(c(c(c1[H])[H])c2c(c(c(c(c2[H])[H])OC([H])([H])C([H])([H])C([H])([H])C3=C(N(c4c3c(c(c(c4[H])[H])[H])[H])[H])C(=O)[O-])[H])[H])[H])[H] Mol [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])[H])C([H])([H])[H])[H])[H])[H])[H] 0.44 kilocalorie / mole 0.12 kilocalorie / mole

Finally, the set of ligands and edges can be visualized as a graph:

[8]:
graph = mcl1.get_graph()
[Figure: graph of the ligands and edges of mcl1_sample]
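
To make the graph view concrete, the following minimal sketch builds an undirected graph from the edge names listed in the edge table above and finds its connected components. It uses only the standard library and does not reproduce what Target.get_graph() does internally; connected_components is a hypothetical helper written for this example.

```python
from collections import defaultdict

# Edge names copied from the edge table of mcl1_sample shown above.
edge_names = [
    "edge_28_35", "edge_30_27", "edge_31_35", "edge_33_27",
    "edge_35_33", "edge_35_37", "edge_39_32",
]

# Build an undirected adjacency map: ligand name -> set of neighbouring ligands.
adjacency = defaultdict(set)
for edge_name in edge_names:
    _, a, b = edge_name.split("_")
    lig_a, lig_b = f"lig_{a}", f"lig_{b}"
    adjacency[lig_a].add(lig_b)
    adjacency[lig_b].add(lig_a)


def connected_components(graph):
    """Return the connected components of an undirected graph as sorted lists."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(sorted(component))
    return components


components = connected_components(adjacency)
```

For the sample edges this yields two components: one with seven ligands, and a separate pair consisting of lig_32 and lig_39.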

The LigandSet and Ligand class

The LigandSet is a dict of the Ligands which are available for one target. It is accessible via Target.get_ligand_set(), but can also be initialized directly.

[9]:
from plbenchmark import ligands
[10]:
mcl1_ligands = ligands.LigandSet('mcl1_sample')
HTML(mcl1_ligands.get_html())
[10]:
name smiles measurement DerivedMeasurement ROMol measurement
comment error type unit value type value error unit Reference
0 lig_23 [H]c1c(c(c2c(c1[H])c(c(c(c2OC([H])([H])C([H])([H])C([H])([H])C3=C(Sc4c3c(c(c(c4[H])[H])[H])[H])C(=O)[O-])[H])[H])[H])[H])[H] Table 2, entry 23 30 nanomolar ki nanomolar 370 nanomolar dg -8.83 kilocalorie / mole 0.05 kilocalorie / mole None Mol dtype: object Friberg et al., J. Med. Chem. 2013
1 lig_26 [H]c1c(c(c2c(c1[H])c(c(c(c2OC([H])([H])C([H])([H])C([H])([H])C3=C(Oc4c3c(c(c(c4[H])[H])[H])[H])C(=O)[O-])[H])[H])[H])[H])[H] Table 2, entry 26 0.44 micromolar ki micromolar 1.0 micromolar dg -8.24 kilocalorie / mole 0.26 kilocalorie / mole None Mol dtype: object Friberg et al., J. Med. Chem. 2013
2 lig_27 [H]c1c(c(c(c(c1[H])[H])OC([H])([H])C([H])([H])C([H])([H])C2=C(N(c3c2c(c(c(c3[H])[H])[H])[H])[H])C(=O)[O-])[H])[H] Table 3, entry 27 0.0071 millimolar ki millimolar 0.035 millimolar dg -6.12 kilocalorie / mole 0.12 kilocalorie / mole None Mol dtype: object Friberg et al., J. Med. Chem. 2013
3 lig_28 [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3C([H])([H])[H])[H])[H])[H])[H])[H])[H] Table 3, entry 28, manually converted 0.03 kilocalorie / mole dg kilocalorie / mole -6.62 kilocalorie / mole dg -6.62 kilocalorie / mole 0.03 kilocalorie / mole None Mol dtype: object Friberg et al., J. Med. Chem. 2013
4 lig_29 [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3C(F)(F)F)[H])[H])[H])[H])[H])[H] Table 3, entry 29, manually converted 120.0 calorie / mole dg calories / mole -6940.0 calorie / mole dg -6.94 kilocalorie / mole 0.12 kilocalorie / mole None Mol dtype: object Friberg et al., J. Med. Chem. 2013
5 lig_30 [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C([H])([H])[H])[H])[H])[H])[H])[H] Table 3, entry 30, manually converted 0.6 micromolar ic50 micromolar 1.9 micromolar dg -7.85 kilocalorie / mole 0.19 kilocalorie / mole None Mol dtype: object Friberg et al., J. Med. Chem. 2013
6 lig_31 [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C(F)(F)F)[H])[H])[H])[H])[H] Table 3, entry 31, manually converted 80 nanomolar ic50 nanomolar 1700 nanomolar dg -7.92 kilocalorie / mole 0.03 kilocalorie / mole None Mol dtype: object Friberg et al., J. Med. Chem. 2013
7 lig_32 [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])[H])C([H])([H])[H])[H])[H])[H])[H] Table 3, entry 32, manually converted 0.08 dimensionless pic50 dimensionless 4.8 dimensionless dg -6.59 kilocalorie / mole 0.11 kilocalorie / mole None Mol dtype: object Friberg et al., J. Med. Chem. 2013
8 lig_33 [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])[H])Cl)[H])[H])[H])[H] Table 3, entry 33, manually converted 0.75 kilojoule / mole dg kilojoules / mole -28.79 kilojoule / mole dg -6.880975143403441 kilocalorie / mole 0.17925430210325047 kilocalorie / mole None Mol dtype: object Friberg et al., J. Med. Chem. 2013
9 lig_34 [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])[H])C(F)(F)F)[H])[H])[H])[H] Table 3, entry 34 3.2 micromolar ki micromolar 9.9 micromolar dg -6.87 kilocalorie / mole 0.19 kilocalorie / mole None Mol dtype: object Friberg et al., J. Med. Chem. 2013
10 lig_35 [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C([H])([H])[H])Cl)[H])[H])[H])[H] Table 3, entry 35 0.14 micromolar ki micromolar 0.38 micromolar dg -8.81 kilocalorie / mole 0.22 kilocalorie / mole None Mol dtype: object Friberg et al., J. Med. Chem. 2013
11 lig_36 [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C([H])([H])C([H])([H])[H])Cl)[H])[H])[H])[H] Table 3, entry 36 0.1 micromolar ki micromolar 1.1 micromolar dg -8.18 kilocalorie / mole 0.05 kilocalorie / mole None Mol dtype: object Friberg et al., J. Med. Chem. 2013
12 lig_37 [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C([H])([H])[H])Cl)C([H])([H])[H])[H])[H])[H] Table 3, entry 37 0.15 micromolar ki micromolar 0.3 micromolar dg -8.95 kilocalorie / mole 0.3 kilocalorie / mole None Mol dtype: object Friberg et al., J. Med. Chem. 2013
13 lig_38 [H]c1c(c(c(c(c1[H])[H])c2c(c(c(c(c2[H])OC([H])([H])C([H])([H])C([H])([H])C3=C(N(c4c3c(c(c(c4[H])[H])[H])[H])[H])C(=O)[O-])[H])[H])[H])[H])[H] Table 3, entry 38 2.1 micromolar ki micromolar 7.7 micromolar dg -7.02 kilocalorie / mole 0.16 kilocalorie / mole None Mol dtype: object Friberg et al., J. Med. Chem. 2013
14 lig_39 [H]c1c(c(c(c(c1[H])[H])c2c(c(c(c(c2[H])[H])OC([H])([H])C([H])([H])C([H])([H])C3=C(N(c4c3c(c(c(c4[H])[H])[H])[H])[H])C(=O)[O-])[H])[H])[H])[H] Table 3, entry 39 0.7 micromolar ki micromolar 7.6 micromolar dg -7.03 kilocalorie / mole 0.05 kilocalorie / mole None Mol dtype: object Friberg et al., J. Med. Chem. 2013

Ligand objects can be accessed from the LigandSet by their name. Each Ligand holds the experimental data, the references, the SMILES string, and the path to the SDF file of the docked structure. Additionally, there are functions to derive and process the primary data; the results are added to the pandas.Series as new entries.

[11]:
lig_30 = mcl1_ligands['lig_30']
lig_27 = mcl1_ligands.get_ligand('lig_27')

The EdgeSet and Edge class

The EdgeSet is a dict of the Edges which are available for one target. It is accessible via Target.get_edge_set(), but can also be initialized directly.

[12]:
from plbenchmark import edges
[13]:
mcl1_edges = edges.EdgeSet('mcl1_sample')
HTML(mcl1_edges.get_html())
[13]:
ligand_a ligand_b name Mol1 Smiles1 Mol2 Smiles2 exp. DeltaG [kcal/mol] exp. Error [kcal/mol]
0 lig_28 lig_35 edge_28_35 Mol [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3C([H])([H])[H])[H])[H])[H])[H])[H])[H] Mol [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C([H])([H])[H])Cl)[H])[H])[H])[H] -2.19 kilocalorie / mole 0.22 kilocalorie / mole
1 lig_30 lig_27 edge_30_27 Mol [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C([H])([H])[H])[H])[H])[H])[H])[H] Mol [H]c1c(c(c(c(c1[H])[H])OC([H])([H])C([H])([H])C([H])([H])C2=C(N(c3c2c(c(c(c3[H])[H])[H])[H])[H])C(=O)[O-])[H])[H] 1.73 kilocalorie / mole 0.22 kilocalorie / mole
2 lig_31 lig_35 edge_31_35 Mol [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C(F)(F)F)[H])[H])[H])[H])[H] Mol [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C([H])([H])[H])Cl)[H])[H])[H])[H] -0.89 kilocalorie / mole 0.22 kilocalorie / mole
3 lig_33 lig_27 edge_33_27 Mol [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])[H])Cl)[H])[H])[H])[H] Mol [H]c1c(c(c(c(c1[H])[H])OC([H])([H])C([H])([H])C([H])([H])C2=C(N(c3c2c(c(c(c3[H])[H])[H])[H])[H])C(=O)[O-])[H])[H] 0.76 kilocalorie / mole 0.22 kilocalorie / mole
4 lig_35 lig_33 edge_35_33 Mol [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C([H])([H])[H])Cl)[H])[H])[H])[H] Mol [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])[H])Cl)[H])[H])[H])[H] 1.93 kilocalorie / mole 0.28 kilocalorie / mole
5 lig_35 lig_37 edge_35_37 Mol [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C([H])([H])[H])Cl)[H])[H])[H])[H] Mol [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C([H])([H])[H])Cl)C([H])([H])[H])[H])[H])[H] -0.14 kilocalorie / mole 0.37 kilocalorie / mole
6 lig_39 lig_32 edge_39_32 Mol [H]c1c(c(c(c(c1[H])[H])c2c(c(c(c(c2[H])[H])OC([H])([H])C([H])([H])C([H])([H])C3=C(N(c4c3c(c(c(c4[H])[H])[H])[H])[H])C(=O)[O-])[H])[H])[H])[H] Mol [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])[H])C([H])([H])[H])[H])[H])[H])[H] 0.44 kilocalorie / mole 0.12 kilocalorie / mole
[14]:
mcl1_edges.keys()
[14]:
dict_keys(['edge_28_35', 'edge_30_27', 'edge_31_35', 'edge_33_27', 'edge_35_33', 'edge_35_37', 'edge_39_32'])

Edge objects can be accessed from the EdgeSet by their name. They are lightweight and only provide access to a pandas.DataFrame and a dict:

[15]:
edge_30_27 = mcl1_edges.get_edge('edge_30_27')
df = edge_30_27.get_dataframe()
edge_30_27.get_dict()
[15]:
{'ligand_a': 'lig_30',
 'ligand_b': 'lig_27',
 'name': 'edge_30_27',
 'Mol1': <rdkit.Chem.rdchem.Mol at 0x7f1a3046e8e0>,
 'Smiles1': '[H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C([H])([H])[H])[H])[H])[H])[H])[H]',
 'Mol2': <rdkit.Chem.rdchem.Mol at 0x7f1a30460700>,
 'Smiles2': '[H]c1c(c(c(c(c1[H])[H])OC([H])([H])C([H])([H])C([H])([H])C2=C(N(c3c2c(c(c(c3[H])[H])[H])[H])[H])C(=O)[O-])[H])[H]',
 'exp. DeltaG [kcal/mol]': 1.73 <Unit('kilocalorie / mole')>,
 'exp. Error [kcal/mol]': 0.22 <Unit('kilocalorie / mole')>}
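
The experimental values of this edge appear to be consistent with the difference of the derived affinities of the two ligands, with the errors combined in quadrature. The sketch below checks this for edge_30_27 using the numbers from the ligand table. It is an inference from the tabulated data, not a description of how plbenchmark computes the values, and relative_affinity is a hypothetical helper.

```python
import math

# Derived affinities (value, error) in kcal/mol, copied from the ligand table above.
derived = {
    "lig_30": (-7.85, 0.19),
    "lig_27": (-6.12, 0.12),
}


def relative_affinity(ligand_a, ligand_b, data):
    """Return (ddG, error) for the perturbation ligand_a -> ligand_b."""
    dg_a, err_a = data[ligand_a]
    dg_b, err_b = data[ligand_b]
    ddg = dg_b - dg_a
    # Assumption: the two errors are independent and are combined in quadrature.
    error = math.sqrt(err_a**2 + err_b**2)
    return ddg, error


ddg, error = relative_affinity("lig_30", "lig_27", derived)
print(f"{ddg:.2f} +/- {error:.2f} kcal/mol")  # prints "1.73 +/- 0.22 kcal/mol"
```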

Data

Data file tree and file description

The data is organized as follows:

data
├── targets.yml                                   # list of all targets and their directories
├── <date>_<target_name_1>                        # directory for target 1
│   ├── 00_data                                   #     metadata for target 1
│   │   ├── edges.yml                             #         edges/perturbations
│   │   ├── ligands.yml                           #         ligands and activities
│   │   └── target.yml                            #         target
│   ├── 01_protein                                #     protein data
│   │   ├── crd                                   #         coordinates
│   │   │   ├── cofactors_crystalwater.pdb        #             cofactors and crystal waters
│   │   │   └── protein.pdb                       #             amino acid residues
│   │   └── top                                   #         topology(s)
│   │       └── amber99sb-star-ildn-mut.ff        #             force field spec.
│   │           ├── cofactors_crystalwater.top    #                 Gromacs TOP file of cofactors and crystal water
│   │           ├── protein.top                   #                 Gromacs TOP file of amino acid residues
│   │           └── *.itp                         #                 Gromacs ITP file(s) to be included in TOP files
│   ├── 02_ligands                                #     ligands
│   │   ├── lig_<name_1>                          #         ligand 1
│   │   │   ├── crd                               #             coordinates
│   │   │   │   └── lig_<name_1>.sdf              #                 SDF file
│   │   │   └── top                               #             topology(s)
│   │   │       └── openff-1.0.0.offxml           #                 force field spec.
│   │   │           ├── fflig_<name_1>.itp        #                     Gromacs ITP file : atom types
│   │   │           ├── lig_<name_1>.itp          #                     Gromacs ITP file
│   │   │           ├── lig_<name_1>.top          #                     Gromacs TOP file
│   │   │           └── posre_lig_<name_1>.itp    #                     Gromacs ITP file : position restraint file
│   │   ├── lig_<name_2>                          #         ligand 2
│   │   …
│   └── 03_hybrid                                 #     edges (perturbations)
│       ├── edge_<name_1>_<name_2>                #         edge between ligand 1 and ligand 2
│       │   └── water                             #             edge in water
│       │       ├── crd                           #                 coordinates
│       │       │   ├── mergedA.pdb               #                     merged conf based on coords of ligand 1
│       │       │   ├── mergedB.pdb               #                     merged conf based on coords of ligand 2
│       │       │   ├── pairs.dat                 #                     atom mapping
│       │       │   └── score.dat                 #                     similarity score
│       │       └── top                           #                 topology(s)
│       │           └── openff-1.0.0.offxml       #                     force field spec.
│       │               ├── ffmerged.itp          #                         Gromacs ITP file
│       │               ├── ffMOL.itp             #                         Gromacs ITP file
│       │               └── merged.itp            #                         Gromacs ITP file
│       …
├── <date>_<target_name_2>                        # directory for target 2
…
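
Given this layout, the location of a ligand's coordinate file follows from the target directory and the ligand name. The following sketch assembles such a path with pathlib. The function ligand_sdf_path is a hypothetical helper, not part of plbenchmark (the library offers Ligand.get_coordinate_file_path() for this purpose), and the directory name "2020-08-21_mcl1_sample" is only an assumed example of the <date>_<target_name> pattern.

```python
from pathlib import PurePosixPath


def ligand_sdf_path(data_root, target_dir, ligand_name):
    """Build the path of a ligand SDF file following the layout sketched above."""
    return (
        PurePosixPath(data_root)
        / target_dir
        / "02_ligands"
        / ligand_name
        / "crd"
        / f"{ligand_name}.sdf"
    )


# "2020-08-21_mcl1_sample" is an assumed directory name of the form <date>_<target_name>.
path = ligand_sdf_path("data", "2020-08-21_mcl1_sample", "lig_23")
print(path)  # prints "data/2020-08-21_mcl1_sample/02_ligands/lig_23/crd/lig_23.sdf"
```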

Contributions

API Documentation

Targets

targets.py Functions and classes for handling the target data.

class plbenchmark.targets.Target(name: str)[source]

Class to store the data of one target.

add_ligand_data()[source]

Adds data from the ligands to the plbenchmark.targets.Target. Molecule images and the minimum and maximum affinity are added.

Returns

None

Processes primary data to have links in the html string of the target data

Returns

None

get_dataframe(columns=None)[source]

Access the target data as a pandas.DataFrame

Parameters

columns – list of columns which should be returned in the pandas.DataFrame

Returns

pandas.DataFrame

get_edge_set()[source]

Get the plbenchmark.edges.EdgeSet associated with the target

Returns

plbenchmark.edges.EdgeSet object

get_edge_set_dataframe(columns=None)[source]

Get the plbenchmark.edges.EdgeSet associated with the target as a pandas.DataFrame

Parameters

columns – list of columns which should be returned in the pandas.DataFrame

Returns

pandas.DataFrame

get_edge_set_html(columns=None)[source]

Get the plbenchmark.edges.EdgeSet associated with the target as an HTML string

Parameters

columns – list of columns which should be returned

Returns

HTML string

get_graph()[source]

Get a graph representation of the ligand perturbations associated with the target in a matplotlib.figure

Returns

matplotlib.figure

get_ligand_set()[source]

Get the plbenchmark.ligands.LigandSet associated with the target

Returns

plbenchmark.ligands.LigandSet object

get_ligand_set_dataframe(columns=None)[source]

Get the plbenchmark.ligands.LigandSet associated with the target as a pandas.DataFrame

Parameters

columns – list of columns which should be returned in the pandas.DataFrame

Returns

pandas.DataFrame

get_ligand_set_html(columns=None)[source]

Get the plbenchmark.ligands.LigandSet associated with the target as an HTML string

Parameters

columns – list of columns which should be returned

Returns

HTML string

get_name()[source]

Access the name of the target.

Returns

name as a string

class plbenchmark.targets.TargetSet(*arg, **kw)[source]

Class inherited from dict to store all available targets in plbenchmark.

get_dataframe(columns=None)[source]

Convert the TargetSet to a pandas.DataFrame

Parameters

columns – list of columns which should be returned in the pandas.DataFrame

Returns

pandas.DataFrame

get_html(columns=None)[source]

Access the plbenchmark.targets.TargetSet as an HTML string

Parameters

columns – list of columns which should be returned

Returns

HTML string

get_names()[source]

Get a list of available target names

Returns

list of strings

get_target(name)[source]

Accesses one target of the TargetSet

Parameters

name – string name of the target

Returns

plbenchmark.targets.Target object

plbenchmark.targets.get_target_data_path(target)[source]

Gets the file path of the target data

Parameters

target – string with target name

Returns

list of directories (have to be joined with ‘/’ to get the file path relative to the plbenchmark repository)

plbenchmark.targets.get_target_dir(target)[source]

Gets the directory name of the target

Parameters

target – string with target name

Returns

string with directory name

plbenchmark.targets.set_data_dir(path='/home/docs/checkouts/readthedocs.org/user_builds/plbenchmarks/checkouts/0.1.2/plbenchmark/sample_data')[source]

Sets the data directory

Parameters

path – string with path to data directory

Ligands

ligands.py Functions and classes for handling the ligand data.

class plbenchmark.ligands.Ligand(d: dict, target: Optional[str] = None)[source]

Store and convert the data of one ligand in a pandas.Series.

add_mol_to_frame()[source]

Adds an image of the ligand to the pandas.DataFrame

Returns

None

derive_observables(derived_type='dg', destination='DerivedMeasurement', out_unit=None)[source]

Derive observables from the (stored) primary data; the result is stored in the pandas.DataFrame

Parameters
  • derived_type – type of the derived observable, can be any of ‘dg’, ‘ki’, ‘ic50’ or ‘pic50’

  • destination – string with the column name of the pandas.DataFrame where the derived observable should be stored

  • out_unit – pint unit of the derived observable

Returns

None
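
Part of deriving an observable is bringing the primary data to a common unit. In the sample data, free energies are given in kcal/mol, cal/mol, and kJ/mol, and the derived values are all in kcal/mol. plbenchmark handles units with pint; the sketch below reproduces only the arithmetic in plain Python. TO_KCAL_PER_MOL and to_kcal_per_mol are hypothetical names introduced for this example.

```python
# Conversion factors to kcal/mol (1 kcal = 4.184 kJ = 1000 cal).
TO_KCAL_PER_MOL = {
    "kilocalorie / mole": 1.0,
    "calorie / mole": 1.0e-3,
    "kilojoule / mole": 1.0 / 4.184,
}


def to_kcal_per_mol(value, unit):
    """Convert a free energy given in `unit` to kcal/mol."""
    return value * TO_KCAL_PER_MOL[unit]


# Primary values of lig_29 and lig_33 from the sample ligand table.
dg_lig_29 = to_kcal_per_mol(-6940.0, "calorie / mole")
dg_lig_33 = to_kcal_per_mol(-28.79, "kilojoule / mole")
```

The results agree with the derived values in the ligand table, about -6.94 kcal/mol for lig_29 and about -6.881 kcal/mol for lig_33.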

Processes primary data to have links in the html string of the ligand data

Returns

None

get_coordinate_file_path()[source]

Get file path relative to the plbenchmark repository of the SDF coordinate file of the docked ligand

Returns

file path as string

get_dataframe(columns=None)[source]

Access the ligand data as a pandas.DataFrame

Parameters

columns – list of columns which should be returned in the pandas.DataFrame

Returns

pandas.DataFrame

get_html(columns=None)[source]

Access the ligand as a HTML string

Parameters

columns – list of columns which should be returned in the pandas.DataFrame

Returns

HTML string

get_image()[source]

Creates a molecule image.

Returns

PIL.Image object

get_molecule()[source]

Get molecule object with coordinates of the docked ligand

Returns

file path as string

get_name()[source]

Access the name of the ligand.

Returns

name: string

class plbenchmark.ligands.LigandSet(target, *arg, **kw)[source]

Class inherited from dict to store all available ligands of one target.

get_dataframe(columns=None)[source]

Access the LigandSet as a pandas.DataFrame

Parameters

columns – list of columns which should be returned in the pandas.DataFrame

Returns

pandas.DataFrame

get_html(columns=None)[source]

Access the plbenchmark.ligands.LigandSet as an HTML string

Parameters

columns – list of columns which should be returned in the pandas.DataFrame

Returns

HTML string

get_ligand(name)[source]

Accesses one ligand of the LigandSet

Parameters

name – string name of the ligand

Returns

plbenchmark.ligands.Ligand object

get_list()[source]

Returns list of ligands

Returns

list of ligand names

get_molecules()[source]

Returns a dict with the ligand names as keys and openforcefield.topology.Molecule objects as values

Returns

dict

Edges

edges.py Functions and classes for handling the perturbation edges.

class plbenchmark.edges.Edge(d: dict)[source]

Store and convert the data of one perturbation (“edge”) in a pandas.Series.

Parameters

d – dict with the edge data

Returns

None

add_ligand_data(ligand_set)[source]

Adds data from ligands to edge. Molecule images and the affinity difference are added.

Parameters

ligand_set – plbenchmark.ligands.LigandSet of the same target

Returns

None

get_dataframe(columns=None)[source]

Access the edge data as a pandas.DataFrame

Parameters

columns – list of columns which should be returned in the pandas.DataFrame

Returns

pandas.DataFrame

get_dict()[source]

Access the edge data as a dict which contains the name of the edge as key and the names of the two ligands as list.

Returns

dict

get_name()[source]

Access the name of the edge.

Returns

name as string

class plbenchmark.edges.EdgeSet(target, *arg, **kw)[source]

Class inherited from dict to store all available edges of one target.

get_dataframe(columns=None)[source]

Access the plbenchmark.edges.EdgeSet as a pandas.DataFrame

Parameters

columns – list of columns which should be returned in the pandas.DataFrame

Returns

pandas.DataFrame

get_dict()[source]

Access the plbenchmark.edges.EdgeSet as a dict which contains the names of the edges as keys and lists with the names of the two ligands as values.

Returns

dict

get_edge(name)[source]

Accesses one edge of the plbenchmark.edges.EdgeSet

Parameters

name – string name of the edge

Returns

plbenchmark.edges.Edge object

get_html(columns=None)[source]

Access the plbenchmark.edges.EdgeSet as an HTML string

Parameters

columns – list of columns which should be returned in the pandas.DataFrame

Returns

HTML string

Utils

utils.py Contains utility functions

plbenchmark.utils.convert_error(error_value, value, original_type, final_type, temperature=300.0, out_unit=None)[source]

Converts the error of an experimental value into the error of another derived quantity with specified unit.

Parameters
  • error_value – float, error of value, numerical value

  • value – float, numerical value

  • original_type – string, code for the original observable. Can be dg, ki, ic50, pic50

  • final_type – string, code for the desired derived quantity. Can be dg, ki, ic50, pic50

  • temperature – float, temperature in kelvin

  • out_unit – unit of type pint, output unit of final_type, needs to fit to the requested final_type

Returns

pint.Quantity with desired unit
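
The errors in the sample ligand table appear to be consistent with first-order error propagation at the default temperature of 300 K. The sketch below shows that arithmetic for two cases, Ki to dG and pIC50 to dG. It is an assumption about the underlying formulas, inferred from the tabulated numbers, and not the implementation of convert_error; the function names and the rounded gas constant R_KCAL are introduced only for this example.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal / (mol K), rounded


def dg_error_from_ki(ki, ki_error, temperature=300.0):
    """Error of dG = R*T*ln(Ki): sigma_dG = R*T * sigma_Ki / Ki.

    ki and ki_error must be given in the same concentration unit.
    """
    return R_KCAL * temperature * ki_error / ki


def dg_error_from_pic50(pic50_error, temperature=300.0):
    """Error of dG = -R*T*ln(10)*pIC50, which scales linearly with the pIC50 error."""
    return R_KCAL * temperature * math.log(10) * pic50_error


# lig_23: Ki = 370 nM with an error of 30 nM; lig_32: pIC50 error of 0.08.
err_lig_23 = dg_error_from_ki(370.0, 30.0)
err_lig_32 = dg_error_from_pic50(0.08)
```

Rounded to two decimals, these give 0.05 and 0.11 kcal/mol, matching the DerivedMeasurement errors listed for lig_23 and lig_32.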

plbenchmark.utils.convert_value(value, original_type, final_type, temperature=300.0, out_unit=None)[source]

Converts an experimental value into another derived quantity with specified unit.

Parameters
  • value – float, numerical value

  • original_type – string, code for the original observable. Can be dg, ki, ic50, pic50

  • final_type – string, code for the desired derived quantity. Can be dg, ki, ic50, pic50

  • temperature – float, temperature in kelvin

  • out_unit – unit of type pint, output unit of final_type, needs to fit to the requested final_type

Returns

pint.Quantity with desired unit
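
As a worked example of such a conversion, the derived values in the sample ligand table are reproduced by dG = RT ln(Ki / 1 M) and dG = -RT ln(10) pIC50 at the default temperature of 300 K. The following sketch is plain Python without pint and is an assumption about the formulas, inferred from the tabulated numbers, not the implementation of convert_value; ki_to_dg, pic50_to_dg, and the rounded constant R_KCAL are hypothetical names.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal / (mol K), rounded


def ki_to_dg(ki_molar, temperature=300.0):
    """Binding free energy in kcal/mol from Ki in mol/L: dG = R*T*ln(Ki / 1 M)."""
    return R_KCAL * temperature * math.log(ki_molar)


def pic50_to_dg(pic50, temperature=300.0):
    """Binding free energy in kcal/mol from pIC50: dG = -R*T*ln(10)*pIC50."""
    return -R_KCAL * temperature * math.log(10) * pic50


dg_lig_23 = ki_to_dg(370e-9)   # Ki = 370 nM
dg_lig_32 = pic50_to_dg(4.8)   # pIC50 = 4.8
```

Rounded to two decimals, the results are -8.83 and -6.59 kcal/mol, the values listed for lig_23 and lig_32.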

plbenchmark.utils.find_doi_url(doi)[source]

Finds the links to a digital object identifier (doi).

Parameters

doi – string

Returns

string, compiled string including the URLs to the publication

plbenchmark.utils.find_pdb_url(pdb)[source]

Finds the links to a pdb or a list of pdb codes.

Parameters

pdb – string or list of strings

Returns

string, compiled string including the URLs to the PDB entries
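
The URL patterns that appear in the calculation and pdblinks entries of the example notebook output can be assembled as in the sketch below. The helpers doi_url and pdb_urls are hypothetical and only build the plain URLs; they do not reproduce the compiled string that find_doi_url and find_pdb_url return.

```python
def doi_url(doi):
    """Return a resolver URL for a DOI string."""
    return f"http://dx.doi.org/{doi}"


def pdb_urls(pdb):
    """Return RCSB structure URLs for one PDB code or a list of codes."""
    codes = [pdb] if isinstance(pdb, str) else list(pdb)
    return [f"http://www.rcsb.org/structure/{code}" for code in codes]


publication = doi_url("10.1021/ja512751q")
structures = pdb_urls("4HW3")
```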
