Protein-Ligand Benchmarks for Free Energy Calculations
Installing the Protein Ligand Benchmark Set
The Protein Ligand Benchmark Set is currently only installable from source.
Installation from Source
The repository uses git-lfs (Git Large File Storage) to store all data files, so git-lfs should be installed before the repository is cloned.
conda create -n plbenchmark python=3.7 git-lfs
conda activate plbenchmark
git lfs clone https://github.com/openforcefield/protein-ligand-benchmark.git
cd protein-ligand-benchmark
conda env update --file environment.yml
pip install -e .
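If the repository was cloned without git-lfs, the data files are small text pointer files instead of the actual data, and loading them will fail. The sketch below lists the files under a directory that are still unfetched Git LFS pointers. `find_lfs_pointers` and `is_lfs_pointer` are hypothetical helpers and are not part of plbenchmark. The sketch relies only on the fact that LFS pointer files begin with the line `version https://git-lfs.github.com/spec/v1`.

```python
from pathlib import Path

# Every Git LFS pointer file starts with this line (pointer spec v1).
LFS_POINTER_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file at `path` is an unfetched Git LFS pointer."""
    with open(path, "rb") as handle:
        first_line = handle.readline().strip()
    return first_line == LFS_POINTER_HEADER


def find_lfs_pointers(root):
    """Return the sorted paths of all files below `root` that are LFS pointers."""
    return sorted(
        str(path)
        for path in Path(root).rglob("*")
        if path.is_file() and is_lfs_pointer(path)
    )
```

Calling `find_lfs_pointers("data")` from the repository root should return an empty list once all data has been fetched. Otherwise, running `git lfs pull` inside the repository replaces the pointers with the real files.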
Example Notebook: protein-ligand-benchmark
Related Publication
The preprint “Best practices for constructing, preparing, and evaluating protein-ligand binding affinity benchmarks” provides background on this benchmark dataset and describes how to use it for alchemical free energy calculations. Suggestions for improvements are welcome as issues in its GitHub repository.
[1]:
from plbenchmark import targets
from IPython.display import HTML
Warning: Unable to load toolkit 'OpenEye Toolkit'. The Open Force Field Toolkit does not require the OpenEye Toolkits, and can use RDKit/AmberTools instead. However, if you have a valid license for the OpenEye Toolkits, consider installing them for faster performance and additional file format support: https://docs.eyesopen.com/toolkits/python/quickstart-python/linuxosx.html OpenEye offers free Toolkit licenses for academics: https://www.eyesopen.com/academic-licensing
Get the whole set of targets in the dataset
[2]:
# it is initialized from the `plbenchmark/sample_data/targets.yml` file
target_set = targets.TargetSet()
# to see which targets are available, one can get a list of names
target_set.get_names()
[2]:
['mcl1_sample']
The TargetSet is a dict, but it can be converted to a pandas.DataFrame or an HTML string via TargetSet.get_dataframe(columns=None) or TargetSet.get_html(columns=None), respectively. The default columns=None includes all columns; a subset of columns can be selected by passing a list:
[3]:
HTML(target_set.get_html(columns=['name', 'fullname', 'pdb', 'references', 'numLigands', 'minDG', 'maxDG', 'associated_sets']))
[3]:
| | name | fullname | pdb | references | numLigands | minDG | maxDG | associated_sets |
---|---|---|---|---|---|---|---|---
0 | mcl1_sample | Induced myeloid leukemia cell differentiation protein Mcl-1 | 4HW3 | {'calculation': ['10.1021/ja512751q', '10.1021/acs.jcim.9b00105'], 'measurement': None} | 15 | -9.0 kilocalorie / mole | -6.1 kilocalorie / mole | [Schrodinger JACS] |
A target can be accessed by its name in two ways:
[4]:
mcl1 = target_set['mcl1_sample']
mcl1_2 = target_set.get_target('mcl1_sample')
The Target class contains all available information about one target of plbenchmark. It also has two member variables, _ligand_set and _edge_set, which contain the information about the available ligands and edges of the target. A Target can either be accessed from the TargetSet (see the previous cell) or initialized directly from its name:
[5]:
mcl1 = targets.Target('mcl1_sample')
# The target data is stored in a pandas.Series and can be accessed via get_dataframe()
mcl1.get_dataframe(columns=None)
[5]:
associated_sets [Schrodinger JACS]
comments hydrophobic interactions contributing to binding
date 2020-08-21
fullname Induced myeloid leukemia cell differentiation ...
id 99
ligands [lig_23, lig_26, lig_27, lig_28, lig_29, lig_3...
name mcl1_sample
netcharge xx
pdb 4HW3
references {'calculation': ['10.1021/ja512751q', '10.1021...
numLigands 15
maxDG -6.1 kilocalorie / mole
minDG -9.0 kilocalorie / mole
std(DG) 0.9 kilocalorie / mole
calculation REP1http://dx.doi.org/10.1021/ja512751qREP2Wan...
pdblinks REP1http://www.rcsb.org/structure/4HW3REP24HW3...
dtype: object
The LigandSet and EdgeSet of a target are available in different formats:
[6]:
mcl1_ligands = mcl1.get_ligand_set()
mcl1_ligands_df = mcl1.get_ligand_set_dataframe()
HTML(mcl1.get_ligand_set_html(columns=['name', 'ROMol', 'measurement', 'DerivedMeasurement']))
[6]:
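| | name | measurement: comment | measurement: error | measurement: type | measurement: unit | measurement: value | measurement: Reference | DerivedMeasurement: type | DerivedMeasurement: value | DerivedMeasurement: error | DerivedMeasurement: unit |
---|---|---|---|---|---|---|---|---|---|---|---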
0 | lig_23 | Table 2, entry 23 | 30 nanomolar | ki | nanomolar | 370 nanomolar | Friberg et al., J. Med. Chem. 2013 | dg | -8.83 kilocalorie / mole | 0.05 kilocalorie / mole | None | |
1 | lig_26 | Table 2, entry 26 | 0.44 micromolar | ki | micromolar | 1.0 micromolar | Friberg et al., J. Med. Chem. 2013 | dg | -8.24 kilocalorie / mole | 0.26 kilocalorie / mole | None | |
2 | lig_27 | Table 3, entry 27 | 0.0071 millimolar | ki | millimolar | 0.035 millimolar | Friberg et al., J. Med. Chem. 2013 | dg | -6.12 kilocalorie / mole | 0.12 kilocalorie / mole | None | |
3 | lig_28 | Table 3, entry 28, manually converted | 0.03 kilocalorie / mole | dg | kilocalorie / mole | -6.62 kilocalorie / mole | Friberg et al., J. Med. Chem. 2013 | dg | -6.62 kilocalorie / mole | 0.03 kilocalorie / mole | None | |
4 | lig_29 | Table 3, entry 29, manually converted | 120.0 calorie / mole | dg | calories / mole | -6940.0 calorie / mole | Friberg et al., J. Med. Chem. 2013 | dg | -6.94 kilocalorie / mole | 0.12 kilocalorie / mole | None | |
5 | lig_30 | Table 3, entry 30, manually converted | 0.6 micromolar | ic50 | micromolar | 1.9 micromolar | Friberg et al., J. Med. Chem. 2013 | dg | -7.85 kilocalorie / mole | 0.19 kilocalorie / mole | None | |
6 | lig_31 | Table 3, entry 31, manually converted | 80 nanomolar | ic50 | nanomolar | 1700 nanomolar | Friberg et al., J. Med. Chem. 2013 | dg | -7.92 kilocalorie / mole | 0.03 kilocalorie / mole | None | |
7 | lig_32 | Table 3, entry 32, manually converted | 0.08 dimensionless | pic50 | dimensionless | 4.8 dimensionless | Friberg et al., J. Med. Chem. 2013 | dg | -6.59 kilocalorie / mole | 0.11 kilocalorie / mole | None | |
8 | lig_33 | Table 3, entry 33, manually converted | 0.75 kilojoule / mole | dg | kilojoules / mole | -28.79 kilojoule / mole | Friberg et al., J. Med. Chem. 2013 | dg | -6.880975143403441 kilocalorie / mole | 0.17925430210325047 kilocalorie / mole | None | |
9 | lig_34 | Table 3, entry 34 | 3.2 micromolar | ki | micromolar | 9.9 micromolar | Friberg et al., J. Med. Chem. 2013 | dg | -6.87 kilocalorie / mole | 0.19 kilocalorie / mole | None | |
10 | lig_35 | Table 3, entry 35 | 0.14 micromolar | ki | micromolar | 0.38 micromolar | Friberg et al., J. Med. Chem. 2013 | dg | -8.81 kilocalorie / mole | 0.22 kilocalorie / mole | None | |
11 | lig_36 | Table 3, entry 36 | 0.1 micromolar | ki | micromolar | 1.1 micromolar | Friberg et al., J. Med. Chem. 2013 | dg | -8.18 kilocalorie / mole | 0.05 kilocalorie / mole | None | |
12 | lig_37 | Table 3, entry 37 | 0.15 micromolar | ki | micromolar | 0.3 micromolar | Friberg et al., J. Med. Chem. 2013 | dg | -8.95 kilocalorie / mole | 0.3 kilocalorie / mole | None | |
13 | lig_38 | Table 3, entry 38 | 2.1 micromolar | ki | micromolar | 7.7 micromolar | Friberg et al., J. Med. Chem. 2013 | dg | -7.02 kilocalorie / mole | 0.16 kilocalorie / mole | None | |
14 | lig_39 | Table 3, entry 39 | 0.7 micromolar | ki | micromolar | 7.6 micromolar | Friberg et al., J. Med. Chem. 2013 | dg | -7.03 kilocalorie / mole | 0.05 kilocalorie / mole | None |
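Rows such as lig_29 and lig_33 report ΔG directly, but in calorie / mole or kilojoule / mole. Their DerivedMeasurement values are consistent with a plain rescaling to kcal/mol using 1 kcal = 4.184 kJ. For example, -28.79 kJ/mol / 4.184 = -6.880975... kcal/mol, which is why that row shows so many digits. The sketch below performs that rescaling with plain floats instead of the pint quantities plbenchmark works with. `dg_to_kcal_per_mol` and the unit table are hypothetical names.

```python
# Factors that convert a free energy in the given unit to kcal/mol.
# Keys mirror the unit strings printed in the value column above;
# 1 kcal = 4.184 kJ (thermochemical calorie).
TO_KCAL_PER_MOL = {
    "kilocalorie / mole": 1.0,
    "calorie / mole": 1.0e-3,
    "kilojoule / mole": 1.0 / 4.184,
}


def dg_to_kcal_per_mol(value, error, unit):
    """Rescale a free energy and its error to kcal/mol."""
    factor = TO_KCAL_PER_MOL[unit]
    return value * factor, error * factor


# lig_29 and lig_33 from the table above
print(dg_to_kcal_per_mol(-6940.0, 120.0, "calorie / mole"))
print(dg_to_kcal_per_mol(-28.79, 0.75, "kilojoule / mole"))
```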
[7]:
mcl1_edges = mcl1.get_edge_set()
mcl1_edges_df = mcl1.get_edge_set_dataframe()
HTML(mcl1.get_edge_set_html())
[7]:
| | ligand_a | ligand_b | name | Smiles1 | Smiles2 | exp. DeltaG [kcal/mol] | exp. Error [kcal/mol] |
---|---|---|---|---|---|---|---
0 | lig_28 | lig_35 | edge_28_35 | [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3C([H])([H])[H])[H])[H])[H])[H])[H])[H] | [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C([H])([H])[H])Cl)[H])[H])[H])[H] | -2.19 kilocalorie / mole | 0.22 kilocalorie / mole | ||
1 | lig_30 | lig_27 | edge_30_27 | [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C([H])([H])[H])[H])[H])[H])[H])[H] | [H]c1c(c(c(c(c1[H])[H])OC([H])([H])C([H])([H])C([H])([H])C2=C(N(c3c2c(c(c(c3[H])[H])[H])[H])[H])C(=O)[O-])[H])[H] | 1.73 kilocalorie / mole | 0.22 kilocalorie / mole | ||
2 | lig_31 | lig_35 | edge_31_35 | [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C(F)(F)F)[H])[H])[H])[H])[H] | [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C([H])([H])[H])Cl)[H])[H])[H])[H] | -0.89 kilocalorie / mole | 0.22 kilocalorie / mole | ||
3 | lig_33 | lig_27 | edge_33_27 | [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])[H])Cl)[H])[H])[H])[H] | [H]c1c(c(c(c(c1[H])[H])OC([H])([H])C([H])([H])C([H])([H])C2=C(N(c3c2c(c(c(c3[H])[H])[H])[H])[H])C(=O)[O-])[H])[H] | 0.76 kilocalorie / mole | 0.22 kilocalorie / mole | ||
4 | lig_35 | lig_33 | edge_35_33 | [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C([H])([H])[H])Cl)[H])[H])[H])[H] | [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])[H])Cl)[H])[H])[H])[H] | 1.93 kilocalorie / mole | 0.28 kilocalorie / mole | ||
5 | lig_35 | lig_37 | edge_35_37 | [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C([H])([H])[H])Cl)[H])[H])[H])[H] | [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C([H])([H])[H])Cl)C([H])([H])[H])[H])[H])[H] | -0.14 kilocalorie / mole | 0.37 kilocalorie / mole | ||
6 | lig_39 | lig_32 | edge_39_32 | [H]c1c(c(c(c(c1[H])[H])c2c(c(c(c(c2[H])[H])OC([H])([H])C([H])([H])C([H])([H])C3=C(N(c4c3c(c(c(c4[H])[H])[H])[H])[H])C(=O)[O-])[H])[H])[H])[H] | [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])[H])C([H])([H])[H])[H])[H])[H])[H] | 0.44 kilocalorie / mole | 0.12 kilocalorie / mole |
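The edge values above are consistent with ΔΔG = ΔG(ligand_b) - ΔG(ligand_a), computed from the derived ΔG values of the two ligands. The two ligand errors are combined in quadrature. For edge_28_35, for example: -8.81 - (-6.62) = -2.19 kcal/mol and √(0.22² + 0.03²) ≈ 0.22 kcal/mol. The sketch below reproduces the table with plain floats under that assumption. It is an illustration, not the code plbenchmark uses internally (Edge.add_ligand_data adds the affinity difference itself). `edge_ddg`, `LIGAND_DG` and `MCL1_EDGES` are hypothetical names.

```python
import math

# Derived dG values (kcal/mol) and errors of the ligands that appear in the
# edge table, copied from the DerivedMeasurement columns of the ligand table.
LIGAND_DG = {
    "lig_27": (-6.12, 0.12),
    "lig_28": (-6.62, 0.03),
    "lig_30": (-7.85, 0.19),
    "lig_31": (-7.92, 0.03),
    "lig_32": (-6.59, 0.11),
    "lig_33": (-6.880975143403441, 0.17925430210325047),
    "lig_35": (-8.81, 0.22),
    "lig_37": (-8.95, 0.3),
    "lig_39": (-7.03, 0.05),
}

# Edge name -> (ligand_a, ligand_b), as listed in the edge table.
MCL1_EDGES = {
    "edge_28_35": ("lig_28", "lig_35"),
    "edge_30_27": ("lig_30", "lig_27"),
    "edge_31_35": ("lig_31", "lig_35"),
    "edge_33_27": ("lig_33", "lig_27"),
    "edge_35_33": ("lig_35", "lig_33"),
    "edge_35_37": ("lig_35", "lig_37"),
    "edge_39_32": ("lig_39", "lig_32"),
}


def edge_ddg(ligand_a, ligand_b, ligand_dg):
    """Relative binding free energy of the perturbation ligand_a -> ligand_b.

    ddG = dG(ligand_b) - dG(ligand_a); the two (assumed independent)
    errors are combined in quadrature.
    """
    dg_a, err_a = ligand_dg[ligand_a]
    dg_b, err_b = ligand_dg[ligand_b]
    return dg_b - dg_a, math.hypot(err_a, err_b)


for name, (lig_a, lig_b) in MCL1_EDGES.items():
    ddg, err = edge_ddg(lig_a, lig_b, LIGAND_DG)
    print(f"{name}: {ddg:.2f} +/- {err:.2f} kcal/mol")
```

Rounded to two decimals, the printed values match the exp. DeltaG and exp. Error columns above.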
Finally, the network of ligands and edges can be visualized as a graph:
[8]:
graph = mcl1.get_graph()
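Before running relative free energy calculations on such a network, check whether it is connected. Relative free energies can only be related to each other within a connected component. The standard-library sketch below performs that check. It takes the {edge name: [ligand_a, ligand_b]} layout that EdgeSet.get_dict() is documented to return. The helper `connected_components` is a hypothetical name and not part of plbenchmark.

```python
from collections import defaultdict, deque

# Edge name -> [ligand_a, ligand_b], copied from the mcl1_sample edge table.
MCL1_EDGES = {
    "edge_28_35": ["lig_28", "lig_35"],
    "edge_30_27": ["lig_30", "lig_27"],
    "edge_31_35": ["lig_31", "lig_35"],
    "edge_33_27": ["lig_33", "lig_27"],
    "edge_35_33": ["lig_35", "lig_33"],
    "edge_35_37": ["lig_35", "lig_37"],
    "edge_39_32": ["lig_39", "lig_32"],
}


def connected_components(edge_dict):
    """Group ligands into connected components of the (undirected) edge graph."""
    neighbours = defaultdict(set)
    for lig_a, lig_b in edge_dict.values():
        neighbours[lig_a].add(lig_b)
        neighbours[lig_b].add(lig_a)

    seen, components = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Breadth-first search from an unvisited ligand.
        component, queue = [], deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.append(node)
            for other in neighbours[node]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        components.append(sorted(component))
    return components


for component in connected_components(MCL1_EDGES):
    print(component)
```

For the mcl1_sample edges this yields two components: [lig_27, lig_28, lig_30, lig_31, lig_33, lig_35, lig_37] and [lig_32, lig_39]. The remaining six ligands (lig_23, lig_26, lig_29, lig_34, lig_36, lig_38) do not appear in any edge.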

The LigandSet and Ligand classes
The LigandSet is a dict of the Ligand objects available for one target. It is accessible via Target.get_ligand_set(), but can also be initialized directly.
[9]:
from plbenchmark import ligands
[10]:
mcl1_ligands = ligands.LigandSet('mcl1_sample')
HTML(mcl1_ligands.get_html())
[10]:
| | name | smiles | measurement: comment | measurement: error | measurement: type | measurement: unit | measurement: value | DerivedMeasurement: type | DerivedMeasurement: value | DerivedMeasurement: error | DerivedMeasurement: unit | measurement: Reference |
---|---|---|---|---|---|---|---|---|---|---|---|---
0 | lig_23 | [H]c1c(c(c2c(c1[H])c(c(c(c2OC([H])([H])C([H])([H])C([H])([H])C3=C(Sc4c3c(c(c(c4[H])[H])[H])[H])C(=O)[O-])[H])[H])[H])[H])[H] | Table 2, entry 23 | 30 nanomolar | ki | nanomolar | 370 nanomolar | dg | -8.83 kilocalorie / mole | 0.05 kilocalorie / mole | None | Friberg et al., J. Med. Chem. 2013 | |
1 | lig_26 | [H]c1c(c(c2c(c1[H])c(c(c(c2OC([H])([H])C([H])([H])C([H])([H])C3=C(Oc4c3c(c(c(c4[H])[H])[H])[H])C(=O)[O-])[H])[H])[H])[H])[H] | Table 2, entry 26 | 0.44 micromolar | ki | micromolar | 1.0 micromolar | dg | -8.24 kilocalorie / mole | 0.26 kilocalorie / mole | None | Friberg et al., J. Med. Chem. 2013 | |
2 | lig_27 | [H]c1c(c(c(c(c1[H])[H])OC([H])([H])C([H])([H])C([H])([H])C2=C(N(c3c2c(c(c(c3[H])[H])[H])[H])[H])C(=O)[O-])[H])[H] | Table 3, entry 27 | 0.0071 millimolar | ki | millimolar | 0.035 millimolar | dg | -6.12 kilocalorie / mole | 0.12 kilocalorie / mole | None | Friberg et al., J. Med. Chem. 2013 | |
3 | lig_28 | [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3C([H])([H])[H])[H])[H])[H])[H])[H])[H] | Table 3, entry 28, manually converted | 0.03 kilocalorie / mole | dg | kilocalorie / mole | -6.62 kilocalorie / mole | dg | -6.62 kilocalorie / mole | 0.03 kilocalorie / mole | None | Friberg et al., J. Med. Chem. 2013 | |
4 | lig_29 | [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3C(F)(F)F)[H])[H])[H])[H])[H])[H] | Table 3, entry 29, manually converted | 120.0 calorie / mole | dg | calories / mole | -6940.0 calorie / mole | dg | -6.94 kilocalorie / mole | 0.12 kilocalorie / mole | None | Friberg et al., J. Med. Chem. 2013 | |
5 | lig_30 | [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C([H])([H])[H])[H])[H])[H])[H])[H] | Table 3, entry 30, manually converted | 0.6 micromolar | ic50 | micromolar | 1.9 micromolar | dg | -7.85 kilocalorie / mole | 0.19 kilocalorie / mole | None | Friberg et al., J. Med. Chem. 2013 | |
6 | lig_31 | [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C(F)(F)F)[H])[H])[H])[H])[H] | Table 3, entry 31, manually converted | 80 nanomolar | ic50 | nanomolar | 1700 nanomolar | dg | -7.92 kilocalorie / mole | 0.03 kilocalorie / mole | None | Friberg et al., J. Med. Chem. 2013 | |
7 | lig_32 | [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])[H])C([H])([H])[H])[H])[H])[H])[H] | Table 3, entry 32, manually converted | 0.08 dimensionless | pic50 | dimensionless | 4.8 dimensionless | dg | -6.59 kilocalorie / mole | 0.11 kilocalorie / mole | None | Friberg et al., J. Med. Chem. 2013 | |
8 | lig_33 | [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])[H])Cl)[H])[H])[H])[H] | Table 3, entry 33, manually converted | 0.75 kilojoule / mole | dg | kilojoules / mole | -28.79 kilojoule / mole | dg | -6.880975143403441 kilocalorie / mole | 0.17925430210325047 kilocalorie / mole | None | Friberg et al., J. Med. Chem. 2013 | |
9 | lig_34 | [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])[H])C(F)(F)F)[H])[H])[H])[H] | Table 3, entry 34 | 3.2 micromolar | ki | micromolar | 9.9 micromolar | dg | -6.87 kilocalorie / mole | 0.19 kilocalorie / mole | None | Friberg et al., J. Med. Chem. 2013 | |
10 | lig_35 | [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C([H])([H])[H])Cl)[H])[H])[H])[H] | Table 3, entry 35 | 0.14 micromolar | ki | micromolar | 0.38 micromolar | dg | -8.81 kilocalorie / mole | 0.22 kilocalorie / mole | None | Friberg et al., J. Med. Chem. 2013 | |
11 | lig_36 | [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C([H])([H])C([H])([H])[H])Cl)[H])[H])[H])[H] | Table 3, entry 36 | 0.1 micromolar | ki | micromolar | 1.1 micromolar | dg | -8.18 kilocalorie / mole | 0.05 kilocalorie / mole | None | Friberg et al., J. Med. Chem. 2013 | |
12 | lig_37 | [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C([H])([H])[H])Cl)C([H])([H])[H])[H])[H])[H] | Table 3, entry 37 | 0.15 micromolar | ki | micromolar | 0.3 micromolar | dg | -8.95 kilocalorie / mole | 0.3 kilocalorie / mole | None | Friberg et al., J. Med. Chem. 2013 | |
13 | lig_38 | [H]c1c(c(c(c(c1[H])[H])c2c(c(c(c(c2[H])OC([H])([H])C([H])([H])C([H])([H])C3=C(N(c4c3c(c(c(c4[H])[H])[H])[H])[H])C(=O)[O-])[H])[H])[H])[H])[H] | Table 3, entry 38 | 2.1 micromolar | ki | micromolar | 7.7 micromolar | dg | -7.02 kilocalorie / mole | 0.16 kilocalorie / mole | None | Friberg et al., J. Med. Chem. 2013 | |
14 | lig_39 | [H]c1c(c(c(c(c1[H])[H])c2c(c(c(c(c2[H])[H])OC([H])([H])C([H])([H])C([H])([H])C3=C(N(c4c3c(c(c(c4[H])[H])[H])[H])[H])C(=O)[O-])[H])[H])[H])[H] | Table 3, entry 39 | 0.7 micromolar | ki | micromolar | 7.6 micromolar | dg | -7.03 kilocalorie / mole | 0.05 kilocalorie / mole | None | Friberg et al., J. Med. Chem. 2013 |
Individual Ligand objects can be accessed from the LigandSet by name. Each Ligand holds the experimental data, the references, the SMILES string and the path to the SDF file of the docked structure. It also provides functions to derive and process the primary data; the results are added to its pandas.Series as new entries.
[11]:
lig_30 = mcl1_ligands['lig_30']
lig_27 = mcl1_ligands.get_ligand('lig_27')
The EdgeSet and Edge classes
The EdgeSet is a dict of the Edge objects available for one target. It is accessible via Target.get_edge_set(), but can also be initialized directly.
[12]:
from plbenchmark import edges
[13]:
mcl1_edges = edges.EdgeSet('mcl1_sample')
HTML(mcl1_edges.get_html())
[13]:
| | ligand_a | ligand_b | name | Smiles1 | Smiles2 | exp. DeltaG [kcal/mol] | exp. Error [kcal/mol] |
---|---|---|---|---|---|---|---
0 | lig_28 | lig_35 | edge_28_35 | [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3C([H])([H])[H])[H])[H])[H])[H])[H])[H] | [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C([H])([H])[H])Cl)[H])[H])[H])[H] | -2.19 kilocalorie / mole | 0.22 kilocalorie / mole | ||
1 | lig_30 | lig_27 | edge_30_27 | [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C([H])([H])[H])[H])[H])[H])[H])[H] | [H]c1c(c(c(c(c1[H])[H])OC([H])([H])C([H])([H])C([H])([H])C2=C(N(c3c2c(c(c(c3[H])[H])[H])[H])[H])C(=O)[O-])[H])[H] | 1.73 kilocalorie / mole | 0.22 kilocalorie / mole | ||
2 | lig_31 | lig_35 | edge_31_35 | [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C(F)(F)F)[H])[H])[H])[H])[H] | [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C([H])([H])[H])Cl)[H])[H])[H])[H] | -0.89 kilocalorie / mole | 0.22 kilocalorie / mole | ||
3 | lig_33 | lig_27 | edge_33_27 | [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])[H])Cl)[H])[H])[H])[H] | [H]c1c(c(c(c(c1[H])[H])OC([H])([H])C([H])([H])C([H])([H])C2=C(N(c3c2c(c(c(c3[H])[H])[H])[H])[H])C(=O)[O-])[H])[H] | 0.76 kilocalorie / mole | 0.22 kilocalorie / mole | ||
4 | lig_35 | lig_33 | edge_35_33 | [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C([H])([H])[H])Cl)[H])[H])[H])[H] | [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])[H])Cl)[H])[H])[H])[H] | 1.93 kilocalorie / mole | 0.28 kilocalorie / mole | ||
5 | lig_35 | lig_37 | edge_35_37 | [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C([H])([H])[H])Cl)[H])[H])[H])[H] | [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C([H])([H])[H])Cl)C([H])([H])[H])[H])[H])[H] | -0.14 kilocalorie / mole | 0.37 kilocalorie / mole | ||
6 | lig_39 | lig_32 | edge_39_32 | [H]c1c(c(c(c(c1[H])[H])c2c(c(c(c(c2[H])[H])OC([H])([H])C([H])([H])C([H])([H])C3=C(N(c4c3c(c(c(c4[H])[H])[H])[H])[H])C(=O)[O-])[H])[H])[H])[H] | [H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])[H])C([H])([H])[H])[H])[H])[H])[H] | 0.44 kilocalorie / mole | 0.12 kilocalorie / mole |
[14]:
mcl1_edges.keys()
[14]:
dict_keys(['edge_28_35', 'edge_30_27', 'edge_31_35', 'edge_33_27', 'edge_35_33', 'edge_35_37', 'edge_39_32'])
Individual Edge objects can be accessed from the EdgeSet by name. They are lightweight and only provide access to their data as a pandas.DataFrame and as a dict:
[15]:
edge_30_27 = mcl1_edges.get_edge('edge_30_27')
df = edge_30_27.get_dataframe()
edge_30_27.get_dict()
[15]:
{'ligand_a': 'lig_30',
'ligand_b': 'lig_27',
'name': 'edge_30_27',
'Mol1': <rdkit.Chem.rdchem.Mol at 0x7f1a3046e8e0>,
'Smiles1': '[H]c1c(c(c2c(c1[H])C(=C(N2[H])C(=O)[O-])C([H])([H])C([H])([H])C([H])([H])Oc3c(c(c(c(c3[H])C([H])([H])[H])[H])[H])[H])[H])[H]',
'Mol2': <rdkit.Chem.rdchem.Mol at 0x7f1a30460700>,
'Smiles2': '[H]c1c(c(c(c(c1[H])[H])OC([H])([H])C([H])([H])C([H])([H])C2=C(N(c3c2c(c(c(c3[H])[H])[H])[H])[H])C(=O)[O-])[H])[H]',
'exp. DeltaG [kcal/mol]': 1.73 <Unit('kilocalorie / mole')>,
'exp. Error [kcal/mol]': 0.22 <Unit('kilocalorie / mole')>}
Data
Data file tree and file description
The data is organized as follows:
data
├── targets.yml                                 # list of all targets and their directories
├── <date>_<target_name_1>                      # directory for target 1
│   ├── 00_data                                 # metadata for target 1
│   │   ├── edges.yml                           # edges/perturbations
│   │   ├── ligands.yml                         # ligands and activities
│   │   └── target.yml                          # target
│   ├── 01_protein                              # protein data
│   │   ├── crd                                 # coordinates
│   │   │   ├── cofactors_crystalwater.pdb      # cofactors and crystal waters
│   │   │   └── protein.pdb                     # amino acid residues
│   │   └── top                                 # topology(s)
│   │       └── amber99sb-star-ildn-mut.ff      # force field spec.
│   │           ├── cofactors_crystalwater.top  # Gromacs TOP file of cofactors and crystal water
│   │           ├── protein.top                 # Gromacs TOP file of amino acid residues
│   │           └── *.itp                       # Gromacs ITP file(s) to be included in TOP files
│   ├── 02_ligands                              # ligands
│   │   ├── lig_<name_1>                        # ligand 1
│   │   │   ├── crd                             # coordinates
│   │   │   │   └── lig_<name_1>.sdf            # SDF file
│   │   │   └── top                             # topology(s)
│   │   │       └── openff-1.0.0.offxml         # force field spec.
│   │   │           ├── fflig_<name_1>.itp      # Gromacs ITP file: atom types
│   │   │           ├── lig_<name_1>.itp        # Gromacs ITP file
│   │   │           ├── lig_<name_1>.top        # Gromacs TOP file
│   │   │           └── posre_lig_<name_1>.itp  # Gromacs ITP file: position restraint file
│   │   ├── lig_<name_2>                        # ligand 2
│   │   …
│   └── 03_hybrid                               # edges (perturbations)
│       ├── edge_<name_1>_<name_2>              # edge between ligand 1 and ligand 2
│       │   └── water                           # edge in water
│       │       ├── crd                         # coordinates
│       │       │   ├── mergedA.pdb             # merged conf based on coords of ligand 1
│       │       │   ├── mergedB.pdb             # merged conf based on coords of ligand 2
│       │       │   ├── pairs.dat               # atom mapping
│       │       │   └── score.dat               # similarity score
│       │       └── top                         # topology(s)
│       │           └── openff-1.0.0.offxml     # force field spec.
│       │               ├── ffmerged.itp        # Gromacs ITP file
│       │               ├── ffMOL.itp           # Gromacs ITP file
│       │               └── merged.itp          # Gromacs ITP file
│       …
├── <date>_<target_name_2>                      # directory for target 2
…
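Given this layout, the paths of individual files can be assembled from two parts. The first is the target directory name, the <date>_<target_name> entry listed in targets.yml. The second is the ligand or edge name. The helpers below are hypothetical and only encode the tree shown above. Within the package, plbenchmark.targets.get_target_data_path() and Ligand.get_coordinate_file_path() provide this information.

```python
from pathlib import PurePosixPath

# Paths are relative to the repository root, as in the tree above.
DATA_ROOT = PurePosixPath("data")


def protein_pdb_path(target_dir):
    """Coordinates of the amino acid residues of a target."""
    return DATA_ROOT / target_dir / "01_protein" / "crd" / "protein.pdb"


def ligand_sdf_path(target_dir, ligand):
    """SDF file of a docked ligand, e.g. ligand='lig_23'."""
    return DATA_ROOT / target_dir / "02_ligands" / ligand / "crd" / f"{ligand}.sdf"


def edge_merged_pdb_paths(target_dir, edge):
    """Merged conformations of an edge in water (A: coords of ligand 1, B: of ligand 2)."""
    crd = DATA_ROOT / target_dir / "03_hybrid" / edge / "water" / "crd"
    return crd / "mergedA.pdb", crd / "mergedB.pdb"
```

To check that a file actually exists in a checkout, convert the result to a concrete pathlib.Path (joined with the repository root) and call .exists().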
Contributions
Authors: David Hahn
Data contributors: The authors of the following publications, especially Vytautas Gapsys and Christina E. M. Schindler.
Discussions and suggestions: Christopher I. Bayly, Marko Breznik, Hannah E. Bruce Macdonald, John D. Chodera, Katharina Meier, Antonia S. J. S. Mey, David L. Mobley, Laura Perez Benito, Gary Tresadern, Gregory L. Warren and all members of the Open Force Field Initiative
Code review and discussions: Matt Thompson, Jeffrey Wagner
API Documentation
Targets
targets.py Functions and classes for handling the target data.
- class plbenchmark.targets.Target(name: str)
Class to store the data of one target.
- add_ligand_data()
Adds data from the ligands to the plbenchmark.targets.Target. Molecule images and the minimum and maximum affinities are added.
- Returns
None
- find_links()
Processes primary data to have links in the HTML string of the target data
- Returns
None
- get_dataframe(columns=None)
Access the target data as a pandas.DataFrame
- Parameters
columns – list of columns which should be returned in the pandas.DataFrame
- Returns
pandas.DataFrame
- get_edge_set()
Get the plbenchmark.edges.EdgeSet associated with the target
- Returns
plbenchmark.edges.EdgeSet object
- get_edge_set_dataframe(columns=None)
Get the plbenchmark.edges.EdgeSet associated with the target as a pandas.DataFrame
- Parameters
columns – list of columns which should be returned in the pandas.DataFrame
- Returns
pandas.DataFrame
- get_edge_set_html(columns=None)
Get the plbenchmark.edges.EdgeSet associated with the target as an HTML string
- Parameters
columns – list of columns which should be returned
- Returns
HTML string
- get_graph()
Get a graph representation of the ligand perturbations associated with the target as a matplotlib.figure
- Returns
matplotlib.figure
- get_ligand_set()
Get the plbenchmark.ligands.LigandSet associated with the target
- Returns
plbenchmark.ligands.LigandSet object
- get_ligand_set_dataframe(columns=None)
Get the plbenchmark.ligands.LigandSet associated with the target as a pandas.DataFrame
- Parameters
columns – list of columns which should be returned in the pandas.DataFrame
- Returns
pandas.DataFrame
- class plbenchmark.targets.TargetSet(*arg, **kw)
Class inherited from dict to store all available targets in plbenchmark.
- get_dataframe(columns=None)
Convert the TargetSet to a pandas.DataFrame
- Parameters
columns – list of columns which should be returned in the pandas.DataFrame
- Returns
pandas.DataFrame
- plbenchmark.targets.get_target_data_path(target)
Gets the file path of the target data
- Parameters
target – string with target name
- Returns
list of directories (have to be joined with ‘/’ to get the file path relative to the plbenchmark repository)
Ligands
ligands.py Functions and classes for handling the ligand data.
- class plbenchmark.ligands.Ligand(d: dict, target: Optional[str] = None)
Store and convert the data of one ligand in a pandas.Series.
- derive_observables(derived_type='dg', destination='DerivedMeasurement', out_unit=None)
Derive observables from the (stored) primary data and store them in the pandas.DataFrame
- Parameters
derived_type – type of derived observable, can be any of ‘dg’, ‘ki’, ‘ic50’ or ‘pic50’
destination – string with the column name in the pandas.DataFrame where the derived observable should be stored
out_unit – pint unit of the derived observable
- Returns
None
- find_links()
Processes primary data to have links in the HTML string of the ligand data
- Returns
None
- get_coordinate_file_path()
Get the file path, relative to the plbenchmark repository, of the SDF coordinate file of the docked ligand
- Returns
file path as string
- get_dataframe(columns=None)
Access the ligand data as a pandas.DataFrame
- Parameters
columns – list of columns which should be returned in the pandas.DataFrame
- Returns
pandas.DataFrame
- get_html(columns=None)
Access the ligand data as an HTML string
- Parameters
columns – list of columns which should be included in the HTML string
- Returns
HTML string
- class plbenchmark.ligands.LigandSet(target, *arg, **kw)
Class inherited from dict to store all available ligands of one target.
- get_dataframe(columns=None)
Access the plbenchmark.ligands.LigandSet as a pandas.DataFrame
- Parameters
columns – list of columns which should be returned in the pandas.DataFrame
- Returns
pandas.DataFrame
- get_html(columns=None)
Access the plbenchmark.ligands.LigandSet as an HTML string
- Parameters
columns – list of columns which should be included in the HTML string
- Returns
HTML string
Edges
edges.py Functions and classes for handling the perturbation edges.
- class plbenchmark.edges.Edge(d: dict)
Store and convert the data of one perturbation (“edge”) in a pandas.Series.
- Parameters
d – dict with the edge data
- Returns
None
- add_ligand_data(ligand_set)
Adds data from the ligands to the Edge. Molecule images and the affinity difference are added.
- Parameters
ligand_set – plbenchmark.ligands.LigandSet object of the same target
- Returns
None
- get_dataframe(columns=None)
Access the edge data as a pandas.DataFrame
- Parameters
columns – list of columns which should be returned in the pandas.DataFrame
- Returns
pandas.DataFrame
- class plbenchmark.edges.EdgeSet(target, *arg, **kw)
Class inherited from dict to store all available edges of one target.
- get_dataframe(columns=None)
Access the plbenchmark.edges.EdgeSet as a pandas.DataFrame
- Parameters
columns – list of columns which should be returned in the pandas.DataFrame
- Returns
pandas.DataFrame
- get_dict()
Access the plbenchmark.edges.EdgeSet as a dict with the edge names as keys and, as values, a list with the names of the two ligands.
- Returns
dict
Utils
utils.py Contains utility functions
- plbenchmark.utils.convert_error(error_value, value, original_type, final_type, temperature=300.0, out_unit=None)
Converts the error of an experimental value into the error of another derived quantity with the specified unit.
- Parameters
error_value – float, numerical error of value
value – float, numerical value
original_type – string, code for the original observable. Can be dg, ki, ic50, pic50
final_type – string, code for the desired derived quantity. Can be dg, ki, ic50, pic50
temperature – float, temperature in kelvin
out_unit – pint unit, output unit of final_type; must be compatible with the requested final_type
- Returns
pint.Quantity with the desired unit
- plbenchmark.utils.convert_value(value, original_type, final_type, temperature=300.0, out_unit=None)
Converts an experimental value into another derived quantity with the specified unit.
- Parameters
value – float, numerical value
original_type – string, code for the original observable. Can be dg, ki, ic50, pic50
final_type – string, code for the desired derived quantity. Can be dg, ki, ic50, pic50
temperature – float, temperature in kelvin
out_unit – pint unit, output unit of final_type; must be compatible with the requested final_type
- Returns
pint.Quantity with the desired unit
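The derived values in the ligand tables are consistent with the following relations at the default temperature of 300 K (RT ≈ 0.5962 kcal/mol):

- ki and ic50: ΔG = RT ln K, with K in mol/L, so an IC50 is treated like a Ki.
- pic50: ΔG = -RT ln(10) · pIC50.
- Errors: first-order propagation, σ_ΔG = RT σ_K / K for ki and ic50, and σ_ΔG = RT ln(10) σ_pIC50 for pic50.

For example, lig_23 with Ki = 370 nM gives 0.5962 kcal/mol × ln(3.7 × 10⁻⁷) ≈ -8.83 kcal/mol. The sketch below implements only these conversions to ΔG, with plain floats. It illustrates, under these assumptions, what convert_value and convert_error do; the real functions work with pint quantities and also support other final types. `dg_from_observable` is a hypothetical name.

```python
import math

# Molar gas constant in kcal/(mol K).
R_KCAL_PER_MOL_K = 1.987204e-3


def dg_from_observable(value, error, original_type, temperature=300.0):
    """Convert a Ki/IC50 (in mol/L) or a pIC50 and its error to dG in kcal/mol.

    Assumed relations:
      ki, ic50: dG = RT ln(value),        sigma_dG = RT * error / value
      pic50:    dG = -RT ln(10) * value,  sigma_dG = RT ln(10) * error
    """
    rt = R_KCAL_PER_MOL_K * temperature
    if original_type in ("ki", "ic50"):
        return rt * math.log(value), rt * error / value
    if original_type == "pic50":
        return -rt * math.log(10) * value, rt * math.log(10) * error
    raise ValueError(f"unsupported original_type: {original_type!r}")


# lig_23 (Ki = 370 nM +/- 30 nM) and lig_32 (pIC50 = 4.8 +/- 0.08)
print(dg_from_observable(370e-9, 30e-9, "ki"))
print(dg_from_observable(4.8, 0.08, "pic50"))
```

Rounded to two decimals, these give -8.83 ± 0.05 and -6.59 ± 0.11 kcal/mol, matching the DerivedMeasurement columns for lig_23 and lig_32.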