Building systems - eChem book

This section several ways to construct an initial molecular structure. Common choices include:

Specifying the structure using the simplified molecular-input line-entry system (SMILES) line notation. A presentation of this technique using VeloxChem is given below
Downloading a structure from the protein data bank (PDB)
Using a graphical molecular editor, such as Avogadro

import veloxchem as vlx

Using SMILES¶

The interaction with SMILES strings in VeloxChem uses functionalities from the RDkit package. We will illustrate some of basic SMILES string features.

We can obtain the SMILES string of caffeine from Wikipedia and construct a force-field optimized structure.

smiles_str = "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"

caffeine_molecule = vlx.Molecule.read_smiles(smiles_str)

In vlx.Molecule.read_smiles we:

Build the molecule, excluding implicit atoms (hydrogens)
Add hydrogens
Generate conformer
Optimize using the Universal Force Field (UFF)
Return the molecule object

A graphical illustration of the molecule is obtained with the show method.

caffeine_molecule.show()

The molecular coordinates can be retrieved as an xyz-string using the get_xyz_string method.

caffeine_xyz = caffeine_molecule.get_xyz_string()

print(caffeine_xyz)

24

C             -3.190510000000         0.283971000000        -1.150999000000
N             -2.121525000000         0.536391000000        -0.190207000000
C             -2.201614000000         1.272850000000         0.945274000000
N             -1.026628000000         1.337720000000         1.624308000000
C             -0.196947000000         0.600061000000         0.858736000000
C             -0.844418000000         0.121239000000        -0.227773000000
C             -0.169684000000        -0.645212000000        -1.168891000000
O             -0.768103000000        -1.074608000000        -2.193222000000
N              1.163950000000        -0.911871000000        -0.955940000000
C              1.812381000000        -0.385052000000         0.149594000000
O              3.051881000000        -0.579229000000         0.301323000000
N              1.122506000000         0.356565000000         1.084953000000
C              1.805361000000         0.938979000000         2.247809000000
C              1.923151000000        -1.689497000000        -1.949452000000
H             -2.916569000000         0.713859000000        -2.137206000000
H             -4.138328000000         0.751473000000        -0.808922000000
H             -3.351455000000        -0.809612000000        -1.254169000000
H             -3.105773000000         1.765391000000         1.279335000000
H              2.564677000000         0.234504000000         2.649641000000
H              1.090911000000         1.150264000000         3.071886000000
H              2.304749000000         1.884963000000         1.950408000000
H              2.728468000000        -2.281043000000        -1.463829000000
H              2.374181000000        -1.001215000000        -2.694848000000
H              1.272287000000        -2.420126000000        -2.475507000000

In order to immediately get the xyz-object, we can also use:

caffeine_xyz = vlx.Molecule.smiles_to_xyz(smiles_str)

For a 2D-representation:

from IPython.display import SVG

caffeine_molecule.draw_2d(smiles_str, width=300, height=300)

Since the initial conformer and force field optimized structure will have different orientations, the image will look somewhat different each time and this will affect the projection to 2D, particularly so for larger molecules extending in all three dimensions. Still, for many molecules this will work quite well, and you can save the .svg-figure with right-click save.

Bond type¶

The basic principle of SMILES is to start at a suitable part of the molecule, and build the full molecule by sequentially describing the connections of each atom/functional group in turn. Hydrogens are typically added implicitly, and thus do not need to be specified. As such, the first four alkanes can simply be written with an increasing number of “C”.

import py3Dmol as p3d

methane_xyz = vlx.Molecule.smiles_to_xyz("C")
ethane_xyz = vlx.Molecule.smiles_to_xyz("CC")
propane_xyz = vlx.Molecule.smiles_to_xyz("CCC")
butane_xyz = vlx.Molecule.smiles_to_xyz("CCCC")

viewer = p3d.view(viewergrid=(2, 2), width=500, height=300, linked=False)
viewer.addModel(methane_xyz, "xyz", viewer=(0, 0))
viewer.addModel(ethane_xyz, "xyz", viewer=(0, 1))
viewer.addModel(propane_xyz, "xyz", viewer=(1, 0))
viewer.addModel(butane_xyz, "xyz", viewer=(1, 1))
viewer.setViewStyle({"style": "outline", "width": 0.05})
viewer.setStyle({"stick": {}, "sphere": {"scale": 0.25}})
viewer.zoomTo()
viewer.show()

Bond types are specified with different symbols between two atoms, with -, =, and #, for single-, double-, and triple-bonds, respectively, and other bond types are possible (see, e.g., Wikipedia). Single bonds is assumed if no bond type is specified.

For example, ethane, ethene, and ethyne are constructed as:

ethane_xyz = vlx.Molecule.smiles_to_xyz("C-C")  # or 'CC'
ethene_xyz = vlx.Molecule.smiles_to_xyz("C=C")
ethyne_xyz = vlx.Molecule.smiles_to_xyz("C#C")

viewer = p3d.view(viewergrid=(1, 3), width=400, height=200, linked=False)
viewer.addModel(ethane_xyz, "xyz", viewer=(0, 0))
viewer.addModel(ethene_xyz, "xyz", viewer=(0, 1))
viewer.addModel(ethyne_xyz, "xyz", viewer=(0, 2))
viewer.setViewStyle({"style": "outline", "width": 0.05})
viewer.setStyle({"stick": {}, "sphere": {"scale": 0.25}})
viewer.zoomTo()
viewer.show()

Branching and substitutions¶

Branches are given within paranthesis, as connected to the atom just before the paranthesis. These branches can be nested, for more complex molecules.

Hexane and 3-ethylheptane are then written as:

hexane_xyz = vlx.Molecule.smiles_to_xyz("CCCCCC")
ethylheptane_xyz = vlx.Molecule.smiles_to_xyz("CCCCC(CC)CC")

viewer = p3d.view(viewergrid=(1, 2), width=500, height=200, linked=False)
viewer.addModel(hexane_xyz, "xyz", viewer=(0, 0))
viewer.addModel(ethylheptane_xyz, "xyz", viewer=(0, 1))
viewer.setViewStyle({"style": "outline", "width": 0.05})
viewer.setStyle({"stick": {}, "sphere": {"scale": 0.25}})
viewer.zoomTo()
viewer.show()

Substitutions are done by replacing the relevant atom, or adding it to replace an implicit hydrogen. For example, methanol, 1,1-difluoroethene, and 1,2-difluoroethene can be given as:

methanol_xyz = vlx.Molecule.smiles_to_xyz("CO")
difluoro11_xyz = vlx.Molecule.smiles_to_xyz("C=C(F)F")
difluoro12_xyz = vlx.Molecule.smiles_to_xyz("FC=CF")

viewer = p3d.view(viewergrid=(1, 3), width=400, height=200, linked=False)
viewer.addModel(methanol_xyz, "xyz", viewer=(0, 0))
viewer.addModel(difluoro11_xyz, "xyz", viewer=(0, 1))
viewer.addModel(difluoro12_xyz, "xyz", viewer=(0, 2))
viewer.setViewStyle({"style": "outline", "width": 0.05})
viewer.setStyle({"stick": {}, "sphere": {"scale": 0.25}})
viewer.zoomTo()
viewer.show()

Optimization using force field¶

Following the construction of a conformer from a SMILES string, it is generally advised to perform a short optimization to obtain a structure close to a local minimum.

As an example, for ethanol the initial conformer will have the OH-group with any rotation to CH $_3$ , and following a force field optimization it takes on the staggered conformation:

ethanol_initial_xyz = vlx.Molecule.smiles_to_xyz("CO", optimize=False)
ethanol_xyz = vlx.Molecule.smiles_to_xyz("CO")

viewer = p3d.view(viewergrid=(1, 2), width=400, height=200, linked=False)
viewer.addModel(ethanol_initial_xyz, "xyz", viewer=(0, 0))
viewer.addModel(ethanol_xyz, "xyz", viewer=(0, 1))
viewer.setViewStyle({"style": "outline", "width": 0.05})
viewer.setStyle({"stick": {}, "sphere": {"scale": 0.25}})
viewer.zoomTo()
viewer.show()

Ring systems¶

Ring systems are constructed by labeling part(s) of the ring and then closing the ring by having the SMILES string go back to the labeled atom(s). This can be performed several time, so that, e.g. cyclohexane and norbornane are written as:

cyclohexane_xyz = vlx.Molecule.smiles_to_xyz("C1CCCCC1")
norborane_xyz = vlx.Molecule.smiles_to_xyz("C1C2CCC(C2)C1")

viewer = p3d.view(viewergrid=(1, 2), width=400, height=200, linked=False)
viewer.addModel(cyclohexane_xyz, "xyz", viewer=(0, 0))
viewer.addModel(norborane_xyz, "xyz", viewer=(0, 1))
viewer.setViewStyle({"style": "outline", "width": 0.05})
viewer.setStyle({"stick": {}, "sphere": {"scale": 0.25}})
viewer.zoomTo()
viewer.show()

Aromatic rings can be written by using alternating single and double bonds or using lower-case for the contributing atoms. For example, p-cymene can be written as both:

pcymene1_xyz = vlx.Molecule.smiles_to_xyz("C=C(C)C1=CC=C(C)C=C1")
pcymene2_xyz = vlx.Molecule.smiles_to_xyz("C=C(C)c1ccc(C)cc1")

viewer = p3d.view(viewergrid=(1, 2), width=500, height=200, linked=False)
viewer.addModel(pcymene1_xyz, "xyz", viewer=(0, 0))
viewer.addModel(pcymene2_xyz, "xyz", viewer=(0, 1))
viewer.setViewStyle({"style": "outline", "width": 0.05})
viewer.setStyle({"stick": {}, "sphere": {"scale": 0.25}})
viewer.zoomTo()
viewer.show()

More complex systems¶

For more complex molecules, the easiest option is to check if Wikipedia, ChemSpider, or any other digital resources provide the SMILES string. This will work for many common molecules, and for the rest you can either consult more detailed guides on using SMILES and go from there, or use a graphical interface.

As an example, the structures NADH (left) and NAD $^+$ (right) are illustrated below, using the SMILES string from Wikipedia. Nicotinamide adenine dinucleotide partake in metabolism through the redox reaction:

\mathrm{NADH} \Longleftrightarrow \mathrm{NAD}^+ + \mathrm{H}^+ + 2\mathrm{e}^-

(1)

The resulting conformers need to be corrected, but we see that the general structure and the location of the hydrogen at the nicotinamide ring is correct.

nadh_xyz = vlx.Molecule.smiles_to_xyz(
    "O=C(N)C1CC=C[N](C=1)[C@@H]2O[C@@H]([C@@H](O)[C@H]2O)COP([O-])(=O)OP(=O)([O-])OC[C@H]5O[C@@H](n4cnc3c(ncnc34)N)[C@H](O)[C@@H]5O"
)
nadp_xyz = vlx.Molecule.smiles_to_xyz(
    "O=C(N)c1ccc[n+](c1)[C@@H]2O[C@@H]([C@@H](O)[C@H]2O)COP([O-])(=O)OP(=O)([O-])OC[C@H]5O[C@@H](n4cnc3c(ncnc34)N)[C@H](O)[C@@H]5O"
)

viewer = p3d.view(viewergrid=(1, 2), width=600, height=200, linked=False)
viewer.addModel(nadh_xyz, "xyz", viewer=(0, 0))
viewer.addModel(nadp_xyz, "xyz", viewer=(0, 1))
viewer.setViewStyle({"style": "outline", "width": 0.05})
viewer.setStyle({"stick": {}, "sphere": {"scale": 0.25}})
viewer.zoomTo()
viewer.show()

Using a structure downloaded from the PDB databank¶

from urllib.request import urlopen

# Specify the PDB ID you want to download
ID = '1AOI'
url = 'https://files.rcsb.org/download/' + ID + '.pdb'

with urlopen(url) as fh:
    pdb = fh.read().decode('utf-8')

view = p3d.view()
view.addModel(pdb,'pdb')
view.setBackgroundColor('white')
view.setStyle({"stick": {}, "sphere": {"scale": 0.25}})
view.zoomTo()
view.show()