Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Building systems

This section several ways to construct an initial molecular structure. Common choices include:

  • Specifying the structure using the simplified molecular-input line-entry system (SMILES) line notation. A presentation of this technique using VeloxChem is given below

  • Downloading a structure from the protein data bank (PDB)

  • Using a graphical molecular editor, such as Avogadro

import veloxchem as vlx

Using SMILES

The interaction with SMILES strings in VeloxChem uses functionalities from the RDkit package. We will illustrate some of basic SMILES string features.

We can obtain the SMILES string of caffeine from Wikipedia and construct a force-field optimized structure.

smiles_str = "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"

caffeine_molecule = vlx.Molecule.read_smiles(smiles_str)

In vlx.Molecule.read_smiles we:

  1. Build the molecule, excluding implicit atoms (hydrogens)

  2. Add hydrogens

  3. Generate conformer

  4. Optimize using the Universal Force Field (UFF)

  5. Return the molecule object

A graphical illustration of the molecule is obtained with the show method.

caffeine_molecule.show()
Loading...

The molecular coordinates can be retrieved as an xyz-string using the get_xyz_string method.

caffeine_xyz = caffeine_molecule.get_xyz_string()

print(caffeine_xyz)
24

C             -3.190510000000         0.283971000000        -1.150999000000
N             -2.121525000000         0.536391000000        -0.190207000000
C             -2.201614000000         1.272850000000         0.945274000000
N             -1.026628000000         1.337720000000         1.624308000000
C             -0.196947000000         0.600061000000         0.858736000000
C             -0.844418000000         0.121239000000        -0.227773000000
C             -0.169684000000        -0.645212000000        -1.168891000000
O             -0.768103000000        -1.074608000000        -2.193222000000
N              1.163950000000        -0.911871000000        -0.955940000000
C              1.812381000000        -0.385052000000         0.149594000000
O              3.051881000000        -0.579229000000         0.301323000000
N              1.122506000000         0.356565000000         1.084953000000
C              1.805361000000         0.938979000000         2.247809000000
C              1.923151000000        -1.689497000000        -1.949452000000
H             -2.916569000000         0.713859000000        -2.137206000000
H             -4.138328000000         0.751473000000        -0.808922000000
H             -3.351455000000        -0.809612000000        -1.254169000000
H             -3.105773000000         1.765391000000         1.279335000000
H              2.564677000000         0.234504000000         2.649641000000
H              1.090911000000         1.150264000000         3.071886000000
H              2.304749000000         1.884963000000         1.950408000000
H              2.728468000000        -2.281043000000        -1.463829000000
H              2.374181000000        -1.001215000000        -2.694848000000
H              1.272287000000        -2.420126000000        -2.475507000000

In order to immediately get the xyz-object, we can also use:

caffeine_xyz = vlx.Molecule.smiles_to_xyz(smiles_str)

For a 2D-representation:

from IPython.display import SVG

caffeine_molecule.draw_2d(smiles_str, width=300, height=300)
Loading...

Since the initial conformer and force field optimized structure will have different orientations, the image will look somewhat different each time and this will affect the projection to 2D, particularly so for larger molecules extending in all three dimensions. Still, for many molecules this will work quite well, and you can save the .svg-figure with right-click save.

Bond type

The basic principle of SMILES is to start at a suitable part of the molecule, and build the full molecule by sequentially describing the connections of each atom/functional group in turn. Hydrogens are typically added implicitly, and thus do not need to be specified. As such, the first four alkanes can simply be written with an increasing number of “C”.

import py3Dmol as p3d

methane_xyz = vlx.Molecule.smiles_to_xyz("C")
ethane_xyz = vlx.Molecule.smiles_to_xyz("CC")
propane_xyz = vlx.Molecule.smiles_to_xyz("CCC")
butane_xyz = vlx.Molecule.smiles_to_xyz("CCCC")

viewer = p3d.view(viewergrid=(2, 2), width=500, height=300, linked=False)
viewer.addModel(methane_xyz, "xyz", viewer=(0, 0))
viewer.addModel(ethane_xyz, "xyz", viewer=(0, 1))
viewer.addModel(propane_xyz, "xyz", viewer=(1, 0))
viewer.addModel(butane_xyz, "xyz", viewer=(1, 1))
viewer.setViewStyle({"style": "outline", "width": 0.05})
viewer.setStyle({"stick": {}, "sphere": {"scale": 0.25}})
viewer.zoomTo()
viewer.show()
Loading...

Bond types are specified with different symbols between two atoms, with -, =, and #, for single-, double-, and triple-bonds, respectively, and other bond types are possible (see, e.g., Wikipedia). Single bonds is assumed if no bond type is specified.

For example, ethane, ethene, and ethyne are constructed as:

ethane_xyz = vlx.Molecule.smiles_to_xyz("C-C")  # or 'CC'
ethene_xyz = vlx.Molecule.smiles_to_xyz("C=C")
ethyne_xyz = vlx.Molecule.smiles_to_xyz("C#C")

viewer = p3d.view(viewergrid=(1, 3), width=400, height=200, linked=False)
viewer.addModel(ethane_xyz, "xyz", viewer=(0, 0))
viewer.addModel(ethene_xyz, "xyz", viewer=(0, 1))
viewer.addModel(ethyne_xyz, "xyz", viewer=(0, 2))
viewer.setViewStyle({"style": "outline", "width": 0.05})
viewer.setStyle({"stick": {}, "sphere": {"scale": 0.25}})
viewer.zoomTo()
viewer.show()
Loading...

Branching and substitutions

Branches are given within paranthesis, as connected to the atom just before the paranthesis. These branches can be nested, for more complex molecules.

Hexane and 3-ethylheptane are then written as:

hexane_xyz = vlx.Molecule.smiles_to_xyz("CCCCCC")
ethylheptane_xyz = vlx.Molecule.smiles_to_xyz("CCCCC(CC)CC")

viewer = p3d.view(viewergrid=(1, 2), width=500, height=200, linked=False)
viewer.addModel(hexane_xyz, "xyz", viewer=(0, 0))
viewer.addModel(ethylheptane_xyz, "xyz", viewer=(0, 1))
viewer.setViewStyle({"style": "outline", "width": 0.05})
viewer.setStyle({"stick": {}, "sphere": {"scale": 0.25}})
viewer.zoomTo()
viewer.show()
Loading...

Substitutions are done by replacing the relevant atom, or adding it to replace an implicit hydrogen. For example, methanol, 1,1-difluoroethene, and 1,2-difluoroethene can be given as:

methanol_xyz = vlx.Molecule.smiles_to_xyz("CO")
difluoro11_xyz = vlx.Molecule.smiles_to_xyz("C=C(F)F")
difluoro12_xyz = vlx.Molecule.smiles_to_xyz("FC=CF")

viewer = p3d.view(viewergrid=(1, 3), width=400, height=200, linked=False)
viewer.addModel(methanol_xyz, "xyz", viewer=(0, 0))
viewer.addModel(difluoro11_xyz, "xyz", viewer=(0, 1))
viewer.addModel(difluoro12_xyz, "xyz", viewer=(0, 2))
viewer.setViewStyle({"style": "outline", "width": 0.05})
viewer.setStyle({"stick": {}, "sphere": {"scale": 0.25}})
viewer.zoomTo()
viewer.show()
Loading...

Optimization using force field

Following the construction of a conformer from a SMILES string, it is generally advised to perform a short optimization to obtain a structure close to a local minimum.

As an example, for ethanol the initial conformer will have the OH-group with any rotation to CH3_3, and following a force field optimization it takes on the staggered conformation:

ethanol_initial_xyz = vlx.Molecule.smiles_to_xyz("CO", optimize=False)
ethanol_xyz = vlx.Molecule.smiles_to_xyz("CO")

viewer = p3d.view(viewergrid=(1, 2), width=400, height=200, linked=False)
viewer.addModel(ethanol_initial_xyz, "xyz", viewer=(0, 0))
viewer.addModel(ethanol_xyz, "xyz", viewer=(0, 1))
viewer.setViewStyle({"style": "outline", "width": 0.05})
viewer.setStyle({"stick": {}, "sphere": {"scale": 0.25}})
viewer.zoomTo()
viewer.show()
Loading...

Ring systems

Ring systems are constructed by labeling part(s) of the ring and then closing the ring by having the SMILES string go back to the labeled atom(s). This can be performed several time, so that, e.g. cyclohexane and norbornane are written as:

cyclohexane_xyz = vlx.Molecule.smiles_to_xyz("C1CCCCC1")
norborane_xyz = vlx.Molecule.smiles_to_xyz("C1C2CCC(C2)C1")

viewer = p3d.view(viewergrid=(1, 2), width=400, height=200, linked=False)
viewer.addModel(cyclohexane_xyz, "xyz", viewer=(0, 0))
viewer.addModel(norborane_xyz, "xyz", viewer=(0, 1))
viewer.setViewStyle({"style": "outline", "width": 0.05})
viewer.setStyle({"stick": {}, "sphere": {"scale": 0.25}})
viewer.zoomTo()
viewer.show()
Loading...

Aromatic rings can be written by using alternating single and double bonds or using lower-case for the contributing atoms. For example, p-cymene can be written as both:

pcymene1_xyz = vlx.Molecule.smiles_to_xyz("C=C(C)C1=CC=C(C)C=C1")
pcymene2_xyz = vlx.Molecule.smiles_to_xyz("C=C(C)c1ccc(C)cc1")

viewer = p3d.view(viewergrid=(1, 2), width=500, height=200, linked=False)
viewer.addModel(pcymene1_xyz, "xyz", viewer=(0, 0))
viewer.addModel(pcymene2_xyz, "xyz", viewer=(0, 1))
viewer.setViewStyle({"style": "outline", "width": 0.05})
viewer.setStyle({"stick": {}, "sphere": {"scale": 0.25}})
viewer.zoomTo()
viewer.show()
Loading...

More complex systems

For more complex molecules, the easiest option is to check if Wikipedia, ChemSpider, or any other digital resources provide the SMILES string. This will work for many common molecules, and for the rest you can either consult more detailed guides on using SMILES and go from there, or use a graphical interface.

As an example, the structures NADH (left) and NAD+^+ (right) are illustrated below, using the SMILES string from Wikipedia. Nicotinamide adenine dinucleotide partake in metabolism through the redox reaction:

NADHNAD++H++2e\mathrm{NADH} \Longleftrightarrow \mathrm{NAD}^+ + \mathrm{H}^+ + 2\mathrm{e}^-

The resulting conformers need to be corrected, but we see that the general structure and the location of the hydrogen at the nicotinamide ring is correct.

nadh_xyz = vlx.Molecule.smiles_to_xyz(
    "O=C(N)C1CC=C[N](C=1)[C@@H]2O[C@@H]([C@@H](O)[C@H]2O)COP([O-])(=O)OP(=O)([O-])OC[C@H]5O[C@@H](n4cnc3c(ncnc34)N)[C@H](O)[C@@H]5O"
)
nadp_xyz = vlx.Molecule.smiles_to_xyz(
    "O=C(N)c1ccc[n+](c1)[C@@H]2O[C@@H]([C@@H](O)[C@H]2O)COP([O-])(=O)OP(=O)([O-])OC[C@H]5O[C@@H](n4cnc3c(ncnc34)N)[C@H](O)[C@@H]5O"
)

viewer = p3d.view(viewergrid=(1, 2), width=600, height=200, linked=False)
viewer.addModel(nadh_xyz, "xyz", viewer=(0, 0))
viewer.addModel(nadp_xyz, "xyz", viewer=(0, 1))
viewer.setViewStyle({"style": "outline", "width": 0.05})
viewer.setStyle({"stick": {}, "sphere": {"scale": 0.25}})
viewer.zoomTo()
viewer.show()
Loading...

Using a structure downloaded from the PDB databank

from urllib.request import urlopen

# Specify the PDB ID you want to download
ID = '1AOI'
url = 'https://files.rcsb.org/download/' + ID + '.pdb'

with urlopen(url) as fh:
    pdb = fh.read().decode('utf-8')

view = p3d.view()
view.addModel(pdb,'pdb')
view.setBackgroundColor('white')
view.setStyle({"stick": {}, "sphere": {"scale": 0.25}})
view.zoomTo()
view.show()
Loading...