Building the system#
This section will focus on the construction of an initial molecular structure, which can be done using a number of ways:
Specify the structure using the simplified molecular-input line-entry system (SMILES) line notation, and employ a program which can build the molecule from this
Downloading a structure from the protein data bank (PDB)
Using a graphical molecular editor, such as Avogadro
import py3Dmol as p3d
import veloxchem as vlx
from IPython.display import SVG
* Warning * Environment variable OMP_NUM_THREADS not set.
* Warning * Setting OMP_NUM_THREADS to 4.
Using SMILES#
In this section we will focus on the first option, using functionalities from the RDkit package (which can also read PDB-files). For a full description of how a molecule is written using SMILES can be found on, e.g., the wiki entry — here we will just illustrate some of the basic commands. Remember to make sure to visualize the resulting geometries and ensure they correspond to what you are interested in.
As a first example, we obtain the SMILES string of caffeine from wikipedia and construct a force-field optimized structure expressed xyz-coordinates.
smiles_str = 'CN1C=NC2=C1C(=O)N(C(=O)N2C)C'
caffeine_molecule = vlx.Molecule.read_smiles(smiles_str)
caffeine_xyz = caffeine_molecule.get_xyz_string()
In vlx.Molecule.read_smiles
we:
Build the molecule, excluding implicit atoms (hydrogens)
Add hydrogens
Generate conformer
Optimize using the Universal Force Field (UFF)
Return the ‘molecule’ object
This can then be converted to an xyz-string using get_xyz_string
, and the resulting coordinates printed as:
print(caffeine_xyz)
24
C 3.263017000000 0.665602000000 -0.212720000000
N 2.130944000000 -0.255164000000 -0.199731000000
C 2.169427000000 -1.598733000000 -0.377053000000
N 0.942209000000 -2.179661000000 -0.328611000000
C 0.122902000000 -1.131387000000 -0.108457000000
C 0.826777000000 0.020965000000 -0.033402000000
C 0.176416000000 1.232141000000 0.162798000000
O 0.830662000000 2.309769000000 0.212003000000
N -1.194119000000 1.221596000000 0.291480000000
C -1.895573000000 0.031642000000 0.181110000000
O -3.158218000000 0.040671000000 0.233363000000
N -1.232449000000 -1.164517000000 0.007067000000
C -1.967842000000 -2.428192000000 -0.134534000000
C -1.923424000000 2.489165000000 0.463343000000
H 4.214642000000 0.108953000000 -0.348684000000
H 3.149265000000 1.386854000000 -1.048964000000
H 3.309011000000 1.216933000000 0.749769000000
H 3.081659000000 -2.156591000000 -0.545352000000
H -1.328558000000 -3.298924000000 0.124931000000
H -2.841765000000 -2.451853000000 0.551042000000
H -2.320805000000 -2.537641000000 -1.181510000000
H -2.839017000000 2.343970000000 1.075630000000
H -2.209018000000 2.891819000000 -0.531182000000
H -1.306144000000 3.242585000000 0.997663000000
In order to immediately get the xyz-object, we can also use:
caffeine_xyz = vlx.Molecule.smiles_to_xyz(smiles_str)
We visualize the structure using py3Dmol:
viewer = p3d.view(width=400, height=300)
viewer.addModel(caffeine_xyz, 'xyz')
viewer.setViewStyle({"style": "outline", "width": 0.05})
viewer.setStyle({"stick":{},"sphere": {"scale":0.25}})
viewer.show()
You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol
This structure looks close to the actual equilibrium geometry, and in the next section we will discuss how to (further) optimize the structure using higher-level theory.
For a 2D-representation:
vlx.Molecule.draw_2d_svg(smiles_str, width=300, height=300)
Note that the image will look somewhat different each time, as the initial conformer and UFF optimized structure will have different orientations, which will affect the projection to 2D — this is particularly the case for larger molecules extending in all three directions. Still, for many molecules this will work well, and you can save the .svg-figure with right-click save.
Bond type#
The basic principle of SMILES is to start at a suitable part of the molecule, and build the full molecule by sequentially describing the connections of each atom/functional group in turn. Hydrogens are typically added implicitly, and thus do not need to be specified. As such, the first four alkanes can simply be written with an increasing number of C
:
methane_xyz = vlx.Molecule.smiles_to_xyz('C')
ethane_xyz = vlx.Molecule.smiles_to_xyz('CC')
propane_xyz = vlx.Molecule.smiles_to_xyz('CCC')
butane_xyz = vlx.Molecule.smiles_to_xyz('CCCC')
viewer = p3d.view(viewergrid=(2, 2), width=500, height=300, linked=False)
viewer.addModel(methane_xyz, 'xyz', viewer=(0, 0))
viewer.addModel(ethane_xyz, 'xyz', viewer=(0, 1))
viewer.addModel(propane_xyz, 'xyz', viewer=(1, 0))
viewer.addModel(butane_xyz, 'xyz', viewer=(1, 1))
viewer.setViewStyle({"style": "outline", "width": 0.05})
viewer.setStyle({"stick":{},"sphere": {"scale":0.25}})
viewer.show()
You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol
Bond types are specified with different symbols between two atoms, with -
, #
, and #
, for single-, double-, and triple-bonds, respectively, and other bond types are possible (see, e.g., wikipedia). Single bonds is assumed if no bond type is specified.
For example, ethane, ethene, and ethyne are constructed as:
ethane_xyz = vlx.Molecule.smiles_to_xyz('C-C') # or 'CC'
ethene_xyz = vlx.Molecule.smiles_to_xyz('C=C')
ethyne_xyz = vlx.Molecule.smiles_to_xyz('C#C')
viewer = p3d.view(viewergrid=(1, 3), width=400, height=200, linked=False)
viewer.addModel(ethane_xyz, 'xyz', viewer=(0, 0))
viewer.addModel(ethene_xyz, 'xyz', viewer=(0, 1))
viewer.addModel(ethyne_xyz, 'xyz', viewer=(0, 2))
viewer.setViewStyle({"style": "outline", "width": 0.05})
viewer.setStyle({"stick":{},"sphere": {"scale":0.25}})
viewer.show()
You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol
Branching and substitutions#
Branches are given within paranthesis, as connected to the atom just before the paranthesis. These branches can be nested, for more complex molecules.
Note
Several valid SMILES strings can typically be written for the same molecule.
Hexane and 3-ethylheptane are then written as:
hexane_xyz = vlx.Molecule.smiles_to_xyz('CCCCCCCC')
ethylheptane_xyz = vlx.Molecule.smiles_to_xyz('CCCC(CC)CC')
viewer = p3d.view(viewergrid=(1, 2), width=500, height=200, linked=False)
viewer.addModel(hexane_xyz, 'xyz', viewer=(0, 0))
viewer.addModel(ethylheptane_xyz, 'xyz', viewer=(0, 1))
viewer.setViewStyle({"style": "outline", "width": 0.05})
viewer.setStyle({"stick":{},"sphere": {"scale":0.25}})
viewer.show()
You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol
Substitutions are done by replacing the relevant atom, or adding it to replace an implicit hydrogen. For example, methanol, 1,1-difluoroethene, and 1,2-difluoroethene can be given as:
methanol_xyz = vlx.Molecule.smiles_to_xyz('CO')
difluoro11_xyz = vlx.Molecule.smiles_to_xyz('C=C(F)F')
difluoro12_xyz = vlx.Molecule.smiles_to_xyz('FC=CF')
viewer = p3d.view(viewergrid=(1, 3), width=400, height=200, linked=False)
viewer.addModel(methanol_xyz, 'xyz', viewer=(0, 0))
viewer.addModel(difluoro11_xyz, 'xyz', viewer=(0, 1))
viewer.addModel(difluoro12_xyz, 'xyz', viewer=(0, 2))
viewer.setViewStyle({"style": "outline", "width": 0.05})
viewer.setStyle({"stick":{},"sphere": {"scale":0.25}})
viewer.show()
You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol
UFF optimization#
Following the construction of a conformer from a SMILES string, it is generally advised to perform a short UFF (universal force field) optimization to obtain a structure close to a local minima.
As an example, for ethanol the initial conformer will have the OH-group with any rotation to CH\(_3\), and following a UFF optimization it takes on the staggered conformation:
ethanol_initial_xyz = vlx.Molecule.smiles_to_xyz('CO', optimize=False)
ethanol_xyz = vlx.Molecule.smiles_to_xyz('CO')
viewer = p3d.view(viewergrid=(1, 2), width=400, height=200, linked=False)
viewer.addModel(ethanol_initial_xyz, 'xyz', viewer=(0, 0))
viewer.addModel(ethanol_xyz, 'xyz', viewer=(0, 1))
viewer.setViewStyle({"style": "outline", "width": 0.05})
viewer.setStyle({"stick":{},"sphere": {"scale":0.25}})
viewer.show()
You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol
Ring systems#
Ring systems are constructed by labeling part(s) of the ring and then closing the ring by having the SMILES string go back to the labeled atom(s). This can be performed several time, so that, e.g. cyclohexane and norbornane are written as:
cyclohexane_xyz = vlx.Molecule.smiles_to_xyz('C1CCCCC1')
norborane_xyz = vlx.Molecule.smiles_to_xyz('C1C2CCC(C2)C1')
viewer = p3d.view(viewergrid=(1, 2), width=400, height=200, linked=False)
viewer.addModel(cyclohexane_xyz, 'xyz', viewer=(0, 0))
viewer.addModel(norborane_xyz, 'xyz', viewer=(0, 1))
viewer.setViewStyle({"style": "outline", "width": 0.05})
viewer.setStyle({"stick":{},"sphere": {"scale":0.25}})
viewer.show()
You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol
Aromatic rings can be written by using the aromatic bond symbole (:
) or using lower-case for for the contributing atoms. For example, p-cymene can be written as both:
pcymene1_xyz = vlx.Molecule.smiles_to_xyz('C=C(C)C1:C:C:C(C):C:C1')
pcymene2_xyz = vlx.Molecule.smiles_to_xyz('C=C(C)c1ccc(C)cc1')
viewer = p3d.view(viewergrid=(1, 2), width=500, height=200, linked=False)
viewer.addModel(pcymene1_xyz, 'xyz', viewer=(0, 0))
viewer.addModel(pcymene1_xyz, 'xyz', viewer=(0, 1))
viewer.setViewStyle({"style": "outline", "width": 0.05})
viewer.setStyle({"stick":{},"sphere": {"scale":0.25}})
viewer.show()
You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol
More complex systems#
For more complex molecules, the easiest option is to check if wikipedia, ChemSpider, or any other digital resources provide the SMILES string. This will work for many common molecules, and for the rest you can either consult more detailed guides on using SMILES and go from there, or use a graphical interface.
Note
No matter which option is chosen, we stress that you should always check that the structure you obtain is reasonably close to what is sought. For some systems you may need to sample many different conformers, and for PDB-files you might have to add hydrogens.
As an example, the structures NADH (left) and NAD\(^+\) (right) are illustrated below, using the SMILES string from wikipedia. Nicotinamide adenine dinucleotide partake in metabolism through the redox reaction:
The resulting conformers need to be corrected, but we see that the general structure and the location of the hydrogen at the nicotinamide ring is correct.
nadh_xyz = vlx.Molecule.smiles_to_xyz('O=C(N)C1CC=C[N](C=1)[C@@H]2O[C@@H]([C@@H](O)[C@H]2O)COP([O-])(=O)OP(=O)([O-])OC[C@H]5O[C@@H](n4cnc3c(ncnc34)N)[C@H](O)[C@@H]5O')
nadp_xyz = vlx.Molecule.smiles_to_xyz('O=C(N)c1ccc[n+](c1)[C@@H]2O[C@@H]([C@@H](O)[C@H]2O)COP([O-])(=O)OP(=O)([O-])OC[C@H]5O[C@@H](n4cnc3c(ncnc34)N)[C@H](O)[C@@H]5O')
viewer = p3d.view(viewergrid=(1, 2), width=600, height=200, linked=False)
viewer.addModel(nadh_xyz, 'xyz', viewer=(0, 0))
viewer.addModel(nadp_xyz, 'xyz', viewer=(0, 1))
viewer.setViewStyle({"style": "outline", "width": 0.05})
viewer.setStyle({"stick":{},"sphere": {"scale":0.25}})
viewer.zoomTo()
viewer.show()
You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol