Building systems#
This section several ways to construct an initial molecular structure. Common choices include:
Specifying the structure using the simplified molecular-input line-entry system (SMILES) line notation. A presentation of this technique using VeloxChem is given below
Downloading a structure from the protein data bank (PDB)
Using a graphical molecular editor, such as Avogadro
import veloxchem as vlx
Using SMILES#
The interaction with SMILES strings in VeloxChem uses functionalities from the Open Babel package. We will illustrate some of basic SMILES string features.
Note
Remember to make sure to visualize the resulting geometries and ensure they correspond to what you are interested in.
We can obtain the SMILES string of caffeine from Wikipedia and construct a force-field optimized structure.
smiles_str = "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"
caffeine_molecule = vlx.Molecule.read_smiles(smiles_str)
In vlx.Molecule.read_smiles
we:
Build the molecule, excluding implicit atoms (hydrogens)
Add hydrogens
Generate conformer
Optimize using the MMFF94 force field
Return the
molecule
object
A graphical illustration of the molecule is obtained with the show
method.
caffeine_molecule.show()
3Dmol.js failed to load for some reason. Please check your browser console for error messages.
The molecular coordinates can be retrieved as an xyz-string using the get_xyz_string
method.
caffeine_xyz = caffeine_molecule.get_xyz_string()
print(caffeine_xyz)
24
C -3.246030000000 -1.130310000000 0.025980000000
N -2.262080000000 -0.077920000000 0.007760000000
C -2.516750000000 1.268080000000 -0.001470000000
N -1.402970000000 1.973590000000 -0.005660000000
C -0.410650000000 1.040380000000 -0.000280000000
C -0.905090000000 -0.228440000000 0.004840000000
C -0.080570000000 -1.385890000000 0.010530000000
O -0.536520000000 -2.525370000000 0.015670000000
N 1.283180000000 -1.078550000000 0.010220000000
C 1.833810000000 0.220410000000 0.006860000000
O 3.056760000000 0.390670000000 0.007910000000
N 0.939960000000 1.291810000000 0.003250000000
C 1.425170000000 2.659750000000 0.004880000000
C 2.230720000000 -2.175770000000 0.018140000000
H -4.247690000000 -0.693450000000 0.002350000000
H -3.117910000000 -1.709050000000 0.944170000000
H -3.099090000000 -1.761210000000 -0.854070000000
H -3.519330000000 1.677350000000 -0.002970000000
H 2.517080000000 2.705220000000 0.028990000000
H 1.035370000000 3.176560000000 0.887670000000
H 1.073890000000 3.164310000000 -0.900700000000
H 1.742510000000 -3.153400000000 0.018220000000
H 2.861160000000 -2.094140000000 0.909710000000
H 2.871610000000 -2.099260000000 -0.866370000000
In order to immediately get the xyz-object, we can also use:
caffeine_xyz = vlx.Molecule.smiles_to_xyz(smiles_str)
For a 2D-representation:
from IPython.display import SVG
caffeine_molecule.draw_2d(width=300, height=300)
Since the initial conformer and force field optimized structure will have different orientations, the image will look somewhat different each time and this will affect the projection to 2D, particularly so for larger molecules extending in all three dimensions. Still, for many molecules this will work quite well, and you can save the .svg-figure with right-click save.
Bond type#
The basic principle of SMILES is to start at a suitable part of the molecule, and build the full molecule by sequentially describing the connections of each atom/functional group in turn. Hydrogens are typically added implicitly, and thus do not need to be specified. As such, the first four alkanes can simply be written with an increasing number of “C”.
import py3Dmol as p3d
methane_xyz = vlx.Molecule.smiles_to_xyz("C")
ethane_xyz = vlx.Molecule.smiles_to_xyz("CC")
propane_xyz = vlx.Molecule.smiles_to_xyz("CCC")
butane_xyz = vlx.Molecule.smiles_to_xyz("CCCC")
viewer = p3d.view(viewergrid=(2, 2), width=500, height=300, linked=False)
viewer.addModel(methane_xyz, "xyz", viewer=(0, 0))
viewer.addModel(ethane_xyz, "xyz", viewer=(0, 1))
viewer.addModel(propane_xyz, "xyz", viewer=(1, 0))
viewer.addModel(butane_xyz, "xyz", viewer=(1, 1))
viewer.setViewStyle({"style": "outline", "width": 0.05})
viewer.setStyle({"stick": {}, "sphere": {"scale": 0.25}})
viewer.zoomTo()
viewer.show()
3Dmol.js failed to load for some reason. Please check your browser console for error messages.
Bond types are specified with different symbols between two atoms, with -
, =
, and #
, for single-, double-, and triple-bonds, respectively, and other bond types are possible (see, e.g., Wikipedia). Single bonds is assumed if no bond type is specified.
For example, ethane, ethene, and ethyne are constructed as:
ethane_xyz = vlx.Molecule.smiles_to_xyz("C-C") # or 'CC'
ethene_xyz = vlx.Molecule.smiles_to_xyz("C=C")
ethyne_xyz = vlx.Molecule.smiles_to_xyz("C#C")
viewer = p3d.view(viewergrid=(1, 3), width=400, height=200, linked=False)
viewer.addModel(ethane_xyz, "xyz", viewer=(0, 0))
viewer.addModel(ethene_xyz, "xyz", viewer=(0, 1))
viewer.addModel(ethyne_xyz, "xyz", viewer=(0, 2))
viewer.setViewStyle({"style": "outline", "width": 0.05})
viewer.setStyle({"stick": {}, "sphere": {"scale": 0.25}})
viewer.zoomTo()
viewer.show()
3Dmol.js failed to load for some reason. Please check your browser console for error messages.
Branching and substitutions#
Branches are given within paranthesis, as connected to the atom just before the paranthesis. These branches can be nested, for more complex molecules.
Note
Several valid SMILES strings can typically be written for the same molecule.
Hexane and 3-ethylheptane are then written as:
hexane_xyz = vlx.Molecule.smiles_to_xyz("CCCCCC")
ethylheptane_xyz = vlx.Molecule.smiles_to_xyz("CCCCC(CC)CC")
viewer = p3d.view(viewergrid=(1, 2), width=500, height=200, linked=False)
viewer.addModel(hexane_xyz, "xyz", viewer=(0, 0))
viewer.addModel(ethylheptane_xyz, "xyz", viewer=(0, 1))
viewer.setViewStyle({"style": "outline", "width": 0.05})
viewer.setStyle({"stick": {}, "sphere": {"scale": 0.25}})
viewer.zoomTo()
viewer.show()
3Dmol.js failed to load for some reason. Please check your browser console for error messages.
Substitutions are done by replacing the relevant atom, or adding it to replace an implicit hydrogen. For example, methanol, 1,1-difluoroethene, and 1,2-difluoroethene can be given as:
methanol_xyz = vlx.Molecule.smiles_to_xyz("CO")
difluoro11_xyz = vlx.Molecule.smiles_to_xyz("C=C(F)F")
difluoro12_xyz = vlx.Molecule.smiles_to_xyz("FC=CF")
viewer = p3d.view(viewergrid=(1, 3), width=400, height=200, linked=False)
viewer.addModel(methanol_xyz, "xyz", viewer=(0, 0))
viewer.addModel(difluoro11_xyz, "xyz", viewer=(0, 1))
viewer.addModel(difluoro12_xyz, "xyz", viewer=(0, 2))
viewer.setViewStyle({"style": "outline", "width": 0.05})
viewer.setStyle({"stick": {}, "sphere": {"scale": 0.25}})
viewer.zoomTo()
viewer.show()
3Dmol.js failed to load for some reason. Please check your browser console for error messages.
Optimization using force field#
Following the construction of a conformer from a SMILES string, it is generally advised to perform a short optimization to obtain a structure close to a local minimum.
As an example, for ethanol the initial conformer will have the OH-group with any rotation to CH\(_3\), and following a force field optimization it takes on the staggered conformation:
ethanol_initial_xyz = vlx.Molecule.smiles_to_xyz("CO", optimize=False)
ethanol_xyz = vlx.Molecule.smiles_to_xyz("CO")
viewer = p3d.view(viewergrid=(1, 2), width=400, height=200, linked=False)
viewer.addModel(ethanol_initial_xyz, "xyz", viewer=(0, 0))
viewer.addModel(ethanol_xyz, "xyz", viewer=(0, 1))
viewer.setViewStyle({"style": "outline", "width": 0.05})
viewer.setStyle({"stick": {}, "sphere": {"scale": 0.25}})
viewer.zoomTo()
viewer.show()
3Dmol.js failed to load for some reason. Please check your browser console for error messages.
Ring systems#
Ring systems are constructed by labeling part(s) of the ring and then closing the ring by having the SMILES string go back to the labeled atom(s). This can be performed several time, so that, e.g. cyclohexane and norbornane are written as:
cyclohexane_xyz = vlx.Molecule.smiles_to_xyz("C1CCCCC1")
norborane_xyz = vlx.Molecule.smiles_to_xyz("C1C2CCC(C2)C1")
viewer = p3d.view(viewergrid=(1, 2), width=400, height=200, linked=False)
viewer.addModel(cyclohexane_xyz, "xyz", viewer=(0, 0))
viewer.addModel(norborane_xyz, "xyz", viewer=(0, 1))
viewer.setViewStyle({"style": "outline", "width": 0.05})
viewer.setStyle({"stick": {}, "sphere": {"scale": 0.25}})
viewer.zoomTo()
viewer.show()
3Dmol.js failed to load for some reason. Please check your browser console for error messages.
Aromatic rings can be written by using alternating single and double bonds or using lower-case for the contributing atoms. For example, p-cymene can be written as both:
pcymene1_xyz = vlx.Molecule.smiles_to_xyz("C=C(C)C1=CC=C(C)C=C1")
pcymene2_xyz = vlx.Molecule.smiles_to_xyz("C=C(C)c1ccc(C)cc1")
viewer = p3d.view(viewergrid=(1, 2), width=500, height=200, linked=False)
viewer.addModel(pcymene1_xyz, "xyz", viewer=(0, 0))
viewer.addModel(pcymene2_xyz, "xyz", viewer=(0, 1))
viewer.setViewStyle({"style": "outline", "width": 0.05})
viewer.setStyle({"stick": {}, "sphere": {"scale": 0.25}})
viewer.zoomTo()
viewer.show()
3Dmol.js failed to load for some reason. Please check your browser console for error messages.
More complex systems#
For more complex molecules, the easiest option is to check if Wikipedia, ChemSpider, or any other digital resources provide the SMILES string. This will work for many common molecules, and for the rest you can either consult more detailed guides on using SMILES and go from there, or use a graphical interface.
Note
No matter which option is chosen, we stress that you should always check that the structure you obtain is reasonably close to what is sought. For some systems you may need to sample many different conformers, and for PDB-files you might have to add hydrogens.
As an example, the structures NADH (left) and NAD\(^+\) (right) are illustrated below, using the SMILES string from Wikipedia. Nicotinamide adenine dinucleotide partake in metabolism through the redox reaction:
The resulting conformers need to be corrected, but we see that the general structure and the location of the hydrogen at the nicotinamide ring is correct.
nadh_xyz = vlx.Molecule.smiles_to_xyz(
"O=C(N)C1CC=C[N](C=1)[C@@H]2O[C@@H]([C@@H](O)[C@H]2O)COP([O-])(=O)OP(=O)([O-])OC[C@H]5O[C@@H](n4cnc3c(ncnc34)N)[C@H](O)[C@@H]5O"
)
nadp_xyz = vlx.Molecule.smiles_to_xyz(
"O=C(N)c1ccc[n+](c1)[C@@H]2O[C@@H]([C@@H](O)[C@H]2O)COP([O-])(=O)OP(=O)([O-])OC[C@H]5O[C@@H](n4cnc3c(ncnc34)N)[C@H](O)[C@@H]5O"
)
viewer = p3d.view(viewergrid=(1, 2), width=600, height=200, linked=False)
viewer.addModel(nadh_xyz, "xyz", viewer=(0, 0))
viewer.addModel(nadp_xyz, "xyz", viewer=(0, 1))
viewer.setViewStyle({"style": "outline", "width": 0.05})
viewer.setStyle({"stick": {}, "sphere": {"scale": 0.25}})
viewer.zoomTo()
viewer.show()
3Dmol.js failed to load for some reason. Please check your browser console for error messages.
Using a structure downloaded from the PDB databank#
from urllib.request import urlopen
# Specify the PDB ID you want to download
ID = '1AOI'
url = 'https://files.rcsb.org/download/' + ID + '.pdb'
with urlopen(url) as fh:
pdb = fh.read().decode('utf-8')
view = p3d.view()
view.addModel(pdb,'pdb')
view.setBackgroundColor('white')
view.setStyle({"stick": {}, "sphere": {"scale": 0.25}})
view.zoomTo()
view.show()
3Dmol.js failed to load for some reason. Please check your browser console for error messages.