Optimization methods#
The first class of special points on the PES that we will discuss are local energy minima. These correspond to equilibrium molecular structures and are characterized by a vanishing first order energy derivative combined with a Hessian matrix which has only positive eigenvalues. The procedure to determine a local minimum (i.e. finding the coordinates that minimize the energy) is called a structure optimization.

In practical terms, the ingredients to perform a geometry optimization include: (1) the initial molecular coordinates, (2) a choice of coordinate system, (3) the energy at a specific geometry \(E(\mathbf{x})\), (4) the gradient \(\mathbf{g}(\mathbf{x})=\nabla E(\mathbf{x})\), (5) the Hessian, and (6) a procedure to update the coordinates and Hessian and move on the potential energy surface towards lower energy.
Having addressed the issue of coordinate system, the remaining question is what procedure to use to move along the potential energy surface and arrive at a local energy minimum. There are several iterative methods to do this, some of which need only information on the energy gradient (e.g. gradient-descent, conjugate gradient), while others take into account also the Hessian (Newton–Raphson, quasi-Newton). For a detailed review of minimization techniques, see [Sny05].
Gradient descent#
The simplest optimization procedure is to repeatedly take a step in the direction opposite to the local gradient
where by \(\mathbf{x}_{i+1}\) we denote the new coordinates (in a generic coordinate system – either Cartesian or internal coordinates), \(\mathbf{x}_i\) are the coordinates at the previous step \(i\), \(\nabla E\) is the energy gradient and \(k_i\) is the step size. The step size can either be kept constant, or adjusted at each iteration, e.g. by the line search procedure.
The gradient-descent method is simple to implement and is guaranteed to converge, but has the disadvantage that it requires many steps and becomes slow when close to the minimum where the gradient is small. It always converges to a local minimum, given enough steps.
Implementation#
Let’s implement the gradient descent method. We will need to set up a molecule, an SCF driver and a gradient driver.
# Import section
import veloxchem as vlx
import py3Dmol as p3d
from veloxchem.veloxchemlib import bohr_in_angstroms
import numpy as np
from matplotlib import pyplot as plt
# Define the molecule
molecule_xyz = """3
water
O 0.000 0.000 0.000
H 0.000 0.000 0.950
H 0.896 0.000 -0.317
"""
molecule = vlx.Molecule.read_xyz_string(molecule_xyz)
# and basis set
basis_set_label = 'sto-3g'
basis = vlx.MolecularBasis.read(molecule, basis_set_label)
You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol
Show code cell source
# SCF Energy
scf_drv = vlx.ScfRestrictedDriver()
scf_drv.conv_thresh = 1.0e-6
# for DFT
# scf_drv.xcfun = "b3lyp"
# scf_drv.grid_level = 4
# Gradient
grad_driver = vlx.ScfGradientDriver()
Now, write your own routine to run one gradient descent iteration:
def gradient_descent_iteration(coordinates, gradient, step):
...
return new_coordinates
Show code cell source
def gradient_descent_iteration(coordinates, gradient, step):
new_coordinates = coordinates - step * gradient
return new_coordinates
And the routine that runs the optimization:
def gradient_descent(molecule, basis, scf_driver, gradient_driver,
step=0.1, threshold=1e-3, max_iter=10):
# set ostream state to False, to avoid printout from every new scf calculation
ostream_state = scf_driver.ostream.state
scf_driver.ostream.state = False
gradient_driver.ostream.state = False
iteration = 0
grad_norm = 100
# atom labels (symbols)
labels = molecule.get_labels()
# initial atomc coordinates
old_coords = molecule.get_coordinates()
scf_results = scf_driver.compute(molecule, basis)
old_energy = scf_driver.get_scf_energy()
gradient_driver.compute(molecule, basis, scf_driver)
old_gradient = gradient_driver.gradient
print("Starting gradient descent:\n")
text = "Iteration Energy (H) "
text += "Max. displacement (bohr) Gradient norm (H/bohr)"
print(text)
energies = [old_energy]
iterations = [0]
while (grad_norm >= threshold) and (iteration <= max_iter):
...
...
...
if iteration <= max_iter:
text = "\n *** Gradient Descent converged in "
text += "%d iteration(s). *** " % iteration
print(text)
return new_molecule, iterations, energies
else:
print("\n !!! Gradient Descent did not converge !!! ")
return None, None, None
# Set the ostream state to initial value
scf_driver.ostream.state = ostream_state
gradient_driver.ostream.state = ostream_state
Show code cell source
def gradient_descent(molecule, basis, scf_driver, gradient_driver,
step=0.1, threshold=1e-3, max_iter=10):
# set ostream state to False, to avoid printout from every new scf calculation
ostream_state = scf_driver.ostream.state
scf_driver.ostream.state = False
gradient_driver.ostream.state = False
iteration = 0
grad_norm = 100
# atom labels (symbols)
labels = molecule.get_labels()
# initial atomic coordinates
old_coords = molecule.get_coordinates()
scf_results = scf_driver.compute(molecule, basis)
old_energy = scf_driver.get_scf_energy()
gradient_driver.compute(molecule, basis, scf_driver)
old_gradient = gradient_driver.gradient
print("Starting gradient descent:\n")
text = "Iteration Energy (H) "
text += "Max. displacement (bohr) Gradient norm (H/bohr)"
print(text)
energies = [old_energy]
iterations = [0]
while (grad_norm >= threshold) and (iteration <= max_iter):
coords = gradient_descent_iteration(old_coords, old_gradient, step)
# calculate the energy and gradient corresponding to the new coordinates
new_molecule = vlx.molecule.Molecule(labels, coords, units='au')
scf_results = scf_driver.compute(new_molecule, basis)
energy = scf_driver.get_scf_energy()
energies.append(energy)
gradient_driver.compute(new_molecule, basis, scf_driver)
gradient = gradient_driver.gradient
grad_norm = np.linalg.norm(gradient)
displacement = old_coords - coords
max_disp = np.amax(abs(displacement))
# calculate energy difference
delta_e = abs(energy - old_energy)
text = " %3d. %15.7f " % (iteration, energy, )
text += "%15.7f %15.7f" % (max_disp, grad_norm)
print(text)
# save
old_energy = energy
old_gradient = gradient
old_coords = coords
iteration += 1
iterations.append(iteration)
if iteration <= max_iter:
text = "\n *** Gradient Descent converged in "
text += "%d iteration(s). *** " % iteration
print(text)
return new_molecule, iterations, energies
else:
print("\n !!! Gradient Descent did not converge !!! ")
return None, None, None
scf_driver.ostream.state = ostream_state
gradient_driver.ostream.state = ostream_state
opt_mol, gd_iterations, gd_energies = gradient_descent(molecule, basis,
scf_drv, grad_driver,
threshold=1e-2,
max_iter=100)
Show code cell output
Starting gradient descent:
Iteration Energy (H) Max. displacement (bohr) Gradient norm (H/bohr)
0. -74.9605109 0.0071311 0.0994035
1. -74.9614427 0.0063668 0.0881487
2. -74.9621766 0.0056990 0.0784445
3. -74.9627589 0.0051136 0.0700529
4. -74.9632239 0.0045989 0.0627797
5. -74.9635981 0.0041452 0.0564644
6. -74.9639013 0.0037442 0.0509731
7. -74.9641488 0.0033892 0.0461931
8. -74.9643524 0.0030741 0.0420288
9. -74.9645214 0.0027940 0.0383981
10. -74.9646627 0.0025445 0.0352305
11. -74.9647818 0.0023220 0.0324649
12. -74.9648833 0.0021232 0.0300478
13. -74.9649703 0.0019453 0.0279327
14. -74.9650457 0.0017860 0.0260789
15. -74.9651116 0.0016430 0.0244504
16. -74.9651696 0.0015146 0.0230160
17. -74.9652211 0.0013991 0.0217483
18. -74.9652671 0.0012950 0.0206236
19. -74.9653086 0.0012012 0.0196211
20. -74.9653462 0.0011164 0.0187230
21. -74.9653805 0.0010397 0.0179140
22. -74.9654119 0.0009703 0.0171810
23. -74.9654409 0.0009260 0.0165129
24. -74.9654676 0.0009062 0.0159001
25. -74.9654924 0.0008862 0.0153348
26. -74.9655155 0.0008664 0.0148103
27. -74.9655371 0.0008466 0.0143208
28. -74.9655573 0.0008270 0.0138617
29. -74.9655762 0.0008075 0.0134291
30. -74.9655940 0.0007883 0.0130196
31. -74.9656107 0.0007693 0.0126305
32. -74.9656264 0.0007506 0.0122596
33. -74.9656412 0.0007322 0.0119049
34. -74.9656552 0.0007141 0.0115647
35. -74.9656683 0.0006963 0.0112378
36. -74.9656808 0.0006788 0.0109230
37. -74.9656926 0.0006617 0.0106194
38. -74.9657037 0.0006449 0.0103261
39. -74.9657142 0.0006284 0.0100423
40. -74.9657241 0.0006123 0.0097677
*** Gradient Descent converged in 41 iteration(s). ***
Show code cell source
plt.figure(figsize=(6,4))
plt.plot(gd_iterations, gd_energies,'o-', label='Gradient descent')
plt.axis(xmin=-2, xmax=43)
plt.xlabel('Iteration')
plt.ylabel('Energy (H)')
plt.title("Gradient descent convergence")
plt.legend()
plt.tight_layout(); plt.show()

Conjugate gradient#
An improved method over the gradient-descent approach is to use the “gradient history” (steps \(i\) and \(i-1\)) to determine the coordinates at step \(i+1\)
with \(\mathbf{h}_i = \mathbf{g}(\mathbf{x}_i)+\gamma_i\mathbf{h}_{i-1}\). The function \(\gamma_i\) contains gradient information from steps \(i\) and \(i-1\) and can be defined in different ways. For example, in the Fletcher-Reeves conjugate gradient method
Implementation#
Implement the conjugate gradient optimization algorithm and compare its performance to the gradient descent method.
def conjugate_gradient_iteration(coordinates, h, step):
...
return new_coordinates
Show code cell source
def conjugate_gradient_iteration(coordinates, h, step):
new_coordinates = coordinates - step * h
return new_coordinates
def conjugate_gradient(molecule, basis, scf_driver, gradient_driver,
step=0.1, threshold=1e-3, max_iter=10):
# set ostream state to False, to avoid printout from every new scf calculation
ostream_state = scf_driver.ostream.state
scf_driver.ostream.state = False
gradient_driver.ostream.state = False
iteration = 0
grad_norm = 100
# atom labels (symbols)
labels = molecule.get_labels()
# initial atomc coordinates
old_coords = molecule.get_coordinates()
scf_results = scf_driver.compute(molecule, basis)
old_energy = scf_driver.get_scf_energy()
gradient_driver.compute(molecule, basis, scf_driver)
old_gradient = gradient_driver.gradient
old_h = old_gradient
print("Starting gradient descent:\n")
text = "Iteration Energy (H) Max. displacement (bohr)"
text += "Gradient norm (H/bohr) gamma"
print(text)
energies = [old_energy]
iterations = [0]
while (grad_norm >= threshold) and (iteration <= max_iter):
...
...
...
gamma = grad_norm**2 / old_grad_norm**2
text = " %3d. %15.7f " % (iteration, energy)
text += "%15.7f %15.7f" % (max_disp, grad_norm)
text += " %15.7f" % (gamma)
h = gradient + gamma * old_h
# save
old_energy = energy
old_gradient = gradient
old_coords = coords
old_h = h
iteration += 1
iterations.append(iteration)
if iteration <= max_iter:
text = "\n *** Conjugate Gradient converged in "
text += "%d iteration(s). *** " % iteration
print(text)
return new_molecule, iterations, energies
else:
print("\n !!! Conjugate Gradient did not converge !!! ")
return None, None, None
# restore ostream state to its original value
scf_driver.ostream.state = ostream_state
gradient_driver.ostream.state = ostream_state
Show code cell source
def conjugate_gradient(molecule, basis, scf_driver, gradient_driver,
step=0.1, threshold=1e-3, max_iter=10):
# set ostream state to False, to avoid printout from every new scf calculation
ostream_state = scf_driver.ostream.state
scf_driver.ostream.state = False
gradient_driver.ostream.state = False
iteration = 0
grad_norm = 100
# atom labels (symbols)
labels = molecule.get_labels()
# initial atomc coordinates
old_coords = molecule.get_coordinates()
scf_results = scf_driver.compute(molecule, basis)
old_energy = scf_driver.get_scf_energy()
gradient_driver.compute(molecule, basis, scf_driver)
old_gradient = gradient_driver.gradient
old_h = old_gradient
print("Starting gradient descent:\n")
text = "Iteration Energy (H) Max. displacement (bohr)"
text += " Gradient norm (H/bohr) gamma"
print(text)
energies = [old_energy]
iterations = [0]
while (grad_norm >= threshold) and (iteration <= max_iter):
coords = conjugate_gradient_iteration(old_coords, old_h, step)
# calculate the energy and gradient corresponding to the new coordinates
new_molecule = vlx.molecule.Molecule(labels, coords, units='au')
scf_results = scf_driver.compute(new_molecule, basis)
energy = scf_driver.get_scf_energy()
gradient_driver.compute(new_molecule, basis, scf_driver)
gradient = gradient_driver.gradient
grad_norm = np.linalg.norm(gradient)
old_grad_norm = np.linalg.norm(old_gradient)
displacement = old_coords - coords
max_disp = np.amax(abs(displacement))
energies.append(energy)
gamma = grad_norm**2 / old_grad_norm**2
text = " %3d. %15.7f " % (iteration, energy)
text += "%15.7f %15.7f" % (max_disp, grad_norm)
text += " %15.7f" % (gamma)
print(text)
h = gradient + gamma * old_h
# save
old_energy = energy
old_gradient = gradient
old_coords = coords
old_h = h
iteration += 1
iterations.append(iteration)
if iteration <= max_iter:
text = "\n *** Conjugate Gradient converged in "
text += "%d iteration(s). *** " % iteration
print(text)
return new_molecule, iterations, energies
else:
print("\n !!! Conjugate Gradient did not converge !!! ")
return None, None, None
scf_driver.ostream.state = ostream_state
gradient_driver.ostream.state = ostream_state
cg_opt_mol, cg_iterations, cg_energies = conjugate_gradient(molecule, basis,
scf_drv, grad_driver,
threshold=1e-2,
max_iter=50)
Show code cell output
Starting gradient descent:
Iteration Energy (H) Max. displacement (bohr) Gradient norm (H/bohr) gamma
0. -74.9605109 0.0071311 0.0994035 0.7806759
1. -74.9621708 0.0119338 0.0783209 0.6208010
2. -74.9634822 0.0125176 0.0576486 0.5417781
3. -74.9642742 0.0106076 0.0418381 0.5267046
4. -74.9647155 0.0083648 0.0311623 0.5547716
5. -74.9649702 0.0066283 0.0244842 0.6173223
6. -74.9651335 0.0054866 0.0206065 0.7083367
7. -74.9652543 0.0048212 0.0185662 0.8117767
8. -74.9653573 0.0044747 0.0176220 0.9008706
9. -74.9654549 0.0044303 0.0171710 0.9494769
10. -74.9655522 0.0051889 0.0166759 0.9431610
11. -74.9656477 0.0057511 0.0156704 0.8830443
12. -74.9657341 0.0057875 0.0138841 0.7850059
13. -74.9658021 0.0050952 0.0114119 0.6755822
14. -74.9658474 0.0038490 0.0087073 0.5821731
*** Conjugate Gradient converged in 15 iteration(s). ***
Show code cell source
plt.figure(figsize=(6,4))
plt.plot(gd_iterations, gd_energies,'o-', label='Gradient descent')
plt.plot(cg_iterations, cg_energies,'o-', label='Conjugate gradient')
# darw a line which shows the energy obtained using gradient descent
plt.plot([-2,43], [gd_energies[-1], gd_energies[-1]], '--', color='gray')
plt.axis(xmin=-2, xmax=43)
plt.xlabel('Iteration')
plt.ylabel('Energy (H)')
plt.title("Conjugate gradient vs. gradient descent")
plt.legend()
plt.tight_layout(); plt.show()

Newton–Raphson and quasi-Newton#
The next step in the hierarchy of minimization methods is to use both the first and second order energy derivatives (i.e. gradient \(\mathbf{g}\) and Hessian \(\mathbf{H}\)) in determining the next step in conformation space. This is based on a quadratic approximation for the local shape of the PES
Here, \(\Delta \mathbf{x}\) is the Newton step used to update the coordinates
When redundant internal coordinates are used, it is important to ensure that the displacements are only performed in the non-redundant region of the internal coordinate space. This is achieved by applying a projector \(\mathbf{P}=\mathbf{G}^{-}\mathbf{G}\) to the gradient and Hessian before constructing the Newton step [Neal09]:
where \(\alpha\) is an arbitrary large value (e.g. 1000).
As evident from the equations above, the second order derivatives of the energy with respect to nuclear displacements are required to generate the Newton step. The direct computation of these derivatives (which compose the Hessian matrix) is quite expensive, but good approximations for the Hessian can be constructed using the gradient history. For example, the Broyden-Fletcher-Goldfarb-Shanno (BFGS, used by geomeTRIC) approach uses the relation:
with,
to update the Hessian at step \(i+1\) using the Hessian at step \(i\) and information about the gradient at the current and previous step.
Quasi-Newton methods, i.e., methods that use approximate Hessians, achieve a very quick convergence at the same computational cost as the gradient-descent method.
Comparison to gradient descent and conjugate gradient#
To compare the gradient descent and conjugate gradient algorithms with the quasi-Newton method, we will use the quasi-Newton implementation from geomeTRIC.
# Let's run the SCF and gradient again, to make sure we start from the same point:
scf_drv.ostream.state = False
grad_driver.ostream.state = False
scf_drv.compute(molecule, basis)
grad_driver.compute(molecule, basis, scf_drv)
scf_drv.ostream.state = True
grad_driver.ostream.state = True
# Optimization settings and driver:
optimization_settings = {'coordsys' : 'tric'}
# 'tric': TRIC, default
# 'cart': Cartesian
# 'prim': Primitive Internal Coordinates
# 'dlc': Delocalized Internal Coordinates
# 'hdlc': Hybrid Delocalized Internal Coordinates
opt_drv = vlx.OptimizationDriver(grad_driver)
opt_drv.update_settings(opt_dict=optimization_settings)
opt_mol = opt_drv.compute(molecule, basis, scf_drv)
Show code cell output
Optimization Driver Setup
===========================
Coordinate System : TRIC
Constraints : No
Max. Number of Steps : 300
Transition State : No
Hessian : never
* Info * Computing energy and gradient...
* Info * Energy : -74.9593193103 a.u.
* Info * Gradient : 6.495395e-02 a.u. (RMS)
* Info * 8.771955e-02 a.u. (Max)
* Info * Time : 1.56 sec
* Info * Computing energy and gradient...
* Info * Energy : -74.9648452542 a.u.
* Info * Gradient : 1.544611e-02 a.u. (RMS)
* Info * 1.844116e-02 a.u. (Max)
* Info * Time : 1.61 sec
* Info * Computing energy and gradient...
* Info * Energy : -74.9657830293 a.u.
* Info * Gradient : 6.194312e-03 a.u. (RMS)
* Info * 7.272241e-03 a.u. (Max)
* Info * Time : 1.53 sec
* Info * Computing energy and gradient...
* Info * Energy : -74.9659001353 a.u.
* Info * Gradient : 9.009561e-04 a.u. (RMS)
* Info * 1.171116e-03 a.u. (Max)
* Info * Time : 1.55 sec
* Info * Computing energy and gradient...
* Info * Energy : -74.9659012109 a.u.
* Info * Gradient : 6.320209e-05 a.u. (RMS)
* Info * 8.691343e-05 a.u. (Max)
* Info * Time : 1.57 sec
* Info * Computing energy and gradient...
* Info * Energy : -74.9659012173 a.u.
* Info * Gradient : 3.105557e-07 a.u. (RMS)
* Info * 4.011695e-07 a.u. (Max)
* Info * Time : 1.58 sec
* Info * Geometry optimization completed.
Molecular Geometry (Angstroms)
================================
Atom Coordinate X Coordinate Y Coordinate Z
O -0.047424122161 0.000000000136 -0.033716002296
H 0.034041957195 -0.000000000645 0.952334286422
H 0.909380962448 -0.000000000408 -0.285619981741
Summary of Geometry Optimization
==================================
Opt.Step Energy (a.u.) Energy Change (a.u.) Displacement (RMS, Max)
-------------------------------------------------------------------------------------
0 -74.959319310327 0.000000000000 0.000e+00 0.000e+00
1 -74.964845254244 -0.005525943917 6.683e-02 7.980e-02
2 -74.965783029343 -0.000937775099 3.382e-02 3.774e-02
3 -74.965900135283 -0.000117105940 8.081e-03 9.810e-03
4 -74.965901210888 -0.000001075605 4.828e-04 6.814e-04
5 -74.965901217298 -0.000000006410 4.323e-05 5.698e-05
Statistical Deviation between
Optimized Geometry and Initial Geometry
=========================================
Internal Coord. RMS deviation Max. deviation
-----------------------------------------------------------
Bonds 0.039 Angstrom 0.039 Angstrom
Angles 9.457 degree 9.457 degree
*** Time spent in Optimization Driver: 9.49 sec
Show code cell source
tric_energies = [-74.959319310327, -74.964845254243, -74.965783029341, -74.965900135283]
tric_iterations = [0, 1, 2, 3]
Show code cell source
plt.figure(figsize=(6,4))
plt.plot(gd_iterations, gd_energies,'o-', label='Gradient descent')
plt.plot(cg_iterations, cg_energies,'o-', label='Conjugate gradient')
plt.plot(tric_iterations, tric_energies,'o-', label='TRIC quasi-Newton')
# darw a line which shows the energy obtained using gradient descent
plt.plot([-2,43], [gd_energies[-1], gd_energies[-1]], '--', color='gray')
plt.axis(xmin=-2, xmax=43)
plt.xlabel('Iteration')
plt.ylabel('Energy (H)')
plt.title("Gradient descent vs. conjugate gradient vs. quasi-Newton")
plt.legend()
plt.tight_layout(); plt.show()

Note
GeomeTRIC offers different options for the type of coordinates used in the optimization: Cartesian, delocalized internal coordinates (DLC), hybrid delocalized internal coordinates (HDLC), as well as translation-rotation internal coordinates (TRIC). How does the choice of coordinates influence the optimization process?