# Optimization methods#

The first class of special points on the PES that we will discuss are local energy minima. These correspond to equilibrium molecular structures and are characterized by a vanishing first order energy derivative combined with a Hessian matrix which has only positive eigenvalues. The procedure to determine a local minimum (i.e. finding the coordinates that minimize the energy) is called a structure optimization.

In practical terms, the ingredients to perform a geometry optimization include: (1) the initial molecular coordinates, (2) a choice of coordinate system, (3) the energy at a specific geometry $$E(\mathbf{x})$$, (4) the gradient $$\mathbf{g}(\mathbf{x})=\nabla E(\mathbf{x})$$, (5) the Hessian, and (6) a procedure to update the coordinates and Hessian and move on the potential energy surface towards lower energy.

Having addressed the issue of coordinate system, the remaining question is what procedure to use to move along the potential energy surface and arrive at a local energy minimum. There are several iterative methods to do this, some of which need only information on the energy gradient (e.g. gradient-descent, conjugate gradient), while others take into account also the Hessian (Newton–Raphson, quasi-Newton). For a detailed review of minimization techniques, see [Sny05].

The simplest optimization procedure is to repeatedly take a step in the direction opposite to the local gradient

(36)#$\begin{equation} \mathbf{x}_{i+1} = \mathbf{x}_i - k_i\mathbf{g}(\mathbf{x}_i) \nonumber \end{equation}$

where by $$\mathbf{x}_{i+1}$$ we denote the new coordinates (in a generic coordinate system – either Cartesian or internal coordinates), $$\mathbf{x}_i$$ are the coordinates at the previous step $$i$$, $$\nabla E$$ is the energy gradient and $$k_i$$ is the step size. The step size can either be kept constant, or adjusted at each iteration, e.g. by the line search procedure.

The gradient-descent method is simple to implement and is guaranteed to converge, but has the disadvantage that it requires many steps and becomes slow when close to the minimum where the gradient is small. It always converges to a local minimum, given enough steps.

### Implementation#

Let’s implement the gradient descent method. We will need to set up a molecule, an SCF driver and a gradient driver.

# Import section
import veloxchem as vlx
import py3Dmol as p3d
from veloxchem.veloxchemlib import bohr_in_angstroms
import numpy as np
from matplotlib import pyplot as plt

# Define the molecule
molecule_xyz = """3
water
O 0.000 0.000  0.000
H 0.000 0.000  0.950
H 0.896 0.000 -0.317
"""

# and basis set
basis_set_label = 'sto-3g'


You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol

Hide code cell source
# SCF Energy
scf_drv = vlx.ScfRestrictedDriver()
scf_drv.conv_thresh = 1.0e-6
# for DFT
# scf_drv.xcfun = "b3lyp"
# scf_drv.grid_level = 4



def gradient_descent_iteration(coordinates, gradient, step):
...
return new_coordinates

Hide code cell source
def gradient_descent_iteration(coordinates, gradient, step):
new_coordinates = coordinates - step * gradient
return new_coordinates


And the routine that runs the optimization:

def gradient_descent(molecule, basis, scf_driver, gradient_driver,
step=0.1, threshold=1e-3, max_iter=10):

# set ostream state to False, to avoid printout from every new scf calculation
ostream_state = scf_driver.ostream.state
scf_driver.ostream.state = False

iteration = 0

# atom labels (symbols)
labels = molecule.get_labels()

# initial atomc coordinates
old_coords = molecule.get_coordinates()
scf_results = scf_driver.compute(molecule, basis)
old_energy = scf_driver.get_scf_energy()

text = "Iteration     Energy (H)      "
text += "Max. displacement (bohr)    Gradient norm (H/bohr)"
print(text)

energies = [old_energy]
iterations = 

while (grad_norm >= threshold) and (iteration <= max_iter):
...
...
...

if iteration <= max_iter:
text = "\n   *** Gradient Descent converged in "
text += "%d iteration(s). *** " % iteration
print(text)
return new_molecule, iterations, energies
else:
print("\n   !!! Gradient Descent did not converge  !!! ")
return None, None, None

# Set the ostream state to initial value
scf_driver.ostream.state = ostream_state

Hide code cell source
def gradient_descent(molecule, basis, scf_driver, gradient_driver,
step=0.1, threshold=1e-3, max_iter=10):
# set ostream state to False, to avoid printout from every new scf calculation
ostream_state = scf_driver.ostream.state
scf_driver.ostream.state = False

iteration = 0
# atom labels (symbols)
labels = molecule.get_labels()

# initial atomic coordinates
old_coords = molecule.get_coordinates()

scf_results = scf_driver.compute(molecule, basis)
old_energy = scf_driver.get_scf_energy()

text = "Iteration     Energy (H)      "
text += "Max. displacement (bohr)    Gradient norm (H/bohr)"
print(text)
energies = [old_energy]
iterations = 
while (grad_norm >= threshold) and (iteration <= max_iter):

# calculate the energy and gradient corresponding to the new coordinates
new_molecule = vlx.molecule.Molecule(labels, coords, units='au')

scf_results = scf_driver.compute(new_molecule, basis)
energy = scf_driver.get_scf_energy()
energies.append(energy)

displacement = old_coords - coords
max_disp = np.amax(abs(displacement))

# calculate energy difference
delta_e = abs(energy - old_energy)
text = "   %3d.  %15.7f      " % (iteration, energy, )
text += "%15.7f          %15.7f" % (max_disp, grad_norm)
print(text)

# save
old_energy = energy
old_coords = coords
iteration += 1
iterations.append(iteration)

if iteration <= max_iter:
text = "\n   *** Gradient Descent converged in "
text += "%d iteration(s). *** " % iteration
print(text)
return new_molecule, iterations, energies
else:
print("\n   !!! Gradient Descent did not converge  !!! ")
return None, None, None
scf_driver.ostream.state = ostream_state

opt_mol, gd_iterations, gd_energies = gradient_descent(molecule, basis,
threshold=1e-2,
max_iter=100)

Hide code cell output
Starting gradient descent:

Iteration     Energy (H)      Max. displacement (bohr)    Gradient norm (H/bohr)

     0.      -74.9605109            0.0071311                0.0994035

     1.      -74.9614427            0.0063668                0.0881487

     2.      -74.9621766            0.0056990                0.0784445

     3.      -74.9627589            0.0051136                0.0700529

     4.      -74.9632239            0.0045989                0.0627797

     5.      -74.9635981            0.0041452                0.0564644

     6.      -74.9639013            0.0037442                0.0509731

     7.      -74.9641488            0.0033892                0.0461931

     8.      -74.9643524            0.0030741                0.0420288

     9.      -74.9645214            0.0027940                0.0383981

    10.      -74.9646627            0.0025445                0.0352305

    11.      -74.9647818            0.0023220                0.0324649

    12.      -74.9648833            0.0021232                0.0300478

    13.      -74.9649703            0.0019453                0.0279327

    14.      -74.9650457            0.0017860                0.0260789

    15.      -74.9651116            0.0016430                0.0244504

    16.      -74.9651696            0.0015146                0.0230160

    17.      -74.9652211            0.0013991                0.0217483

    18.      -74.9652671            0.0012950                0.0206236

    19.      -74.9653086            0.0012012                0.0196211

    20.      -74.9653462            0.0011164                0.0187230

    21.      -74.9653805            0.0010397                0.0179140

    22.      -74.9654119            0.0009703                0.0171810

    23.      -74.9654409            0.0009260                0.0165129

    24.      -74.9654676            0.0009062                0.0159001

    25.      -74.9654924            0.0008862                0.0153348

    26.      -74.9655155            0.0008664                0.0148103

    27.      -74.9655371            0.0008466                0.0143208

    28.      -74.9655573            0.0008270                0.0138617

    29.      -74.9655762            0.0008075                0.0134291

    30.      -74.9655940            0.0007883                0.0130196

    31.      -74.9656107            0.0007693                0.0126305

    32.      -74.9656264            0.0007506                0.0122596

    33.      -74.9656412            0.0007322                0.0119049

    34.      -74.9656552            0.0007141                0.0115647

    35.      -74.9656683            0.0006963                0.0112378

    36.      -74.9656808            0.0006788                0.0109230

    37.      -74.9656926            0.0006617                0.0106194

    38.      -74.9657037            0.0006449                0.0103261

    39.      -74.9657142            0.0006284                0.0100423

    40.      -74.9657241            0.0006123                0.0097677

*** Gradient Descent converged in 41 iteration(s). ***

Hide code cell source
plt.figure(figsize=(6,4))

plt.axis(xmin=-2, xmax=43)

plt.xlabel('Iteration')
plt.ylabel('Energy (H)')
plt.legend()
plt.tight_layout(); plt.show() An improved method over the gradient-descent approach is to use the “gradient history” (steps $$i$$ and $$i-1$$) to determine the coordinates at step $$i+1$$

(37)#$\begin{equation} \mathbf{x}_{i+1} = \mathbf{x}_i - k_i \mathbf{h}_i \nonumber \end{equation}$

with $$\mathbf{h}_i = \mathbf{g}(\mathbf{x}_i)+\gamma_i\mathbf{h}_{i-1}$$. The function $$\gamma_i$$ contains gradient information from steps $$i$$ and $$i-1$$ and can be defined in different ways. For example, in the Fletcher-Reeves conjugate gradient method

(38)#$\begin{equation} \gamma_i = \frac{|\mathbf{g}(\mathbf{x}_i)|^2}{|\mathbf{g}(\mathbf{x}_{i-1})|^2} \nonumber \end{equation}$

### Implementation#

Implement the conjugate gradient optimization algorithm and compare its performance to the gradient descent method.

def conjugate_gradient_iteration(coordinates, h, step):
...
return new_coordinates

Hide code cell source
def conjugate_gradient_iteration(coordinates, h, step):
new_coordinates = coordinates - step * h
return new_coordinates

def conjugate_gradient(molecule, basis, scf_driver, gradient_driver,
step=0.1, threshold=1e-3, max_iter=10):
# set ostream state to False, to avoid printout from every new scf calculation
ostream_state = scf_driver.ostream.state
scf_driver.ostream.state = False

iteration = 0

# atom labels (symbols)
labels = molecule.get_labels()

# initial atomc coordinates
old_coords = molecule.get_coordinates()

scf_results = scf_driver.compute(molecule, basis)
old_energy = scf_driver.get_scf_energy()

text = "Iteration     Energy (H)      Max. displacement (bohr)"
text += "Gradient norm (H/bohr)      gamma"
print(text)
energies = [old_energy]
iterations = 
while (grad_norm >= threshold) and (iteration <= max_iter):

...
...
...

text = "   %3d.  %15.7f      " % (iteration, energy)
text += "%15.7f          %15.7f" % (max_disp, grad_norm)
text += "         %15.7f" % (gamma)

h = gradient + gamma * old_h

# save
old_energy = energy
old_coords = coords
old_h = h
iteration += 1
iterations.append(iteration)

if iteration <= max_iter:
text = "\n   *** Conjugate Gradient converged in "
text += "%d iteration(s). *** " % iteration
print(text)
return new_molecule, iterations, energies
else:
print("\n   !!! Conjugate Gradient did not converge  !!! ")
return None, None, None

# restore ostream state to its original value
scf_driver.ostream.state = ostream_state

Hide code cell source
def conjugate_gradient(molecule, basis, scf_driver, gradient_driver,
step=0.1, threshold=1e-3, max_iter=10):
# set ostream state to False, to avoid printout from every new scf calculation
ostream_state = scf_driver.ostream.state
scf_driver.ostream.state = False

iteration = 0

# atom labels (symbols)
labels = molecule.get_labels()

# initial atomc coordinates
old_coords = molecule.get_coordinates()

scf_results = scf_driver.compute(molecule, basis)
old_energy = scf_driver.get_scf_energy()

text = "Iteration     Energy (H)      Max. displacement (bohr)"
text += "     Gradient norm (H/bohr)      gamma"
print(text)
energies = [old_energy]
iterations = 
while (grad_norm >= threshold) and (iteration <= max_iter):

# calculate the energy and gradient corresponding to the new coordinates
new_molecule = vlx.molecule.Molecule(labels, coords, units='au')

scf_results = scf_driver.compute(new_molecule, basis)
energy = scf_driver.get_scf_energy()

displacement = old_coords - coords
max_disp = np.amax(abs(displacement))

energies.append(energy)

text = "   %3d.  %15.7f      " % (iteration, energy)
text += "%15.7f          %15.7f" % (max_disp, grad_norm)
text += "         %15.7f" % (gamma)
print(text)

h = gradient + gamma * old_h

# save
old_energy = energy
old_coords = coords
old_h = h
iteration += 1
iterations.append(iteration)

if iteration <= max_iter:
text = "\n   *** Conjugate Gradient converged in "
text += "%d iteration(s). *** " % iteration
print(text)
return new_molecule, iterations, energies
else:
print("\n   !!! Conjugate Gradient did not converge  !!! ")
return None, None, None
scf_driver.ostream.state = ostream_state

cg_opt_mol, cg_iterations, cg_energies = conjugate_gradient(molecule, basis,
threshold=1e-2,
max_iter=50)

Hide code cell output
Starting gradient descent:

Iteration     Energy (H)      Max. displacement (bohr)     Gradient norm (H/bohr)      gamma

     0.      -74.9605109            0.0071311                0.0994035               0.7806759

     1.      -74.9621708            0.0119338                0.0783209               0.6208010

     2.      -74.9634822            0.0125176                0.0576486               0.5417781

     3.      -74.9642742            0.0106076                0.0418381               0.5267046

     4.      -74.9647155            0.0083648                0.0311623               0.5547716

     5.      -74.9649702            0.0066283                0.0244842               0.6173223

     6.      -74.9651335            0.0054866                0.0206065               0.7083367

     7.      -74.9652543            0.0048212                0.0185662               0.8117767

     8.      -74.9653573            0.0044747                0.0176220               0.9008706

     9.      -74.9654549            0.0044303                0.0171710               0.9494769

    10.      -74.9655522            0.0051889                0.0166759               0.9431610

    11.      -74.9656477            0.0057511                0.0156704               0.8830443

    12.      -74.9657341            0.0057875                0.0138841               0.7850059

    13.      -74.9658021            0.0050952                0.0114119               0.6755822

    14.      -74.9658474            0.0038490                0.0087073               0.5821731

*** Conjugate Gradient converged in 15 iteration(s). ***

Hide code cell source
plt.figure(figsize=(6,4))

# darw a line which shows the energy obtained using gradient descent
plt.plot([-2,43], [gd_energies[-1], gd_energies[-1]], '--', color='gray')
plt.axis(xmin=-2, xmax=43)

plt.xlabel('Iteration')
plt.ylabel('Energy (H)')
plt.legend()
plt.tight_layout(); plt.show() ## Newton–Raphson and quasi-Newton#

The next step in the hierarchy of minimization methods is to use both the first and second order energy derivatives (i.e. gradient $$\mathbf{g}$$ and Hessian $$\mathbf{H}$$) in determining the next step in conformation space. This is based on a quadratic approximation for the local shape of the PES

(39)#$\begin{equation} E(\mathbf{x}+\Delta\mathbf{x}) \approx E(\mathbf{x}) + \mathbf{g}(\mathbf{x})\Delta\mathbf{x} + \frac{1}{2}\Delta\mathbf{x}^\mathrm{T}\mathbf{H}\Delta\mathbf{x} \label{eq:quadratic_approx_PES} \nonumber \end{equation}$

Here, $$\Delta \mathbf{x}$$ is the Newton step used to update the coordinates

(40)#\begin{align} \Delta \mathbf{x} &= -\mathbf{H}^{-1}\mathbf{g}(\mathbf{x})\label{eq:Newton_step} \nonumber \\ \mathbf{x}_\mathrm{i+1} &= \mathbf{x}_\mathrm{i}+\Delta \mathbf{x} \nonumber \end{align}

When redundant internal coordinates are used, it is important to ensure that the displacements are only performed in the non-redundant region of the internal coordinate space. This is achieved by applying a projector $$\mathbf{P}=\mathbf{G}^{-}\mathbf{G}$$ to the gradient and Hessian before constructing the Newton step [Neal09]:

(41)#\begin{align} \tilde{\mathbf{g}}_q &= \mathbf{P}\mathbf{g}(\mathbf{q}) \, , \nonumber \\ \tilde{\mathbf{H}}_q &= \mathbf{P}\mathbf{H}_q\mathbf{P}+\alpha(1-\mathbf{P}) \nonumber \,, \end{align}

where $$\alpha$$ is an arbitrary large value (e.g. 1000).

As evident from the equations above, the second order derivatives of the energy with respect to nuclear displacements are required to generate the Newton step. The direct computation of these derivatives (which compose the Hessian matrix) is quite expensive, but good approximations for the Hessian can be constructed using the gradient history. For example, the Broyden-Fletcher-Goldfarb-Shanno (BFGS, used by geomeTRIC) approach uses the relation:

(42)#\begin{align} \mathbf{H}_{i+1} &= \mathbf{H}_i + \frac{\mathbf{h}^{}_i\mathbf{h}_i^\mathrm{T}}{\mathbf{h}_i^\mathrm{T}\mathbf{s}^{}_i} - \frac{\mathbf{H}_i\mathbf{s}_i\left(\mathbf{H}_i\mathbf{s}_i\right)_{}^\mathrm{T}}{\mathbf{s}_i^\mathrm{T}\mathbf{H}_i^{}\mathbf{s}_i^{}} \, ,\nonumber \end{align}

with,

(43)#\begin{align} \mathbf{h}_i &= \mathbf{g}(\mathbf{x}_{i+1}) - \mathbf{g}(\mathbf{x}_i) \, \nonumber \\ \mathbf{s}_i &= \mathbf{x}_{i+1} - \mathbf{x}_{i} \nonumber \,, \end{align}

to update the Hessian at step $$i+1$$ using the Hessian at step $$i$$ and information about the gradient at the current and previous step.

Quasi-Newton methods, i.e., methods that use approximate Hessians, achieve a very quick convergence at the same computational cost as the gradient-descent method.

To compare the gradient descent and conjugate gradient algorithms with the quasi-Newton method, we will use the quasi-Newton implementation from geomeTRIC.

# Let's run the SCF and gradient again, to make sure we start from the same point:
scf_drv.ostream.state = False

scf_drv.compute(molecule, basis)

scf_drv.ostream.state = True

# Optimization settings and driver:
optimization_settings = {'coordsys' : 'tric'}
# 'tric': TRIC, default
# 'cart': Cartesian
# 'prim': Primitive Internal Coordinates
# 'dlc': Delocalized Internal Coordinates
# 'hdlc': Hybrid Delocalized Internal Coordinates

opt_drv.update_settings(opt_dict=optimization_settings)

opt_mol = opt_drv.compute(molecule, basis, scf_drv)

Hide code cell output

Optimization Driver Setup
===========================

Coordinate System       :    TRIC
Constraints             :    No
Max. Number of Steps    :    300
Transition State        :    No
Hessian                 :    never


* Info * Computing energy and gradient...

* Info *   Energy   : -74.9593193103 a.u.
* Info *   Gradient : 6.495395e-02 a.u. (RMS)
* Info *              8.771955e-02 a.u. (Max)
* Info *   Time     : 1.56 sec


* Info * Computing energy and gradient...

* Info *   Energy   : -74.9648452542 a.u.
* Info *   Gradient : 1.544611e-02 a.u. (RMS)
* Info *              1.844116e-02 a.u. (Max)
* Info *   Time     : 1.61 sec


* Info * Computing energy and gradient...

* Info *   Energy   : -74.9657830293 a.u.
* Info *   Gradient : 6.194312e-03 a.u. (RMS)
* Info *              7.272241e-03 a.u. (Max)
* Info *   Time     : 1.53 sec


* Info * Computing energy and gradient...

* Info *   Energy   : -74.9659001353 a.u.
* Info *   Gradient : 9.009561e-04 a.u. (RMS)
* Info *              1.171116e-03 a.u. (Max)
* Info *   Time     : 1.55 sec


* Info * Computing energy and gradient...

* Info *   Energy   : -74.9659012109 a.u.
* Info *   Gradient : 6.320209e-05 a.u. (RMS)
* Info *              8.691343e-05 a.u. (Max)
* Info *   Time     : 1.57 sec


* Info * Computing energy and gradient...

* Info *   Energy   : -74.9659012173 a.u.
* Info *   Gradient : 3.105557e-07 a.u. (RMS)
* Info *              4.011695e-07 a.u. (Max)
* Info *   Time     : 1.58 sec


* Info * Geometry optimization completed.

Molecular Geometry (Angstroms)
================================

Atom         Coordinate X          Coordinate Y          Coordinate Z

O          -0.047424122161        0.000000000136       -0.033716002296
H           0.034041957195       -0.000000000645        0.952334286422
H           0.909380962448       -0.000000000408       -0.285619981741

Summary of Geometry Optimization
==================================

Opt.Step       Energy (a.u.)       Energy Change (a.u.)       Displacement (RMS, Max)
-------------------------------------------------------------------------------------
0         -74.959319310327        0.000000000000         0.000e+00      0.000e+00
1         -74.964845254244       -0.005525943917         6.683e-02      7.980e-02
2         -74.965783029343       -0.000937775099         3.382e-02      3.774e-02
3         -74.965900135283       -0.000117105940         8.081e-03      9.810e-03
4         -74.965901210888       -0.000001075605         4.828e-04      6.814e-04
5         -74.965901217298       -0.000000006410         4.323e-05      5.698e-05

Statistical Deviation between
Optimized Geometry and Initial Geometry
=========================================

Internal Coord.        RMS deviation         Max. deviation
-----------------------------------------------------------
Bonds               0.039 Angstrom        0.039 Angstrom
Angles              9.457 degree          9.457 degree


                                     *** Time spent in Optimization Driver: 9.49 sec


Hide code cell source
tric_energies = [-74.959319310327, -74.964845254243, -74.965783029341, -74.965900135283]
tric_iterations = [0, 1, 2, 3]

Hide code cell source
plt.figure(figsize=(6,4))

plt.plot(tric_iterations, tric_energies,'o-', label='TRIC quasi-Newton')

# darw a line which shows the energy obtained using gradient descent
plt.plot([-2,43], [gd_energies[-1], gd_energies[-1]], '--', color='gray')
plt.axis(xmin=-2, xmax=43)

plt.xlabel('Iteration')
plt.ylabel('Energy (H)') 