Optimization methods#

The first class of special points on the PES that we will discuss are local energy minima. These correspond to equilibrium molecular structures and are characterized by a vanishing first order energy derivative combined with a Hessian matrix which has only positive eigenvalues. The procedure to determine a local minimum (i.e. finding the coordinates that minimize the energy) is called a structure optimization.

../../_images/PES.svg

In practical terms, the ingredients to perform a geometry optimization include: (1) the initial molecular coordinates, (2) a choice of coordinate system, (3) the energy at a specific geometry \(E(\boldsymbol{\xi})\), (4) the gradient \(\nabla E(\boldsymbol{\xi})\), (5) the Hessian, and (6) a procedure to update the coordinates and Hessian and move on the potential energy surface towards lower energy.

Having addressed the issue of coordinate system, the remaining question is what procedure to use to move along the potential energy surface and arrive at a local energy minimum. There are several iterative methods to do this, some of which need only information on the energy gradient (e.g. gradient-descent, conjugate gradient), while others take into account also the Hessian (Newton–Raphson, quasi-Newton). For a detailed review of minimization techniques, see [Sny05].

Gradient descent#

The simplest optimization procedure is to repeatedly take a step in the direction opposite to the local gradient:

(36)#\[\begin{equation} \mathbf{x}_{i+1} = \mathbf{x}_i - k_i\nabla E(\mathbf{x}_i) \nonumber \,, \end{equation}\]

where by \(\mathbf{x}_{i+1}\) we denote the new coordinates (in a generic coordinate system – either Cartesian or internal coordinates), \(\mathbf{x}_i\) are the coordinates at the previous step \(i\), \(\nabla E\) is the energy gradient and \(k_i\) is the step size. The step size can either be kept constant, or adjusted at each iteration, e.g. by the line search procedure.

The gradient-descent method is simple to implement and is guaranteed to converge, but has the disadvantage that it requires many steps and becomes slow when close to the minimum where the gradient is small. It always converges to a local minimum, given enough steps.

Implementation#

Let’s implement the gradient descent method. We will need to set up a molecule, an SCF driver and a gradient driver.

# Import section
import veloxchem as vlx
import py3Dmol as p3d
from veloxchem.veloxchemlib import bohr_in_angstroms
import numpy as np
from matplotlib import pyplot as plt
* Warning * Environment variable OMP_NUM_THREADS not set.
* Warning * Setting OMP_NUM_THREADS to 8.
* Warning * Setting MKL_THREADING_LAYER to "GNU".
# Molecule and basis set
basis_set_label = 'sto-3g'

molecule_xyz = """3
water
  O 0.000 0.000  0.000
  H 0.000 0.000  0.950
  H 0.896 0.000 -0.317
"""
molecule = vlx.Molecule.from_xyz_string(molecule_xyz)
basis = vlx.MolecularBasis.read(molecule, basis_set_label)

view = p3d.view(linked=True, viewergrid=(1,1),width=200,height=200)
view.addModel(molecule_xyz, 'xyz', viewer=(0,0))
view.setStyle({'stick': {}})
view.rotate(-90, "x")
view.zoomTo()
view.show()

You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol

# SCF settings
scf_settings = {'conv_thresh': 1.0e-6}
method_settings = {} # for DFT use {'xcfun': 'b3lyp', 'grid_level': 4}

# SCF
scfdrv = vlx.ScfRestrictedDriver()
scfdrv.update_settings(scf_settings, method_settings)
scfdrv.compute(molecule, basis)

# Gradient
grad_driver = vlx.ScfGradientDriver(scf_drv=scfdrv)
grad_driver.compute(molecule, basis)
                                                                                                                          
                                            Self Consistent Field Driver Setup                                            
                                           ====================================                                           
                                                                                                                          
                   Wave Function Model             : Spin-Restricted Hartree-Fock                                         
                   Initial Guess Model             : Superposition of Atomic Densities                                    
                   Convergence Accelerator         : Two Level Direct Inversion of Iterative Subspace                     
                   Max. Number of Iterations       : 50                                                                   
                   Max. Number of Error Vectors    : 10                                                                   
                   Convergence Threshold           : 1.0e-06                                                              
                   ERI Screening Scheme            : Cauchy Schwarz + Density                                             
                   ERI Screening Mode              : Dynamic                                                              
                   ERI Screening Threshold         : 1.0e-12                                                              
                   Linear Dependence Threshold     : 1.0e-06                                                              
                                                                                                                          
* Info * Nuclear repulsion energy: 9.2514793147 a.u.                                                                      
                                                                                                                          
* Info * Overlap matrix computed in 0.00 sec.                                                                             
                                                                                                                          
* Info * Kinetic energy matrix computed in 0.00 sec.                                                                      
                                                                                                                          
* Info * Nuclear potential matrix computed in 0.00 sec.                                                                   
                                                                                                                          
* Info * Orthogonalization matrix computed in 0.02 sec.                                                                   
                                                                                                                          
* Info * SAD initial guess computed in 0.00 sec.                                                                          
                                                                                                                          
* Info * Starting Reduced Basis SCF calculation...                                                                        
* Info * ...done. SCF energy in reduced basis set: -74.959319307971 a.u. Time: 0.04 sec.                                  
                                                                                                                          
* Info * Overlap matrix computed in 0.00 sec.                                                                             
                                                                                                                          
* Info * Kinetic energy matrix computed in 0.00 sec.                                                                      
                                                                                                                          
* Info * Nuclear potential matrix computed in 0.00 sec.                                                                   
                                                                                                                          
* Info * Orthogonalization matrix computed in 0.00 sec.                                                                   
                                                                                                                          
                                                                                                                          
               Iter. | Hartree-Fock Energy | Energy Change | Gradient Norm | Max. Gradient | Density Change               
               --------------------------------------------------------------------------------------------               
                  1       -74.959319310077    0.0000000000      0.00002657      0.00000643      0.00000000                
                  2       -74.959319310295   -0.0000000002      0.00000889      0.00000203      0.00002226                
                  3       -74.959319310327   -0.0000000000      0.00000049      0.00000011      0.00001225                
                                                                                                                          
               *** SCF converged in 3 iterations. Time: 0.01 sec.                                                         
                                                                                                                          
               Spin-Restricted Hartree-Fock:                                                                              
               -----------------------------                                                                              
               Total Energy                       :      -74.9593193103 a.u.                                              
               Electronic Energy                  :      -84.2107986251 a.u.                                              
               Nuclear Repulsion Energy           :        9.2514793147 a.u.                                              
               ------------------------------------                                                                       
               Gradient Norm                      :        0.0000004869 a.u.                                              
                                                                                                                          
                                                                                                                          
               Ground State Information                                                                                   
               ------------------------                                                                                   
               Charge of Molecule            :  0.0                                                                       
               Multiplicity (2S+1)           :  1.0                                                                       
               Magnetic Quantum Number (M_S) :  0.0                                                                       
                                                                                                                          
                                                                                                                          
                                                 Spin Restricted Orbitals                                                 
                                                 ------------------------                                                 
                                                                                                                          
               Molecular Orbital No.   1:                                                                                 
               --------------------------                                                                                 
               Occupation: 2.000 Energy:  -20.23340 a.u.                                                                  
               (   1 O   1s  :     0.99)                                                                                  
                                                                                                                          
               Molecular Orbital No.   2:                                                                                 
               --------------------------                                                                                 
               Occupation: 2.000 Energy:   -1.26571 a.u.                                                                  
               (   1 O   1s  :    -0.23) (   1 O   2s  :     0.83) (   2 H   1s  :     0.16)                              
               (   3 H   1s  :     0.16)                                                                                  
                                                                                                                          
               Molecular Orbital No.   3:                                                                                 
               --------------------------                                                                                 
               Occupation: 2.000 Energy:   -0.62927 a.u.                                                                  
               (   1 O   1p+1:     0.35) (   1 O   1p0 :    -0.49) (   2 H   1s  :    -0.44)                              
               (   3 H   1s  :     0.44)                                                                                  
                                                                                                                          
               Molecular Orbital No.   4:                                                                                 
               --------------------------                                                                                 
               Occupation: 2.000 Energy:   -0.44167 a.u.                                                                  
               (   1 O   2s  :    -0.52) (   1 O   1p+1:     0.65) (   1 O   1p0 :     0.46)                              
               (   2 H   1s  :     0.27) (   3 H   1s  :     0.27)                                                        
                                                                                                                          
               Molecular Orbital No.   5:                                                                                 
               --------------------------                                                                                 
               Occupation: 2.000 Energy:   -0.38765 a.u.                                                                  
               (   1 O   1p-1:     1.00)                                                                                  
                                                                                                                          
               Molecular Orbital No.   6:                                                                                 
               --------------------------                                                                                 
               Occupation: 0.000 Energy:    0.60284 a.u.                                                                  
               (   1 O   2s  :     0.91) (   1 O   1p+1:     0.58) (   1 O   1p0 :     0.41)                              
               (   2 H   1s  :    -0.81) (   3 H   1s  :    -0.81)                                                        
                                                                                                                          
               Molecular Orbital No.   7:                                                                                 
               --------------------------                                                                                 
               Occupation: 0.000 Energy:    0.76592 a.u.                                                                  
               (   1 O   1p+1:    -0.58) (   1 O   1p0 :     0.82) (   2 H   1s  :    -0.84)                              
               (   3 H   1s  :     0.84)                                                                                  
                                                                                                                          
                                                                                                                          
                                                   SCF Gradient Driver                                                    
                                                  =====================                                                   
                                                                                                                          
                                              Molecular Geometry (Angstroms)                                              
                                             ================================                                             
                                                                                                                          
                          Atom         Coordinate X          Coordinate Y          Coordinate Z                           
                                                                                                                          
                           O           0.000000000000        0.000000000000        0.000000000000                         
                           H           0.000000000000        0.000000000000        0.950000000000                         
                           H           0.896000000000        0.000000000000       -0.317000000000                         
                                                                                                                          
                                                 Gradient (Hartree/Bohr)                                                  
                                                -------------------------                                                 
                                                                                                                          
                          Atom           Gradient X            Gradient Y            Gradient Z                           
                                                                                                                          
                           O           0.071311074571        0.000000000043        0.051082772849                         
                           H          -0.022188926188       -0.000000000021       -0.044935563920                         
                           H          -0.049122148056        0.000000000014       -0.006147199315                         
                                                                                                                          
                                   *** Time spent in gradient calculation: 1.05 sec ***                                   
                                                                                                                          

Now, write your own routine to run one gradient descent iteration:

def gradient_descent_iteration(coordinates, gradient, step):
    ...
    return new_coordinates
def gradient_descent_iteration(coordinates, gradient, step):
    new_coordinates = coordinates - step * gradient
    return new_coordinates

And the routine that runs the optimization:

def gradient_descent(molecule, basis, scf_driver, gradient_driver,
                     step=0.1, threshold=1e-3, max_iter=10):

    # set ostream state to False, to avoid printout from every new scf calculation
    ostream_state = scf_driver.ostream.state
    scf_driver.ostream.state = False
    gradient_driver.ostream.state = False
    
    iteration = 0
    grad_norm = 100
    # atom labels (symbols)
    labels = molecule.get_labels()
    # initial atomc coordinates
    old_coords = molecule.get_coordinates()
    scf_driver.compute(molecule, basis, None)
    old_energy = scf_driver.get_scf_energy()
    grad_driver.compute(molecule, basis)
    old_gradient = gradient_driver.gradient
    
    print("Starting gradient descent:\n")
    text = "Iteration     Energy (H)      "
    text += "Max. displacement (bohr)    Gradient norm (H/bohr)"
    print(text)

    energies = [old_energy]
    iterations = [0]

    while (grad_norm >= threshold) and (iteration <= max_iter):
        ...
        ...
        ...

        
    if iteration <= max_iter:
        text = "\n   *** Gradient Descent converged in "
        text += "%d iteration(s). *** " % iteration
        print(text)
        return new_mol, iterations, energies
    else:
        print("\n   !!! Gradient Descent did not converge  !!! ")
        return None, None, None

    # Set the ostream state to initial value
    scf_driver.ostream.state = ostream_state
    gradient_driver.ostream.state = ostream_state
def gradient_descent(molecule, basis, scf_driver, gradient_driver,
                     step=0.1, threshold=1e-3, max_iter=10):
    # set ostream state to False, to avoid printout from every new scf calculation
    ostream_state = scf_driver.ostream.state
    scf_driver.ostream.state = False
    gradient_driver.ostream.state = False
    
    iteration = 0
    grad_norm = 100
    # atom labels (symbols)
    labels = molecule.get_labels()
    # initial atomc coordinates
    old_coords = molecule.get_coordinates()
    scf_driver.compute(molecule, basis, None)
    old_energy = scf_driver.get_scf_energy()
    grad_driver.compute(molecule, basis)
    old_gradient = gradient_driver.gradient
    
    print("Starting gradient descent:\n")
    text = "Iteration     Energy (H)      "
    text += "Max. displacement (bohr)    Gradient norm (H/bohr)"
    print(text)
    energies = [old_energy]
    iterations = [0]
    while (grad_norm >= threshold) and (iteration <= max_iter):
        coords = gradient_descent_iteration(old_coords, old_gradient, step)
        
        # calculate the energy and gradient corresponding to the new coordinates
        new_mol = vlx.molecule.Molecule(labels, coords, units='au')
        scf_driver.compute(new_mol, basis, None)
        energy = scf_driver.get_scf_energy()
        energies.append(energy)
        gradient_driver.compute(new_mol, basis)
        gradient = gradient_driver.gradient
        grad_norm = np.linalg.norm(gradient)
        displacement = old_coords - coords
        max_disp = np.amax(abs(displacement))
        
        # calculate energy difference
        delta_e = abs(energy - old_energy)
        text = "   %3d.  %15.7f      " % (iteration, energy, )
        text += "%15.7f          %15.7f" % (max_disp, grad_norm)
        print(text)
        
        # save 
        old_energy = energy
        old_gradient = gradient
        old_coords = coords
        iteration += 1
        iterations.append(iteration)
        
    if iteration <= max_iter:
        text = "\n   *** Gradient Descent converged in "
        text += "%d iteration(s). *** " % iteration
        print(text)
        return new_mol, iterations, energies
    else:
        print("\n   !!! Gradient Descent did not converge  !!! ")
        return None, None, None
    scf_driver.ostream.state = ostream_state
    gradient_driver.ostream.state = ostream_state
opt_mol, gd_iterations, gd_energies = gradient_descent(molecule, basis,
                                                       scfdrv, grad_driver,
                                                       threshold=1e-2,
                                                       max_iter=100)
Starting gradient descent:

Iteration     Energy (H)      Max. displacement (bohr)    Gradient norm (H/bohr)
     0.      -74.9605109            0.0071311                0.0994035
     1.      -74.9614427            0.0063668                0.0881487
     2.      -74.9621766            0.0056990                0.0784445
     3.      -74.9627589            0.0051136                0.0700529
     4.      -74.9632239            0.0045989                0.0627797
     5.      -74.9635981            0.0041452                0.0564644
     6.      -74.9639013            0.0037442                0.0509731
     7.      -74.9641488            0.0033892                0.0461931
     8.      -74.9643524            0.0030741                0.0420288
     9.      -74.9645214            0.0027940                0.0383981
    10.      -74.9646627            0.0025445                0.0352305
    11.      -74.9647818            0.0023220                0.0324649
    12.      -74.9648833            0.0021232                0.0300478
    13.      -74.9649703            0.0019453                0.0279327
    14.      -74.9650457            0.0017860                0.0260789
    15.      -74.9651116            0.0016430                0.0244504
    16.      -74.9651696            0.0015146                0.0230160
    17.      -74.9652211            0.0013991                0.0217483
    18.      -74.9652671            0.0012950                0.0206236
    19.      -74.9653086            0.0012012                0.0196211
    20.      -74.9653462            0.0011164                0.0187230
    21.      -74.9653805            0.0010397                0.0179140
    22.      -74.9654119            0.0009703                0.0171810
    23.      -74.9654409            0.0009260                0.0165129
    24.      -74.9654676            0.0009062                0.0159001
    25.      -74.9654924            0.0008862                0.0153348
    26.      -74.9655155            0.0008664                0.0148103
    27.      -74.9655371            0.0008466                0.0143208
    28.      -74.9655573            0.0008270                0.0138617
    29.      -74.9655762            0.0008075                0.0134291
    30.      -74.9655940            0.0007883                0.0130196
    31.      -74.9656107            0.0007693                0.0126305
    32.      -74.9656264            0.0007506                0.0122596
    33.      -74.9656412            0.0007322                0.0119049
    34.      -74.9656552            0.0007141                0.0115647
    35.      -74.9656683            0.0006963                0.0112378
    36.      -74.9656808            0.0006788                0.0109230
    37.      -74.9656926            0.0006617                0.0106194
    38.      -74.9657037            0.0006449                0.0103261
    39.      -74.9657142            0.0006284                0.0100423
    40.      -74.9657241            0.0006123                0.0097677

   *** Gradient Descent converged in 41 iteration(s). *** 
plt.figure(figsize=(7,4))

plt.plot(gd_iterations, gd_energies,'o-', label='Gradient descent')

plt.axis(xmin=-2, xmax=43)

plt.xlabel('Iteration')
plt.ylabel('Energy (H)')
plt.title("Gradient descent convergence")
plt.legend()
plt.tight_layout(); plt.show()
../../_images/opt_struct_10_0.png

Conjugate gradient#

An improved method over the gradient-descent approach is to use the “gradient history” (steps \(i\) and \(i-1\)) to determine the coordinates at step \(i+1\):

(37)#\[\begin{equation} \mathbf{x}_{i+1} = \mathbf{x}_i - k_i \mathbf{h}_i\,,\nonumber \end{equation}\]

with \(\mathbf{h}_i = \nabla E (\mathbf{x}_i)+\gamma_i\mathbf{h}_{i-1}\). The function \(\gamma_i\) contains gradient information from steps \(i\) and \(i-1\) and can be defined in different ways. For example, in the Fletcher-Reeves conjugate gradient method:

(38)#\[\begin{equation} \gamma_i = \frac{|\nabla E(\mathbf{x}_i)|^2}{|\nabla E(\mathbf{x}_{i-1})|^2}\,. \nonumber \end{equation}\]

Implementation#

Implement the conjugate gradient optimization algorithm and compare its performance to the gradient descent method.

def conjugate_gradient_iteration(coordinates, h, step):
    ...
    return new_coordinates
def conjugate_gradient_iteration(coordinates, h, step):
    new_coordinates = coordinates - step * h
    return new_coordinates
def conjugate_gradient(molecule, basis, scf_driver, gradient_driver,
                       step=0.1, threshold=1e-3, max_iter=10):
    # set ostream state to False, to avoid printout from every new scf calculation
    ostream_state = scf_driver.ostream.state
    scf_driver.ostream.state = False
    gradient_driver.ostream.state = False
    
    iteration = 0
    grad_norm = 100
    # atom labels (symbols)
    labels = molecule.get_labels()
    # initial atomc coordinates
    old_coords = molecule.get_coordinates()
    scf_driver.compute(molecule, basis, None)
    old_energy = scf_driver.get_scf_energy()
    grad_driver.compute(molecule, basis)
    old_gradient = gradient_driver.gradient
    old_h = old_gradient
    
    print("Starting gradient descent:\n")
    text = "Iteration     Energy (H)      Max. displacement (bohr)"
    text += "Gradient norm (H/bohr)      gamma"
    print(text)
    energies = [old_energy]
    iterations = [0]
    while (grad_norm >= threshold) and (iteration <= max_iter):
        
        ...
        ...
        ...
        
        gamma = grad_norm**2 / old_grad_norm**2
        
        text = "   %3d.  %15.7f      " % (iteration, energy)
        text += "%15.7f          %15.7f" % (max_disp, grad_norm)
        text += "         %15.7f" % (gamma)

        h = gradient + gamma * old_h
        
        # save 
        old_energy = energy
        old_gradient = gradient
        old_coords = coords
        old_h = h
        iteration += 1
        iterations.append(iteration)
        
    if iteration <= max_iter:
        text = "\n   *** Conjugate Gradient converged in "
        text += "%d iteration(s). *** " % iteration
        print(text)
        return new_mol, iterations, energies
    else:
        print("\n   !!! Conjugate Gradient did not converge  !!! ")
        return None, None, None

    # restore ostream state to its original value
    scf_driver.ostream.state = ostream_state
    gradient_driver.ostream.state = ostream_state
def conjugate_gradient(molecule, basis, scf_driver, gradient_driver,
                       step=0.1, threshold=1e-3, max_iter=10):
    # set ostream state to False, to avoid printout from every new scf calculation
    ostream_state = scf_driver.ostream.state
    scf_driver.ostream.state = False
    gradient_driver.ostream.state = False
    
    iteration = 0
    grad_norm = 100
    # atom labels (symbols)
    labels = molecule.get_labels()
    # initial atomc coordinates
    old_coords = molecule.get_coordinates()
    scf_driver.compute(molecule, basis, None)
    old_energy = scf_driver.get_scf_energy()
    grad_driver.compute(molecule, basis)
    old_gradient = gradient_driver.gradient
    old_h = old_gradient
    
    print("Starting gradient descent:\n")
    text = "Iteration     Energy (H)      Max. displacement (bohr)"
    text += "     Gradient norm (H/bohr)      gamma"
    print(text)
    energies = [old_energy]
    iterations = [0]
    while (grad_norm >= threshold) and (iteration <= max_iter):
        
        
        coords = conjugate_gradient_iteration(old_coords, old_h, step)
        
        # calculate the energy and gradient corresponding to the new coordinates
        new_mol = vlx.molecule.Molecule(labels, coords, units='au')
        scf_driver.compute(new_mol, basis, None)
        energy = scf_driver.get_scf_energy()
        gradient_driver.compute(new_mol, basis)
        gradient = gradient_driver.gradient
        grad_norm = np.linalg.norm(gradient)
        old_grad_norm = np.linalg.norm(old_gradient)
        displacement = old_coords - coords
        max_disp = np.amax(abs(displacement))
        
        energies.append(energy)
        
        gamma = grad_norm**2 / old_grad_norm**2
        
        text = "   %3d.  %15.7f      " % (iteration, energy)
        text += "%15.7f          %15.7f" % (max_disp, grad_norm)
        text += "         %15.7f" % (gamma)
        print(text)

        h = gradient + gamma * old_h
        
        # save 
        old_energy = energy
        old_gradient = gradient
        old_coords = coords
        old_h = h
        iteration += 1
        iterations.append(iteration)
        
    if iteration <= max_iter:
        text = "\n   *** Conjugate Gradient converged in "
        text += "%d iteration(s). *** " % iteration
        print(text)
        return new_mol, iterations, energies
    else:
        print("\n   !!! Conjugate Gradient did not converge  !!! ")
        return None, None, None
    scf_driver.ostream.state = ostream_state
    gradient_driver.ostream.state = ostream_state
cg_opt_mol, cg_iterations, cg_energies = conjugate_gradient(molecule, basis,
                                                            scfdrv, grad_driver,
                                                            threshold=1e-2,
                                                            max_iter=50)
Starting gradient descent:

Iteration     Energy (H)      Max. displacement (bohr)     Gradient norm (H/bohr)      gamma
     0.      -74.9605109            0.0071311                0.0994035               0.7806759
     1.      -74.9621708            0.0119338                0.0783209               0.6208010
     2.      -74.9634822            0.0125176                0.0576486               0.5417781
     3.      -74.9642742            0.0106076                0.0418381               0.5267046
     4.      -74.9647155            0.0083648                0.0311623               0.5547716
     5.      -74.9649702            0.0066283                0.0244842               0.6173223
     6.      -74.9651335            0.0054866                0.0206065               0.7083367
     7.      -74.9652543            0.0048212                0.0185662               0.8117767
     8.      -74.9653573            0.0044747                0.0176220               0.9008707
     9.      -74.9654549            0.0044303                0.0171710               0.9494769
    10.      -74.9655522            0.0051889                0.0166759               0.9431609
    11.      -74.9656477            0.0057511                0.0156704               0.8830443
    12.      -74.9657341            0.0057875                0.0138841               0.7850059
    13.      -74.9658021            0.0050952                0.0114119               0.6755822
    14.      -74.9658474            0.0038490                0.0087073               0.5821731

   *** Conjugate Gradient converged in 15 iteration(s). *** 
plt.figure(figsize=(7,4))

plt.plot(gd_iterations, gd_energies,'o-', label='Gradient descent')
plt.plot(cg_iterations, cg_energies,'o-', label='Conjugate gradient')

# darw a line which shows the energy obtained using gradient descent 
plt.plot([-2,43], [gd_energies[-1], gd_energies[-1]], '--', color='gray')
plt.axis(xmin=-2, xmax=43)

plt.xlabel('Iteration')
plt.ylabel('Energy (H)')
plt.title("Conjugate gradient vs. gradient descent")
plt.legend()
plt.tight_layout(); plt.show()
../../_images/opt_struct_17_0.png

Newton–Raphson and quasi-Newton#

The next step in the hierarchy of minimization methods is to use both the first and second order energy derivatives (i.e. gradient \( \nabla E\) and Hessian \(\mathbf{H}\)) in determining the next step in conformation space. This is based on a quadratic approximation for the local shape of the PES:

(39)#\[\begin{equation} E(\mathbf{x}+\Delta\mathbf{x}) \approx E(\mathbf{x}) + \nabla E(\mathbf{x})\Delta\mathbf{x} + \frac{1}{2}\Delta\mathbf{x}^\mathrm{T}\mathbf{H}\Delta\mathbf{x} \label{eq:quadratic_approx_PES} \,. \nonumber \end{equation}\]

Here, \(\Delta \mathbf{x}\) is the Newton step used to update the coordinates:

(40)#\[\begin{align} \Delta \mathbf{x} &= -\mathbf{H}^{-1}\nabla E(\mathbf{x})\label{eq:Newton_step}\,,\nonumber \\ \mathbf{x}_\mathrm{i+1} &= \mathbf{x}_\mathrm{i}+\Delta \mathbf{x}\,. \nonumber \end{align}\]

When redundant internal coordinates are used, it is important to ensure that the displacements are only performed in the non-redundant region of the internal coordinate space. This is achieved by applying a projector \(\mathbf{P}=\mathbf{G}^{-}\mathbf{G}\) to the gradient and Hessian before constructing the Newton step [Neal09]:

(41)#\[\begin{align} \tilde{\mathbf{g}}_q &= \mathbf{P}\nabla E(\mathbf{q}) \, , \nonumber \\ \tilde{\mathbf{H}}_q &= \mathbf{P}\mathbf{H}_q\mathbf{P}+\alpha(1-\mathbf{P}) \nonumber \,, \end{align}\]

where \(\alpha\) is an arbitrary large value (e.g. 1000).

As evident from the equations above, the second order derivatives of the energy with respect to nuclear displacements are required to generate the Newton step. The direct computation of these derivatives (which compose the Hessian matrix) is quite expensive, but good \textit{approximations} for the Hessian can be constructed using the gradient history. For example, the Broyden-Fletcher-Goldfarb-Shanno (BFGS, used by geomeTRIC) approach uses the relation:

(42)#\[\begin{align} \mathbf{H}_{i+1} &= \mathbf{H}_i + \frac{\mathbf{g}^{}_i\mathbf{g}_i^\mathrm{T}}{\mathbf{g}_i^\mathrm{T}\mathbf{s}^{}_i} - \frac{\mathbf{H}_i\mathbf{s}_i\left(\mathbf{H}_i\mathbf{s}_i\right)_{}^\mathrm{T}}{\mathbf{s}_i^\mathrm{T}\mathbf{H}_i^{}\mathbf{s}_i^{}} \, ,\nonumber \end{align}\]

with,

(43)#\[\begin{align} \mathbf{g}_i &= \nabla E(\mathbf{x}_{i+1}) - \nabla E(\mathbf{x}_i) \, \nonumber \\ \mathbf{s}_i &= \mathbf{x}_{i+1} - \mathbf{x}_{i} \nonumber \,, \end{align}\]

to update the Hessian at step \(i+1\) using the Hessian at step \(i\) and information about the gradient at the current and previous step.

Quasi-Newton methods, i.e., methods that use approximate Hessians, achieve a very quick convergence at the same computational cost as the gradient-descent method.

../../_images/geom_opt_flowchart.svg

Comparison to gradient descent and conjugate gradient#

To compare the gradient descent and conjugate gradient algorithms with the quasi-Newton method, we will use the quasi-Newton implementation from geomeTRIC.

# Let's run the SCF and gradient again, to make sure we start from the same point:
scfdrv.ostream.state = False
grad_driver.ostream.state = False

scfdrv.compute(molecule, basis)
grad_driver.compute(molecule, basis)

scfdrv.ostream.state = True
grad_driver.ostream.state = True
# Optimization settings and driver:
optimization_settings = {'coordsys' : 'tric'}
# 'tric': TRIC, default
# 'cart': Cartesian
# 'prim': Primitive Internal Coordinates
# 'dlc': Delocalized Internal Coordinates
# 'hdlc': Hybrid Delocalized Internal Coordinates

opt_drv = vlx.OptimizationDriver(grad_driver)
opt_drv.update_settings(opt_dict=optimization_settings)

opt_mol = opt_drv.compute(molecule, basis)
                                                                                                                          
                                                Optimization Driver Setup                                                 
                                               ===========================                                                
                                                                                                                          
                                     Coordinate System       :    TRIC                                                    
                                     Constraints             :    No                                                      
                                     Max. Number of Steps    :    300                                                     
                                     Transition State        :    No                                                      
                                     Hessian                 :    never                                                   
                                                                                                                          
* Info * Computing energy and gradient...                                                                                 
* Info *   Energy   : -74.9593193103 a.u.                                                                                 
* Info *   Gradient : 6.495395e-02 a.u. (RMS)                                                                             
* Info *              8.771955e-02 a.u. (Max)                                                                             
* Info *   Time     : 1.29 sec                                                                                            
                                                                                                                          
* Info * Computing energy and gradient...                                                                                 
* Info *   Energy   : -74.9648452542 a.u.                                                                                 
* Info *   Gradient : 1.544611e-02 a.u. (RMS)                                                                             
* Info *              1.844116e-02 a.u. (Max)                                                                             
* Info *   Time     : 1.34 sec                                                                                            
                                                                                                                          
* Info * Computing energy and gradient...                                                                                 
* Info *   Energy   : -74.9657830293 a.u.                                                                                 
* Info *   Gradient : 6.194312e-03 a.u. (RMS)                                                                             
* Info *              7.272241e-03 a.u. (Max)                                                                             
* Info *   Time     : 1.33 sec                                                                                            
                                                                                                                          
* Info * Computing energy and gradient...                                                                                 
* Info *   Energy   : -74.9659001353 a.u.                                                                                 
* Info *   Gradient : 9.009561e-04 a.u. (RMS)                                                                             
* Info *              1.171116e-03 a.u. (Max)                                                                             
* Info *   Time     : 1.34 sec                                                                                            
                                                                                                                          
* Info * Computing energy and gradient...                                                                                 
* Info *   Energy   : -74.9659012109 a.u.                                                                                 
* Info *   Gradient : 6.320210e-05 a.u. (RMS)                                                                             
* Info *              8.691344e-05 a.u. (Max)                                                                             
* Info *   Time     : 1.33 sec                                                                                            
                                                                                                                          
* Info * Computing energy and gradient...                                                                                 
* Info *   Energy   : -74.9659012173 a.u.                                                                                 
* Info *   Gradient : 3.105703e-07 a.u. (RMS)                                                                             
* Info *              4.012037e-07 a.u. (Max)                                                                             
* Info *   Time     : 1.33 sec                                                                                            
                                                                                                                          
* Info * Geometry optimization completed.                                                                                 
                                                                                                                          
                                              Molecular Geometry (Angstroms)                                              
                                             ================================                                             
                                                                                                                          
                          Atom         Coordinate X          Coordinate Y          Coordinate Z                           
                                                                                                                          
                           O          -0.047424121270       -0.000000000023       -0.033716004730                         
                           H           0.034041955287       -0.000000001714        0.952334284218                         
                           H           0.909380964026        0.000000002182       -0.285619981419                         
                                                                                                                          
                                                                                                                          
                                             Summary of Geometry Optimization                                             
                                            ==================================                                            
                                                                                                                          
                  Opt.Step       Energy (a.u.)       Energy Change (a.u.)       Displacement (RMS, Max)                   
                  -------------------------------------------------------------------------------------                   
                      0         -74.959319310327        0.000000000000         0.000e+00      0.000e+00                   
                      1         -74.964845254234       -0.005525943907         6.683e-02      7.980e-02                   
                      2         -74.965783029341       -0.000937775107         3.382e-02      3.774e-02                   
                      3         -74.965900135283       -0.000117105942         8.081e-03      9.810e-03                   
                      4         -74.965901210888       -0.000001075605         4.828e-04      6.814e-04                   
                      5         -74.965901217298       -0.000000006410         4.323e-05      5.698e-05                   
                                                                                                                          
                                                                                                                          
                                              Statistical Deviation between                                               
                                         Optimized Geometry and Initial Geometry                                          
                                        =========================================                                         
                                                                                                                          
                               Internal Coord.        RMS deviation         Max. deviation                                
                               -----------------------------------------------------------                                
                                  Bonds               0.039 Angstrom        0.039 Angstrom                                
                                  Angles              9.457 degree          9.457 degree                                  
                                                                                                                          
                                     *** Time spent in Optimization Driver: 8.04 sec                                      
                                                                                                                          
tric_energies = [-74.959319310327, -74.964845254243, -74.965783029341, -74.965900135283]
tric_iterations = [0, 1, 2, 3]
plt.figure(figsize=(7,4))

plt.plot(gd_iterations, gd_energies,'o-', label='Gradient descent')
plt.plot(cg_iterations, cg_energies,'o-', label='Conjugate gradient')
plt.plot(tric_iterations, tric_energies,'o-', label='TRIC quasi-Newton')

# darw a line which shows the energy obtained using gradient descent 
plt.plot([-2,43], [gd_energies[-1], gd_energies[-1]], '--', color='gray')
plt.axis(xmin=-2, xmax=43)

plt.xlabel('Iteration')
plt.ylabel('Energy (H)')
plt.title("Gradient descent vs. conjugate gradient vs. quasi-Newton")
plt.legend()
plt.tight_layout(); plt.show()
../../_images/opt_struct_23_0.png

Note

GeomeTRIC offers different options for the type of coordinates used in the optimization: Cartesian, delocalized internal coordinates (DLC), hybrid delocalized internal coordinates (HDLC), as well as translation-rotation internal coordinates (TRIC) . How does the choice of coordinates influence the optimization process?