# Energy gradients

In order to perform a geometry optimization and find a minimum-energy structure or a transition state,
one needs to calculate the derivative of the energy \(E\) with respect to the **nuclear coordinates**.
This derivative is usually referred to as the *molecular gradient*, and it vanishes at any stationary geometry.
The second derivatives of the energy with respect to nuclear displacements (the molecular **Hessian**) can be used to characterize
the stationary structure, i.e., to confirm whether a true minimum or a transition state has been found.
Molecular gradients and Hessians are not the only properties that can be calculated as derivatives of the energy:
permanent and induced (dipole) moments, polarizabilities, and magnetizabilities are obtained by taking derivatives with respect to an external **electromagnetic field**, and NMR and EPR parameters when **nuclear magnetic moments** are additionally involved [Jen06].

The energy derivatives can be calculated either **numerically**, using finite differences,
or **analytically**. The former is usually simple to implement but suffers from limited
numerical accuracy and computational efficiency.
The latter requires considerable programming effort but offers greater speed, precision, and convenience.

## Numerical Gradients

The simplest method to calculate the derivative of the energy \(E(x)\) with respect to some parameter \(x\) is to use finite difference approximations. Choosing a small step \(x_0\), a two-point estimate is given by

$$
\frac{\mathrm{d}E}{\mathrm{d}x} \approx \frac{E(x + x_0) - E(x)}{x_0} \,,
$$

which is known as a first-order **divided difference**; its error with respect to the true derivative is approximately proportional to \(x_0\).
The true derivative of \(E\) at \(x\) is given by the limit

$$
\frac{\mathrm{d}E}{\mathrm{d}x} = \lim_{x_0 \to 0} \frac{E(x + x_0) - E(x)}{x_0} \,.
$$
Another two-point formula is the **symmetric difference quotient** given by

$$
\frac{\mathrm{d}E}{\mathrm{d}x} \approx \frac{E(x + x_0) - E(x - x_0)}{2 x_0} \,,
$$

where it can be shown that the first-order error cancels, leaving an error approximately proportional to \(x_0^2\). For small \(x_0\), this is therefore a more accurate approximation to the true derivative, and it is commonly used in numerical derivative codes. VeloxChem uses a default value of \(0.001~a_0\). Note that in both cases two calculations of the energy \(E(x)\) need to be performed in order to obtain the derivative with respect to a single variable \(x\).

There are also higher-order methods for approximating the first derivative, such as the “five-point method” given by

$$
\frac{\mathrm{d}E}{\mathrm{d}x} \approx \frac{E(x - 2x_0) - 8 E(x - x_0) + 8 E(x + x_0) - E(x + 2x_0)}{12 x_0} \,,
$$

where the error is approximately proportional to \(x_0^4\), giving a very accurate approximation to the gradient. However, since four individual energy calculations need to be performed for each perturbation, this expression is usually only employed for debugging analytical derivative implementations.
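As a concrete illustration, the difference formulas discussed above can be applied to a simple model potential for which the exact derivative is known. The Morse-type function and its parameters below are arbitrary stand-ins for an electronic-structure energy, not VeloxChem output:

```python
import math

# Model potential: Morse curve E(r) = D (1 - exp(-a (r - r0)))^2
# (a stand-in for an electronic-structure energy; parameters are arbitrary).
D, a, r0 = 0.17, 1.0, 1.4

def energy(r):
    return D * (1.0 - math.exp(-a * (r - r0))) ** 2

def gradient_exact(r):
    # Analytic derivative of the Morse curve, for reference.
    e = math.exp(-a * (r - r0))
    return 2.0 * D * a * e * (1.0 - e)

def gradient_central(r, h=1e-3):
    # Symmetric difference quotient: error ~ O(h^2).
    return (energy(r + h) - energy(r - h)) / (2.0 * h)

def gradient_five_point(r, h=1e-3):
    # Five-point stencil: error ~ O(h^4).
    return (energy(r - 2 * h) - 8.0 * energy(r - h)
            + 8.0 * energy(r + h) - energy(r + 2 * h)) / (12.0 * h)

r = 1.7
print(abs(gradient_central(r) - gradient_exact(r)))     # small, scales as h^2
print(abs(gradient_five_point(r) - gradient_exact(r)))  # much smaller, scales as h^4
```

Halving `h` should reduce the first error by roughly a factor of four and the second by roughly a factor of sixteen, which is a quick way to check the order of an implementation.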

## Analytical Gradients

To determine the energy gradient analytically, we first need to identify the non-variational components of the energy functional. In the case of self-consistent field (SCF) methods, i.e., HF and Kohn–Sham DFT, these are the molecular-orbital (MO) coefficients, which exhibit an implicit dependence on the nuclear coordinates when atom-centered basis functions such as Gaussian or Slater-type atomic orbitals (AOs) are used. Considering this implicit dependence, the total energy derivative with respect to a particular nuclear coordinate \(x\) is obtained via the chain rule [LWK05, Reh15]

$$
\frac{\mathrm{d}E}{\mathrm{d}x} = \frac{\partial E}{\partial x} + \frac{\partial E}{\partial \mathbf{C}} \frac{\mathrm{d}\mathbf{C}}{\mathrm{d}x} \,.
$$
Here, \(\mathbf{C}\) is the MO coefficient matrix, which transforms from a set of AOs \(\{\chi_\mu\}\) to a set of MOs \(\{\phi_p\}\) via

$$
\phi_p = \sum_\mu C_{\mu p} \, \chi_\mu \,.
$$
The first term, \(\partial E/\partial x\), is the Hellmann–Feynman contribution, which describes the explicit dependence of the energy on the nuclear coordinate \(x\) through the nuclear–electron and nuclear–nuclear interaction terms of the Hamiltonian [HJ88, Lev05]. The second term stems from the implicit dependence of the energy on \(x\) due to the fact that the molecular orbitals are expanded in a finite atom-centered basis set [HJ88]

$$
\frac{\partial E}{\partial \mathbf{C}} \frac{\mathrm{d}\mathbf{C}}{\mathrm{d}x} = \sum_{\mu p} \frac{\partial E}{\partial C_{\mu p}} \, \frac{\mathrm{d}C_{\mu p}}{\mathrm{d}x} \,.
$$
It may seem surprising at first that the derivative \({\partial E}/{\partial\mathbf{C}}\) has to be computed. If the MO coefficients are obtained variationally for a specified molecular geometry, how is it that this derivative is not zero? The key to this conundrum lies in the phrase “for a specified molecular geometry”. Since the SCF energy and density are constructed using a constrained LCAO parametrization, if we perform a nuclear displacement, the “old” MO coefficients no longer correspond to the minimum energy and must be re-optimized. Thus the partial derivative with respect to the MO coefficients, as well as the derivative of the MO coefficients with respect to \(x\), are required. The explicit computation of the latter is complicated, but it can be avoided by introducing a new functional, the Lagrangian, for which the partial derivative \(\partial L / \partial \mathbf{C}\) is zero by construction and which is constrained to the HF/DFT configuration space [HJO14, LWK05]

$$
L(x, \mathbf{C}, \boldsymbol{\Lambda}) = E(x, \mathbf{C}) + \boldsymbol{\Lambda} \, f_c(\mathbf{C}) \,,
$$
where \(\boldsymbol{\Lambda}\) are a set of undetermined Lagrange multipliers and \(f_c(\mathbf{C})=0\) define the constraints for the non-variational parameters \(\mathbf{C}\). These constraints ensure that we are moving only in the HF/DFT configuration space, rather than in the infinite space of all orthogonally equivalent combinations of orbital bases [HJ88].

By using the Lagrangian, we have shifted the difficult problem of computing \({\mathrm{d} \mathbf{C}}/{\mathrm{d}x}\) to the much simpler problem of determining the unknown Lagrange multipliers that satisfy \(\partial L / \partial \mathbf{C}=0\). Equations for these are derived by imposing that the explicit form of \(\partial L / \partial \mathbf{C}\) is zero [HJO14]. Once the Lagrange multipliers have been obtained, the total derivative of the energy functional with respect to the nuclear coordinate \(x\) can be computed as

$$
\frac{\mathrm{d}E}{\mathrm{d}x} = \frac{\mathrm{d}L}{\mathrm{d}x} = \frac{\partial L}{\partial x} \,.
$$

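The practical payoff of the Lagrangian construction can be seen in a small toy problem. Below, the "energy" is the lowest eigenvalue of a symmetric matrix \(\mathbf{H}(x)\), obtained variationally under the normalization constraint \(\mathbf{c}^T\mathbf{c}=1\); because the Lagrangian is stationary in \(\mathbf{c}\), the derivative of the eigenvalue reduces to \(\mathbf{c}^T (\mathrm{d}\mathbf{H}/\mathrm{d}x)\, \mathbf{c}\), with no response of \(\mathbf{c}\) needed. The matrices here are random stand-ins, not electronic-structure quantities:

```python
import numpy as np

# Toy "electronic structure" problem: the lowest eigenvalue of a symmetric
# matrix H(x), minimized over normalized vectors c (constraint c.T @ c = 1).
# The Lagrangian L = c.T @ H @ c - lam * (c.T @ c - 1) is stationary in c,
# so dE/dx = dL/dx = c.T @ (dH/dx) @ c -- no response of c is required.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5)); H0 = 0.5 * (A + A.T)
B = rng.standard_normal((5, 5)); V = 0.5 * (B + B.T)

def H(x):
    return H0 + x * V   # dH/dx = V

def lowest(x):
    w, U = np.linalg.eigh(H(x))
    return w[0], U[:, 0]

x = 0.3
E, c = lowest(x)
analytic = c @ V @ c    # "Hellmann-Feynman" gradient, no orbital response
h = 1e-5
numeric = (lowest(x + h)[0] - lowest(x - h)[0]) / (2.0 * h)
print(analytic, numeric)
```

The two printed numbers agree to high precision, even though the eigenvector itself changes with \(x\): the first-order change of the variational parameters does not contribute to the energy derivative.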
### Ground state

#### HF

As an example, we will derive in the following the analytical expression for the Hartree–Fock (HF) energy gradient. The electronic ground-state energy at the HF level of theory can be written as

$$
E_{\mathrm{HF}} = \sum_i F_{ii} - \frac{1}{2} \sum_{ij} \braket{ij||ij} \,,
$$

where \(F_{pq}\) are Fock matrix elements and \(\langle pq || rs \rangle\) are anti-symmetrized two-electron integrals in physicists’ (“1212”) notation [SO12]. As usual, the indices \(i,j,\ldots\) denote occupied molecular orbitals, \(a,b,\ldots\) denote unoccupied (virtual) ones, and \(p,q,\ldots\) stand for both occupied and virtual orbitals, while Greek letters \(\mu,\nu,\ldots\) are used for AO indices.

The Lagrangian for this energy functional is constructed as follows

$$
L_{\mathrm{HF}} = E_{\mathrm{HF}} + \sum_{pq} \lambda_{pq} \left( F_{pq} - \epsilon_p \delta_{pq} \right) + \sum_{pq} \omega_{pq} \left( S_{pq} - \delta_{pq} \right) ,
$$

with \(\boldsymbol{\Lambda}=\{\lambda_{pq}\}\) and \(\boldsymbol{\Omega}=\{\omega_{pq}\}\) being the Lagrange multipliers, \(\mathbf{F}=\{F_{pq}\}\) the Fock matrix, \(\{\epsilon_p\}\) the HF orbital energies, and \(\mathbf{S}=\{S_{pq}\}\) the overlap matrix. Here, the conditions associated with the Lagrange multipliers \(\{\lambda_{pq}\}\) and \(\{\omega_{pq}\}\) ensure that, for the HF state, the Fock matrix is diagonal and the overlap matrix is the unit matrix, respectively.

To calculate the total derivative of the energy with respect to \(x\), we now only need the partial derivative of the Lagrangian with respect to the same variable,

$$
\frac{\mathrm{d}E}{\mathrm{d}x} = \frac{\partial L}{\partial x} = \frac{\partial E_{\mathrm{HF}}}{\partial x} + \sum_{pq} \lambda_{pq} \frac{\partial F_{pq}}{\partial x} + \sum_{pq} \omega_{pq} \frac{\partial S_{pq}}{\partial x} \,.
$$
We want to rewrite the above derivative of the Lagrangian in terms of effective density matrices. For this purpose, we express the energy in terms of the one- and two-particle density matrices, \(\boldsymbol{\gamma} = \{\gamma_{pq}\}\) and \(\boldsymbol{\Gamma} = \{\Gamma_{pqrs}\}\), respectively,

$$
E = \sum_{pq} \gamma_{pq} h_{pq} + \frac{1}{4} \sum_{pqrs} \Gamma_{pqrs} \braket{pq||rs} \,.
$$
With this definition, the above equation becomes

where \(h_{pq}\) represents a matrix element of the core-Hamiltonian operator. The superscript \((\xi)\) indicates a partial derivative with respect to the variable \(x\), i.e., taken with fixed MO coefficients \(\mathbf{C}\). Explicitly, these derivative integrals are given by [LWK05]

$$
h^{(\xi)}_{pq} = \sum_{\mu\nu} C_{\mu p} C_{\nu q} \frac{\partial h_{\mu\nu}}{\partial x} \,, \qquad
S^{(\xi)}_{pq} = \sum_{\mu\nu} C_{\mu p} C_{\nu q} \frac{\partial S_{\mu\nu}}{\partial x} \,, \qquad
\braket{pq||rs}^{(\xi)} = \sum_{\mu\nu\lambda\sigma} C_{\mu p} C_{\nu q} C_{\lambda r} C_{\sigma s} \frac{\partial \braket{\mu\nu||\lambda\sigma}}{\partial x} \,.
$$
We also made use of the definition of the Fock matrix [SO12]

$$
F_{pq} = h_{pq} + \sum_i \braket{pi||qi} \,.
$$
This is the working equation for the partial derivative of the Lagrangian with respect to the variable \(x\) and, implicitly, for the total derivative of the energy with respect to the same variable. Two ingredients are required to calculate \(\partial L/\partial x\): (1) the derivatives of the core-Hamiltonian matrix, the anti-symmetrized two-electron integrals, and the overlap matrix, and (2) the density matrices \(\boldsymbol{\gamma}\) and \(\boldsymbol{\Gamma}\), as well as the Lagrange multipliers \(\boldsymbol{\Lambda}\) and \(\boldsymbol{\Omega}\).

By comparing this expression to the HF energy, we can immediately identify the non-vanishing blocks of the density matrices,

$$
\gamma_{ij} = \delta_{ij} \,, \qquad \Gamma_{ijkl} = \delta_{ik} \delta_{jl} - \delta_{il} \delta_{jk} \,.
$$
Equations for the \(\{\lambda_{pq}\}\) and \(\{\omega_{pq}\}\) multipliers are obtained by imposing the Lagrangian to be stationary with respect to the orbital transformation matrix \(\{C_{\mu t}\}\),

$$
\frac{\partial L}{\partial C_{\mu t}} = 0 \,,
$$
or the programmable version [LWK05, RD19]

To calculate the partial derivative of the Lagrangian with respect to \(C_{\mu t}\), we will need the following three expressions

where we have used the definitions of the Fock matrix, two-electron integrals, and overlap matrix in terms of the orbital transformation matrix \(\{C_{\mu p}\}\) – see here. The Kronecker delta \(\delta_{t \in \mathrm{o}}\) is equal to one if \(t\) is an occupied orbital and zero otherwise.

Using the Lagrangian expressed in terms of the density matrices,

$$
L = \sum_{pq} \gamma_{pq} h_{pq} + \frac{1}{4} \sum_{pqrs} \Gamma_{pqrs} \braket{pq||rs} + \sum_{pq} \lambda_{pq} \left( F_{pq} - \epsilon_p \delta_{pq} \right) + \sum_{pq} \omega_{pq} \left( S_{pq} - \delta_{pq} \right) ,
$$
the partial derivative of the Lagrangian with respect to \(\mathbf{C}\) can be written as:

By using the conditions \(F_{pq}=\epsilon_p\delta_{pq}\) and \(S_{pq}=\delta_{pq}\), the above equation becomes:

where we have used \(\gamma_{pq}=\gamma_{qp}\), \(\braket{pu||qt}=\braket{qt||pu}\), \(\Gamma_{pqrs}=\Gamma_{qpsr}=\Gamma_{srpq}\) (real orbitals), and we have imposed that \(\lambda_{pq}=\lambda_{qp}\), and \(\omega_{pq}=\omega_{qp}\) (symmetric representation). Some of the indices have been renamed.

To obtain equations for the orbital response Lagrange multipliers, we first have to decouple \(\boldsymbol{\Lambda}\) from \(\boldsymbol{\Omega}\) by taking the difference

The system of equations for \(\boldsymbol{\Lambda}\) is then obtained by choosing \(u\) and \(t\) from different orbital spaces in the following equation

Once \(\boldsymbol{\Lambda}\) is determined, \(\boldsymbol{\Omega}\) can be calculated in a similar way, using the following equation

If we explicitly write the equations for different blocks of \(\boldsymbol{\Lambda}\), we find that they are all zero. This simplifies the equation for \(\boldsymbol{\Omega}\) to

The only non-zero block of \(\boldsymbol{\Omega}\) is the occupied-occupied block

Using the expressions for the density matrices and the non-zero Lagrange multipliers, we arrive at the following expression for the electronic HF energy derivative:

$$
\frac{\mathrm{d}E_{\mathrm{HF}}}{\mathrm{d}x} = \sum_i h^{(\xi)}_{ii} + \frac{1}{2} \sum_{ij} \braket{ij||ij}^{(\xi)} - \sum_i \epsilon_i \, S^{(\xi)}_{ii} \,.
$$
Finally, the derivative of the total energy is obtained by adding the trivial contribution from the nuclear repulsion energy term [SO12] \(\mathrm{d} V_{nn}/\mathrm{d} x\).

#### DFT

The DFT gradient can be derived in a similar way, (partially) replacing the exact-exchange integrals with the corresponding exchange-correlation (xc) functional contributions. Here, we only give equations for the simplest case of the local density approximation (LDA), but the approach is readily generalized to other types of xc functionals. Instead of using the Kohn–Sham (KS) matrix in the energy expression, we employ the core Hamiltonian,

$$
E_{\mathrm{DFT}} = \sum_i h_{ii} + \frac{1}{2} \sum_{ij} \left( \braket{ij|ij} - c_{\text{x}} \braket{ij|ji} \right) + E_{\text{xc}} \,,
$$

where \(0 \leq c_{\text{x}} < 1\) is the fraction of exact exchange in hybrid functionals and \(E_{\text{xc}}\) is the exchange-correlation energy contribution, which is often written in the form

$$
E_{\text{xc}} = \int e_{\text{xc}}\big(\rho(\mathbf{r})\big) \, \mathrm{d}\mathbf{r} \,.
$$
Setting up the Lagrangian in the same way as for HF, with \(E_{\text{HF}}\) replaced by \(E_{\text{DFT}}\), yields the same results for the Lagrange multipliers. Thus, the only additional consideration we need to take into account is the partial derivative of \(E_{\text{xc}}\). For this, we consider the KS electron density

$$
\rho(\mathbf{r}) = \sum_{\mu\nu} D_{\mu\nu} \, \chi_\mu(\mathbf{r}) \chi_\nu(\mathbf{r}) \,,
$$

where \(\mathbf{D}\) is the AO density matrix given in terms of the (real) MO coefficients as \(D_{\mu \nu} = \sum_{i} C_{\mu i} C_{\nu i}\). The partial derivative of \(E_{\text{xc}}\) with respect to \(\xi\) can thus be written as

$$
E_{\text{xc}}^{(\xi)} = \int \frac{\partial e_{\text{xc}}}{\partial \rho} \, \rho^{(\xi)}(\mathbf{r}) \, \mathrm{d}\mathbf{r} \,,
$$
where the partial derivative of the density is given by

$$
\rho^{(\xi)}(\mathbf{r}) = \sum_{\mu\nu} D_{\mu\nu} \left( \frac{\partial \chi_\mu}{\partial x} \, \chi_\nu + \chi_\mu \, \frac{\partial \chi_\nu}{\partial x} \right) .
$$
It should be noted that the exchange-correlation contribution to the DFT energy and its molecular gradient is evaluated via numerical integration. The molecular gradient therefore includes grid-point weight contributions, which arise from the explicit dependence of the grid partitioning function on the molecular geometry. Neglecting these contributions leads to a breakdown of the rotational and translational invariance of the molecular gradient. Nevertheless, if a sufficiently fine integration grid is used in practical calculations, the grid-point weight contribution to the molecular gradient can usually be neglected safely.
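The numerical-integration step can be sketched for the simplest case: an LDA-type (Slater) exchange energy evaluated as a weighted sum over grid points. The spherical Gaussian model density and uniform radial grid below are illustrative choices, not a production molecular grid (which would use atom-centered points and Becke-type partitioning weights); the Gaussian exponent is arbitrary:

```python
import numpy as np

# Sketch of the quadrature E_xc ~ sum_g w_g * e_xc(rho(r_g)) for Slater
# exchange, e_x(rho) = -Cx * rho^(4/3), using a spherical Gaussian model
# density for which the integral is known in closed form.
Cx = (3.0 / 4.0) * (3.0 / np.pi) ** (1.0 / 3.0)
alpha = 1.2                                  # arbitrary Gaussian exponent

r = np.linspace(1e-6, 12.0, 20000)           # radial grid points
w = 4.0 * np.pi * r**2 * (r[1] - r[0])       # grid weights (shell volumes)
rho = (alpha / np.pi) ** 1.5 * np.exp(-alpha * r**2)

E_x_grid = np.sum(w * (-Cx) * rho ** (4.0 / 3.0))

# Closed form: integral of rho^(4/3) = (alpha/pi)^2 * (3*pi/(4*alpha))^(3/2)
E_x_exact = -Cx * (alpha / np.pi) ** 2 * (3.0 * np.pi / (4.0 * alpha)) ** 1.5
print(E_x_grid, E_x_exact)
```

The grid sum converges to the closed-form value as the grid is refined; in a molecular code, the weights `w` additionally depend on the nuclear positions, which is exactly the origin of the grid-point weight contribution mentioned above.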

#### MP2

In the case of Møller–Plesset (MP) perturbation theory, as well as coupled-cluster theory, the energy functional has additional non-variational parameters that have to be considered when computing the gradient. These are the so-called \(t\)-amplitudes, \(\mathbf{T} = \{ t_{ijab} \}\), and the corresponding contribution that has to be determined is called the *amplitude response*.

The analytic expression for the MP energy gradient is obtained in much the same way as for the SCF ground state. The difference is that the Lagrangian contains additional Lagrange multipliers and constraints for the \(t\)-amplitudes. After obtaining the corresponding amplitude response Lagrange multipliers, these additional contributions are written in terms of one- and two-particle density matrices, exactly as the total energy. Let us illustrate the procedure for second-order MP perturbation theory (MP2). At this level of theory, the total energy functional can be written as

$$
E_{\mathrm{MP2}} = E_{\mathrm{HF}} + \frac{1}{4} \sum_{ijab} t_{ijab} \braket{ij||ab} \,,
$$

where

$$
t_{ijab} = \frac{\braket{ij||ab}}{\epsilon_i + \epsilon_j - \epsilon_a - \epsilon_b}
$$

are the MP2 \(t\)-amplitudes.

The Lagrangian corresponding to this energy functional is

$$
L_{\mathrm{MP2}} = E_{\mathrm{MP2}} + \sum_{pq} \lambda_{pq} \left( F_{pq} - \epsilon_p \delta_{pq} \right) + \sum_{pq} \omega_{pq} \left( S_{pq} - \delta_{pq} \right) + \sum_{ijab} \tilde{t}_{ijab} \, f_t(\mathbf{T}) \,.
$$

Here, \(\tilde{\mathbf{T}}=\{\tilde{t}_{ijab}\}\) are the amplitude response Lagrange multipliers and \(f_t(\mathbf{T})=0\) is the constraint. For MP2, this is

$$
f_t(\mathbf{T}) = \braket{ij||ab} - \left( \epsilon_i + \epsilon_j - \epsilon_a - \epsilon_b \right) t_{ijab} = 0 \,.
$$
The amplitude response Lagrange multipliers are determined by imposing the Lagrangian to be stationary with respect to the \(t\)-amplitudes,

$$
\frac{\partial L}{\partial t_{ijab}} = 0 \,.
$$

Replacing \(L\) and \(\mathbf{T}\) in the equation above with the corresponding MP2 expressions, we get

$$
\frac{1}{4} \braket{ij||ab} - \tilde{t}_{ijab} \left( \epsilon_i + \epsilon_j - \epsilon_a - \epsilon_b \right) = 0 \,.
$$

From the two equations above it follows that

$$
\tilde{t}_{ijab} = \frac{1}{4} \, t_{ijab} \,.
$$
From here, we can follow the same procedure as we did for the SCF gradient. We first rewrite the Lagrangian in terms of one- and two-particle density matrices

where the amplitude contribution has also been written in terms of one- and two-particle density matrices, \(\gamma^\mathrm{A}_{pq}\) and \(\Gamma^\mathrm{A}_{pqrs}\), respectively. Denoting \(\boldsymbol{\gamma}'=\boldsymbol{\gamma}+\boldsymbol{\gamma}^\mathrm{A}\) and \(\boldsymbol{\Gamma}'=\boldsymbol{\Gamma}+\boldsymbol{\Gamma}^\mathrm{A}\), the Lagrangian becomes

To obtain the gradient, we must now identify the density matrices and then solve the orbital response equations. The density matrices corresponding to the HF contribution are the same as in the previous section. There are additional contributions from the MP2 energy correction, as well as from the amplitude response terms. The MP2 energy contribution can easily be identified from the last term of the corresponding explicit expression and gives rise to the following two-particle density matrix:

The amplitude response (\(R^\mathrm{A}_\mathrm{MP2}\)) contributions are also reasonably easy to identify

resulting in the following density matrices

Combining all density matrices together and replacing the amplitude response Lagrange multipliers with the corresponding explicit expression, we have

Finally, to determine the orbital response Lagrange multipliers \(\boldsymbol{\Lambda}\), we insert these density matrices into the orbital response equation. The only non-zero block of \(\boldsymbol\Lambda\) is the occupied-virtual block and the HF density matrices cancel out, so the orbital response equation is

Once the \(\boldsymbol\Lambda\) multipliers are determined using an iterative technique, such as the conjugate gradient algorithm, the different blocks of the \(\boldsymbol\Omega\) multipliers can be computed using this equation. Explicitly
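The conjugate gradient algorithm mentioned above can be sketched in a few lines. In a real response solver, the matrix is never built explicitly; only matrix-vector products are needed, which is why the routine below takes a `matvec` callable. The matrix and right-hand side here are random, well-conditioned stand-ins for the actual response equations:

```python
import numpy as np

# Minimal conjugate-gradient solver for a symmetric positive-definite
# system M @ x = b, of the kind used for orbital response equations.
def conjugate_gradient(matvec, b, tol=1e-10, maxiter=200):
    x = np.zeros_like(b)
    r = b - matvec(x)      # residual
    p = r.copy()           # search direction
    rs = r @ r
    for _ in range(maxiter):
        Mp = matvec(p)
        alpha = rs / (p @ Mp)
        x += alpha * p
        r -= alpha * Mp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(7)
A = rng.standard_normal((8, 8))
M = A @ A.T + 8.0 * np.eye(8)    # make it SPD and well conditioned
b = rng.standard_normal(8)
x = conjugate_gradient(lambda v: M @ v, b)
print(np.linalg.norm(M @ x - b))  # residual norm, below the tolerance
```

In exact arithmetic, CG converges in at most as many iterations as the dimension of the system; in practice, a good preconditioner (e.g., built from orbital-energy differences) keeps the iteration count low for large response problems.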

### Excited states

The derivation of excited-state gradients follows the same procedure as illustrated above for the ground state:

1. Identify the one- and two-particle density matrices that contribute to the energy,
2. Construct the Lagrangian with appropriate constraints,
3. If required by the theory level, determine the amplitude response Lagrange multipliers and construct the corresponding density matrices,
4. Set up and solve the \(\boldsymbol{\Lambda}\) orbital response equations,
5. Determine the \(\boldsymbol{\Omega}\) Lagrange multipliers,
6. Determine the energy gradient.

#### CIS

To illustrate this procedure, we will take the configuration interaction singles (CIS) method [FHGP92] as an example, which is equivalent to linear-response time-dependent Hartree–Fock (TDHF) theory within the Tamm–Dancoff approximation [DHG05]. Note that for excitation energies and excited-state properties, CIS also yields the same results as the ADC(1) scheme [DW15]. The approach is then easily generalizable to TDHF and TDDFT.

In the CIS scheme, a Hermitian eigenvalue equation of the following form is solved,

$$
\mathbf{A} \mathbf{X}_n = \omega_n \mathbf{X}_n \,,
$$

where \(\omega_n\) is the excitation energy for excited state \(n\) with corresponding eigenvector \(\mathbf{X}_n\) (normalized according to \(\mathbf{X}_n^\dagger \mathbf{X}_n = 1\)), and \(\mathbf{A}\) is the CIS matrix given by the elements

$$
A_{ia,jb} = \left( \epsilon_a - \epsilon_i \right) \delta_{ij} \delta_{ab} + \braket{aj||ib} \,.
$$
(The CIS matrix elements correspond to the sum of the zeroth- and first-order terms of the ADC(1) matrix elements.) Besides the density matrices required for the HF reference state derived above, we need to identify additional one- and two-particle density matrices for the excitation energy. We therefore formally represent the excitation energy \(\omega_n = \mathbf{X}_n^\dagger\mathbf{A}\mathbf{X}_n\) in terms of one- and two-particle density matrices:

where the superscript \((n)\) indicates that we are referring to the difference density matrices for the \(n\)-th excited state. To identify these density matrices, we carry out explicitly the matrix-vector multiplication on the left hand side,

where we have used the explicit form of the CIS matrix elements, and \(x_{ia}\) are the elements of a specific eigenvector \(\mathbf{X}_n\).

From the above equation, we identify the excited-state density matrix contributions:

where the indices of the vectors in the two-particle density matrix have been renamed.

To obtain the molecular gradient of the excited state \({n}\), the density matrices of the ground state must also be included. For CIS, this is the HF reference state, so the density matrices for the excited state are

Since, when written in terms of one- and two-particle density matrices, the excited-state Lagrangian is virtually identical to the Lagrangians written for HF and MP2, we leave inserting the expressions of the density matrices as an exercise to the reader. The orbital response equations are also straightforward to derive by using the above density matrices in the general orbital response equations. We leave the step-by-step derivation to the reader, noting that care should be taken with the symmetry of the two-particle density matrix \(\Gamma_{pqrs}\). Here, we provide the final expressions for \(\boldsymbol{\Lambda}\) (to be determined iteratively)

and \(\boldsymbol{\Omega}\)

Using the density matrices and Lagrange multipliers, the analytical CIS gradient can now be determined from the partial derivative of the Lagrangian with respect to \(x\).

#### TDHF

In linear-response time-dependent Hartree–Fock (TDHF), also known as the *random phase approximation* (RPA) [DHG05], one solves a pseudo-eigenvalue equation of the form

$$
\begin{pmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{B}^{*} & \mathbf{A}^{*} \end{pmatrix}
\begin{pmatrix} \mathbf{X}_n \\ \mathbf{Y}_n \end{pmatrix}
= \omega_n
\begin{pmatrix} \mathbf{1} & \mathbf{0} \\ \mathbf{0} & -\mathbf{1} \end{pmatrix}
\begin{pmatrix} \mathbf{X}_n \\ \mathbf{Y}_n \end{pmatrix} ,
$$

where \(\mathbf{X}_n\) and \(\mathbf{Y}_n\) are referred to as the *excitation* and *de-excitation* parts of the response vector, respectively, the matrix \(\mathbf{A}\) is the same as in CIS,
and the \(\mathbf{B}\) matrix is given by the elements \(B_{ia,jb} = -\braket{ab||ij}\).
The vectors are usually normalized according to \(\mathbf{X}_n^\dagger \mathbf{X}_n - \mathbf{Y}_n^\dagger \mathbf{Y}_n = 1\).

The following procedure is analogous to the CIS case, and only a few changes are required in the definition of the one- and two-particle density matrices. The non-vanishing contributions to \(\boldsymbol{\gamma}^{(n)}\) and \(\boldsymbol{\Gamma}^{(n)}\) for TDHF are given below in terms of the elements \(x_{ia}\) and \(y_{ia}\) of \(\mathbf{X}_n\) and \(\mathbf{Y}_n\), respectively, or rather their linear combinations \(\mathbf{X}_n \pm \mathbf{Y}_n\):

Note that there is an additional non-vanishing block in the two-particle density matrix as compared to CIS, namely \(\Gamma_{ijab}^{(n)}\), that enters both the right-hand side of the \(\boldsymbol{\Lambda}\) orbital response equations and the \(\boldsymbol{\Omega}\) multipliers.

#### TDDFT

Analogous to the ground state, analytical gradients in linear-response time-dependent density functional theory (TDDFT) are virtually identical to the TDHF ones [FA02]. Only the exchange-correlation terms need to be considered in addition, meaning that the matrix elements of \(\mathbf{A}\) and \(\mathbf{B}\) are modified accordingly.

The excitation energy \(\omega_n\) for a general hybrid functional can be written as

where the KS matrix \(\mathbf{F}\) is given by

and the xc contributions \(v_{pq}^{\text{xc}}\) and \(f^{\text{xc}}_{pqrs}\) introduced above, sometimes referred to as the xc potential and the xc kernel, respectively [DHG05], are given in the LDA in a real MO basis as

$$
v^{\text{xc}}_{pq} = \int \phi_p(\mathbf{r}) \, \frac{\partial e_{\text{xc}}}{\partial \rho} \, \phi_q(\mathbf{r}) \, \mathrm{d}\mathbf{r} \,, \qquad
f^{\text{xc}}_{pqrs} = \int \phi_p(\mathbf{r}) \phi_q(\mathbf{r}) \, \frac{\partial^2 e_{\text{xc}}}{\partial \rho^2} \, \phi_r(\mathbf{r}) \phi_s(\mathbf{r}) \, \mathrm{d}\mathbf{r} \,.
$$
For the orbital response, derivatives of these two quantities with respect to the MO coefficients \(\mathbf{C}\) are required. Derivatives of the orbitals \(\phi_p\) behave exactly as for the normal integrals given above; however, additional terms occur since \(e_{\text{xc}}\) depends on \(\mathbf{C}\) through the density. The derivative of \(v_{pq}^{\text{xc}}\) thus gives a term analogous to \(f_{pqrs}^{\text{xc}}\), while the derivative of the latter gives a term with a third-order functional derivative [FA02],

$$
g^{\text{xc}}_{pqrstu} = \int \phi_p \phi_q \, \phi_r \phi_s \, \phi_t \phi_u \, \frac{\partial^3 e_{\text{xc}}}{\partial \rho^3} \, \mathrm{d}\mathbf{r} \,.
$$
For the nuclear gradient of the TDDFT excitation energy we thus need the partial derivatives of those two terms.

For the xc potential, this is given by:

where the first term is completely analogous to the ground-state contribution, except that it gets contracted with the relaxed one-particle density matrix \(\boldsymbol{\gamma} + \boldsymbol{\Lambda}\) (instead of the ground-state density matrix \(\mathbf{D}\)), and the second term corresponds to an \(f^{\text{xc}}\)-like term with the partial derivative of the ground-state density. The partial derivative of the xc kernel with respect to \(x\) gives

where the first term is again \(f^\text{xc}\)-like with an orbital derivative, and the second term is analogous to \(g^\text{xc}\) from the orbital response contributions, including again the partial derivative of the ground-state density. This term eventually gets contracted with the two-particle density matrix \(\boldsymbol{\Gamma}\), so with two excitation or response vectors.

TDDFT within the Tamm–Dancoff approximation (TDA) [HHG99] is obtained by setting \(\mathbf{B} = \mathbf{0}\) (or \(\mathbf{Y}_n = \mathbf{0}\)). All the above considerations are equally valid for TDDFT/TDA, with the one- and two-particle density matrices simplifying to the ones from CIS. Note that TDHF (and thus CIS, which is TDHF within the TDA) can be considered a special case of general hybrid TDDFT with \(c_{\text{x}} = 1\) and \(e_{\text{xc}} = 0\) [FA02].

### First-order properties

Not only nuclear gradients, but many other time-independent (or “static”) molecular properties can be calculated as derivatives of the energy [Jen06]. For instance, the electric dipole moment can be calculated as the derivative of the energy with respect to an external electric field, and the magnetic dipole moment with respect to an external magnetic field. Second derivatives with respect to the external field give the electric polarizability and magnetizability, respectively. Properties that can be calculated as first derivatives of the energy are referred to as **first-order properties**.

For exact wave functions \(\ket{\Psi}\) and energies \(E = \langle \Psi | \hat{H} | \Psi \rangle\), the **Hellmann–Feynman theorem** holds [Jen06],

$$
\frac{\mathrm{d}E}{\mathrm{d}\xi} \bigg|_{\xi=0} = \bra{\Psi} \frac{\partial \hat{H}}{\partial \xi} \ket{\Psi} \bigg|_{\xi=0} \,,
$$

stating that the derivative of the energy with respect to an external perturbation \(\xi\) is identical to the expectation value of the perturbed Hamiltonian with the unperturbed wave function. The derivative is taken at zero perturbation strength, \(\xi = 0\). If the basis functions do not depend on the perturbation, the above equation also holds for fully variational methods such as the SCF and MCSCF schemes.

Consider the electric dipole moment \(\boldsymbol{\mu}\) as an example. Numerically, each component of \(\boldsymbol{\mu}\) can be obtained by calculating the energy \(E\) in the presence of a static electric field \(\boldsymbol{\mathcal{F}}\), once with a positive and once with a negative sign in one of its components, and then applying the symmetric difference quotient. Starting from a Lagrangian \(L\) of this form, dipole moments are obtained analytically by using a perturbed Hamiltonian, \(\hat{H}_{\boldsymbol{\mathcal{F}}} = \hat{H} + \boldsymbol{\mathcal{F}} \cdot \hat{\boldsymbol{\mu}}\), where \(\hat{\boldsymbol{\mu}}\) is the dipole operator, in the Lagrangian, followed by differentiation,

The determination of the Lagrange multipliers \(\boldsymbol{\Lambda}\) and \(\mathbf{\tilde{T}}\) is identical to the procedures described above. The dipole moment as a derivative of the energy can then be calculated as:

where \(\boldsymbol{\gamma}' = \boldsymbol{\gamma} + \boldsymbol{\gamma}^{\text{A}}\) is the (orbital) **unrelaxed** one-particle density matrix, which includes contributions from the amplitude response, \(\boldsymbol{\gamma}' + \boldsymbol{\Lambda}\) is referred to as the (orbital) **relaxed density matrix**, and \(\mu_{qp}\) are elements of the dipole operator in the MO basis.
Taking all terms of the above equation into account yields the *relaxed* dipole moment, whereas neglecting \(\boldsymbol{\Lambda}\) yields so-called *unrelaxed* dipole moments. The latter do not correspond to a proper energy derivative, but their computation is somewhat simplified since the iterative solution of the orbital response equations is avoided.
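The finite-field numerical route described above can be sketched with a model energy function. Here \(E(\mathcal{F}) = E_0 - \mu\mathcal{F} - \tfrac{1}{2}\alpha\mathcal{F}^2\) stands in for an SCF energy in a static field along one axis (using the sign convention \(\hat{H}_{\mathcal{F}} = \hat{H} - \mathcal{F}\hat{\mu}\), so that \(\mu = -\mathrm{d}E/\mathrm{d}\mathcal{F}\)); the parameter values are arbitrary:

```python
# Finite-field estimate of one dipole-moment component: evaluate the
# energy at field strengths +F and -F and apply the symmetric difference
# quotient. E0, mu, and alpha are arbitrary model parameters standing in
# for an actual field-dependent SCF calculation.
E0, mu, alpha = -76.0, 0.8, 6.5

def energy(F):
    # Model energy in a static field: E(F) = E0 - mu*F - 0.5*alpha*F^2
    return E0 - mu * F - 0.5 * alpha * F**2

F = 1e-3                       # field strength (atomic units)
mu_numeric = -(energy(F) - energy(-F)) / (2.0 * F)
print(mu_numeric)
```

Because the polarizability term is even in \(\mathcal{F}\), it cancels exactly in the symmetric difference, so the estimate recovers \(\mu\) essentially to machine precision for this quadratic model; for a real energy surface, higher-order (hyperpolarizability) terms limit the accuracy instead.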

Another approach to one-electron properties such as dipole moments is to start from the expectation value of the corresponding operator with the wave function, \(\boldsymbol{\mu} = \langle \Psi | \hat{\mu} | \Psi \rangle\); see the Hellmann–Feynman theorem above. Depending on the wave-function model, this approach can be equivalent to the orbital-unrelaxed approach, as in configuration interaction, but in particular for schemes based on perturbation theory, all three approaches to dipole moments differ [HRDH19].