
MT5241 Computational Fluid Dynamics I

J.S.B. Gajjar

September 2004

1 Introduction

In this course we will be studying computational fluid dynamics or CFD. CFD is concerned with the study of fluid flow problems using computational techniques, as opposed to analytical or experimental methods. The modelling of fluid flow leads to the solution of unsteady, nonlinear, partial differential equations, and there are only a handful of known 'exact' analytical solutions. Even in the simplest of geometries the solution of these equations is quite challenging. In this CFD course we will aim to study the different techniques which are used to solve the different types of equations which arise. As we will see, to be good at CFD, one has to be fluent in many different subject areas such as numerical analysis, programming, and fluid mechanics. This introductory course will concentrate mainly on finite-difference methods. In CFD II you will meet the popular finite-volume method. In the finite-difference method, the solution is sought on a grid of points, and we deal with function values at these nodal points. In a finite-element environment on the other hand, the region is split into various sub-regions, and the solution in these sub-regions is approximated by basis functions.

1.1 Errors

To start with we need to discuss errors. Typically numerical computation of any sort will generate errors and it is one of the tasks of the CFD practitioner to be able to assess and be aware of the errors arising from numerical computation. The different types of errors can be categorised into the following main types:

• Roundoff errors.

• Errors in modelling.

• Programming errors.

• Truncation and discretization errors.

Roundoff errors.


These arise when a computer is used for doing numerical calculations. Some typical examples include the inexact representation of numbers such as π, √2.

Roundoff and chopping errors arise from the way numbers are stored on the computer, and in the way arithmetic operations are carried out. Whereas most of the time the way numbers are stored on a computer is not under our control, the way certain expressions are computed definitely is. A simple example will illustrate the point. Consider the computation of the roots of a quadratic equation ax² + bx + c = 0 with the expressions

$$x_1 = \frac{-b + \sqrt{b^2 - 4ac}}{2a}, \qquad x_2 = \frac{-b - \sqrt{b^2 - 4ac}}{2a}.$$

Let us take a = c = 1, b = −28. Then

$$x_1 = 14 + \sqrt{195}, \qquad x_2 = 14 - \sqrt{195}.$$

Now to 5 significant figures we have √195 = 13.964. Hence x1 = 27.964 and x2 = 0.036. So what can we say about the error? We need some way to quantify errors. There are two useful measures for this.

Absolute error
Suppose that φ∗ is an approximation to a quantity φ. Then the absolute error is defined by |φ∗ − φ|.

Relative error
Another measure is the relative error and this is defined by |φ∗ − φ|/|φ| if φ ≠ 0.

For our example above we see that $|x_1^* - x_1| = 2.4 \times 10^{-4}$ and $|x_2^* - x_2| = 2.4 \times 10^{-4}$, which look small. On the other hand the relative errors are

$$\frac{|x_1^* - x_1|}{|x_1|} = 8.6 \times 10^{-6}, \qquad \frac{|x_2^* - x_2|}{|x_2|} = 6.7 \times 10^{-3}.$$

Thus the accuracy in computing x2 is far less than in computing x1. On the other hand if we compute x2 via

$$x_2 = 14 - \sqrt{195} = \frac{1}{14 + \sqrt{195}}$$

we obtain x2 = 0.03576 with a much smaller absolute error of $3.4 \times 10^{-7}$ and a relative error of $9.6 \times 10^{-6}$.
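The following short Python sketch (an illustration added here, not part of the original notes) repeats this computation in single precision, where the cancellation is easy to see; the stable alternative uses the product of the roots, x1 x2 = c/a, to avoid the subtraction.

import numpy as np

# Roots of x^2 - 28x + 1 = 0 in float32, where the loss of significant
# figures in 14 - sqrt(195) is visible (in float64 the effect is much smaller).
a = np.float32(1.0); b = np.float32(-28.0); c = np.float32(1.0)
sq = np.sqrt(b * b - 4 * a * c)          # float32 arithmetic throughout

x1 = (-b + sq) / (2 * a)                 # 14 + sqrt(195): no cancellation
x2_naive = (-b - sq) / (2 * a)           # 14 - sqrt(195): cancellation
x2_stable = (c / a) / x1                 # uses x1 * x2 = c/a instead

print(x1, x2_naive, x2_stable)           # exact x2 = 0.0357599...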

Errors in numerical modelling.
These arise for example when the equations being solved are not the proper equations for the flow problem in question. An example is using the Euler equations for calculating the solution of flows where viscosity effects are important. No matter how accurately the solution has been computed, it may not be close to the real physical solution because viscous effects have been neglected throughout the computation.

Programming errors, and bugs.
These are errors entirely under the control of the programmer. To eliminate these requires careful testing of the code and logic, as well as comparison with previous work. Even then, for your problem, for which there may not be previous work to compare with, one has to do numerous self-consistency checks with further analysis as necessary.

Truncation and discretization errors.
These errors arise when we take the continuum model and replace it with a discrete approximation. For example suppose we wish to solve

$$\frac{d^2U}{dx^2} = f(x).$$

Using Taylor series we can approximate the second derivative term by

$$\frac{U(x_{i+1}) - 2U(x_i) + U(x_{i-1})}{h^2},$$

where we have taken a uniform grid with spacing h say and node points x_i. As far as the approximation of the equation is concerned we will have a truncation error given by

$$\tau(x_i) = \frac{U(x_{i+1}) - 2U(x_i) + U(x_{i-1})}{h^2} - f(x_i) = \frac{h^2}{12}\frac{d^4U}{dx^4}(x_i) + \dots .$$

Even though the discrete equations may be solved to high accuracy, there will still be an error of O(h²) arising from the discretization of the equations. Of course with more points, we would expect the error to diminish.
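As a concrete check (our own example, not from the notes), the Python sketch below applies the second difference to U(x) = sin x, for which the exact second derivative is −sin x; halving h should reduce the error by roughly a factor of 4, consistent with the O(h²) truncation error.

import numpy as np

# O(h^2) truncation error of the central second difference on U(x) = sin(x).
x = 1.0
for h in [0.1, 0.05, 0.025]:
    d2 = (np.sin(x + h) - 2 * np.sin(x) + np.sin(x - h)) / h**2
    print(h, abs(d2 - (-np.sin(x))))     # error falls by ~4 as h halves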


2 Initial value problems

Here we will look at the solution of ordinary differential equations of the type, say

$$\frac{dy}{dx} = f(x, y), \qquad a \le x \le b, \qquad (1)$$

subject to an initial condition

$$y(a) = \alpha. \qquad (2)$$

The methods that we will use generalise readily to systems of differential equations of the type

$$\frac{d\mathbf{Y}}{dx} = \mathbf{F}(x, \mathbf{Y}), \qquad a \le x \le b, \qquad (3)$$

where

$$\mathbf{Y} = (y_1(x), y_2(x), \dots, y_N(x))^T, \qquad \mathbf{F} = (f_1(x,\mathbf{Y}), f_2(x,\mathbf{Y}), \dots, f_N(x,\mathbf{Y}))^T,$$

with initial data

$$\mathbf{Y}(a) = \boldsymbol{\alpha}, \qquad (4)$$

say, where $\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \dots, \alpha_N)^T$.

One question which arises immediately is, what about second and third order differential equations? Consider for example

$$g''' + g g'' + \tfrac{1}{2}(1 - g'^2) = 0, \qquad g(0) = g'(0) = g'(\infty) - 1 = 0, \qquad (5)$$

which arises in fluid applications. The answer to this is that if we take any differential equation, we can always write it as a system of first order equations.

Example
In (5) let

$$Y_1 = g(x), \quad Y_2 = g'(x), \quad Y_3 = g''(x),$$

then (5) becomes

$$\frac{d\mathbf{Y}}{dx} = \begin{pmatrix} Y_2 \\ Y_3 \\ -Y_1 Y_3 - \tfrac{1}{2}(1 - Y_1^2) \end{pmatrix} = \mathbf{F}. \qquad (6)$$

The boundary conditions in (5) are not of the form as in (4) because we have conditions given at two points x = 0 and x = ∞. We will see how to deal with this later.

Consider equation (1). There are various mathematical results that we should be aware of, although we will not go too deeply into the implications or theory behind some of these.

Suppose we define D to be the domain

$$D = \{(x, y) : a \le x \le b, \ -\infty < y < \infty\}$$

and f(x, y) is continuous on D. If f(x, y) satisfies a Lipschitz condition on D then (1) has a unique solution for a ≤ x ≤ b. Recall that f(x, y) satisfies a Lipschitz condition on D means that there exists a constant L > 0 (called the Lipschitz constant) such that

$$|f(x, y_1) - f(x, y_2)| \le L |y_1 - y_2|$$

whenever (x, y1), (x, y2) belong to D.

2.1 Euler’s Method

This is the simplest of techniques for the numerical solution of (1). It is good for proving various results, but it is rarely used in practice because of far superior methods.

For simplicity define an equally spaced mesh

$$x_j = a + jh, \qquad j = 0, 1, \dots, N,$$

where h = (b − a)/N is called the step size.

[Figure: a uniform grid on [a, b], with nodes a = x0, x1, x2, ..., xi, xi+1, ..., xN = b and spacing h.]

We can derive Euler's method as follows. Suppose y(x) is the unique solution to (1), and twice differentiable. Then by Taylor's theorem we have

$$y(x_{i+1}) = y(x_i + h) = y(x_i) + h\,y'(x_i) + \frac{h^2}{2} y''(\xi) \qquad (7)$$

where x_i ≤ ξ ≤ x_{i+1}. But from the differential equation y'(x_i) = f(x_i, y(x_i)), and we write y_i = y(x_i). This suggests the scheme

$$w_0 = \alpha, \qquad w_{i+1} = w_i + h f(x_i, w_i), \qquad i = 0, 1, \dots, N-1, \qquad (8)$$

for calculating the w_i which will be our approximate solution to the equation. This is called Euler's method.
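A minimal Python sketch of the scheme (8) follows; the test problem dy/dx = y, y(0) = 1 is our own choice for illustration.

import numpy as np

def euler(f, a, b, alpha, N):
    # Euler's method (8): w[i+1] = w[i] + h f(x[i], w[i]).
    h = (b - a) / N
    x = a + h * np.arange(N + 1)
    w = np.empty(N + 1)
    w[0] = alpha
    for i in range(N):
        w[i + 1] = w[i] + h * f(x[i], w[i])
    return x, w

# Test problem (an assumption of this demo): dy/dx = y, y(0) = 1, exact e^x.
x, w = euler(lambda x, y: y, 0.0, 1.0, 1.0, 100)
print(w[-1], np.exp(1.0))                # w_N approaches e as N grows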

2.1.1 Truncation error for Euler’s method

Suppose that y_i = y(x_i) is the exact solution of (1) at x = x_i. Then the truncation error is defined by

$$\tau_{i+1}(h) = \frac{y_{i+1} - (y_i + h f(x_i, y_i))}{h} = \frac{y_{i+1} - y_i}{h} - f(x_i, y_i), \qquad (9)$$

for i = 0, 1, ..., N − 1. From (7) we find that

$$\tau_{i+1}(h) = \frac{h}{2} y''(\xi_i)$$

for some ξ_i in (x_i, x_{i+1}). So if y''(x) is bounded by a constant M in (a, b) then

$$|\tau_{i+1}(h)| \le \frac{h}{2} M,$$

and thus we see that the truncation error for Euler's method is O(h). In general if τ_{i+1}(h) = O(h^p) we say that the method is of order h^p. In principle if h decreases, we should be able to achieve greater accuracy, although in practice roundoff error limits the smallest size of h that we can take.

2.2 Higher order methods

2.2.1 Modified Euler

The modified Euler method is given by

$$w_0 = \alpha, \qquad w_{i+1} = w_i + \frac{h}{2}\left[ f(x_i, w_i) + f(x_{i+1}, w_{i+1}) \right], \qquad i = 0, 1, \dots, N-1.$$

This has truncation error O(h²). Sometimes this is also called a Runge-Kutta method of order 2.

2.2.2 Runge-Kutta method of order 4

One of the most common Runge-Kutta methods of order 4 is given by

$$\begin{aligned}
w_0 &= \alpha, \\
k_1 &= h f(x_i, w_i), \\
k_2 &= h f(x_i + \tfrac{h}{2},\, w_i + \tfrac{1}{2}k_1), \qquad (10)\\
k_3 &= h f(x_i + \tfrac{h}{2},\, w_i + \tfrac{1}{2}k_2), \\
k_4 &= h f(x_{i+1},\, w_i + k_3), \\
w_{i+1} &= w_i + \tfrac{1}{6}(k_1 + 2k_2 + 2k_3 + k_4), \qquad \text{for } i = 0, 1, \dots, N-1.
\end{aligned}$$

Note that this requires 4 function evaluations per step. All these methods generalise to a system of first order equations of the form (3). Thus for instance the RK(4) method (10) above becomes

$$\begin{aligned}
\mathbf{w}_0 &= \boldsymbol{\alpha}, \\
\mathbf{k}_1 &= h \mathbf{F}(x_i, \mathbf{w}_i), \\
\mathbf{k}_2 &= h \mathbf{F}(x_i + \tfrac{h}{2},\, \mathbf{w}_i + \tfrac{1}{2}\mathbf{k}_1), \\
\mathbf{k}_3 &= h \mathbf{F}(x_i + \tfrac{h}{2},\, \mathbf{w}_i + \tfrac{1}{2}\mathbf{k}_2), \\
\mathbf{k}_4 &= h \mathbf{F}(x_{i+1},\, \mathbf{w}_i + \mathbf{k}_3), \\
\mathbf{w}_{i+1} &= \mathbf{w}_i + \tfrac{1}{6}(\mathbf{k}_1 + 2\mathbf{k}_2 + 2\mathbf{k}_3 + \mathbf{k}_4), \qquad \text{for } i = 0, 1, \dots, N-1.
\end{aligned}$$
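A Python sketch of the vector RK(4) step is given below; the harmonic oscillator test problem y'' = −y, written as a first order system, is our own choice.

import numpy as np

def rk4(F, a, b, alpha, N):
    # Classical RK(4), (10), applied to the system dY/dx = F(x, Y).
    h = (b - a) / N
    x = a + h * np.arange(N + 1)
    w = np.empty((N + 1, len(alpha)))
    w[0] = alpha
    for i in range(N):
        k1 = h * F(x[i], w[i])
        k2 = h * F(x[i] + h / 2, w[i] + k1 / 2)
        k3 = h * F(x[i] + h / 2, w[i] + k2 / 2)
        k4 = h * F(x[i + 1], w[i] + k3)
        w[i + 1] = w[i] + (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x, w

# Test system (our assumption): y'' = -y as Y = (y, y'), exact y = cos(x).
F = lambda x, Y: np.array([Y[1], -Y[0]])
x, w = rk4(F, 0.0, 2 * np.pi, np.array([1.0, 0.0]), 200)
print(w[-1])                             # close to (1, 0) after one period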

In the above we have taken the step size to be constant. But there is no reason to do this and the step size can be adjusted dynamically. This forms the basis of the so-called Runge-Kutta-Fehlberg methods.

2.3 m-step multi-step method

The methods discussed above are called one-step methods. Methods that use the approximate values at more than one previous mesh point are called multi-step methods. There are two distinct types worth mentioning.

These are of the form

$$w_{i+1} = c_{m-1} w_i + c_{m-2} w_{i-1} + \dots + c_0 w_{i+1-m} + h\left[ b_m f(x_{i+1}, w_{i+1}) + b_{m-1} f(x_i, w_i) + \dots + b_0 f(x_{i+1-m}, w_{i+1-m}) \right]. \qquad (11)$$

If b_m = 0 (so there is no term with w_{i+1} on the right hand side of (11)), the method is explicit. If b_m ≠ 0 we have an implicit method.

2.3.1 Adams-Bashforth 4th order method (explicit)

Here we have

$$w_0 = \alpha, \quad w_1 = \alpha_1, \quad w_2 = \alpha_2, \quad w_3 = \alpha_3,$$

where these values are obtained using other methods such as RK(4) for instance. Then for i = 3, 4, ..., N − 1 we use

$$w_{i+1} = w_i + \frac{h}{24}\left[ 55 f(x_i, w_i) - 59 f(x_{i-1}, w_{i-1}) + 37 f(x_{i-2}, w_{i-2}) - 9 f(x_{i-3}, w_{i-3}) \right].$$


2.3.2 Adams-Moulton 4th order method (implicit)

Here we have

$$w_0 = \alpha, \quad w_1 = \alpha_1, \quad w_2 = \alpha_2,$$

where these values are obtained using other methods such as RK(4) for instance. Then

$$w_{i+1} = w_i + \frac{h}{24}\left[ 9 f(x_{i+1}, w_{i+1}) + 19 f(x_i, w_i) - 5 f(x_{i-1}, w_{i-1}) + f(x_{i-2}, w_{i-2}) \right], \qquad \text{for } i = 2, 3, \dots, N-1. \qquad (12)$$

Notice the presence of the term containing w_{i+1} on the right hand side of (12).

The Adams-Bashforth and Adams-Moulton methods can be derived by integrating the differential equation as

$$\frac{dy}{dx} = f(x, y) \quad \Longrightarrow \quad y(x_{i+1}) - y(x_i) = \int_{x_i}^{x_{i+1}} f(s, y(s))\, ds, \qquad (13)$$

and replacing f(x, y) in the integral by an interpolating polynomial passing through previous points.

The implicit methods are generally more stable, and have smaller roundoff errors, than the explicit methods. But here we need to bring in the ideas of stability of a method.

2.3.3 Predictor-corrector methods

In practice the multi-step methods are rarely used as described above. Instead they are frequently used as predictor-corrector methods. First we predict a value using an explicit method, and then correct it using an implicit method.

4th order predictor-corrector

$$w^{(0)}_{i+1} = w_i + \frac{h}{24}\left[ 55 f(x_i, w_i) - 59 f(x_{i-1}, w_{i-1}) + 37 f(x_{i-2}, w_{i-2}) - 9 f(x_{i-3}, w_{i-3}) \right], \qquad (14)$$

$$w^{(k)}_{i+1} = w_i + \frac{h}{24}\left[ 9 f(x_{i+1}, w^{(k-1)}_{i+1}) + 19 f(x_i, w_i) - 5 f(x_{i-1}, w_{i-1}) + f(x_{i-2}, w_{i-2}) \right]. \qquad (15)$$

Here (14) is used to predict a value for w_{i+1} and (15) is used to iteratively correct it.
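The sketch below implements one predictor step (14) and a single corrector sweep (15) in Python; the test problem y' = −y and the use of exact starting values w0..w3 are assumptions of this demo.

import numpy as np

f = lambda x, y: -y                      # test problem y' = -y, y(0) = 1
a, b, N = 0.0, 2.0, 40
h = (b - a) / N
x = a + h * np.arange(N + 1)
w = np.empty(N + 1)
w[:4] = np.exp(-x[:4])                   # starting values (here taken exact)

for i in range(3, N):
    # Predictor: 4th order Adams-Bashforth (14).
    wp = w[i] + h / 24 * (55 * f(x[i], w[i]) - 59 * f(x[i-1], w[i-1])
                          + 37 * f(x[i-2], w[i-2]) - 9 * f(x[i-3], w[i-3]))
    # Corrector: one sweep of 4th order Adams-Moulton (15).
    w[i+1] = w[i] + h / 24 * (9 * f(x[i+1], wp) + 19 * f(x[i], w[i])
                              - 5 * f(x[i-1], w[i-1]) + f(x[i-2], w[i-2]))

print(w[-1], np.exp(-b))                 # compare with the exact solution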

2.4 Improving accuracy with extrapolation

Suppose that we use a method with truncation error of O(h^m) to compute an approximation w_i. We can use Richardson extrapolation to get an approximation with greater accuracy. First generate an approximation w^(1)_i say with step size h, and an approximation w^(2)_i with step size 2h. Then we can write

$$w^{(1)}_i = y_i + E h^m + E_1 h^{m+1} + \dots,$$

and

$$w^{(2)}_i = y_i + E (2h)^m + E_1 (2h)^{m+1} + \dots .$$

Then we can eliminate the E term to get

$$2^m w^{(1)}_i - w^{(2)}_i = (2^m - 1) y_i + O(h^{m+1}).$$

Thus

$$w^*_i = \frac{2^m w^{(1)}_i - w^{(2)}_i}{2^m - 1}$$

is a more accurate approximation to the solution than w^(1)_i or w^(2)_i.

For a 4th order Runge-Kutta method the above gives

$$w^*_i = \frac{16 w^{(1)}_i - w^{(2)}_i}{15}.$$
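A short Python sketch of this extrapolation for Euler's method (m = 1) follows; the test problem y' = −y evaluated at x = 1 is our own choice.

import numpy as np

def euler_final(h):
    # Integrate y' = -y, y(0) = 1 up to x = 1 with Euler's method.
    w, x = 1.0, 0.0
    while x < 1.0 - 1e-12:
        w += h * (-w)
        x += h
    return w

m = 1                                    # Euler is of order h^m with m = 1
w1 = euler_final(0.01)                   # step size h
w2 = euler_final(0.02)                   # step size 2h
w_star = (2**m * w1 - w2) / (2**m - 1)   # Richardson extrapolation
exact = np.exp(-1.0)
print(abs(w1 - exact), abs(w_star - exact))   # extrapolated value is closer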


3 Stability

We have introduced a number of different methods. How do we select which one to use? In practice most methods should work reasonably well on standard problems. However certain types of problems (stiff problems) can cause difficulty and care needs to be exercised in the choice of the method. Typically boundary layer type problems (say with a small parameter multiplying a highest derivative) are examples of stiff problems. This is where one needs to be aware of the stability properties of a particular method.

3.1 Consistent

A method is said to be consistent if the local truncation error tends to zero as the step size → 0, i.e.

$$\lim_{h \to 0} \max_i |\tau_i(h)| = 0.$$

3.2 Convergence

A method is said to be convergent with respect to the equation it approximates if

$$\lim_{h \to 0} \max_i |w_i - y(x_i)| = 0,$$

where y(x) is the exact solution and w_i an approximation produced by the method.

3.3 Stability

If we consider an m-step method

$$w_0 = \alpha_0, \quad w_1 = \alpha_1, \quad \dots, \quad w_{m-1} = \alpha_{m-1},$$

$$w_{i+1} = a_{m-1} w_i + a_{m-2} w_{i-1} + \dots + a_0 w_{i+1-m} + h\, F(x_i, w_{i+1}, w_i, \dots, w_{i+1-m}), \qquad (16)$$

then ignoring the F term the homogeneous part is just a difference equation. The stability is thus connected with the stability of this difference equation and hence with the roots of the characteristic polynomial

$$\lambda^m - a_{m-1}\lambda^{m-1} - \dots - a_1\lambda - a_0 = 0.$$

Why? Well consider (1) with f(x, y) = 0. This has the solution y(x) = α. Thus the difference equation has to produce the same solution, ie w_n = α. Now consider

$$w_{i+1} = a_{m-1} w_i + a_{m-2} w_{i-1} + \dots + a_0 w_{i+1-m}. \qquad (17)$$


If we look for solutions of the form w_n = λ^n then this gives the characteristic polynomial equation

$$\lambda^m - a_{m-1}\lambda^{m-1} - \dots - a_1\lambda - a_0 = 0. \qquad (18)$$

Suppose λ1, ..., λm are distinct roots of (18). Then we can write

$$w_n = \sum_{i=1}^m c_i \lambda_i^n. \qquad (19)$$

Since w_n = α is a solution, the difference equation (17) gives

$$\alpha - a_{m-1}\alpha - \dots - a_0\alpha = 0,$$

or

$$\alpha(1 - a_{m-1} - \dots - a_0) = 0.$$

This shows that λ = 1 is a root, and we may take c1 = α in (19). Thus (19) may be written as

$$w_n = \alpha + \sum_{i=2}^m c_i \lambda_i^n. \qquad (20)$$

In the absence of round-off error all the c_i in (20) would be zero. If |λ_i| ≤ 1 then the error due to roundoff will not grow. The argument used above generalises to multiple roots λ_i of the characteristic polynomial. Thus if the roots λ_i, (i = 1, ..., m) of the characteristic polynomial satisfy |λ_i| ≤ 1, then the method is stable.

Example

Consider the 4th order Adams-Bashforth method

$$w_{i+1} = w_i + \frac{h}{24}\left[ 55 f(x_i, w_i) - 59 f(x_{i-1}, w_{i-1}) + 37 f(x_{i-2}, w_{i-2}) - 9 f(x_{i-3}, w_{i-3}) \right].$$

This has the characteristic equation

$$\lambda^4 - \lambda^3 = 0,$$

giving the roots as λ = 1, 0, 0, 0. Thus this is stable according to our definition of stability above.

It can be proven that if the difference method is consistent with the differential equation, then the method is stable if and only if the method is convergent.

Is it enough just to have stability as defined above? Consider the solution of

$$\frac{dy}{dx} = -30y, \qquad y(0) = 1/3.$$

The RK(4) method, although stable, has difficulty in computing an accurate solution of this problem. This means that we need something more than just the idea of stability defined above.


3.4 Absolute stability

Consider

$$\frac{dy}{dx} = ky, \qquad y(0) = \alpha, \qquad k < 0. \qquad (21)$$

The exact solution of this is y(x) = α e^{kx}. If we take our one-step method and apply it to this equation we obtain

$$w_{i+1} = Q(hk)\, w_i.$$

Similarly a multi-step method of the type used earlier (11), when applied to the test equation (21), gives rise to the difference equation

$$w_{i+1} = a_{m-1} w_i + a_{m-2} w_{i-1} + \dots + a_0 w_{i+1-m} + h\left[ b_m k w_{i+1} + b_{m-1} k w_i + \dots + b_0 k w_{i+1-m} \right]. \qquad (22)$$

Thus if we seek solutions of the form w_i = z^i this will give rise to the characteristic polynomial equation

$$Q(z, hk) = 0,$$

where

$$Q(z, hk) = (1 - hk\, b_m) z^m - (a_{m-1} + hk\, b_{m-1}) z^{m-1} - \dots - (a_0 + hk\, b_0).$$

The region R of absolute stability for a one-step method is defined as the region in the complex plane R = {hk ∈ C : |Q(hk)| < 1}, and for a multi-step method R = {hk ∈ C : |β_j| < 1 for every root β_j of Q(z, hk) = 0}.

A numerical method is A-stable if R contains the entire left half plane. In practice the above conditions place a limit on the size of the step which we can use.

Consider the modified Euler method

$$w_0 = \alpha, \qquad w_{i+1} = w_i + \frac{h}{2}\left[ f(x_i, w_i) + f(x_{i+1}, w_{i+1}) \right], \qquad i = 0, 1, \dots, N-1.$$

This is an A-stable method.
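To see why, note that applying this scheme to (21) gives w_{i+1} = Q(hk) w_i with Q(hk) = (1 + hk/2)/(1 − hk/2), and |Q| < 1 whenever Re(hk) < 0. The Python sketch below (our illustration) checks this for a few values of hk, including the stiff value hk = −30 related to the problem above.

# Amplification factor of the modified Euler (trapezoidal) scheme for y' = ky:
# Q(hk) = (1 + hk/2) / (1 - hk/2); |Q| < 1 for all hk with Re(hk) < 0.
for hk in [-0.1, -1.0, -30.0, -1000.0, complex(-1.0, 5.0)]:
    Q = (1 + hk / 2) / (1 - hk / 2)
    print(hk, abs(Q))                    # always below 1: A-stability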


4 Boundary Value Problems

4.1 Shooting Methods

Consider the differential equation

$$\frac{d^2y}{dx^2} + k\frac{dy}{dx} + xy = 0, \qquad y(0) = 0, \quad y(1) = 1. \qquad (23)$$

This is an example of a boundary value problem because conditions have to be satisfied at both ends. If we write this as a system of first order equations we have

$$Y_1 = y, \qquad Y_2 = \frac{dy}{dx},$$

$$\frac{dY_1}{dx} = Y_2, \qquad \frac{dY_2}{dx} = -k Y_2 - x Y_1. \qquad (24)$$

The boundary conditions give

$$Y_1(0) = 0, \qquad Y_1(1) = 1.$$

We do not know the value of Y2(0). If we knew this value then we could use our standard integrator to obtain a solution. So how do we handle this situation? Suppose we guess the value Y2(0) = g, say. Then we can integrate (24) with the initial condition

$$\mathbf{Y}(0) = \begin{pmatrix} Y_1(0) \\ Y_2(0) \end{pmatrix} = \begin{pmatrix} 0 \\ g \end{pmatrix},$$

using any integrator, eg RK(4). This will give us

$$\mathbf{Y}(1) = \begin{pmatrix} Y_1(1) \\ Y_2(1) \end{pmatrix} = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix},$$

where, because we guessed the value g = Y2(0), β1 will not necessarily satisfy the required condition Y1(1) = 1 in (23). So now we need to go through some kind of iterative process to try and get the correct value of g such that the required condition at x = 1 is satisfied.

To do this define

$$\phi(g) = Y_1(1; g) - 1, \qquad (25)$$

where the semi-colon notation is used to indicate an additional parametric dependence on g. We want to find the value of g such that φ(g) = 0. This gives rise to the idea of a shooting method. We adjust the value of g to force φ(g) = 0. We can also think of this as finding the root of the equation φ(g) = 0.


4.2 How do we adjust the value of g?

To solve φ(g) = 0 we can use any of the root finding methods we know, eg the secant method, Newton's method, bisection, etc.

4.3 Newton’s Method, Secant Method

Suppose that we have a guess g and we seek a correction dg such that φ(g + dg) = 0. By Taylor expansion we have

$$\phi(g + dg) = \phi(g) + \frac{d\phi}{dg}(g)\, dg + O(dg^2).$$

This suggests that we take

$$dg = -\frac{\phi(g)}{\phi'(g)},$$

and hence a new value for g is g + dg. Thus this gives rise to a sequence of iterations for n = 0, 1, ...,

$$g_{n+1} = g_n - \frac{\phi(g_n)}{\phi'(g_n)}, \qquad (26)$$

where we start the iteration with a suitable guess g0. Provided that g0 is close to the true value, Newton's method converges very fast, ie quadratically. Difficulties with convergence arise when φ'(g) is close to zero.

Now in (26) we are required to find φ'(g_n). How can we do this? One way is to estimate φ'(g_n) by

$$\phi'(g_n) \approx \frac{\phi(g_n) - \phi(g_{n-1})}{g_n - g_{n-1}}.$$

This gives

$$g_{n+1} = g_n - \frac{\phi(g_n)(g_n - g_{n-1})}{\phi(g_n) - \phi(g_{n-1})}, \qquad (27)$$

which is known as the secant method. We need two starting values for g and then we can use (27) to generate the remaining values.
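A Python sketch of shooting with the secant method for problem (23) is given below; the parameter value k = 1, the RK(4) integrator, and the starting guesses are all assumptions of this demo.

import numpy as np

k = 1.0                                  # assumed value of k in (23)
F = lambda x, Y: np.array([Y[1], -k * Y[1] - x * Y[0]])   # system (24)

def phi(g, N=200):
    # Integrate (24) from x = 0 to 1 with RK(4), initial data (0, g)^T,
    # and return the residual phi(g) = Y1(1; g) - 1 as in (25).
    h = 1.0 / N
    Y = np.array([0.0, g])
    for i in range(N):
        x = i * h
        k1 = h * F(x, Y)
        k2 = h * F(x + h / 2, Y + k1 / 2)
        k3 = h * F(x + h / 2, Y + k2 / 2)
        k4 = h * F(x + h, Y + k3)
        Y = Y + (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return Y[0] - 1.0

g0, g1 = 0.0, 1.0                        # two starting guesses for (27)
for _ in range(20):
    f0, f1 = phi(g0), phi(g1)
    if abs(f1) < 1e-12 or f1 == f0:
        break
    g0, g1 = g1, g1 - f1 * (g1 - g0) / (f1 - f0)   # secant update (27)
print(g1, phi(g1))                       # converged slope Y2(0) and residual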

Consider (24) again:

$$\frac{dY_1}{dx} = Y_2, \qquad \frac{dY_2}{dx} = -k Y_2 - x Y_1,$$

with Y(0) = (0, g)^T. Now from (25)

$$\phi'(g) = \frac{\partial Y_1}{\partial g}(1; g),$$

so this suggests differentiating the original system of equations and boundary conditions with respect to g to get

$$\frac{d}{dx}\left(\frac{\partial \mathbf{Y}}{\partial g}\right) = \begin{pmatrix} \frac{\partial Y_2}{\partial g} \\ -k\frac{\partial Y_2}{\partial g} - x\frac{\partial Y_1}{\partial g} \end{pmatrix}, \qquad \left(\frac{\partial \mathbf{Y}}{\partial g}\right)(x = 0) = \begin{pmatrix} 0 \\ 1 \end{pmatrix}. \qquad (28)$$

The system (28) defines another initial value problem with given initial conditions.

Thus the procedure is as follows. First integrate (24) with initial conditions Y(0) = (0, g)^T and with a suitable initial guess g. Then, either simultaneously at each step, or after each step, solve (28). After integrating over all the steps we will have at x = 1 the values Y(x = 1) from (24) and ∂Y/∂g (x = 1) from (28). Note that from (25)

$$\phi'(g) = \frac{\partial Y_1}{\partial g}(1; g).$$

From the solution of (28) we can extract ∂Y1/∂g (x = 1) and hence substitute into (26) to update g. This forms the basis of Newton's method combined with shooting, to solve boundary value problems.

It is clear how one can solve (28) after solving (24) at each step. This can also be done simultaneously by working with an augmented system where we define

$$Y_1 = y, \qquad Y_2 = \frac{dy}{dx}, \qquad Y_3 = \frac{\partial y}{\partial g} = \frac{\partial Y_1}{\partial g}, \qquad Y_4 = \frac{\partial Y_2}{\partial g},$$

and then

$$\frac{d}{dx}\begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \end{pmatrix} = \begin{pmatrix} Y_2 \\ -k Y_2 - x Y_1 \\ Y_4 \\ -k Y_4 - x Y_3 \end{pmatrix}, \qquad \mathbf{Y}(0) = \begin{pmatrix} Y_1(0) \\ Y_2(0) \\ Y_3(0) \\ Y_4(0) \end{pmatrix} = \begin{pmatrix} 0 \\ g \\ 0 \\ 1 \end{pmatrix}. \qquad (29)$$
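The following Python sketch integrates the augmented system (29) and applies the Newton update (26); as before, k = 1 is an assumed parameter value. Because Y3, Y4 carry the sensitivities ∂Y1/∂g and ∂Y2/∂g, one integration yields both φ(g) and φ'(g).

import numpy as np

k = 1.0                                  # assumed value of k
# Augmented system (29): (Y1, Y2) solve (24), (Y3, Y4) solve (28).
F = lambda x, Y: np.array([Y[1], -k*Y[1] - x*Y[0], Y[3], -k*Y[3] - x*Y[2]])

def integrate(g, N=200):
    h = 1.0 / N
    Y = np.array([0.0, g, 0.0, 1.0])     # initial data from (29)
    for i in range(N):
        x = i * h
        k1 = h * F(x, Y)
        k2 = h * F(x + h / 2, Y + k1 / 2)
        k3 = h * F(x + h / 2, Y + k2 / 2)
        k4 = h * F(x + h, Y + k3)
        Y = Y + (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return Y

g = 0.0                                  # initial guess
for _ in range(10):
    Y = integrate(g)
    phi, dphi = Y[0] - 1.0, Y[2]         # phi(g) and phi'(g) = (dY1/dg)(1; g)
    if abs(phi) < 1e-12:
        break
    g -= phi / dphi                      # Newton update (26)
print(g)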

4.4 Multiple end conditions

What we have discussed so far works fine if we just have one condition to satisfy. The same procedure also works for cases where we have two or more conditions to satisfy. For example with

$$\frac{d^4y}{dx^4} = y^3 - \left(\frac{dy}{dx}\right)^2, \qquad y(0) = 1, \quad \frac{dy}{dx}(0) = 0, \quad y(1) = 2, \quad \frac{dy}{dx}(1) = 1,$$

we would need two starting values at x = 0 and then we will have two conditions to satisfy at x = 1. The corrections can be calculated in a similar way. Thus in the example above, suppose we define

$$Y_1 = y, \qquad Y_2 = y', \qquad Y_3 = y'', \qquad Y_4 = y''',$$

then we will need to use guesses for Y3(0) = e, say, and Y4(0) = g. The conditions we need to satisfy can be written as

$$\phi_1(e, g) = \phi_2(e, g) = 0,$$

where

$$\phi_1(e, g) = Y_1(1; e, g) - 2, \qquad \phi_2(e, g) = Y_2(1; e, g) - 1.$$

To obtain the corrections to the guessed values e, g we have

$$\phi_1(e + de, g + dg) = 0 = \phi_1(e, g) + de\,\frac{\partial\phi_1}{\partial e}(e, g) + dg\,\frac{\partial\phi_1}{\partial g}(e, g) + O(de^2, dg^2),$$

$$\phi_2(e + de, g + dg) = 0 = \phi_2(e, g) + de\,\frac{\partial\phi_2}{\partial e}(e, g) + dg\,\frac{\partial\phi_2}{\partial g}(e, g) + O(de^2, dg^2).$$

This can be solved to obtain the corrections (de, dg). If we generalise to a multidimensional case where we have a vector of guesses g we can find the corrections dg as

$$d\mathbf{g} = -J^{-1}(\mathbf{g})\,\boldsymbol{\phi}(\mathbf{g}),$$

where J is the Jacobian with entries ∂φ_i/∂g_k and φ is the vector of conditions.

There are of course many variations on the above theme and different strategies when dealing with linear equations. For stiff systems one can also integrate from the two ends and match in the middle. Many of these techniques are discussed extensively in the book by Keller (1976).


4.5 Solution of boundary value problems using finite-differences

Boundary value problems can also be tackled directly using finite-differences or some other technique such as spectral approximation. We will look at one specific example to illustrate the finite-difference technique.

Consider

$$\frac{d^2y}{dx^2} = \frac{1}{8}\left(32 + 2x^3 - y\frac{dy}{dx}\right), \qquad 1 \le x \le 3, \qquad y(1) = 17, \quad y(3) = \frac{43}{3}. \qquad (30)$$

The exact solution of (30) is y(x) = x² + (16/x).

Let us first define a uniform grid (x0, x1, ..., xN) with N + 1 points and grid spacing h = (xN − x0)/N, so that x_j = x0 + jh, for (j = 0, 1, ..., N).

[Figure: a uniform grid on [a, b], with nodes a = x0, x1, ..., xi, xi+1, ..., xN = b and spacing h.]

Let us approximate y by w_i at each of the nodes x = x_i. The derivatives of y in (30) are approximated in finite-difference form as

$$\left(\frac{dy}{dx}\right)_{x=x_i} = \frac{w_{i+1} - w_{i-1}}{2h} + O(h^2), \qquad (31)$$

$$\left(\frac{d^2y}{dx^2}\right)_{x=x_i} = \frac{w_{i+1} - 2w_i + w_{i-1}}{h^2} + O(h^2). \qquad (32)$$

These can be derived by making use of a Taylor expansion about the point x = x_i. Thus for example

$$y(x_{i+1}) = y(x_i) + h\frac{dy}{dx}(x_i) + \frac{h^2}{2}\frac{d^2y}{dx^2}(x_i) + \frac{h^3}{6}\frac{d^3y}{dx^3}(x_i) + \frac{h^4}{24}\frac{d^4y}{dx^4}(x_i) + O(h^5), \qquad (33)$$

$$y(x_{i-1}) = y(x_i) - h\frac{dy}{dx}(x_i) + \frac{h^2}{2}\frac{d^2y}{dx^2}(x_i) - \frac{h^3}{6}\frac{d^3y}{dx^3}(x_i) + \frac{h^4}{24}\frac{d^4y}{dx^4}(x_i) + O(h^5). \qquad (34)$$

By adding and subtracting (33), (34) and replacing y(x_i) by w_i we obtain (31), (32). Next replace y and its derivatives in (30) by the above approximations to get

$$\frac{w_{i+1} - 2w_i + w_{i-1}}{h^2} = 4 + \frac{x_i^3}{4} - w_i\left(\frac{w_{i+1} - w_{i-1}}{16h}\right), \qquad \text{for } (i = 1, 2, \dots, N-1), \qquad (35)$$

and

$$w_0 = 17, \qquad w_N = \frac{43}{3}. \qquad (36)$$

The system of equations (35), (36) is a set of nonlinear difference equations. We have N + 1 equations for N + 1 unknowns w0, ..., wN. At this stage one has recourse to a number of different techniques for handling the nonlinearity. One is to set up an iteration scheme replacing w_i by w^(k)_i say, and starting with a suitable guess for w^(0)_i. The nonlinear term in (35) can now be tackled in many different ways. Thus for example we can replace it by

$$w^{(k-1)}_i\left(\frac{w^{(k)}_{i+1} - w^{(k)}_{i-1}}{16h}\right), \qquad \text{or} \qquad w^{(k)}_i\left(\frac{w^{(k-1)}_{i+1} - w^{(k-1)}_{i-1}}{16h}\right),$$

and we are then left with a linear system of equations to find the w^(k)_i, and the iterations can be continued until a suitable convergence condition is satisfied. What is a suitable convergence criterion? One condition is that the nonlinear difference equations (35) need to be satisfied to sufficient accuracy.

4.5.1 Newton linearization

Where practicable, Newton's method offers the best means for handling nonlinear equations because of its superior convergence properties. To use this technique for nonlinear difference equations of the form (35), suppose that we have a guess W_i for the solution. We seek corrections δw_i such that w_i = W_i + δw_i satisfies the system (35). Substituting w_i = W_i + δw_i into (35) and linearizing, ie ignoring terms of O(δw_i²), gives

$$\frac{\delta w_{i+1} - 2\delta w_i + \delta w_{i-1}}{h^2} = F_i - \delta w_i\left(\frac{W_{i+1} - W_{i-1}}{16h}\right) - W_i\left(\frac{\delta w_{i+1} - \delta w_{i-1}}{16h}\right), \qquad \text{for } (i = 1, 2, \dots, N-1), \qquad (37)$$

and

$$\delta w_0 = F_0, \qquad \delta w_N = F_N, \qquad (38)$$

where

$$F_i = 4 + \frac{x_i^3}{4} - W_i\left(\frac{W_{i+1} - W_{i-1}}{16h}\right) - \frac{W_{i+1} - 2W_i + W_{i-1}}{h^2}, \qquad \text{for } i = 1, \dots, N-1,$$

and

$$F_0 = 17 - W_0, \qquad F_N = \frac{43}{3} - W_N.$$


The system (37), (38) is now linear and can be solved to obtain the corrections δw_i, and hence we can update the W_i by W_i + δw_i. This is continued until the δw_i are small enough and the difference equations are satisfied to sufficient accuracy. Instead of solving for the corrections δw_i one can also solve for the total quantities w_i by replacing δw_i in (37), (38) by δw_i = w_i − W_i, and solving for the w_i.

The choice of the iteration scheme is dependent on a number of factors including ease of implementation, convergence properties and so on. The techniques described above lead to the solution of a tridiagonal system of linear equations of the form

$$\alpha_i w_{i-1} + \beta_i w_i + \gamma_i w_{i+1} = \delta_i, \qquad i = 0, 1, \dots, N, \qquad (39)$$

where the α_i, β_i, γ_i are coefficients obtainable from the linear system and α0 = γN = 0. For instance from (37) we have

$$\alpha_i = \frac{1}{h^2} - \frac{W_i}{16h}, \qquad \beta_i = -\frac{2}{h^2} + \frac{W_{i+1} - W_{i-1}}{16h}, \qquad \gamma_i = \frac{1}{h^2} + \frac{W_i}{16h}, \qquad \delta_i = F_i, \qquad i = 1, 2, \dots, N-1,$$

and

$$\beta_0 = 1, \quad \gamma_0 = 0, \quad \delta_0 = F_0, \qquad \alpha_N = 0, \quad \beta_N = 1, \quad \delta_N = F_N.$$

This can be solved using any standard library routine, or Thomas's algorithm.

4.6 Thomas’s tridiagonal algorithm

This version of a tridiagonal solver is based on Gaussian elimination. First we create zeros below the diagonal, and then, once we have a triangular matrix, we solve for the w_i using back substitution. Thus, for the system (39) with rows j = 0, 1, ..., N, the algorithm takes the form

$$\beta_j = \beta_j - \frac{\gamma_{j-1}\alpha_j}{\beta_{j-1}}, \qquad \delta_j = \delta_j - \frac{\delta_{j-1}\alpha_j}{\beta_{j-1}}, \qquad j = 1, 2, \dots, N,$$

followed by

$$w_N = \frac{\delta_N}{\beta_N}, \qquad w_j = \frac{\delta_j - \gamma_j w_{j+1}}{\beta_j}, \qquad j = N-1, \dots, 0.$$
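A direct Python transcription of the algorithm is given below (a sketch; the array names follow (39), and the small test system is our own).

import numpy as np

def thomas(alpha, beta, gamma, delta):
    # Solve alpha[j] w[j-1] + beta[j] w[j] + gamma[j] w[j+1] = delta[j],
    # j = 0..N, with alpha[0] = gamma[N] = 0. beta and delta are overwritten.
    N = len(beta) - 1
    for j in range(1, N + 1):            # forward elimination
        m = alpha[j] / beta[j - 1]
        beta[j] -= gamma[j - 1] * m
        delta[j] -= delta[j - 1] * m
    w = np.empty(N + 1)
    w[N] = delta[N] / beta[N]
    for j in range(N - 1, -1, -1):       # back substitution
        w[j] = (delta[j] - gamma[j] * w[j + 1]) / beta[j]
    return w

# Small check (our own test data) with known solution w = (1, 2, 3):
alpha = np.array([0.0, 1.0, 1.0]); beta = np.array([2.0, 2.0, 2.0])
gamma = np.array([1.0, 1.0, 0.0]); delta = np.array([4.0, 8.0, 8.0])
print(thomas(alpha, beta, gamma, delta))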


5 Numerical solution of pde’s

In this part of the course we will be concerned with the numerical solution of partial differential equations (pde's), using finite-difference methods. This involves covering the domain with a mesh of points and calculating the approximate solution of the pde at these mesh points. The techniques that we will use will depend on the type of pde that we are dealing with, ie whether it is elliptic, parabolic, or hyperbolic, although there are common features. We will need to learn how to construct approximations to the given equation, how to solve the equation using iterative methods, investigate stability and so on.

5.1 Classification

Partial differential equations can be classified as being of elliptic, parabolic or hyperbolic type. In some cases equations can be of mixed type. There are systematic methods for classifying an equation but for our purposes it is enough to summarize the main results. Consider the second order pde

$$A\frac{\partial^2\phi}{\partial x^2} + B\frac{\partial^2\phi}{\partial x\partial y} + C\frac{\partial^2\phi}{\partial y^2} + D\frac{\partial\phi}{\partial x} + E\frac{\partial\phi}{\partial y} + F\phi + G = 0, \qquad (40)$$

where, in general, A, B, C, D, E, F and G are functions of the independent variables x and y and of the dependent variable φ. The equation (40) is said to be

• elliptic if B² − 4AC < 0,

• parabolic if B² − 4AC = 0, or

• hyperbolic if B² − 4AC > 0.

An example of an elliptic equation is Poisson's equation

$$\frac{\partial^2\phi}{\partial x^2} + \frac{\partial^2\phi}{\partial y^2} = f(x, y).$$

The heat equation

$$\frac{\partial\phi}{\partial t} = k\frac{\partial^2\phi}{\partial x^2}$$

is of parabolic type, and the wave equation

$$\frac{\partial^2\phi}{\partial x^2} - \frac{\partial^2\phi}{\partial y^2} = 0$$

is a hyperbolic pde. An example of a mixed type equation is the transonic small disturbance equation given by

$$\left(K - \frac{\partial\phi}{\partial x}\right)\frac{\partial^2\phi}{\partial x^2} + \frac{\partial^2\phi}{\partial y^2} = 0.$$


A more formal approach to classification is to consider a system of first order partial differential equations for the unknowns U = (u1, u2, ..., un)^T and independent variables x = (x1, x2, ..., xm)^T. Suppose that the equations can be written in quasi-linear form

$$\sum_{k=1}^{m} A_k\frac{\partial\mathbf{U}}{\partial x_k} = \mathbf{Q}, \qquad (41)$$

where the A_k are (n × n) matrices and Q is an (n × 1) column vector, and both can depend on x_k and U but not on the derivatives of U. If we seek plane wave solutions of the homogeneous part of (41) in the form

$$\mathbf{U} = \mathbf{U}_0 e^{i\mathbf{x}\cdot\mathbf{s}},$$

where s = (s1, s2, ..., sm)^T, then (41) leads to the system of equations

$$i\left[\sum_{k=1}^{m} A_k s_k\right]\mathbf{U} = 0. \qquad (42)$$

This will have a non-trivial solution only if the characteristic equation

$$\det\left|\sum_{k=1}^{m} A_k s_k\right| = 0$$

holds. This will have at most n solutions and n vectors s, which can be regarded as normals to the characteristic surfaces (which are the surfaces of constant phase of the wave).

The system is hyperbolic if n real characteristics exist. If all the characteristics are complex, the system is elliptic. If some are real and some complex, the system is of mixed type. If the system (42) is of rank less than n, then we have a parabolic system.

Example
Consider the steady two-dimensional Euler equations for incompressible flow:

$$\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} = 0, \qquad u\frac{\partial u}{\partial x} + v\frac{\partial u}{\partial y} = -\frac{\partial p}{\partial x}, \qquad u\frac{\partial v}{\partial x} + v\frac{\partial v}{\partial y} = -\frac{\partial p}{\partial y}.$$

Introducing the vector U = (u, v, p)^T and x = (x, y)^T we can express this in the matrix-vector form (41) with Q = 0 and

$$A_1 = \begin{pmatrix} 1 & 0 & 0 \\ u & 0 & 1 \\ 0 & u & 0 \end{pmatrix}, \qquad A_2 = \begin{pmatrix} 0 & 1 & 0 \\ v & 0 & 0 \\ 0 & v & 1 \end{pmatrix}.$$

Introducing λ = s1/s2, the characteristic equation gives

$$\det\begin{pmatrix} \lambda & 1 & 0 \\ u\lambda + v & 0 & \lambda \\ 0 & u\lambda + v & 1 \end{pmatrix} = 0,$$

which leads to λ = −v/u or λ = ±i. Thus the Euler equations are hybrid (of mixed type).

5.2 Consistency, convergence and the Lax equivalence theorem

These ideas were introduced earlier for ordinary differential equations but apply equally to pdes.

Consistent A discrete approximation to a partial differential equation is said to be consistent if, in the limit of the step-size(s) going to zero, the original pde system is recovered, ie the truncation error approaches zero.

Stability If we define the error to be the difference between the computed solution and the exact solution of the discrete approximation, then the scheme is stable if the error remains uniformly bounded for successive iterations.

Convergence A scheme is convergent if the solution of the discrete equations approaches the solution of the pde in the limit that the step-sizes approach zero.

Lax's Equivalence Theorem For a well posed initial-value problem and a consistent discretization, stability is the necessary and sufficient condition for convergence.

5.3 Difference formulae

Let us first consider the various approximations to derivatives that we can construct. The usual tool for this is the Taylor expansion. Suppose that we have a grid of points with equal mesh spacing ∆x in the x-direction and equal spacing ∆y in the y-direction. Thus we can define points (x_i, y_j) by

$$x_i = x_0 + i\Delta x, \qquad y_j = y_0 + j\Delta y.$$

Suppose that we are trying to approximate a derivative of a function φ(x, y) at the point (x_i, y_j). Denote the approximate value of φ(x, y) at the point (x_i, y_j) by w_{i,j} say.

5.3.1 Central Differences

The first and second derivatives in x or y may be approximated as before by

$$\left(\frac{\partial^2\phi}{\partial x^2}\right)_{ij} = \frac{w_{i+1,j} - 2w_{i,j} + w_{i-1,j}}{(\Delta x)^2} + O((\Delta x)^2), \qquad (43)$$

$$\left(\frac{\partial^2\phi}{\partial y^2}\right)_{ij} = \frac{w_{i,j+1} - 2w_{i,j} + w_{i,j-1}}{(\Delta y)^2} + O((\Delta y)^2), \qquad (44)$$

$$\left(\frac{\partial\phi}{\partial x}\right)_{ij} = \frac{w_{i+1,j} - w_{i-1,j}}{2\Delta x} + O((\Delta x)^2), \qquad (45)$$

$$\left(\frac{\partial\phi}{\partial y}\right)_{ij} = \frac{w_{i,j+1} - w_{i,j-1}}{2\Delta y} + O((\Delta y)^2). \qquad (46)$$

Note that replacing the derivative (∂²φ/∂x²)_{ij} by the approximation in (43) gives rise to a truncation error O((∆x)²). The approximations listed above are centered at the point (x_i, y_j), and are called central-difference approximations.

[Stencil: • (i−1, j) --- × (i, j) --- • (i+1, j)]

Consider the approximation

$$\left(\frac{\partial\phi}{\partial x}\right)_{ij} = \frac{w_{i+1,j} - w_{i,j}}{\Delta x}.$$

By Taylor expansion we see that this gives rise to a truncation error of O(∆x). In addition this approximation is centered at the point x_{i+1/2, j}.

[Stencil: • (i, j) --- × (i+1/2, j) --- • (i+1, j)]

5.3.2 One-sided approximations

We can also construct one-sided approximations to derivatives. Thus for example a second-order forward approximation to ∂φ/∂x at the point (x_i, y_j) is given by

$$\left(\frac{\partial\phi}{\partial x}\right)_{ij} = \frac{-3w_{i,j} + 4w_{i+1,j} - w_{i+2,j}}{2\Delta x}.$$

Tables 1 and 2 list some of the more commonly used central and one-sided approximations. In these tables the numbers denote the weights at the nodes, and the whole expression is multiplied by 1/(∆x)^j, where j denotes the order of the derivative, ie j = 1 for a first derivative and j = 2 for a second derivative.

Table 1: Weights for central differences

                        Node points
  Order of accuracy     i−2      i−1     i       i+1     i+2
  1st derivative
    (∆x)^2                       −1/2    0       1/2
    (∆x)^4              1/12     −2/3    0       2/3     −1/12
  2nd derivative
    (∆x)^2                       1       −2      1
    (∆x)^4              −1/12    4/3     −5/2    4/3     −1/12

Table 2: Weights for one-sided differences

                        Node points
  Order of accuracy     i        i+1     i+2     i+3     i+4
  1st derivative
    (∆x)                −1       1
    (∆x)^2              −3/2     2       −1/2
    (∆x)^3              −11/6    3       −3/2    1/3
    (∆x)^4              −25/12   4       −3      4/3     −1/4
  2nd derivative
    (∆x)                1        −2      1
    (∆x)^2              2        −5      4       −1
    (∆x)^3              35/12    −26/3   19/2    −14/3   11/12

5.3.3 Mixed derivatives

The technique for finding suitable discrete approximations for mixed derivatives is exactly the same, except this time we would make use of a multidimensional Taylor expansion. Thus for example second order approximations to ∂²φ/∂x∂y at the point (i, j) are given by

$$\frac{\partial^2\phi}{\partial x\partial y} = \frac{w_{i+1,j+1} - w_{i-1,j+1} + w_{i-1,j-1} - w_{i+1,j-1}}{4\Delta x\,\Delta y} + O((\Delta x)^2, (\Delta y)^2),$$

or

$$\frac{\partial^2\phi}{\partial x\partial y} = \frac{w_{i+1,j+1} - w_{i+1,j} - w_{i,j+1} + w_{i-1,j-1} - w_{i-1,j} - w_{i,j-1} + 2w_{i,j}}{2\Delta x\,\Delta y} + O((\Delta x)^2, (\Delta y)^2).$$

5.4 Solution of elliptic pde’s

A prototype elliptic pde is Poisson’s equation given by

∂2φ

∂x2+

∂2φ

∂y2= f(x, y), (47)

where f(x, y) is a known/given function. The equation (47) has to be solved in adomain D, say, with boundary conditions given on the boundary δD of D. Thesecan be of three types:

• Dirichlet φ = g(x, y) on δD.

• Neumann∂φ

∂n= g(x, y) on δD.

• Robin/Mixed B(φ,∂φ

∂n) = 0 on δD. Robin boundary conditions involve a

linear combination of φ and its normal derivative on the boundary. Mixedboundary conditions involve different conditions for one part of the bound-ary, and another type for other parts of the boundary.

In the above g(x, y) is a given function, and derivatives with respect to n denotethe normal derivative, ie the derivative in the direction n which is normal to theboundary.

Let us consider a model problem with

∂2φ

∂x2+

∂2φ

∂y2= f(x, y), 0 < x, y < 1 (48)

φ = 0 on δD. (49)

Here the domain D is the square region 0 < x < 1 and 0 < y < 1. Construct afinite difference mesh with points (xi, yj), say where

xi = i∆x, i = 0, 1, ..., N, yj = j∆y, j = 0, 1, ..., M,

where ∆x = 1/N, and ∆y = 1/M are the step sizes in the x and y directions.


Next replace the derivatives in (48) by the approximations from (43), (44) to get

$$\frac{w_{i+1,j} - 2w_{i,j} + w_{i-1,j}}{(\Delta x)^2} + \frac{w_{i,j+1} - 2w_{i,j} + w_{i,j-1}}{(\Delta y)^2} = f_{i,j}, \qquad 1 \le i \le N-1, \quad 1 \le j \le M-1, \qquad (50)$$

and

$$w_{i,j} = 0 \quad \text{if } i = 0, N, \quad 0 \le j \le M, \qquad (51)$$

$$w_{i,j} = 0 \quad \text{if } j = 0, M, \quad 0 \le i \le N. \qquad (52)$$

Thus we have (N − 1) × (M − 1) unknown values w_{i,j} to find at the interior points of the domain. If we write w_i = (w_{i,1}, w_{i,2}, ..., w_{i,M−1})^T and f_i = (f_{i,1}, f_{i,2}, ..., f_{i,M−1})^T, we can write the above system of equations (50) in matrix form as

$$\begin{pmatrix} B & I & & & \\ I & B & I & & \\ & I & B & I & \\ & & \ddots & \ddots & \ddots \\ & & & I & B \end{pmatrix}\begin{pmatrix} \mathbf{w}_1 \\ \mathbf{w}_2 \\ \mathbf{w}_3 \\ \vdots \\ \mathbf{w}_{N-1} \end{pmatrix} = (\Delta x)^2\begin{pmatrix} \mathbf{f}_1 \\ \mathbf{f}_2 \\ \mathbf{f}_3 \\ \vdots \\ \mathbf{f}_{N-1} \end{pmatrix}.$$

In the above I is the (M − 1) × (M − 1) identity matrix and B is a similar sized matrix given by

$$B = \begin{pmatrix} b & c & & & \\ c & b & c & & \\ & c & b & c & \\ & & \ddots & \ddots & \ddots \\ & & & c & b \end{pmatrix}, \qquad \text{with } b = -2(1 + \beta^2), \quad c = \beta^2, \quad \beta = \frac{\Delta x}{\Delta y}.$$

What we observe is that the matrix is very sparse and can be very large even with a modest number of grid points. For instance with N = M = 101 the linear system is of size 10^4 × 10^4 and we have 10^4 unknowns to find.

To solve the system of equations arising from the above discretization, one has recourse to a number of methods. Direct methods are not feasible because of the size of the matrices involved; in addition they are expensive and require a lot of storage, unless one exploits the sparsity pattern of the matrices. Iterative methods offer an alternative and are easy to use, but depending on the method, convergence can be slow with an increasing number of mesh points.


5.5 Iterative methods for the solution of linear systems

5.5.1 Jacobi method

Consider (50), which we can rewrite as

$$w_{i,j} = \frac{1}{2(1+\beta^2)}\left( w_{i+1,j} + w_{i-1,j} + \beta^2(w_{i,j+1} + w_{i,j-1}) - (\Delta x)^2 f_{i,j} \right), \qquad (53)$$

with β = ∆x/∆y. This suggests the iterative scheme

$$w^{\mathrm{new}}_{i,j} = \frac{1}{2(1+\beta^2)}\left( w^{\mathrm{old}}_{i+1,j} + w^{\mathrm{old}}_{i-1,j} + \beta^2(w^{\mathrm{old}}_{i,j+1} + w^{\mathrm{old}}_{i,j-1}) - (\Delta x)^2 f_{i,j} \right). \qquad (54)$$

We start with a guess for w^old_{i,j}, compute a new value from (54), and continue iterating until suitable convergence criteria are satisfied. What are suitable convergence criteria? Ideally we wish to stop when the corrections are small, or when the equations (53) are satisfied to sufficient accuracy.

Suppose we write the linear system as

$$A\mathbf{v} = \mathbf{f},$$

where v is the exact solution of the linear system. Then if w is an approximate solution, the error e is defined by e = v − w. But this is not much use unless we know the exact solution. On the other hand we can write

$$A\mathbf{e} = A(\mathbf{v} - \mathbf{w}) = \mathbf{f} - A\mathbf{w}.$$

The residual is then defined by r = f − Aw, which can be computed. For the Jacobi scheme the residual is given by

$$r_{i,j} = (\Delta x)^2 f_{i,j} + 2(1+\beta^2) w_{i,j} - \left( w_{i+1,j} + w_{i-1,j} + \beta^2(w_{i,j+1} + w_{i,j-1}) \right).$$

Therefore a suitable stopping condition might be

$$\max_{i,j} |r_{i,j}| < \varepsilon_1, \qquad \text{or} \qquad \sqrt{\sum_{i,j} r_{i,j}^2} < \varepsilon_2.$$
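A Python sketch of the Jacobi iteration (54) with a corrections-based stopping test follows; the choice f = −2π² sin(πx) sin(πy), for which the exact solution of (48), (49) is φ = sin(πx) sin(πy), and the tolerances are assumptions of this demo.

import numpy as np

N = M = 32
dx, dy = 1.0 / N, 1.0 / M
beta2 = (dx / dy) ** 2                   # beta^2 with beta = dx/dy
x, y = np.meshgrid(np.arange(N + 1) * dx, np.arange(M + 1) * dy, indexing="ij")
f = -2 * np.pi**2 * np.sin(np.pi * x) * np.sin(np.pi * y)

w = np.zeros((N + 1, M + 1))             # zero boundary values built in
for it in range(5000):
    w_new = w.copy()
    # Jacobi update (54) at the interior points:
    w_new[1:-1, 1:-1] = (w[2:, 1:-1] + w[:-2, 1:-1]
                         + beta2 * (w[1:-1, 2:] + w[1:-1, :-2])
                         - dx**2 * f[1:-1, 1:-1]) / (2 * (1 + beta2))
    corr = np.max(np.abs(w_new - w))     # size of the corrections
    w = w_new
    if corr < 1e-8:
        break
print(it, np.max(np.abs(w - np.sin(np.pi * x) * np.sin(np.pi * y))))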

5.5.2 Gauss-Seidel iteration

The Jacobi scheme involves two levels of storage, w^new_{i,j} and w^old_{i,j}, which can be wasteful. It is more economical to overwrite values as and when they are computed and use the new values when ready. This gives rise to the Gauss-Seidel scheme, which, if we sweep with i increasing followed by j, is given by

$$w^{\mathrm{new}}_{i,j} = \frac{1}{2(1+\beta^2)}\left( w^{\mathrm{old}}_{i+1,j} + w^{\mathrm{new}}_{i-1,j} + \beta^2(w^{\mathrm{old}}_{i,j+1} + w^{\mathrm{new}}_{i,j-1}) - (\Delta x)^2 f_{i,j} \right), \qquad (55)$$

where the new values overwrite existing values.


5.5.3 Relaxation and the SOR method

Instead of updating the new values as indicated above, it is better to use relaxation. Here we compute

$$w_{i,j} = (1 - \omega) w^{\mathrm{old}}_{i,j} + \omega w^{*}_{i,j},$$

where ω is called the relaxation factor, and w*_{i,j} denotes the value as computed by the Jacobi or Gauss-Seidel scheme. ω = 1 reduces to the Jacobi or Gauss-Seidel scheme. The Gauss-Seidel scheme with ω ≠ 1 is called the method of successive overrelaxation, or SOR, scheme.

5.5.4 Line relaxation

The Jacobi, Gauss-Seidel and SOR schemes are called point relaxation methods. On the other hand we may compute a whole line of new values at a time, leading to the line-relaxation methods. For instance suppose we write the equations as

$$w^{\mathrm{new}}_{i+1,j} - 2(1+\beta^2) w^{\mathrm{new}}_{i,j} + w^{\mathrm{new}}_{i-1,j} = -\beta^2\left( w^{\mathrm{old}}_{i,j+1} + w^{\mathrm{new}}_{i,j-1} \right) + (\Delta x)^2 f_{i,j}; \qquad (56)$$

then starting from j = 1 we may compute the values w_{i,j}, for i = 1, ..., N − 1, in one go. The left hand side of (56) can be seen to lead to a tridiagonal system of equations and we may use a tridiagonal solver to compute the solution, incrementing j until all the points have been computed. The above scheme can be combined with relaxation to give us a line-SOR method. In general the more implicit the scheme, the better the convergence properties. A variation on the above scheme is to alternate the direction of sweep, doing the i lines first and on the next iteration doing the j lines first. This leads to the ADI, or alternating direction implicit, scheme.


5.6 Convergence properties of basic iteration schemes

Consider the linear system

$$A\mathbf{x} = \mathbf{b}, \qquad (57)$$

where A = (a_{i,j}) is an n × n matrix, and x, b are n × 1 column vectors. Suppose we write the matrix A in the form A = D − L − U, where D, L, U are the diagonal matrix and the lower and upper triangular parts of A, ie

$$D = \begin{pmatrix} a_{1,1} & & & \\ & a_{2,2} & & \\ & & \ddots & \\ & & & a_{n,n} \end{pmatrix}, \qquad -L = \begin{pmatrix} 0 & & & \\ a_{2,1} & 0 & & \\ \vdots & \ddots & \ddots & \\ a_{n,1} & \cdots & a_{n,n-1} & 0 \end{pmatrix},$$

$$-U = \begin{pmatrix} 0 & a_{1,2} & \cdots & a_{1,n} \\ & 0 & \ddots & \vdots \\ & & \ddots & a_{n-1,n} \\ & & & 0 \end{pmatrix}.$$

Then (57) can be written as

$$D\mathbf{x} = (L + U)\mathbf{x} + \mathbf{b}.$$

The Jacobi iteration is then defined as

$$\mathbf{x}^{(k+1)} = D^{-1}(L + U)\mathbf{x}^{(k)} + D^{-1}\mathbf{b}. \qquad (58)$$

The Gauss-Seidel iteration is defined by

$$(D - L)\mathbf{x}^{(k+1)} = U\mathbf{x}^{(k)} + \mathbf{b}, \qquad (59)$$

or

$$\mathbf{x}^{(k+1)} = (D - L)^{-1}U\mathbf{x}^{(k)} + (D - L)^{-1}\mathbf{b}. \qquad (60)$$

In general an iteration scheme for (57) may be written as

$$\mathbf{x}^{(k+1)} = P\mathbf{x}^{(k)} + \mathbf{c}, \qquad (61)$$

where P is called the iteration matrix and c is a constant vector. For the Jacobi scheme we have

$$P = P_J = D^{-1}(L + U),$$


and for the Gauss-Seidel scheme

$$P = P_G = (D - L)^{-1}U.$$

The exact solution of the linear system (57) satisfies

$$\mathbf{x} = P\mathbf{x} + \mathbf{c}. \qquad (62)$$

Hence if we define the error at the kth iteration by e^(k) = x − x^(k), then by subtracting (61) from (62) we see that the error satisfies the equation

$$\mathbf{e}^{(k+1)} = P\mathbf{e}^{(k)} = P^2\mathbf{e}^{(k-1)} = \dots = P^{k+1}\mathbf{e}^{(0)}. \qquad (63)$$

In order that the error diminishes as k → ∞ we must have

$$\|P^k\| \to 0 \ \text{ as } \ k \to \infty. \qquad (64)$$

Since ‖P^k‖ ≤ ‖P‖^k, we see that (64) holds provided ‖P‖ < 1. From linear algebra it can be proven that ‖P^k‖ → 0 as k → ∞ if and only if

$$\rho(P) = \max_i |\lambda_i| < 1,$$

where λ_i are the eigenvalues of the matrix P. Note that ρ(P) is called the spectral radius of P. Thus for the iteration to converge we require the spectral radius of the iteration matrix to be less than unity. One further result which is useful is that for large k

$$\|\mathbf{e}^{(k+1)}\| \approx \rho\,\|\mathbf{e}^{(k)}\|.$$

Suppose we want to estimate how many iterations it takes to reduce the initial error by a factor ε. Then we see that we need q iterations, where q is the smallest value for which

$$\rho^q < \varepsilon,$$

giving

$$q \ge q_d = \frac{\ln\varepsilon}{\ln\rho}.$$

The quantity −ln ρ is called the asymptotic rate of convergence of the iteration. It gives a measure of how much the error decreases at each iteration. Thus iteration matrices where the spectral radius is close to 1 will converge slowly.

For the model problem it can be shown that for Jacobi iteration

$$\rho = \rho(P_J) = \frac{1}{2}\left( \cos\frac{\pi}{N} + \cos\frac{\pi}{M} \right),$$

and for Gauss-Seidel

$$\rho = \rho(P_G) = [\rho(P_J)]^2.$$

If we take N = M and N ≫ 1 then for Jacobi iteration we have

$$q_d = \frac{\ln\varepsilon}{\ln\left(1 - \frac{\pi^2}{2N^2}\right)} \approx -\frac{2N^2}{\pi^2}\ln\varepsilon.$$

Gauss-Seidel converges twice as fast as Jacobi, and requires less storage.

For point SOR the spectral radius depends on the relaxation factor ω, but for the model problem with the optimum value and N = M it can be shown that

$$\rho = \frac{1 - \sin\frac{\pi}{N}}{1 + \sin\frac{\pi}{N}},$$

giving

$$q_d = -\frac{N}{2\pi}\ln\varepsilon.$$

Thus SOR with the optimum relaxation factor requires an order of magnitude fewer iterations to converge as compared to Gauss-Seidel. Finally, similar results can also be derived for line SOR, and it can be shown that using optimum values

$$\rho = \left( \frac{1 - \sin\frac{\pi}{2N}}{1 + \sin\frac{\pi}{2N}} \right)^2, \qquad q_d = -\frac{N}{2\sqrt{2}\,\pi}\ln\varepsilon.$$


6 Solution of differential equations using Chebychev collocation

In this section we will briefly look at the solution of differential equations using Chebychev collocation. Spectral methods offer a completely different means for obtaining a solution of a differential equation (ode or pde) and they have a number of major advantages over finite difference and other similar methods. One is the concept of spectral accuracy. With finite difference methods, doubling the number of points for a second order method reduces the truncation error by a factor of 4. On the other hand with spectral methods the accuracy of the approximation (for smooth functions) increases exponentially with increased number of points. This is referred to as spectral accuracy. There are many different types of spectral methods (for example the tau method, Galerkin method, Fourier approximation) and a good place to start studying the basic ideas and theory is the book by Canuto et al. (1988). Here we will briefly look at one, namely Chebychev collocation.

With any spectral method the idea is to represent the function as an expansion in terms of some basis functions, usually orthogonal polynomials or, in the case of Fourier approximation, trigonometric functions or complex exponentials. With Chebychev collocation a function is represented in terms of expansions using Chebychev polynomials.

6.1 Basic properties of Chebychev polynomials

A Chebychev polynomial of degree n is defined by

$$T_n(x) = \cos(n\cos^{-1}x). \qquad (65)$$

The Chebychev polynomials are eigenfunctions of the Sturm-Liouville equation

$$\frac{d}{dx}\left( \sqrt{1-x^2}\,\frac{dT_k(x)}{dx} \right) + \frac{k^2}{\sqrt{1-x^2}}\,T_k(x) = 0.$$

The following properties are easy to prove:

$$T_{k+1}(x) = 2x T_k(x) - T_{k-1}(x),$$

with T0(x) = 1 and T1(x) = x. Also

$$|T_k(x)| \le 1, \quad -1 \le x \le 1, \qquad T_k(\pm 1) = (\pm 1)^k, \qquad T'_k(\pm 1) = (\pm 1)^{k+1} k^2,$$

$$2T_k(x) = \frac{1}{k+1}\frac{dT_{k+1}(x)}{dx} - \frac{1}{k-1}\frac{dT_{k-1}(x)}{dx}, \qquad k > 1.$$


Suppose we expand a (smooth) function u(x) in terms of Chebychev polynomials; then

$$u(x) = \sum_{k=0}^{\infty} \hat{u}_k T_k(x), \qquad \hat{u}_k = \frac{2}{\pi c_k}\int_{-1}^{1}\frac{u(x)\,T_k(x)}{\sqrt{1-x^2}}\,dx,$$

where

$$c_k = \begin{cases} 2 & k = 0, N, \\ 1 & 1 \le k \le N-1. \end{cases}$$

If we define ū(θ) = u(cos θ) then

$$\bar{u}(\theta) = \sum_{k=0}^{\infty} \hat{u}_k \cos(k\theta),$$

so that the Chebychev series for u corresponds to a cosine series for ū. This implies that if u is infinitely differentiable then the Chebychev coefficients of the expansion will decay faster than algebraically.

In a collocation method we represent a smooth function in terms of its values at a set of discrete points. Derivatives of the function are approximated by analytic derivatives of the interpolating polynomial. It is common to work with the Gauss-Lobatto points defined by

$$x_j = \cos\left(\frac{j\pi}{N}\right), \qquad j = 0, 1, \dots, N.$$

Suppose the given function is approximated at these points and we represent the approximate values of the function u(x) at the points x = x_j by u_j. Here we are assuming that we are working with N + 1 points in the interval −1 ≤ x ≤ 1. In a given differential equation we need to approximate the derivatives at these node points. The theory says that we can write

$$\left(\frac{du}{dx}\right)_j = \sum_{k=0}^{N} D_{j,k}\,u_k,$$

where the D_{j,k} are the elements of the Chebychev collocation differentiation matrix D, say. The elements of the matrix D are given by

$$D_{p,j} = \begin{cases} \dfrac{c_p}{c_j}\,\dfrac{(-1)^{p+j}}{x_p - x_j} & p \ne j, \\[2mm] -\dfrac{x_j}{2(1 - x_j^2)} & 1 \le p = j \le N-1, \\[2mm] \dfrac{2N^2 + 1}{6} & p = j = 0, \\[2mm] -\dfrac{2N^2 + 1}{6} & p = j = N. \end{cases}$$


Thus in matrix form

$$\begin{pmatrix} \frac{dw_0}{dx} \\ \frac{dw_1}{dx} \\ \vdots \\ \frac{dw_N}{dx} \end{pmatrix} = D\begin{pmatrix} w_0 \\ w_1 \\ \vdots \\ w_N \end{pmatrix}, \qquad \text{and} \qquad \begin{pmatrix} \frac{d^2w_0}{dx^2} \\ \frac{d^2w_1}{dx^2} \\ \vdots \\ \frac{d^2w_N}{dx^2} \end{pmatrix} = D^2\begin{pmatrix} w_0 \\ w_1 \\ \vdots \\ w_N \end{pmatrix}.$$
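The Python sketch below builds D on the Gauss-Lobatto points and differentiates u(x) = e^x; the test function is our own choice. The diagonal is set via the row-sum identity D_{pp} = −Σ_{q≠p} D_{pq}, which the formulas above satisfy (the derivative of a constant is zero) and which is slightly more robust to roundoff.

import numpy as np

N = 16
j = np.arange(N + 1)
x = np.cos(j * np.pi / N)                # Gauss-Lobatto points
c = np.ones(N + 1); c[0] = c[N] = 2.0    # the coefficients c_p

D = np.zeros((N + 1, N + 1))
for p in range(N + 1):
    for q in range(N + 1):
        if p != q:
            D[p, q] = (c[p] / c[q]) * (-1) ** (p + q) / (x[p] - x[q])
D -= np.diag(D.sum(axis=1))              # diagonal from the row-sum identity

u = np.exp(x)
print(np.max(np.abs(D @ u - np.exp(x)))) # spectrally small error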


7 Parabolic Equations

In this section we will look at the solution of parabolic partial differential equations. The techniques introduced earlier apply equally to parabolic pdes.

One of the simplest parabolic pdes is the diffusion equation, which in one space dimension is

$$\frac{\partial u}{\partial t} = \kappa\frac{\partial^2 u}{\partial x^2}. \qquad (66)$$

For two or more space dimensions we have

$$\frac{\partial u}{\partial t} = \kappa\nabla^2 u.$$

In the above κ is some given constant. Another familiar set of parabolic pdes is the boundary layer equations

$$u_x + v_y = 0, \qquad u_t + u u_x + v u_y = -p_x + u_{yy}, \qquad 0 = -p_y.$$

With a parabolic pde we expect, in addition to boundary conditions, an initial condition at, say, t = 0.

Let us consider (66) in the region a ≤ x ≤ b. We will take a uniform mesh in x with x_j = a + j∆x, j = 0, 1, ..., N and ∆x = (b − a)/N. For the differencing in time we assume a constant step size ∆t so that t = t_k = k∆t.

7.1 First order central difference approximation

We may approximate (66) by

$$\frac{w^{k+1}_j - w^k_j}{\Delta t} = \kappa\left[ \frac{w^k_{j+1} - 2w^k_j + w^k_{j-1}}{\Delta x^2} \right]. \qquad (67)$$

Here w^k_j denotes an approximation to the exact solution u(x, t) of the pde at x = x_j, t = t_k. The scheme given in (67) is first order in time, O(∆t), and second order in space, O(∆x²). Let us assume that we are given a suitable initial condition, and boundary conditions of the form

$$u(a, t) = f(t), \qquad u(b, t) = g(t).$$

The scheme (67) is explicit because the unknowns at level k + 1 can be computed directly. Notice that there is a time lag before the effect of the boundary data is felt on the solution. As we will see later, this scheme is conditionally stable for

$$\beta \le 1/2, \qquad \text{where } \beta = \frac{\kappa\Delta t}{\Delta x^2}.$$

Note that β is sometimes called the Peclet or diffusion number.


7.2 Fully implicit, first order

A better approximation is one which makes use of an implicit scheme. Then instead of (67) we have

$$\frac{w^{k+1}_j - w^k_j}{\Delta t} = \kappa\left[ \frac{w^{k+1}_{j+1} - 2w^{k+1}_j + w^{k+1}_{j-1}}{\Delta x^2} \right]. \qquad (68)$$

The unknowns at level k + 1 are coupled together and we have a set of implicit equations to solve. If we rearrange (68) then

$$\beta w^{k+1}_{j-1} - (1 + 2\beta) w^{k+1}_j + \beta w^{k+1}_{j+1} = -w^k_j, \qquad 1 \le j \le N-1. \qquad (69)$$

In addition, approximation of the boundary conditions gives

$$w^{k+1}_0 = f(t_{k+1}), \qquad w^{k+1}_N = g(t_{k+1}). \qquad (70)$$

The discrete equations (69), (70) are of tridiagonal form and thus easily solved. The scheme is unconditionally stable. The accuracy of the above fully implicit scheme is only first order in time. We can try and improve on this with a second order scheme.
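The Python sketch below advances the implicit scheme (68)-(70) for u(x, 0) = sin(πx) with f = g = 0, for which the exact solution is e^{−κπ²t} sin(πx); the initial data are our own choice, and the tridiagonal system (69) is solved with a dense solver purely for brevity (in practice one would use Thomas's algorithm of section 4.6).

import numpy as np

kappa, N, dt = 1.0, 50, 0.001
dx = 1.0 / N
beta = kappa * dt / dx**2
x = np.arange(N + 1) * dx
w = np.sin(np.pi * x)                    # initial condition (assumed)

# Tridiagonal matrix of (69) for the interior unknowns w_1..w_{N-1}:
A = ((1 + 2 * beta) * np.eye(N - 1)
     - beta * np.diag(np.ones(N - 2), 1)
     - beta * np.diag(np.ones(N - 2), -1))
for k in range(100):                     # 100 implicit time steps
    w[1:-1] = np.linalg.solve(A, w[1:-1])   # boundary values stay zero
print(np.max(np.abs(w - np.exp(-kappa * np.pi**2 * 0.1) * np.sin(np.pi * x))))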

7.3 Richardson method

Consider

$$\frac{w^{k+1}_j - w^{k-1}_j}{2\Delta t} = \kappa\left[ \frac{w^k_{j+1} - 2w^k_j + w^k_{j-1}}{\Delta x^2} \right]. \qquad (71)$$

This uses three time levels and has accuracy O(∆t², ∆x²). The scheme was devised by a meteorologist and is unconditionally unstable!

7.4 Du-Fort Frankel

This uses the approximation

$$\frac{w^{k+1}_j - w^{k-1}_j}{2\Delta t} = \kappa\left[ \frac{w^k_{j+1} - w^{k+1}_j - w^{k-1}_j + w^k_{j-1}}{\Delta x^2} \right]. \qquad (72)$$

This has truncation error O(∆t², ∆x², (∆t/∆x)²), and is an explicit scheme. The scheme is unconditionally stable, but is inconsistent if ∆t → 0, ∆x → 0 with ∆t/∆x remaining fixed.


7.5 Crank-Nicolson

A popular scheme is the Crank-Nicolson scheme given by

$$\frac{w^{k+1}_j - w^k_j}{\Delta t} = \frac{\kappa}{2}\left[ \frac{w^{k+1}_{j+1} - 2w^{k+1}_j + w^{k+1}_{j-1}}{\Delta x^2} + \frac{w^k_{j+1} - 2w^k_j + w^k_{j-1}}{\Delta x^2} \right]. \qquad (73)$$

This is second order accurate, O(∆t², ∆x²), and is unconditionally stable. (Taking very large time steps can however cause problems.) As can be seen, it is also an implicit scheme.

7.6 Multi-space dimensions

The schemes outlined above are easily extended to multi-dimensions. Thus in two space dimensions a first order explicit approximation to

$$\frac{\partial u}{\partial t} = \kappa\nabla^2 u$$

is

$$\frac{w^{k+1}_{i,j} - w^k_{i,j}}{\Delta t} = \kappa\left[ \frac{w^k_{i+1,j} - 2w^k_{i,j} + w^k_{i-1,j}}{\Delta x^2} + \frac{w^k_{i,j+1} - 2w^k_{i,j} + w^k_{i,j-1}}{\Delta y^2} \right]. \qquad (74)$$

This is first order in ∆t and second order in space. It is conditionally stable for

$$\frac{\kappa\Delta t}{(\Delta x)^2} + \frac{\kappa\Delta t}{(\Delta y)^2} \le \frac{1}{2}.$$

If we use a fully implicit scheme we would obtain

$$\frac{w^{k+1}_{i,j} - w^k_{i,j}}{\Delta t} = \kappa\left[ \frac{w^{k+1}_{i+1,j} - 2w^{k+1}_{i,j} + w^{k+1}_{i-1,j}}{\Delta x^2} + \frac{w^{k+1}_{i,j+1} - 2w^{k+1}_{i,j} + w^{k+1}_{i,j-1}}{\Delta y^2} \right]. \qquad (75)$$

This leads to an implicit system of equations of the form

$$\alpha w^{k+1}_{i+1,j} + \alpha w^{k+1}_{i-1,j} - (2\alpha + 2\beta + 1) w^{k+1}_{i,j} + \beta w^{k+1}_{i,j-1} + \beta w^{k+1}_{i,j+1} = -w^k_{i,j},$$

where α = κ∆t/∆x², β = κ∆t/∆y². The form of the discrete equations is very much like the system of equations arising in elliptic pdes.

From the computational point of view a better scheme is

$$\frac{w^{k+\frac12}_{i,j} - w^k_{i,j}}{\Delta t/2} = \kappa\left[ \frac{w^{k+\frac12}_{i+1,j} - 2w^{k+\frac12}_{i,j} + w^{k+\frac12}_{i-1,j}}{\Delta x^2} + \frac{w^k_{i,j+1} - 2w^k_{i,j} + w^k_{i,j-1}}{\Delta y^2} \right],$$

$$\frac{w^{k+1}_{i,j} - w^{k+\frac12}_{i,j}}{\Delta t/2} = \kappa\left[ \frac{w^{k+\frac12}_{i+1,j} - 2w^{k+\frac12}_{i,j} + w^{k+\frac12}_{i-1,j}}{\Delta x^2} + \frac{w^{k+1}_{i,j+1} - 2w^{k+1}_{i,j} + w^{k+1}_{i,j-1}}{\Delta y^2} \right], \qquad (76)$$

which leads to a tridiagonal system of equations similar to the ADI scheme. The above scheme is second order in time and space and also unconditionally stable.


7.7 Consistency revisited

Let us consider the truncation error for the first order central (explicit) scheme (67), and also the Du-Fort Frankel scheme (72).

If u(x, t) is the exact solution then we may write u^k_j = u(x_j, t_k), and thus from a Taylor series expansion

$$u^{k+1}_j = u(x_j, t_k + \Delta t) = u^k_j + \Delta t\left(\frac{\partial u}{\partial t}\right)_{j,k} + \frac{\Delta t^2}{2}\left(\frac{\partial^2 u}{\partial t^2}\right)_{j,k} + O(\Delta t^3), \qquad (77)$$

and

$$u^k_{j+1} = u(x_j + \Delta x, t_k) = u^k_j + \Delta x\left(\frac{\partial u}{\partial x}\right)_{j,k} + \frac{\Delta x^2}{2}\left(\frac{\partial^2 u}{\partial x^2}\right)_{j,k} + \frac{\Delta x^3}{6}\left(\frac{\partial^3 u}{\partial x^3}\right)_{j,k} + \frac{\Delta x^4}{24}\left(\frac{\partial^4 u}{\partial x^4}\right)_{j,k} + O(\Delta x^5), \qquad (78)$$

and

$$u^k_{j-1} = u(x_j - \Delta x, t_k) = u^k_j - \Delta x\left(\frac{\partial u}{\partial x}\right)_{j,k} + \frac{\Delta x^2}{2}\left(\frac{\partial^2 u}{\partial x^2}\right)_{j,k} - \frac{\Delta x^3}{6}\left(\frac{\partial^3 u}{\partial x^3}\right)_{j,k} + \frac{\Delta x^4}{24}\left(\frac{\partial^4 u}{\partial x^4}\right)_{j,k} + O(\Delta x^5). \qquad (79)$$

Using (77), (78), (79) and substituting into (67) gives

$$\frac{1}{\Delta t}\left[ \Delta t\left(\frac{\partial u}{\partial t}\right)_{j,k} + \frac{\Delta t^2}{2}\left(\frac{\partial^2 u}{\partial t^2}\right)_{j,k} + O(\Delta t^3) \right] = \frac{\kappa}{\Delta x^2}\left[ \Delta x^2\left(\frac{\partial^2 u}{\partial x^2}\right)_{j,k} + \frac{\Delta x^4}{12}\left(\frac{\partial^4 u}{\partial x^4}\right)_{j,k} + O(\Delta x^6) \right]$$

(the u^k_j terms and the odd x-derivatives cancel), from which we obtain

$$\left[\frac{\partial u}{\partial t} - \kappa\frac{\partial^2 u}{\partial x^2}\right]_{j,k} = -\frac{\Delta t}{2}\left(\frac{\partial^2 u}{\partial t^2}\right)_{j,k} + \frac{\kappa\Delta x^2}{12}\left(\frac{\partial^4 u}{\partial x^4}\right)_{j,k} + \dots . \qquad (80)$$

Thus (80) shows that as ∆t → 0 and ∆x → 0 the original pde is satisfied, and the right hand side of (80) implies a truncation error O(∆t, ∆x²).


If we do the same for the Du-Fort Frankel scheme we find that

$$\frac{u^{k+1}_j - u^{k-1}_j}{2\Delta t} - \kappa\left[ \frac{u^k_{j+1} - u^{k+1}_j - u^{k-1}_j + u^k_{j-1}}{\Delta x^2} \right] = \left[ \frac{\partial u}{\partial t} - \kappa\frac{\partial^2 u}{\partial x^2} + \kappa\frac{\Delta t^2}{\Delta x^2}\frac{\partial^2 u}{\partial t^2} \right]_{j,k} + O\!\left(\Delta t^2, \Delta x^2, \frac{\Delta t^2}{\Delta x^2}\right). \qquad (81)$$

This shows that the Du-Fort Frankel scheme is only consistent if the step sizes approach zero and also ∆t/∆x → 0 simultaneously. Otherwise, if we take step sizes such that ∆t/∆x remains constant as both step sizes approach zero, then (81) shows that we are solving the wrong equation.

7.8 Stability

Consider the first order explicit scheme, which can be written as

$$w^{k+1}_j = \beta w^k_{j-1} + (1 - 2\beta) w^k_j + \beta w^k_{j+1}, \qquad 1 \le j \le N-1,$$

with w^k_0, w^k_N given. We can write the above in matrix form as

$$\mathbf{w}^{k+1} = \begin{pmatrix} (1-2\beta) & \beta & & & \\ \beta & (1-2\beta) & \beta & & \\ & \beta & (1-2\beta) & \beta & \\ & & \ddots & \ddots & \ddots \\ & & & \beta & (1-2\beta) \end{pmatrix}\mathbf{w}^k, \qquad (82)$$

where w^k = (w^k_1, w^k_2, ..., w^k_{N−1})^T. Thus (82) can be written as

$$\mathbf{w}^{k+1} = A\mathbf{w}^k.$$

If we recall the results from the discussion of convergence of iterative methods, the above scheme is stable if and only if ‖A‖ ≤ 1. Now the infinity norm ‖A‖∞ is defined by

$$\|A\|_\infty = \max_i \sum_{j} |a_{i,j}|$$

for an N × N matrix A. Thus from (82) we have

$$\|A\|_\infty = \beta + |1 - 2\beta| + \beta = 2\beta + |1 - 2\beta|.$$

So if (1 − 2β) ≥ 0, ie β ≤ 1/2, then

$$\|A\|_\infty = 2\beta + 1 - 2\beta = 1.$$

If (1 − 2β) < 0 then

$$\|A\|_\infty = 2\beta + 2\beta - 1 = 4\beta - 1 > 1.$$

Thus we have proved that the explicit scheme is unstable if β > 1/2.


7.8.1 Stability of Crank-Nicolson scheme

The Crank-Nicolson scheme (73) may be written as

$$-\beta w^{k+1}_{j-1} + (2 + 2\beta) w^{k+1}_j - \beta w^{k+1}_{j+1} = \beta w^k_{j-1} + (2 - 2\beta) w^k_j + \beta w^k_{j+1}, \qquad j = 1, 2, \dots, N-1, \qquad (83)$$

where β = ∆tκ/(∆x)². In matrix form this can be written as

$$\begin{pmatrix} (2+2\beta) & -\beta & & \\ -\beta & (2+2\beta) & -\beta & \\ & \ddots & \ddots & \ddots \\ & & -\beta & (2+2\beta) \end{pmatrix}\mathbf{w}^{k+1} = \begin{pmatrix} (2-2\beta) & \beta & & \\ \beta & (2-2\beta) & \beta & \\ & \ddots & \ddots & \ddots \\ & & \beta & (2-2\beta) \end{pmatrix}\mathbf{w}^k, \qquad (84)$$

where w^k = (w^k_1, w^k_2, ..., w^k_{N−1})^T. Thus (84) is of the form

$$B\mathbf{w}^{k+1} = A\mathbf{w}^k,$$

where B = 2I_{N−1} − βS_{N−1} and A = 2I_{N−1} + βS_{N−1}, I_N is the N × N identity matrix, and S_{N−1} is the (N − 1) × (N − 1) matrix

$$S_{N-1} = \begin{pmatrix} -2 & 1 & & \\ 1 & -2 & 1 & \\ & \ddots & \ddots & \ddots \\ & & 1 & -2 \end{pmatrix}.$$

Thus the Crank-Nicolson scheme will be stable if the spectral radius of the matrix B⁻¹A is less than unity, ie ρ(B⁻¹A) < 1. We therefore need the eigenvalues of the matrix B⁻¹A. To find these we need to invoke some theorems from linear algebra. Recall that λ is an eigenvalue of the matrix S, and x a corresponding eigenvector, if

$$S\mathbf{x} = \lambda\mathbf{x}.$$


Thus for any integer p

$$S^p\mathbf{x} = S^{p-1}S\mathbf{x} = \lambda S^{p-1}\mathbf{x} = \dots = \lambda^p\mathbf{x}.$$

Hence the eigenvalues of S^p are λ^p with eigenvector x. Extending this result, if P(S) is the matrix polynomial

$$P(S) = a_0 S^n + a_1 S^{n-1} + \dots + a_n I,$$

then

$$P(S)\mathbf{x} = P(\lambda)\mathbf{x}, \qquad \text{and} \qquad P^{-1}(S)\mathbf{x} = \frac{1}{P(\lambda)}\mathbf{x}.$$

Finally if Q(S) is any other polynomial in S then we see that

$$P^{-1}(S)Q(S)\mathbf{x} = \frac{Q(\lambda)}{P(\lambda)}\mathbf{x}.$$

If we let P = B(S_{N−1}) = 2I_{N−1} − βS_{N−1} and Q = A(S_{N−1}) = 2I_{N−1} + βS_{N−1}, then the eigenvalues of the matrix B⁻¹A are given by

$$\mu = \frac{2 + \beta\lambda}{2 - \beta\lambda},$$

where λ is an eigenvalue of the matrix S_{N−1}. Now the eigenvalues of the N × N matrix

T =

a bc a b0 c a b

. . .. . .

c a

can be shown to be given by

λ = λn = a + 2√

bc cosnπ

N + 1, n = 1, 2, .., N.

Hence the eigenvalues of \(S_{N-1}\) are
\[
\lambda_n = -4\sin^2\frac{n\pi}{2N}, \qquad n = 1, 2, \ldots, N-1,
\]

and so the eigenvalues of \(B^{-1}A\) are
\[
\mu_n = \frac{2 - 4\beta\sin^2\frac{n\pi}{2N}}{2 + 4\beta\sin^2\frac{n\pi}{2N}}, \qquad n = 1, 2, \ldots, N-1.
\]

Clearly
\[
\rho(B^{-1}A) = \max_n |\mu_n| < 1 \quad \forall\,\beta > 0.
\]

This proves that the Crank-Nicolson scheme is unconditionally stable.
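This conclusion is easy to check numerically. The sketch below (an addition to the notes; the matrix size and the sample values of \(\beta\) are arbitrary) forms \(B\) and \(A\) from \(S_{N-1}\) and computes \(\rho(B^{-1}A)\) directly:

    import numpy as np

    def spectral_radius_cn(N, beta):
        # S_{N-1} = tridiag(1, -2, 1); B = 2I - beta*S, A = 2I + beta*S
        S = (np.diag(-2.0*np.ones(N-1))
             + np.diag(np.ones(N-2), 1)
             + np.diag(np.ones(N-2), -1))
        I = np.eye(N-1)
        B, A = 2.0*I - beta*S, 2.0*I + beta*S
        # spectral radius of B^{-1} A
        return np.max(np.abs(np.linalg.eigvals(np.linalg.solve(B, A))))

    for beta in (0.1, 1.0, 10.0, 100.0):
        print(beta, spectral_radius_cn(50, beta))   # always below 1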


7.9 Stability condition allowing exponential growth

In the above discussion of stability we have said that the solution of
\[
\mathbf{w}^{k+1} = A\mathbf{w}^k
\]
is stable if \(||A|| \le 1\). This condition makes no allowance for solutions of the pde which may be growing exponentially in time. A necessary and sufficient condition for stability when the solution of the pde is increasing exponentially in time is that
\[
||A|| \le 1 + M\Delta_t = 1 + O(\Delta_t),
\]
where \(M\) is a constant independent of \(\Delta_x\) and \(\Delta_t\).

7.10 von Neumann stability analysis

The matrix method for analysing the stability of a scheme can become difficult for more complicated schemes and in multi-dimensional situations. A very versatile tool in this regard is the Fourier method developed by von Neumann. Here initial values at mesh points are expressed in terms of a finite Fourier series, and we consider the growth of individual Fourier components. A finite sine or cosine series expansion in the interval \(a \le x \le b\) takes the form
\[
\sum_n a_n \sin\left(\frac{n\pi x}{L}\right), \quad \text{or} \quad \sum_n b_n \cos\left(\frac{n\pi x}{L}\right),
\]
where \(L = b - a\). Now consider an individual component written in complex exponential form at a mesh point \(x = x_j = a + j\Delta_x\):

\[
A_n e^{\frac{in\pi x}{L}} = A_n e^{\frac{in\pi a}{L}}\, e^{\frac{in\pi j\Delta_x}{L}} = A_n e^{i\alpha_n j\Delta_x},
\]
where \(\alpha_n = n\pi/L\). Given initial data we can express the initial values as

\[
w^0_p = \sum_{n=0}^{N} A_n e^{i\alpha_n p\Delta_x}, \qquad p = 0, 1, \ldots, N,
\]
and we have \(N+1\) equations to determine the \(N+1\) unknowns \(A_n\). To find how each Fourier mode develops in time, assume a simple separable solution of the form

\[
w^k_p = e^{i\alpha_n p\Delta_x} e^{\Omega t_k} = e^{i\alpha_n p\Delta_x} e^{\Omega k\Delta_t} = e^{i\alpha_n p\Delta_x}\,\xi^k,
\]
where \(\xi = e^{\Omega\Delta_t}\). Here \(\xi\) is called the amplification factor. For stability we thus require \(|\xi| \le 1\). If the exact solution of the pde grows exponentially, then the difference scheme will allow such solutions if
\[
|\xi| \le 1 + M\Delta_t,
\]
where \(M\) does not depend on \(\Delta_x\) or \(\Delta_t\).


7.10.1 Fully implicit scheme

Consider the fully implicit scheme

\[
\frac{w^{k+1}_j - w^k_j}{\Delta_t} = \kappa\left[\frac{w^{k+1}_{j+1} - 2w^{k+1}_j + w^{k+1}_{j-1}}{\Delta_x^2}\right]. \tag{85}
\]

Let
\[
w^k_j = \xi^k e^{i\alpha_n j\Delta_x}.
\]

Then substituting into (85) gives

\[
\frac{1}{\Delta_t}\,\xi^k(\xi - 1)e^{i\alpha_n j\Delta_x} = \frac{\kappa\xi^{k+1}}{\Delta_x^2}\left(e^{-i\alpha_n\Delta_x} - 2 + e^{i\alpha_n\Delta_x}\right)e^{i\alpha_n j\Delta_x}.
\]

Thus with \(\beta = \kappa\Delta_t/\Delta_x^2\),
\[
\xi - 1 = \beta\xi\left(2\cos(\alpha_n\Delta_x) - 2\right) = -4\beta\xi\sin^2\left(\frac{\alpha_n\Delta_x}{2}\right).
\]

This gives
\[
\xi = \frac{1}{1 + 4\beta\sin^2\left(\frac{\alpha_n\Delta_x}{2}\right)},
\]
and clearly \(0 < \xi \le 1\) for all \(\beta > 0\) and for all \(\alpha_n\). Thus the fully implicit scheme is unconditionally stable.
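The formula for \(\xi\) can be verified over a whole sweep of modes; this short check (an addition to the notes, with arbitrary sample values of \(\beta\)) confirms \(0 < \xi \le 1\) numerically:

    import numpy as np

    theta = np.linspace(0.0, np.pi, 201)              # theta = alpha_n * dx
    for beta in (0.1, 1.0, 10.0, 1000.0):
        xi = 1.0 / (1.0 + 4.0*beta*np.sin(theta/2.0)**2)
        assert np.all((xi > 0.0) & (xi <= 1.0))       # unconditionally stable
        print(beta, xi.min())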

7.10.2 Richardson’s scheme

The Richardson scheme is given by

\[
\frac{w^{k+1}_j - w^{k-1}_j}{2\Delta_t} = \kappa\left[\frac{w^k_{j+1} - 2w^k_j + w^k_{j-1}}{\Delta_x^2}\right]. \tag{86}
\]

Using a von Neumann analysis and writing
\[
w^k_p = \xi^k e^{i\alpha_n p\Delta_x}
\]
gives, after substitution into (86),
\[
e^{i\alpha_n p\Delta_x}\,\xi^{k-1}(\xi^2 - 1) = \beta\xi^k\left(e^{-i\alpha_n\Delta_x} - 2 + e^{i\alpha_n\Delta_x}\right)e^{i\alpha_n p\Delta_x}.
\]

This gives
\[
\xi^2 - 1 = -4\xi\beta\sin^2\left(\frac{\alpha_n\Delta_x}{2}\right),
\]
where \(\beta = 2\kappa\Delta_t/\Delta_x^2\). Thus
\[
\xi^2 + 4\xi\beta\sin^2\left(\frac{\alpha_n\Delta_x}{2}\right) - 1 = 0.
\]


This quadratic has two roots \(\xi_1, \xi_2\). The sum and product of the roots are given by
\[
\xi_1 + \xi_2 = -4\beta\sin^2\left(\frac{\alpha_n\Delta_x}{2}\right), \qquad \xi_1\xi_2 = -1. \tag{87}
\]
For stability we require \(|\xi_1| \le 1\) and \(|\xi_2| \le 1\), and (87) shows that if \(|\xi_1| < 1\) then \(|\xi_2| > 1\), and vice-versa. Also if \(\xi_1 = 1\) and \(\xi_2 = -1\) then again from (87) we must have \(\beta = 0\). Thus the Richardson scheme is unconditionally unstable.
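The instability is also easy to exhibit numerically; the following fragment (illustrative, with arbitrarily chosen \(\beta\) and mode) finds the roots of the quadratic and shows that one has magnitude greater than one:

    import numpy as np

    beta, theta = 0.5, np.pi/2.0                      # arbitrary sample values
    s = np.sin(theta/2.0)**2
    roots = np.roots([1.0, 4.0*beta*s, -1.0])         # xi^2 + 4*beta*s*xi - 1 = 0
    print(roots, np.max(np.abs(roots)))               # max |xi| > 1: unstable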

7.10.3 von Neumann analysis in multi-dimensions

For two or more dimensions the Fourier method extends easily. For instance, for two space dimensions we would seek solutions of the form
\[
w^k_{p,q} = e^{i\alpha_n p\Delta_x} e^{i\gamma_n q\Delta_y}\,\xi^k.
\]
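As a worked example (an addition to the notes, quoting the standard result): substituting this form into the two-dimensional explicit scheme for \(u_t = \kappa(u_{xx} + u_{yy})\) gives the amplification factor
\[
\xi = 1 - 4\beta_x\sin^2\left(\frac{\alpha_n\Delta_x}{2}\right) - 4\beta_y\sin^2\left(\frac{\gamma_n\Delta_y}{2}\right), \qquad \beta_x = \frac{\kappa\Delta_t}{\Delta_x^2}, \quad \beta_y = \frac{\kappa\Delta_t}{\Delta_y^2},
\]
and requiring \(|\xi| \le 1\) for all modes leads to the stability condition \(\beta_x + \beta_y \le \frac{1}{2}\), the two-dimensional analogue of \(\beta \le \frac{1}{2}\).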


7.11 Advanced iterative methods

We have looked at a number of simple iterative techniques such as point SOR, line SOR and ADI. In this section we will consider more advanced techniques such as the conjugate gradient method and minimal residual methods.

7.11.1 Conjugate Gradient Method

This method was first put forward by Hestenes & Stiefel (1952) and was reinvented and popularised in the 1970's. The method has one important property: it converges to the exact solution of the linear system in a finite number of iterations, in the absence of roundoff error. Used on its own the method has a number of drawbacks, but combined with a good preconditioner it is an effective tool for solving linear equations iteratively.

Consider
\[
A\mathbf{x} = \mathbf{b}, \tag{88}
\]
where \(A\) is an \(n \times n\) symmetric and positive definite matrix. Recall \(A\) is symmetric if \(A^T = A\) and positive definite if \(\mathbf{x}^T A\mathbf{x} > 0\) for any non-zero vector \(\mathbf{x}\). Consider the quadratic form

\[
\phi(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T A\mathbf{x} - \mathbf{x}^T\mathbf{b}. \tag{89}
\]
We see that
\[
\phi(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T(A\mathbf{x} - \mathbf{b}) - \frac{1}{2}\mathbf{x}^T\mathbf{b},
\]

and from the positive definiteness of \(A\) the quadratic form has a unique minimum \(-\frac{1}{2}\mathbf{b}^T A^{-1}\mathbf{b}\) when \(\mathbf{x} = A^{-1}\mathbf{b}\). We will try and find a solution to (88) iteratively. Let \(\mathbf{x}_k\) be the \(k\)th iterate.

Now we know that \(\phi\) decreases most rapidly in the direction \(-\nabla\phi\), and from (89) we have
\[
-\nabla\phi(\mathbf{x}_k) = \mathbf{b} - A\mathbf{x}_k = \mathbf{r}_k.
\]

If the residual \(\mathbf{r}_k \ne 0\) then we can find an \(\alpha\) such that \(\phi(\mathbf{x}_k + \alpha\mathbf{r}_k) < \phi(\mathbf{x}_k)\). In the method of steepest descent we choose \(\alpha\) to minimize

\begin{align*}
\phi(\mathbf{x}_k + \alpha\mathbf{r}_k) &= \frac{1}{2}(\mathbf{x}_k^T + \alpha\mathbf{r}_k^T)A(\mathbf{x}_k + \alpha\mathbf{r}_k) - (\mathbf{x}_k^T + \alpha\mathbf{r}_k^T)\mathbf{b} \\
&= \frac{1}{2}\mathbf{x}_k^T A\mathbf{x}_k + \alpha\mathbf{r}_k^T A\mathbf{x}_k + \frac{1}{2}\alpha^2\mathbf{r}_k^T A\mathbf{r}_k - \mathbf{x}_k^T\mathbf{b} - \alpha\mathbf{r}_k^T\mathbf{b}, \tag{90}
\end{align*}

where we have used \(\mathbf{x}_k^T A\mathbf{r}_k = (\mathbf{x}_k^T A\mathbf{r}_k)^T = \mathbf{r}_k^T A\mathbf{x}_k\). Differentiating (90) with respect to \(\alpha\) and equating to zero to find a minimum gives

\[
0 = \mathbf{r}_k^T A\mathbf{x}_k + \alpha\mathbf{r}_k^T A\mathbf{r}_k - \mathbf{r}_k^T\mathbf{b},
\]

which gives
\[
\alpha_k = \frac{\mathbf{r}_k^T\mathbf{r}_k}{\mathbf{r}_k^T A\mathbf{r}_k}. \tag{91}
\]


The problem with the steepest descent algorithm as constructed above is that convergence can be slow, especially if \(A\) is ill-conditioned. The condition number of a (symmetric positive definite) matrix \(A\) is defined by
\[
\kappa(A) = \frac{\max\lambda_i}{\min\lambda_i},
\]
where the \(\lambda_i\) are the eigenvalues of the matrix \(A\). The matrix is said to be ill-conditioned if \(\kappa\) is large. The steepest descent algorithm works well if \(\kappa\) is small, but for the system of equations arising from a second-order finite difference approximation to Poisson's equation, \(\kappa\) is very large. In a sense the steepest descent method is slow because the search directions are too similar. We can try and search for the new iterate in another direction, say \(\mathbf{p}_k\). To minimize \(\phi(\mathbf{x}_{k-1} + \alpha\mathbf{p}_k)\) we must choose, as in (91),
\[
\alpha_k = \frac{\mathbf{p}_k^T\mathbf{r}_{k-1}}{\mathbf{p}_k^T A\mathbf{p}_k},
\]

where now the search direction is different from the direction of the gradient. Then
\begin{align*}
\phi(\mathbf{x}_{k-1} + \alpha_k\mathbf{p}_k) &= \frac{1}{2}(\mathbf{x}_{k-1}^T + \alpha_k\mathbf{p}_k^T)A(\mathbf{x}_{k-1} + \alpha_k\mathbf{p}_k) - (\mathbf{x}_{k-1}^T + \alpha_k\mathbf{p}_k^T)\mathbf{b} \\
&= \phi(\mathbf{x}_{k-1}) - \frac{1}{2}\frac{(\mathbf{p}_k^T\mathbf{r}_{k-1})^2}{\mathbf{p}_k^T A\mathbf{p}_k},
\end{align*}

and \(\phi\) is reduced provided that \(\mathbf{p}_k\) is not orthogonal to \(\mathbf{r}_{k-1}\). Thus the algorithm loops over
\begin{align*}
\mathbf{r}_{k-1} &= \mathbf{b} - A\mathbf{x}_{k-1} \\
\alpha_k &= \frac{\mathbf{p}_k^T\mathbf{r}_{k-1}}{\mathbf{p}_k^T A\mathbf{p}_k} \tag{92} \\
\mathbf{x}_k &= \mathbf{x}_{k-1} + \alpha_k\mathbf{p}_k.
\end{align*}

Thus \(\mathbf{x}_k\) is a linear combination of the vectors \(\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_k\). How can we choose the \(\mathbf{p}_i\)? It turns out that we can choose them so that each \(\mathbf{x}_k\) solves the problem
\[
\min\phi(\mathbf{x}), \qquad \mathbf{x} \in \mathrm{span}(\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_k),
\]
when we choose \(\alpha\) to solve
\[
\min_\alpha\phi(\mathbf{x}_{k-1} + \alpha\mathbf{p}_k),
\]

provided that we make the \(\mathbf{p}_i\) A-conjugate. Define an \(n \times k\) matrix \(P_k\) by \(P_k = [\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_k]\). Then if \(\mathbf{x} \in \mathrm{span}(\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_k)\) there exists a \((k-1)\) vector \(\mathbf{y}\) such that
\[
\mathbf{x} = P_{k-1}\mathbf{y} + \alpha\mathbf{p}_k,
\]


and
\[
\phi(\mathbf{x}) = \phi(P_{k-1}\mathbf{y}) + \alpha\mathbf{y}^T P_{k-1}^T A\mathbf{p}_k + \frac{\alpha^2}{2}\mathbf{p}_k^T A\mathbf{p}_k - \alpha\mathbf{p}_k^T\mathbf{b}.
\]

Choosing the \(\mathbf{p}_i\) so that
\[
P_{k-1}^T A\mathbf{p}_k = 0
\]
decouples the global part of the minimization from the part in the \(\mathbf{p}_k\) direction. Then the required global minimization of \(\phi(\mathbf{x})\) is obtained by choosing
\[
P_{k-1}\mathbf{y} = \mathbf{x}_{k-1}, \qquad \alpha_k = \frac{\mathbf{p}_k^T\mathbf{b}}{\mathbf{p}_k^T A\mathbf{p}_k} = \frac{\mathbf{p}_k^T\mathbf{r}_{k-1}}{\mathbf{p}_k^T A\mathbf{p}_k},
\]

as before. From the conjugacy condition \(P_{k-1}^T A\mathbf{p}_k = 0\) we have
\[
\mathbf{p}_i^T A\mathbf{p}_k = 0, \qquad i = 1, 2, \ldots, k-1,
\]
i.e. \(\mathbf{p}_k\) is A-conjugate to \(\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_{k-1}\). Thus the \(\mathbf{p}_i\) are linearly independent, and hence \(\mathbf{x}_n\), which is a linear combination of \(\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n\), is the exact solution of
\[
A\mathbf{x} = \mathbf{b}.
\]
The conjugacy condition does not determine \(\mathbf{p}_k\) uniquely. To do this we choose the \(\mathbf{p}_k\) to be closest to \(\mathbf{r}_{k-1}\), because \(\mathbf{r}_{k-1}\) gives the most rapid reduction of \(\phi(\mathbf{x})\). It can be shown (e.g., see Golub & van Loan) that

• the residuals r1, r2, ..., rk−1 are orthogonal,

• pTi rj = 0, i = 1, 2, .., j,

• rj = rj−1 − αjApj,

• pk = rk−1 + βkpk−1 where

\[
\beta_k = -\frac{\mathbf{p}_{k-1}^T A\mathbf{r}_{k-1}}{\mathbf{p}_{k-1}^T A\mathbf{p}_{k-1}}.
\]


7.11.2 Conjugate Gradient Algorithm

Using these results we obtain the following algorithm

r0 = b − Ax0
p0 = r0
FOR i = 0, ..., n DO
BEGIN
    αi = (ri, ri) / (pi, Api)
    xi+1 = xi + αi pi
    ri+1 = ri − αi Api
    βi = (ri+1, ri+1) / (ri, ri)
    pi+1 = ri+1 + βi pi
END
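A direct transcription of this algorithm into Python might read as follows (a sketch: the stopping tolerance and the small symmetric positive definite test system are additions for illustration, not part of the notes):

    import numpy as np

    def conjugate_gradient(A, b, x0, tol=1e-10):
        # A must be symmetric positive definite
        x = x0.copy()
        r = b - A @ x                    # r0 = b - A x0
        p = r.copy()                     # p0 = r0
        for _ in range(len(b)):          # exact in <= n steps without roundoff
            Ap = A @ p
            alpha = (r @ r) / (p @ Ap)
            x = x + alpha * p
            r_new = r - alpha * Ap
            if np.linalg.norm(r_new) < tol:
                break
            beta = (r_new @ r_new) / (r @ r)
            p = r_new + beta * p
            r = r_new
        return x

    A = np.array([[4.0, 1.0], [1.0, 3.0]])
    b = np.array([1.0, 2.0])
    print(conjugate_gradient(A, b, np.zeros(2)))   # approx [0.0909, 0.6364]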

In practice \(\mathbf{x}_n \ne \mathbf{x}\) because of rounding errors, and the residuals are not exactly orthogonal. It can be shown that
\[
||\mathbf{x} - \mathbf{x}_k||_A \le ||\mathbf{x} - \mathbf{x}_0||_A \left[\frac{1 - \sqrt{\kappa}}{1 + \sqrt{\kappa}}\right]^{2k},
\]
so that convergence can be slow for very large \(\kappa\). Preconditioning of \(A\) is required for the CG method to be useful.

7.11.3 Implementation issues

Note that the algorithm requires the matrix-vector product \(A\mathbf{p}\) to be computed. The matrix \(A\) is usually very large and sparse. As far as implementation of the algorithm is concerned, we do not explicitly construct the matrix \(A\), only the effect of \(A\) on a vector. Consider, for example, a second-order discretization applied to, say, Poisson's equation \(\nabla^2\phi = f(x, y)\) with Dirichlet boundary conditions. The discretization gives the set of equations

\[
\frac{w_{i+1,j} - 2w_{i,j} + w_{i-1,j}}{\Delta_x^2} + \frac{w_{i,j+1} - 2w_{i,j} + w_{i,j-1}}{\Delta_y^2} = f_{i,j}, \qquad
\begin{array}{l} 2 \le i \le N-1, \\ 2 \le j \le M-1, \end{array}
\]
and
\[
w_{1,j} = f_l(j), \quad w_{N,j} = f_r(j), \qquad 1 \le j \le M,
\]
\[
w_{i,1} = g_b(i), \quad w_{i,M} = g_t(i), \qquad 2 \le i \le N-1,
\]

where \(f_l\), \(f_r\), \(g_b\), \(g_t\) are known functions given on the boundaries. Here we may define \(\mathbf{x}\) to be the \(NM \times 1\) vector of unknowns

\[
\mathbf{x} = (w_{1,1}, w_{2,1}, \ldots, w_{N,1}, w_{1,2}, \ldots, w_{i,j}, \ldots, w_{N,M})^T.
\]

In fact we may identify the \(q\)th component \(x_q\) of \(\mathbf{x}\) by
\[
x_q = w_{i,j} \quad \text{where} \quad q = i + N(j-1).
\]
Then the matrix-vector product \(A\mathbf{x}\) will have elements \(d_q\), with \(q = i + N(j-1)\), given by

\[
d_q = \frac{w_{i+1,j} - 2w_{i,j} + w_{i-1,j}}{\Delta_x^2} + \frac{w_{i,j+1} - 2w_{i,j} + w_{i,j-1}}{\Delta_y^2},
\]

if i, j identifies an interior point of the domain, and

\[
d_{1+N(j-1)} = w_{1,j}, \quad d_{N+N(j-1)} = w_{N,j}, \qquad 1 \le j \le M,
\]
\[
d_i = w_{i,1}, \quad d_{i+(M-1)N} = w_{i,M}, \qquad 2 \le i \le N-1,
\]

for points on the boundary. One other point to re-emphasise is that in the above discussion the matrix \(A\) is assumed to be symmetric and positive definite. Usually the systems of equations that we come across are not symmetric. Here one needs to consider alternative algorithms, such as the GMRES method discussed later, or one could try and use the conjugate gradient algorithm with the system
\[
A^T A\mathbf{x} = A^T\mathbf{b}.
\]
The latter is not recommended, as the condition number of the matrix \(A^T A\) deteriorates if the original matrix \(A\) is ill-conditioned to start with.
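A matrix-free version of the product \(A\mathbf{x}\) described above might be sketched as follows (illustrative: storing the unknowns as an \(N \times M\) array rather than a flat vector is an implementation choice for the example, not from the notes):

    import numpy as np

    def apply_A(w, dx, dy):
        # w is an N x M array of grid values w_{i,j}; returns d = A w without
        # ever forming A: five-point Laplacian at interior points, identity
        # on the boundary rows and columns
        d = w.copy()                                   # boundary: d = w
        d[1:-1, 1:-1] = ((w[2:, 1:-1] - 2.0*w[1:-1, 1:-1] + w[:-2, 1:-1]) / dx**2
                       + (w[1:-1, 2:] - 2.0*w[1:-1, 1:-1] + w[1:-1, :-2]) / dy**2)
        return d

Inside the conjugate gradient loop one would then replace each product \(A\mathbf{p}\) by a call to such a routine, reshaping between the flat vector of unknowns and the grid array as required.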

7.11.4 CG, alternative derivation

Let \(A\) be a symmetric positive definite matrix, i.e.
\[
A = A^T, \qquad \mathbf{x}^T A\mathbf{x} > 0
\]
for any non-zero vector \(\mathbf{x}\). Also define an inner product
\[
(\mathbf{x}, \mathbf{y}) = \mathbf{x}^T\mathbf{y}.
\]

We saw that the functional
\[
F(\mathbf{x}) = \frac{1}{2}(\mathbf{x}, A\mathbf{x}) - (\mathbf{b}, \mathbf{x})
\]
is minimized when \(\mathbf{x}\) is a solution of
\[
A\mathbf{x} = \mathbf{b}.
\]

The steepest descent method is defined by looking at iterations of the form
\[
\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \alpha_k\mathbf{r}^{(k)},
\]
where
\[
\mathbf{r}^{(k)} = \mathbf{b} - A\mathbf{x}^{(k)}
\]
is the residual, which points in the direction of steepest decrease of \(F(\mathbf{x})\). The \(\alpha\)'s are chosen to minimize \(F(\mathbf{x}^{(k+1)})\), and come out to be
\[
\alpha_k = \frac{(\mathbf{r}^{(k)}, \mathbf{r}^{(k)})}{(\mathbf{r}^{(k)}, A\mathbf{r}^{(k)})}.
\]

The convergence of this technique can be slow, especially if the matrix is ill-conditioned. By choosing the directions differently we obtain the conjugate gradient algorithm.

This works as follows. Let
\[
\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \alpha_k\mathbf{p}^{(k)},
\]
where \(\mathbf{p}^{(k)}\) is a direction vector. For the conjugate gradient method we let
\[
\mathbf{p}^{(k)} = \mathbf{r}^{(k)} + \beta_k\mathbf{p}^{(k-1)}
\]
for \(k \ge 1\), and \(\beta_k\) is chosen so that \(\mathbf{p}^{(k)}\) is A-conjugate to \(\mathbf{p}^{(k-1)}\), i.e.
\[
(\mathbf{p}^{(k)}, A\mathbf{p}^{(k-1)}) = 0.
\]

This gives
\[
\beta_k = -\frac{(\mathbf{r}^{(k)}, A\mathbf{p}^{(k-1)})}{(\mathbf{p}^{(k-1)}, A\mathbf{p}^{(k-1)})}.
\]

As before, \(\alpha_k\) is chosen to minimize \(F(\mathbf{x}^{(k+1)})\), and comes out to be
\[
\alpha_k = \frac{(\mathbf{p}^{(k)}, \mathbf{r}^{(k)})}{(\mathbf{p}^{(k)}, A\mathbf{p}^{(k)})}.
\]

Collecting all the results together gives the CG algorithm
\begin{align*}
&\mathbf{x}^{(0)} \ \text{arbitrary} \\
&\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \alpha_k\mathbf{p}^{(k)}, \quad k = 0, 1, \ldots \\
&\mathbf{p}^{(k)} = \begin{cases} \mathbf{r}^{(k)}, & k = 0, \\ \mathbf{r}^{(k)} + \beta_k\mathbf{p}^{(k-1)}, & k = 1, 2, \ldots \end{cases} \\
&\beta_k = -\frac{(\mathbf{r}^{(k)}, A\mathbf{p}^{(k-1)})}{(\mathbf{p}^{(k-1)}, A\mathbf{p}^{(k-1)})}, \quad k = 1, 2, \ldots \\
&\mathbf{r}^{(k)} = \mathbf{b} - A\mathbf{x}^{(k)}, \quad k = 0, 1, \ldots \\
&\alpha_k = \frac{(\mathbf{p}^{(k)}, \mathbf{r}^{(k)})}{(\mathbf{p}^{(k)}, A\mathbf{p}^{(k)})}, \quad k = 0, 1, \ldots
\end{align*}


It can be shown that the \(\alpha_k\), \(\beta_k\) and \(\mathbf{r}^{(k)}\) are equivalent to
\begin{align*}
\beta_k &= \frac{(\mathbf{r}^{(k)}, \mathbf{r}^{(k)})}{(\mathbf{r}^{(k-1)}, \mathbf{r}^{(k-1)})}, \quad k = 1, 2, \ldots \\
\alpha_k &= \frac{(\mathbf{r}^{(k)}, \mathbf{r}^{(k)})}{(\mathbf{p}^{(k)}, A\mathbf{p}^{(k)})}, \quad k = 0, 1, \ldots \\
\mathbf{r}^{(k)} &= \mathbf{r}^{(k-1)} - \alpha_{k-1}A\mathbf{p}^{(k-1)}, \quad k = 1, 2, \ldots
\end{align*}

Also the following relations hold:
\begin{align*}
(\mathbf{r}^{(i)}, \mathbf{r}^{(j)}) &= 0, \quad i \ne j, \\
(\mathbf{p}^{(i)}, A\mathbf{p}^{(j)}) &= 0, \quad i \ne j, \\
(\mathbf{r}^{(i)}, A\mathbf{p}^{(j)}) &= 0, \quad i \ne j \ \text{and} \ i \ne j+1.
\end{align*}
The residual vectors \(\mathbf{r}^{(k)}\) are mutually orthogonal and the direction vectors \(\mathbf{p}^{(k)}\) are mutually A-conjugate. Hence the name conjugate gradient.

7.11.5 Preconditioning

The conjugate gradient method combined with preconditioning is a very effective tool for solving linear equations iteratively. Preconditioning helps to redistribute the eigenvalues of the iteration matrix, and this can significantly enhance the convergence properties. Suppose that we are dealing with the linear system

\[
A\mathbf{x} = \mathbf{b}. \tag{93}
\]

A preconditioning matrix \(M\) is such that \(M\) is close to \(A\) in some sense and easily invertible. For instance, if we were dealing with the matrix \(A\) arising from a discretization of the Laplace operator, then one could take \(M\) to be any one of the matrices stemming from one of the classical iteration schemes such as Jacobi, SOR, or ADI, say. The system (93) can be replaced by

\[
M^{-1}A\mathbf{x} = M^{-1}\mathbf{b}. \tag{94}
\]

The conjugate gradient algorithm is then applied to (94), leading to the algorithm

x0 given; r0 = b − Ax0
solve Mz0 = r0 and put p0 = z0
FOR k = 1, ..., n
    αk = (zk−1, rk−1) / (pk−1, Apk−1)
    xk = xk−1 + αk pk−1
    rk = rk−1 − αk Apk−1
    solve Mzk = rk
    βk = (zk, rk) / (zk−1, rk−1)
    pk = zk + βk pk−1
END
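In Python the preconditioned loop might be sketched as follows (illustrative: the simple Jacobi choice \(M = \mathrm{diag}(A)\), the tolerance and the small test system are additions for the example, not prescribed by the notes):

    import numpy as np

    def preconditioned_cg(A, b, x0, tol=1e-10):
        x = x0.copy()
        r = b - A @ x                       # r0 = b - A x0
        M_diag = np.diag(A)                 # Jacobi preconditioner M = diag(A)
        z = r / M_diag                      # solve M z0 = r0
        p = z.copy()
        for _ in range(len(b)):
            Ap = A @ p
            alpha = (z @ r) / (p @ Ap)
            x = x + alpha * p
            r_new = r - alpha * Ap
            if np.linalg.norm(r_new) < tol:
                break
            z_new = r_new / M_diag          # solve M z_k = r_k
            beta = (z_new @ r_new) / (z @ r)
            p = z_new + beta * p
            r, z = r_new, z_new
        return x

    A = np.array([[4.0, 1.0], [1.0, 3.0]])
    b = np.array([1.0, 2.0])
    print(preconditioned_cg(A, b, np.zeros(2)))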
