Large parallel processing revisited: A second tutorial
An efficient parallel LU algorithm suitable for a local-memory MIMD (multiple instruction, multiple data) computer, such as an array of transputers, is described. A graphical approach is used to elucidate the algorithm, and the results of a theoretical timing analysis are given. Some methods for reducing the communication load by exploiting the capabilities of certain parallel hardware are described. Timing results for a code implementing the algorithm on a transputer array are given and compared with results for a parallel conjugate-gradient (CG) algorithm. The stability of LU decomposition is discussed. Pivoting is briefly reviewed, although the algorithm described in this paper does not currently implement it. PARNEC, a parallel version of NEC2, is described. The parallel generation of the matrix elements is discussed, and a solution for NEC2 is presented. Results of a preliminary test of the accuracy of PARNEC are given. Finally, the choice between a CG and an LU solver for the system of linear equations generated by a method-of-moments formulation is discussed, as is new parallel hardware.
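For orientation, the following is a minimal serial sketch of LU decomposition without pivoting, the operation that the parallel algorithm distributes across processors. It is not the parallel algorithm of this paper and makes no attempt to model transputer communication; the function name lu_no_pivot and the list-of-lists matrix layout are assumptions chosen for illustration only.

```python
def lu_no_pivot(a):
    """Factor a square matrix as A = L U without pivoting.

    Illustrative serial sketch only (hypothetical helper, not the
    paper's parallel code). L is unit lower triangular and U is
    upper triangular. The input matrix is not modified.
    """
    n = len(a)
    # Work on a copy so the caller's matrix is left untouched.
    u = [list(map(float, row)) for row in a]
    # Start L as the identity matrix.
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

    for k in range(n):
        # Without pivoting, a zero diagonal entry stops the factorization;
        # a very small one would degrade accuracy instead.
        if u[k][k] == 0.0:
            raise ZeroDivisionError(
                "zero pivot at step %d; row interchange would be needed" % k
            )
        for i in range(k + 1, n):
            # Multiplier that eliminates entry (i, k).
            m = u[i][k] / u[k][k]
            l[i][k] = m
            # Subtract m times the pivot row from row i.
            for j in range(k, n):
                u[i][j] -= m * u[k][j]
    return l, u
```

Calling lu_no_pivot on a matrix whose leading diagonal entry is zero, such as [[0, 1], [1, 0]], raises an error even though the matrix is nonsingular, which is the kind of failure that pivoting is meant to avoid.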