Dekel; Nassimi & Sahni Parallel Implementation (Topological Sorting Topological Sorting)
(Their n*n*n cube setup seems to only require each processor to keep track of one entry A(i, j), B(i, j) and propagates the results across the structure, only requiring O(1) auxiliary space per processor. Comparison sorting can be done easily with O(1) auxiliary space per processor.)
Model of Computation
PRAM (not sure what type), SIMD computers?