In flow simulations (CFD), the computational problem can be defined on a 2D or 3D array (N×M or N×M×L) type Virtual Cellular Machine, while the operation of each processing element is described as a mathematical expression, an acyclic data flow graph, or a UMF diagram. The problem to be solved is how to map the computational problem defined on the virtual array onto a given physical FPGA, where the area per processor (logic slices, DSP slices), the on-chip memory (BRAM), and the off-chip memory bandwidth are limited. To conserve memory bandwidth, the arrays are computed serially as a 1D stream of cells. This requires some on-chip memory to store belts of the array in 2D, or plates in 3D, which further reduces the required bandwidth. Depending on the complexity of the operator, only a small number of physical execution units can be implemented. The operator can be decomposed into small basic blocks that use either the logic resources (for example, adders) or the dedicated resources (embedded multipliers) of the FPGA. The result of this process is a Physical Cellular Machine optimized for the given application, whose main components are the specialized on-chip memory architecture and the specialized execution unit (built from CLBs and DSP slices). This problem formulation introduces a new computational aspect: the role of the geometric address of a given memory or processor, and the precedence of locality.
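
The serial, belt-buffered evaluation described above can be illustrated with a short software sketch. It is not the architecture of the paper: the 5-point sum operator, the function name `stream_stencil`, and the belt length of two rows plus one cell are assumptions chosen only to show how a 2D array streamed in row-major order can be processed while each cell is read from the (simulated) off-chip stream exactly once.

```python
from collections import deque


def stream_stencil(cells, n_rows, n_cols):
    """Consume a row-major 1D stream of an n_rows x n_cols array.

    Yields (row, col, value) for every interior cell, where value is the
    sum of the cell and its four nearest neighbours (an assumed operator).
    """
    # The "belt": two full rows plus one cell, standing in for on-chip BRAM.
    belt = deque(maxlen=2 * n_cols + 1)

    for idx, cell in enumerate(cells):
        belt.append(cell)  # each input cell is read from the stream once
        if len(belt) < belt.maxlen:
            continue  # belt not yet filled, no complete neighbourhood

        # The centre of the window lags the newest cell by exactly one row.
        centre = idx - n_cols
        row, col = divmod(centre, n_cols)

        # Skip boundary cells, whose neighbourhood wraps across rows.
        if 0 < row < n_rows - 1 and 0 < col < n_cols - 1:
            north = belt[0]
            west = belt[n_cols - 1]
            mid = belt[n_cols]
            east = belt[n_cols + 1]
            south = belt[2 * n_cols]
            yield row, col, north + west + mid + east + south


if __name__ == "__main__":
    # A 3 x 4 array holding the values 0..11 in row-major order.
    for row, col, value in stream_stencil(range(12), 3, 4):
        print(row, col, value)
```

The geometric address of each neighbour is a fixed offset into the belt, which is what lets a hardware implementation replace random memory accesses with a shift-register-like structure. In 3D the same idea would require buffering two plates instead of two rows.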