main.bash uses whole nodes, not scattered cores
The resource request changes from scattered cores (256 GB memory overall) to whole nodes. On ARC4, the new request gives 80 cores and 384 GB in total (40 cores and 192 GB per node).

Tests show large performance benefits (speed-ups of up to 40%) from reduced inter-node communication and lower memory and network latency. This approach is more efficient, general, and reproducible.

Users who need a specific domain decomposition can instead request explicit node, core, and memory allocations. However, that approach can be wasteful (e.g., 16 idle cores).

The downside of using whole nodes is that queue times can be longer, though this should not be much of an issue for one or two nodes.
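
The idle-core figure mentioned above can be reproduced with a little arithmetic. The following is a minimal sketch, not part of main.bash: the per-node values come from the ARC4 numbers in this description, while `nodes=2` and `ranks=64` are illustrative assumptions for a job whose domain decomposition needs fewer MPI ranks than the allocated nodes provide.

```shell
#!/usr/bin/env bash
# Sketch: cores left idle when whole nodes are allocated but the domain
# decomposition uses fewer MPI ranks than those nodes provide.

# Per-node figures for ARC4, as quoted in this description.
cores_per_node=40
mem_per_node_gb=192

# Hypothetical job (assumed values, for illustration only).
nodes=2
ranks=64

total_cores=$(( nodes * cores_per_node ))
total_mem_gb=$(( nodes * mem_per_node_gb ))
idle_cores=$(( total_cores - ranks ))
mem_per_rank_gb=$(( total_mem_gb / ranks ))   # integer division

echo "allocated: ${total_cores} cores, ${total_mem_gb} GB"
echo "idle: ${idle_cores} cores"
echo "memory per rank: ${mem_per_rank_gb} GB"
```

With these assumed values the allocation is 80 cores and 384 GB, of which 16 cores are unused. Changing `ranks` to match a different decomposition shows how much of a whole-node allocation it would leave idle.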