WIP: Improve code generation for closure ref counting
The reference counting used with closures generates very poor code on arm64. As a result, the closures test takes >150 min to run to completion on a particular arm64 platform.
This small set of patches updates the ref counting used in closures to generate tighter code, which results in a significant speedup. The closures test now takes ~20 min on the same hardware.
Closes: #1316