GCC cannot compile the existing assembly here on ARMv7-M,
claiming impossible constraints. It is actually possible to
compile if the input arguments (addresses and sizes) are
first moved to a high register so as not to conflict with
the use of r0-r7 in ldm/stm -- this is exactly what GCC does
for ARMv6, but it won't do it on ARMv7-M for some reason.
We can get a result similar to the ARMv6 code by manually
moving the inputs into temporaries, but the generated code
is a actually a bit smaller on ARMv7-M if the r0-r7 block is
shifted up to r3-r10. This only works since ARMv7-M supports
the 32-bit Thumb encoding -- 16-bit Thumb can't represent an
ldm/stm instruction of this type.
It's worth #ifdef'ing the code because although the ARMv7-M
version works on ARMv6 too, it spills a lot of registers on
the stack even though register use is mostly similar.
Change-Id: I9bc8b5c76e198aecfd0a0e7a2158b1c00f82c4df