It's been over 15 years since I was involved in compiler construction, so I might not remember all the details.
Basically when going out of SSA, you need to generate load/store instructions into virtual registers at the end of all blocks leading to phi-nodes in successor blocks. This will lead to generating a number of virtual registers, which is usually higher than the available registers on the actual machine. So you apply register allocation on the local variables to come up with the real registers, spilling on the stack those values that don't fit.
For a stack-based machine, just don't do the last step. You end up with roughly the same number of virtual registers as phi-nodes in the compiled function (the algorithm is actually not trivial, a good starting point is the paper Efficiently Computing Single Static Assignment Form and the Control Dependence Graph by Ron Cytron, Jeane Ferrante, et. al.) These virtual registers will be your local variables.
When reading a value from a virtual register (local variable) to be used by an operation, first use an instruction to push it on the stack. The Java VM iload index
instruction is such an example: it loads the local variable at index and pushed its value on the stack. (see https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-6.html#jvms-6.5.iload)
When writing a value into a local variable, you pop it from the stack. See Java VM istore index
instruction (see https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-6.html#jvms-6.5.istore).
For example, if after going out of SSA you need to encode
local 5 = MUL local[2], local[4]
then you need to generate something like this:
ILOAD 4
ILOAD 2
MUL
ISTORE 5
For CIL bytecode, you have the equivalent ldarg
and starg
operations.
Of course, there is a lot of room for optimizations, to avoid redundant load/stores.