The Google Home does very very little on-device processing. Sending out local network calls is not one of the things it does. Almost all processing, including IoT controls through the Smart Home API, are done through cloud-based services.
Update
I can't answer "why" it doesn't do this, since I'm not one of the engineers that built it, but I can make a lot of guesses about why.
For starters - it increases the complexity of the software and hardware on the device dramatically. Right now, the device is really little more than a microphone and a speaker, with a little logic to detect the hotword and then stream everything else to the server, and then get a result back and play it. Most of the rest of the code is likely to handle setup and configuration.
If the device has to also be a general purpose IoT hub, then it needs software and hardware for Bluetooth and possibly other signaling systems. It needs to be able to keep track of the state of other devices on the network and manage that in between power cycles of the device (or even handle interruptions in power for the device itself). Some of the implications of that may need to open up the networking on the device to receive messages, not just send them. It has to have more extensive network configuration - to understand what local networking is and not just what the local router is and how to deal with that configuration (and that configuration when it changes). These are all possible, to be sure, but increase the complexity and, in some cases, lower the security of a device.
And that might be reasonable... if there was significant value in doing so. But you've already stipulated in the question that the voice processing could be done in the cloud, so once commands are sent to the cloud and parsed there - why not also do all of the above (device and state tracking, changing, etc) in the cloud? Particularly since most IoT devices maintain cloud servers anyway because people also want to be able to control or monitor their home devices when they aren't on their home LAN. Having a dual set of commands (some for when you're local, and some when you're not) does make sense in some cases - but also dramatically increases the complexity of both the controller and devices, so most just rely on the cloud, again.
So while I understand why some people would like to have a nice little system that can just sent your play local REST server a command now and then, the reality is that to do this for a consumer system isn't that reasonable.
If you really wanted a system that can do this - you can continue in the hobbyist spirit and build something with the Assistant SDK and your favorite IoT platform.