Sorry if this answer isn't perfect, but with the age of the question and how many people are interested, I thought I'd take a shot and hopefully help someone if not the OP.
I'm not sure how VML/Word handles clipboard data. If it places multiple formats in the Windows Clipboard, one with the HTML you want, and one with the VML format, then you are in luck and this should work. If not, then maybe you could use this to clean up the code on insert at least.
You'll want to look into IDocHostUIHandler::TranslateAccelerator. You need to implement IDocHostUIHandler if you haven't already, and register it with ICustomDoc::SetUIHandler after the HTML document has loaded (an empty page is fine).
Inside TranslateAccelerator you need to watch for nCmdID == IDM_PASTE. This is fired before the user pastes something to the HTML control, and you can modify the clipboard contents before the paste occurs.
You can use something like GetClipboardData(RegisterClipboardFormat("HTML Format")) to get the HTML format from the clipboard, and SetClipboardData to replace the clipboard data. Both calls need the clipboard to be open, so wrap them in OpenClipboard/CloseClipboard.
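One thing to be aware of is that the "HTML Format" data is not bare HTML. It is UTF-8 text with a short header whose StartFragment/EndFragment fields are byte offsets into the buffer. Here is a minimal sketch of reading and rebuilding that payload in plain C++, so you can clean the fragment before handing it back to SetClipboardData. The function names (Pad10, ReadOffset, ExtractFragment, BuildCfHtml) are my own made-up helpers, not part of any Windows API, and the sketch assumes the buffer has already been copied out of the clipboard into a std::string.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Zero-pad an offset to 10 digits so the header has a fixed length.
std::string Pad10(std::size_t value) {
    std::string digits = std::to_string(value);
    assert(digits.size() <= 10);  // assumption: payloads stay well below 10^10 bytes
    return std::string(10 - digits.size(), '0') + digits;
}

// Read a numeric header field such as "StartFragment:0000000137".
// Returns std::string::npos if the field is missing or has no digits.
std::size_t ReadOffset(const std::string& data, const std::string& key) {
    std::size_t pos = data.find(key + ":");
    if (pos == std::string::npos) return std::string::npos;
    pos += key.size() + 1;
    std::size_t value = 0;
    bool anyDigit = false;
    while (pos < data.size() && std::isdigit(static_cast<unsigned char>(data[pos]))) {
        value = value * 10 + static_cast<std::size_t>(data[pos] - '0');
        ++pos;
        anyDigit = true;
    }
    return anyDigit ? value : std::string::npos;
}

// Pull the pasted fragment out of an "HTML Format" buffer.
bool ExtractFragment(const std::string& cfHtml, std::string& fragment) {
    const std::size_t start = ReadOffset(cfHtml, "StartFragment");
    const std::size_t end = ReadOffset(cfHtml, "EndFragment");
    if (start == std::string::npos || end == std::string::npos) return false;
    if (start > end || end > cfHtml.size()) return false;
    fragment = cfHtml.substr(start, end - start);
    return true;
}

// Wrap a (cleaned) fragment in a fresh "HTML Format" payload.
std::string BuildCfHtml(const std::string& fragment) {
    const std::string prefix = "<html><body><!--StartFragment-->";
    const std::string suffix = "<!--EndFragment--></body></html>";

    // Build the header from four offsets; every offset is padded to the
    // same width, so the header length does not depend on the values.
    auto makeHeader = [](std::size_t startHtml, std::size_t endHtml,
                         std::size_t startFrag, std::size_t endFrag) {
        return std::string("Version:0.9\r\n") +
               "StartHTML:" + Pad10(startHtml) + "\r\n" +
               "EndHTML:" + Pad10(endHtml) + "\r\n" +
               "StartFragment:" + Pad10(startFrag) + "\r\n" +
               "EndFragment:" + Pad10(endFrag) + "\r\n";
    };

    // Measure the header with dummy offsets, then compute the real ones.
    const std::size_t startHtml = makeHeader(0, 0, 0, 0).size();
    const std::size_t startFrag = startHtml + prefix.size();
    const std::size_t endFrag = startFrag + fragment.size();
    const std::size_t endHtml = endFrag + suffix.size();

    return makeHeader(startHtml, endHtml, startFrag, endFrag) + prefix + fragment + suffix;
}
```

The idea would be to call ExtractFragment on the text you got from GetClipboardData, strip whatever markup you don't want, and pass the result of BuildCfHtml (copied into a GlobalAlloc'd block) to SetClipboardData. I have not tested this against what Word actually puts on the clipboard, so treat it as a starting point.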
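For your usage, if you see that there are multiple clipboard formats after copying from Word, you can drop the one you do not want. As far as I know there is no call that removes a single format, so in practice you copy the data for the format you want to keep, call EmptyClipboard, and put that one format back with SetClipboardData. That way, when the HTML control completes the paste, it will only use the format you want.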
I have code examples if needed, but they are part of a large project and use Borland's VCL library in places. My code checks for the CF_BITMAP format in the clipboard and converts it to HTML Format using a PNG file instead, so that users who paste a screen capture into the control get a smaller PNG image instead of a huge BMP file. The concept is about the same as what you want, though.