Frankly, in the case of HTML I'd just store... the HTML - it is kinda pre-serialised! However, to answer the question:
In protobuf-net v2, you can configure a TypeModel at runtime, which allows everything you can do via attributes, plus a few other tricks (in v2 the attributes just help steer the model if nothing else is specified). And because you can do all this at runtime, you don't need to change the type, and hence can apply it to models outside your control. The default model instance is RuntimeTypeModel.Default; you can add types to the model and configure each MetaType individually (a MetaType maps to a Type). This allows you to tell it which members (properties/fields), sub-types, callbacks, etc. to apply.
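As a rough sketch of that runtime configuration: the Customer and PreferredCustomer types below are hypothetical stand-ins for types you can't annotate, and the field numbers are arbitrary choices of mine. It assumes the v2 MetaType.Add(int, string) and MetaType.AddSubType(int, Type) overloads.

```csharp
using System;
using System.IO;
using ProtoBuf;
using ProtoBuf.Meta;

// Hypothetical types from a library we cannot decorate with attributes.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class PreferredCustomer : Customer
{
    public int DiscountPercent { get; set; }
}

public static class Program
{
    public static void Main()
    {
        RuntimeTypeModel model = RuntimeTypeModel.Default;

        // "false" = don't apply attribute-driven defaults; everything is configured by hand.
        MetaType customer = model.Add(typeof(Customer), false);
        customer.Add(1, "Id").Add(2, "Name");

        MetaType preferred = model.Add(typeof(PreferredCustomer), false);
        preferred.Add(1, "DiscountPercent");

        // Register the sub-type on the base type; 10 is an arbitrary field number
        // that must not clash with the base type's member numbers.
        customer.AddSubType(10, typeof(PreferredCustomer));

        using (var ms = new MemoryStream())
        {
            // Serializer.* uses RuntimeTypeModel.Default, so it sees the configuration above.
            Serializer.Serialize(ms, new PreferredCustomer { Id = 1, Name = "abc", DiscountPercent = 5 });
            ms.Position = 0;
            Customer clone = Serializer.Deserialize<Customer>(ms);
            Console.WriteLine(clone.GetType().Name);
        }
    }
}
```

Configure the model once at startup, before the first serialize/deserialize call for those types.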
If that gets too complex, you can also specify a "surrogate", which allows you to configure a simple DTO, and use a standard conversion operator (explicit or implicit) to change between the complex model and the simple DTO model.
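For illustration, a minimal surrogate sketch. ComplexNode is a made-up stand-in for the type you don't control, and NodeSurrogate is the simple DTO; neither name comes from protobuf-net or from HTML-agility-pack. The only library call assumed here is MetaType.SetSurrogate(Type).

```csharp
using ProtoBuf;
using ProtoBuf.Meta;

// Hypothetical complex type outside our control (no attributes, no default constructor).
public class ComplexNode
{
    public ComplexNode(string markup) { Markup = markup; }
    public string Markup { get; private set; }
}

// Simple DTO that protobuf-net serializes in place of ComplexNode.
[ProtoContract]
public class NodeSurrogate
{
    [ProtoMember(1)]
    public string Markup { get; set; }

    // Conversion operators: protobuf-net uses these to switch between the two shapes.
    public static implicit operator NodeSurrogate(ComplexNode node)
    {
        return node == null ? null : new NodeSurrogate { Markup = node.Markup };
    }

    public static implicit operator ComplexNode(NodeSurrogate surrogate)
    {
        return surrogate == null ? null : new ComplexNode(surrogate.Markup);
    }
}

public static class SurrogateSetup
{
    public static void Configure()
    {
        // Tell the default model to use the DTO whenever it meets a ComplexNode.
        RuntimeTypeModel.Default.Add(typeof(ComplexNode), false)
            .SetSurrogate(typeof(NodeSurrogate));
    }
}
```

The complex type itself then needs no member configuration at all; only the surrogate describes what goes on the wire.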
For info, the significance of the default model is that it is what the Serializer.* methods use. However, if you use a TypeModel instance to perform serialization/deserialization, you can have multiple differently configured models for the same types.
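A sketch of two independently configured models for the same (hypothetical) Person type. This assumes the v2 factory TypeModel.Create(); I believe later versions expose this as RuntimeTypeModel.Create() instead, so check against the version you're using. The helper names are mine.

```csharp
using System.IO;
using ProtoBuf.Meta;

// Hypothetical type, used only for illustration.
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class Models
{
    // Model that serializes both members.
    public static RuntimeTypeModel CreateFullModel()
    {
        RuntimeTypeModel model = TypeModel.Create();
        model.Add(typeof(Person), false).Add(1, "Id").Add(2, "Name");
        return model;
    }

    // Model that serializes only the Id, for the same type.
    public static RuntimeTypeModel CreateIdOnlyModel()
    {
        RuntimeTypeModel model = TypeModel.Create();
        model.Add(typeof(Person), false).Add(1, "Id");
        return model;
    }

    // Serialize through a specific model instead of Serializer.* (which uses the default model).
    public static byte[] Serialize(TypeModel model, Person value)
    {
        using (var ms = new MemoryStream())
        {
            model.Serialize(ms, value);
            return ms.ToArray();
        }
    }

    public static Person Deserialize(TypeModel model, byte[] data)
    {
        using (var ms = new MemoryStream(data))
        {
            return (Person)model.Deserialize(ms, null, typeof(Person));
        }
    }
}
```

Neither model touches RuntimeTypeModel.Default, so calls through Serializer.* are unaffected by either configuration.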
I can't remember the full details of HTML-agility-pack, but those are the main options available for your scenario via protobuf-net.