The problem of supporting multiple versions of the same system at once is at the root of this question, and it's one I'm actively pondering. Let's go one piece at a time!
What is a feature toggle?
Sorry, other answer! Feature toggles need to do one thing only: decouple feature releases from deployments. How a toggle does that for any arbitrary change is the tricky bit, as this question identifies.
Generally, the system powering a toggle has to satisfy two conditions to be considered complete:
- Backward compatibility. Older features must still work regardless of this toggle's state. This includes the old behaviour of this feature!
- Forward compatibility. New features have to run properly regardless of this toggle's state.
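Here's a minimal sketch of what "decoupling releases from deployments" looks like in code. It's in Python only because the question doesn't name a language, and everything in it is an assumption for illustration: flags are read from a hypothetical FEATURE_FLAGS environment variable, unknown flags default to off, and the flag name new-greeting is made up. Both branches ship in the same deployment, and the flag decides which one users see.

```python
import os


def enabled_flags(env=None):
    """Parse a hypothetical FEATURE_FLAGS variable, e.g. "flag-a,flag-b"."""
    env = os.environ if env is None else env
    raw = env.get("FEATURE_FLAGS", "")
    return {name.strip() for name in raw.split(",") if name.strip()}


def is_enabled(flag, env=None):
    # Unset or unknown flags count as off, so the old behaviour is the
    # fallback (backward compatibility).
    return flag in enabled_flags(env)


def greeting(env=None):
    # Both code paths are deployed together; flipping the flag is a
    # configuration change rather than a new deployment.
    if is_enabled("new-greeting", env):
        return "Welcome back!"
    return "Hello."
```

Both conditions above boil down to this: every caller of greeting() has to cope with either return value, for as long as the flag exists.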
For any non-trivial change, this is... challenging. That's one word for it, anyway 🙄
Example 1: UI styling
Pretend you have a form whose input styles aren't accessible. Your job is to fix the styling to be accessible, but you use trunk-based development and your team expects all work to be integrated into your project's only branch, master, by the end of each day.
This change is extremely isolated. It affects:
- A single form
- No functionality (i.e. no business logic has to change)
- No non-functional requirements (this won't introduce scaling issues, for example)
As a result, switching which CSS stylesheet is loaded based on the state of a toggle is enough:
- Backward compatibility: No functionality is affected, so old stuff should work by default. Turning the flag off requires no special behaviour, so simple conditional logic is all that's needed.
- Forward compatibility: This is trickier, but assuming the form is properly DRYed out, any new inputs will automatically inherit the styles indicated by the flag. Assuming good separation of concerns, changes to these styles won't affect any other components, and vice versa.
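As a rough sketch of that "simple conditional logic", assuming the page is rendered server-side: the stylesheet paths and the accessible-form-styles flag name below are hypothetical, and flags is just a set of enabled flag names.

```python
# Hypothetical stylesheet paths; only the choice between them depends on the flag.
LEGACY_STYLESHEET = "/static/form.css"
ACCESSIBLE_STYLESHEET = "/static/form-accessible.css"


def form_stylesheet(flags):
    """Return the stylesheet URL for the form, given the set of enabled flags."""
    if "accessible-form-styles" in flags:
        return ACCESSIBLE_STYLESHEET
    # Flag off or missing: the existing styles are used untouched.
    return LEGACY_STYLESHEET


def stylesheet_link(flags):
    # The whole form shares this one link tag, so any new input picks up
    # whichever styles the flag selects without extra work.
    return f'<link rel="stylesheet" href="{form_stylesheet(flags)}">'
```

Cleaning this flag up later means deleting one branch and one file, which is about as cheap as toggles get.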
Example 2: A new form field
I hope you enjoyed that beautiful simplicity, because we're now in trouble. This is exactly the kind of case OP is describing.
This change spans multiple systems. It affects (at a minimum):
- The UX form
- The back-end's data API, since there's a new field
- The database layer, since there's a new field
Such a small difference makes things much more challenging. We'll go system by system here.
UX form:
- Backward compatibility: Identical to the previous example. If this is truly a new field, then old code shouldn't care. Any code path that does has to be covered by this feature toggle.
- Forward compatibility: The major concern here is that a field could exist one day, then be gone the next when the switch is flipped back. New logic may require that a default be set in the front-end state management or provided by the back-end.
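One way to handle the front-end side is to always populate the new field in the form's state, whether or not the flag is on or the back-end sent it. This is a language-agnostic idea shown in Python; the field preferred_name, its empty-string default, the existing name/email fields and the preferred-name-field flag are all hypothetical.

```python
# Hypothetical new field and default; substitute your own.
NEW_FIELD = "preferred_name"
NEW_FIELD_DEFAULT = ""


def initial_form_state(server_data, flags):
    """Build the form's client-side state from whatever the back-end sent."""
    state = {
        "name": server_data.get("name", ""),
        "email": server_data.get("email", ""),
    }
    # Always set the new key, even when the flag is off or the back-end
    # omits it, so newer code can read it without special-casing.
    state[NEW_FIELD] = server_data.get(NEW_FIELD, NEW_FIELD_DEFAULT)
    # Only the visibility of the input depends on the toggle.
    state["show_" + NEW_FIELD] = "preferred-name-field" in flags
    return state
```

The design choice here is to confine the toggle to rendering, so that the rest of the front-end sees one stable state shape in both flag positions.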
Data API:
- Backward compatibility: This field represents a change to the API's contract. To support certain use cases (validation comes to mind), defaults may need to be provided if the toggle is off. Otherwise, old stuff should be okay, though YMMV.
- Forward compatibility: Once again, the tricky part comes down to making sure there's something for new code to consume if this toggle gets turned off. In the worst cases, special conditional logic may need to be coded into new features to handle the case where the flag is turned off.
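A sketch of what that might look like in a request handler, assuming a plain dict payload. The field name, its default, the 50-character rule and the flag name are hypothetical, and ignoring the field when the flag is off is one possible choice, not the only one.

```python
# Hypothetical field, default and validation rule for illustration only.
NEW_FIELD = "preferred_name"
NEW_FIELD_DEFAULT = ""


def parse_profile_request(payload, flags):
    """Validate an incoming profile update and return the normalised record."""
    if "name" not in payload:
        raise ValueError("name is required")
    record = {"name": payload["name"]}

    if "preferred-name-field" in flags:
        # Flag on: the new field is part of the contract and is validated.
        value = payload.get(NEW_FIELD, NEW_FIELD_DEFAULT)
        if not isinstance(value, str) or len(value) > 50:
            raise ValueError(f"{NEW_FIELD} must be a string of at most 50 characters")
        record[NEW_FIELD] = value
    else:
        # Flag off: old clients never send the field, so fill in the default
        # instead of rejecting them, and ignore the field if it is sent anyway.
        record[NEW_FIELD] = NEW_FIELD_DEFAULT
    return record
```

Either way, code downstream of this function always gets a record with the new key in it.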
Database layer:
- Backward compatibility: At a database level, backward compatibility requires us to only add optional fields. Requiredness can be enforced elsewhere in our application, but a schema can't be considered backward compatible if it adds a new required field. Old inserts and updates will immediately break. So, your data migration adds a new, optional field. Easy?
- Forward compatibility: Okay, new code comes in. Should it expect the field or not? If it must, this is where defaults come into play. Note that I'm not specifying what should declare the default, since this will depend on the application, but something has to be there. In the worst cases, special conditional logic will have to cover the possibility that the field holds the default.
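Here's a small end-to-end sketch using Python's built-in sqlite3 module with an in-memory database. The profiles table, the preferred_name column and the empty-string default are hypothetical, and ALTER TABLE details differ between databases. It adds the column as nullable, shows an old-style insert still working, and has the new read path supply the default with COALESCE.

```python
import sqlite3

# Hypothetical schema and data, kept in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profiles (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO profiles (name) VALUES ('Ada')")

# Migration: the new column is optional (nullable, no NOT NULL constraint),
# so existing rows and existing writes stay valid.
conn.execute("ALTER TABLE profiles ADD COLUMN preferred_name TEXT")

# Old code keeps working: its INSERT never mentions the new column.
conn.execute("INSERT INTO profiles (name) VALUES ('Grace')")

# New code writes the column when it has a value.
conn.execute("INSERT INTO profiles (name, preferred_name) VALUES ('Linus', 'Lin')")


def preferred_names(connection, default=""):
    """New-code read path: substitute a default wherever the column is NULL."""
    rows = connection.execute(
        "SELECT name, COALESCE(preferred_name, ?) FROM profiles ORDER BY id",
        (default,),
    )
    return dict(rows.fetchall())
```

Here the query declares the default, but it could just as well live in the application layer, as long as something provides it.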
This sounds like madness! How do I keep it manageable?
There are three major principles to follow here to keep things sane:
- Keep your changes as small as possible, and refactor pain points that prevent small changes. This means more flags, but less complexity. Incremental improvement is the name of the game.
- Consider long-lived flags to be critical technical debt. Your flags shouldn't last long in production. Have clear rules about when a change is "stable", and a clear window to clean up related flags. Cleaning up your flags as a regular part of your maintenance is essential for controlling how many code paths need to be supported.
- Don't be dogmatic! Throw away extremes, and use long-lived feature branches when you have to. Some changes are too complex already, and the extra overhead of the flagging isn't worth it. If you follow the first two points, this should happen less and less.
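For the second point, one lightweight way to make the "clear window to clean up related flags" enforceable is to record a removal deadline for every flag and have a scheduled job or CI step report the overdue ones. A sketch, assuming a hypothetical in-code registry; the flag names, owners and dates are made up.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Flag:
    name: str
    owner: str
    remove_by: date  # deadline agreed when the flag was introduced


# Hypothetical registry; it could live in version control next to the code.
REGISTRY = [
    Flag("accessible-form-styles", "ui-team", date(2024, 3, 1)),
    Flag("preferred-name-field", "profile-team", date(2024, 6, 1)),
]


def overdue_flags(registry, today):
    """Return flags whose cleanup deadline has passed, oldest deadline first."""
    return sorted(
        (flag for flag in registry if flag.remove_by < today),
        key=lambda flag: flag.remove_by,
    )
```

A CI step could call overdue_flags(REGISTRY, date.today()) and fail the build (or just warn) whenever the list isn't empty, which turns "critical technical debt" into something nobody can quietly ignore.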
Best of luck!