RDBMSs take a completely different approach to data storage than object or key-value databases do.
The relational model assumes you don't know what data will be needed in the future, or how data will be accessed in the future. This has proven to be a pretty reliable assumption in my experience.
That's one reason a SQL dbms will let you add indexes as they're needed, and let you drop indexes that have proven useless. It will let you add constraints as they become known--constraints that sometimes require adding more tables--and drop constraints as the requirements change. It will let you add columns as you discover more things that would be good to know. It will let you replace tables with views and replace views with tables. Some dbms will let you create materialized views--their impact on query speed can be dramatic, and their impact on disk usage, devastating.
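To make that concrete, here is a minimal sketch of that kind of schema evolution in PostgreSQL-flavored DDL. The table, column, index, and view names are hypothetical, not taken from any particular system.

```sql
-- Add an index when a query needs it; drop it when it proves useless.
CREATE INDEX idx_parts_supplier ON parts (supplier_id);
DROP INDEX idx_parts_supplier;

-- Add a constraint as it becomes known; drop it when requirements change.
ALTER TABLE parts ADD CONSTRAINT chk_parts_weight CHECK (weight_kg > 0);
ALTER TABLE parts DROP CONSTRAINT chk_parts_weight;

-- Add a column as you discover more things that would be good to know.
ALTER TABLE parts ADD COLUMN country_of_origin char(2);

-- A materialized view: dramatic for query speed, real cost in disk.
CREATE MATERIALIZED VIEW parts_by_supplier AS
    SELECT supplier_id, count(*) AS part_count
    FROM parts
    GROUP BY supplier_id;
```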
Useful databases extend their reach. A SQL database, designed according to the relational model, makes it relatively easy to add features nobody dreamed of during the initial design without crushing other parts of the system. So they're often called upon to do things their initial designers didn't imagine.
All of these things
- adding and dropping indexes over time,
- adding and dropping constraints over time,
- adding and dropping columns over time,
- adding and dropping tables over time,
make any estimate of disk usage look like a waste of time. Any one of them alone can drastically change the disk space required for a database.
You can calculate the space required by a row and a page fairly accurately. (Try a Google search for "YourDBMSname row layout" and "YourDBMSname page layout".) But when you try to multiply by the number of rows required, you have to estimate the number of rows. That estimate puts you at the big end of what Steve McConnell calls "the cone of uncertainty".
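Here's a sketch of what that arithmetic might look like for a hypothetical PostgreSQL table. The column widths and the row count below are assumptions, which is exactly the problem: the per-row part is easy to get close, the row count is the guess.

```sql
-- Back-of-envelope sizing for a hypothetical three-column table.
-- Per-row sizes follow PostgreSQL's documented row and page layouts.
--
--   tuple header      ~24 bytes
--   bigint              8 bytes
--   char(2)             3 bytes (+ alignment padding)
--   text             ~120 bytes (average length: guessed)
--   -----------------------------------------------------
--   per row          ~160 bytes  =>  ~45 rows per 8 KB page
--
--   10,000,000 rows (the estimate)  =>  ~220,000 pages  =>  ~1.8 GB
--   for the heap alone, before any indexes.

SHOW block_size;  -- confirms the 8 KB page-size assumption on your server
```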
If you haven't measured disk usage in multiple projects over time at your own company, estimating the impact of the changes listed above is just guessing.
The last Fortune 100 company I worked for had an operational database that had been in production since the 1970s. Hundreds of applications, written in more than 25 programming languages over the course of 40 years, hit that thing every day. (I think it was built on IBM's IMS originally; today it runs on Oracle.)
Even just a few years ago, nobody there imagined that their database would be used to translate engineering drawings and bills of materials into Chinese, and also to produce the customs documents they'd need to get finished products out of China. Implementing those new features required storing additional data about every part and about every design document in their live inventory. Early in that project, our estimates were pretty far off. That's the big end of the cone. (We estimated several things, but not disk usage. We were required to succeed, so whatever design I came up with, somebody would be required to supply the needed disk space.) But when we went live, we knew the exact value for every estimate, because we'd already done the work. (That's the narrow end of the cone.)
So, how do you mitigate the risk of guesswork in a database design and deployment environment? Take a lesson from 1972.
Build a prototype, and measure it.
> Chemical engineers learned long ago that a process that works in the laboratory cannot be implemented in a factory in only one step. An intermediate step called the pilot plant is necessary to give experience in scaling quantities up and in operating in nonprotective environments. . . .
>
> . . . Project after project designs a set of algorithms and then plunges into construction of customer-deliverable software on a schedule that demands delivery of the first thing built. . . .
>
> The management question, therefore, is not whether to build a pilot system and throw it away. You will do that. The only question is whether to plan in advance to build a throwaway, or to promise to deliver the throwaway to customers.

Fred Brooks, Jr., in The Mythical Man-Month, p. 116.
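In database terms, the pilot plant can be as small as a sample table you load and measure before committing to a design. A minimal sketch, assuming PostgreSQL and entirely hypothetical names:

```sql
-- Prototype one of the new tables and measure it instead of estimating it.
CREATE TABLE part_translation (
    part_id     bigint  NOT NULL,
    lang        char(2) NOT NULL,
    description text    NOT NULL,
    PRIMARY KEY (part_id, lang)
);

-- Load a representative sample, not the projected full volume.
INSERT INTO part_translation (part_id, lang, description)
SELECT n, 'zh', repeat('x', 120)   -- stand-in for translated text
FROM generate_series(1, 100000) AS n;

-- Measure what the sample actually costs, indexes included.
SELECT pg_size_pretty(pg_total_relation_size('part_translation')) AS sample_size,
       pg_total_relation_size('part_translation') / count(*)      AS bytes_per_row
FROM part_translation;
```

Throw the prototype away; keep the measurements.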