Database Optimization techniques for amateurs

Can we get a list of basic optimization techniques going (anything from modeling to querying, creating indexes and views, to query optimization)? It would be nice to have a list of these, one technique per answer. As a hobbyist I would find this very useful, thanks.

And for the sake of not being too vague, let's say we are using a mainstream DB such as MySQL or Oracle, and that the DB will contain 500,000 to 1 million or so records across ~10 tables, some with foreign key constraints, all using the most typical storage engines (e.g. InnoDB for MySQL). And of course, the basics such as PKs are defined, as well as FK constraints.

Ruffle answered 25/4, 2010 at 21:18 Comment(1)
I wish it was getting more answers though.Ruffle

Learn about indexes, and use them properly. Generally speaking*, follow these guidelines:

  • Every table should have a clustered index
  • Fields used for filters and sorts are good candidates for indexing
  • More selective fields are better candidates for indexing
  • For best performance on crucial queries, design "covering indexes" for those queries (see the sketch below)
  • Make sure your indexes are actually being used, and remove those that aren't
  • If your table has 15 fields, and you make 15 indexes, each with only a single field, you're doing it wrong :)

*There are some exceptions to these rules if you know what you're doing. My experience is Microsoft SQL Server, but I would presume most of this advice would still apply to a different RDBMS.
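
As a rough sketch of the covering-index and "is it actually used" points above, in MySQL syntax; the Orders table and its columns are invented for the example:

-- Query to speed up: one customer's orders, newest first
--   SELECT CustomerId, OrderDate, Total
--   FROM Orders
--   WHERE CustomerId = 42
--   ORDER BY OrderDate DESC;

-- Covering index: the filter column first, then the sort column, then the
-- remaining selected column, so the query can be answered from the index
-- alone without touching the table rows.
CREATE INDEX ix_orders_customer_date ON Orders (CustomerId, OrderDate, Total);

-- Check it is actually used: EXPLAIN should report the index in the "key"
-- column and "Using index" in "Extra".
EXPLAIN
SELECT CustomerId, OrderDate, Total
FROM Orders
WHERE CustomerId = 42
ORDER BY OrderDate DESC;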

Stopper answered 25/4, 2010 at 21:18 Comment(2)
Some caution should be taken when using clustered indexes on a table that could become large. When you insert or update a row, it is possible the clustered index could cause a reordering of the table that could pose a performance hit.Strapper
Unless your primary keys are GUIDs, in which case a clustered index on them is a bad ideaExperimentalize

IMO, by far the best optimization is to have the data model fit the problem domain for which it was built. When it does not, the resulting symptom is difficult-to-write or convoluted queries to get the information desired, and that typically rears its head when reports are built against the database. Thus, in designing a database it helps to have an idea of the types and nature of the information, such as reports, that users will want from the system.

Chaille answered 25/4, 2010 at 21:18 Comment(4)
'resulting system' perhaps? Not 'resulting symptom?'Indue
@Indue - I think I've stated it correctly. How do you know the data model does not fit the problem domain? The symptoms are complicated or difficult to write queries.Chaille
I see. I misread. I thought you were saying "the resulting system is difficult to write", and now I see you meant "the resulting symptom is difficult-to-write-queries." My bad. I thought it was a typo.Indue
Yeah. By using 'symptom', he was essentially saying that certain tell-tale signs (such as convoluted queries) are often indicative of not fitting the data model to the problem domain. (+1 btw)Handmaid

When talking about database design, check out database normalization, e.g. the Wikipedia article: Normal forms.

If you have a good design and still need to optimize for performance, try denormalisation.
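
As a concrete illustration of the trade-off, here is a minimal denormalisation sketch in MySQL; the Orders/OrderItems tables, their columns, and the trigger name are invented for the example:

-- Denormalised: store a redundant order total on Orders instead of
-- summing OrderItems on every read.
ALTER TABLE Orders ADD COLUMN TotalAmount DECIMAL(10,2) NOT NULL DEFAULT 0;

-- Keep the redundant total in sync on inserts; matching UPDATE and DELETE
-- triggers would be needed too, which is the write-side cost denormalisation
-- trades for cheaper reads.
CREATE TRIGGER trg_orderitems_insert
AFTER INSERT ON OrderItems
FOR EACH ROW
    UPDATE Orders
    SET TotalAmount = TotalAmount + NEW.Quantity * NEW.UnitPrice
    WHERE Orders.Id = NEW.OrderId;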

If you have specific needs which are not covered efficiently by the relational model, look at other models grouped under the term NoSQL.

Heeley answered 25/4, 2010 at 21:22 Comment(1)
This is fantastic advice - normalization is NOT always the answer!Mastery

Some query/schema optimizations:

  • Be mindful when using DISTINCT or GROUP BY. I find that many new developers will use DISTINCT in places where it really is not needed, or where the query could be rewritten more efficiently using an EXISTS clause or a derived query (see the sketch after this list).

  • Be mindful of Left Joins. All too often I find new SQL developers will ignore the schema in place and use Left Joins where they really are not necessary. For example:

Select *
From Orders
    Left Join Customers
        On Customers.Id = Orders.CustomerId

If Orders.CustomerId is a required (NOT NULL) column with a foreign key to Customers, then a left join is unnecessary; an inner join returns the same rows and gives the optimizer more freedom.

  • Be a student of new features. Currently, MySQL does not support common table expressions (CTEs), which means that some types of queries are cumbersome to write and probably slower than they would be if CTEs were supported. However, that will not be true forever. Keep up on new syntax features in MySQL which might be used to make existing queries more efficient.

  • You do not have to use surrogate keys everywhere. There might be tables better suited to an intelligent key (e.g. US State abbreviations, Currency Codes etc) which would enable developers to avoid additional joins in many cases.

  • If possible, find ways of archiving data to an OLAP or reporting server. The smaller you can make the production data, the faster it will run.
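
As a sketch of the DISTINCT point in the first bullet, reusing the hypothetical Customers/Orders tables from the join example (a Name column is assumed):

-- "Customers who have placed at least one order", written with DISTINCT:
Select Distinct Customers.Id, Customers.Name
From Customers
    Join Orders
        On Orders.CustomerId = Customers.Id

-- The same question with EXISTS: there is no join output to de-duplicate,
-- and the database can stop probing Orders at the first matching row.
Select Customers.Id, Customers.Name
From Customers
Where Exists (Select 1
              From Orders
              Where Orders.CustomerId = Customers.Id)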

Chaille answered 25/4, 2010 at 21:18 Comment(0)

A design that concisely models your problem is always a good start. Overgeneralizing the data model can lead to performance problems. For example, I've heard reports of projects striving for uber-flexibility that used the RDBMS as a dumb "name/value" store, and the resulting performance was appalling.

Once a good design is in place, use the tools provided by the RDBMS to help it achieve good performance: single-field PKs (no composites), with the composite business key enforced as a unique index; and appropriate data types, e.g. numeric types for numeric values rather than char or similar. The physical attributes of the hardware the RDBMS runs on should also be considered, since the bulk of query time is often disk I/O, but don't take this for granted: use a profiler to find out where the time is actually going.
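
A minimal DDL sketch of that advice in MySQL; the Product/Supplier tables and their columns are made up for illustration:

CREATE TABLE Product (
    Id          INT UNSIGNED  NOT NULL AUTO_INCREMENT,  -- single-field surrogate PK
    Sku         VARCHAR(20)   NOT NULL,                 -- composite business key...
    Supplier_Id INT UNSIGNED  NOT NULL,                 -- ...enforced below as a unique index
    Price       DECIMAL(10,2) NOT NULL,                 -- numeric type, not CHAR
    Created_At  DATETIME      NOT NULL,                 -- date type, not VARCHAR
    PRIMARY KEY (Id),
    UNIQUE KEY uq_product_business (Sku, Supplier_Id),
    FOREIGN KEY (Supplier_Id) REFERENCES Supplier (Id)
) ENGINE = InnoDB;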

Depending upon the update/query ratio, materialized views/indexed views can be useful in improving performance for slow-running queries. A poor man's alternative is to use triggers to invoke a procedure that populates a summary table with the result of a slow-running, infrequently changing view.
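
A sketch of the poor man's approach in MySQL (which has no built-in materialized views); the summary table, procedure, and source columns are hypothetical:

-- Summary table standing in for a materialized view
CREATE TABLE DailySalesSummary (
    SaleDate   DATE          NOT NULL PRIMARY KEY,
    OrderCount INT           NOT NULL,
    Revenue    DECIMAL(12,2) NOT NULL
);

-- Refresh procedure that runs the slow aggregate once; invoke it from a
-- trigger on the base table or from a scheduled event, depending on how
-- fresh the numbers need to be.
DELIMITER //
CREATE PROCEDURE RefreshDailySalesSummary()
BEGIN
    DELETE FROM DailySalesSummary;
    INSERT INTO DailySalesSummary (SaleDate, OrderCount, Revenue)
    SELECT DATE(OrderDate), COUNT(*), SUM(TotalAmount)
    FROM Orders
    GROUP BY DATE(OrderDate);
END //
DELIMITER ;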

Query optimization is a bit of a black art since it is often database-dependent, but some rules of thumb are given here - Optimizing SQL.

Finally, although possibly outside the intended scope of your question, use a good data access layer in your application, and avoid the temptation to roll your own - there are surely tested and performant implementations available for all major languages. Use of caching at the data access layer, middle tier and application layer can help improve performance considerably.

Dorice answered 25/4, 2010 at 21:18 Comment(0)

Use fewer queries whenever possible. Use JOINs and group your tables so that a single query gives you your results.

A good example is the Modified Preorder Tree Traversal (MPTT) technique, which can fetch all of a tree node's ancestors, in order, in a single query.
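
A sketch of the classic MPTT ancestor query, assuming a hypothetical Category table with the usual Lft/Rgt columns:

-- All ancestors of node 42, from the root down, in one query: a node's
-- ancestors are exactly the rows whose (Lft, Rgt) interval encloses its Lft.
SELECT parent.Id, parent.Name
FROM Category AS node
    JOIN Category AS parent
        ON node.Lft BETWEEN parent.Lft AND parent.Rgt
WHERE node.Id = 42
  AND parent.Id <> node.Id
ORDER BY parent.Lft;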

Submarine answered 25/4, 2010 at 21:18 Comment(0)

Take a holistic approach to optimization.

Consider the impact of slow disks, network latency, lack of memory, and server load.

Mastery answered 25/4, 2010 at 21:18 Comment(0)
