Preferred way of retrieving row with multiple relating rows

I'm currently hand-writing a DAL in C# with SqlDataReader and stored procedures. Performance is important, but it should still be maintainable...

Let's say there's a table recipes

(recipeID, author, timeNeeded, yummyFactor, ...)

and a table ingredients

(recipeID, name, amount, yummyContributionFactor, ...)

Now I'd like to query like 200 recipes with their ingredients. I see the following possibilities:

  • Query all recipes, then query the ingredients for each recipe.
    This would of course result in maaany queries.
  • Query all recipes and their ingredients in one big joined list. This causes a lot of redundant traffic, because each recipe's data is transmitted multiple times.
  • Query all recipes, then query all the ingredients at once by passing the list of recipeIDs back to the database. Alternatively, issue both queries at once and return multiple result sets. Back in the DAL, associate the ingredients with their recipes by recipeID.
  • Exotic way: cursor through all recipes and return, for each recipe, two separate result sets for recipe and ingredients. Is there a limit on the number of result sets?
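The third option (two queries, then an ID-keyed association in the DAL) can be sketched as follows. This is a minimal illustration in Python with sqlite3 rather than the C#/SqlDataReader DAL the question describes; the table and column names come from the example above, and the in-memory data is made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE recipes (recipeID INTEGER PRIMARY KEY, author TEXT, timeNeeded INTEGER);
    CREATE TABLE ingredients (recipeID INTEGER, name TEXT, amount TEXT);
    INSERT INTO recipes VALUES (1, 'alice', 30), (2, 'bob', 45);
    INSERT INTO ingredients VALUES (1, 'flour', '200g'), (1, 'sugar', '100g'), (2, 'eggs', '3');
""")

# Query 1: the recipes themselves (here: all of them; in practice a filtered set).
recipes = {row[0]: {"author": row[1], "ingredients": []}
           for row in conn.execute("SELECT recipeID, author FROM recipes")}

# Query 2: all ingredients for those recipes in a single round trip,
# passing the ID list back via a parameterized IN clause.
ids = list(recipes)
placeholders = ",".join("?" * len(ids))
for recipe_id, name, amount in conn.execute(
        "SELECT recipeID, name, amount FROM ingredients "
        f"WHERE recipeID IN ({placeholders})", ids):
    # Associate each ingredient with its recipe by recipeID, as in option 3.
    recipes[recipe_id]["ingredients"].append((name, amount))
```

Two round trips total, and no recipe row is transmitted more than once. In SQL Server, the ID list would more typically be passed as a table-valued parameter than a built-up IN clause.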

For more variety, the recipes can be selected by a list of IDs from the DAL or by some parametrized SQL condition.

Which one do you think has the best performance/mess ratio?

Cowshed answered 7/2, 2010 at 17:44 Comment(0)

If you only need to join two tables and an "ingredient" isn't a huge amount of data, the best balance of performance and maintainability is likely to be a single joined query. Yes, you are repeating some data in the results, but unless you have 100,000 rows and it's overloading the database server/network, it's too soon to be optimizing.

The story is a little bit different if you have many layers of joins each with decreasing cardinality. For example, in one of my apps I have something like the following:

Event -> EventType -> EventCategory
                   -> EventPriority
                   -> EventSource   -> EventSourceType -> Vendor

A query like this results in a significant amount of duplication which is unacceptable when there are 100k events to retrieve, 1000 event types, maybe 10 categories/priorities, 50 sources, and 5 vendors. So in that case, I have a stored procedure that returns multiple result sets:

  • All 100k Events with just EventTypeID
  • The 1000 EventTypes with CategoryID, PriorityID, etc. that apply to these Events
  • The 10 EventCategories and EventPriorities that apply to the above EventTypes
  • The 50 EventSources that generated the 100k events
  • And so on, you get the idea.

Because the cardinality goes down so drastically, it is much quicker to download only what is needed here and use a few dictionaries on the client side to piece it together (if that is even necessary). In some cases the low-cardinality data may even be cached in memory and never retrieved from the database at all (except on app start or when the data is changed).
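The client-side piecing-together can be sketched like this (Python for brevity; the row contents are hypothetical, standing in for two of the result sets the stored procedure returns):

```python
# Rows as they might arrive from two of the result sets (hypothetical data):
# events carry only the EventTypeID foreign key, not the joined columns.
events = [
    {"eventID": 1, "eventTypeID": 10},
    {"eventID": 2, "eventTypeID": 10},
    {"eventID": 3, "eventTypeID": 20},
]
event_types = [
    {"eventTypeID": 10, "name": "login", "categoryID": 1},
    {"eventTypeID": 20, "name": "error", "categoryID": 2},
]

# One small dictionary keyed by ID replaces the repeated join columns.
type_by_id = {t["eventTypeID"]: t for t in event_types}

# Each event now references a shared EventType object instead of carrying
# a duplicated copy of its columns in every row.
for e in events:
    e["eventType"] = type_by_id[e["eventTypeID"]]
```

The same trick repeats up the chain: build a dictionary per low-cardinality result set and link each level to the next by its ID.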

The determining factors in using an approach such as this are a very high number of results and a steep decrease in cardinality for the joins, in other words fanning in. This is actually the reverse of most usages and probably the reverse of what you are doing here. If you are selecting "recipes" and joining to "ingredients", you are probably fanning out, which can make this approach wasteful, especially if there are only two tables to join.

So I'm just putting it out there that this is a possible alternative if performance becomes an issue down the road; at this point in your design, before you have real-world performance data, I would simply go the route of using a single joined result set.

Seppala answered 7/2, 2010 at 18:38 Comment(2)
I should probably have pointed out that the only reason I sometimes use this approach is because I'm mapping it to a domain model where many Event instances share the exact same EventType reference. This is only worth doing when you (a) have a separate domain model and (b) don't have to rebuild all of the redundant data on the front-end.Seppala
Thanks for your detailed answer. I always felt a bit weird writing code that neglects a considerable amount of the returned data. In any case, I assume it's most important to reduce the number of queries; round-trip and initialization time should be a large factor in such data retrievals.Cowshed

The best performance/mess ratio is 42.

On a more serious note, go with the simplest solution: retrieve everything with a single query. Don't optimize before you encounter a performance issue. "Premature optimization is the root of all evil" :)

Surakarta answered 7/2, 2010 at 18:16 Comment(1)
42 sounds good enough to me. This explains the slow site of the illuminati. ;) Actually, I try to optimize the few important queries that eventually pose a bottleneck. Also, I want to do it right the next time I write this stuff from scratch.Cowshed

One stored proc that returns 2 datasets: "recipe header" and "recipe details"?

This is what I'd do if I needed the data all at once in one go. If I don't need it in one go, I'd still get 2 datasets but with less data.

We've found this slightly easier to work with on the client than the one big query Andomar suggested, but that answer is still very valid.

Richela answered 7/2, 2010 at 18:31 Comment(0)

I would look at the bigger picture - do you really need to retrieve ingredients for 200 recipes? What happens when you have 2,000?

For example, if this is a web page, I would list the 200 recipes (or fewer, thanks to paging), and only when the user clicks on one to see its ingredients would I fetch them from the database.

If this isn't doable, I would have one stored proc that returns a single DataSet containing two tables: one with the recipes and the second with the list of ingredients.

Pleo answered 7/2, 2010 at 18:49 Comment(1)
Of course, in many cases it would perform better to retrieve the additional data on demand. I'm doing this already, where it's reasonable. But I'm reluctant to write extra DAL functionality for every processing function.Cowshed

"I'm currently hand-writing a DAL in C#..." As a side note, you might want to check out the post: Generate Data Access Layer Methods From Stored Procs. It can save you a lot of time.

Pleo answered 7/2, 2010 at 18:53 Comment(0)
