Parameterize an SQL IN clause

D

41

1138

How do I parameterize a query containing an IN clause with a variable number of arguments, like this one?

SELECT * FROM Tags 
WHERE Name IN ('ruby','rails','scruffy','rubyonrails')
ORDER BY Count DESC

In this query, the number of arguments could be anywhere from 1 to 5.

I would prefer not to use a dedicated stored procedure for this (or XML), but if there is some elegant way specific to SQL Server 2008, I am open to that.

Deluxe answered 3/12, 2008 at 16:16 Comment(3)

For MySQL, see MySQL Prepared statements with a variable size variable list. – Voltaire 4/4, 2012 at 12:2

Similar: Passing array parameters to a stored procedure, PreparedStatement IN clause alternatives. – Sadi 29/4, 2019 at 22:33

SQL Server 2016 (13.x) and later have the built-in function STRING_SPLIT. For instance: SELECT [Value] FROM STRING_SPLIT('ruby,rails,scruffy,rubyonrails',',') – Manducate 16/6, 2022 at 8:17

L

324

Here's a quick-and-dirty technique I have used:

SELECT * FROM Tags
WHERE '|ruby|rails|scruffy|rubyonrails|'
LIKE '%|' + Name + '|%'

So here's the C# code:

string[] tags = new string[] { "ruby", "rails", "scruffy", "rubyonrails" };
const string cmdText = "select * from tags where '|' + @tags + '|' like '%|' + Name + '|%'";

using (SqlCommand cmd = new SqlCommand(cmdText)) {
   cmd.Parameters.AddWithValue("@tags", string.Join("|", tags));
}

Two caveats:

The performance is terrible. LIKE "%...%" queries are not indexed.
Make sure none of your tags contain |, are blank, or are null, or this won't work.
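To make the second caveat concrete, here is a minimal sketch of a guard that could run before the tags are joined. The class and method names (TagListBuilder, BuildPipeList) are hypothetical and not part of the original answer; the sketch only assumes that rejecting bad input outright is acceptable for your application.

```csharp
using System;
using System.Collections.Generic;

public static class TagListBuilder
{
    // Hypothetical helper: joins tags with '|' for the LIKE technique above,
    // rejecting the values the caveats warn about (null, blank, or containing '|').
    public static string BuildPipeList(IEnumerable<string> tags)
    {
        if (tags == null) throw new ArgumentNullException(nameof(tags));

        var cleaned = new List<string>();
        foreach (string tag in tags)
        {
            if (string.IsNullOrWhiteSpace(tag))
                throw new ArgumentException("Tags must not be null or blank.", nameof(tags));
            if (tag.Contains("|"))
                throw new ArgumentException("Tags must not contain '|': " + tag, nameof(tags));
            cleaned.Add(tag);
        }

        // An empty list would produce '||', which matches no non-empty Name.
        if (cleaned.Count == 0)
            throw new ArgumentException("At least one tag is required.", nameof(tags));

        return string.Join("|", cleaned);
    }
}
```

The result would then be passed as the parameter value, for example cmd.Parameters.AddWithValue("@tags", TagListBuilder.BuildPipeList(tags)).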

There are other ways to accomplish this that some people may consider cleaner, so please keep reading.

Lite answered 3/12, 2008 at 16:41 Comment(17)

yeah, it is 10x slower, but it's very easily parameterized, heh. Not sure how much faster it would be to call fnSplit() as proposed by Longhorn213's answer – Deluxe 3/12, 2008 at 16:48

Yes, this is a table scan. Great for 10 rows, lousy for 100,000. – Car 3/12, 2008 at 16:48

I agree...this is a good solution for a small table. Doesn't require any temp tables or a bunch of parameters. – Tapeworm 3/12, 2008 at 16:51

Longhorn213's fnSplit function would be called once, taking a little time, but is then able to take advantage of an index on Tags.Name. Joel's solution probably requires a full scan of Tags, which may be slow for a big table. Having said that, I do use Joel's method myself for small tables. – Moersch 3/12, 2008 at 16:51

Make sure you test on tags that have pipes in them. – Hypnotic 3/12, 2008 at 17:16

This doesn't even answer the question. Granted, it's easy to see where to add the parameters, but how can you accept this as a solution if it doesn't even bother to parameterize the query? It only looks simpler than @Geognosy Brackett's because it isn't parameterized. – Spleeny 3/12, 2008 at 20:14

tvanfosson: Good point. You're not using parameters, but actually still just strings... – Castanets 3/12, 2008 at 20:37

"Granted, it's easy to see where to add the parameters" it's like the np-complete thing.. we've reduced the query to a typical form which is trivial to parameterize. The problem with IN is the inherent variability, how many INs can we have? 50? 1000? 10000? – Deluxe 3/12, 2008 at 21:10

Apparently in MS-SQL the number is so large that they don't say what it is. If you're getting upwards of 10K, then the table join solution is probably better. This particular query is just going to keep getting worse and worse as the number increases. Imagine scanning a 50K char string each time. – Spleeny 5/12, 2008 at 17:40

In this case, we're obviously talking about tags, and the SO system limits you to 5 total, so it probably won't be that bad. – Hypnotic 11/12, 2008 at 14:26

@Joel - there's actually 2 inefficiencies in this solution. The parsing of the char string (the '|' + @tags + '|'), and the size of the table - since this needs a table scan. The former shouldn't be an issue with SO's tag system, but the latter certainly could be (there's about 16500 tags now) – Grande 18/12, 2008 at 14:5

I've used this method with success in the past. I've also tested it. On a "typical" table of 500k rows, this method takes about four seconds. You can optimize by pre-creating the piped parameter and storing that as a field. Doing so reduces the query time by about half. – Ommiad 18/12, 2008 at 17:29

@Joel: Clever, and it works. So what if it's going to do an index scan, performance only has to be "good enough". Not knowing the constraints on the Name column, I'm going to consider the edge cases (null, empty string, contains pipe character), as well as the obscure corner case, a Name value containing a wildcard e.g. 'pe%ter' is going to match '|peanut|butter|' but not '|butter|peanut|'. (Yes, it's an obscure case, one that isn't going to be tested in QA, but will get exercised in production.) It's a fairly easy workaround (in some DBMS) to escape the wildcards. – Emplace 29/5, 2009 at 21:27

Agree with above comments... this is not a full or complete answer to the problem. If you catered for the case where the string contains pipes, (which you can using the above approach, but it's a bit more complex) then the answer would be better. – Leastways 14/9, 2015 at 10:19

Working with string in SQL is very slow. You should avoid it. – Monteria 3/1, 2018 at 14:34

It depends where the select list comes in your query. If it's in a relatively small "top" table in the query it will have a low cost. If it is executed late in a big query it would be worth the messiness to pump the match data into a temporary table (or table variable) with an index and join to it. – Broil 18/9, 2020 at 3:42

G

769

You can parameterize each value, so something like:

string[] tags = new string[] { "ruby", "rails", "scruffy", "rubyonrails" };
string cmdText = "SELECT * FROM Tags WHERE Name IN ({0})";

string[] paramNames = tags.Select(
    (s, i) => "@tag" + i.ToString()
).ToArray();
     
string inClause = string.Join(", ", paramNames);
using (SqlCommand cmd = new SqlCommand(string.Format(cmdText, inClause)))
{
    for (int i = 0; i < paramNames.Length; i++)
    {
       cmd.Parameters.AddWithValue(paramNames[i], tags[i]);
    }
}

Which will give you:

cmd.CommandText = "SELECT * FROM Tags WHERE Name IN (@tag0, @tag1, @tag2, @tag3)"
cmd.Parameters["@tag0"] = "ruby"
cmd.Parameters["@tag1"] = "rails"
cmd.Parameters["@tag2"] = "scruffy"
cmd.Parameters["@tag3"] = "rubyonrails"

No, this is not open to SQL injection. The only text injected into CommandText is not based on user input. It's solely based on the hardcoded "@tag" prefix and the index of an array. The index will always be an integer, is not user generated, and is safe.

The user inputted values are still stuffed into parameters, so there is no vulnerability there.
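As a rough sketch, the placeholder generation above could be wrapped in a small helper that also guards against an empty list, since IN () with no values is a syntax error in SQL Server. The names InClauseBuilder, ParameterNames, and Expand are hypothetical and only for illustration.

```csharp
using System;
using System.Linq;

public static class InClauseBuilder
{
    // Hypothetical helper: builds "@tag0", "@tag1", ... from a fixed prefix and a count.
    // Nothing here is derived from user input, only from the prefix and the index.
    public static string[] ParameterNames(string prefix, int count)
    {
        if (count < 1)
            throw new ArgumentOutOfRangeException(nameof(count), "IN () with no values is a syntax error.");
        return Enumerable.Range(0, count).Select(i => prefix + i).ToArray();
    }

    // Substitutes the comma-separated placeholder list into the {0} slot of the template.
    public static string Expand(string template, string[] names)
    {
        return string.Format(template, string.Join(", ", names));
    }
}
```

With four tags, InClauseBuilder.Expand("SELECT * FROM Tags WHERE Name IN ({0})", InClauseBuilder.ParameterNames("@tag", 4)) yields the same command text as shown above; the values are still added to cmd.Parameters one by one.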

Edit:

Injection concerns aside, take care to note that constructing the command text to accommodate a variable number of parameters (as above) impedes SQL Server's ability to take advantage of cached queries. The net result is that you almost certainly lose the value of using parameters in the first place (as opposed to merely inserting the predicate strings into the SQL itself).

Not that cached query plans aren't valuable, but IMO this query isn't nearly complicated enough to see much benefit from it. While the compilation costs may approach (or even exceed) the execution costs, you're still talking milliseconds.

If you have enough RAM, I'd expect SQL Server would probably cache a plan for the common counts of parameters as well. I suppose you could always add five parameters, and let the unspecified tags be NULL - the query plan should be the same, but it seems pretty ugly to me and I'm not sure that it'd be worth the micro-optimization (although, on Stack Overflow - it may very well be worth it).
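For illustration only, the "always five parameters" idea might look like the sketch below. The class name FixedSlotParameters and its members are hypothetical; the limit of five comes from the question. Unused slots are filled with DBNull.Value, which never matches inside an IN list, so the result set is unaffected.

```csharp
using System;
using System.Collections.Generic;

public static class FixedSlotParameters
{
    // From the question: between 1 and 5 arguments.
    public const int MaxTags = 5;

    // The command text never changes, so its string is identical for every call.
    public const string CommandText =
        "SELECT * FROM Tags WHERE Name IN (@tag0, @tag1, @tag2, @tag3, @tag4)";

    // Hypothetical helper: returns exactly MaxTags values, padding with DBNull.Value.
    public static object[] PadValues(IReadOnlyList<string> tags)
    {
        if (tags == null) throw new ArgumentNullException(nameof(tags));
        if (tags.Count < 1 || tags.Count > MaxTags)
            throw new ArgumentOutOfRangeException(nameof(tags), "Expected 1 to " + MaxTags + " tags.");

        var values = new object[MaxTags];
        for (int i = 0; i < MaxTags; i++)
            values[i] = i < tags.Count ? (object)tags[i] : DBNull.Value;
        return values;
    }
}
```

Each value would then be bound to "@tag" + i in a loop. Note that AddWithValue infers the parameter type and length from each value, so giving every parameter an explicit type and length (matching the Name column, which is an assumption about the schema) is probably needed for the plan to be shared in practice.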

Also, SQL Server 7 and later will auto-parameterize queries, so using parameters isn't really necessary from a performance standpoint - it is, however, critical from a security standpoint - especially with user inputted data like this.

Grande answered 3/12, 2008 at 16:35 Comment(16)

This is how LINQ to SQL does it, BTW – Berger 18/12, 2008 at 18:55

Isn't there a max number of Parameters? so if the user doesn't know how many tags, it might go over the max_number (around 200 or 255 params?). Secondly, why is using params better than just a dynamic sql with the values constructed on the fly (replace @Tag1 with the value, in the above example)? – Ethos 2/1, 2009 at 2:15

@Pure: The whole point of this is to avoid SQL Injection, which you would be vulnerable to if you used dynamic SQL. – Biaxial 4/2, 2009 at 23:27

Injection concerns aside, take care to note that constructing the command text to accommodate a variable number of parameters (as above) impedes SQL Server's ability to take advantage of cached queries. The net result is that you almost certainly lose the value of using parameters in the first place (as opposed to merely inserting the predicate strings into the SQL itself). – Geognosy 19/8, 2009 at 19:1

@God of Data - Yes, I suppose if you need more than 2100 tags you'll need a different solution. But Basarb's could only reach 2100 if the average tag length was < 3 chars (since you need a delimiter as well). msdn.microsoft.com/en-us/library/ms143432.aspx – Grande 11/2, 2010 at 12:17

@Geognosy - that's only half true, as it would cache a plan for each version and even that may be fine (if not, why not optimize for an ad hoc workload?). For example in a paging scenario most of your queries will be using the page size number of parameters when populating things (an SO question list, for example). – Inclinometer 15/6, 2011 at 15:58

i've read it four times, and i still have no idea what it's doing. QuotedStr() it is! – Animate 19/6, 2012 at 21:0

Auto parameterization in SQL Server is by default only enabled for single parameter queries. Everything more complex is treated as an ad-hoc query. It is possible to force parameterization, which may give you problems elsewhere. So a parameterized query is still best. – Gotland 11/1, 2014 at 12:54

This is a good solution (insert a parameter placeholder for each IN value). However, SQL Server will reuse query plans by string equality, causing a new plan to be created for every different number of parameters. If the IN clause contains only a few, it's not bad. You may get a dozen query plans for a maximum of 12 values, but for a maximum of 1000 values, there may be up to 1000 query plans necessary. Some object-relational mappers use specific algorithms to split such queries into multiple queries, with recurring numbers of parameters to match existing query plans. – Gotland 11/1, 2014 at 13:0

Assume the tags are dynamic. E.g. Multi-Value extended select mode List Box. User is allowed to choose one or more. In that case, the tags can be one or more. So how can define the number of tags to be passed in SQL string. cmd.CommandText = "SELECT * FROM Tags WHERE Name IN (@tag0,@tag1,@tag2,@tag3, ....., @tagN)" N is variable...based on user selection. What's the catch? – Androgen 27/6, 2014 at 13:20

@Androgen - your selected values are in an array; you just loop over the array and add a parameter (suffixed with the index) for each one. – Grande 27/6, 2014 at 13:42

The SQL query is in a static class as a static string, e.g. ..."WHERE TS.[SESSIONE] IN (@SessionList) AND ..." Array iteration is clear to me and I have built a set of parameters based on your answer. However, connecting them to the above query is an issue, since the parameter is @SessionList whereas the parameters created from the array are @Session1, @Session2...etc. Hm... Did I just miss out {(0)} to replace @SessionList? – Androgen 27/6, 2014 at 13:47

This will also go wrong in the (admittedly unusual) case where the client DB has the option DECIMAL=COMMA. You will need to add a trailing space after every comma when generating the string to avoid this ("1,5" means one-and-a-half, not "one then a five"; "1, 5" (comma-space) means "one then a five"). – Benign 31/10, 2014 at 15:37

@Benign - I'd suspect that the decimal = comma wouldn't be a problem, as the comma delimiters are between parameter names. Though it's trivial (and, arguably, better) to string.Join(", ") to make it more human readable.... – Grande 5/1, 2017 at 23:54

That's a C# solution, not SQL. – Grendel 30/11, 2018 at 13:23

@Grendel There is no such thing as parameters in SQL. Parameters are something performed in your language, in conjunction with the database access library you're using (e.g. ADO in native code, ADO.NET in .NET, Hibernate in Java). For example, in native code ADO syntax, the idea is to write the SQL SELECT * FROM Tags WHERE Name IN (?, ?, ?). I use ? because ADO/OLEDB (like ODBC) only has positional parameters, not named ones. The pure SQL approach involves writing the string as is, and making sure you don't get the escaping wrong and open yourself to injection. – Animate 8/7, 2020 at 16:25
