Parameterize an SQL IN clause
Asked Answered
D

41

1138

How do I parameterize a query containing an IN clause with a variable number of arguments, like this one?

SELECT * FROM Tags 
WHERE Name IN ('ruby','rails','scruffy','rubyonrails')
ORDER BY Count DESC

In this query, the number of arguments could be anywhere from 1 to 5.

I would prefer not to use a dedicated stored procedure for this (or XML), but if there is some elegant way specific to SQL Server 2008, I am open to that.

Deluxe answered 3/12, 2008 at 16:16 Comment(3)
For MySQL, see MySQL Prepared statements with a variable size variable list.Voltaire
Similar: Passing array parameters to a stored procedure, PreparedStatement IN clause alternatives.Sadi
In new SQL Server 2016 (13.x) and later have in built function STRING_SPLIT. For instance SELECT [Value] FROM STRING_SPLIT('ruby,rails,scruffy,rubyonrails',',')Manducate
L
324

Here's a quick-and-dirty technique I have used:

SELECT * FROM Tags
WHERE '|ruby|rails|scruffy|rubyonrails|'
LIKE '%|' + Name + '|%'

So here's the C# code:

string[] tags = new string[] { "ruby", "rails", "scruffy", "rubyonrails" };
const string cmdText = "select * from tags where '|' + @tags + '|' like '%|' + Name + '|%'";

using (SqlCommand cmd = new SqlCommand(cmdText)) {
   cmd.Parameters.AddWithValue("@tags", string.Join("|", tags);
}

Two caveats:

  • The performance is terrible. LIKE "%...%" queries are not indexed.
  • Make sure you don't have any |, blank, or null tags or this won't work

There are other ways to accomplish this that some people may consider cleaner, so please keep reading.

Lite answered 3/12, 2008 at 16:41 Comment(17)
yeah, it is 10x slower, but it's very easily parameterized, heh. Not sure how much faster it would be to call fnSplit() as proposed by Longhorn213's answerDeluxe
Yes, this is a table scan. Great for 10 rows, lousy for 100,000.Car
I agree...this is a good solution for a small table. Doesn't require any temp tables or a bunch of parameters.Tapeworm
Longhorn213's fnSplit function would be called once, taking a little time, but is then able to take advantage of an index on Tags.Name. Joel's solution probably requires a full scan of Tags, which may be slow for a big table. Having said that, I do use Joel's method myself for small tables.Moersch
Make sure you test on tags that have pipes in them.Hypnotic
This doesn't even answer the question. Granted, it's easy to see where to add the parameters, but how can you accept this a solution if it doesn't even bother to parameterize the query? It only looks simpler than @Geognosy Brackett's because it isn't parameterized.Spleeny
tvanfosson: Good point. You're not using parameters, but actually still just strings...Castanets
"Granted, it's easy to see where to add the parameters" it's like the np-complete thing.. we've reduced the query to a typical form which is trivial to parameterize. The problem with IN is the inherent variability, how many INs can we have? 50? 1000? 10000?Deluxe
Apparently in MS-SQL the number is so large that they don't say what it is. If you're getting upwards of 10K, then the table join solution is probably better. This particular query is just going to keep getting worse and worse as the number increases. Imagine scanning a 50K char string each time.Spleeny
In this case, we're obviously talking about tags, and the SO system limits you to 5 total, so it probably won't be that bad.Hypnotic
@Joel - there's actually 2 inefficiencies in this solution. The parsing of the char string (the '|' + @tags + '|'), and the size of the table - since this needs a table scan. The former shouldn't be an issue with SO's tag system, but the latter certainly could be (there's about 16500 tags now)Grande
I've used this method with success in the past. I've also tested it. On a "typical" table of 500k rows, this method takes about four seonds. You can optimized by pre-creating the piped parameter and storing that as a field. Doing so reduces the query time by about half.Ommiad
@Joel: Clever, and it works. So what if it's going to do an index scan, performance only has to be "good enough". Not knowing the constraints on the Name column, I'm going to consider the edge cases (null, empty string, contains pipe character), as well as the obscure corner case, a Name value containing a wildcard e.g. 'pe%ter' is going to match '|peanut|butter|' but not '|butter|peanut|'. (Yes, it's an obscure case, one that isn't going to be tested in QA, but will get exercised in production.) It's a fairly easy workaround (in some DBMS) to escape the wildcards.Emplace
What if your tag is 'ruby|rails'. It will match, which will be wrong. When you roll out such solutions, you need to either make sure tags do not contain pipes, or explicitly filter them out: select * from Tags where '|ruby|rails|scruffy|rubyonrails|' like '%|' + Name + '|%' AND name not like '%!%'Uneven
Agree with above comments... this is not a full or complete answer to the problem. If you catered for the case where the string contains pipes, (which you can using the above approach, but it's a bit more complex) then the answer would be better.Leastways
Working with string in SQL is very slow. You should avoid it.Monteria
It depends where the select list comes in your query. If it's in a relatively small "top" table in the query it will have a low cost. If it is executed late in a big query it would be worth the messiness to pump the match data into a temporary table (or table variable) with an index and join to it.Broil
G
769

You can parameterize each value, so something like:

string[] tags = new string[] { "ruby", "rails", "scruffy", "rubyonrails" };
string cmdText = "SELECT * FROM Tags WHERE Name IN ({0})";

string[] paramNames = tags.Select(
    (s, i) => "@tag" + i.ToString()
).ToArray();
     
string inClause = string.Join(", ", paramNames);
using (SqlCommand cmd = new SqlCommand(string.Format(cmdText, inClause)))
{
    for (int i = 0; i < paramNames.Length; i++)
    {
       cmd.Parameters.AddWithValue(paramNames[i], tags[i]);
    }
}

Which will give you:

cmd.CommandText = "SELECT * FROM Tags WHERE Name IN (@tag0, @tag1, @tag2, @tag3)"
cmd.Parameters["@tag0"] = "ruby"
cmd.Parameters["@tag1"] = "rails"
cmd.Parameters["@tag2"] = "scruffy"
cmd.Parameters["@tag3"] = "rubyonrails"

No, this is not open to SQL injection. The only injected text into CommandText is not based on user input. It's solely based on the hardcoded "@tag" prefix, and the index of an array. The index will always be an integer, is not user generated, and is safe.

The user inputted values are still stuffed into parameters, so there is no vulnerability there.

Edit:

Injection concerns aside, take care to note that constructing the command text to accomodate a variable number of parameters (as above) impede's SQL server's ability to take advantage of cached queries. The net result is that you almost certainly lose the value of using parameters in the first place (as opposed to merely inserting the predicate strings into the SQL itself).

Not that cached query plans aren't valuable, but IMO this query isn't nearly complicated enough to see much benefit from it. While the compilation costs may approach (or even exceed) the execution costs, you're still talking milliseconds.

If you have enough RAM, I'd expect SQL Server would probably cache a plan for the common counts of parameters as well. I suppose you could always add five parameters, and let the unspecified tags be NULL - the query plan should be the same, but it seems pretty ugly to me and I'm not sure that it'd worth the micro-optimization (although, on Stack Overflow - it may very well be worth it).

Also, SQL Server 7 and later will auto-parameterize queries, so using parameters isn't really necessary from a performance standpoint - it is, however, critical from a security standpoint - especially with user inputted data like this.

Grande answered 3/12, 2008 at 16:35 Comment(16)
This is how LINQ to SQL does it, BTWBerger
Isn't there a max number of Parameters? so if the user doesn't know how many tags, it might go over the max_number (around 200 or 255 params?). Secondly, why is using params better than just a dynamic sql with the values constructed on the fly (replace @Tag1 with the value, in the above example)?Ethos
@Pure: The whole point of this is to avoid SQL Injection, which you would be vulnerable to if you used dynamic SQL.Biaxial
Injection concerns aside, take care to note that constructing the command text to accomodate a variable number of parameters (as above) impede's SQL server's ability to take advantage of cached queries. The net result is that you almost certainly loose the value of using parameters in the first place (as opposed to merely inserting the predicate strings into the SQL itself).Geognosy
@God of Data - Yes, I suppose if you need more than 2100 tags you'll need a different solution. But Basarb's could only reach 2100 if the average tag length was < 3 chars (since you need a delimiter as well). msdn.microsoft.com/en-us/library/ms143432.aspxGrande
@Geognosy - that's only half true, as it would cache a plan for each version and even that may be fine (if not, why not optimize for an ad hoc workload?). For example in a paging scenario most of your queries will be using the page size number of parameters when populating things (an SO question list, for example).Inclinometer
i've read it four times, and i still have no idea what it's doing. QuotedStr() it is!Animate
Auto parameterization in SQL Server is by default only enabled for single parameter queries. Everything more complex is treated as an ad-hoc query. It is possible to force parameterization, which may give you problems elsewhere. So a parameterized query is still best.Gotland
This is a good solution (insert a parameter placeholder for each IN value). However, SQL Server will reuse query plans by string equality, causing a new plan to be created for every different number of parameters. If the IN clause contains only a few, it's not bad. You may get a dozen query plans, for max. 12 values, but for a max 1000 values, there may be up to 1000 query plans neccessary. Some object-relational mappers use specific algorithms to split such queries into multiple, with recurring numbers of parameters to match existing query plans.Gotland
Assume the tags are dynamic. E.g. Multi-Value extended select mode List Box. User is allowed to choose one or more. In that case, the tags can be one or more. So how can define the number of tags to be passed in SQL string. cmd.CommandText = "SELECT * FROM Tags WHERE Name IN (@tag0,@tag1,@tag2,@tag3, ....., @tagN)" N is variable...based on user selection. What's the catch?Androgen
@Androgen - your selected values are in an array; you just loop over the array and add a parameter (suffixed with the index) for each one.Grande
SQL Query is in a static class as a static string e.g. ..."WHERE TS.[SESSIONE] IN (@SessionList) AND ..." Array iteration is clear to me and I have built a set of parameters based on your answer. However connecting them to above query is an issue since the parameter is @SessionList where as array created parameters are @Session1, @Session2...etc. Hm... Did I just missed out {(0)} to replace @SessionList?Androgen
This will also go wrong in the case where the (admittedly unusual) case where the client DB has the option DECIMAL=COMMA - you will need to add a trailing space after every comma when generating the string to avoid this.... ("1,5" -> means one-and-a-half, not "one then a five", "1, 5" (comma-space)-> "one then a five"Benign
@Benign - I'd suspect that the decimal = comma wouldn't be a problem, as the comma delimits are between parameter names. Though it's tricial (and, arguably, better) to string.Join(", ") to make it more human readable....Grande
That's a C# solution, not SQL.Grendel
@Grendel There is no such thing parameters in SQL. Parameters are something performed in your language, in conjunction with the database access library you're using (e.g. ADO in native code, ADO.net in .NET. Hibernate in Java) For example, in native code ADO syntax, the idea is to write the SQL SELECT * FROM Tags WHERE Name IN (?, ?, ?) I use ? because ADO/OLEDB (like ODBC) only has positional parameters - not named ones. The pure SQL approach involves writing the string as is - and make sure you don't screw up injection.Animate
L
324

Here's a quick-and-dirty technique I have used:

SELECT * FROM Tags
WHERE '|ruby|rails|scruffy|rubyonrails|'
LIKE '%|' + Name + '|%'

So here's the C# code:

string[] tags = new string[] { "ruby", "rails", "scruffy", "rubyonrails" };
const string cmdText = "select * from tags where '|' + @tags + '|' like '%|' + Name + '|%'";

using (SqlCommand cmd = new SqlCommand(cmdText)) {
   cmd.Parameters.AddWithValue("@tags", string.Join("|", tags);
}

Two caveats:

  • The performance is terrible. LIKE "%...%" queries are not indexed.
  • Make sure you don't have any |, blank, or null tags or this won't work

There are other ways to accomplish this that some people may consider cleaner, so please keep reading.

Lite answered 3/12, 2008 at 16:41 Comment(17)
yeah, it is 10x slower, but it's very easily parameterized, heh. Not sure how much faster it would be to call fnSplit() as proposed by Longhorn213's answerDeluxe
Yes, this is a table scan. Great for 10 rows, lousy for 100,000.Car
I agree...this is a good solution for a small table. Doesn't require any temp tables or a bunch of parameters.Tapeworm
Longhorn213's fnSplit function would be called once, taking a little time, but is then able to take advantage of an index on Tags.Name. Joel's solution probably requires a full scan of Tags, which may be slow for a big table. Having said that, I do use Joel's method myself for small tables.Moersch
Make sure you test on tags that have pipes in them.Hypnotic
This doesn't even answer the question. Granted, it's easy to see where to add the parameters, but how can you accept this a solution if it doesn't even bother to parameterize the query? It only looks simpler than @Geognosy Brackett's because it isn't parameterized.Spleeny
tvanfosson: Good point. You're not using parameters, but actually still just strings...Castanets
"Granted, it's easy to see where to add the parameters" it's like the np-complete thing.. we've reduced the query to a typical form which is trivial to parameterize. The problem with IN is the inherent variability, how many INs can we have? 50? 1000? 10000?Deluxe
Apparently in MS-SQL the number is so large that they don't say what it is. If you're getting upwards of 10K, then the table join solution is probably better. This particular query is just going to keep getting worse and worse as the number increases. Imagine scanning a 50K char string each time.Spleeny
In this case, we're obviously talking about tags, and the SO system limits you to 5 total, so it probably won't be that bad.Hypnotic
@Joel - there's actually 2 inefficiencies in this solution. The parsing of the char string (the '|' + @tags + '|'), and the size of the table - since this needs a table scan. The former shouldn't be an issue with SO's tag system, but the latter certainly could be (there's about 16500 tags now)Grande
I've used this method with success in the past. I've also tested it. On a "typical" table of 500k rows, this method takes about four seonds. You can optimized by pre-creating the piped parameter and storing that as a field. Doing so reduces the query time by about half.Ommiad
@Joel: Clever, and it works. So what if it's going to do an index scan, performance only has to be "good enough". Not knowing the constraints on the Name column, I'm going to consider the edge cases (null, empty string, contains pipe character), as well as the obscure corner case, a Name value containing a wildcard e.g. 'pe%ter' is going to match '|peanut|butter|' but not '|butter|peanut|'. (Yes, it's an obscure case, one that isn't going to be tested in QA, but will get exercised in production.) It's a fairly easy workaround (in some DBMS) to escape the wildcards.Emplace
What if your tag is 'ruby|rails'. It will match, which will be wrong. When you roll out such solutions, you need to either make sure tags do not contain pipes, or explicitly filter them out: select * from Tags where '|ruby|rails|scruffy|rubyonrails|' like '%|' + Name + '|%' AND name not like '%!%'Uneven
Agree with above comments... this is not a full or complete answer to the problem. If you catered for the case where the string contains pipes, (which you can using the above approach, but it's a bit more complex) then the answer would be better.Leastways
Working with string in SQL is very slow. You should avoid it.Monteria
It depends where the select list comes in your query. If it's in a relatively small "top" table in the query it will have a low cost. If it is executed late in a big query it would be worth the messiness to pump the match data into a temporary table (or table variable) with an index and join to it.Broil
G
262

For SQL Server 2008, you can use a table valued parameter. It's a bit of work, but it is arguably cleaner than my other method.

First, you have to create a type

CREATE TYPE dbo.TagNamesTableType AS TABLE ( Name nvarchar(50) )

Then, your ADO.NET code looks like this:

string[] tags = new string[] { "ruby", "rails", "scruffy", "rubyonrails" };
cmd.CommandText = "SELECT Tags.* FROM Tags JOIN @tagNames as P ON Tags.Name = P.Name";

// value must be IEnumerable<SqlDataRecord>
cmd.Parameters.AddWithValue("@tagNames", tags.AsSqlDataRecord("Name")).SqlDbType = SqlDbType.Structured;
cmd.Parameters["@tagNames"].TypeName = "dbo.TagNamesTableType";

// Extension method for converting IEnumerable<string> to IEnumerable<SqlDataRecord>
public static IEnumerable<SqlDataRecord> AsSqlDataRecord(this IEnumerable<string> values, string columnName) {
    if (values == null || !values.Any()) return null; // Annoying, but SqlClient wants null instead of 0 rows
    var firstRecord = values.First();
    var metadata= new SqlMetaData(columnName, SqlDbType.NVarChar, 50); //50 as per SQL Type
    return values.Select(v => 
    {
       var r = new SqlDataRecord(metadata);
       r.SetValues(v);
       return r;
    });
}

Update As Per @Doug

Please try to avoid var metadata = SqlMetaData.InferFromValue(firstRecord, columnName);

It's set first value length, so if first value is 3 characters then its set max length 3 and other records will truncated if more then 3 characters.

So, please try to use: var metadata= new SqlMetaData(columnName, SqlDbType.NVarChar, maxLen);

Note: -1 for max length.

Grande answered 3/12, 2008 at 16:53 Comment(14)
You can't [easily] use TVPs with Linq To Sql, so you need to fall back onto the good old SqlCommand object. I'm having to do exactly this right now because to get around Linq-To-Sql's lousey round-trip update/insert habit.Geognosy
we tested this and table valued parameters are DOG slow. It is literally faster to execute 5 queries than it is to do one TVP.Deluxe
Any idea how to prepare this statement? I get this error when I call cmd.Prepare(). Prepare method requires all variable length parameters to have an explicitly set non-zero length.Appulse
@JeffAtwood - Have you tried reshuffling the query to something like SELECT * FROM tags WHERE tags.name IN (SELECT name from @tvp);? In theory, this really should be the fastest approach. You can use relevant indexes (e.g. an index on tag name that INCLUDEs count would be ideal), and SQL Server should be doing a few seeks to grab all the tags and their counts. What does the plan look like?Atalanta
I've also tested this and it is FAST AS LIGHTNING (compared to constructing a large IN string). I had some problems setting the parameter though since I was constantly getting "Failed to convert parameter value from a Int32[] to a IEnumerable`1.". Anyway, solved that and here's a sample I made pastebin.com/qHP05CXcBurgener
@FredrikJohansson - Out of 130 upvotes, you may be the only run that's actually tried to run this! I made a mistake reading the docs, and you actually need an IEnumerable<SqlDataRecord>, not just any IEnumerable. Code has been updated.Grande
@MarkBrackett Great with an update! Accually this code really saved the day for me since I'm quering a Lucene search-index and it sometimes returns more than 50.000 or so hits that need to be doublechecked against SQL server - So I create an array of int[] (document/SQL keys) and then the code above comes in. The whole OP now takes less than 200ms :)Burgener
@Keith, there is no practical limit when using TVPs - that's one of the good reasons to use them.Cottonseed
Was talking to a db admin today and he suggested string split over TVP. I tried it out and the split was indeed faster. Granted I'm passing in large strings that split to over 10,000 values, but it still surprised me. Looking at the raw sql generated, TVP simply generates SQL that manually inserts one row at a time into a table parameter, so there can end up being a lot of query code to parse.Affirm
@JeffAtwood - Regarding performance create a primary key on the TVP so it uses an index. If it is likely to have many rows using OPTION (RECOMPILE) might also help.Cavein
I've been using this code for years before encountering this problem: Using the first record to set the SqlMetaData gives you a four-character string (see SqlMetaData.cs source code ). So the other three tags don't match anything (or match only "ruby", "rail", and "scru"). Anyone else seeing this behavior?Swampy
I just got tripped up by the bug @Swampy mentioned in the sample code: the metadata MaxLength limits the compared characters to 4 in the case of first word "ruby." The minimal fix was to use the longest value to set the data type: var firstRecord = values.OrderByDescending(v=> v?.Length ?? 0).First();Victorvictoria
@Victorvictoria - I ended up using metadata = New SqlMetaData(columnName, SqlDbType.NVarChar, -1) for strings which sets the type to nvarchar(max). Not sure if this affects performance or not.Swampy
Thanks @Doug, I have updated answer as per your comment.Obsolesce
E
191

The original question was "How do I parameterize a query ..."

This is not an answer to that original question. There are some very good demonstrations of how to do that, in other answers.

See the first answer from Mark Brackett (the first answer starting "You can parameterize each value") and Mark Brackett's second answer for the preferred answer that I (and 231 others) upvoted. The approach given in his answer allows 1) for effective use of bind variables, and 2) for predicates that are sargable.

Selected answer

I am addressing here the approach given in Joel Spolsky's answer, the answer "selected" as the right answer.

Joel Spolsky's approach is clever. And it works reasonably, it's going to exhibit predictable behavior and predictable performance, given "normal" values, and with the normative edge cases, such as NULL and the empty string. And it may be sufficient for a particular application.

But in terms generalizing this approach, let's also consider the more obscure corner cases, like when the Name column contains a wildcard character (as recognized by the LIKE predicate.) The wildcard character I see most commonly used is % (a percent sign.). So let's deal with that here now, and later go on to other cases.

Some problems with % character

Consider a Name value of 'pe%ter'. (For the examples here, I use a literal string value in place of the column name.) A row with a Name value of `'pe%ter' would be returned by a query of the form:

select ...
 where '|peanut|butter|' like '%|' + 'pe%ter' + '|%'

But that same row will not be returned if the order of the search terms is reversed:

select ...
 where '|butter|peanut|' like '%|' + 'pe%ter' + '|%'

The behavior we observe is kind of odd. Changing the order of the search terms in the list changes the result set.

It almost goes without saying that we might not want pe%ter to match peanut butter, no matter how much he likes it.

Obscure corner case

(Yes, I will agree that this is an obscure case. Probably one that is not likely to be tested. We wouldn't expect a wildcard in a column value. We may assume that the application prevents such a value from being stored. But in my experience, I've rarely seen a database constraint that specifically disallowed characters or patterns that would be considered wildcards on the right side of a LIKE comparison operator.

Patching a hole

One approach to patching this hole is to escape the % wildcard character. (For anyone not familiar with the escape clause on the operator, here's a link to the SQL Server documentation.

select ...
 where '|peanut|butter|'
  like '%|' + 'pe\%ter' + '|%' escape '\'

Now we can match the literal %. Of course, when we have a column name, we're going to need to dynamically escape the wildcard. We can use the REPLACE function to find occurrences of the % character and insert a backslash character in front of each one, like this:

select ...
 where '|pe%ter|'
  like '%|' + REPLACE( 'pe%ter' ,'%','\%') + '|%' escape '\'

So that solves the problem with the % wildcard. Almost.

Escape the escape

We recognize that our solution has introduced another problem. The escape character. We see that we're also going to need to escape any occurrences of escape character itself. This time, we use the ! as the escape character:

select ...
 where '|pe%t!r|'
  like '%|' + REPLACE(REPLACE( 'pe%t!r' ,'!','!!'),'%','!%') + '|%' escape '!'

The underscore too

Now that we're on a roll, we can add another REPLACE handle the underscore wildcard. And just for fun, this time, we'll use $ as the escape character.

select ...
 where '|p_%t!r|'
  like '%|' + REPLACE(REPLACE(REPLACE( 'p_%t!r' ,'$','$$'),'%','$%'),'_','$_') + '|%' escape '$'

I prefer this approach to escaping because it works in Oracle and MySQL as well as SQL Server. (I usually use the \ backslash as the escape character, since that's the character we use in regular expressions. But why be constrained by convention!

Those pesky brackets

SQL Server also allows for wildcard characters to be treated as literals by enclosing them in brackets []. So we're not done fixing yet, at least for SQL Server. Since pairs of brackets have special meaning, we'll need to escape those as well. If we manage to properly escape the brackets, then at least we won't have to bother with the hyphen - and the carat ^ within the brackets. And we can leave any % and _ characters inside the brackets escaped, since we'll have basically disabled the special meaning of the brackets.

Finding matching pairs of brackets shouldn't be that hard. It's a little more difficult than handling the occurrences of singleton % and _. (Note that it's not sufficient to just escape all occurrences of brackets, because a singleton bracket is considered to be a literal, and doesn't need to be escaped. The logic is getting a little fuzzier than I can handle without running more test cases.)

Inline expression gets messy

That inline expression in the SQL is getting longer and uglier. We can probably make it work, but heaven help the poor soul that comes behind and has to decipher it. As much of a fan I am for inline expressions, I'm inclined not use one here, mainly because I don't want to have to leave a comment explaining the reason for the mess, and apologizing for it.

A function where ?

Okay, so, if we don't handle that as an inline expression in the SQL, the closest alternative we have is a user defined function. And we know that won't speed things up any (unless we can define an index on it, like we could with Oracle.) If we've got to create a function, we might better do that in the code that calls the SQL statement.

And that function may have some differences in behavior, dependent on the DBMS and version. (A shout out to all you Java developers so keen on being able to use any database engine interchangeably.)

Domain knowledge

We may have specialized knowledge of the domain for the column, (that is, the set of allowable values enforced for the column. We may know a priori that the values stored in the column will never contain a percent sign, an underscore, or bracket pairs. In that case, we just include a quick comment that those cases are covered.

The values stored in the column may allow for % or _ characters, but a constraint may require those values to be escaped, perhaps using a defined character, such that the values are LIKE comparison "safe". Again, a quick comment about the allowed set of values, and in particular which character is used as an escape character, and go with Joel Spolsky's approach.

But, absent the specialized knowledge and a guarantee, it's important for us to at least consider handling those obscure corner cases, and consider whether the behavior is reasonable and "per the specification".


Other issues recapitulated

I believe others have already sufficiently pointed out some of the other commonly considered areas of concern:

  • SQL injection (taking what would appear to be user supplied information, and including that in the SQL text rather than supplying them through bind variables. Using bind variables isn't required, it's just one convenient approach to thwart with SQL injection. There are other ways to deal with it:

  • optimizer plan using index scan rather than index seeks, possible need for an expression or function for escaping wildcards (possible index on expression or function)

  • using literal values in place of bind variables impacts scalability


Conclusion

I like Joel Spolsky's approach. It's clever. And it works.

But as soon as I saw it, I immediately saw a potential problem with it, and it's not my nature to let it slide. I don't mean to be critical of the efforts of others. I know many developers take their work very personally, because they invest so much into it and they care so much about it. So please understand, this is not a personal attack. What I'm identifying here is the type of problem that crops up in production rather than testing.

Emplace answered 29/5, 2009 at 23:18 Comment(2)
can you please let us know if you use or like parameterized querys? in this particular case is it correct to jump over de rule of 'use parameterized querys' and sanitize with the original language? THANKS a lotHowler
@Luis: yes, i prefer using bind variables in SQL statements, and will only avoid bind variables when using them causes a performance problem. my normative pattern for the original problem would be to dynamically create the SQL statement with the required number of placeholders in the IN list, and then bind each value to one of the placeholders. See the answer from Mark Brackett, which is the answer that I (and 231 others) upvoted.Emplace
L
143

You can pass the parameter as a string

So you have the string

DECLARE @tags

SET @tags = ‘ruby|rails|scruffy|rubyonrails’

select * from Tags 
where Name in (SELECT item from fnSplit(@tags, ‘|’))
order by Count desc

Then all you have to do is pass the string as 1 parameter.

Here is the split function I use.

CREATE FUNCTION [dbo].[fnSplit](
    @sInputList VARCHAR(8000) -- List of delimited items
  , @sDelimiter VARCHAR(8000) = ',' -- delimiter that separates items
) RETURNS @List TABLE (item VARCHAR(8000))

BEGIN
DECLARE @sItem VARCHAR(8000)
WHILE CHARINDEX(@sDelimiter,@sInputList,0) <> 0
 BEGIN
 SELECT
  @sItem=RTRIM(LTRIM(SUBSTRING(@sInputList,1,CHARINDEX(@sDelimiter,@sInputList,0)-1))),
  @sInputList=RTRIM(LTRIM(SUBSTRING(@sInputList,CHARINDEX(@sDelimiter,@sInputList,0)+LEN(@sDelimiter),LEN(@sInputList))))

 IF LEN(@sItem) > 0
  INSERT INTO @List SELECT @sItem
 END

IF LEN(@sInputList) > 0
 INSERT INTO @List SELECT @sInputList -- Put the last item in
RETURN
END
Lizalizabeth answered 3/12, 2008 at 16:27 Comment(9)
You can also join to the table-function with this approach.Kayekayla
I use a solution similar to this in Oracle. It doesn't have to be re-parsed as some of the other solutions do.Aldo
This is a pure database approach the other require work in the code outside of the database.Lizalizabeth
Does this to a table scan or can it take advantage of indexs, etc?Ethos
better would be to use CROSS APPLY against the SQL table function (at least in 2005 onwards), which essentially joins against the table that is returnedWeymouth
@adolf garlic there's no need for cross apply because there's no outer reference. Just join to the fnsplit function. select T.* from Tags T INNER JOIN fnSplit(@tags, '|') X ON T.Name = X.itemJon
But it [fnSplit] returns a table...I wasn't aware you could join directly to a table function without using APPLYWeymouth
@DavidBasarab: just a question, Is this open to SQL Injection attacks? i'm not really sure when i could say this code is prone to SQL Injection attacks or notKedge
it is dynamicaly created SQL but it is not open to SQL Injection attacksMaher
C
67

I heard Jeff/Joel talk about this on the podcast today (episode 34, 2008-12-16 (MP3, 31 MB), 1 h 03 min 38 secs - 1 h 06 min 45 secs), and I thought I recalled Stack Overflow was using LINQ to SQL, but maybe it was ditched. Here's the same thing in LINQ to SQL.

var inValues = new [] { "ruby","rails","scruffy","rubyonrails" };

var results = from tag in Tags
              where inValues.Contains(tag.Name)
              select tag;

That's it. And, yes, LINQ already looks backwards enough, but the Contains clause seems extra backwards to me. When I had to do a similar query for a project at work, I naturally tried to do this the wrong way by doing a join between the local array and the SQL Server table, figuring the LINQ to SQL translator would be smart enough to handle the translation somehow. It didn't, but it did provide an error message that was descriptive and pointed me towards using Contains.

Anyway, if you run this in the highly recommended LINQPad, and run this query, you can view the actual SQL that the SQL LINQ provider generated. It'll show you each of the values getting parameterized into an IN clause.

Cobwebby answered 19/12, 2008 at 5:40 Comment(0)
K
54

If you are calling from .NET, you could use Dapper dot net:

string[] names = new string[] {"ruby","rails","scruffy","rubyonrails"};
var tags = dataContext.Query<Tags>(@"
select * from Tags 
where Name in @names
order by Count desc", new {names});

Here Dapper does the thinking, so you don't have to. Something similar is possible with LINQ to SQL, of course:

string[] names = new string[] {"ruby","rails","scruffy","rubyonrails"};
var tags = from tag in dataContext.Tags
           where names.Contains(tag.Name)
           orderby tag.Count descending
           select tag;
Kettering answered 15/6, 2011 at 11:4 Comment(3)
which happens to be what we use on this page, for the actual question asked (dapper) i.stack.imgur.com/RBAjL.pngUrba
Note that dapper now also supports Table Valued Parameters as first class citizensKettering
This falls over if names is longHeger
A
49

In SQL Server 2016+ you could use STRING_SPLIT function:

DECLARE @names NVARCHAR(MAX) = 'ruby,rails,scruffy,rubyonrails';

SELECT * 
FROM Tags
WHERE Name IN (SELECT [value] FROM STRING_SPLIT(@names, ','))
ORDER BY [Count] DESC;

or:

DECLARE @names NVARCHAR(MAX) = 'ruby,rails,scruffy,rubyonrails';

SELECT t.*
FROM Tags t
JOIN STRING_SPLIT(@names,',')
  ON t.Name = [value]
ORDER BY [Count] DESC;

LiveDemo

The accepted answer will of course work and it is one of the way to go, but it is anti-pattern.

E. Find rows by list of values

This is replacement for common anti-pattern such as creating a dynamic SQL string in application layer or Transact-SQL, or by using LIKE operator:

SELECT ProductId, Name, Tags
FROM Product
WHERE ',1,2,3,' LIKE '%,' + CAST(ProductId AS VARCHAR(20)) + ',%';

Addendum:

To improve the STRING_SPLIT table function row estimation, it is a good idea to materialize splitted values as temporary table/table variable:

DECLARE @names NVARCHAR(MAX) = 'ruby,rails,scruffy,rubyonrails,sql';

CREATE TABLE #t(val NVARCHAR(120));
INSERT INTO #t(val) SELECT s.[value] FROM STRING_SPLIT(@names, ',') s;

SELECT *
FROM Tags tg
JOIN #t t
  ON t.val = tg.TagName
ORDER BY [Count] DESC;

SEDE - Live Demo

Related: How to Pass a List of Values Into a Stored Procedure


Original question has requirement SQL Server 2008. Because this question is often used as duplicate, I've added this answer as reference.
Allogamy answered 2/5, 2016 at 10:20 Comment(1)
A temp table is exactly what I needed! I had a partial string Select * from Client ..join other.. where Client.Name like 'a%', that made my request perfom really bad; storing the result in a temp table INSERT INTO #t(Name) SELECT DISTINCT Name FROM Client WHERE Client.Name LIKE 'a%'then joining on said temp table was OP! Thanks a lot!Histogram
I
33

This is possibly a half nasty way of doing it, I used it once, was rather effective.

Depending on your goals it might be of use.

  1. Create a temp table with one column.
  2. INSERT each look-up value into that column.
  3. Instead of using an IN, you can then just use your standard JOIN rules. ( Flexibility++ )

This has a bit of added flexibility in what you can do, but it's more suited for situations where you have a large table to query, with good indexing, and you want to use the parametrized list more than once. Saves having to execute it twice and have all the sanitation done manually.

I never got around to profiling exactly how fast it was, but in my situation it was needed.

Iberian answered 3/12, 2008 at 17:4 Comment(2)
This is not nasty at all! Even more, it is IMHO a very clean way. And if you look into the execution plan, you see that it is the same like the IN clause. Instead of a temp table, you could also create a fixed table with indexes, where you store the parameters together with the SESSIONID.Criticism
A temp table is exactly what I needed! I had a partial string Select * from Client ..join other.. where Client.Name like 'a%', that made my request perfom really bad; storing the result in a temp table INSERT INTO #t(Name) SELECT DISTINCT Name FROM Client WHERE Client.Name LIKE 'a%'then joining on said temp table was OP! Thanks a lot!Histogram
W
25

We have function that creates a table variable that you can join to:

ALTER FUNCTION [dbo].[Fn_sqllist_to_table](@list  AS VARCHAR(8000),
                                           @delim AS VARCHAR(10))
RETURNS @listTable TABLE(
  Position INT,
  Value    VARCHAR(8000))
AS
  BEGIN
      DECLARE @myPos INT

      SET @myPos = 1

      WHILE Charindex(@delim, @list) > 0
        BEGIN
            INSERT INTO @listTable
                        (Position,Value)
            VALUES     (@myPos,LEFT(@list, Charindex(@delim, @list) - 1))

            SET @myPos = @myPos + 1

            IF Charindex(@delim, @list) = Len(@list)
              INSERT INTO @listTable
                          (Position,Value)
              VALUES     (@myPos,'')

            SET @list = RIGHT(@list, Len(@list) - Charindex(@delim, @list))
        END

      IF Len(@list) > 0
        INSERT INTO @listTable
                    (Position,Value)
        VALUES     (@myPos,@list)

      RETURN
  END 

So:

@Name varchar(8000) = null // parameter for search values    

select * from Tags 
where Name in (SELECT value From fn_sqllist_to_table(@Name,',')))
order by Count desc
Walleye answered 3/12, 2008 at 17:11 Comment(0)
C
21

This is gross, but if you are guaranteed to have at least one, you could do:

SELECT ...
       ...
 WHERE tag IN( @tag1, ISNULL( @tag2, @tag1 ), ISNULL( @tag3, @tag1 ), etc. )

Having IN( 'tag1', 'tag2', 'tag1', 'tag1', 'tag1' ) will be easily optimized away by SQL Server. Plus, you get direct index seeks

Castanets answered 3/12, 2008 at 16:31 Comment(1)
Optional parameters with Null checks spoil performance, since the optimizer requires the number of parameters used to create efficient queries. A query for 5 parameters may need a different query plan than one for 500 parameters.Gotland
L
20

I would pass a table type parameter (since it's SQL Server 2008), and do a where exists, or inner join. You may also use XML, using sp_xml_preparedocument, and then even index that temporary table.

Liquesce answered 3/12, 2008 at 16:30 Comment(1)
Ph.E's answer has an example building temp table (from csv).Allargando
C
18

In my opinion, the best source to solve this problem, is what has been posted on this site:

Syscomments. Dinakar Nethi

CREATE FUNCTION dbo.fnParseArray (@Array VARCHAR(1000),@separator CHAR(1))
RETURNS @T Table (col1 varchar(50))
AS 
BEGIN
 --DECLARE @T Table (col1 varchar(50))  
 -- @Array is the array we wish to parse
 -- @Separator is the separator charactor such as a comma
 DECLARE @separator_position INT -- This is used to locate each separator character
 DECLARE @array_value VARCHAR(1000) -- this holds each array value as it is returned
 -- For my loop to work I need an extra separator at the end. I always look to the
 -- left of the separator character for each array value

 SET @array = @array + @separator

 -- Loop through the string searching for separtor characters
 WHILE PATINDEX('%' + @separator + '%', @array) <> 0 
 BEGIN
    -- patindex matches the a pattern against a string
    SELECT @separator_position = PATINDEX('%' + @separator + '%',@array)
    SELECT @array_value = LEFT(@array, @separator_position - 1)
    -- This is where you process the values passed.
    INSERT into @T VALUES (@array_value)    
    -- Replace this select statement with your processing
    -- @array_value holds the value of this element of the array
    -- This replaces what we just processed with and empty string
    SELECT @array = STUFF(@array, 1, @separator_position, '')
 END
 RETURN 
END

Use:

SELECT * FROM dbo.fnParseArray('a,b,c,d,e,f', ',')

CREDITS FOR: Dinakar Nethi

Capparidaceous answered 22/7, 2010 at 14:47 Comment(1)
Great answer, clean and modular, super fast execution except for the initial CSV parsing into a table (one time, small number of elements). Although could use simpler/faster charindex() instead of patindex()? Charindex() also allows argument 'start_location' which may be able to avoid chopping input string each iter? To answer the original question can just join with function result.Allargando
F
12

The proper way IMHO is to store the list in a character string (limited in length by what the DBMS support); the only trick is that (in order to simplify processing) I have a separator (a comma in my example) at the beginning and at the end of the string. The idea is to "normalize on the fly", turning the list into a one-column table that contains one row per value. This allows you to turn

in (ct1,ct2, ct3 ... ctn)

into an

in (select ...)

or (the solution I'd probably prefer) a regular join, if you just add a "distinct" to avoid problems with duplicate values in the list.

Unfortunately, the techniques to slice a string are fairly product-specific. Here is the SQL Server version:

 with qry(n, names) as
       (select len(list.names) - len(replace(list.names, ',', '')) - 1 as n,
               substring(list.names, 2, len(list.names)) as names
        from (select ',Doc,Grumpy,Happy,Sneezy,Bashful,Sleepy,Dopey,' names) as list
        union all
        select (n - 1) as n,
               substring(names, 1 + charindex(',', names), len(names)) as names
        from qry
        where n > 1)
 select n, substring(names, 1, charindex(',', names) - 1) dwarf
 from qry;

The Oracle version:

 select n, substr(name, 1, instr(name, ',') - 1) dwarf
 from (select n,
             substr(val, 1 + instr(val, ',', 1, n)) name
      from (select rownum as n,
                   list.val
            from  (select ',Doc,Grumpy,Happy,Sneezy,Bashful,Sleepy,Dopey,' val
                   from dual) list
            connect by level < length(list.val) -
                               length(replace(list.val, ',', ''))));

and the MySQL version:

select pivot.n,
      substring_index(substring_index(list.val, ',', 1 + pivot.n), ',', -1) from (select 1 as n
     union all
     select 2 as n
     union all
     select 3 as n
     union all
     select 4 as n
     union all
     select 5 as n
     union all
     select 6 as n
     union all
     select 7 as n
     union all
     select 8 as n
     union all
     select 9 as n
     union all
     select 10 as n) pivot,    (select ',Doc,Grumpy,Happy,Sneezy,Bashful,Sleepy,Dopey,' val) as list where pivot.n <  length(list.val) -
                   length(replace(list.val, ',', ''));

(Of course, "pivot" must return as many rows as the maximum number of items we can find in the list)

Fallal answered 4/2, 2009 at 18:51 Comment(0)
S
11

If you've got SQL Server 2008 or later I'd use a Table Valued Parameter.

If you're unlucky enough to be stuck on SQL Server 2005 you could add a CLR function like this,

[SqlFunction(
    DataAccessKind.None,
    IsDeterministic = true,
    SystemDataAccess = SystemDataAccessKind.None,
    IsPrecise = true,
    FillRowMethodName = "SplitFillRow",
    TableDefinintion = "s NVARCHAR(MAX)"]
public static IEnumerable Split(SqlChars seperator, SqlString s)
{
    if (s.IsNull)
        return new string[0];

    return s.ToString().Split(seperator.Buffer);
}

public static void SplitFillRow(object row, out SqlString s)
{
    s = new SqlString(row.ToString());
}

Which you could use like this,

declare @desiredTags nvarchar(MAX);
set @desiredTags = 'ruby,rails,scruffy,rubyonrails';

select * from Tags
where Name in [dbo].[Split] (',', @desiredTags)
order by Count desc
Shebeen answered 15/8, 2012 at 16:32 Comment(0)
G
10

I think this is a case when a static query is just not the way to go. Dynamically build the list for your in clause, escape your single quotes, and dynamically build SQL. In this case you probably won't see much of a difference with any method due to the small list, but the most efficient method really is to send the SQL exactly as it is written in your post. I think it is a good habit to write it the most efficient way, rather than to do what makes the prettiest code, or consider it bad practice to dynamically build SQL.

I have seen the split functions take longer to execute than the query themselves in many cases where the parameters get large. A stored procedure with table valued parameters in SQL 2008 is the only other option I would consider, although this will probably be slower in your case. TVP will probably only be faster for large lists if you are searching on the primary key of the TVP, because SQL will build a temporary table for the list anyway (if the list is large). You won't know for sure unless you test it.

I have also seen stored procedures that had 500 parameters with default values of null, and having WHERE Column1 IN (@Param1, @Param2, @Param3, ..., @Param500). This caused SQL to build a temp table, do a sort/distinct, and then do a table scan instead of an index seek. That is essentially what you would be doing by parameterizing that query, although on a small enough scale that it won't make a noticeable difference. I highly recommend against having NULL in your IN lists, as if that gets changed to a NOT IN it will not act as intended. You could dynamically build the parameter list, but the only obvious thing that you would gain is that the objects would escape the single quotes for you. That approach is also slightly slower on the application end since the objects have to parse the query to find the parameters. It may or may not be faster on SQL, as parameterized queries call sp_prepare, sp_execute for as many times you execute the query, followed by sp_unprepare.

The reuse of execution plans for stored procedures or parameterized queries may give you a performance gain, but it will lock you in to one execution plan determined by the first query that is executed. That may be less than ideal for subsequent queries in many cases. In your case, reuse of execution plans will probably be a plus, but it might not make any difference at all as the example is a really simple query.

Cliffs notes:

For your case anything you do, be it parameterization with a fixed number of items in the list (null if not used), dynamically building the query with or without parameters, or using stored procedures with table valued parameters will not make much of a difference. However, my general recommendations are as follows:

Your case/simple queries with few parameters:

Dynamic SQL, maybe with parameters if testing shows better performance.

Queries with reusable execution plans, called multiple times by simply changing the parameters or if the query is complicated:

SQL with dynamic parameters.

Queries with large lists:

Stored procedure with table valued parameters. If the list can vary by a large amount use WITH RECOMPILE on the stored procedure, or simply use dynamic SQL without parameters to generate a new execution plan for each query.

Gruchot answered 9/6, 2010 at 20:28 Comment(1)
What do you mean by "stored procedure" here? Could you post an example?Amplification
S
10

May be we can use XML here:

    declare @x xml
    set @x='<items>
    <item myvalue="29790" />
    <item myvalue="31250" />
    </items>
    ';
    With CTE AS (
         SELECT 
            x.item.value('@myvalue[1]', 'decimal') AS myvalue
        FROM @x.nodes('//items/item') AS x(item) )

    select * from YourTable where tableColumnName in (select myvalue from cte)
Stertorous answered 24/10, 2011 at 18:41 Comment(1)
CTE and @x can be eliminated/inlined into the subselect, if done very carefully, as shown in this article.Jesuitism
W
10

If we have strings stored inside the IN clause with the comma(,) delimited, we can use the charindex function to get the values. If you use .NET, then you can map with SqlParameters.

DDL Script:

CREATE TABLE Tags
    ([ID] int, [Name] varchar(20))
;

INSERT INTO Tags
    ([ID], [Name])
VALUES
    (1, 'ruby'),
    (2, 'rails'),
    (3, 'scruffy'),
    (4, 'rubyonrails')
;

T-SQL:

DECLARE @Param nvarchar(max)

SET @Param = 'ruby,rails,scruffy,rubyonrails'

SELECT * FROM Tags
WHERE CharIndex(Name,@Param)>0

You can use the above statement in your .NET code and map the parameter with SqlParameter.

Fiddler demo

EDIT: Create the table called SelectedTags using the following script.

DDL Script:

Create table SelectedTags
(Name nvarchar(20));

INSERT INTO SelectedTags values ('ruby'),('rails')

T-SQL:

DECLARE @list nvarchar(max)
SELECT @list=coalesce(@list+',','')+st.Name FROM SelectedTags st

SELECT * FROM Tags
WHERE CharIndex(Name,@Param)>0
Watchmaker answered 23/11, 2012 at 18:13 Comment(1)
One limitation with this option. CharIndex returns 1 if the string is found. IN returns a match for an exact terms. CharIndex for "Stack" will return 1 for a term "StackOverflow" IN will not. There is a minor tweek to this answer using PatIndex above that encloses names with '<' % name % '>' that overcomes this limitation. Creative solution to this problem though.Lynnalynne
G
9

Here is another alternative. Just pass a comma-delimited list as a string parameter to the stored procedure and:

CREATE PROCEDURE [dbo].[sp_myproc]
    @UnitList varchar(MAX) = '1,2,3'
AS
select column from table
where ph.UnitID in (select * from CsvToInt(@UnitList))

And the function:

CREATE Function [dbo].[CsvToInt] ( @Array varchar(MAX))
returns @IntTable table
(IntValue int)
AS
begin
    declare @separator char(1)
    set @separator = ','
    declare @separator_position int
    declare @array_value varchar(MAX)

    set @array = @array + ','

    while patindex('%,%' , @array) <> 0
    begin

        select @separator_position = patindex('%,%' , @array)
        select @array_value = left(@array, @separator_position - 1)

        Insert @IntTable
        Values (Cast(@array_value as int))
        select @array = stuff(@array, 1, @separator_position, '')
    end
    return
end
Grandam answered 6/4, 2013 at 2:39 Comment(0)
R
9

I'd approach this by default with passing a table valued function (that returns a table from a string) to the IN condition.

Here is the code for the UDF (I got it from Stack Overflow somewhere, i can't find the source right now)

CREATE FUNCTION [dbo].[Split] (@sep char(1), @s varchar(8000))
RETURNS table
AS
RETURN (
    WITH Pieces(pn, start, stop) AS (
      SELECT 1, 1, CHARINDEX(@sep, @s)
      UNION ALL
      SELECT pn + 1, stop + 1, CHARINDEX(@sep, @s, stop + 1)
      FROM Pieces
      WHERE stop > 0
    )
    SELECT 
      SUBSTRING(@s, start, CASE WHEN stop > 0 THEN stop-start ELSE 512 END) AS s
    FROM Pieces
  )

Once you got this your code would be as simple as this:

select * from Tags 
where Name in (select s from dbo.split(';','ruby;rails;scruffy;rubyonrails'))
order by Count desc

Unless you have a ridiculously long string, this should work well with the table index.

If needed you can insert it into a temp table, index it, then run a join...

Rite answered 11/6, 2015 at 15:16 Comment(0)
L
8

For a variable number of arguments like this the only way I'm aware of is to either generate the SQL explicitly or do something that involves populating a temporary table with the items you want and joining against the temp table.

Limoli answered 3/12, 2008 at 16:31 Comment(0)
C
8

Another possible solution is instead of passing a variable number of arguments to a stored procedure, pass a single string containing the names you're after, but make them unique by surrounding them with '<>'. Then use PATINDEX to find the names:

SELECT * 
FROM Tags 
WHERE PATINDEX('%<' + Name + '>%','<jo>,<john>,<scruffy>,<rubyonrails>') > 0
Crustal answered 12/2, 2010 at 19:22 Comment(0)
D
8

Use the following stored procedure. It uses a custom split function, which can be found here.

 create stored procedure GetSearchMachingTagNames 
    @PipeDelimitedTagNames varchar(max), 
    @delimiter char(1) 
    as  
    begin
         select * from Tags 
         where Name in (select data from [dbo].[Split](@PipeDelimitedTagNames,@delimiter) 
    end
Deadly answered 26/7, 2012 at 5:39 Comment(0)
D
7

In ColdFusion we just do:

<cfset myvalues = "ruby|rails|scruffy|rubyonrails">
    <cfquery name="q">
        select * from sometable where values in <cfqueryparam value="#myvalues#" list="true">
    </cfquery>
Disinfest answered 10/12, 2008 at 21:54 Comment(0)
D
7

Here's a technique that recreates a local table to be used in a query string. Doing it this way eliminates all parsing problems.

The string can be built in any language. In this example I used SQL since that was the original problem I was trying to solve. I needed a clean way to pass in table data on the fly in a string to be executed later.

Using a user defined type is optional. Creating the type is only created once and can be done ahead of time. Otherwise just add a full table type to the declaration in the string.

The general pattern is easy to extend and can be used for passing more complex tables.

-- Create a user defined type for the list.
CREATE TYPE [dbo].[StringList] AS TABLE(
    [StringValue] [nvarchar](max) NOT NULL
)

-- Create a sample list using the list table type.
DECLARE @list [dbo].[StringList]; 
INSERT INTO @list VALUES ('one'), ('two'), ('three'), ('four')

-- Build a string in which we recreate the list so we can pass it to exec
-- This can be done in any language since we're just building a string.
DECLARE @str nvarchar(max);
SET @str = 'DECLARE @list [dbo].[StringList]; INSERT INTO @list VALUES '

-- Add all the values we want to the string. This would be a loop in C++.
SELECT @str = @str + '(''' + StringValue + '''),' FROM @list

-- Remove the trailing comma so the query is valid sql.
SET @str = substring(@str, 1, len(@str)-1)

-- Add a select to test the string.
SET @str = @str + '; SELECT * FROM @list;'

-- Execute the string and see we've pass the table correctly.
EXEC(@str)
Deirdredeism answered 29/5, 2012 at 23:49 Comment(0)
C
7

In SQL Server 2016+ another possibility is to use the OPENJSON function.

This approach is blogged about in OPENJSON - one of best ways to select rows by list of ids.

A full worked example below

CREATE TABLE dbo.Tags
  (
     Name  VARCHAR(50),
     Count INT
  )

INSERT INTO dbo.Tags
VALUES      ('VB',982), ('ruby',1306), ('rails',1478), ('scruffy',1), ('C#',1784)

GO

CREATE PROC dbo.SomeProc
@Tags VARCHAR(MAX)
AS
SELECT T.*
FROM   dbo.Tags T
WHERE  T.Name IN (SELECT J.Value COLLATE Latin1_General_CI_AS
                  FROM   OPENJSON(CONCAT('[', @Tags, ']')) J)
ORDER  BY T.Count DESC

GO

EXEC dbo.SomeProc @Tags = '"ruby","rails","scruffy","rubyonrails"'

DROP TABLE dbo.Tags 
Cavein answered 7/11, 2015 at 14:57 Comment(0)
P
6

I have an answer that doesn't require a UDF, XML Because IN accepts a select statement e.g. SELECT * FROM Test where Data IN (SELECT Value FROM TABLE)

You really only need a way to convert the string into a table.

This can be done with a recursive CTE, or a query with a number table (or Master..spt_value)

Here's the CTE version.

DECLARE @InputString varchar(8000) = 'ruby,rails,scruffy,rubyonrails'

SELECT @InputString = @InputString + ','

;WITH RecursiveCSV(x,y) 
AS 
(
    SELECT 
        x = SUBSTRING(@InputString,0,CHARINDEX(',',@InputString,0)),
        y = SUBSTRING(@InputString,CHARINDEX(',',@InputString,0)+1,LEN(@InputString))
    UNION ALL
    SELECT 
        x = SUBSTRING(y,0,CHARINDEX(',',y,0)),
        y = SUBSTRING(y,CHARINDEX(',',y,0)+1,LEN(y))
    FROM 
        RecursiveCSV 
    WHERE
        SUBSTRING(y,CHARINDEX(',',y,0)+1,LEN(y)) <> '' OR 
        SUBSTRING(y,0,CHARINDEX(',',y,0)) <> ''
)
SELECT
    * 
FROM 
    Tags
WHERE 
    Name IN (select x FROM RecursiveCSV)
OPTION (MAXRECURSION 32767);
Pipette answered 13/5, 2011 at 15:3 Comment(0)
B
6

I use a more concise version of the top voted answer:

List<SqlParameter> parameters = tags.Select((s, i) => new SqlParameter("@tag" + i.ToString(), SqlDbType.NVarChar(50)) { Value = s}).ToList();

var whereCondition = string.Format("tags in ({0})", String.Join(",",parameters.Select(s => s.ParameterName)));

It does loop through the tag parameters twice; but that doesn't matter most of the time (it won't be your bottleneck; if it is, unroll the loop).

If you're really interested in performance and don't want to iterate through the loop twice, here's a less beautiful version:

var parameters = new List<SqlParameter>();
var paramNames = new List<string>();
for (var i = 0; i < tags.Length; i++)  
{
    var paramName = "@tag" + i;

    //Include size and set value explicitly (not AddWithValue)
    //Because SQL Server may use an implicit conversion if it doesn't know
    //the actual size.
    var p = new SqlParameter(paramName, SqlDbType.NVarChar(50) { Value = tags[i]; } 
    paramNames.Add(paramName);
    parameters.Add(p);
}

var inClause = string.Join(",", paramNames);
Bradybradycardia answered 12/3, 2015 at 18:11 Comment(1)
The most important part of this, new SqlParameter(paramName, SqlDbType.NVarChar(50) { Value = tags[i]; } is a syntax error. Should the second open parenthesis be a comma, that is new SqlParameter(paramName, SqlDbType.NVarChar, 50) ?Kuroshio
S
5

Here is another answer to this problem.

(new version posted on 6/4/13).

    private static DataSet GetDataSet(SqlConnectionStringBuilder scsb, string strSql, params object[] pars)
    {
        var ds = new DataSet();
        using (var sqlConn = new SqlConnection(scsb.ConnectionString))
        {
            var sqlParameters = new List<SqlParameter>();
            var replacementStrings = new Dictionary<string, string>();
            if (pars != null)
            {
                for (int i = 0; i < pars.Length; i++)
                {
                    if (pars[i] is IEnumerable<object>)
                    {
                        List<object> enumerable = (pars[i] as IEnumerable<object>).ToList();
                        replacementStrings.Add("@" + i, String.Join(",", enumerable.Select((value, pos) => String.Format("@_{0}_{1}", i, pos))));
                        sqlParameters.AddRange(enumerable.Select((value, pos) => new SqlParameter(String.Format("@_{0}_{1}", i, pos), value ?? DBNull.Value)).ToArray());
                    }
                    else
                    {
                        sqlParameters.Add(new SqlParameter(String.Format("@{0}", i), pars[i] ?? DBNull.Value));
                    }
                }
            }
            strSql = replacementStrings.Aggregate(strSql, (current, replacementString) => current.Replace(replacementString.Key, replacementString.Value));
            using (var sqlCommand = new SqlCommand(strSql, sqlConn))
            {
                if (pars != null)
                {
                    sqlCommand.Parameters.AddRange(sqlParameters.ToArray());
                }
                else
                {
                    //Fail-safe, just in case a user intends to pass a single null parameter
                    sqlCommand.Parameters.Add(new SqlParameter("@0", DBNull.Value));
                }
                using (var sqlDataAdapter = new SqlDataAdapter(sqlCommand))
                {
                    sqlDataAdapter.Fill(ds);
                }
            }
        }
        return ds;
    }

Cheers.

Symphysis answered 3/6, 2013 at 22:54 Comment(0)
Q
4

The only winning move is not to play.

No infinite variability for you. Only finite variability.

In the SQL you have a clause like this:

and ( {1}==0 or b.CompanyId in ({2},{3},{4},{5},{6}) )

In the C# code you do something like this:

  int origCount = idList.Count;
  if (origCount > 5) {
    throw new Exception("You may only specify up to five originators to filter on.");
  }
  while (idList.Count < 5) { idList.Add(-1); }  // -1 is an impossible value
  return ExecuteQuery<PublishDate>(getValuesInListSQL, 
               origCount,   
               idList[0], idList[1], idList[2], idList[3], idList[4]);

So basically if the count is 0 then there is no filter and everything goes through. If the count is higher than 0 the then the value must be in the list, but the list has been padded out to five with impossible values (so that the SQL still makes sense)

Sometimes the lame solution is the only one that actually works.

Quoits answered 28/4, 2011 at 21:56 Comment(0)
R
4

Here's a cross-post to a solution to the same problem. More robust than reserved delimiters - includes escaping and nested arrays, and understands NULLs and empty arrays.

C# & T-SQL string[] Pack/Unpack utility functions

You can then join to the table-valued function.

Ronrona answered 10/8, 2012 at 14:29 Comment(0)
G
4

(Edit: If table valued parameters are not available) Best seems to be to split a large number of IN parameters into multiple queries with fixed length, so you have a number of known SQL statements with fixed parameter count and no dummy/duplicate values, and also no parsing of strings, XML and the like.

Here's some code in C# I wrote on this topic:

public static T[][] SplitSqlValues<T>(IEnumerable<T> values)
{
    var sizes = new int[] { 1000, 500, 250, 125, 63, 32, 16, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1 };
    int processed = 0;
    int currSizeIdx = sizes.Length - 1; /* start with last (smallest) */
    var splitLists = new List<T[]>();

    var valuesDistSort = values.Distinct().ToList(); /* remove redundant */
    valuesDistSort.Sort();
    int totalValues = valuesDistSort.Count;

    while (totalValues > sizes[currSizeIdx] && currSizeIdx > 0)
    currSizeIdx--; /* bigger size, by array pos. */

    while (processed < totalValues)
    {
        while (totalValues - processed < sizes[currSizeIdx]) 
            currSizeIdx++; /* smaller size, by array pos. */
        var partList = new T[sizes[currSizeIdx]];
        valuesDistSort.CopyTo(processed, partList, 0, sizes[currSizeIdx]);
        splitLists.Add(partList);
        processed += sizes[currSizeIdx];
    }
    return splitLists.ToArray();
}

(you may have further ideas, omit the sorting, use valuesDistSort.Skip(processed).Take(size[...]) instead of list/array CopyTo).

When inserting parameter variables, you create something like:

foreach(int[] partList in splitLists)
{
    /* here: question mark for param variable, use named/numbered params if required */
    string sql = "select * from Items where Id in("
        + string.Join(",", partList.Select(p => "?")) 
        + ")"; /* comma separated ?, one for each partList entry */

    /* create command with sql string, set parameters, execute, merge results */
}

I've watched the SQL generated by the NHibernate object-relational mapper (when querying data to create objects from), and that looks best with multiple queries. In NHibernate, one can specify a batch-size; if many object data rows have to be fetched, it tries to retrieve the number of rows equivalent to the batch-size

SELECT * FROM MyTable WHERE Id IN (@p1, @p2, @p3, ... , @p[batch-size])

,instead of sending hundreds or thousands of

SELECT * FROM MyTable WHERE Id=@id

When the remaining IDs are less then batch-size, but still more than one, it splits into smaller statements, but still with certain length.

If you have a batch size of 100, and a query with 118 parameters, it would create 3 queries:

  • one with 100 parameters (batch-size),
  • then one with 12
  • and another one with 6,

but none with 118 or 18. This way, it restricts the possible SQL statements to likely known statements, preventing too many different, thus too many query plans, which fill the cache and in great parts never get reused. The above code does the same, but with lengths 1000, 500, 250, 125, 63, 32, 16, 10-to-1. Parameter lists with more than 1000 elements are also split, preventing a database error due to a size limit.

Anyway, it's best to have a database interface which sends parameterized SQL directly, without a separate Prepare statement and handle to call. Databases like SQL Server and Oracle remember SQL by string equality (values change, binding params in SQL not!) and reuse query plans, if available. No need for separate prepare statements, and tedious maintenance of query handles in code! ADO.NET works like this, but it seems like Java still uses prepare/execute by handle (not sure).

I had my own question on this topic, originally suggesting to fill the IN clause with duplicates, but then preferring the NHibernate style statement split: Parameterized SQL - in / not in with fixed numbers of parameters, for query plan cache optimization?

This question is still interesting, even more than 5 years after being asked...

EDIT: I noted that IN queries with many values (like 250 or more) still tend to be slow, in the given case, on SQL Server. While I expected the DB to create a kind of temporary table internally and join against it, it seemed like it only repeated the single value SELECT expression n-times. Time was up to about 200ms per query - even worse than joining the original IDs retrieval SELECT against the other, related tables.. Also, there were some 10 to 15 CPU units in SQL Server Profiler, something unusual for repeated execution of the same parameterized queries, suggesting that new query plans were created on repeated calls. Maybe ad-hoc like individual queries are not worse at all. I had to compare these queries to non-split queries with changing sizes for a final conclusion, but for now, it seems like long IN clauses should be avoided anyway.

Gotland answered 12/1, 2014 at 0:32 Comment(0)
G
4

You can do this in a reusable way by doing the following -

public static class SqlWhereInParamBuilder
{
    public static string BuildWhereInClause<t>(string partialClause, string paramPrefix, IEnumerable<t> parameters)
    {
        string[] parameterNames = parameters.Select(
            (paramText, paramNumber) => "@" + paramPrefix + paramNumber.ToString())
            .ToArray();

        string inClause = string.Join(",", parameterNames);
        string whereInClause = string.Format(partialClause.Trim(), inClause);

        return whereInClause;
    }

    public static void AddParamsToCommand<t>(this SqlCommand cmd, string paramPrefix, IEnumerable<t> parameters)
    {
        string[] parameterValues = parameters.Select((paramText) => paramText.ToString()).ToArray();

        string[] parameterNames = parameterValues.Select(
            (paramText, paramNumber) => "@" + paramPrefix + paramNumber.ToString()
            ).ToArray();

        for (int i = 0; i < parameterNames.Length; i++)
        {
            cmd.Parameters.AddWithValue(parameterNames[i], parameterValues[i]);
        }
    }
}

For more details have a look at this blog post - Parameterized SQL WHERE IN clause c#

Germander answered 28/1, 2016 at 4:7 Comment(0)
B
4

This is a reusable variation of the solution in Mark Bracket's excellent answer.

Extension Method:

public static class ParameterExtensions
{
    public static Tuple<string, SqlParameter[]> ToParameterTuple<T>(this IEnumerable<T> values)
    {
        var createName = new Func<int, string>(index => "@value" + index.ToString());
        var paramTuples = values.Select((value, index) => 
        new Tuple<string, SqlParameter>(createName(index), new SqlParameter(createName(index), value))).ToArray();
        var inClause = string.Join(",", paramTuples.Select(t => t.Item1));
        var parameters = paramTuples.Select(t => t.Item2).ToArray();
        return new Tuple<string, SqlParameter[]>(inClause, parameters);
    }
}

Usage:

        string[] tags = {"ruby", "rails", "scruffy", "rubyonrails"};
        var paramTuple = tags.ToParameterTuple();
        var cmdText = $"SELECT * FROM Tags WHERE Name IN ({paramTuple.Item1})";

        using (var cmd = new SqlCommand(cmdText))
        {
            cmd.Parameters.AddRange(paramTuple.Item2);
        }
Braddy answered 30/12, 2016 at 3:34 Comment(0)
V
3
    create FUNCTION [dbo].[ConvertStringToList]


      (@str VARCHAR (MAX), @delimeter CHAR (1))
        RETURNS 
        @result TABLE (
            [ID] INT NULL)
    AS
    BEG

IN

    DECLARE @x XML 
    SET @x = '<t>' + REPLACE(@str, @delimeter, '</t><t>') + '</t>'

    INSERT INTO @result
    SELECT DISTINCT x.i.value('.', 'int') AS token
    FROM @x.nodes('//t') x(i)
    ORDER BY 1

RETURN
END

--YOUR QUERY

select * from table where id in ([dbo].[ConvertStringToList(YOUR comma separated string ,',')])
Valrievalry answered 18/3, 2015 at 7:43 Comment(1)
Some explanation would be nice.Baylor
P
2

Use a dynamic query. The front end is only to generate the required format:

DECLARE @invalue VARCHAR(100)
SELECT @invalue = '''Bishnu'',''Gautam'''

DECLARE @dynamicSQL VARCHAR(MAX)
SELECT @dynamicSQL = 'SELECT * FROM #temp WHERE [name] IN (' + @invalue + ')'
EXEC (@dynamicSQL)

SQL Fiddle

Pulsatory answered 1/1, 2014 at 18:32 Comment(2)
This is not safe against SQL injection.Cavein
Please don't do this. While this may sometimes be done in a not totally unreasonable way. It truly is an accident (incident) waiting to happen if done in the wrong place with the wrong data (which is usually the case).Wormy
C
2

There is a nice, simple and tested way of doing that:

/* Create table-value string: */
CREATE TYPE [String_List] AS TABLE ([Your_String_Element] varchar(max) PRIMARY KEY);
GO
/* Create procedure which takes this table as parameter: */

CREATE PROCEDURE [dbo].[usp_ListCheck]
@String_List_In [String_List] READONLY  
AS   
SELECT a.*
FROM [dbo].[Tags] a
JOIN @String_List_In b ON a.[Name] = b.[Your_String_Element];

I have started using this method to fix the issues we had with the entity framework (was not robust enough for our application). So we decided to give the Dapper (same as Stack) a chance. Also specifying your string list as table with PK column fix your execution plans a lot. Here is a good article of how to pass a table into Dapper - all fast and CLEAN.

Contort answered 12/4, 2017 at 9:49 Comment(0)
K
1

Create a temp table where names are stored, and then use the following query:

select * from Tags 
where Name in (select distinct name from temp)
order by Count desc
Kroon answered 30/7, 2017 at 18:44 Comment(2)
Hmm.... I see a temp table solution here. Where I work, when you encounter a difficulty that you don't understand or know how to properly circumvent, we simply create a table field (a flag) or a temp table... Do such solutions make you a better software engineer ? That's the questionMchugh
@KurtMiller, I think this address the original question. In this case the temp table is a substitute of magic strings. Lets suppose your strings are a few dozens, not always the same number of strings... will you concat and the split the string? A temp table is a lot more flexble. Only thing, it should be an [at]temp table, a kind that only exists during the query executionTheresatherese
H
1

In SQL SERVER 2016 or higher you can use STRING_SPLIT.

DECLARE @InParaSeprated VARCHAR(MAX) = 'ruby,rails,scruffy,rubyonrails'
DECLARE @Delimeter VARCHAR(10) = ','
SELECT 
    * 
FROM 
    Tags T
    INNER JOIN STRING_SPLIT(@InputParameters,@Delimeter) SS ON T.Name = SS.value
ORDER BY 
    Count DESC

I use this because some times join faster than Like Operator works in my queries.
In addition you can put unlimited number of inputs in any separated format that you like.
I like this ..

Horst answered 25/2, 2019 at 16:57 Comment(0)
E
1

Step 1:-

string[] Ids = new string[] { "3", "6", "14" };
string IdsSP = string.Format("'|{0}|'", string.Join("|", Ids));

Step 2:-

@CurrentShipmentStatusIdArray [nvarchar](255) = NULL

Step 3:-

Where @CurrentShipmentStatusIdArray is null or @CurrentShipmentStatusIdArray LIKE '%|' + convert(nvarchar,Shipments.CurrentShipmentStatusId) + '|%'

or

Where @CurrentShipmentStatusIdArray is null or @CurrentShipmentStatusIdArray LIKE '%|' + Shipments.CurrentShipmentStatusId+ '|%'
Electrophysiology answered 12/7, 2019 at 20:47 Comment(0)
H
0

If you are interested in doing everything in SQL procedures and functions (as the initial question was formulated), and not concatenating strings in C#, you can use a fairly simple method:

DECLARE @testFilter VARCHAR(50) = 'test1,test2,test3'
SELECT * FROM testTable WHERE testField IN (SELECT value FROM STRING_SPLIT(@testFilter, ','))
Hateful answered 16/2, 2023 at 16:25 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.