Filter SQL based on a C# List instead of a filter table
Say I have a table with the following data:

[Screenshot: the employee table, with the composite primary key columns Department and Number.]

Now I want to filter by the primary keys department and number. I have a list of department and number combinations, built in code, that I want to filter by. In my mind, I would create a join that results in the following:

select * from employee e
inner join dynamicTable dyn on e.Department = dyn.Department 
                           and e.Number = dyn.Number;

dynamicTable stands for my C# List holding the primary keys to filter by, but I don't know how to pass this list down to the database level.

I don't want to load everything from my employee table and then filter in code with LINQ or something else, because I have millions of employees in my database.

I already thought about combining the primary keys and creating a WHERE ... IN (...), but Firebird restricts a WHERE ... IN list to a maximum of 1500 elements.

The database used is Firebird version 2.1.

Menagerie asked 16/5, 2017 at 9:27 Comment(10)
"have millions of employees in my database." - as a matter of interest, how large is your DB file and how many tables you have there. select count(*) from RDB$RELATIONS where rdb$relation_type=0 and ( rdb$System_flag is null or rdb$System_flag=0 ) Maybe you'd really think about upgrading FB...Buffoon
I have 92 tables, and there are a couple of tables that have a huge number of rows. This is because there are many pay transactions and user actions that have to be saved and evaluated... Updating FB is not so easy, because there are nearly one hundred clients out there with a 2.1 version, and updating means testing!! But what benefit would an update bring?Menagerie
and the file size? According to www.translate.ru -> ibase.ru/profitmed, FB 2.5 is faster with indexes and general I/O, which should benefit those very "long" tables. FB 3.0 is probably better yet, but personally I have doubts about how polished it is. Additionally, 2.5 adds more Monitoring Tables and introduces the Tracing API for investigating bottlenecks. See the 2.5 release notes. 2.1 clients can usually work with a 2.5 server without any changes, though it is not recommended. So if you have ONE server, then perhaps after testing you can just make the switch and then slowly pull the clients up.Buffoon
Unfortunately there is not one server; every client has its own server. I connect to the clients via a WCF service and load the data right there. I do this because Firebird has no built-in secure transport, and the connection to the clients is not always a high-speed connection, so I additionally compress the loaded data before sending the response. One point not to forget: the heavy part is not only on the database; the loaded data has to be mapped to XML/DataSet or an equivalent datasource for the reports. This is a very memory-hungry step.Menagerie
the typical solution to let Interbase/Firebird through public networks was ZeeBeDee - however this requires connections to be at least stable. It encrypts and compresses, but is not made for mobile-like networks where the connection is often dropped/restored. Frankly, for scheduled syncs I'd rather make delta files like encrypted xxxx.json.7z and transfer those files by any transport. I would not like to let not-always-predictable WAN conditions have a direct impact on the db serverBuffoon
Yes, I am on the way to building a synchronization mechanism, but that's not trivial in every respect; I am working on a good architecture for this. But the point with the delta files is new to me; are there any real-life examples of your idea?Menagerie
Nope, but there were a lot of discussions about strategies for implementing replication, and there are a number of 3rd-party services for replication in Firebird; one of those would be merged into the mainstream with FB4. I guess your task looks like partial replication, where the One True main database has all the remote offices' data and the remote offices only have the shards of the database relevant to them. You may try the bottom link at ibase.ru/admin via translate.ru - but those are old and discuss strategies, again.Buffoon
Exactly, I need only a part of the database. But I think it is better to do it by hand; there are too many "if this, then that"s.. :-) Thank you very much for your interest!Menagerie
firebirdfaq.org/faq249 and translate.ru with ibase.ru/d_repl or codenet.ru/db/interbase/Interbase-Replications or perhaps even repl2.narod.ru or sql.ru/forum/357703-a/replikaciya or rsdn.org/forum/db/1269536.flatBuffoon
rsdn.org/forum/db/734788.flat rsdn.org/forum/db/1222455.flat#1222455Buffoon
Personally I can see two tricks you can pursue, and one more "blast from the past" on top.

Route #1. Use GTT: GLOBAL TEMPORARY TABLE

GTTs were introduced in FB 2.1 (which you use) and can be per-connection or per-transaction. You would want the per-transaction kind. This difference only concerns the data (the rows); the schema (structure, indexes, the metadata) is persistent. See the ON COMMIT DELETE ROWS option in the GTT documentation.

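For illustration, the one-time DDL for such a GTT could look like this (a sketch; the column names and types are assumed to mirror the employee table):

CREATE GLOBAL TEMPORARY TABLE dynamicTable (
  Department INTEGER NOT NULL,
  Number     INTEGER NOT NULL,
  PRIMARY KEY (Department, Number)
) ON COMMIT DELETE ROWS;

You run this once as regular DDL; the structure and the PK index persist, while the rows live only until the end of the filling transaction.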

In this way you open the transaction, fill the GTT with the data from your list (copying those 1500 value pairs from your workstation to the server), run your query JOINing over that GTT, and then COMMIT the transaction, whereupon the table content is auto-dropped.

If you can run many almost-identical queries in the session, then it might make sense to make that GTT per-connection instead and modify the data as needed, rather than re-filling it for every next query in every next transaction, but that is a more complex approach. Cleanse-early-on-every-COMMIT is what I'd prefer as the default until someone argues why per-connection would be better in a specific case, just so that garbage isn't kept on the server between queries.
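A minimal C# sketch of the per-transaction flow, assuming the FirebirdSql.Data.FirebirdClient ADO.NET provider and the dynamicTable defined above (table and column names are assumptions carried over from the question, not verified against your schema):

using System;
using System.Collections.Generic;
using FirebirdSql.Data.FirebirdClient;

static void QueryWithGtt(string connectionString, List<Tuple<int, int>> keys)
{
    using (var connection = new FbConnection(connectionString))
    {
        connection.Open();
        using (var transaction = connection.BeginTransaction())
        {
            // 1. Copy the key pairs from the C# list into the GTT.
            using (var insert = new FbCommand(
                "insert into dynamicTable (Department, Number) values (@dep, @num)",
                connection, transaction))
            {
                insert.Parameters.Add("@dep", FbDbType.Integer);
                insert.Parameters.Add("@num", FbDbType.Integer);
                insert.Prepare();
                foreach (var key in keys)
                {
                    insert.Parameters["@dep"].Value = key.Item1;
                    insert.Parameters["@num"].Value = key.Item2;
                    insert.ExecuteNonQuery();
                }
            }

            // 2. Run the join from the question against the freshly filled GTT.
            using (var select = new FbCommand(
                "select e.* from employee e " +
                "inner join dynamicTable dyn on e.Department = dyn.Department " +
                "and e.Number = dyn.Number",
                connection, transaction))
            using (var reader = select.ExecuteReader())
            {
                while (reader.Read())
                {
                    // ... map the row to your employee object ...
                }
            }

            // ON COMMIT DELETE ROWS clears the GTT rows automatically.
            transaction.Commit();
        }
    }
}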

Route #2. Use string search - reversed LIKE matching.

In its basic form this method works for searching for some huge and arbitrary list of integer numbers. Your case is a bit more complex: you match against PAIRS of numbers, not single ones.

The simple idea is this: let's assume we want to fetch rows where the ID column can be 1, 4, 12, or 24. The straightforward approach would be making 4 queries, one per value, or writing WHERE ID = 1 OR ID = 4 OR ..., or using WHERE ID IN (1,4,12,24). Internally, IN gets unrolled into that very = OR = OR = chain and then most probably executed as four queries. Not very efficient for long lists.

So instead - for really long lists to match - we may form a special string and do the matching as text. This makes the matching itself much less efficient and prohibits any use of indexing; the server runs a NATURAL SCAN over the whole table - but it is a one-pass scan. When the matching list is really large, the one-pass whole-table scan becomes more efficient than thousands of by-index fetches. BUT - only when the list-to-table ratio is really large; it depends on your specific data.

We make a text enlisting all our target values, each wrapped in a delimiter: "~1~4~12~24~". Then we build the same delimiter-number-delimiter string from our ID column and check whether it occurs as a substring.

The usual use of LIKE/CONTAINING is to match a column against a value: SELECT * from the_table WHERE column_name CONTAINING value_param
We reverse it: SELECT * from the_table WHERE value_param CONTAINING column-based-expression

  SELECT * from the_table WHERE '~1~4~12~24~' CONTAINING '~' || ID || '~' 

This assumes ID gets auto-cast from integer to string. If not, you would have to do it manually: .... CONTAINING '~' || CAST(ID as VARCHAR(100)) || '~'

Your case is a bit more complex: you need to match two numbers, Department and Number, so you would have to use TWO DIFFERENT delimiters if you follow this route. Something like:

SELECT * FROM employee e WHERE
  '~1@10~1@11~2@20~3@7~3@66~' CONTAINING
  '~' || e.Department || '@' || e.Number || '~'
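Building that search string from your C# list is straightforward; a sketch (delimiters ~ and @ as above):

using System;
using System.Collections.Generic;
using System.Text;

static string BuildSearchString(List<Tuple<int, int>> keys)
{
    // Produces e.g. "~1@10~1@11~2@20~3@7~3@66~" for the pairs above.
    var builder = new StringBuilder("~");
    foreach (var key in keys)
    {
        builder.Append(key.Item1).Append('@').Append(key.Item2).Append('~');
    }
    return builder.ToString();
}

Pass the result as a command parameter rather than inlining it, so the long constant does not bloat the SQL text (see the statement-length caveat below).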

Gotcha: you say your target list is 1500 elements. The target string would be... long. How long exactly?

VARCHAR in Firebird is limited to 32KB AFAIR, and longer texts have to become text BLOBs, with reduced functionality. Does LIKE/CONTAINING work against BLOBs in FB 2.1? I don't remember; check the release notes. Also check whether your library even allows you to specify the parameter type as a BLOB rather than a string. Now, what is your CONNECTION CHARSET? If it is something like Windows-1250 or Windows-1251, then one character is one byte and you can fit 32K characters into 32KBytes. But if the CONNECTION CHARSET your application sets is UTF-8, then each character takes up to four bytes and your maximum VARCHAR-able string shrinks to about 8K characters.
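Back-of-the-envelope, assuming pairs like ~123@45678 averaging about 10 characters each: 1500 pairs make roughly 15,000 characters plus the trailing delimiter. That fits a single-byte connection charset (32K characters) with room to spare, but already overflows the ~8K-character ceiling of a UTF-8 VARCHAR.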

You may try to avoid using a parameter for this long string and instead inline the target string constant into the SQL statement. But then you may hit the limit on maximum SQL statement length instead.

See also: MON$CHARACTER_SET_ID in c:\Program Files\Firebird\Firebird_2_1\doc\README.monitoring_tables.txt, and then the SYSTEM TABLES section of the FB docs for how to map IDs to charset textual names.
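A quick way to check from SQL (the monitoring tables exist since FB 2.1):

select MON$CHARACTER_SET_ID
  from MON$ATTACHMENTS
 where MON$ATTACHMENT_ID = CURRENT_CONNECTION;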

Route #3. Poor man's GTT. Enter pseudo-tables.

This trick could be used sometimes in older IB/FB versions before GTTs were introduced.

Pro: you do not need to change your persistent SCHEMA.
Con: without changing the SCHEMA you cannot create indices and cannot use indexed joins. And yet again, you can hit the length limit of a single SQL statement.

Really, I don't think this would be applicable to your case; just to make the answer complete, I think this trick should be mentioned too.

select * from employee e, (
  SELECT 1 as Department, 10 as Number FROM RDB$DATABASE
  UNION ALL SELECT 1, 11 FROM RDB$DATABASE
  UNION ALL SELECT 2, 20 FROM RDB$DATABASE
  UNION ALL SELECT 3, 7 FROM RDB$DATABASE
  UNION ALL SELECT 3, 66 FROM RDB$DATABASE
) t
where e.Department = t.Department 
  and e.Number = t.Number

Crude and ugly, but sometimes this pseudo-table might help. When? Mostly for batch INSERT-from-SELECT, where indexing is not needed :-D It is rarely applicable to SELECTs - but just know the trick.
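For completeness, a sketch of generating that pseudo-table query in C# (purely illustrative; integer values are formatted inline, and with anything near 1500 pairs you would likely hit the statement-length limit):

using System;
using System.Collections.Generic;
using System.Text;

static string BuildPseudoTableQuery(List<Tuple<int, int>> keys)
{
    var sql = new StringBuilder("select e.* from employee e join (");
    for (int i = 0; i < keys.Count; i++)
    {
        if (i > 0) sql.Append(" UNION ALL ");
        // Integers only - never inline string values this way (SQL injection).
        sql.AppendFormat("SELECT {0} as Department, {1} as Number FROM RDB$DATABASE",
            keys[i].Item1, keys[i].Item2);
    }
    sql.Append(") t on e.Department = t.Department and e.Number = t.Number");
    return sql.ToString();
}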

Jaela answered 16/5, 2017 at 9:40 Comment(3)
Perfect answer for the given possibilities!! I think the most elegant way for big filter-ID lists (up to 100,000 and more) is the global temp table. I will create a couple of integer and a couple of string columns and combine (concat) them into a single key. When loading the result I will likewise combine the columns that are part of a composite key. This will take a little while, I will report....Menagerie
@Menagerie if you use non-integer keys, like string columns, then approach #2 is nullified; it does not scale beyond integers. And lists of up to 100K elements make #3 non-working, so the GTT seems to be all you have. But - if you make filters as large as 100K elements - then perhaps it is time to rethink the task at hand. Maybe those 100K elements can be made into a proper persistent table and kept there? With a filtering column like "filter_id" or (the poor man's one) "attachment_id" (see the CURRENT_CONNECTION FB var)Buffoon
Or you may try to split the filtering into two stages: do some preliminary filtering on a few columns at the server, fetch this semi-filtered data stream to the client, and then do the final fine-grained all-columns filtering within your program.Buffoon
