Is it possible to check how many files a particular query opens in MySQL?

I have a large number of open files in MySQL.

I have set open_files_limit to 150,000, but MySQL still uses almost 80% of it.

I also have low traffic, at most around 30 concurrent connections, and no query has more than 4 joins.

Adventure answered 14/7, 2015 at 13:35 Comment(5)
Are you looking at OPEN_FILES or OPENED_FILES? I ask this because 80% of 150,000 would mean you have at least 120,000 tables in your database, which I doubt. OPENED_FILES just counts from zero until the server is restarted and will get big if you do not restart the server often.Palmy
@david I know the difference between open files and opened files. I am asking about open files. Yes, I am also surprised that so many files are open.Adventure
What is an "open file" in the MySQL world? Is it an open OS file handle? If so on Windows there's probably something in the Sysinternals suite to help trace this.Snot
So do you have 120,000 database objects, such that you have 120,000 files to open?Palmy
No, I don't, and that's the surprising thing. If I did, the situation would be obvious.Adventure

The files opened by the server are visible in the performance_schema.

See table performance_schema.file_instances.

http://dev.mysql.com/doc/refman/5.5/en/file-instances-table.html
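
For example, a quick way to see what is currently held open (a minimal sketch, assuming the performance schema is enabled; EVENT_NAME and OPEN_COUNT are the documented columns of that table):

select event_name, count(*) as files, sum(open_count) as handles
from performance_schema.file_instances
where open_count > 0
group by event_name
order by handles desc;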

As for tracing which query opens which file, it does not work that way, due to caching in the server itself (table cache, table definition cache).

Teasel answered 20/7, 2015 at 6:57 Comment(1)
No, this is a totally different thing. I want to find the queries that open too many files on my system, so that I can reduce open_files_limit.Adventure

MySQL shouldn't open that many files, unless you have set a ludicrously large value for the table_cache parameter (the default is 64, the maximum is 512K).

You can reduce the number of open files by issuing the FLUSH TABLES command.
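
For instance, you can watch the effect directly; a minimal sketch using stock status variables:

show global status like 'Open_files';
flush tables;  -- closes all open tables (waiting for any that are in use)
show global status like 'Open_files';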

Otherwise, the appropriate value of table_cache can be roughly estimated (on Linux) by running strace -c against all mysqld threads. You get something like:

# strace -f -c -p $( pidof mysqld )
Process 13598 attached with 22 threads
[ ...pause while it gathers information... ]
^C
Process 13598 detached
...
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 58.82    0.040000          51       780           io_getevents
 29.41    0.020000         105       191        93 futex
 11.76    0.008000         103        78           select
  0.00    0.000000           0        72           stat
  0.00    0.000000           0        20           lstat
  0.00    0.000000           0        16           lseek
  0.00    0.000000           0        16           read
  0.00    0.000000           0         9         3 open
  0.00    0.000000           0         5           close
  0.00    0.000000           0         6           poll
...
------ ----------- ----------- --------- --------- ----------------

...and see whether there's a reasonable difference in impact between the open() and close() calls; those are the calls that table_cache affects, and they influence how many open files there are at any given point.

If the impact of open() is negligible, then by all means reduce table_cache. It is mostly needed on slow IOSSes (I/O subsystems), and there aren't many of those left around.
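
If you do reduce it, something like this is a reasonable sketch (the variable is named table_open_cache from MySQL 5.1.3 on, table_cache before that; 400 is an arbitrary illustrative value):

set global table_open_cache = 400;
-- if this counter keeps climbing fast afterwards, the cache is now too small:
show global status like 'Opened_tables';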

If you're running on Windows, you'll have to try Process Monitor (ProcMon) from Sysinternals, or some similar tool.

Once you have table_cache down to manageable levels, your query that now opens too many files will simply close and re-open many of those same files. You'll perhaps notice an impact on performance, which in all likelihood will be negligible. Chances are that a smaller table cache might actually get you results faster, as fetching an item from a modern, fast IOSS cache may well be faster than searching for it in a really large cache.

If you're into optimizing your server, you may want to look at this article too. The take-away is that as caches go, larger is not always better (it also applies to indexing).

Inspecting a specific query on Linux

On Linux you can use strace (see above) and verify what files are opened and how:

$ sudo strace -f -p $( pidof mysqld ) 2>&1 | grep 'open("'

Meanwhile from a different terminal I run a query, and:

[pid  8894] open("./ecm/db.opt", O_RDONLY) = 39
[pid  8894] open("./ecm/prof2_people.frm", O_RDONLY) = 39
[pid  8894] open("./ecm/prof2_discip.frm", O_RDONLY) = 39
[pid  8894] open("./ecm/prof2_discip.ibd", O_RDONLY) = 19
[pid  8894] open("./ecm/prof2_discip.ibd", O_RDWR) = 19
[pid  8894] open("./ecm/prof2_people.ibd", O_RDONLY) = 20
[pid  8894] open("./ecm/prof2_people.ibd", O_RDWR) = 20
[pid  8894] open("/proc/sys/vm/overcommit_memory", O_RDONLY|O_CLOEXEC) = 39

...these are the files that the query used (be sure to run the query against a "cold-started" MySQL to prevent caching), and I see that the highest file handle assigned was 39, so at no point were there more than 40 open files.

The same files can be checked from /proc/$PID/fd or from MySQL:

select * from performance_schema.file_instances where open_count > 1;

but the count from MySQL is slightly lower, as it does not take into account socket descriptors, log files, and temporary files.
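
For the /proc side, a minimal sketch (assuming a single mysqld process):

$ sudo ls /proc/$( pidof mysqld )/fd | wc -l   # every descriptor: data files, logs, sockets, pipes
$ sudo ls -l /proc/$( pidof mysqld )/fd        # shows what each descriptor points to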

Mastoiditis answered 21/7, 2015 at 16:38 Comment(0)

This would only be possible by adjusting the source code and adding logging at that level.

Alternative: run a test using this scenario:

You will have to set up an automated test to make this possible:

  • Log your queries;
  • Create a script which preloads your heap with a normal dataset (otherwise you are testing against empty memory), and take a snapshot of the number of open tables;
  • Run every query and take a snapshot of the open tables. In retrospect, I think you could do this without restarting MySQL every time, so just run every query and record the results, as in the sketch below. Debugging is tedious work: not impossible, just really tedious.
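
A minimal sketch of that snapshot loop (queries.sql, one query per line, is an illustrative assumption, as is having credentials in ~/.my.cnf; remember that deltas only mean much on a cold-started server, because the table cache keeps files open):

#!/bin/sh
# For each logged query, record how Open_files changes around its execution.
while IFS= read -r q; do
  before=$( mysql -N -e "show global status like 'Open_files'" | awk '{print $2}' )
  mysql -e "$q" > /dev/null
  after=$( mysql -N -e "show global status like 'Open_files'" | awk '{print $2}' )
  echo "$(( after - before ))  $q"
done < queries.sql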

Personally, I would start differently:

  • Install Cacti and the Percona Cacti plugin;
  • Record a week of normal workload;
  • Then hunt down high-load queries (slow log threshold of 0.1 seconds, run through a script to find repeating queries; see the config sketch after this list);
  • Monitor for another week;
  • Then hunt down additional queries with a high repeat count: this is often inefficient code firing a high number of queries where fewer would do, such as retrieving the keys and then fetching the values for every key one by one (this happens a lot when programmers use an ORM).
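
For the slow-log step, a hedged my.cnf sketch (the log file path is illustrative; fractional long_query_time values work on 5.1+ servers):

[mysqld]
slow_query_log      = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time     = 0.1
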
Improvise answered 19/7, 2015 at 19:51 Comment(2)
And how can we run the test case? Do you mean run one query, check, and restart? Is that feasible for the hundreds of queries running on a website? Or do you mean something else?Adventure
Updated my approach in response to your comment (too long for a comment).Improvise
