I'm currently testing Maxscale with a Galera Cluster of 3 nodes in Read/Write Split mode. By default, Maxscale defines one node as a master and the other as slaves (my configuration says 100% of the slaves).
My intend is to check how Maxscale handles node shutdowns.
The problem is that with benchmarks (Sysbench, Mysqlslap) and also custom scripts (PHP), the connection to the backend (MariaDB) gets lost when I shut down a node of the cluster.
Error log:
MariaDB Corporation MaxScale /var/log/maxscale/error1.log Thu Oct 29 13:00:11 2015
-----------------------------------------------------------------------
--- Logging is enabled.
2015-10-29 13:00:11 Error: Failed to obtain address for host ::1, Address family for hostname not supported
2015-10-29 13:00:11 Warning: Failed to add user root@::1 for service [RW Split Router]. This user will be unavailable via MaxScale.
2015-10-29 13:00:11 Warning: Duplicate MySQL user found for service [RW Split Router]: [email protected] for database: (null)
2015-10-29 13:00:11 Warning: Duplicate MySQL user found for service [RW Split Router]: [email protected] for database: (null)
2015-10-29 13:00:11 Warning: Duplicate MySQL user found for service [RW Split Router]: [email protected] for database: (null)
2015-10-29 13:00:35 Error : Unable to write to backend due to authentication failure.
2015-10-29 13:00:40 Error : Monitor was unable to connect to server 10.58.224.113:3306 : "Can't connect to MySQL server on '10.58.224.113' (111)"
Trace log:
2015-10-29 13:00:33 [4] Route query to slave 10.58.224.113:3306 <
2015-10-29 13:00:33 [4] Servers and router connection counts:
2015-10-29 13:00:33 [4] current operations : 0 in 10.58.224.113:3306 RUNNING SLAVE
2015-10-29 13:00:33 [4] current operations : 0 in 10.26.116.84:3306 RUNNING SLAVE
2015-10-29 13:00:33 [4] current operations : 0 in 10.26.84.103:3306 RUNNING MASTER
2015-10-29 13:00:33 [4] Selected RUNNING SLAVE in 10.58.224.113:3306
2015-10-29 13:00:33 [4] Selected RUNNING SLAVE in 10.26.116.84:3306
2015-10-29 13:00:33 [4] Selected RUNNING MASTER in 10.26.84.103:3306
2015-10-29 13:00:34 [4] > Autocommit: [enabled], trx is [not open], cmd: COM_QUERY, type: QUERY_TYPE_READ, stmt: SELECT COUNT(*) FROM sbtest1
2015-10-29 13:00:34 [4] Route query to slave 10.58.224.113:3306 <
2015-10-29 13:00:36 [4] Stopped RW Split Router client session [4]
2015-10-29 13:00:42 Server changed state: server1[10.58.224.113:3306]: slave_down
PHP test script
<?php
# Test MaxScale
$db = new PDO('mysql:host=127.0.0.1;dbname=sbtest;charset=utf8;port=4446;', 'root', '***', array(PDO::ATTR_TIMEOUT => "10", PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION));
for($i=0; $i<5000; $i++)
{
try{
$q = $db->query('SELECT COUNT(*) FROM sbtest1', PDO::FETCH_NUM);
if($q){
$res = $q->fetchAll();
#var_dump($res);
echo time()." Result: {$res[0][0]}\n";
sleep(1);
}
}
catch(PDOException $Exception) {
echo "PDOException: " . $Exception->getMessage() . "\n";
die('forced script to stop');
}
}
Mysqlslap benchmark:
mysqlslap -h127.0.0.1 -uroot -p*** -P4446 --create="CREATE TABLE a (b int);INSERT INTO a VALUES (23)" --query="SELECT * FROM a" --concurrency=50 --iterations=200 --delimiter=";"
Sysbench benchmark:
sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --oltp-table-size=2500 --mysql-user=root --mysql-password=*** --mysql-host=127.0.0.1 --db-ps-mode=disable --mysql-port=4446 prepare
sysbench --num-threads=16 --max-requests=5000 --test=/usr/share/doc/sysbench/tests/db/oltp.lua --oltp-skip-trx=on --oltp-read-only=on --oltp-table-size=250000 --mysql-host=127.0.0.1 --mysql-user=root --mysql-password=*** --mysql-port=4446 run
Encountered errors:
PDOException: SQLSTATE[HY000]: General error: 2003 Authentication with backend failed. Session will be closed.
PDOException: SQLSTATE[HY000]: General error: 2006 MySQL server has gone away
PDOException: SQLSTATE[HY000]: General error: 2013 Lost connection to MySQL server during query
Maxscale configuration:
[maxscale]
threads=4
auth_connect_timeout=20
auth_read_timeout=20
auth_write_timeout=20
log_trace=1
[Galera Monitor]
type=monitor
module=galeramon
servers=server1,server2,server3
user=maxmon
passwd=***
monitor_interval=30000
backend_connect_timeout=10
backend_read_timeout=10
backend_write_timeout=10
[RW Split Router]
type=service
router=readwritesplit
servers=server2,server3,server1
user=root
passwd=***
max_slave_connections=100%
enable_root_user=1
router_options=slave_selection_criteria=LEAST_CURRENT_OPERATIONS
[Debug Interface]
type=service
router=debugcli
[CLI]
type=service
router=cli[Debug Interface]
type=service
router=debugcli
[CLI]
type=service
router=cli
[RW Split Listener]
type=listener
service=RW Split Router
protocol=MySQLClient
port=4446
[Debug Listener]
type=listener
service=Debug Interface
protocol=telnetd
address=127.0.0.1
port=4442
[CLI Listener]
type=listener
service=CLI
protocol=maxscaled
port=6603
[server1]
type=server
address=10.58.224.113
port=3306
protocol=MySQLBackend
[server2]
type=server
address=10.26.84.103
port=3306
protocol=MySQLBackend
[server3]
type=server
address=10.26.116.84
port=3306
protocol=MySQLBackend
Session monitoring shows that the sessions gets invalid like in the following example:
# maxadmin -pmariadb show sessions
Session 9 (0x7f60a4000b50)
State: Invalid State
Service: RW Split Router (0x342f460)
Client DCB: 0x7f60a40009a0
Client Address: [email protected]
Connected: Thu Oct 29 13:28:57 2015
I also played around with different timeouts variables and monitor_interval in Maxscale and also in my PHP testing script (PDO timeout), but the problem seems to be how Maxscale handles MySQL sessions.
I also read about the optimistic way of Maxscale which is forwarding the quickest response it gets from one of the nodes, but not sure if this is the cause.
Is there a way to make node shutdowns unharmful for any SQL requests propagated by Maxscale to all slave nodes of a cluster ?