I've been doing some HA testing of our database and in my simulation of server death I've found an issue.
My test uses Django and does this:
At this point everything hangs indefinitely within the mysql_ping function. As far as my app is concerned it is still connected to the database (because of the previous query); the server just appears to be taking a very long time to respond.
Does anyone know of a way to handle this kind of situation? connect_timeout doesn't help as I'm already connected. read_timeout seems like too blunt an instrument (and I can't get it working with Django anyway).
Setting the default socket timeout also doesn't work (and would be far too blunt, as it would affect all socket operations and not just MySQL).
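For reference, this is roughly how driver-level timeouts are usually passed through Django, since everything under OPTIONS is handed to the driver's connect() call. It is only a sketch: the NAME and HOST values are placeholders, and whether read_timeout and write_timeout are honoured depends on the driver and version installed (recent mysqlclient and PyMySQL releases accept them, older MySQLdb builds may not).

```python
# settings.py (fragment). NAME and HOST are placeholder values.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "appdb",
        "HOST": "db.example.internal",
        "OPTIONS": {
            # Forwarded to the driver's connect() call as keyword arguments.
            "connect_timeout": 5,   # seconds allowed to establish the connection
            "read_timeout": 10,     # seconds to wait for data from the server
            "write_timeout": 10,    # seconds to wait when sending to the server
        },
    }
}
```

These values apply to every query on the connection, which is the bluntness mentioned above.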
I'm seriously considering doing my queries within threads and using Thread.join(timeout) to perform the timeout.
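A minimal sketch of that thread-based idea, using only the standard library. The names run_with_timeout, QueryTimeout and slow_query are made up for illustration, and slow_query just sleeps as a stand-in for a call stuck in mysql_ping.

```python
import threading
import time


class QueryTimeout(Exception):
    """Raised when the wrapped call does not finish in time."""


def run_with_timeout(func, timeout, *args, **kwargs):
    """Run func in a daemon thread and wait at most `timeout` seconds for it."""
    outcome = {}

    def worker():
        try:
            outcome["value"] = func(*args, **kwargs)
        except Exception as exc:
            # Hand the error back so the caller sees it in its own thread.
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        # The worker is still blocked. It cannot be killed, only abandoned.
        raise QueryTimeout("call did not finish within %s seconds" % timeout)
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


def slow_query():
    # Hypothetical stand-in for a query against a dead server.
    time.sleep(2)
    return "never seen"
```

The catch is that the abandoned thread stays blocked on the dead connection, so after a QueryTimeout that connection would have to be thrown away and a fresh one opened rather than reused.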
In theory, if I can enforce this timeout, the reconnect logic should kick in and our automatic failover of the database should work perfectly (kill -9 on the affected processes currently does the trick but is a bit manual!).
I would think this would be more in line with setting a read_timeout on your front-facing webserver. Any number of things could hold up your Django app indefinitely. While you have found one specific case, there could be many more (code errors, cache difficulties, etc).