Wednesday, February 15, 2017

[google-cloud-sql-discuss] Re: Communications Link Failure connecting to MySQL Cloud Instace

Hey Dave,

So, we've looked a lot at the question of the instances being overloaded, connections being used up, the connection timing use-patterns and the wait parameters, and none of these seem to have turned up anything. But something in your last reply sparked a question for me, that seems a quite likely candidate for an explanation:

Given that the issue occurs generally around the same time and appear to take place within the span of a few seconds to a minute or two, it's possible this is a case of maintenance restarts on the instances. The behaviour will be determined partly by whether you're running Cloud SQL v1 or v2 instances. For v2, you can set the time window for the maintenance restarts.

If this is the case, you should code a handler that will pause / postpone / queue up any work needed to be done, so that minor interruptions like this can be handled smoothly for any client programs reliant on the DB.

Let me know what you think and what your investigation turns up, for example checking in your logs whether the time for these errors is quite consistent in a certain window.

Cheers,

Nick
Cloud Platform Community Support 

On Tuesday, February 14, 2017 at 4:45:58 PM UTC-5, Dave Greenly wrote:
1) Once we get the connection, we pretty much us it right away so I don't think this is it, although we do have some lengthy queries and processing after we pull a record set
2) The wait_timeout and max_wait would be the defaults.  This seems like it could be a likely culprit.  I am investigating now how to find what these are, and potentially change them.  If you have any suggestions on how to do that in your cloud instances I would appreciate any guidance.
3) I am currently not logging any connection parameters.
4) Firewalls should not be an issue.

I did find the wait_timeout and it is set to the default in our DB, which according to the table below is 8 hours.  Which should be plenty.  I have also been watching our processing and during the "heavy times" we still have only 10 or so connections open at a time and they all seem to be closing correctly.  The time we get this error the most seems to be in the early morning hours, around 2 and 3 AM Mountain time.  (Still looking into max_wait)

wait_timeout

Command-Line Format--wait_timeout=#
System VariableNamewait_timeout
Variable ScopeGlobal, Session
Dynamic VariableYes
Permitted Values (Windows)Typeinteger
Default28800
Min Value1
Max Value2147483
Permitted Values (Other)Typeinteger
Default28800
Min Value1
Max Value31536000

The number of seconds the server waits for activity on a noninteractive connection before closing it.


Thanks
Dave

On Tuesday, February 14, 2017 at 11:43:16 AM UTC-7, paynen wrote:
Hey Dave,

Thanks for the detail on how and when the exceptions occur. Given that they can happen after - rather than only at the moment that - you make a connection (or attempt to make one, as the case may be), it seems unlikely that the DB server doesn't have connections available. It's rather the case that the connection is closing having been established. Here are some further questions which might get us closer to understanding:

1. Where in the timeline of a request / task does the connection get opened, and how long might it take to actually use the connection? Are there gaps between subsequent uses of the same connection?
 
2. What are the wait_timeout and max_wait parameters on the DB server?
 
3. When (if?) the issue manifests at the very opening of the connection, are you logging the connection parameters?
 
4. Do you have any firewalls set up that might be at issue?

Ultimately, it appears there are many possible issues underlying this general exception. Maybe some of the ideas in this Stack Q&A might be of help. One of the best pieces of advice is to update the JDBC libraries to newest versions.

As for the question of actual network failures, this seems quite unlikely given that everything is in-data-center. 

Let me know what you find and what you think. I'll be here to help.

Cheers,

Nick
Cloud Platform Community Support



On Monday, February 13, 2017 at 4:55:20 PM UTC-5, Dave Greenly wrote:

Hi Nick, I am working on getting some of the code that shows where we close the connections and will post it when I have it organized.  I am also looking at it and debugging to see if we are missing any DB close calls, but, I am skeptical at this time if this is the issue because the code fails with this Communications Link Failure and different times, and, sometimes it is when we have an established connection already, and the connection seems to just drop.  Also for an exception to be causing us to "miss" the code is unlikely because that implies we would be getting exceptions on a regular basis and we would notice other things not processing correctly if this were the case.  Still digging on my side for sure though.

1) How many connections should we be able to have open at one time?
2) Could this simply be a case that we don't have a big enough of a Cloud MySQL instance to run our DB instance?

Thanks
Dave


On Thursday, February 9, 2017 at 4:47:30 PM UTC-7, paynen wrote:
Hey Dave,

Thanks for coming back with that information. It appears that you're doing nothing unusual in that particular block of code. I do have some questions, though, which should take us closer to the root cause.

As you can read in this Stack Overflow Q&A, there are several possible reasons for this behaviour:
  1. IP address or hostname in JDBC URL is wrong.
  2. Hostname in JDBC URL is not recognized by local DNS server.
  3. Port number is missing or wrong in JDBC URL.
  4. DB server is down.
  5. DB server doesn't accept TCP/IP connections.
  6. DB server has run out of connections.
  7. Something in between Java and DB is blocking connections, e.g. a firewall or proxy.
I don't believe 1, 2, 3 or 7 are at play since your connections work at other times and all those causes would be likely to affect every single connection, so this leaves the following options:
  • DB server is down
  • DB server is not accepting TCP/IP connections
  • DB server has run out of connections
Specifically, the issue that seems most plausible to me is the last one, since this is a common source of issues.

Could you share a more full code example showing that the code definitely closes connections it opens?

It even be could be possible even if you normally close connections that an unexpected exception or other exit path from the code is causing the connection-closing code to be missed. This would result in a buildup over time of sockets in a TIME_WAIT state, and eventually the DB server would be unable to service requests until some of these timed out. 

You could also check this by logging onto the DB server machine and checking sudo netstat -nat

If this is the case, you can solve the issue by making sure that all possible code paths close the connection. You could, in addition to this (although this should take care of the vast majority of hanging TIME_WAIT connections) potentially tweak the mysql
wait_timeout parameter, although you'll want to gather statistics on TCP traffic on the machine to make sure that you don't set it too low so that it cuts off actual connections that wouldn't ordinarily fail, although they happen to experience some significant delays between packet arrivals for whatever reason.

Let me know how this all looks to you and what you investigation of your code turns up. We can of course look into other avenues if this doesn't correspond to the actual root cause.

Cheers,

Nick
Cloud Platform Community Support


On Thursday, February 9, 2017 at 3:22:54 PM UTC-5, Dave Greenly wrote:
Sure - here is how we connect in Java with as much of the log as was output.

Let me know if I can provide any more info, this is becoming a very painful problem for us. 

Thanks!!
Dave


public String ConnectToDb(String host, String user, String password, String dbname) {
try {
Class.forName("com.mysql.jdbc.Driver");
db = DriverManager.getConnection(
"jdbc:mysql://" + host + ":3306/" + dbname + "?user=" + user + "&password=" + password);
} catch (Exception e) {
db = null;
e.printStackTrace();
System.out.println("SQLException: " + e.getMessage());
return e.getMessage();
}
return "success";
}

Exception Caught in Query()SQLException: Communications link failure The last packet successfully received from the server was 3,561 milliseconds ago. The last packet sent successfully to the server was 3,561 milliseconds ago. SQLState: 08S01 VendorError: 0 Feb 09, 2017 1:14:37 PM spaceNavSendMail.Sql queryAggregatePcQuery SEVERE: Error in queryAggregatePcQuery() com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure The last packet successfully received from the server was 3,561 milliseconds ago. The last packet sent successfully to the server was 3,561 milliseconds ago. at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source) at java.lang.reflect.Constructor.newInstance(Unknown Source) at com.mysql.jdbc.Util.handleNewInstance(Util.java:377) at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1036) at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3427) at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3327) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3814) at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2435) at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2582) at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2526) at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2484) at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1446) at spaceNavSendMail.Sql.queryAggregatePcQuery(Sql.java:3435) at spaceNavSendMail.SpaceNavSendMail.generateEmailMessage(SpaceNavSendMail.java:980) at spaceNavSendMail.SpaceNavSendMail.getConjunctionReport(SpaceNavSendMail.java:897) at spaceNavSendMail.SpaceNavSendMail.sendSummaryEmail(SpaceNavSendMail.java:710) at spaceNavSendMail.SpaceNavSendMail.main(SpaceNavSendMail.java:347) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost. at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2914) at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3337)



On Thursday, January 12, 2017 at 9:22:35 AM UTC-7, Dave Greenly wrote:
Hi there, 

We have a Cloud Instance of a MySQL Database which we use daily.  We have multiple compute engines that pull and push data to it, and on those compute engines are Java programs that connect to the database and read/write.

Pretty standard stuff.

On a semi-consistent basis, our Java programs have difficulty connecting to the database, and we receive a "Communications Link Failure" and our code cannot connect to the DB.  After some indeterminate amount of time, things "just start working again".

We actually put a check in our code for this error, and if we get it, we wait a few seconds, and try again, but even after 5 or so retry attempts, sometimes it is still not working.

We have chalked this up in the past to some type of network anomaly going on in the cloud.

We are using mysql-connector-java-5.1.38, and the MySQL instance on the cloud is version MySQL 5.5.

Has anyone else experienced this?

Thanks
Dave

--
You received this message because you are subscribed to the Google Groups "Google Cloud SQL discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-cloud-sql-discuss+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/google-cloud-sql-discuss/cd06d1c7-713e-4a21-af9a-d6491bc3536d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

No comments:

Post a Comment