Tuesday, December 8, 2015

Re: [google-cloud-sql-discuss] Losing connections and temporary slowness in Cloud SQL

To close out this thread: Joakim and I have spent some time looking into the issue, but our results have been largely inconclusive. I suspect that the connection management within the Django app, which maintains a fixed number of connections, suffers badly when some connections are slow or fail. That said, we did not determine why the connection issues were occurring in the first place.

I am keeping an internal investigation alive and will update this post if I find further information about this that would be productive to share.
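
For readers who want to experiment with the connection-management side mentioned above, here is a minimal sketch of the Django settings that control how long connections are reused and how long a connect attempt may hang. The thread does not say how the app was actually configured, so the host, database name, credentials, and both numeric values are placeholders and assumptions, not a recommendation derived from this investigation.

```python
# settings.py fragment (sketch); every value below is a placeholder.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "HOST": "203.0.113.10",  # hypothetical Cloud SQL IP (documentation range)
        "NAME": "appdb",          # hypothetical database name
        "USER": "appuser",        # hypothetical user
        "PASSWORD": "change-me",
        # Seconds a connection may be reused before Django closes it;
        # 0 closes the connection at the end of every request.
        "CONN_MAX_AGE": 60,
        "OPTIONS": {
            # Passed through to the MySQL driver: fail fast instead of
            # hanging when the server does not answer the handshake.
            "connect_timeout": 5,
        },
    }
}
```

A shorter CONN_MAX_AGE means a stale or slow connection is recycled sooner, at the cost of more frequent reconnects. Whether that trade-off helps here is untested.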

On Wed, Nov 4, 2015 at 12:31 AM, Joakim <joakim@biddl.com> wrote:
I've now sent you the requested details privately.

Best regards,
Joakim

On Tuesday, November 3, 2015 at 9:00:22 PM UTC+2, David Newgas wrote:
Could you let me know your project and instance name, and when the error last happened? I'll check the logs and see if I can find the cause. You can reply directly to me to avoid sharing private information on the group; I will then share conclusions on the list.

Yours,
David

On Tue, Nov 3, 2015 at 9:31 AM, Joakim <joa...@biddl.com> wrote:
Hi,

I don't believe any of your suggestions help much:
  • The IPs of the servers are all whitelisted; connections from those hosts are randomly lost or just get really slow.
  • The instance is not being restarted, at least not by us, and I can't see anything special in the "Operations" tab around the time of the problems. Additionally, the activation policy is set to "Always on".
  • The active connections peak at around 4 (Django uses connections in a smart way). I believe a D2 instance should be able to handle that many connections.
  • Connections are long-lived; Django uses connections in a quite decent way.
  • The filesystem replication is set to Asynchronous, if that makes any difference.
  • Writes peaked at 0.4 writes/second at most. The last time we had slow responses, the write rate was a lot lower than earlier, when there was no problem at all. Reads peak at around 0.05 reads/second.
  • The query rate is quite stable at 15 per second. When we last noticed slowness, it was stable at 15 per second before the problem, dropped without explanation during the problem, and once the instance started responding again, apparently some queries had queued up, which temporarily pushed it to 32 queries/second before it immediately went back down to 15.
  • During the last incident, the connections were not lost entirely; instead the response time went up to as much as 4,020 milliseconds (about 4 seconds), which is not tolerable for our use case.
Any further ideas on the reasons for (and possible solutions to) the slowness and lost connections would be really welcome.

Best regards,
Joakim


On Friday, October 30, 2015 at 6:02:18 PM UTC+2, David Newgas wrote:
There are several possible causes for the "Lost connection to MySQL server at 'reading initial communication packet'" error:
  • This occurs if your GCE instance's IP address is not authorized to access the Cloud SQL instance. I doubt this is the case because it would not be an intermittent error.
  • This error also occurs if your instance is unavailable, for example because it is taking a long time to restart or start. If your instance has "on demand" activation, you might want to try "always on" mode and see if that helps. Long restart times can also be caused by a large general log; if you have ever had it enabled, try truncating the general log.
  • This message might occur if you are hitting one of the connection limits. Take a look at the "active connections" graph for your instance on the developers console. Does it show a lot of connections when you get the error?
  • Do you open and close connections extremely frequently? This can cause you to hit a connection limit even if the total number of concurrent connections does not increase too far.
Also remember that an occasional error connecting to Cloud SQL is expected. Please review the Cloud SQL SLA to get an idea of the error rate we aim for.
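
Since occasional connection errors are expected, one common way to absorb them is to retry the failed operation with exponential backoff. The following is only a sketch under stated assumptions: `retry_transient` and `is_lost_connection` are hypothetical helpers, not part of Django or Cloud SQL, and the check on error code 2013 is based solely on the error tuple quoted in this thread.

```python
import random
import time


def retry_transient(operation, is_transient, attempts=3, base_delay=0.1,
                    sleep=time.sleep):
    """Call operation(), retrying with exponential backoff on transient errors.

    is_transient(exc) decides which exceptions are worth retrying.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as exc:
            # Give up on the last attempt or on errors that are not transient.
            if attempt == attempts - 1 or not is_transient(exc):
                raise
            # Delay doubles on each retry, plus a little random jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))


def is_lost_connection(exc):
    # Assumption: the exception carries the MySQL error code as args[0],
    # as in the (2013, 'Lost connection ...') tuple quoted in this thread.
    return bool(exc.args) and exc.args[0] == 2013
```

This is only safe for operations that can be repeated without side effects, such as a read-only health check; retrying in the middle of a transaction would need more care.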

Yours,
David

On Fri, Oct 30, 2015 at 1:54 AM, Joakim <joa...@biddl.com> wrote:
We have a Django application (hosted on Compute Engine) that uses Cloud SQL. We've set up really frequent monitoring of it via the load balancer's health checks. However, about once per day we get a bunch of the following errors: django.db.utils:OperationalError: (2013, 'Lost connection to MySQL server at \'reading initial communication packet\', system error: 0 "Internal error/check (Not system error)"').

Typically the MySQL/Cloud SQL interaction of the health checks takes less than 20 ms; however, a few times a day it can suddenly take 500-1200 ms.
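
To pin down exactly when those slow checks happen, the database part of the health check could be wrapped in a small timer whose results are logged with a timestamp and compared against the Cloud SQL graphs. This is a minimal sketch: `timed` is a hypothetical helper, and the 100 ms threshold is an arbitrary assumption sitting between the usual 20 ms and the observed 500-1200 ms.

```python
import time


def timed(check, slow_threshold_ms=100.0, clock=time.perf_counter):
    """Run check() and return (result, elapsed_ms, is_slow).

    check is any zero-argument callable, for example a function that
    runs a trivial query against the database.
    """
    start = clock()
    result = check()
    elapsed_ms = (clock() - start) * 1000.0
    return result, elapsed_ms, elapsed_ms > slow_threshold_ms
```

The clock is injectable only so the helper can be exercised without a real database.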

The database is really lightly loaded (health checks account for more than 95% of the requests to the system) and we've not seen any similar issues during heavy load testing. There just seems to be something randomly causing Cloud SQL to perform really slowly for a request or two, or to drop the connections, and then continue to work normally.

What could be the reason for both losing the connection and the random but extreme slowness? Any ideas on how to fix both issues?

--
You received this message because you are subscribed to the Google Groups "Google Cloud SQL discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-cloud-sql-discuss+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/google-cloud-sql-discuss/21b7c9b2-eee6-459c-9b69-ba7d2fcbd740%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
