Thursday, April 26, 2018

[google-cloud-sql-discuss] Re: Postgres Connections Randomly Dropping

Hi Nigel,

Thanks for the feedback. For a PostgreSQL instance configured for high availability (HA), the failover instance is located in a different zone within the same configured region, so if the master cannot serve data from its primary zone, it fails over and continues to serve data from its secondary zone. As per the Cloud SQL documentation, if an instance configured for HA experiences an outage or becomes unresponsive, Cloud SQL automatically switches to serving data from the secondary zone. It is recommended to configure all instances that contain production data for HA.

Regarding the connection problem, please check and verify the following points:

1. The default TCP/IP port on which the PostgreSQL server listens is tcp:5432. Is there a specific reason you are using port 3306, which is the default port for MySQL? Please correct this if it is a mistake.

2. Please note that Cloud SQL has a maximum concurrent connections limit based on instance memory. Please check whether your instance is exceeding the connection limit during peak hours.

3. Please refer to this StackOverflow thread for a similar problem and suggested troubleshooting steps.
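
For point 2, one way to check headroom is to compare the number of open sessions against the server's configured limit. The following is only a minimal sketch, not an official Cloud SQL tool: `connection_usage` is a hypothetical helper name, and it assumes you already have a DB-API cursor (for example from psycopg2) connected to the instance. It uses the standard `pg_stat_activity` view and `SHOW max_connections`.

```python
def connection_usage(cursor):
    """Return (open_connections, max_connections, fraction_used).

    `cursor` is assumed to be a DB-API cursor connected to the PostgreSQL
    instance, e.g. one obtained from psycopg2. The name and return shape
    are illustrative only.
    """
    # Count the rows in pg_stat_activity. On newer PostgreSQL versions this
    # view also lists some background processes, so the figure can slightly
    # overestimate the number of client connections.
    cursor.execute("SELECT count(*) FROM pg_stat_activity")
    open_connections = int(cursor.fetchone()[0])

    # SHOW returns the setting as text, so convert it explicitly.
    cursor.execute("SHOW max_connections")
    max_connections = int(cursor.fetchone()[0])

    return open_connections, max_connections, open_connections / max_connections
```

Running this periodically during peak hours and logging the result would show whether the fraction approaches 1.0 around the time the errors appear.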

If the above does not help, please share your project number, PostgreSQL instance name, instance operations logs, and PostgreSQL error logs, along with the issue date, time, and duration, via a private email.

Regards,
 
On Wednesday, April 25, 2018 at 5:38:43 PM UTC-4, Nigel Gutzmann wrote:
Thanks for the suggestion Dinesh, but I don't think that's it. We are getting the errors much more frequently than there are events in the operations tab. Typically we get 2-3 bursts of errors per day, but there is generally just one operation: a backup. The timing also doesn't necessarily coincide between the events and the errors. Regardless, if the operations were a problem, would turning on HA fix the issue, or would that not help?

Any other ideas about why we could be seeing these connection problems?

Nigel G.

On Wednesday, 25 April 2018 14:17:18 UTC-7, Dinesh (Google Platform Support) wrote:
As you suggest that you receive such errors for only 2-3 minutes in a day, I suspect your instance might be going through maintenance updates (which require an instance restart) during those times. Please view the operations logs of your PostgreSQL instance. You can view them in the Cloud Console, inside the instance details view, under the Operations tab. If you find that the instance was updated at the same time, that explains the cause of these errors.

If that is the real cause of the errors, I recommend configuring the maintenance window and maintenance timing to avoid any surprises in the future.

Let me know if this helps.

Regards,


On Wednesday, April 25, 2018 at 12:59:51 PM UTC-4, Nigel Gutzmann wrote:
I have a Django and Celery application running inside of Google Kubernetes Engine. I am connecting to my CloudSQL instance (Postgres) using a Kubernetes service running the CloudSQL Proxy. Database connections and queries generally work fine, but occasionally we get spurts of errors with connections breaking. They are raised in Python like this:

OperationalError: could not connect to server: Connection refused Is the server running on host "cloudsql-proxy-service" and accepting TCP/IP connections on port 3306?

or

OperationalError: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request.

I can't find anything that might cause that in the logs of the CloudSQL instance. There are some messages like this in the CloudSQL proxy logs:

2018/04/24 18:55:18 Instance <project_name>:us-central1:<instance_name> closed connection

But I can't necessarily correlate the timestamps between when those messages appear and when we get the Python errors. I have tried setting CONN_MAX_AGE and TCP keepalives like this inside Django's settings.py:

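import os

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': '<db_name>',
        'USER': os.environ.get('DB_USER', None),
        'HOST': os.environ.get('DB_HOST', None),
        'PORT': os.environ.get('DB_PORT', None),
        'PASSWORD': os.environ.get('DB_PASSWORD', None),
        # Seconds to keep a connection open for reuse; 0 closes it after each request.
        'CONN_MAX_AGE': int(os.environ.get('CONN_MAX_AGE', 0)),
        # libpq connection parameters controlling TCP keepalives.
        'OPTIONS': {
            'keepalives': 1,            # enable TCP keepalives
            'keepalives_idle': 480,     # idle seconds before the first probe
            'keepalives_interval': 10,  # seconds between probes
            'keepalives_count': 3,      # lost probes before the connection is considered dead
        },
    },
}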

But that didn't seem to make a difference. We still get the same errors in bunches, about 20 errors over the span of 2-3 minutes, 2-3 times per day.
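
To make the timestamp comparison between the proxy's "closed connection" messages and the Python errors less manual, something like the following could help. This is a rough sketch under assumptions: the function names are hypothetical, the proxy log lines are assumed to start with a `YYYY/MM/DD HH:MM:SS` timestamp as in the example above, and both sets of timestamps are assumed to be in the same timezone. The 3-minute gap and 1-minute window are arbitrary starting values.

```python
from datetime import datetime, timedelta

# Format of the leading timestamp in the proxy log line shown above.
PROXY_TS_FORMAT = "%Y/%m/%d %H:%M:%S"


def parse_proxy_timestamp(line):
    """Parse the leading 'YYYY/MM/DD HH:MM:SS' timestamp of a proxy log line."""
    return datetime.strptime(line[:19], PROXY_TS_FORMAT)


def group_into_bursts(timestamps, max_gap=timedelta(minutes=3)):
    """Group timestamps into bursts; a gap larger than max_gap starts a new burst."""
    bursts = []
    for ts in sorted(timestamps):
        if bursts and ts - bursts[-1][-1] <= max_gap:
            bursts[-1].append(ts)
        else:
            bursts.append([ts])
    return bursts


def proxy_events_near(burst, proxy_times, window=timedelta(minutes=1)):
    """Return proxy events that fall within `window` of the burst's time span."""
    start, end = burst[0] - window, burst[-1] + window
    return [t for t in proxy_times if start <= t <= end]
```

Feeding the application's error timestamps into `group_into_bursts` and then calling `proxy_events_near` for each burst would show which bursts have a nearby proxy event and which do not.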

--
To view this discussion on the web visit https://groups.google.com/d/msgid/google-cloud-sql-discuss/9a483768-b42c-441d-a5be-77a2fc73de25%40googlegroups.com.
