Wednesday, April 25, 2018

[google-cloud-sql-discuss] Re: Postgres Connections Randomly Dropping

Thanks for the suggestion, Dinesh, but I don't think that's it. We are getting the errors much more frequently than there are events in the operations tab. Typically we get 2-3 bursts of errors per day, but there is generally just one operation: a backup. The timing of the events and the errors also doesn't necessarily coincide. Regardless, if the operations were the problem, would turning on HA fix the issue, or would that not help?

Any other ideas about why we could be seeing these connection problems?

Nigel G.

On Wednesday, 25 April 2018 14:17:18 UTC-7, Dinesh (Google Platform Support) wrote:
Since you mention that you receive these errors for only 2-3 minutes a day, I suspect your instance might be going through maintenance updates (which require an instance restart) at those times. Please check the operations log of your PostgreSQL instance. You can view it in the Cloud Console, in the instance details view, under the Operations tab. If you find that the instance was updated at the same time, that would explain the cause of these errors.

If that is the real cause of the errors, I would recommend configuring the maintenance window and maintenance timing to avoid any surprises in the future.

Let me know if this helps.

Regards,


On Wednesday, April 25, 2018 at 12:59:51 PM UTC-4, Nigel Gutzmann wrote:
I have a Django and Celery application running inside Google Kubernetes Engine. I am connecting to my CloudSQL instance (Postgres) through a Kubernetes service running the CloudSQL Proxy. Database connections and queries generally work fine, but occasionally we get bursts of errors with connections breaking. They are raised in Python like this:

OperationalError: could not connect to server: Connection refused Is the server running on host "cloudsql-proxy-service" and accepting TCP/IP connections on port 3306?

or

OperationalError: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request.

I can't find anything that might cause that in the logs of the CloudSQL instance. There are some messages like this in the CloudSQL proxy logs:

2018/04/24 18:55:18 Instance <project_name>:us-central1:<instance_name> closed connection

But I can't reliably correlate the timestamps of those messages with the times of the Python errors. I have tried setting CONN_MAX_AGE and TCP keepalives like this in Django's settings.py:

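import os

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': '<db_name>',
        # Connection details are read from the environment.
        'USER': os.environ.get('DB_USER', None),
        'HOST': os.environ.get('DB_HOST', None),
        'PORT': os.environ.get('DB_PORT', None),
        'PASSWORD': os.environ.get('DB_PASSWORD', None),
        # Seconds to reuse a connection; 0 closes it at the end of each request.
        'CONN_MAX_AGE': int(os.environ.get('CONN_MAX_AGE', 0)),
        # OPTIONS are passed through to the database driver (libpq parameters).
        'OPTIONS': {
            'keepalives': 1,            # enable TCP keepalives
            'keepalives_idle': 480,     # idle seconds before the first probe
            'keepalives_interval': 10,  # seconds between unanswered probes
            'keepalives_count': 3,      # lost probes before the connection is dropped
        },
    },
}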
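
As a rough illustration of what these keepalive values mean in practice, the sketch below estimates how long an idle connection to an unresponsive peer would survive before the client's TCP stack gives up on it. The helper name `keepalive_detection_seconds` is made up for this example, and the formula (idle time plus one interval per lost probe) is an approximation of typical TCP keepalive behaviour, not an exact guarantee.

```python
def keepalive_detection_seconds(idle, interval, count):
    """Approximate time before an idle, unresponsive peer is declared dead.

    Hypothetical helper: the first probe goes out after `idle` seconds of
    silence, then `count` unanswered probes are sent `interval` seconds apart.
    """
    return idle + interval * count


# The values from the OPTIONS block in settings.py above.
options = {
    'keepalives_idle': 480,
    'keepalives_interval': 10,
    'keepalives_count': 3,
}

detection = keepalive_detection_seconds(
    options['keepalives_idle'],
    options['keepalives_interval'],
    options['keepalives_count'],
)
print(detection)  # 480 + 10 * 3 = 510 seconds
```

Keepalives only probe connections that are already established, so they would not be expected to affect the "Connection refused" error, which is raised while a new connection is being opened.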

But that didn't seem to make a difference. We still get the same errors in bunches, about 20 errors over the span of 2-3 minutes, 2-3 times per day.
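
One way to check more systematically whether the proxy's "closed connection" messages line up with the application errors is to parse both sets of timestamps and look for matches within a time window. This is only a sketch: the function names, the 60-second window, and the sample error times are assumptions for illustration, and it presumes both logs use the same time zone.

```python
from datetime import datetime, timedelta

# Timestamp layout seen in the proxy log line quoted above.
PROXY_TS_FORMAT = "%Y/%m/%d %H:%M:%S"


def parse_proxy_timestamp(line):
    """Parse the leading 'YYYY/MM/DD HH:MM:SS' (19 characters) of a proxy log line."""
    return datetime.strptime(line[:19], PROXY_TS_FORMAT)


def errors_near_proxy_events(proxy_lines, error_times, window_seconds=60):
    """Return the error times that fall within the window of a 'closed connection' line."""
    closed = [
        parse_proxy_timestamp(line)
        for line in proxy_lines
        if "closed connection" in line
    ]
    window = timedelta(seconds=window_seconds)
    return [
        t for t in error_times
        if any(abs(t - c) <= window for c in closed)
    ]


proxy_lines = [
    "2018/04/24 18:55:18 Instance <project_name>:us-central1:<instance_name> closed connection",
]
# Hypothetical error times, e.g. taken from application logs or an error tracker.
error_times = [
    datetime(2018, 4, 24, 18, 55, 40),
    datetime(2018, 4, 24, 20, 0, 0),
]

matched = errors_near_proxy_events(proxy_lines, error_times)
print(matched)
```

If most error bursts have no nearby proxy message, that would suggest the drops originate somewhere other than the proxy closing connections.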

To view this discussion on the web visit https://groups.google.com/d/msgid/google-cloud-sql-discuss/045116d9-3ec9-494f-a72e-7853a5d67d8c%40googlegroups.com.