Monday, April 12, 2021

[google-cloud-sql-discuss] Re: Consistent "Segmentation fault" error in logs when running a query

Good news: I can confirm that disabling Query Insights on our production instance has stopped the segfaults completely. More interestingly, we've since re-enabled it with application tags disabled, and so far there have been no segfaults either.
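
For anyone wanting to try the same two steps, here is a minimal sketch of the settings involved. It assumes the Cloud SQL Admin API exposes these options as `settings.insightsConfig.queryInsightsEnabled` and `settings.insightsConfig.recordApplicationTags`. The helper name `insights_patch_body` is made up for illustration, and actually sending the body (for example through an instance patch request) is deliberately left out.

```python
import json
from typing import Optional


def insights_patch_body(enabled: bool,
                        record_application_tags: Optional[bool] = None) -> dict:
    """Build a partial instance body that only touches Query Insights settings.

    Hypothetical helper: the field names are assumed to match the Cloud SQL
    Admin API's InsightsConfig; nothing here talks to the API.
    """
    insights = {"queryInsightsEnabled": enabled}
    # Only include the tag setting when the caller explicitly chose a value.
    if record_application_tags is not None:
        insights["recordApplicationTags"] = record_application_tags
    return {"settings": {"insightsConfig": insights}}


if __name__ == "__main__":
    # Step 1: turn Query Insights off entirely.
    print(json.dumps(insights_patch_body(False), indent=2))
    # Step 2: turn it back on, but without recording application tags.
    print(json.dumps(insights_patch_body(True, record_application_tags=False), indent=2))
```

Keeping the body limited to `insightsConfig` is intentional, so that a patch built this way would not touch unrelated instance settings.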

On Monday, April 12, 2021 at 9:50:40 PM UTC+12 Andrew K. wrote:
From our side, we can definitely confirm they are still occurring. The most recent occurrence for us was less than 20 minutes ago, at 2021-04-12 09:25:38.469 UTC, and we're seeing about 20 segfaults a day on our production instance.

Unfortunately, since we cannot collect stack traces on Cloud SQL, there is not much we can do to debug further. The PG bug you have linked dates back to PG 9.6, and it is unlikely to have gone unnoticed and unfixed through to PG 13 (we've verified that the segfaults happen for us on both PG 12 and 13). Considering that we have not been able to reproduce it locally so far, it could very well be isolated to the Cloud SQL implementation of Postgres. Reporting it to PG won't help, since they will request stack traces we cannot collect.

There are a few threads on the Google issue tracker that suggest Query Insights may be responsible for segmentation faults. Since we have it enabled, I will try disabling it (if I can find out how to do that) and report back.

On Monday, April 12, 2021 at 9:24:55 PM UTC+12 jli...@google.com wrote:
Hello, 

Would you be able to confirm whether you are still seeing the segmentation fault errors? If so, could you please provide the timestamp of the most recent occurrence?

Also, would you be able to provide a little more detail regarding the query? As per Andrew's question, does the query involve a LEFT JOIN? According to the post presented here [1], this may be due to a PostgreSQL bug. Would you be able to verify this and get back to us with this information?

Better yet, I would suggest you bring this issue up via the public issue tracker [2], as this will allow for a more in-depth investigation into the segmentation fault and, at the same time, bring the issue closer to the Cloud SQL team for further inspection.


On Tuesday, March 30, 2021 at 1:10:32 PM UTC-4 Andrew K. wrote:
Does this query involve either a LEFT JOIN or aggregate functions? We have a very similar issue with Cloud SQL Postgres 12.5, with four different queries causing segmentation faults, and these are the only similarities between them. So far we have not been able to reproduce the segfaults on a local PG instance running the same DB and queries, and unfortunately it's not possible to attach a debugger to Cloud SQL (as far as we know).

On Tuesday, March 30, 2021 at 3:23:04 AM UTC+13 e...@contractbook.dk wrote:
We're seeing a "Segmentation fault" error in logs for our staging environment, caused by a single specific DB query.

I am unable to replicate the problem consistently by running the same query manually, but we see the issue in logs basically every day now.

Our environment is: PostgreSQL 13.1, the DB tier is "db-custom-1-3840".

In the first half of February, we were seeing "Segmentation fault" in relation to a different DB query on a different DB instance. That issue somehow resolved itself around Feb 15.

I'd love to provide more info to diagnose this. Is there anything else I could provide that would help solve this case? Otherwise, would it be possible for a Google Cloud SQL engineer to look into this for us?

