Saturday, September 5, 2020

[google-cloud-sql-discuss] Scaling postgres updates on app engine


Hey folks!

I'm working on a Django application (Python 3.7) on the App Engine standard environment where we have a few similarity matrices that need complete updates regularly (weekly or monthly). For example, one is about 12K by 12K, and the other is 16K by 16K. Because the data being compared is not changing, and each cell in the matrix (a score and other metadata) is its own model instance, the update can be considered many small tasks. What I'm wondering about are different strategies for scaling the operation, primarily to make it faster or more efficient. Running one round of updates (basically iterating through the diagonal of the matrix) in serial takes about a day.
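To make the "many small tasks" framing concrete, here is a minimal sketch of how the cells could be grouped into batches of index pairs. It assumes the matrices are symmetric, so only cells with i <= j need computing (the post does not state this), and the names `pair_batches` and `pair_count` are hypothetical helpers, not part of the application. Under that assumption the 12K matrix has 72,006,000 cells to visit, which is why the batch size matters more than the per-cell work.

```python
from itertools import islice
from typing import Iterator, List, Tuple

Pair = Tuple[int, int]


def pair_count(n: int) -> int:
    """Number of cells with i <= j in an n x n matrix (upper triangle plus diagonal)."""
    return n * (n + 1) // 2


def pair_batches(n: int, batch_size: int) -> Iterator[List[Pair]]:
    """Yield lists of (i, j) index pairs with i <= j, at most batch_size per list.

    Assumption: the similarity matrix is symmetric, so the lower triangle
    can be filled in from the upper one instead of being recomputed.
    """
    # Lazy generator, so the full list of pairs is never held in memory.
    pairs = ((i, j) for i in range(n) for j in range(i, n))
    while True:
        batch = list(islice(pairs, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    print(pair_count(12_000))  # cells to visit for the 12K matrix
    for batch in pair_batches(4, 3):
        print(batch)
```

Each yielded batch is small enough to hand to one queued task, and the batch size is the knob that trades task overhead against the amount of work lost if one task fails.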

I'm going to cross-post this to both the App Engine and managed SQL groups, so apologies for the double post! I've only started exploring ideas. So far I've been looking at the task queue and Cloud Functions, and I'm thinking of a strategy that submits a bunch of jobs to a queue to be processed, with some maximum number of database connections allowed to update at once. Batch seems like a lot of overhead (and expense) just to update the matrices for a tiny application, but I haven't tried it. We would also need to think about access permissions for the task that does the update (directly or indirectly). I'm not sure if this kind of operation would require horizontal scaling. I want to find a solution that isn't hugely complex, so it's easy to reproduce in the future. Thank you!
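The "queue of jobs with a maximum number of connections" idea can be sketched locally with a bounded thread pool. This is only an illustration under stated assumptions: `score_pair`, `InMemoryStore`, `process_batch`, and `run_update` are hypothetical names, the scoring function is a placeholder, and the in-memory store stands in for the Postgres writes (in Django, one bulk write per batch would be the natural equivalent). It is not an App Engine task queue or Cloud Functions setup.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[int, int]

MAX_CONNECTIONS = 4  # assumed cap on concurrent writers; tune to the database


def score_pair(i: int, j: int) -> float:
    """Hypothetical placeholder for the real similarity computation."""
    return 1.0 if i == j else 1.0 / (1 + abs(i - j))


class InMemoryStore:
    """Stand-in for the database: records one write per batch."""

    def __init__(self) -> None:
        self._lock = Lock()
        self.rows: Dict[Pair, float] = {}
        self.write_calls = 0

    def write_batch(self, rows: Dict[Pair, float]) -> None:
        # One lock-protected write per batch, mimicking one bulk write per task.
        with self._lock:
            self.rows.update(rows)
            self.write_calls += 1


def process_batch(batch: List[Pair], store: InMemoryStore) -> int:
    """One 'job': score every pair in the batch, then write the batch once."""
    rows = {(i, j): score_pair(i, j) for i, j in batch}
    store.write_batch(rows)
    return len(rows)


def run_update(batches: Iterable[List[Pair]], store: InMemoryStore,
               max_workers: int = MAX_CONNECTIONS) -> int:
    """Process all batches with at most max_workers running at the same time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(lambda b: process_batch(b, store), batches))


if __name__ == "__main__":
    store = InMemoryStore()
    total = run_update([[(0, 0), (0, 1)], [(1, 1)]], store, max_workers=2)
    print(total, store.write_calls)
```

The pool size plays the role of the connection cap. Threads only help here if the time is dominated by waiting on the database; if the scoring itself is CPU-bound, separate processes or separate queued tasks would be the closer analogue.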

Best,

Vanessa

--
To view this discussion on the web visit https://groups.google.com/d/msgid/google-cloud-sql-discuss/243de337-7e2c-4f14-92b5-c23804d489f2n%40googlegroups.com.
