Saturday, September 5, 2020

[google-cloud-sql-discuss] Scaling postgres updates for Django on App Engine

Hi Folks,

Apologies for a potential double posting - I sent a version of this through the web interface and it was sucked into the depths of the internet, maybe never to be seen again! I'll try to re-articulate my original post.

Detailed Summary
I'm working on a Django app on Google App Engine with a Postgres database, and I'm looking for ways to scale update operations for model instances that correspond to similarity values in a matrix. For example, we have two matrices, 12K by 12K and 16K by 16K, and we want to recompute the pairwise values at some regular frequency (weekly or monthly). Each similarity score is stored in its own instance of a similarity class, so given that the input data used to calculate the similarity doesn't change during an update, the operations can be considered independent of one another. Running serially on a local machine takes in the ballpark of a day. I'd like to scale this to be faster and better optimized.
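To get a feel for the size of the problem, here is a minimal Python sketch that counts the similarity values and splits the work into independent row ranges. It assumes the matrices are symmetric and that each unordered pair is stored once without the diagonal, which the post does not state. The names `pair_count`, `row_batches`, and `pairs_in_batch` are hypothetical helpers, not part of the app.

```python
from typing import Iterator, Tuple


def pair_count(n: int, symmetric: bool = True) -> int:
    """Number of similarity values for n items.

    Assumption: a symmetric matrix stores each unordered pair once and
    skips the diagonal; otherwise every cell of the n x n matrix counts.
    """
    return n * (n - 1) // 2 if symmetric else n * n


def row_batches(n: int, rows_per_batch: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open row ranges [start, stop) that together cover 0..n."""
    for start in range(0, n, rows_per_batch):
        yield start, min(start + rows_per_batch, n)


def pairs_in_batch(start: int, stop: int, n: int) -> Iterator[Tuple[int, int]]:
    """Yield upper-triangle pairs (i, j) with i < j whose row i is in the batch."""
    for i in range(start, stop):
        for j in range(i + 1, n):
            yield i, j


if __name__ == "__main__":
    for size in (12_000, 16_000):
        print(size, pair_count(size), pair_count(size, symmetric=False))
```

Because the inputs do not change during an update, each row range can be handed to a separate worker without coordination. Note that batches near the top of the triangle hold more pairs than those near the bottom, so equal row counts do not mean equal work.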

Quick Summary
Let's discuss solutions for scaling postgres updates of a similarity matrix using Google Cloud, where the database is for a Django site on App Engine.

What I'm Looking At
I'm used to scaling things with HPC, and I'm pretty excited to get a chance to try this in the cloud. I've been looking at cloud functions, task queues, and batch on GKE. My thinking is that I'd like something where we can submit a job to a queue and then have it process up to some maximum number of operations. I'm also wondering if we will need to do horizontal scaling. We will definitely need to think about how/if the task has permissions to directly or indirectly make the update.
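The "submit to a queue, process up to some maximum" idea can be prototyped locally before picking a cloud service. The sketch below uses a thread pool from the Python standard library as a stand-in for a hosted task queue; it is not Cloud Tasks or GKE code. `run_batches` and `fake_update` are hypothetical names, and `fake_update` only counts rows where the real handler would recompute scores and write them to the database.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Tuple

Batch = Tuple[int, int]  # half-open row range [start, stop)


def run_batches(
    batches: Iterable[Batch],
    handle_batch: Callable[[Batch], int],
    max_workers: int = 4,
) -> int:
    """Run handle_batch on every batch with at most max_workers in flight.

    Returns the sum of the per-batch results (e.g. rows updated).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(handle_batch, batches))


def fake_update(batch: Batch) -> int:
    """Placeholder for the real work: report how many rows the batch covers."""
    start, stop = batch
    return stop - start


if __name__ == "__main__":
    work = [(0, 4), (4, 8), (8, 10)]
    print(run_batches(work, fake_update, max_workers=2))
```

In a hosted setup each `(start, stop)` tuple would presumably become the payload of one queued task, and `max_workers` would map to the queue's concurrency limit, which in turn should stay below the number of connections the Postgres instance allows.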

I'd ideally like a relatively simple solution, meaning that it is easy to reproduce. Thanks muchly for your help, and apologies again for potential double posting!

Best,

Vanessa

To view this discussion on the web visit https://groups.google.com/d/msgid/google-cloud-sql-discuss/CAM%3Dpu%2BKdQrhVdm0rUer6wqcuR4mQH9npkVBb4_jy-g93f%2BC8dA%40mail.gmail.com.
