Thursday, July 5, 2012

Two features that would make CloudSQL a killer solution: n replicated instances for reads, and async queries

My GAE app uses the datastore for everything except the index behind a filtering feature, which requires many different joins, so for that I use CloudSQL. I have carefully sharded my data at the app level so that I can store it across n instances. However, I have two concerns:

If I size each shard for a specific data volume or write traffic, I am still exposed to spikes in reads. So it would be great if each shard could be configured to replicate automatically across n copies as read query volume demands (growing and shrinking as GAE instances do, though perhaps more slowly).

Some searches will involve querying multiple shards. Right now this is done sequentially. It would be great if I could send n queries asynchronously to execute in parallel, just like in GAE's datastore. I suppose I could create a light wrapper and then use n async URLFetches against my own app, but I'd have to serialize and deserialize my result sets somehow. That's my backup plan, but it would be great for this to be supported right in the API.
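To make the second request concrete, here is a rough sketch of the fan-out pattern I have in mind. It is not CloudSQL code: in-memory SQLite databases stand in for the shards, a plain thread pool stands in for an async query API, and the names `make_demo_shard`, `query_shard`, and `query_all_shards` are made up for illustration. Whether threads are a workable substitute inside a GAE request is an assumption I have not verified.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def make_demo_shard(rows):
    """Build an in-memory SQLite database standing in for one shard (hypothetical)."""
    # check_same_thread=False lets a worker thread use a connection
    # that was created on the main thread.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE items (id INTEGER, tag TEXT)")
    conn.executemany("INSERT INTO items VALUES (?, ?)", rows)
    conn.commit()
    return conn


def query_shard(conn, sql, params):
    """Run one query against one shard and return all of its rows."""
    return conn.execute(sql, params).fetchall()


def query_all_shards(shards, sql, params=()):
    """Send the same query to every shard in parallel and merge the rows."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        # Submit every query first so they all run concurrently.
        futures = [pool.submit(query_shard, conn, sql, params) for conn in shards]
        merged = []
        # Collect results in shard order; result() blocks until that shard is done.
        for future in futures:
            merged.extend(future.result())
    return merged


# Two stand-in shards, each holding a slice of the index.
shards = [
    make_demo_shard([(1, "red"), (2, "blue")]),
    make_demo_shard([(3, "red"), (4, "green")]),
]
rows = query_all_shards(shards, "SELECT id FROM items WHERE tag = ?", ("red",))
print(sorted(rows))  # [(1,), (3,)]
```

The total wait is then roughly the slowest shard rather than the sum of all shards, which is the whole point of the request. Any ordering or limit that spans shards would still have to be applied to the merged rows in the app.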

Has anyone dealt with these issues already? I couldn't find open issues for them in the tracker. Is this something I should open a new issue for?
