Wednesday, November 8, 2017

[google-cloud-sql-discuss] Re: Migrate from Cloud SQL to DataStore

Hi Azeem,
 
I understood how I can Migrate from SQL to Datastore. But now Problem is that 

No more than one operation per sec

This is creating trouble for me. Suppose there is another kind Like child of Post then it quite possible there is more than one like per sec. 
I don't want to create Like or comment under user, it always under post. Because in my App it's very rear case when need to get all posts where user comment. Mostly need to 
get all comments of the post. So, I think it's a best to make a post as a parent of comment, to make query fast. You mention.

Alright, assuming your most frequent use-case will be to get a post with the last few comments (or just a snippet of each of the last few comments), like G+ and Facebook give you a preview of a post in your stream. I doubt you need strong consistency for these requests. And then, the user opens a post and you want to show more comments (maybe not all, just a batch, let them scroll or click for the next batch), again weak consistency might be sufficient. Normally, browsing posts or comments make no good reason for transactions.

But a strongly consistent read might make sense when cleaning up Datastore from posts/comments of a user who happens to be a spam-bot or a user who wants to delete the account (privacy) or something like Google Check-out. Having the user as parent might be easier and safer to query really all the user's comments and posts batch-wise.

For getting the post-previews:
You could compute the post preview as one bigger entity (post body including the recent comment-snippets as repeated structured property). For each new comment to a post, you write some data of the comment into the post's recent comments property. However, since many comments to one post could create a hotspot on the post entity, where even exponential backoff might not be sufficient, but you can apply the offsetting/buffering described below.

For browsing comments of post-reviews:
Add a computed indexed property to comments that is used for the complete "query comments for this post", including the sorting by date. First the key or ID of the post, a separator character, then UTC date-time value of comment's creation, e.g.:

Post:
KEY(User, 1, Post 7)

Comment:
KEY(User, 5, Comment, 99)
indexedSortByPost = "user_1_post_7-20171024151617.4929"

For counting likes
Likes could be a special case, because normally you only care for the number of likes, unless you want to combine this with a update-notification (i.e., if user likes a post, user will be notified automatically for any new comment). If it is about counting the likes, you could implement them with sharding counters. See this article with Java sample: Sharding counters

Regarding good Datastore designs, I want to recommend this article from the Datastore docs: Best practices.
There are some more articles in the App Engine docs that will give you a better understanding:
 Computing the post-preview
Try to work-around the write limit by offsetting/buffering any comment related write-ops, so writing is throttled below the 1s limit.

How can I do this I'm using Java any article or tutorial ?   

I don't know about an article on this topic. But for this example it would work something like this:
  1. Whenever a comment is created, the request handler will not just write the new Comment object to Datastore, but will also create a transactional task into a pull task-queue "post-need-new-preview" with the hashed key of the post (e.g. URL-encoded string) as a tag. The same can be done for request handlers that update existing posts or existing comments, or that delete comments.
  2. Add a cron-job to your app with a schedule that reflects how often you want the previews to be updated the earliest, e.g. each 5 minutes.
  3. The cron-job will run a task in a push task-queue where tasks in the pull queue "post-need-new-preview" are leased by tag. Remember: each tag represents a post. Pull queue allows you to lease up to 1,000 tasks with the same tag, starting with the oldest task in the pull queue.
  4. For each tag, try to get the post by key, query the last x comments, copy their snippets into the post and write the post back to Datastore (non-transactional). If there are more tasks with the same tag, wait 1 or 2 seconds and queue another task into the push queue to repeat.
  5. If there are no more tasks left, just leave the cron-job task-handler. You could also leave if the oldest task for a tag is too fresh (e.g. only 30 seconds old task), to save instance hours.
With this, you can throttle the write-ops for each post respecting the 1sec limit. Due to the nature of task-queues, the instances would increase with the amount of work-load (i.e. number of posts that need a new preview). If you put handlers like these into some kind of backend service of your app, you could configure a cheap, slow instance class for them. With the cron-job schedule you can control how fresh you want your post-previews to be.

Of course, there could be alternative implementations where your app stores markers per post in Datastore to achieve more or less the same as with the pull queue. But the basic idea is the same: defer pending write ops, group them by post, and make a single write-op instead at certain intervals.

--
You received this message because you are subscribed to the Google Groups "Google Cloud SQL discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-cloud-sql-discuss+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/google-cloud-sql-discuss/bade3c09-1206-4f0b-8c1c-e7d0d29201bd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

No comments:

Post a Comment