Sunday, November 12, 2017

[google-cloud-sql-discuss] Re: Migrate from Cloud SQL to DataStore



On Sunday, 12 November 2017 21:14:59 UTC+1, Azeem Haider wrote:
After your reply I searched a lot about sharded counters. What I understand is that a sharded counter is a technique where you divide a single entity into multiple entities, right?
Pick one of them randomly and update it. When you need the count, fetch all of these entities and sum their values for the total.


 
The thing that confuses me: if you convert one entity into multiple entities with the same parent, aren't they all inside a single entity group? If it's a single entity group, then you can't apply more than one write op per second.

For example, Post1 has two sharded counters, A and B. Are Post1, A and B in the same entity group?
Or are Post1 and A one entity group, and Post1 and B another?


The sharded counters must not have any ancestor, because putting them back into the same entity group would defeat the workaround for the write limit that you want to achieve.

  1. Every counter must be in its own entity group (no parent).
  2. The count property should not be indexed and must not be part of a composite index.
  3. Every counter must have an indexed reference to the post for which it counts, so you can filter on it in a query to collect all counters of a particular post.
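
The three rules above can be sketched in plain Python. This is only an in-memory model of the idea, not Datastore code: the `shards` dict stands in for the counter entities, and `NUM_SHARDS`, `increment` and `total` are hypothetical names chosen for illustration.

```python
import random

NUM_SHARDS = 20  # hypothetical number of counter shards per post

# In-memory stand-in for the datastore. Each entry models one counter
# entity that has no parent, so each one is its own entity group.
# The post id in the key plays the role of the indexed post reference.
shards = {}  # (post_id, shard_index) -> count


def increment(post_id, rng=random):
    """Add one like by updating a single randomly chosen shard."""
    index = rng.randrange(NUM_SHARDS)
    key = (post_id, index)
    shards[key] = shards.get(key, 0) + 1


def total(post_id):
    """Collect every shard that references the post and sum the counts."""
    return sum(count for (pid, _), count in shards.items() if pid == post_id)
```

Because each increment touches only one of the shards, concurrent writes are spread over many entity groups, while reading the total costs one query over all shards of the post.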
 
If these are separate entity groups, then we can also use this technique for a comment system, am I right?


The sharded counters work because when you want to get the total value of likes, you query all counters of the particular post, then you just add all counts and return a single value.

You can also count the comments of a post this way. But you cannot store the comments themselves in shards. The idea is nice, but when your user browses the comments of a post, your handler has to query multiple shards, and there are too many issues here. You cannot easily join multiple query results: you would need multiple cursors or offsets per HTTP request and then have to map them back to the different shard queries. Much pain awaits on this road.
 
Another thing: I don't know whether I understood it correctly or not.


For getting the post-previews:
You could compute the post preview as one bigger entity (the post body plus the recent comment snippets as a repeated structured property). For each new comment on a post, you write some data of the comment into the post's recent-comments property. However, many comments on one post could create a hotspot on the post entity, where even exponential backoff might not be sufficient. In that case you can apply the offsetting/buffering described below.

So you mean creating a Post entity with a comments property that holds a list of the 3 most recent comment ids?

Not only a list of recent comment ids, but also a preview of the actual comments (the first two lines or so of each). That way, when the server returns a post or even a list of posts, the query result already contains everything you need for the response.
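
As a rough illustration of such a preview property, here is a plain-Python sketch in which a dict stands in for the post entity. The names `MAX_RECENT`, `PREVIEW_CHARS`, `make_preview` and `add_recent_comment` are hypothetical, and the 80-character cut is an assumption standing in for "the first two lines or so".

```python
from collections import deque

MAX_RECENT = 3      # how many comment previews the post keeps (assumed)
PREVIEW_CHARS = 80  # hypothetical preview length


def make_preview(text, limit=PREVIEW_CHARS):
    """Collapse whitespace and cut the comment text down to a short snippet."""
    text = " ".join(text.split())
    if len(text) <= limit:
        return text
    return text[:limit].rstrip() + "..."


def add_recent_comment(post, comment_id, author, text):
    """Append a comment snippet to the post, keeping only the newest ones."""
    recent = deque(post.get("recent_comments", []), maxlen=MAX_RECENT)
    recent.append({
        "comment_id": comment_id,
        "author": author,
        "preview": make_preview(text),
    })
    post["recent_comments"] = list(recent)
    return post
```

With the snippets stored on the post itself, returning a list of posts needs no extra lookup of the comment entities.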
 
But with this approach we need to insert an entity of the Comment kind as well as update the comment list in the Post entity. Please correct me if I'm wrong.
 

Almost correct. Both kinds have a user as parent.

But a Post has the author of the post as its parent, and a Comment has the author of the comment as its parent:

  • KEY(User, 1, Post, 5)
  • KEY(User, 2, Comment, 3) with post property KEY(User, 1, Post, 5)
So these are two different entity groups. If user 2 writes a comment, the comment itself is written into user 2's entity group, not that of user 1. A user will normally not write comments fast enough to exceed the limit of one write per second per entity group. Neither will that happen with writing posts.
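
To make the grouping concrete: the entity group of a key is determined by the first (kind, id) pair of its path. The helper below is a hypothetical plain-Python illustration that treats keys as flat tuples; it is not part of any Datastore library.

```python
def entity_group_root(key_path):
    """Return the (kind, id) pair of the root entity for a flat key path.

    A key path is modelled as a flat tuple of alternating kinds and ids,
    for example ("User", 1, "Post", 5).
    """
    if len(key_path) < 2 or len(key_path) % 2 != 0:
        raise ValueError("key path must contain (kind, id) pairs")
    return tuple(key_path[:2])


post_key = ("User", 1, "Post", 5)
comment_key = ("User", 2, "Comment", 3)

# The post lives in user 1's entity group, the comment in user 2's.
post_root = entity_group_root(post_key)
comment_root = entity_group_root(comment_key)
```

Here `post_root` is `("User", 1)` and `comment_root` is `("User", 2)`, so writing the comment does not count against the write limit of the post's entity group.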

However, your code can't immediately update the "recent comments" list of the original post 5 of user 1, because that would create a hotspot for posts with many comments. For this reason, task queues can be used to update the post's preview only when necessary and only at a rate slower than the limit of one write per second. This use case was an example of buffering write ops into entity groups.
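
The buffering idea can be sketched without any App Engine API. In the in-memory model below, the `PreviewBuffer` class, its method names and the explicit `now` argument are assumptions made for illustration; in a real app, `flush` would correspond to a task run from a task queue.

```python
class PreviewBuffer:
    """Collects comment previews and writes them to a post in batches."""

    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval  # seconds between writes per post
        self.pending = {}     # post_id -> previews waiting to be written
        self.last_write = {}  # post_id -> time of the last write
        self.writes = []      # log of (time, post_id, previews) writes

    def add(self, post_id, preview):
        """Buffer a preview instead of writing to the post right away."""
        self.pending.setdefault(post_id, []).append(preview)

    def flush(self, now):
        """Write buffered previews, skipping posts written too recently."""
        for post_id in list(self.pending):
            last = self.last_write.get(post_id)
            if last is not None and now - last < self.min_interval:
                continue  # too soon; keep buffering for this post
            batch = self.pending.pop(post_id)
            self.writes.append((now, post_id, batch))
            self.last_write[post_id] = now
```

However many comments arrive between two flushes, the post entity receives at most one write per interval, and each write carries all previews buffered since the last one.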
 

One more thing I want to ask: if I access any ancestor entity by id and update it, will the limit of one write op per second apply?

Yes. But in Java it should be possible to do a batch write, that is, you pass multiple entities to the same write call. If they are all in the same entity group, it should count as 1 write.

In Python the function is ndb.put_multi([user1, post1_1, post1_2]), which takes a list of entities; there is certainly something similar for Java.
 
For example, a user is the parent of a post and you also have the post_id. If you update the post entity using the post_id, will the limit of one write op per second apply?

Not sure what you mean by "updating the post entity using post_id". If you get a post by its datastore key and update it, the update counts towards the rate limit of its entity group. It makes no difference to the rate whether you read the entity by its key or get it from a query before you update it.

