Saturday, November 4, 2017

[google-cloud-sql-discuss] Re: Migrate from Cloud SQL to DataStore

 
"Note that entity groups are not required if you simply plan to reference one entity from another."

I believe that sentence is meant to explain the availability of key properties, i.e. a comment entity could contain one key property storing the post key and another storing the user key. If the comment had its author's user entity as its ancestor, you wouldn't necessarily need to reference the user again, because you could just read the parent.
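To make the difference concrete, here is a minimal sketch in plain Python. It only models keys as paths of (kind, id) pairs; it is not the Datastore client API, and the helper names make_key, parent_of and entity_group are made up for illustration.

```python
# Model of a Datastore-style key: a path of (kind, id) pairs, root first.
# These helpers are hypothetical and only illustrate the idea.

def make_key(kind, id_, parent=()):
    """Build a key path; pass parent to place the entity under an ancestor."""
    return tuple(parent) + ((kind, id_),)

def parent_of(key):
    """Return the parent key, or None for a root entity."""
    return key[:-1] or None

def entity_group(key):
    """The entity group is identified by the root element of the key path."""
    return key[:1]

user_key = make_key("User", 39001)
post_key = make_key("Post", 7)

# Option A: the comment is a root entity and references post and user
# through key properties. It forms its own entity group.
comment_a = {
    "key": make_key("Comment", 1),
    "post": post_key,
    "author": user_key,
}

# Option B: the comment has its author as ancestor, so the user does not
# need to be stored again; only the post is referenced by a key property.
comment_b = {
    "key": make_key("Comment", 1, parent=user_key),
    "post": post_key,
}

author_of_b = parent_of(comment_b["key"])  # same path as user_key
```

In option A every comment is its own entity group, so comment writes do not count against the user's write limit. In option B all comments of a user share the user's entity group.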

There is a limitation in the ancestor relation: you cannot write more than one entity per second.

Yes. If you decide to go for User > Post, that means that a user cannot create/update/delete more than one post per second (including all writes to the user). On the other hand, you get fast transactional queries for all posts of the user (ancestor query).
If you go for User > Comment (with a reference to the post to which the comment belongs), a user cannot create/update/delete more than one comment per second (including all writes to the user). On the other hand, you get fast transactional queries for all comments of the user.
If you go for Post > Comment, only one comment per post can be created/updated/deleted per second (including all writes to the post), no matter which user submits the request. If you think that (transactionally) querying and reading a post together with all its comments is really valuable, you could try to work around the write limit by deferring/buffering comment-related write operations, so that writes stay below the limit of one per second.
Of course, User > Post > Comment would make for terrible scalability.
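The buffering idea could look roughly like the sketch below. It is plain Python with a hypothetical commit callback standing in for the real write, and it assumes that several buffered writes committed together count as a single write against the entity group limit. The class name GroupWriteBuffer and its parameters are made up for illustration.

```python
import time
from collections import defaultdict

class GroupWriteBuffer:
    """Buffers writes per entity group and commits at most one batch
    per group per min_interval seconds."""

    def __init__(self, commit, min_interval=1.0, clock=time.monotonic):
        self._commit = commit              # hypothetical: commit(group, batch)
        self._min_interval = min_interval
        self._clock = clock                # injectable, which helps testing
        self._pending = defaultdict(list)  # group -> buffered writes
        self._last_commit = {}             # group -> time of last commit

    def add(self, group, write):
        """Queue a write for the given entity group (e.g. a post key)."""
        self._pending[group].append(write)

    def flush_due(self):
        """Commit one batch for every group whose interval has elapsed.
        Returns the number of groups that were committed."""
        now = self._clock()
        flushed = 0
        for group in list(self._pending):
            last = self._last_commit.get(group)
            if last is not None and now - last < self._min_interval:
                continue  # too soon for this group, keep buffering
            batch = self._pending.pop(group)
            self._commit(group, batch)
            self._last_commit[group] = now
            flushed += 1
        return flushed
```

A background worker would call flush_due() periodically. The request that submitted a comment can then answer immediately (for example with 202 ACCEPTED) while the write becomes visible a moment later.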

With an existing app, maybe you can make these decisions based on your analytics. What kinds of requests / operations will be used most frequently? Which are critical for your users and your business? For example:
  1. get one post with most recent 3 comments; 100 requests; should be very fast (less than x ms)
  2. transactional create one comment to a post; 10 requests; can be 202 ACCEPTED, but visible to others within a few seconds
  3. non-transactional query of 10 most recent posts of user; 20 requests; should be fast (less than y ms)
  4. transactional query batches of 100 posts per user and delete them (user deletes account); 5 requests; administrative / background
And so on. At the next step, weigh them, estimate costs, and you will get an idea for which kind of ops / requests you want to optimize your schema.
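The weighing step can be as simple as the following sketch. The request counts are the ones from the example list; the relative costs per schema are made-up placeholders (not measurements) that you would replace with your own estimates.

```python
# Request counts from the example list above.
frequency = {
    "get_post_with_3_comments": 100,
    "create_comment": 10,
    "recent_posts_of_user": 20,
    "delete_all_posts_of_user": 5,
}

# Hypothetical relative cost of each operation under two candidate
# schemas (1 = cheap, 3 = expensive). These numbers are placeholders.
relative_cost = {
    "Post > Comment": {
        "get_post_with_3_comments": 1,
        "create_comment": 3,
        "recent_posts_of_user": 2,
        "delete_all_posts_of_user": 3,
    },
    "User > Post": {
        "get_post_with_3_comments": 2,
        "create_comment": 1,
        "recent_posts_of_user": 1,
        "delete_all_posts_of_user": 1,
    },
}

def weighted_cost(costs):
    """Sum of frequency * cost over all operations."""
    return sum(frequency[op] * costs[op] for op in frequency)

scores = {schema: weighted_cost(costs) for schema, costs in relative_cost.items()}
best_schema = min(scores, key=scores.get)
```

With real numbers you would probably also mark some operations as hard requirements (such as the latency bound on request 1) instead of folding everything into one score.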

However, I wanted to add that, independently of your schema, you should let Datastore assign the IDs for all entities to get the best performance and scalability. As you have read in the docs, you will get random long numerical IDs that are optimally distributed for scalability and performance.

Since you already have billions of records, and many of their IDs may be monotonic sequences, it might be worth applying the same approach to them after the fact. When importing your old users, I would let Datastore allocate the IDs for you and store each old ID in a separate UserMap entity that has the same new ID. E.g. an existing user with ID "u123" becomes a Datastore entity with KEY(User, 39001), and you add an entity with KEY(UserMap, 39001) with the indexed property old_id='u123'. When you do the same with the posts of user 'u123', query UserMap for that old_id and use the resulting key, KEY(UserMap, 39001), to construct and store KEY(User, 39001). First all users, then all posts, then all comments. Once you have completed your migration, you just delete UserMap etc., and you will have an optimal distribution of IDs from the start.
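As a rough illustration of that migration order, here is an in-memory sketch in plain Python. FakeDatastore is a stand-in I made up, not the real Datastore client, and the PostMap kind and the record field names (id, user_id, title) are assumptions about your old data.

```python
import random

class FakeDatastore:
    """In-memory stand-in for illustration only; not the real client."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self.entities = {}  # (kind, id) -> properties

    def allocate_id(self, kind):
        """Mimic system-allocated IDs: random, well-scattered integers."""
        while True:
            new_id = self._rng.randrange(1, 2**53)
            if (kind, new_id) not in self.entities:
                return new_id

    def put(self, kind, id_, props):
        self.entities[(kind, id_)] = dict(props)

    def query_one(self, kind, prop, value):
        """Return the ID of the first entity of kind with prop == value."""
        for (k, id_), props in self.entities.items():
            if k == kind and props.get(prop) == value:
                return id_
        return None

def import_user(ds, old_user):
    """Store the user under a new ID plus a UserMap entry with the old ID."""
    new_id = ds.allocate_id("User")
    props = {k: v for k, v in old_user.items() if k != "id"}
    ds.put("User", new_id, props)
    ds.put("UserMap", new_id, {"old_id": old_user["id"]})
    return new_id

def import_post(ds, old_post):
    """Resolve the author through UserMap, then store the post."""
    user_id = ds.query_one("UserMap", "old_id", old_post["user_id"])
    if user_id is None:
        raise LookupError("user not migrated yet: %r" % old_post["user_id"])
    new_id = ds.allocate_id("Post")
    ds.put("Post", new_id, {"title": old_post["title"], "author": ("User", user_id)})
    # PostMap (an assumed kind) lets comments find their post later on.
    ds.put("PostMap", new_id, {"old_id": old_post["id"]})
    return new_id
```

Comments would follow the same pattern, looking up both UserMap and PostMap. After the last step the map kinds can be deleted.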

You got some good feedback here and on SO. I hope I didn't add to your confusion :-)

On Saturday, 4 November 2017 17:08:30 UTC+1, Azeem Haider wrote:
But there is a problem with the ancestor relation, as I asked on Stack Overflow.

There is a limitation in the ancestor relation: you cannot write more than one entity per second.
As described in the documentation, keep your entity groups small. And they also said that

only create them when transactions are absolutely necessary

But I don't understand the meaning of this line: "Note that entity groups are not required if you simply plan to reference one entity from another."

Please check my Stack Overflow question for more info.

