Monday, May 9, 2016

[google-cloud-sql-discuss] Re: Adding a primary key in a huge CloudSQL table

Hey Austin,

This isn't a Cloud SQL -specific error but rather a MySQL error, which is mentioned in many places around the net. It ultimately derives either from the fact that in order to alter a table, a temp table is required, or simply from the fact that adding the column, as small as it is, when multiplied by the number of rows, becomes too large (although it's much more likely to be the former situation).

Since Cloud SQL instances have a maximum amount of disk storage which is provisioned automatically, and which your ALTER query is apparently hitting, it appears that you may have no choice but to attempt to shard the table (which, with so many rows and columns, ought to have been sharded to begin with if the possibility of altering the table were needed in the future) and then run primary key alteration on each table shard, or run the query without the primary key. You could also look into some more exotic possibilities such as the potential relevance of innodb_file_per_table, as recommended in this Stack Overflow Q&A.

Let me know if this works out for you,

Cheers,

Nick
Cloud Platform Community Support

On Monday, April 25, 2016 at 2:08:44 PM UTC-4, Augustin Wenger wrote:
I have a 8Go table in my CloudSQL database that (for now) doesn't have a primary key.
It is composed of 52 million rows of 20 columns each.

I would like to add one, since I will remove duplicates and doing so without primary key is too timeconsuming for MySQL

However, I am having issues to make this run, and I'm wondering if I'm actually doing it in the most efficient way.

I wanted to add this primary key using the following line :



     ALTER TABLE mytable ADD COLUMN id INT NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST

I expected this request to increase the size of my database by a few hundreds MB, and run rather quickly.

However, when I ran the request, I saw the database usage increase a lot ( + 14 GB),  and after 2 hours of running time, I saw the following error message : 

"Error 1114 : the table 'mytable' is full".

When this error occurs, the database storage usage goes back to normal, but no change has been made to mytable

My questions are :
- What is CloudSQL trying to do when I launch my request, and how come it takes him so long ?
- What could I do to make my request executed more quickly by CloudSQL ? What am I missing ?


Thanks in advance,



--
You received this message because you are subscribed to the Google Groups "Google Cloud SQL discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-cloud-sql-discuss+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/google-cloud-sql-discuss/524e01ad-f9e0-411c-915a-1857fe17b32c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

No comments:

Post a Comment