I've got a table with 10M rows, and I'm trying to ALTER TABLE
to add another column (a VARCHAR(80)
).
From a data-modelling perspective, that column should be NOT NULL
- but the amount of time it takes to run the statement is a consideration, and the client code could be changed to deal with a NULL
column if that's warranted.
Should the NULL
-ability of the column I'm trying to add significantly impact the amount of time it takes to add the column either way?
More Information
The context in which I'm doing this is a Django app, with a migration generated by South - adding three separate columns, and adding an index on one of the newly-added columns. Looking at the South-generated SQL, it spreads this operation (adding three columns and an index) over 15 ALTER TABLE
statements - which seems like it will make this operation take a whole lot longer than it should.
I've seen some references that suggest that InnoDB doesn't actually have to create a field in the on-disk file for nullable fields that are NULL
, and just modifies a bitfield in the header. Would this impact the speed of the ALTER TABLE
operation?
I don't think the nullability of the column has anything to do with the speed of ALTER TABLE
. In most alter table operations, the whole table - with all the indexes - has to be copied (temporarily) and then the alteration is done on the copy. With 10M rows, it's kind of slow. From MySQL docs:
Storage, Performance, and Concurrency Considerations
In most cases,
ALTER TABLE
makes a temporary copy of the original table. MySQL waits for other operations that are modifying the table, then proceeds. It incorporates the alteration into the copy, deletes the original table, and renames the new one. WhileALTER TABLE
is executing, the original table is readable by other sessions. Updates and writes to the table that begin after theALTER TABLE
operation begins are stalled until the new table is ready, then are automatically redirected to the new table without any failed updates. The temporary table is created in the database directory of the new table. This can differ from the database directory of the original table forALTER TABLE
operations that rename the table to a different database.
If you want to make several changes in a table's structure, it's usually better to do them in one ALTER TABLE
operation.
Allowing client code to make changes in tables is probably not the best idea - and you have hit on one good reason for not allowing that. Why do you need it? If you can't do otherwise, it would probably be better - for performance reasons - to allow your client code to be creating a table (with the new column and the PK of the existing table) instead of adding a column.