Adding columns with default values to really large tables in Postgres + Rails

We had a fairly simple task: add a couple of columns to a table in our Rails app. This is normally a straightforward operation and a boring task at best, but for us the fun was only just starting. The table in question was fairly large with lots of reads on it, and in the spirit of zero downtime, this is the adventure we had.

TL;DR

Jump straight to the solution!

The Task

  • Add two columns to the notifications table
  • Both columns have default values
  • Table has 2.2 million rows!

Attempt #1

class AddPhoneFlagsToNotifications < ActiveRecord::Migration
  def change
    add_column :notifications, :text_message, :boolean, default: false
    add_column :notifications, :call_phone, :boolean, default: false
  end
end

Problem

  • Migration takes hours!
  • The notifications table is locked
  • Entire application grinds to a halt

Reason

  • Creating a column with a default value forces Postgres to rewrite every row in the table in one shot
  • Rewriting millions of rows is slow, since Postgres has to guarantee consistency
  • To make that guarantee, the ALTER TABLE holds an exclusive lock on the entire table until the rewrite finishes
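
You can actually watch this lock from a second session. Here's a minimal sketch (our own diagnostic snippet, not part of the migration) that queries Postgres's lock catalog from a Rails console while the migration runs; on the Postgres versions available at the time, the ALTER TABLE shows up holding an AccessExclusiveLock for the whole rewrite (Postgres 11 later made non-volatile defaults a metadata-only change, but that didn't help us then):

# Run from a second Rails console while the migration is in flight.
# Lists every lock currently held or awaited on the notifications table.
rows = ActiveRecord::Base.connection.select_all(<<-SQL)
  SELECT l.mode, l.granted, a.query
  FROM pg_locks l
  JOIN pg_stat_activity a ON a.pid = l.pid
  WHERE l.relation = 'notifications'::regclass
SQL
rows.each { |row| puts row.inspect }
# Expect an AccessExclusiveLock row for the ALTER TABLE, plus a queue
# of reads and writes stuck waiting behind it (granted is false)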

Solution

  • Postgres can create nullable columns with no default extremely fast, even on a huge table, because only the catalog is touched
  • We can split the work into two steps: create the columns, then backfill the default value

Attempt #2

class AddPhoneFlagsToNotifications < ActiveRecord::Migration
  def change
    # Nullable columns with no default are a catalog-only change,
    # effectively instant even on a huge table
    add_column :notifications, :text_message, :boolean
    add_column :notifications, :call_phone, :boolean

    # SET DEFAULT only applies to future inserts, so it is also instant
    execute <<-SQL
      ALTER TABLE notifications
        ALTER COLUMN text_message SET DEFAULT false,
        ALTER COLUMN call_phone SET DEFAULT false
    SQL

    # Backfill existing rows in batches so each UPDATE stays small
    last_id = Notification.last.id
    batch_size = 10_000
    (0..last_id).step(batch_size).each do |from_id|
      to_id = from_id + batch_size - 1 # BETWEEN is inclusive on both ends
      execute <<-SQL
        UPDATE notifications
          SET
            text_message = false,
            call_phone = false
          WHERE id BETWEEN #{from_id} AND #{to_id}
      SQL
    end
  end
end

Problem

  • Migration takes hours!
  • The notifications table is still locked!
  • Entire application grinds to a halt

Reason

  • Rails wraps each migration in a single transaction by default (Postgres supports transactional DDL) so it can roll back on failure
  • That means adding the new columns AND all of the batched updates run in one gigantic transaction!
  • Postgres holds DDL locks until the transaction ends, so the ACCESS EXCLUSIVE lock taken by the ALTER TABLE lives until the final commit
  • That guarantee results in the whole table locking again!
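
To make the failure mode concrete, here's the shape of what attempt #2 actually ran once Rails added its wrapper (a sketch, not the literal SQL Rails emits):

# What attempt #2 boils down to once Rails wraps the migration:
ActiveRecord::Base.transaction do   # BEGIN
  execute "ALTER TABLE notifications ADD COLUMN text_message boolean"
  # The ALTER TABLE takes an ACCESS EXCLUSIVE lock on notifications,
  # and Postgres holds DDL locks until the transaction ends...

  # ...so every batched UPDATE here runs with the table still locked
  # (hours pass)
end                                 # COMMIT -- lock finally released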

Solution

  • You can disable this transaction wrapping by calling disable_ddl_transaction! in your migration class
  • But then you have to handle transactions on your own
  • We can run each step in its own small transaction
  • And add our own error handling to roll back our work if anything fails

Attempt #3

class AddPhoneFlagsToNotifications < ActiveRecord::Migration
  disable_ddl_transaction!

  def up
    # Step 1: create the columns and set their defaults in one small,
    # fast transaction -- its lock is released as soon as it commits
    ActiveRecord::Base.transaction do
      add_column :notifications, :text_message, :boolean, default: nil
      add_column :notifications, :call_phone, :boolean, default: nil

      execute <<-SQL
        ALTER TABLE notifications
          ALTER COLUMN text_message SET DEFAULT false,
          ALTER COLUMN call_phone SET DEFAULT false
      SQL
    end

    # Step 2: backfill in batches, each in its own short transaction,
    # so only a small slice of rows is ever locked at a time
    last_id = Notification.last.id
    batch_size = 10_000
    (0..last_id).step(batch_size).each do |from_id|
      to_id = from_id + batch_size - 1 # BETWEEN is inclusive on both ends
      ActiveRecord::Base.transaction do
        execute <<-SQL
          UPDATE notifications
            SET
              text_message = false,
              call_phone = false
            WHERE id BETWEEN #{from_id} AND #{to_id}
        SQL
      end
    end
  rescue => e
    # disable_ddl_transaction! means no automatic rollback,
    # so undo our own work before re-raising
    down
    raise e
  end

  def down
    # Guard with column_exists? in case the failure happened before
    # (or while) the columns were created
    remove_column :notifications, :text_message if column_exists?(:notifications, :text_message)
    remove_column :notifications, :call_phone if column_exists?(:notifications, :call_phone)
  end
end

Result

  • Migration still takes hours!
  • There is no table locking
  • Application is slower due to all the writes to the notifications table
  • Nothing grinds to a halt

Takeaways

  • Always be mindful of the number of rows affected by a migration
  • Be mindful of the transaction size
  • Leverage Postgres features

Possible alternate solutions

  • Handle the NULL case in code, treating it as the desired default value:
    • Clean solution with a quick turnaround, but it would have required us to muck up the model to abstract that case away. Given that we may or may not have complete control over how those values are read off the model, this could turn into lots of defensive code. (Sketched after this list.)
  • Add a view in the database to do the mapping for us:
    • Very clean solution, though it would require us to maintain both the schema and the view whenever we make schema changes to that table. We don't change this table's schema often, but the extra maintenance overhead was deemed not worth the value. (Also sketched below.)
  • Add a trigger that updates only the rows that are actively queried:
    • Also a very clean solution, but it came down to data integrity: our data eventually gets consumed by our data team, and keeping the data in a sane state was the highest priority, so a NULL state on a boolean was not desired. We could have added the trigger to handle current requests and let the migration slowly backfill the less frequently accessed rows, but since we were able to run the entire migration within a night, we decided it wasn't worth the additional hassle.
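
For the curious, the first alternate would have looked roughly like this (a sketch; the overridden readers are ours, nothing Rails generates for you):

# Sketch of alternate #1: treat NULL as the default at the model layer
# until a backfill completes. `super` calls the attribute reader that
# ActiveRecord generates for the column.
class Notification < ActiveRecord::Base
  def text_message
    value = super
    value.nil? ? false : value
  end

  def call_phone
    value = super
    value.nil? ? false : value
  end
end

And the second alternate, a view that papers over the NULLs, could be created from a migration like so (also a sketch; the view name is our invention):

class CreateNotificationsWithDefaultsView < ActiveRecord::Migration
  def up
    # COALESCE maps NULL to the desired default at read time
    execute <<-SQL
      CREATE VIEW notifications_with_defaults AS
        SELECT id,
               COALESCE(text_message, false) AS text_message,
               COALESCE(call_phone, false) AS call_phone
        FROM notifications
    SQL
  end

  def down
    execute "DROP VIEW notifications_with_defaults"
  end
end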
