Developers often need copies of production data for local development, and standard workflows for getting these dumps have a lot of downsides.
- Moving SQL dumps around means an increased risk of sensitive data being exposed.
- Most sanitization methods are a multi-step process.
- Large databases put strain on slow internet connections.
- Large databases take way too long to import.
The Solution
mtk-dump is an extremely powerful replacement for mysqldump
. Define your sanitization and minification rules in a simple yaml file, and produce small and safe SQL dumps for development. Check out some configuration examples for the cool stuff it can do.
mtk-dump \
--config=mtk.yml \
--host=mysql.svc \
--port=3306 \
--user=foo \
--password=bar \
app_environment > db.sql
You can get mtk-dump
from the releases page, or with the skpr/mtk-dump
docker image.
Examples
These examples could all live in a single mtk.yml
file and be included with the --config=mtk.yml
flag.
Sanitize account credentials
The rewrite
block allows you to set values for specific columns. These can be explicit strings or mysql expressions.
rewrite:
# Table: users
users:
# Column: email
# MySQL expression to give each account a unique email in the format 123@localhost
email: concat(id, "@localhost")
# Column: pass
# Explicit string to overwrite the hashed password.
pass: '"sanitized"'
Exclude orders older than 1 week
Using the where
block, you can exclude rows that don’t match a given mysql WHERE clause.
where:
# Table: orders
# WHERE clause to find rows with recent created timestamp.
orders: |-
created >= DATE_SUB(NOW(), INTERVAL 7 DAY)
Exclude data from specific tables
Use nodata
to keep the table structure, but no data.
nodata:
- queue
- sessions
Exclude entire tables
Use ignore
to exclude tables entirely.
ignore:
- __ACQUIA_MONITORING__
Exclude body field data for unpublished revisions in Drupal
This example uses subqueries to find the revision IDs of the currently published drupal entities, and only export body data for those revisions.
where:
# Table: node_revision__body
node_revision__body: |-
revision_id IN (SELECT vid FROM node)
# Table: paragraph_revision__body
paragraph_revision__body: |-
revision_id IN (SELECT revision_id FROM paragraphs_item)
Sanitizing SQL dumps
If you already have a SQL file (such as a production backup) and want to sanitize it, you can use a cool little script included in the skpr/mtk-dump
docker image to do this.
SQL_INPUT=db.sql
SQL_OUTPUT=sanitized.sql
MTK_DUMP_CONFIG=mtk.yml
docker run --rm \
-v $(pwd):/data \
-w /data \
-e MTK_DUMP_CONFIG skpr/mtk-dump database-sanitize ${SQL_INPUT} ${SQL_OUTPUT}
More Thoughts
Replace mysqldump
I haven’t done this myself, but as mtk-dump
takes the same connection flags, you could hypothetically overwrite the mysqldump
binary on your system. mtk configs could then be toggled with the MTK_DUMP_CONFIG
environment variable. This might be a way to get around issues like drush
hard-coding the mysqldump binary. This might also be a bad idea.
Autogenerate configs
I’m interested in autogenerating mtk configs for known data hogs, like the body field in drupal etc… Maybe a composer plugin or something like that.
Credit
Shout-out to Nick Schuch and CodeDrop who built and maintain the mtk project.