I built this last summer and thought it was fairly complete. It turns out I was naive: six months later, I have resolved ~100 issues reported by users.
Nice work, really happy to have found this today.
Incredible tool; definitely going to get a working PoC for this with my team. Curious how this can apply across many databases, say in a cluster, checking for discrepancies, etc. Is that a supported feature?
I use https://github.com/ariga/atlas for this. Migration-based and schema-based both have their good and bad sides; I prefer using both, even in one project. Schema-based can make development faster and easier, while migration-based feels more reliable.
Ariel from Atlas here. This setup is pretty common (declarative locally, versioned in real environments). Since migrations are auto-generated (by Atlas) on PRs, most developers never touch the versioned workflow directly.
See: https://atlasgo.io/concepts/declarative-vs-versioned#combini..., and https://github.com/ariga/atlas-action?tab=readme-ov-file#ari...
I wrote a very rudimentary schema and automatic migration system for SQLite. One problem I ran into recently was deploying changes that spanned two migrations, because the tool doesn’t know how to step through commits to do successive auto-migrations between schema versions. I guess there are rather obvious ways to handle this if you generate and then commit the full migration SQL for each schema change. Nonetheless, I’m curious whether this is a problem you have had to think about, whether you find it interesting or it sounds like a bad path to go down, and whether Atlas does anything smart in this department. Thanks in advance!
https://david.rothlis.net/declarative-schema-migration-for-s...
How does this compare to Entity Framework (for .NET) migrations, or sqitch/Liquibase?
I get the declarative nature of this, but migration of schemas is not purely declarative, particularly on large production databases. An ideal schema manager would understand the costs of particular migrations (perhaps by using the table stats and EXPLAIN) and apply that back to the migration strategies, so that downtime is minimized or eliminated.
Adding or removing columns or indexes can trigger major database table scans and other problems, especially when partitioning conditions change.
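To make that concrete, here is a rough sketch of the kind of pre-flight check such a tool could run before picking a strategy (Postgres-flavored SQL; the table and index names are invented):

    -- Inspect row counts and on-disk size before choosing a strategy.
    SELECT relname,
           n_live_tup,
           pg_size_pretty(pg_total_relation_size(relid)) AS total_size
    FROM pg_stat_user_tables
    WHERE relname = 'orders';

    -- See how the planner currently handles the column to be indexed.
    EXPLAIN SELECT * FROM orders WHERE created_at > now() - interval '1 day';

    -- On a big table, prefer a non-blocking build over a plain CREATE INDEX.
    CREATE INDEX CONCURRENTLY idx_orders_created_at ON orders (created_at);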
We've rolled our own variant of this, which uses an XML file as input. Just easier to parse than SQL schema definitions. The tool compares the schema defined by the XML with the DB, and applies changes as needed.
We used Sybase SQL Anywhere, and a complication there was that if you had materialized views against a table, you had to drop and recreate each view when adding or removing columns in the table. And when you recreate it, you of course have to remember to recreate the indexes on that materialized view.
Tracking this when you have multiple materialized views touching multiple tables became a bit tricky, and you don't want to do the "dumb" thing of always dropping and recreating everything, or doing it per column, since some of them might take an hour or so to recreate and reindex.
The tool has some built-in safeties: it never drops a column just because it's missing from the XML (you have to add an explicit element for that), and it only performs safe column definition changes, e.g. integer to varchar(50) is safe, integer to varchar(3) is not.
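For illustration, the general shape of that drop-and-recreate dance in Postgres-flavored SQL (SQL Anywhere's syntax differs, and these names are made up):

    -- A widening change is considered safe and applied in place:
    ALTER TABLE orders ALTER COLUMN customer_code TYPE VARCHAR(50);

    -- But a materialized view depending on the table forces the full dance:
    DROP MATERIALIZED VIEW order_totals;
    ALTER TABLE orders ADD COLUMN region VARCHAR(10);
    CREATE MATERIALIZED VIEW order_totals AS
        SELECT customer_code, SUM(amount) AS total
        FROM orders
        GROUP BY customer_code;
    -- ...and it's on you (or the tool) to remember the view's indexes:
    CREATE INDEX idx_order_totals_code ON order_totals (customer_code);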
It really made database migrations very painless for us, which is great since we have hundreds of on-premise installations. Just modify the XML and let the tool do its job.
And since it's our tool, we can add functionality when we need to.
Also means I can stop with my hobby project that was supposed to do the same. Wasn't far along and haven't worked on it in months anyway.
So I'll spend my time on another hobby project instead, one that also solves something that's already been solved 100 times over, but where I don't like the existing solutions (simple log monitoring, including systemd, that sends error emails if something is found).
Is there anything open source similar to Microsoft's database projects,
but that would work for PostgreSQL?
I like the following about it:
1. the database schema is regular code
2. schema changes are made declaratively
3. packaging (.dacpac) and deployment scripts
Most open source tools seem to be after-the-fact tools that do diffs.
MS database projects handle the code from the start in a declarative, source-controlled way.
I'm always in the market for new SQL tooling, but I'm wondering what the use case is here?
Isn't it much quicker to write a one-line migration vs copying the DDL, adapting it to the desired state, and then getting the migrations from that tool? Or am I misunderstanding something?
Be sure to look at the actual sqldef command-line tool, not the trivial copy-and-paste demo on their website. Declarative schema management is best used combined with a Git repo.
In the big picture, declarative schema management has lots of advantages around avoiding/solving schema drift, either between environments (staging vs prod) or between shards in a sharded setup (e.g. among thousands of shards, one had a master failure at an inopportune time).
It's also much more readable to have the "end state" in your repo at all times, rather than a sequence of ALTERs.
There are a bunch of other advantages; I have an old post about this topic here: https://www.skeema.io/blog/2019/01/18/declarative/
It's also quite essential when maintaining nontrivial stored procedures. Doing that with imperative migrations is a gateway to hell. https://www.skeema.io/blog/2023/10/24/stored-proc-deployment...
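To make the drift point concrete, here's a minimal sketch with sqldef-style tooling (invented schema; exact flags vary by tool and version). The same desired-state file is diffed against each environment, and anything that drifted shows up as a non-empty diff:

    -- schema.sql, kept in Git: the desired end state.
    CREATE TABLE users (
        id BIGINT PRIMARY KEY,
        email VARCHAR(255) NOT NULL
    );

    -- Hypothetical dry runs against two environments:
    --   mysqldef --dry-run staging_db < schema.sql   (empty diff: in sync)
    --   mysqldef --dry-run prod_db    < schema.sql
    -- A drifted prod surfaces as the DDL needed to converge it, roughly:
    ALTER TABLE users ADD COLUMN email VARCHAR(255) NOT NULL;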
Yes, we've used Skeema for this for many years, and it is just plain lovely. Putting your desired end state into source control is so much more intuitive and understandable than accumulating migrations. In a way it's like the difference between jQuery and React: you just say what you want it to look like in the end, and the computer does the work to figure out how to make it so.
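Spelling out that jQuery/React contrast with a made-up table: the commented-out history is what accumulates under migrations, while the declarative repo holds only the final CREATE:

    -- Migration style: the history you accumulate over time.
    --   V1: CREATE TABLE tags (id INT PRIMARY KEY);
    --   V2: ALTER TABLE tags ADD COLUMN name VARCHAR(40);
    --   V3: ALTER TABLE tags ADD COLUMN color VARCHAR(10);

    -- Declarative style: the only thing in source control.
    CREATE TABLE tags (
        id INT PRIMARY KEY,
        name VARCHAR(40),
        color VARCHAR(10)
    );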
Out of curiosity, the post you linked mentions that it won't work for renames. What's the approach for these and other types of procedural migrations, such as data transformations (e.g. splitting a column, changing a type)?
With a declarative model, would you run the migration and follow immediately with a one-off script?
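For example, splitting a column typically becomes an expand/backfill/contract sequence, with the one-off script sandwiched between two declarative steps (made-up schema, Postgres-flavored):

    -- Expand: add the new columns to the desired state; the tool emits these.
    ALTER TABLE people ADD COLUMN first_name TEXT;
    ALTER TABLE people ADD COLUMN last_name TEXT;

    -- One-off backfill script, run by hand outside the declarative tool:
    UPDATE people
    SET first_name = split_part(full_name, ' ', 1),
        last_name  = split_part(full_name, ' ', 2)
    WHERE first_name IS NULL;

    -- Contract: remove full_name from the desired state; the tool drops it.
    ALTER TABLE people DROP COLUMN full_name;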
Looks like this is only useful for empty databases, which severely limits the possible use cases.
Schema management is only a small part of the problem, and I don’t think this tool handles data migrations. E.g. if I reshape a JSONB column into something more structured, I don’t think it would be able to handle that. Or if I drop a column, the backwards migration it generates is ADD COLUMN … NOT NULL, which is obviously unusable if the table already has any data in it.
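Concretely, that kind of JSONB reshape needs a hand-written data step that no diff of two schemas can infer (hypothetical column names):

    -- A diffing tool can produce the structural half...
    ALTER TABLE events ADD COLUMN user_id BIGINT;
    ALTER TABLE events ADD COLUMN occurred_at TIMESTAMPTZ;

    -- ...but only a hand-written migration knows how to populate it:
    UPDATE events
    SET user_id     = (payload ->> 'user_id')::BIGINT,
        occurred_at = (payload ->> 'occurred_at')::TIMESTAMPTZ;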
Personally I've always called this style "declarative schema management" since the input declares the desired state, and the tool figures out how to transition the database to that state.
sqldef is really cool for supporting many database dialects. I'm the author of Skeema [1] which includes a lot of functionality that sqldef lacks, but at the cost of being 100% MySQL/MariaDB-specific. Some other DB-specific options in this space include Stripe's pg-schema-diff [2], results [3], stb-tester's migrator for sqlite [4], among many others over the years.
The more comprehensive solutions from ByteBase, Atlas, Liquibase, etc tend to support multiple databases and multiple paradigms.
And then over in the TypeScript ORM world, the migrators in Prisma and Drizzle support a "db push" declarative concept. (fwiw, I originated that paradigm; Prisma directly copied several aspects of `skeema push`, and then Drizzle copied Prisma. But ironically, if I ever complete my early-stage next-gen tool, it uses a different deployment paradigm.)
[1] https://github.com/skeema/skeema/
[2] https://github.com/stripe/pg-schema-diff
[3] https://github.com/djrobstep/results
[4] https://david.rothlis.net/declarative-schema-migration-for-s...
1. DB is source of truth, generate TS from DB
2. TS to DB direct sync, no migration files
3. TS source, Drizzle generates and applies SQL
4. TS source, Drizzle generates SQL, runtime application
5. TS source, Drizzle generates SQL, manual application
6. TS source, Drizzle outputs SQL, Atlas application
> Personally I've always called this style "declarative schema management" since the input declares the desired state, and the tool figures out how to transition the database to that state.
Personally I've called it a mistake, since there's no way a tool can infer what happened based on that information.
For schema changes, it absolutely can, for every situation except table renames or column renames.
That might sound like a major caveat, but many companies either ban renames or have a special "out-of-band" process for them anyway, once a table is being used in production. This is necessary because renames have substantial deploy-order complexity, i.e. you cannot make the schema change at the same exact instant as the corresponding application change, and the vast majority of ORMs don't provide anything to make this sane.
In any case, many thousands of companies use declarative schema management. Some of the largest companies on earth use it. It is known to work, and when engineered properly, it definitely improves development velocity.
You tell it what’s being renamed with a special comment.
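For tools that accept such rename hints, the hint usually lives as an annotation in the desired-state file; a hypothetical version (the marker syntax varies per tool) might look like:

    CREATE TABLE customers (
        id BIGINT PRIMARY KEY,
        -- @renamed from: email_address  (hypothetical annotation)
        email VARCHAR(255) NOT NULL
    );
    -- With the hint, the tool can emit a rename instead of DROP + ADD:
    ALTER TABLE customers RENAME COLUMN email_address TO email;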
Uh, any database of sufficient size is going to do migrations “out of band”, as they can take hours or days, and you never ship code requiring those changes at migration start.
For small things where you don't have a DBA or whatever, sure, use tooling like you would for auto-changes in local development.
Very large tech companies completely automate the schema change process (at least for all common operations) so that development teams can make schema changes at scale without direct DBA involvement. The more sophisticated companies handle this regardless of table size, sharding, operational events, etc. It makes a massive difference in execution speed for the entire company.
Renames aren't compatible with that automation flow though, which is what I meant by "out-of-band". They rely on careful orchestration alongside code change deploys, which gets especially nasty when you have thousands of application servers and thousands of database shards. In some DBMS, companies automate them using a careful dance of view-swapping, but that seems brittle performance-wise / operationally.
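For reference, one common shape of that view-swap dance for a table rename, in Postgres-flavored SQL with invented names:

    -- 1. Rename the table, leaving a compatibility view at the old name.
    BEGIN;
    ALTER TABLE orders RENAME TO order_headers;
    CREATE VIEW orders AS SELECT * FROM order_headers;
    COMMIT;

    -- 2. Deploy application code that references order_headers.

    -- 3. Once nothing uses the old name anymore, drop the shim.
    DROP VIEW orders;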
How is this better than Liquibase? Isn’t there a risk of missing schema evolution steps when you only diff current vs target? (Because current can be at v1 and target at v3, and you might have lost the column migration logic from v2.)
MS's SQL Server Data Tools is such an abominable garbage fire that I have no interest in these kinds of tools. Besides being a buggy mess, it's very often insufficient: you end up having to maintain migrations anyway, because you often have to inject data into new tables or columns, or rename columns, etc.
> often you have to inject data into new tables or columns
No tool can help you with that, simply because this kind of data migration depends on your particular business logic that the tool has no way of knowing about.
While SQL Server Data Tools has its warts, it has been immensely useful for us in making sure every little detail gets handled during migration. That doesn't usually mean that it can do the entire migration itself - we do the manual adjustments to the base tables that SSDT cannot do on its own, and then let it handle the rest, which in our case is mostly about indexes, views, functions and stored procedures.
After all that, SSDT can compare the resulting database with the "desired" database, and reliably flag any differences, preventing schema drift.
Anyone with real-world usage of this one vs Atlas? I've used Atlas, but it seems to be moving away from FOSS (paywalled features). It also struggled with some migrations that were a bit more than trivial (like trying to apply changes in the wrong order).
Looks pretty cool. Lately I've been using ChatGPT to generate SQLite schema migrations and it works shockingly well. I give it the original schema and new schema and it generates the statements including caveats and gotchas to watch out for.