Django: How to easily migrate data from legacy apps to Django
Reflecting on the last 12 months, two of the major projects I’ve been a part of involved migrating major internal legacy apps from PHP to Django. In both migrations, the apps were actively used every day by internal users, so the cutover had to be seamless and disruption-free. Having gone through this migration process twice, we learned which strategies worked well. In this article I will review the step-by-step approach that ended up working best for us.
Overall, the strategy is to have the legacy app live side-by-side with the new Django app for some time to ensure a seamless transition for users. We’ll accomplish this by having Django “take over” the existing underlying database, thus removing the need to migrate the data from one DB to another. Once users are happy with the new Django app, we can permanently shut down the old application.
The entire process unfolds as follows:
- Create the Django app & models from the existing database.
- Deploy the new app side-by-side with the legacy app.
- Gradually cut the user base over to the new app, resolving any bugs found along the way.
- Retire the legacy app.
The major benefit of this approach is that it allows as much time as necessary to address any issues found during cutover, while the legacy app remains available to fall back on in case of emergency.
Step 1: Create the new Django app and models
The first step is to create our brand-new Django app, though we’ll be doing it slightly differently from a new project, as we already have a database that we want to create models for (rather than the other way around).
First, start by creating the Django project as you normally would:
django-admin startproject my_new_app
Next, update settings.py and set the parameters for the existing database (ideally, a dev/test copy of the production database):
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'mydb',
        'USER': 'mydbuser',
        'PASSWORD': 'mypass',
        'HOST': 'localhost',
        'PORT': '5432',
    }
}
The next step is where we diverge from the typical setup: we’ll use a different command to have Django automatically generate the appropriate models based on what’s already in the database:
python manage.py inspectdb > models.py
While Django does an admirable job at getting the models correct, it is very important to meticulously review the models to ensure they match the configuration in the database and make tweaks where necessary. From my experience, foreign key relations are what tend to trip up Django the most often.
You’ll notice that by default, the models generated by inspectdb include managed = False in the Meta class. This plays an important role in our strategy: it tells Django’s migration commands not to create, alter, or drop the underlying tables for these models, ensuring that our shared database schema is left untouched and therefore stays backwards compatible with the legacy app.
The next step is the most important one: building the functionality using those shiny new models. As development progresses, it’s likely that tweaks and adjustments will need to be made to the models, as this is where any inconsistencies with the database tend to surface. While this implementation phase is where the bulk of the development time and effort will be spent, for the purpose of this article we’ll take a leap of faith over it and skip right to the next phase.
Step 2: Deploy the new app side-by-side with the legacy app
Once the new app functionality has been completed and rigorous testing performed, it is time to move on to the next phase of the project: putting the new app in production in front of a set of specifically chosen users to start putting it through its paces.
Until this point, all development and testing has been performed against a dev/test database with test data sets. Now, however, we deploy the new app onto a production server, where it points to the existing database shared with the legacy app. In other words, both the legacy app and the Django app are now reading from and writing to the same database tables.
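One common way to let the same code base point at the dev/test copy during development and at the shared database once deployed is to read the connection parameters from environment variables in settings.py. This is only a sketch under that assumption: the LEGACY_DB_* variable names are hypothetical, and the fallback values simply mirror the example settings from Step 1.

```python
import os

# Hypothetical environment variable names; each deployment (dev, stage, prod)
# would export its own values. The fallbacks match the local dev/test copy.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": os.environ.get("LEGACY_DB_NAME", "mydb"),
        "USER": os.environ.get("LEGACY_DB_USER", "mydbuser"),
        "PASSWORD": os.environ.get("LEGACY_DB_PASSWORD", ""),
        "HOST": os.environ.get("LEGACY_DB_HOST", "localhost"),
        "PORT": os.environ.get("LEGACY_DB_PORT", "5432"),
    }
}
```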
Step 3: Start cutting over users
By this point we’ve also selected a group of users who make up the first batch to cut over to the Django app. These users are selected based on their intimate knowledge of the legacy app and their breadth of expertise with each of its features. In other words, these are our power users. These early adopters understand their role in the project: they act as the first guard against bug escapes, incomplete workflows, unintuitive UX, etc., and their feedback will be instrumental to the success of future user groups.
During this phase, monitoring and observability are absolutely crucial for gaining visibility into uncaught exceptions, odd behaviours, potential bugs, and the like. One approach that has worked exceptionally well for us was to configure Django to funnel the exception stack traces from any HTTP 500 pages directly to our Slack channel and e-mail distribution group, so we can investigate each one as it comes in and get a fix turned around quickly.
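The exact wiring will vary, but one possible way to set this up is through Django’s LOGGING setting. Django logs 5XX responses to the django.request logger, the built-in AdminEmailHandler e-mails those records to the addresses listed in ADMINS (which, along with the e-mail backend settings, has to be configured separately), and a small custom handler can post the same records to a Slack incoming webhook. The sketch below is an illustration under those assumptions, not necessarily the setup described above: SlackWebhookHandler, the myproject.logging_handlers module path, and the SLACK_WEBHOOK_URL variable are all hypothetical names.

```python
import json
import logging
import os
import urllib.request


# Would live in its own module, e.g. myproject/logging_handlers.py (hypothetical path).
class SlackWebhookHandler(logging.Handler):
    """Post formatted log records to a Slack incoming webhook."""

    def __init__(self, webhook_url, timeout=5):
        super().__init__()
        self.webhook_url = webhook_url
        self.timeout = timeout

    def build_payload(self, record):
        # format() appends the traceback when the record carries exc_info.
        # The text is truncated to keep the Slack message a manageable size.
        return {"text": self.format(record)[:3000]}

    def emit(self, record):
        try:
            body = json.dumps(self.build_payload(record)).encode("utf-8")
            request = urllib.request.Request(
                self.webhook_url,
                data=body,
                headers={"Content-Type": "application/json"},
            )
            urllib.request.urlopen(request, timeout=self.timeout).close()
        except Exception:
            # A reporting failure should never break the request being served.
            self.handleError(record)


# Would go in settings.py.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "mail_admins": {
            "level": "ERROR",
            "class": "django.utils.log.AdminEmailHandler",
        },
        "slack": {
            "level": "ERROR",
            "class": "myproject.logging_handlers.SlackWebhookHandler",
            # Extra keys are passed to the handler constructor as keyword arguments.
            "webhook_url": os.environ.get("SLACK_WEBHOOK_URL", ""),
        },
    },
    "loggers": {
        "django.request": {
            "handlers": ["mail_admins", "slack"],
            "level": "ERROR",
            "propagate": False,
        },
    },
}
```

Keeping the payload construction in build_payload, separate from emit, makes it possible to check what would be sent without making a network call.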
Having the legacy app still around and functional plays a crucial role in acting as a “safety net” in case any blocking defects are found in the Django app preventing users from making any progress on their work. Any such user can switch back to using the legacy app, without loss of progress, until we’re ready to have them try again in the new Django app.
As the number of issues reported decreases and confidence builds, the app can be rolled out to additional groups of users. Rinse and repeat the process until all users have been cut over.
Step 4: Retire legacy app
After the entire user base has been cut over to the new app, it is time to select a date to permanently retire the old legacy app. Once that is done, we can finally have our new Django app become the sole app managing the backend DB.
With the legacy app finally gone, we no longer have to worry about backwards compatibility, and Django can now manage the backend database. To do this, simply update the app’s models.py file and either remove the managed option or set it to True in the Meta class:
from django.db import models

class MyModel(models.Model):
    my_field = models.CharField(max_length=30)
    my_other_field = models.CharField(max_length=30)

    class Meta:
        managed = True  # Alternatively, delete this line
Repeat for every model class in the migrated app. This tells Django to generate migration files for these models the next time the makemigrations command is run, and every time after that. Let’s run the command now so it can create the initial migration files for these models:
python manage.py makemigrations
Next, we need to run the migration. Because the new migration files will attempt to create tables that already exist (we are already using them!), we need to tell Django to fake the initial migration: with --fake-initial, Django skips an app’s initial migration when tables with the expected names are already present and simply records it as applied. This step needs to be run for each instance of the app that is actively running (dev, stage, prod, etc.):
python manage.py migrate --fake-initial
Our Django instance now completely owns and manages the back-end database! We are now free to change our models and create new migrations as we normally would.
This wraps up my process for cutting legacy apps over to Django. All in all, this process usually takes a few months, but it ensures a pleasant experience for both the developer and the user, and leaves in place a safety net to fall back on in case of an emergency with the new app. Crucially, this process avoids a costly data migration from one database to another, which not only saves the developer a ton of work, but also allows for a controlled cutover of the user base from one app to the next.
Do you have your own strategies for cutting over Django apps different from mine? I would love to hear about them! Sound off in the comments.