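Adventures with Multiple Databases in Django (Part 1)

Preface

Django is without question one of the best Python web frameworks, yet its multi-database support leaves me with mixed feelings. Migrations, cross-database foreign keys, unit tests: there are pitfalls everywhere. Hence this article, a record of my adventures with multiple databases in Django.

Note: this article uses Python 3.6 and Django 2.2.

Setting Out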

  1. Create the Django project multi_db and two apps, app_1 and app_2:

    $ django-admin startproject multi_db
    $ cd multi_db/
    $ python manage.py startapp app_1
    $ python manage.py startapp app_2
  2. Add a model to each of app_1 and app_2:

    # app_1/models.py
    from django.db import models


    class Model1(models.Model):
        name = models.CharField(max_length=255)

        class Meta:
            app_label = "app_1"

    # app_2/models.py
    from django.db import models


    class Model2(models.Model):
        name = models.CharField(max_length=255)

        class Meta:
            app_label = "app_2"
  3. Assign a different database to each app and write a router for them (some content omitted):

    # multi_db/settings.py
    # ...
    INSTALLED_APPS = [
        # ...
        'app_1',
        'app_2',
    ]
    # ...
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.sqlite3',
            'NAME': os.path.join(BASE_DIR, 'db.sqlite3'),
        },
        'db_1': {
            'ENGINE': 'django.db.backends.mysql', 'NAME': 'db_1',
            'HOST': '127.0.0.1', 'PORT': '6603', 'USER': 'root',
            'PASSWORD': 'password', 'CHARSET': 'utf-8', 'TEST': {'DEPENDENCIES': []},
        },
        'db_2': {
            'ENGINE': 'django.db.backends.mysql', 'NAME': 'db_2',
            'HOST': '127.0.0.1', 'PORT': '6603', 'USER': 'root',
            'PASSWORD': 'password', 'CHARSET': 'utf-8', 'TEST': {'DEPENDENCIES': []},
        },
    }
    # Custom (non-Django) setting: maps an app label to a database alias
    DB_ROUTING = {'app_1': 'db_1', 'app_2': 'db_2'}
    DATABASE_ROUTERS = ['multi_db.db_routers.DBRouter']
    # ...

    # multi_db/db_routers.py
    from django.conf import settings


    class DBRouter:
        def db_for_read(self, model, **hints):
            # Returns None for apps missing from DB_ROUTING,
            # which lets Django fall back to the default database
            return settings.DB_ROUTING.get(model._meta.app_label)

        def db_for_write(self, model, **hints):
            return settings.DB_ROUTING.get(model._meta.app_label)

        def allow_relation(self, obj1, obj2, **hints):
            return True

        def allow_migrate(self, db, app_label, model_name=None, **hints):
            return True
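To see what the router returns for a given app, the lookup can be tried outside of Django. The following is a minimal sketch under assumptions: the module-level DB_ROUTING stands in for settings.DB_ROUTING, and fake_model is a hypothetical helper that mimics only the `_meta.app_label` attribute of a real model class.

```python
from types import SimpleNamespace

# Stand-in for settings.DB_ROUTING from the article (assumption: same contents)
DB_ROUTING = {"app_1": "db_1", "app_2": "db_2"}


class DBRouter:
    """Same read/write lookups as the article's router, without the Django import."""

    def db_for_read(self, model, **hints):
        return DB_ROUTING.get(model._meta.app_label)

    def db_for_write(self, model, **hints):
        return DB_ROUTING.get(model._meta.app_label)


def fake_model(app_label):
    # Hypothetical helper: provides only the attribute the router reads
    return SimpleNamespace(_meta=SimpleNamespace(app_label=app_label))


router = DBRouter()
print(router.db_for_read(fake_model("app_1")))   # db_1
print(router.db_for_write(fake_model("app_2")))  # db_2
print(router.db_for_read(fake_model("auth")))    # None
```

A return value of None means the router has no opinion, so models of apps such as auth end up on the default database.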

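Level 1: Database Migrations

Database migration in Django has two parts, makemigrations and migrate. The former generates migration files whenever a model changes; the latter applies those migration files to the database. And this is where the trouble starts.

makemigrations

The makemigrations command does not modify the database, so it is fairly simple to use. You can name specific apps to check only their models, or name none to check all models.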

$ python manage.py makemigrations
Migrations for 'app_1':
  app_1/migrations/0001_initial.py
    - Create model Model1
Migrations for 'app_2':
  app_2/migrations/0001_initial.py
    - Create model Model2

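OK, the migration files for both apps have been generated. Django's built-in apps (django.contrib.auth, for example) already ship with their own migrations, so no migration files are generated for them here.

migrate

Like makemigrations, migrate can take an app. Here we name no app and apply all migrations directly.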

$ python manage.py migrate
Operations to perform:
  Apply all migrations: admin, app_1, app_2, auth, contenttypes, sessions
Running migrations:
  Applying contenttypes.0001_initial... OK
  Applying auth.0001_initial... OK
  Applying admin.0001_initial... OK
  # ...
  Applying app_1.0001_initial... OK
  Applying app_2.0001_initial... OK
  # ...
  Applying sessions.0001_initial... OK

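Great: every migration, for built-in and custom apps alike, has been applied. Given the configuration above, the default db.sqlite3 should now hold the built-in apps' models, db_1 should hold app_1's model, and db_2 should hold app_2's model. Right?

Reality is cruel. The source of the migrate command (migrate.py#L81) shows that when applying migrations, migrate pays no attention to db_for_read or db_for_write in DATABASE_ROUTERS. It operates only on the database given by --database (the default database if the option is omitted). In other words, every model here was dumped into the default db.sqlite3: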

$ sqlite3 db.sqlite3 .tables
app_1_model1                auth_user_groups
app_2_model2                auth_user_user_permissions
auth_group                  django_admin_log
auth_group_permissions      django_content_type
auth_permission             django_migrations
auth_user                   django_session

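Clearly this is not what I want. How to fix it? There are two remedies.

Remedy 1

The first remedy is to migrate the built-in apps before the custom apps' migration files exist, and only then migrate the custom apps. This requires deleting the migration files already generated. Admittedly, this remedy is a bit of a cheat.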

$ rm db.sqlite3
$ rm app_?/migrations/0001_initial.py
$ # Migrate the built-in apps
$ python manage.py migrate
Operations to perform:
  Apply all migrations: admin, auth, contenttypes, sessions
Running migrations:
  Applying contenttypes.0001_initial... OK
  # ...
$ # Regenerate the migration files for the custom apps' models
$ python manage.py makemigrations
Migrations for 'app_1':
  app_1/migrations/0001_initial.py
    - Create model Model1
Migrations for 'app_2':
  app_2/migrations/0001_initial.py
    - Create model Model2
$ # Specify the database when migrating
$ python manage.py migrate app_1 --database db_1
Operations to perform:
  Apply all migrations: app_1
Running migrations:
  Applying app_1.0001_initial... OK
$ python manage.py migrate app_2 --database db_2
Operations to perform:
  Apply all migrations: app_2
Running migrations:
  Applying app_2.0001_initial... OK

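Remedy 2

The second remedy leaves the custom apps' existing migration files alone and migrates the built-in apps one by one. The migrate command accepts only one app at a time, but fortunately the number of built-in apps needed is usually small.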

$ rm db.sqlite3
$ python manage.py migrate admin
Operations to perform:
  Apply all migrations: admin
Running migrations:
  Applying contenttypes.0001_initial... OK
  # ...
$ python manage.py migrate auth
# ...
$ python manage.py migrate contenttypes
# ...
$ python manage.py migrate sessions
# ...
$ # With the built-in apps migrated one by one, migrate the custom apps
$ python manage.py migrate app_1 --database db_1
# ...
$ python manage.py migrate app_2 --database db_2
# ...
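Both remedies depend on passing the right --database every time. A complementary safeguard, which the article itself does not use, is to have the router's allow_migrate() return True only when an app is being migrated into its own database, so that a stray migrate does not create app_1's tables in db.sqlite3. The sketch below is an assumption about how that might look: DB_ROUTING stands in for settings.DB_ROUTING, DEFAULT_DB_ALIAS mirrors django.db.DEFAULT_DB_ALIAS, and the class name StrictDBRouter is made up for illustration.

```python
# Same value as django.db.DEFAULT_DB_ALIAS; redefined so the sketch runs without Django
DEFAULT_DB_ALIAS = "default"

# Stand-in for settings.DB_ROUTING from the article
DB_ROUTING = {"app_1": "db_1", "app_2": "db_2"}


class StrictDBRouter:
    """Hypothetical router: an app may only be migrated into its own database."""

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Apps missing from DB_ROUTING belong to the default database
        return db == DB_ROUTING.get(app_label, DEFAULT_DB_ALIAS)


router = StrictDBRouter()
print(router.allow_migrate("default", "app_1"))  # False
print(router.allow_migrate("db_1", "app_1"))     # True
print(router.allow_migrate("default", "auth"))   # True
print(router.allow_migrate("db_1", "auth"))      # False
```

As far as I understand Django's behavior, such a router only turns the schema operations into no-ops on the wrong database; the migration is still recorded as applied in whichever database migrate targets, so the per-app --database runs remain necessary.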

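A Custom migrate

The two remedies above are stopgaps at best, and tedious to use. Since it is only a matter of running migrate a few more times, why not let Python do it? Create my_migrate.py as follows: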

#!/usr/bin/env python
# my_migrate.py
import os

import django
from django.apps import apps, AppConfig
from django.conf import settings
from django.core.management import execute_from_command_line, CommandError
from django.db import DEFAULT_DB_ALIAS

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'multi_db.settings')
django.setup()

for app_config in list(apps.get_app_configs()):  # type: AppConfig
    app_label = app_config.label  # type: str
    # Look up the app's database in DB_ROUTING, falling back to the default one
    db = settings.DB_ROUTING.get(app_label, DEFAULT_DB_ALIAS)
    argv = ["manage.py", "migrate", "--traceback", "--database", db, app_label]
    print("python " + " ".join(argv))
    try:
        execute_from_command_line(argv)
    except CommandError as e:
        # Raised, for example, for apps without migrations; --traceback makes
        # Django re-raise the error instead of exiting, so the loop continues
        print(e)
    finally:
        print()
print("migrate complete")

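A familiar smell... Indeed, it looks like a hacked version of manage.py. Before migrating an app, the code looks up that app's database in the DB_ROUTING dict in the settings, then migrates the app against that database. Here is what a run looks like: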

$ python my_migrate.py
python manage.py migrate --traceback --database default admin
Operations to perform:
  Apply all migrations: admin
Running migrations:
  Applying contenttypes.0001_initial... OK
# ... (omitted)

python manage.py migrate --traceback --database db_1 app_1
# ... (omitted)
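The per-app database lookup at the heart of my_migrate.py can be isolated as a pure function and checked without a Django project. This is only a sketch: build_migrate_argv is a hypothetical helper name, and the routing dict is passed in explicitly instead of being read from settings.DB_ROUTING.

```python
# Same value as django.db.DEFAULT_DB_ALIAS; redefined so the sketch runs without Django
DEFAULT_DB_ALIAS = "default"


def build_migrate_argv(app_label, routing):
    """Build the argv that my_migrate.py would pass to execute_from_command_line."""
    db = routing.get(app_label, DEFAULT_DB_ALIAS)
    return ["manage.py", "migrate", "--traceback", "--database", db, app_label]


routing = {"app_1": "db_1", "app_2": "db_2"}
for label in ["admin", "app_1", "app_2"]:
    print("python " + " ".join(build_migrate_argv(label, routing)))
# python manage.py migrate --traceback --database default admin
# python manage.py migrate --traceback --database db_1 app_1
# python manage.py migrate --traceback --database db_2 app_2
```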

Mission Accomplished

And so the database migration for this Django project is finally done. That was not easy... Let's verify:

$ sqlite3 db.sqlite3 .tables
auth_group                  auth_user_user_permissions
auth_group_permissions      django_admin_log
auth_permission             django_content_type
auth_user                   django_migrations
auth_user_groups            django_session

mariadb root@127.0.0.1:db_1> show tables
+-------------------+
| Tables_in_db_1    |
+-------------------+
| app_1_model1      |
| django_migrations |
+-------------------+

mariadb root@127.0.0.1:db_2> show tables
+-------------------+
| Tables_in_db_2    |
+-------------------+
| app_2_model2      |
| django_migrations |
+-------------------+

Level 2: Running

With the database migration (finally) done, it is time to run the Django project:

$ python manage.py runserver
Watching for file changes with StatReloader
Performing system checks...
System check identified no issues (0 silenced).
You have 2 unapplied migration(s). Your project may not work properly until you apply the migrations for app(s): app_1, app_2.
Run 'python manage.py migrate' to apply them.
April 23, 2020 - xx:xx:xx

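Hold on: I finished all the migrations in the previous step, so why does Django still report app_1 and app_2 as unmigrated? It does not stop the server from running, but the bright red warning is annoying.

The source of apply_migration() (executor.py#L246) shows that during migrate, each time a migration is applied Django writes a record into the django_migrations table of the same database to track migration progress. In fact, the django_migrations tables in db_1 and db_2 do contain the corresponding records.

But the source of check_migrations() (base.py#L453) shows that when runserver runs, Django only consults the django_migrations table of the default database, so the progress recorded in db_1 and db_2 is ignored... So how can runserver be made to check the migration progress of non-default databases?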
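The mismatch can be reproduced with a toy model that uses three in-memory SQLite databases. This is not Django's code, only a sketch under assumptions: the django_migrations table is reduced to two columns, and the helper names make_db and applied_migrations are made up.

```python
import sqlite3


def make_db(applied):
    """Create an in-memory database holding a simplified django_migrations table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE django_migrations (app TEXT, name TEXT)")
    conn.executemany("INSERT INTO django_migrations (app, name) VALUES (?, ?)", applied)
    return conn


def applied_migrations(conn):
    """Return the (app, name) pairs recorded in one database."""
    return set(conn.execute("SELECT app, name FROM django_migrations"))


# Each database records only the migrations that were applied to it
databases = {
    "default": make_db([("auth", "0001_initial")]),
    "db_1": make_db([("app_1", "0001_initial")]),
    "db_2": make_db([("app_2", "0001_initial")]),
}
expected = {("auth", "0001_initial"), ("app_1", "0001_initial"), ("app_2", "0001_initial")}

# Looking only at the default database, app_1 and app_2 appear unapplied
missing_default_only = expected - applied_migrations(databases["default"])
# Looking at every database, nothing is missing
recorded_anywhere = set().union(*(applied_migrations(c) for c in databases.values()))
missing_all_dbs = expected - recorded_anywhere

print(sorted(missing_default_only))  # [('app_1', '0001_initial'), ('app_2', '0001_initial')]
print(sorted(missing_all_dbs))       # []
```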

A Custom check_migrations

In short, change the behavior of runserver's check_migrations(). First, the original check_migrations() code:

def check_migrations(self):
    from django.db.migrations.executor import MigrationExecutor
    try:
        executor = MigrationExecutor(connections[DEFAULT_DB_ALIAS])
    except ImproperlyConfigured:
        # No databases are configured (or the dummy one)
        return

    plan = executor.migration_plan(executor.loader.graph.leaf_nodes())
    if plan:
        # Message listing the unmigrated apps, omitted
        self.stdout.write(self.style.NOTICE("Run 'python manage.py migrate' to apply them.\n"))

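The overall logic is not complicated; the key is the executor object's migration_plan() method. What I need to do is bring in executors for db_1 and db_2 so that each database checks only its own migration progress. Here is my own implementation of check_migrations():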

# multi_db/my_runserver.py
from collections import defaultdict

from django.conf import settings
from django.core.management.commands import runserver
from django.db import connections, DEFAULT_DB_ALIAS
from django.db.migrations.executor import MigrationExecutor


class MyCommand(runserver.Command):
    def check_migrations(self):
        # Create one executor per database connection and keep them in a dict
        executors = {}
        for alias in connections:
            executors[alias] = MigrationExecutor(connections[alias])
        default_executor = executors[DEFAULT_DB_ALIAS]
        # Get the migration targets (nodes): a list of tuples whose
        # first element is the app label
        nodes = default_executor.loader.graph.leaf_nodes()
        # Group the nodes by the database their app is routed to
        node_map = defaultdict(list)
        for node in nodes:  # type: tuple
            alias = settings.DB_ROUTING.get(node[0], DEFAULT_DB_ALIAS)
            node_map[alias].append(node)
        # Let each executor check whether its own migrations have been applied
        plan = []
        for alias, executor in executors.items():
            plan.extend(executor.migration_plan(node_map[alias]))

        if plan:
            # Message listing the unmigrated apps, same as the original, omitted
            self.stdout.write(self.style.NOTICE("Run 'python manage.py migrate' to apply them.\n"))
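The grouping step in the middle of check_migrations() can be tried on its own. In this sketch the function name group_nodes_by_alias is hypothetical, the routing dict is passed in rather than read from settings.DB_ROUTING, and the node list is hand-written sample data instead of the real result of leaf_nodes().

```python
from collections import defaultdict

# Same value as django.db.DEFAULT_DB_ALIAS; redefined so the sketch runs without Django
DEFAULT_DB_ALIAS = "default"


def group_nodes_by_alias(nodes, routing):
    """Group (app_label, migration_name) nodes by the database alias of their app."""
    node_map = defaultdict(list)
    for node in nodes:
        alias = routing.get(node[0], DEFAULT_DB_ALIAS)
        node_map[alias].append(node)
    return dict(node_map)


routing = {"app_1": "db_1", "app_2": "db_2"}
# Sample nodes only; in Django these would be the latest migration of each app
nodes = [
    ("auth", "0001_initial"),
    ("app_1", "0001_initial"),
    ("app_2", "0001_initial"),
    ("sessions", "0001_initial"),
]
grouped = group_nodes_by_alias(nodes, routing)
print(grouped["default"])  # [('auth', '0001_initial'), ('sessions', '0001_initial')]
print(grouped["db_1"])     # [('app_1', '0001_initial')]
print(grouped["db_2"])     # [('app_2', '0001_initial')]
```

Each executor is then asked about its own group only, which is why the plan comes back empty once every database has applied the migrations routed to it.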

The next step is to make manage.py use my check_migrations() rather than Django's built-in one. Modify the top of manage.py:

#!/usr/bin/env python
"""Django's command-line utility for administrative tasks."""
import os
import sys

from multi_db.my_runserver import MyCommand
from django.core.management.commands import runserver

runserver.Command = MyCommand
# ... (rest omitted)

Mission Accomplished

Run it and see:

$ python manage.py runserver
Watching for file changes with StatReloader
Performing system checks...

System check identified no issues (0 silenced).
April 23, 2020 - xx:xx:xx

At last, no more warnings about unapplied migrations... I'm in tears. How about you?


Adventures with Multiple Databases in Django (Part 1)
https://www.yooo.ltd/2020/04/23/Django多数据库历险记(一)/
Author: OrangeWolf
Published: April 23, 2020