Recently we upgraded from Python 2.7
all the way to Python 3.5.

What’s the difference between Python 2 and Python 3?

In a programmer’s utopia,
a programming language would be nothing more than a useful tool.
In real life,
languages, and even versions of a language,
are tied to many aspects of an engineering project.
Python 2 and Python 3 are like Ofo and Mobike:
essentially the same thing,
with a few big feature differences
and a number of subtle ones.

Take Python 2’s most notorious pitfall, Unicode.
Anyone who has written Python 2 has run into UnicodeDecodeError and UnicodeEncodeError.
In phone interviews,
when candidates asked why we used Python 2,
we would pin the blame on boss Xie:

Well, here’s the thing:
everything from our first line of code to the initial framework was the CEO’s choice.
He had just come back from America,
so he didn’t know that China uses Chinese,
and he had never hit a Unicode-related issue.
So without thinking too hard,
he chose Python 2.7.

The real reason is largely that, back then, some libraries had better support for Python 2.

Besides Unicode,
some standard-library modules and functions moved or changed between Python 2 and Python 3.
For example, urllib.urlencode became urllib.parse.urlencode,
and StringIO.StringIO became io.StringIO (for text) or io.BytesIO (for bytes).
In a typical project you can use the six library for compatibility:
the two examples above become six.moves.urllib.parse.urlencode
and six.BytesIO.
Django also bundles a copy of six as django.utils.six.
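For a project that cannot depend on six, the same idea can be sketched with a plain try/except import. This only illustrates what a compatibility shim does; it is not how six itself is implemented, and the helper name build_query is hypothetical.

```python
# Minimal compatibility shim: try the Python 3 location first,
# then fall back to the Python 2 one.
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

# io.BytesIO lives in the same place on Python 2.6+ and Python 3,
# so it needs no shim at all.
from io import BytesIO


def build_query(params):
    """Hypothetical helper: encode (key, value) pairs as a query string."""
    return urlencode(params)


query = build_query([("q", "python 3"), ("page", 2)])
buffer = BytesIO(query.encode("ascii"))
print(buffer.read())
```

The rest of the code imports urlencode from this one module, so the version check lives in a single place.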
But we control our own runtime environment,
so we don’t really need to support both versions at once.
The rough milestones of the migration were as follows:

Some Tiny Work

  1. Ensure your own code’s compatibility.
  2. Ensure third-party libraries’ compatibility.
  3. Ensure unit tests pass.
  4. Ensure test environment and integration tests pass.
  5. Production environment switches to Python 3!
  6. Announce the glorious result!

Here are the specific things we did:

Boss Liu had wanted to replace boss Xie’s Python 2 from very early on,
so by around the end of 2016
we were all mentally prepared for it.
Since we are all PyCharm users
(JetBrains’s very handy IDE),
we all turned on the Python compatibility inspection.
When writing code, we took care to avoid xrange(), dict.iteritems() and the print statement,
and used range(), dict.items() and print() instead.
For Unicode problems,
we used __future__ imports,
making sure every file started with:

# -*- coding: utf-8 -*-
from __future__ import absolute_import, unicode_literals

# The first line declares the source file's encoding
# On the second line, absolute_import makes imports absolute by default
#   See PEP-0328: https://www.python.org/dev/peps/pep-0328
# Also on the second line, unicode_literals makes string literals unicode by default
#   See PEP-3112: https://www.python.org/dev/peps/pep-3112
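A small sketch of the habits described above, written so it behaves the same on both versions. On Python 2 it assumes the two-line header shown above sits at the very top of the file; on Python 3 this is simply the default behaviour. The helper to_text is hypothetical.

```python
# -*- coding: utf-8 -*-
# On Python 2, add the __future__ header shown above as the first statement.


def to_text(value, encoding="utf-8"):
    """Hypothetical helper: decode bytes at the boundary, pass text through."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


greeting = "你好"               # text: two characters, not six bytes
raw = greeting.encode("utf-8")  # explicit bytes, e.g. for network or file I/O

# range() and dict.items() exist on both versions
# (they build lists on Python 2 and are lazy on Python 3).
counts = {"a": 1, "b": 2}
total = 0
for _ in range(3):
    for key, value in counts.items():
        total += value

print(to_text(raw) == greeting, len(greeting), total)
```

Keeping text and bytes apart like this, and converting only at I/O boundaries, is what makes most UnicodeDecodeError and UnicodeEncodeError surprises go away.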

There is also the rather naive 2to3 tool:
even in the best case it can only rewrite your own code and its standard-library usage, as mentioned above.
Swapping specific libraries and verifying correctness still have to be done by hand.

Next we went through the third-party libraries we used (requirements.txt).
Our biggest dependency, Django, is compatible with both 2 and 3,
and libraries such as celery/redis/requests are fine as well.
boto, the AWS Python SDK we had been using, did not fully support Python 3,
but that was easy: boto3 covers the same functionality.

The trickier one was the WeChat library we were using, python-wechat-sdk.
The main reason is that we used a lot of WeChat features:
message replies, user management, templates/articles, various assets, etc.
There was already a fair amount of code,
and the old code was written rather badly
and urgently needed refactoring.
So we spent a few months refactoring the code and swapping libraries,
ultimately moving to the better-designed and easier-to-use wechatpy.

Getting the code-side prep work done
is easy to say in one sentence,
but in real development
it is impossible to set aside a stretch of time exclusively for an architecture upgrade.
We squeezed it all into the gaps between business requirements,
and it took roughly half a year.

After the code-side prep work was done,
we ran the unit tests (UT) with Python 3.
Comprehensive automated tests are your daily safety net and your peace of mind at critical moments.
In theory, moving from Python 2 to Python 3
should not change any externally visible behavior,
so the unit tests should not fail either.
Once the UT failures were fixed,
we were confident enough to run on Python 3.
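As an illustration of the kind of difference such a test run surfaces, here is a minimal sketch. The function average is hypothetical and not from our codebase. The point is that / between integers floors on Python 2 but returns a float on Python 3, so a test that pins the expected value catches the change.

```python
import unittest


def average(values):
    """Hypothetical function under test."""
    # float() forces true division on Python 2 as well, so the result
    # is the same on both interpreters.
    return float(sum(values)) / len(values)


class AverageTest(unittest.TestCase):
    def test_integer_inputs_use_true_division(self):
        self.assertEqual(average([1, 2]), 1.5)

    def test_single_value(self):
        self.assertEqual(average([4]), 4.0)


# Run the suite programmatically so the script keeps going afterwards.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(AverageTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Running the same suite under both interpreters and comparing the results is a cheap way to find behaviour that silently changed.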

There was one small wrinkle here:
our servers were running Ubuntu 14.04 (Trusty),
whose default Python 3 is Python 3.4.
The latest Ubuntu LTS release is Ubuntu 16.04 (Xenial),
and its default Python 3 is Python 3.5.
Boss Liu had been eyeing Ubuntu 16.04 for a long time,
so along with the switch to Python 3
we also upgraded the servers to Ubuntu 16.04.
The “can’t print Chinese” feature was retired along the way.

Then came switching the test environment to Python 3,
followed by the production environment.
There was a small episode here:
when switching production to Python 3,
we originally wanted to upgrade only some of the servers first (a canary release),
but the tasks that Python 2 producers put on Celery
could be received by Python 3 consumers
yet failed 100% of the time…
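The exact failure isn’t detailed above. Purely as an illustration of how a payload can be delivered yet be unusable, and assuming (this is an assumption, not a diagnosis) that tasks are serialized with pickle, the sketch below loads a byte string in the form Python 2’s pickle module emits for a non-ASCII str. The names PY2_PICKLE, load_default and load_as_bytes are hypothetical.

```python
import pickle

# Protocol-0 pickle of the Python 2 byte string '\xe4\xb8\xad'
# (the UTF-8 bytes of "中"), hard-coded so the sketch runs on Python 3.
PY2_PICKLE = b"S'\\xe4\\xb8\\xad'\np0\n."


def load_default(payload):
    """Load with Python 3 defaults; return the error name if decoding fails."""
    try:
        return pickle.loads(payload)
    except UnicodeDecodeError as exc:
        # Python 3 decodes Python 2 str objects as ASCII by default.
        return type(exc).__name__


def load_as_bytes(payload):
    """Keep Python 2 str objects as bytes and decode them explicitly later."""
    return pickle.loads(payload, encoding="bytes")


print(load_default(PY2_PICKLE))
print(load_as_bytes(PY2_PICKLE).decode("utf-8"))
```

A version-neutral serializer such as JSON sidesteps this class of problem when both interpreter versions have to share a queue.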

So we decided: forget the canary, no canary release!
We picked an evening, ordered milk tea and crayfish,
and switched everything to Python 3!
Thanks to the preparation done in advance,
the switch itself took only about ten minutes~
Our backs no longer ached,
our necks no longer hurt,
and even the Meican lunches seemed to taste better.
Now our company’s backend runs on Python 3~

Finally, everyone’s dev environment switched to Python 3 as well,
along with small follow-up jobs such as turning the docstrings we wrote before into type hints.

# In Python 2 you'd write it like this
def old_hint(messages, data=()):
    """
    :type messages: list
    :type data: list[dict]
    :rtype: str
    """
    pass


# In Python 3 you can write it like this
from typing import List

def new_hint(messages: list, data: List[dict] = ()) -> str:
    pass

Summary

The benefits of moving from Python 2 to Python 3 vary from person to person
and from situation to situation.
Having pulled off this tiny piece of work,
our main takeaways are:

  • Goals must be clear.
    We put Python 3 on our Scrum board very early, for example.
    So besides the technical team knowing what we wanted to do,
    the product side also deliberately left room in the schedule for our architecture upgrade.
    With a clear goal, everyone moves in step.

  • You need a driver.
    Boss Liu played the role of driving the whole thing,
    handing out the chores in an orderly way: who refactors which module, who swaps which library.
    He also took on the troublesome, thankless tasks nobody wanted,
    like fixing UT compatibility…

  • Mutual trust matters.
    A backend architecture upgrade
    really does affect everything:
    the Celery pitfall mentioned above, for example, forced us into an urgent rollback,
    or think of it like the routine weekly maintenance downtime at a gaming company.
    The fallout at the time
    landed on our frontline teammates,
    who cleaned up after us without much complaint…
    ORZ, thank you!

In any case, Python 3 is where things are heading:
new versions of Django will not support Python 2,
and neither will new versions of Celery.
We can’t fall behind.

As teacher Chen Hao often says:

Technical debt can never be paid off, but we have to keep paying it!

Teacher Chen Hao: No, I never said that, you made it up yourself.