前言

这是这个博客里第一篇带有源码Tag的文章。在过去的一个月，笔者深刻地认识到自己在技术深度方面有很大的不足，所以在未来的学习道路上，笔者会尽量多地阅读开源软件的源代码，并将自己的理解以文章的形式表达出来。如有错漏，请各位读者不吝指教。

注：本篇文章所用环境为CPython3.6

functools.wraps

wraps这个内置函数装饰器在笔者的上一篇文章里有过介绍，功能很单纯（拷贝__name__、__doc__等属性），源码也很短（不到50行）。

使用示例

def deco(func: Callable) -> Callable:
    def inner(*args, **kwargs):
        """inner doc"""
        print(f'inner({args}, {kwargs})')

    return inner

@deco
def target(name: str):
    """target doc"""
    print(f'target(name={name})')

在不用wraps装饰器的情况下，target的__name__和__doc__属性都来自deco()里的inner：

1
2
3

print('__doc__  :', target.__doc__)   # __doc__  : inner doc
print('__name__ :', target.__name__)  # __name__ : inner
target(name='bob')                    # inner((), {'name': 'bob'})

这一点也不奇怪，因为此时target是deco()的返回值，其__name__等属性自然也不会是来自被装饰的原始target。但从调用者的角度来说，这是很让人费解的。所以Python内置了wraps装饰器，可以拷贝被装饰函数的部分属性，让装饰后的函数看起来和装饰前一样：

def deco(func: Callable) -> Callable:
    @functools.wraps(func)
    def inner(*args, **kwargs):
        """inner doc"""
        print(f'inner({args}, {kwargs})')

    return inner

1
2
3

print('__doc__  :', target.__doc__)   # __doc__  : target doc
print('__name__ :', target.__name__)  # __name__ : target
target(name='bob')                    # inner((), {'name': 'bob'})

源码

wraps装饰器的入口函数在functools.py的第74行，函数很短小：

def wraps(wrapped,
          assigned = WRAPPER_ASSIGNMENTS,
          updated = WRAPPER_UPDATES):
    """ ... 省略
    """
    return partial(update_wrapper, wrapped=wrapped,
                   assigned=assigned, updated=updated)

上一篇文章我介绍过，带参数的函数装饰器会返回一个函数，用返回的函数来对目标函数进行装饰。于是，上面提到的deco()函数就可以转换为如下形式：

from functools import partial, update_wrapper, WRAPPER_ASSIGNMENTS, WRAPPER_UPDATES

def deco(func: Callable) -> Callable:
    # @functools.wraps(func)
    def inner(*args, **kwargs):
        """inner doc"""
        print(f'inner({args}, {kwargs})')
	
    # 关键的两行
    tmp = partial(update_wrapper, wrapped=func,
                  assigned=WRAPPER_ASSIGNMENTS, updated=WRAPPER_UPDATES)
    inner = tmp(inner)

    return inner

由于functools.partial的功能只是固定参数，所以以上提到的关键的两行可以转换为如下形式：

1 2	`inner = update_wrapper(inner, wrapped=func, assigned=WRAPPER_ASSIGNMENTS, updated=WRAPPER_UPDATES)`

再让我们把目光聚集在update_wrapper()函数的源码上（functools.py第44行）：

WRAPPER_ASSIGNMENTS = ('__module__', '__name__', '__qualname__', '__doc__',
                       '__annotations__')
WRAPPER_UPDATES = ('__dict__',)
def update_wrapper(wrapper,
                   wrapped,
                   assigned = WRAPPER_ASSIGNMENTS,
                   updated = WRAPPER_UPDATES):
    """ ... 省略
    """
    for attr in assigned:
        try:
            value = getattr(wrapped, attr)
        except AttributeError:
            pass
        else:
            setattr(wrapper, attr, value)
    for attr in updated:
        getattr(wrapper, attr).update(getattr(wrapped, attr, {}))
    # Issue #17482: set __wrapped__ last so we don't inadvertently copy it
    # from the wrapped function when updating __dict__
    wrapper.__wrapped__ = wrapped
    # Return the wrapper so this can be used as a decorator via partial()
    return wrapper

结合前两段代码，functools.wraps所做的事情就比较清晰了：通过getattr()从被装饰的函数中提取__module__、__name__等属性，再通过setattr()将提取到的属性赋值给装饰后的函数。也就是说，deco()装饰器最后可以转换为如下形式：

def deco(func: Callable) -> Callable:
    # @functools.wraps(func)
    def inner(*args, **kwargs):
        """inner doc"""
        print(f'inner({args}, {kwargs})')

	# 节省空间，不考虑AttributeError
    setattr(inner, '__module__', getattr(func, '__module__')) 
    # ... 省略
    getattr(inner, '__dict__').update(getattr(func, '__dict__', {}))
    inner.__wrapped__ = func

    return inner

不考虑注释，functools.wraps的代码总共只有23行，可以说是相当短小了🤣

functools.lru_cache

lru_cache是个非常实用的装饰器，这个装饰器可以将函数的结果缓存起来，避免传入相同的参数时重复计算。lru代表这个装饰器在缓存结果时采用最近最少使用算法，在缓存的结果到达一定数量时抛弃那些很久没命中过的缓存。

lru_cache接收两个参数：maxsize和typed。前者用于指定缓存结果数量上限，而后者用于判断缓存是是否考虑参数类型。比如func(1)和func(1.0)在默认情况下会共用缓存，但当typed为True时，这两个函数就会分开缓存。

使用示例

递归计算fibonacci数列这种需要多次重复计算的场景特别适合使用lru_cache装饰器。比如下面这个实现：

1 2	`def fibonacci(k: int) -> int: return k if k < 2 else fibonacci(k - 1) + fibonacci(k - 2)`

当计算fibonacci(6)时，总共需要调用25次fibonacci()函数，其中8次是fibonacci(1)。但使用lru_cache装饰器后，只需要调用7次fibonacci()函数即可得出计算结果，而且只需要在原函数上加上一行代码即可实现：

1
2
3

@functools.lru_cache()
def fibonacci(k: int) -> int:
    return k if k < 2 else fibonacci(k - 1) + fibonacci(k - 2)

比较有意思的是，使用了lru_cache装饰器的函数还会被”附赠”两个方法：cache_info和cache_clear，分别用于展示缓存情况和清空缓存：

>>> fibonacci(6)
8
>>> fibonacci.cache_info()
CacheInfo(hits=4, misses=7, maxsize=128, currsize=7)
>>> fibonacci.cache_clear()
>>> fibonacci.cache_info()
CacheInfo(hits=0, misses=0, maxsize=128, currsize=0)

源码

functool.lru_cache在CPython里有两种实现，一种基于C语言（_functool._lru_cache_wrapper），另一种基于Python（functool._lru_cache_wraper）。后者只有在前者无法导入时才会启用，但本篇文章还是聚焦于后者的源码。

lru_cache装饰器的入口函数在functools.py第448行：

_CacheInfo = namedtuple("CacheInfo", ["hits", "misses", "maxsize", "currsize"])
# ... 略
def lru_cache(maxsize=128, typed=False):
    """ ... 略
    """
    if maxsize is not None and not isinstance(maxsize, int):
        raise TypeError('Expected maxsize to be an integer or None')

    def decorating_function(user_function):
        wrapper = _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo)
        return update_wrapper(wrapper, user_function)

    return decorating_function

和wraps一样，lru_cache也是一个带参数的装饰器。同时，lru_cache还用到了上一节提到的update_wrapper()函数，自动将被装饰函数的属性拷贝到新函数中。这段代码的关键是wrapper = _lru_cache_wrapper(...)，那就让我们来看看_lru_cache_wrapper函数的源码结构（functools.py第485行）：

def _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo):
    # 初始化变量，省略

    if maxsize == 0:
        def wrapper(*args, **kwds):
            # No caching -- just a statistics update after a successful call
            # ... 省略
    elif maxsize is None:
        def wrapper(*args, **kwds):
            # Simple caching without ordering or size limit
            # ... 省略
    else:
        def wrapper(*args, **kwds):
            # Size limited caching that tracks accesses by recency
            # ... 省略

    def cache_info():
        """Report cache statistics"""
        # ... 省略

    def cache_clear():
        """Clear the cache and cache statistics"""
        # ... 省略

    wrapper.cache_info = cache_info
    wrapper.cache_clear = cache_clear
    return wrapper

所以，当某个函数使用了lru_cache装饰器后，调用这个函数时被执行的实际上是源码里定义的各个wrapper，cache_info()和cache_clear()这两个方法也是在源码里为wrapper加上的。接下来，就让我们具体分析源码里定义的各个模块。

maxsize为0

当maxsize为0时，lru_cache不会缓存任何结果，除了调用原始函数并返回结果外，只增加misses（缓存未命中）计数器。

def _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo):
    # ...
    hits = misses = 0
    # ...
    if maxsize == 0:
        def wrapper(*args, **kwds):
            # No caching -- just a statistics update after a successful call
            nonlocal misses
            result = user_function(*args, **kwds)
            misses += 1
            return result
    # ...

maxsize为None

当maxsize为None时，lru_cache会无限制地缓存结果，lru算法并不会生效。

def _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo):
    # Constants shared by all lru cache instances:
    sentinel = object()          # unique object used to signal cache misses
    make_key = _make_key         # build a key from the function arguments
    cache = {}
    hits = misses = 0
    cache_get = cache.get    # bound method to lookup a key or return None
    # ...
    elif maxsize is None:
        def wrapper(*args, **kwds):
            # Simple caching without ordering or size limit
            nonlocal hits, misses
            key = make_key(args, kwds, typed)
            result = cache_get(key, sentinel)
            if result is not sentinel:
                hits += 1
                return result
            result = user_function(*args, **kwds)
            cache[key] = result
            misses += 1
            return result
    # ...

可以看到，lru_cache首先会通过函数的参数生成key值（make_key），再用key值去缓存池（dict）里查找结果。如果命中缓存，则返回缓存并增加hits计数器；否则调用函数返回结果，将结果加入缓存池并增加misses（缓存未命中）计数器。

_make_key

再让我们将目光转到_make_key上来。这个函数会通过函数的参数、typed的值来为函数调用生成key值，用于在缓存池里查找或缓存结果。其源码如下：

class _HashedSeq(list):
    """ ... 省略
    """

    __slots__ = 'hashvalue'

    def __init__(self, tup, hash=hash):
        self[:] = tup
        self.hashvalue = hash(tup)

    def __hash__(self):
        return self.hashvalue

def _make_key(args, kwds, typed,
             kwd_mark = (object(),),
             fasttypes = {int, str, frozenset, type(None)},
             tuple=tuple, type=type, len=len):
    """ ... 省略
    """
    key = args
    if kwds:
        key += kwd_mark
        for item in kwds.items():
            key += item
    if typed:
        key += tuple(type(v) for v in args)
        if kwds:
            key += tuple(type(v) for v in kwds.values())
    elif len(key) == 1 and type(key[0]) in fasttypes:
        return key[0]
    return _HashedSeq(key)

从这段源码可以看出，当函数的参数只有一个、typed为False、且类型为int、str等时，_make_key会直接将该参数作为key值返回。否则，_make_key会生成一个_HashedSeq来存储函数的参数，当typed为True时还会存储参数的类型，且*args和**kwargs这两种参数之间会插入一个object用作分隔。下面这几个例子能帮助理解这段源码：

>>> from functools import _make_key
>>> _make_key((1, '123'), {'name': 'bob'}, False)
[1, '123', <object object at ...>, 'name', 'bob']
>>> _make_key((1, '123'), {'name': 'bob'}, True)
[1, '123', <object object at ...>, 'name', 'bob', <class 'int'>, <class 'str'>, <class 'str'>]
>>> _make_key((1,), {}, False)
1
>>> _make_key((1,), {}, True)
[1, <class 'int'>]

谁错了？

理解了这段源码之后，又会出现新的问题：不管typed为True还是False，对于单参数1和1.0，_make_key()得到的结果的哈希值都不一致：

>>> from functools import _make_key
>>> hash(_make_key((1,), {}, False))    # 1
1
>>> hash(_make_key((1.0,), {}, False))  # [1.0]
3430019387558
>>> hash(_make_key((1,), {}, True))     # [1, <class 'int'>]
3713082220284583106
>>> hash(_make_key((1.0,), {}, True))   # [1.0, <class 'float'>]
3713082219329796056

但此时如果使用lru_cache，就会发现lru_cache是将单参数1和1.0一起缓存的，和本文一开始对lru_cache的描述一致：

@functools.lru_cache()
def test(val):
    print(val)

test(1)    # 输出1
test(1.0)  # 已缓存，不输出

所以这到底是哪里出了问题？答案就是前文提到的C语言实现和Python实现的区别。C语言实现的_lru_cache_wrapper()并不使用Python实现的_make_key()，而CPython默认使用的是前者，所以会造成这里提到的问题。

此时如果将修改CPython内置的functools.py，只使用Python实现的_lru_cache_wrapper()，就会发现结果是符合预期的。1和1.0通过_make_key()得到的哈希值不同，自然也就会分开缓存：

try:
    # 不使用C语言实现的版本
    # from _functools import _lru_cache_wrapper
    pass
except ImportError:
    pass

@functools.lru_cache()
def test(val):
    print(val)

test(1)    # 输出1
test(1.0)  # 输出1.0

maxsize为正整数

当maxsize为正整数时，lru_cache不能无限制地缓存函数结果，必须用lru算法清除缓存。遗憾的是，由于受限于篇幅，且笔者并没有弄懂这部分的代码（这才是重点吧……），所以这部分源码就先按下不表。

cache_info和cache_clear

def _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo):
    # ...
    hits = misses = 0
    full = False
    cache_len = cache.__len__  # get cache size without calling len()
    lock = RLock()           # because linkedlist updates aren't threadsafe
    root = []                # root of the circular doubly linked list
    root[:] = [root, root, None, None]     # initialize by pointing to self
    # ...
    def cache_info():
        """Report cache statistics"""
        with lock:
            return _CacheInfo(hits, misses, maxsize, cache_len())

    def cache_clear():
        """Clear the cache and cache statistics"""
        nonlocal hits, misses, full
        with lock:
            cache.clear()
            root[:] = [root, root, None, None]
            hits = misses = 0
            full = False

这两个函数的作用比较单纯，分别是返回当前缓存状态和清空缓存。源码也简单易懂，这里就不过多介绍。其中root、lock、full等变量是只有在maxsize为正整数，或者说启用lru算法时才需要的，本文介绍的、针对其它情况的源码不需要这几个变量。

尾声

本来笔者还想分析singledispatch装饰器的源码的，但一个lru_cache就把笔者折腾得够呛，这还是跳过了最复杂的lru算法的情况下……路漫漫其修远兮啊。

#Python #源码

Python部分内置函数装饰器源码浅析

https://www.yooo.ltd/2020/06/20/python-decorators-source-code/

作者

OrangeWolf

发布于

2020年6月20日

许可协议

Python中的函数参数与参数传递上一篇

读《流畅的Python》：函数装饰器与闭包下一篇