Jekyll2021-12-10T21:52:20+00:00http://zecong.hu/feed.xmlZecong Hu’s blogThe personal blog of Zecong Hu. Contains diaries, tech articles, and gibberish.
Zecong HuGSA Ultra 20202020-08-09T21:50:00+00:002020-08-09T21:50:00+00:00http://zecong.hu/2020/08/09/gsa-ultra-2020<p><a href="https://www.gsa-ultra.com/">GSA Ultra</a> is a competition run by GSA Capital, a quantitative trading firm in the UK. I learned about it from an email in my school inbox two months ago. The email was rather cryptic: clicking the link revealed only a title and a sign-up form, with no indication of what kind of competition it was. I signed up just for fun.</p>
<!--more-->
<p>The format is actually pretty interesting, but it is quite grindy and demanding on contestants. The contest runs for 10 days with 18 problems. Each problem has a total time limit of 50s across 10 test cases, and each test case is worth points inversely proportional to the number of people who passed it. So although brute force usually passes several cases, the cases brute force can pass are worth little; most of the points sit in the last few large test cases.</p>
<!--more-->
<p>Among everyone passing all test cases, the contestant with the fastest running time (rounded down to whole seconds) earns double points. Submissions are unlimited, with no penalty for resubmission. Predictably, everyone keeps resubmitting to shave off time, and in the end it comes down to whose constant factor grinds down fastest.</p>
<p>Also, submissions must be written in Python 3.8, with no third-party libraries. Each problem is implemented as a function that returns the answer. Return values are usually strings or integers, so when the answer is a real number, it is rounded, expressed as a fraction, or formatted as a string with a fixed number of decimal places.</p>
<p>By the end of the contest I ranked 12th, having solved 14 of the 18 problems, with the fastest submission on 6 of them. My code is in <a href="https://github.com/huzecong/gsa-ultra-2020">this repo</a>.</p>
<h3 id="a1-hello-world">A1: Hello, World!</h3>
<p>It’s just the a + b problem.</p>
<h3 id="a5-spacedogs">A5: Spacedogs</h3>
<p><strong>Problem:</strong> Given the initial masses and positions of $n$ points in $k$-dimensional space. In each round, the two points with the smallest masses merge: the merged point lies at the midpoint between them, with mass equal to the sum of the two. Repeat until a single point remains, and output its coordinates. $n\leq 10^4,\ k\leq 5$.</p>
<p><strong>Solution:</strong> Just simulate, using a k-d tree for speed. I don’t know how to write a k-d tree, so I grabbed a library off GitHub and pasted it in. It ran in 23s, four times the fastest submission.</p>
<h3 id="a7-doubling-danny">A7: Doubling Danny</h3>
<p><strong>Problem:</strong> Compute $\sum_{i=1}^n 2^{a_i} \bmod{m}$, where $n\leq 5000,\ a_i\leq 2^{5000},\ m\leq 2^{2203}-1$ and $m$ is prime.</p>
<p><strong>Solution:</strong> The modulus is so large that it left me dazed. The most natural idea is to first reduce every $a_i$ modulo $m-1$ (valid by Fermat’s little theorem since $m$ is prime) and then apply fast exponentiation, but that doesn’t pass all the cases. From there it’s a pile of constant-factor optimizations: fast exponentiation in base 10, and sorting the exponents, taking successive differences, raising 2 only to each difference, and accumulating the product. With both optimizations it passed in 41s, yet plenty of people passed in 0s; I have no idea what black magic they used.</p>
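<p>For reference, the exponent-reduction step of the baseline approach can be sketched in a few lines (the function name is mine, not from the contest):</p>

```python
def power_sum(exponents, m):
    # m is prime, so by Fermat's little theorem 2^(m-1) ≡ 1 (mod m),
    # and each exponent can first be reduced modulo m - 1.
    return sum(pow(2, a % (m - 1), m) for a in exponents) % m

print(power_sum([3, 4, 10], 7))  # (2**3 + 2**4 + 2**10) % 7 == 5
```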
<h3 id="a9-jumping-jimmy">A9: Jumping Jimmy</h3>
<p><strong>Problem:</strong> A bare static RMQ problem, $n\leq 10^5$.</p>
<p><strong>Solution:</strong> A Sparse Table passes, but not at the fastest time. This is a classic problem, though, and you can find <a href="https://web.stanford.edu/class/cs166/lectures/00/Small00.pdf">some</a> <a href="https://web.stanford.edu/class/cs166/lectures/01/Small01.pdf">lecture slides</a> online showing that with suitable blocking (brute force within blocks, a sparse table across blocks) you get $O(n)$ preprocessing with $O(\log n)$ queries, which was enough to pass in 1s and become the fastest submission. The theoretically optimal $O(n)$ preprocessing with $O(1)$ queries is possible, but its constant factor isn’t necessarily better than blocked brute force.</p>
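<p>The plain sparse table (the “passes, but not fastest” version) is only a few lines; naming here is my own:</p>

```python
def build_sparse_table(a):
    # st[j][i] holds min(a[i : i + 2**j])
    n = len(a)
    st = [a[:]]
    j = 1
    while (1 << j) <= n:
        prev, half = st[-1], 1 << (j - 1)
        st.append([min(prev[i], prev[i + half]) for i in range(n - (1 << j) + 1)])
        j += 1
    return st

def range_min(st, l, r):
    # minimum of a[l..r] inclusive, via two overlapping power-of-two windows
    j = (r - l + 1).bit_length() - 1
    return min(st[j][l], st[j][r - (1 << j) + 1])

st = build_sparse_table([5, 2, 4, 7, 1, 3])
print(range_min(st, 0, 5), range_min(st, 2, 3))  # 1 4
```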
<h3 id="a12-dicey-situation">A12: Dicey Situation</h3>
<p><strong>Problem:</strong> There are $d$ identical $k$-sided dice, with number $a_i$ on face $i$. Alice first rolls all the dice, then gets $r-1$ rounds of rerolls. In each round she may keep any subset of the dice (possibly all, possibly none) and reroll the rest. When Alice is done, Bob does the same (rolls all dice, then $r-1$ reroll rounds). Finally the two totals are compared: the player with the larger total scores 1 point, and on a tie both score 0. Both players want to maximize their own score minus the opponent’s, and both play optimally. Find the expected value of Bob’s score minus Alice’s. $d,r\leq 8,\ k\leq 6,\ 1\leq a_i\leq 20$.</p>
<p><strong>Solution:</strong> A fun problem. First, Bob has an advantage: he can choose which dice to reroll each round based on Alice’s final total. Second, when a player rerolls, they will always reroll some number of their lowest dice.</p>
<p>Suppose we already know the probability that Alice, playing optimally, ends with each possible total; then Bob’s expected payoff can be computed by DP. First enumerate Alice’s total, call it $A$. Let $f[i][S]$ be the expected payoff when $i$ rerolls remain and the current multiset of dice is $S$. All the base values $f[0][S]$ follow from comparing the total of $S$ against $A$. For $i>0$, enumerate the number $j$ of dice to reroll, compute the state $S’$ left after keeping only the $d-j$ highest dice, then enumerate every outcome of rerolling $j$ dice to make the transition. The number of states $S$ is the number of ways to place $d$ unlabeled balls into $k$ labeled boxes, $\binom{d+k-1}{k-1}$, at most 1287. This yields Bob’s expected payoff $B(A)$ given that Alice’s total is $A$.</p>
<p>As for Alice’s strategy, my first thought was: Alice simply wants to roll as high a total as possible, so each die can be considered independently. That turns out to be false; Alice’s strategy depends on her current total. For example, say Alice has 3 dice with faces $[1,2,3]$, one of which shows 2. If the other two show $[1,1]$, the expected gain from rerolling that die is $(B(3)+B(5)-2B(4))/3$; if they show $[3,3]$, it is $(B(7)+B(9)-2B(8))/3$. These two values need not be equal, and Alice rerolls the die only when the gain is positive. So we should compute Alice’s expected payoff with the same DP as Bob’s, with base values $f[0][S] = B(\text{total of } S)$.</p>
<p>The rest is constant-factor work: representing the state $S$ in $3k$ bits; caching the transition result for each $S’$ to avoid recomputation; and noting that when the total of $S$ exceeds $A$, the expected payoff is always 1. With these I passed in 25s, but the fastest submission took only 5s.</p>
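<p>As a quick sanity check, the state-count bound $\binom{d+k-1}{k-1}$ quoted above can be computed directly:</p>

```python
from math import comb

def num_dice_states(d, k):
    # multisets of size d over k face values ("stars and bars")
    return comb(d + k - 1, k - 1)

print(num_dice_states(8, 6))  # 1287, the maximum under d <= 8, k <= 6
```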
<h3 id="a13-explodium">A13: Explodium!</h3>
<p><strong>Problem:</strong> Given the positions $p_i$ and weights $s_i$ of $n$ targets on a number line, you must drop bombs to destroy every target. A bomb dropped at $x$ with energy $g$ deals $\max(0,g-\vert x-p_i\vert)$ damage to target $i$. A target’s total damage is the sum over all bombs, and it is destroyed once the total reaches $s_i$. Minimize the total energy of the bombs. $n\leq 10^5$.</p>
<p><strong>Solution:</strong> The key observation: in an optimal solution every target is hit by exactly one bomb, i.e. the bombs’ blast ranges never overlap. If two bombs could both hit some target lying between them, replacing them with a single bomb at that target’s position covering the same interval would take less energy.</p>
<p>This gives an obvious DP: let $f[r]$ be the minimum total energy to cover the first $r$ targets; to transition, enumerate $l$ and compute the energy one bomb needs to cover the interval $[l,r]$. For that, enumerate where the bomb lands, say between targets $k$ and $k+1$; the minimum energy is then half the sum of the largest $s_i-p_i$ on the left and the largest $s_i+p_i$ on the right, i.e. $\frac{1}{2}\left(\max_{l\leq i\leq k} (s_i-p_i)+\max_{k<i\leq r} (s_i+p_i)\right)$, while also checking that the bomb position implied by this energy really falls between targets $k$ and $k+1$. A naive implementation is $O(n^3)$.</p>
<p>The energy computation can be optimized further. Consider two adjacent targets $i$ and $j$: if $s_i-s_j\geq \vert p_i-p_j\vert$, any bomb that destroys $i$ necessarily also destroys $j$, so $j$ can simply be dropped. After this pruning, the targets attaining the left and right maxima for an interval are always its endpoints. Simplifying the transition accordingly and maintaining prefix extrema gives $O(1)$ transitions, $O(n)$ overall, passing in 0s.</p>
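<p>The dominance pruning can be done in one stack pass over targets sorted by position; this is my own sketch of the rule above, not the contest code:</p>

```python
def prune_dominated(targets):
    # targets: (position, strength) pairs, sorted by position.
    # Target i dominates target j when s_i - s_j >= |p_i - p_j|:
    # any bomb destroying i then also destroys j, so j can be dropped.
    kept = []
    for p, s in targets:
        if kept and kept[-1][1] - s >= p - kept[-1][0]:
            continue              # dominated by the last kept target
        while kept and s - kept[-1][1] >= p - kept[-1][0]:
            kept.pop()            # the new target dominates the last kept one
        kept.append((p, s))
    return kept

print(prune_dominated([(0, 5), (1, 1), (10, 2)]))  # [(0, 5), (10, 2)]
```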
<h3 id="a15-grand-hotel">A15: Grand Hotel</h3>
<p><strong>Problem:</strong> A hotel has rooms numbered $0\sim n-1$, all initially empty. There are $m$ operations of two kinds:</p>
<ul>
<li>Check-in: given $x$, find the run of $x$ consecutive empty rooms with the smallest starting number, and mark them occupied.</li>
<li>Check-out: given $i$, mark the rooms occupied by the $i$-th check-in as empty again.</li>
</ul>
<p>Output the sum, over all check-ins, of the smallest room number marked. $n\leq 10^{11},\ m\leq 10^6$.</p>
<p><strong>Solution:</strong> I wrote an $O(m\log m)$ solution that failed only the last test case. Roughly: maintain all maximal empty intervals, together with several hash maps for fast lookups: the right endpoint of the interval starting at $l$; the left endpoint of the interval ending at $r$; and an ordered collection of all intervals of a given length $d$ (only the minimum matters, so a heap suffices). On top of that, a segment tree indexed by length maintains the minimum left endpoint over all intervals of length $\geq d$, updated alongside the hash maps. As you can imagine, the constant factor is hefty.</p>
<p>Most submissions took over 30s; the fastest needed only 11s. It’s hard to imagine an asymptotically faster solution, so presumably it’s just an implementation with a smaller constant.</p>
<h3 id="a16-roshambolic">A16: Roshambolic</h3>
<p>Fun fact: the problem title is a portmanteau, blending roshambo (rock paper scissors) and shambolic. Roshambo is the usual name in northern California, said to derive from the Japanese 「じゃん拳ぽん」 (jyan-ken-pon), which reached the US in the early twentieth century. According to Ming-dynasty records, the game itself originated in China.</p>
<p><strong>Problem:</strong> Two players play rock paper scissors with cards; each card shows rock, paper, or scissors. Given the order of the cards, the players repeatedly reveal the top cards of their piles: the winner takes both cards and adds them to the bottom of their pile, while on a tie each player takes back their own card. Determine after how many rounds one player runs out of cards. Number of cards $\leq 1000$.</p>
<p><strong>Solution:</strong> A very direct simulation, but hitting the fastest time (2s) still takes some implementation tricks.</p>
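<p>A straightforward simulation with <code class="language-plaintext highlighter-rouge">collections.deque</code>; note that the order in which the winner’s two cards go to the bottom of their pile (own card first) is my assumption here, while the real statement pins this down:</p>

```python
from collections import deque

BEATS = {('R', 'S'), ('S', 'P'), ('P', 'R')}  # rock > scissors > paper > rock

def play(cards_a, cards_b):
    a, b = deque(cards_a), deque(cards_b)
    rounds = 0
    while a and b:
        x, y = a.popleft(), b.popleft()
        rounds += 1
        if x == y:               # tie: each player takes back their own card
            a.append(x)
            b.append(y)
        elif (x, y) in BEATS:    # player A wins both cards
            a.extend((x, y))
        else:                    # player B wins both cards
            b.extend((y, x))
    return rounds

print(play("RP", "SR"))  # 2
```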
<h3 id="d1-gone-to-seed">D1: Gone to Seed</h3>
<p><strong>Problem:</strong> $2^k$ players run a single-elimination tournament: each round the remaining players are paired up and the losers are eliminated, leaving a champion after $k$ rounds. Given each player’s strength $s_i$, when $i$ plays $j$, player $i$ wins with probability $s_i/(s_i+s_j)$. Find the probability that player 0 wins the tournament, as a fraction in lowest terms. $k\leq 4,\ 1\leq s_i\leq 20$.</p>
<p><strong>Solution:</strong> Bitmask DP: let $f[i][S]$ be the probability that $i$ wins a tournament among the players in set $S$. To transition, enumerate subsets of $S$ of size $\vert S\vert/2$, then enumerate the winners of the two halves; the answer is $f[0][\text{full set}]$. Since we only care about player 0, whenever the enumerated subset contains 0 we only need the cases where 0 wins it, which skips a lot of states.</p>
<p>Handling fractions takes some care. Python has a built-in fraction class, <code class="language-plaintext highlighter-rouge">fractions.Fraction</code>, but its implementation computes a gcd on every operation, and since the numerators and denominators grow huge by the end, those gcds get expensive. So I implemented my own fraction class with manual reduction, simplifying only once a state’s result is complete. To keep intermediate denominators small, addition special-cases the situation where one denominator is a multiple of the other.</p>
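<p>A minimal sketch of such a lazily-reduced fraction class (class name and details are mine, not the contest code):</p>

```python
from math import gcd

class LazyFrac:
    # A fraction that skips gcd reduction on every operation;
    # reduce() is called once per finished DP state.
    __slots__ = ("num", "den")

    def __init__(self, num, den=1):
        self.num, self.den = num, den

    def __add__(self, other):
        # special case: one denominator divides the other,
        # so the common denominator does not blow up
        if other.den % self.den == 0:
            return LazyFrac(self.num * (other.den // self.den) + other.num, other.den)
        if self.den % other.den == 0:
            return LazyFrac(self.num + other.num * (self.den // other.den), self.den)
        return LazyFrac(self.num * other.den + other.num * self.den,
                        self.den * other.den)

    def __mul__(self, other):
        return LazyFrac(self.num * other.num, self.den * other.den)

    def reduce(self):
        g = gcd(self.num, self.den)
        return LazyFrac(self.num // g, self.den // g)

r = (LazyFrac(1, 2) + LazyFrac(1, 6)).reduce()
print(r.num, r.den)  # 2 3
```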
<p>Final running time: 18s, while the fastest submission ran in 7s.</p>
<h3 id="d2-hardcore-parkour">D2: Hardcore Parkour</h3>
<p><strong>Problem:</strong> Omitted.</p>
<p><strong>Solution:</strong> A simple DP: $f[i][j]$ is the minimum time to reach platform $i$ with speed $j$. The direct implementation is $O(100n)$ (10 possible speeds, and each transition considers at most 10 platforms in the previous column). Running time: 6s, against 1s for the fastest. A monotonic queue would make each transition $O(1)$, but I didn’t write it; with that plus some constant squeezing it could probably hit 1s.</p>
<h3 id="d3-wascally-wabbits">D3: Wascally Wabbits</h3>
<p><strong>Problem:</strong> There are $n$ rabbits from one family, whose parent-child relations form a tree. Each rabbit carries a pair of genes for eye color, with dominant allele <code class="language-plaintext highlighter-rouge">R</code> and recessive allele <code class="language-plaintext highlighter-rouge">g</code>. With at least one dominant allele (<code class="language-plaintext highlighter-rouge">RR</code> or <code class="language-plaintext highlighter-rouge">Rg</code>) a rabbit shows the dominant trait (red eyes); only genotype <code class="language-plaintext highlighter-rouge">gg</code> shows the recessive trait (green eyes). During inheritance each gene mutates with probability $p$. Given the family tree and the traits of some of the rabbits, find the expected number of green-eyed rabbits. $n\leq 10^4$; the answer must be expressed as a fraction.</p>
<p><strong>Solution:</strong> Expectation is linear, so each rabbit with unknown trait can be considered separately. The priors of the three genotypes <code class="language-plaintext highlighter-rouge">RR</code>, <code class="language-plaintext highlighter-rouge">Rg</code>, <code class="language-plaintext highlighter-rouge">gg</code> are $1/4$, $1/2$, $1/4$, and we want the posterior probability that the rabbit is <code class="language-plaintext highlighter-rouge">gg</code>. By Bayes’ rule, it suffices to compute the probability that the other rabbits’ traits match the observations, given that this rabbit is <code class="language-plaintext highlighter-rouge">gg</code>. Reversing a parent-child edge changes nothing here, so we can root the tree at the rabbit in question and compute this probability with a tree DP. One tree DP per rabbit costs $O(n^2)$ total, but the classic two-pass DFS (rerooting) trick brings it down to $O(n)$.</p>
<p>There is an unexpected catch though: fraction arithmetic is painfully expensive. With a tall tree, intermediate results can reach $p^{2n}$; even with the problem’s guarantee that the denominator stays within 100, that is still forty thousand digits. Counting arithmetic cost, the actual complexity is more like $O(n^{2.5})$ or even $O(n^3)$ (Python’s multiplication uses the Karatsuba algorithm, $O(n^{1.58})$; I’m not sure about gcd). Maintaining a polynomial in $p$ has essentially the same complexity as direct fraction arithmetic with a larger constant, so it doesn’t help much either.</p>
<p>In the end I combined the manually-reduced fractions from D1 with the $O(n)$ DP and passed 8 test cases. The fastest submission somehow took just 1s, which is terrifying.</p>
<h3 id="d4-task-genie">D4: Task Genie</h3>
<p><strong>Problem:</strong> Given a tree with $n$ nodes, each with a positive integer weight $c_i$, you may pick at most $w$ nodes and set their weights to 0. Minimize the longest chain starting from the root after the modification, where the length of a chain is the sum of the weights of the nodes on it. $n\leq 10^5,\ w\leq 5\cdot 10^4,\ c\leq 10^3$.</p>
<p><strong>Solution:</strong> I could only come up with brute force here: an $O(nw^2)$ tree knapsack, which passed 7 test cases. My guess is the intended solution is some data-structure-optimized greedy with adjustments, but I have no evidence.</p>
<h3 id="d5-snakes-and-ladders">D5: Snakes and Ladders</h3>
<p><strong>Problem:</strong> <a href="https://boardgamegeek.com/boardgame/5432/chutes-and-ladders">Snakes and Ladders</a> is a classic <del>boring</del> board game. The board has $n$ squares, with square 0 the start and $n-1$ the finish. A player rolls a die to advance; landing at the bottom of a ladder moves them to its top, and landing on the head of a snake moves them back to its tail. There are $l$ ladders and $s$ snakes, and no two snake or ladder endpoints coincide. Find the expected number of moves to reach the finish. $n \leq 10^5,\ 1\leq s,l\leq 100$.</p>
<p><strong>Solution:</strong> Expected steps of a random walk on a graph is a classic family of problems. Here, without the snakes the graph would be a DAG and the expectations could be computed by direct DP. Snakes introduce cycles, so we set up a linear system and solve it by Gaussian elimination in $O(n^3)$. Let $x_i$ be the expected number of moves from square $i$ to the finish; the equations are $x_i=\frac{1}{6}\sum_{j=1}^6 x_{p_{i+j}}+1$, where $p_i$ is where you end up after stopping on square $i$: the top of a ladder if $i$ is its bottom, the tail of a snake if $i$ is its head, and $i$ itself otherwise.</p>
<p>Notice that the vast majority of the equations are “trivial”. If square $i$ is not the tail of a snake, then $x_i$ appears only in equations with smaller indices, so we can substitute it away directly; in the end only the $s$ variables for the squares at snake tails remain. This elimination is an $O(ns)$ DP, after which the reduced system is solved in $O(s^3)$. That is already enough to pass in 10s.</p>
<p>But the fastest submission here was 0s, and indeed more can be squeezed out. The bottleneck is the $O(ns)$ DP, and most of it also consists of “trivial” transitions: take the coefficients of the next 6 squares and divide by 6. Writing this transition in matrix form and precomputing the coefficients lets a contiguous run of trivial transitions be done in $O(6s)$ time, dropping the complexity to $O(s^2)$. The final running time should be around 1s, and with a few lucky resubmits it can land on 0s. Sadly, the post-contest rerun of all submissions put mine back at 1s.</p>
<h3 id="d6-two-quests">D6: Two Quests</h3>
<p><strong>Problem:</strong> Given two sequences $a$ and $b$ of lengths $n$ and $m$, merge them into a single sequence $c$ of length $n+m$ such that the elements from $a$ keep their relative order, and likewise for $b$. Minimize $\vert c_1-0\vert+\sum_{i=2}^{n+m} \vert c_i-c_{i-1}\vert$. $n,m\leq 2000$.</p>
<p><strong>Solution:</strong> Let $f[i][j][0/1]$ be the minimum cost after taking the first $i$ elements of $a$ and the first $j$ elements of $b$, where the last element taken came from $a$ or from $b$ ($0/1$). Transition by choosing whether the next element comes from $a$ or $b$. This passes in about 20s.</p>
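<p>A direct, unoptimized rendering of this DP (naming is mine):</p>

```python
def min_total_variation(a, b):
    # f[i][j][t]: first i elements of a and first j of b merged;
    # t == 0 if the last element came from a, t == 1 if from b.
    INF = float("inf")
    n, m = len(a), len(b)
    f = [[[INF, INF] for _ in range(m + 1)] for _ in range(n + 1)]
    f[0][0][0] = 0  # nothing taken yet; the "previous value" is 0

    def last_val(i, j, t):
        if i == 0 and j == 0:
            return 0
        return a[i - 1] if t == 0 else b[j - 1]

    for i in range(n + 1):
        for j in range(m + 1):
            for t in (0, 1):
                cur = f[i][j][t]
                if cur == INF:
                    continue
                v = last_val(i, j, t)
                if i < n:  # take the next element from a
                    f[i + 1][j][0] = min(f[i + 1][j][0], cur + abs(a[i] - v))
                if j < m:  # take the next element from b
                    f[i][j + 1][1] = min(f[i][j + 1][1], cur + abs(b[j] - v))
    return min(f[n][m])

print(min_total_variation([1], [2]))  # 2, merging as [1, 2]
```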
<p>Further optimization is not hard. Processing states in diagonal order (by $i+j$) cuts the space to $O(n+m)$, and since Python has no native 2D arrays, the space reduction saves a lot of addressing time. With a few more constant-factor tweaks the running time drops to 8s. That stood as the fastest submission until the final day, when someone ground it down to 6s, and I had no idea where to squeeze more performance out of my code.</p>
<h3 id="d8-garden-path">D8: Garden Path</h3>
<p><strong>Problem:</strong> In an $n\times m$ grid, $k$ cells contain obstacles. Count the ways to tile the remaining cells with $1\times 2$ dominoes, modulo a given number. $n\leq 6,\ m\leq 10^9,\ k\leq 1000$.</p>
<p><strong>Solution:</strong> The classic domino tiling problem, with the classic bitmask DP plus matrix exponentiation solution. The state representation is the clever part: for each row, record only whether each cell is the upper half of a vertically placed domino. For two states written as bitmasks <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code>, <code class="language-plaintext highlighter-rouge">x</code> can transition to <code class="language-plaintext highlighter-rouge">y</code> if and only if:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">x & y == 0</code>: the same column cannot hold vertical upper halves in two adjacent rows.</li>
<li>Every maximal run of 0 bits in <code class="language-plaintext highlighter-rouge">x | y</code> has even length. The 1 bits of <code class="language-plaintext highlighter-rouge">y</code> are occupied by vertical dominoes, and each 1 bit of <code class="language-plaintext highlighter-rouge">x</code> likewise needs its lower half in <code class="language-plaintext highlighter-rouge">y</code>’s row. The remaining cells must be filled with horizontal dominoes, so every run must have even length.</li>
</ul>
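<p>Before adding obstacles and matrix exponentiation, the transition rule can be sanity-checked with a brute-force row-by-row profile DP (my own sketch; for a $2\times 3$ board it should find the 3 classic tilings):</p>

```python
def count_tilings(n, m):
    # state: bitmask of columns where a vertical domino sticks into the next row
    full = 1 << n

    def even_runs(mask):  # every maximal run of 0 bits has even length
        run = 0
        for i in range(n):
            if mask >> i & 1:
                if run % 2:
                    return False
                run = 0
            else:
                run += 1
        return run % 2 == 0

    f = [0] * full
    f[0] = 1
    for _ in range(m):
        g = [0] * full
        for x in range(full):
            if f[x]:
                for y in range(full):
                    if x & y == 0 and even_runs(x | y):
                        g[y] += f[x]
        f = g
    return f[0]  # no domino may stick out past the last row

print(count_tilings(2, 3))  # 3
```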
<p>Obstacles are handled similarly. Then matrix exponentiation fast-forwards through contiguous runs of obstacle-free rows, for a complexity of $O(2^{3n}k\log m)$. Getting to 0s takes serious constant-factor work. First, precompute the powers of the transition matrix and bake them into the source as a table; Python makes this painless, just <code class="language-plaintext highlighter-rouge">pickle.dumps</code> and you’re done. When exponentiating, note that we are really left-multiplying a row vector by matrix powers, so doing matrix-vector products throughout improves the exponentiation to $O(2^{2n}\log m)$. Sparse matrix arithmetic shaves the constant further. Finally, precompute and bake in the (sparse) transition matrix for every obstacle pattern within a row; only then did it barely squeeze under 0s. Even that wasn’t stable: the post-contest rerun bumped me back to 1s.</p>
<h3 id="d10-paper-cut">D10: Paper Cut</h3>
<p><strong>Problem:</strong> Given a $c\times d$ sheet of paper, cut it into pieces each of size $a\times b$. Each operation picks one piece and cuts it into two along a grid line. Determine whether the required cutting is possible. $c,d< 10^{12},\ a,b<10^6$, with up to 1000 queries.</p>
<p><strong>Solution:</strong> I honestly solved this one by pattern-spotting. The conclusion first: the cutting is possible when at least one of the following holds:</p>
<ol>
<li>$c$ is a multiple of $a$, and $d$ is a multiple of $b$.</li>
<li>$c$ is a multiple of $\mathrm{lcm}(a,b)$, and $ax+by=d$ has a nonnegative integer solution.</li>
<li>Condition 1 or 2, with $c$ and $d$ swapped.</li>
</ol>
<p>A prerequisite for condition 2 is that $d$ is a multiple of $\gcd(a,b)$. Use the extended Euclidean algorithm to find one solution $(x_0,y_0)$ of $ax+by=\gcd(a,b)$; then $(\frac{x_0d+kb}{\gcd(a,b)},\frac{y_0d-ka}{\gcd(a,b)})$ is also a solution for any integer $k$, so just check whether some $k$ makes both components nonnegative. It runs fast and passes in 0s.</p>
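<p>A sketch of that check (hypothetical names; assumes $a,b,d>0$):</p>

```python
def ext_gcd(a, b):
    # returns (g, x, y) with a*x + b*y == g == gcd(a, b)
    if b == 0:
        return a, 1, 0
    g, x, y = ext_gcd(b, a % b)
    return g, y, x - (a // b) * y

def has_nonneg_solution(a, b, d):
    # does a*x + b*y == d have an integer solution with x, y >= 0?
    g, x0, y0 = ext_gcd(a, b)
    if d % g:
        return False
    # general solution: x = x0*(d/g) + k*(b/g), y = y0*(d/g) - k*(a/g)
    x, y = x0 * (d // g), y0 * (d // g)
    bg, ag = b // g, a // g
    k_min = -(x // bg)  # smallest k with x + k*bg >= 0 (ceil of -x/bg)
    k_max = y // ag     # largest k with y - k*ag >= 0 (floor of y/ag)
    return k_min <= k_max

print(has_nonneg_solution(3, 5, 8), has_nonneg_solution(3, 5, 7))  # True False
```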
<p>With the conclusion in hand I went back to prove it. Sufficiency is easy, so consider necessity. Partition all cuttable $(c,d)$ into two classes: those where all the cut-out $a\times b$ pieces share the same “orientation”, which is exactly condition 1, and those where the orientations differ. Induction plus a big pile of case analysis shows the second class is equivalent to condition 2; I’ll omit it here. I only sketched this in my head without checking the details, so I’m not sure the proof is rigorous. Intuitively, condition 2 says the sheet can be cut into two parts that each recursively satisfy condition 2, or into one part cut horizontally and another cut vertically.</p>
<h3 id="d11-satisfaction">D11: Satisfaction</h3>
<p><strong>Problem:</strong> Given a boolean expression over $n$ boolean variables, containing only AND, OR, NOT, and parentheses, count how many of the $2^n$ assignments make the expression true. $n\leq 26$; the expression has length at most 1000.</p>
<p><strong>Solution:</strong> I went with brute force here. It is remarkably easy to write using Python’s <code class="language-plaintext highlighter-rouge">compile</code> and <code class="language-plaintext highlighter-rouge">eval</code>; you don’t even need your own parser. With some constant-factor optimizations it passes at most 8 test cases. I explored other approaches, such as parsing into a syntax tree and applying simple simplifications, to little effect.</p>
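<p>The brute force really is short, assuming the expression uses (or has been converted to) Python’s <code class="language-plaintext highlighter-rouge">and</code>/<code class="language-plaintext highlighter-rouge">or</code>/<code class="language-plaintext highlighter-rouge">not</code> syntax:</p>

```python
from itertools import product

def count_satisfying(expr, variables):
    # compile once, then evaluate under all 2^n assignments
    code = compile(expr, "<expr>", "eval")
    return sum(
        bool(eval(code, dict(zip(variables, values))))
        for values in product((False, True), repeat=len(variables))
    )

print(count_satisfying("(a or b) and not c", "abc"))  # 3
```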
<p>This kind of boolean satisfiability problem is NP-complete, so I suspect the intended solution is search with pruning. Searching subtrees containing few variables first may reveal a set of variables whose particular values make the whole expression constant true or false. For example, if one child of an <code class="language-plaintext highlighter-rouge">AND</code> node contains a single variable, then the whole subtree is false whenever that variable is false, which shrinks the search space a lot. I didn’t implement this, and it doesn’t look pleasant to write.</p>
<h3 id="d14-prime-feast">D14: Prime Feast</h3>
<p><strong>Problem:</strong> Given a digit string of length $n$, repeatedly delete substrings of length at most 4, where each deleted substring must have no leading zero, must be prime, and must be smaller than the previously deleted number. Maximize the sum of all deleted numbers. $n\leq 100$.</p>
<p><strong>Solution:</strong> Without the decreasing constraint, interval DP works, with two kinds of transitions: either split the interval into two halves handled separately, or the interval’s two endpoints get deleted together. In the latter case, enumerate the length of the deleted number, then enumerate at most two positions inside the interval. Complexity $O(n^4)$.</p>
<p>With the decreasing constraint things get messy, and I don’t know whether a clever DP still exists. I ended up writing a search, which runs blazingly fast once a simple bound-based pruning is added, even passing in 0s. It feels a little inelegant.</p>Zecong HuInheritance for Python Namedtuples2019-08-10T19:05:00+00:002019-08-10T19:05:00+00:00http://zecong.hu/2019/08/10/inheritance-for-namedtuples<blockquote>
<p><strong>tl;dr:</strong> Inheritance for the Python built-in namedtuple does not work as we expect. This blog post demonstrates how to create a custom namedtuple class that supports meaningful inheritance, and more.</p>
</blockquote>
<p>I’ve always under-appreciated the Python <a href="https://docs.python.org/3/library/collections.html#collections.namedtuple"><code class="language-plaintext highlighter-rouge">collections.namedtuple</code></a> class. For those who are unfamiliar, a namedtuple is a fancier tuple, whose elements can also be accessed as attributes:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">collections</span> <span class="kn">import</span> <span class="n">namedtuple</span>
<span class="n">Point</span> <span class="o">=</span> <span class="n">namedtuple</span><span class="p">(</span><span class="s">'Point'</span><span class="p">,</span> <span class="p">(</span><span class="s">'x'</span><span class="p">,</span> <span class="s">'y'</span><span class="p">))</span>
<span class="n">p</span> <span class="o">=</span> <span class="n">Point</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">p</span><span class="p">.</span><span class="n">x</span><span class="p">,</span> <span class="n">p</span><span class="p">.</span><span class="n">y</span><span class="p">)</span> <span class="c1"># 1 2
</span></code></pre></div></div>
<p>This allows using meaningful names for the elements, rather than having to remember what are stored under each index.</p>
<p>What I don’t like about it, however, is the ugly syntax: attribute names are stored as strings, the class name is repeated, and most importantly, refactoring is error-prone, even within powerful IDEs. You can rename class attributes and all the references easily in PyCharm, but you can’t do that for namedtuples. What I wanted was a syntax like that of the C/C++ <code class="language-plaintext highlighter-rouge">struct</code>, with a default constructor to assign values to each field.</p>
<p>Luckily, this changed in Python 3.6, with the implementation of <a href="https://www.python.org/dev/peps/pep-0526/">PEP 526</a>. This version provides <a href="https://docs.python.org/3/library/typing.html#typing.NamedTuple"><code class="language-plaintext highlighter-rouge">typing.NamedTuple</code></a>, a typed version of namedtuple with a brand new syntax. Instead of the example above, now you can write:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">NamedTuple</span>
<span class="k">class</span> <span class="nc">Point</span><span class="p">(</span><span class="n">NamedTuple</span><span class="p">):</span>
<span class="n">x</span><span class="p">:</span> <span class="nb">int</span>
<span class="n">y</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">p</span> <span class="o">=</span> <span class="n">Point</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">p</span><span class="p">.</span><span class="n">x</span><span class="p">,</span> <span class="n">p</span><span class="p">.</span><span class="n">y</span><span class="p">)</span> <span class="c1"># 1 0
</span></code></pre></div></div>
<p>This snippet works in exactly the same way, but adds type annotations for each field, and also supports default values (but fields with default values have to follow those without, just as in a function declaration). The syntax is also much more natural (to a former C++ user, at least). But there is still something we can’t do: inheritance.</p>
<p>If you ever tried to inherit a namedtuple, you will find that it doesn’t work as you expect. As illustrated in <a href="https://stackoverflow.com/questions/42385916/inheriting-from-a-namedtuple-base-class-python">this StackOverflow question</a>, the new attributes added in the subclass don’t show up, and you’d have to manually override the constructor, which is kind of against the intention of using namedtuples in the first place.</p>
<p>Now, you may think, let’s just hack into the internals and somehow make inheritance work. If you were ever in the mood to peek under the hood of this <code class="language-plaintext highlighter-rouge">namedtuple</code> class, you’d find that it’s surprisingly complicated for what seemed like a small and easy piece of functionality. But don’t be afraid, the logic is actually pretty straightforward — it just involves some details of Python’s internal data model.</p>
<p>Before we begin, let’s summarize what we want to achieve through this blog post:</p>
<ul>
<li>Make inheritance work for <code class="language-plaintext highlighter-rouge">typing.NamedTuple</code> as we expect.</li>
<li>Also allow multiple inheritance, if there are no overlaps in field names among the base classes.</li>
<li>Remove the constraint on ordering for fields with default values.</li>
</ul>
<h2 id="instance-class-and-metaclass">Instance, Class, and Metaclass</h2>
<p>Before diving into the actual code, let’s get a couple of concepts clear. We need to know what metaclasses are, and how a class is created, before we can customize that behavior.</p>
<p>If you’re not familiar with metaclasses, I recommend reading <a href="https://blog.ionelmc.ro/2015/02/09/understanding-python-metaclasses/">this wonderful article</a>, which gives a comprehensive explanation of the entire topic. But here, I will try to briefly explain the concepts that will be useful for our goals.</p>
<h4 id="class-instance-and-the-__new__-method">Class Instance and the <code class="language-plaintext highlighter-rouge">__new__</code> method</h4>
<p>We’re all familiar with <strong>class</strong>es. An <strong>instance</strong> of a class is what you’d get after calling the class constructor.</p>
<p>You might think the Python class constructor is <code class="language-plaintext highlighter-rouge">__init__</code>, but that’s not the whole story. When you construct an instance, the <code class="language-plaintext highlighter-rouge">__new__</code> method is first called with the same arguments you pass to <code class="language-plaintext highlighter-rouge">__init__</code>. <code class="language-plaintext highlighter-rouge">__new__</code> is responsible for actually creating an instance of the class, and that instance is then passed into <code class="language-plaintext highlighter-rouge">__init__</code> as the <code class="language-plaintext highlighter-rouge">self</code> argument.</p>
<p>Note that <code class="language-plaintext highlighter-rouge">__new__</code> is in fact a <a href="https://docs.python.org/3/reference/datamodel.html#object.__new__">static method</a> that Python special-cases (you don’t need to declare it as one); the instance is not even created at the point of call, so its first argument is <code class="language-plaintext highlighter-rouge">cls</code> instead of <code class="language-plaintext highlighter-rouge">self</code>. For most classes, the <code class="language-plaintext highlighter-rouge">__new__</code> method just calls the super class <code class="language-plaintext highlighter-rouge">__new__</code>, which all traces back to <code class="language-plaintext highlighter-rouge">object.__new__(cls)</code>.</p>
<p>There are special cases though — you can return stuff that is not an instance of type <code class="language-plaintext highlighter-rouge">cls</code> (or any of its subclasses), in which case, the <code class="language-plaintext highlighter-rouge">__init__</code> method will not be called. A common use case for this is to entirely disable the behaviors of a class:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">ProgressBar</span><span class="p">:</span> <span class="c1"># wrap around an iterable to print a progress bar to terminal
</span> <span class="k">def</span> <span class="nf">__new__</span><span class="p">(</span><span class="n">cls</span><span class="p">,</span> <span class="n">iterable</span><span class="p">,</span> <span class="n">enable</span><span class="o">=</span><span class="bp">True</span><span class="p">):</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">enable</span><span class="p">:</span>
<span class="k">return</span> <span class="n">iterable</span> <span class="c1"># progress bar disabled; don't wrap the iterable
</span> <span class="k">return</span> <span class="nb">super</span><span class="p">().</span><span class="n">__new__</span><span class="p">(</span><span class="n">cls</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">iterable</span><span class="p">,</span> <span class="n">enable</span><span class="o">=</span><span class="bp">True</span><span class="p">):</span>
        <span class="c1"># here `enable` is always `True`; otherwise `__init__` is never called
</span>        <span class="bp">self</span><span class="p">.</span><span class="n">iterable</span> <span class="o">=</span> <span class="n">iterable</span>
</code></pre></div></div>
<h4 id="metaclass">Metaclass</h4>
<p>The <code class="language-plaintext highlighter-rouge">type</code> built-in, when called with a single argument, returns the type of an object, <em>e.g.</em>,</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">type</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span> <span class="c1"># int
</span><span class="nb">type</span><span class="p">(</span><span class="mf">3.14</span><span class="p">)</span> <span class="c1"># float
</span><span class="nb">type</span><span class="p">(</span><span class="s">"wow"</span><span class="p">)</span> <span class="c1"># str
</span><span class="nb">type</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">])</span> <span class="c1"># list
</span><span class="nb">type</span><span class="p">(</span><span class="n">MissileWarningSystem</span><span class="p">(</span><span class="n">test_run</span><span class="o">=</span><span class="bp">False</span><span class="p">))</span> <span class="c1"># <class 'MissileWarningSystem'>
</span></code></pre></div></div>
<p>But what is the type of a class? Turns out, the type of a class is what we call a <strong>metaclass</strong>, and the default metaclass (and the base for all metaclasses) is <code class="language-plaintext highlighter-rouge">type</code> itself. This reveals a new level of hierarchy<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> to us:</p>
<ul>
<li>An instance is an instance of a class. The base for all classes is <code class="language-plaintext highlighter-rouge">object</code>.</li>
<li>A class is an instance of a metaclass. The base for all metaclasses is <code class="language-plaintext highlighter-rouge">type</code>.</li>
</ul>
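<p>Concretely, both levels of the hierarchy can be observed directly in the interpreter:</p>

```python
class Meta(type):
    pass

class Widget(metaclass=Meta):
    pass

w = Widget()
print(type(w) is Widget)     # True: an instance's type is its class
print(type(Widget) is Meta)  # True: a class's type is its metaclass
print(type(Meta) is type)    # True: a metaclass is an instance of type
print(type(type) is type)    # True: type is its own type
```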
<p>Just as classes control the behavior of instances, metaclasses control the behavior of classes. When a class is created, the metaclass’ <code class="language-plaintext highlighter-rouge">__new__</code> method is called, and then its <code class="language-plaintext highlighter-rouge">__init__</code> method. What’s different from classes is that you don’t get to customize the arguments received; they always look like this:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Metaclass</span><span class="p">(</span><span class="nb">type</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">__new__</span><span class="p">(</span><span class="n">mcs</span><span class="p">,</span> <span class="n">typename</span><span class="p">,</span> <span class="n">bases</span><span class="p">,</span> <span class="n">namespace</span><span class="p">):</span> <span class="p">...</span>
</code></pre></div></div>
<ul>
<li><code class="language-plaintext highlighter-rouge">mcs</code> is the metaclass instance, in this case, <code class="language-plaintext highlighter-rouge">Metaclass</code> or its potential sub-metaclasses (yes, inheritance works here).</li>
<li><code class="language-plaintext highlighter-rouge">typename</code> is a <code class="language-plaintext highlighter-rouge">str</code> storing the name of the class to create.</li>
<li><code class="language-plaintext highlighter-rouge">bases</code> is a tuple of classes, containing the base classes of the class to create. This is what’s in the brackets following the class name on the first line.</li>
<li><code class="language-plaintext highlighter-rouge">namespace</code> contains all the class-level attributes, including methods and class attributes.</li>
</ul>
<p>Since <code class="language-plaintext highlighter-rouge">type</code> is the default metaclass, we can use the same set of arguments with the <code class="language-plaintext highlighter-rouge">type</code> constructor to programmatically create a new class:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">MyClass</span> <span class="o">=</span> <span class="nb">type</span><span class="p">(</span><span class="s">"MyClass"</span><span class="p">,</span> <span class="p">(</span><span class="nb">object</span><span class="p">,),</span> <span class="p">{</span>
<span class="s">"__init__"</span><span class="p">:</span> <span class="k">lambda</span> <span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">:</span> <span class="nb">setattr</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="s">'x'</span><span class="p">,</span> <span class="n">x</span><span class="p">),</span>
<span class="s">"foo"</span><span class="p">:</span> <span class="k">lambda</span> <span class="bp">self</span><span class="p">:</span> <span class="k">print</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">x</span><span class="p">),</span>
<span class="p">})</span>
</code></pre></div></div>
<p>which is equivalent to the canonical class definition syntax:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">MyClass</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">):</span>
<span class="bp">self</span><span class="p">.</span><span class="n">x</span> <span class="o">=</span> <span class="n">x</span>
<span class="k">def</span> <span class="nf">foo</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">print</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">x</span><span class="p">)</span>
</code></pre></div></div>
<h2 id="the-namedtuple-class">The <code class="language-plaintext highlighter-rouge">NamedTuple</code> Class</h2>
<p>Now that we’re equipped with the adequate knowledge, the first thing to do is look at how <code class="language-plaintext highlighter-rouge">NamedTuple</code> is implemented:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">_make_nmtuple</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">types</span><span class="p">):</span>
<span class="n">msg</span> <span class="o">=</span> <span class="s">"NamedTuple('Name', [(f0, t0), (f1, t1), ...]); each t must be a type"</span>
<span class="n">types</span> <span class="o">=</span> <span class="p">[(</span><span class="n">n</span><span class="p">,</span> <span class="n">_type_check</span><span class="p">(</span><span class="n">t</span><span class="p">,</span> <span class="n">msg</span><span class="p">))</span> <span class="k">for</span> <span class="n">n</span><span class="p">,</span> <span class="n">t</span> <span class="ow">in</span> <span class="n">types</span><span class="p">]</span>
<span class="n">nm_tpl</span> <span class="o">=</span> <span class="n">collections</span><span class="p">.</span><span class="n">namedtuple</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="p">[</span><span class="n">n</span> <span class="k">for</span> <span class="n">n</span><span class="p">,</span> <span class="n">t</span> <span class="ow">in</span> <span class="n">types</span><span class="p">])</span>
<span class="c1"># Prior to PEP 526, only _field_types attribute was assigned.
</span> <span class="c1"># Now, both __annotations__ and _field_types are used to maintain compatibility.
</span> <span class="n">nm_tpl</span><span class="p">.</span><span class="n">__annotations__</span> <span class="o">=</span> <span class="n">nm_tpl</span><span class="p">.</span><span class="n">_field_types</span> <span class="o">=</span> <span class="n">collections</span><span class="p">.</span><span class="n">OrderedDict</span><span class="p">(</span><span class="n">types</span><span class="p">)</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">nm_tpl</span><span class="p">.</span><span class="n">__module__</span> <span class="o">=</span> <span class="n">sys</span><span class="p">.</span><span class="n">_getframe</span><span class="p">(</span><span class="mi">2</span><span class="p">).</span><span class="n">f_globals</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">'__name__'</span><span class="p">,</span> <span class="s">'__main__'</span><span class="p">)</span>
<span class="k">except</span> <span class="p">(</span><span class="nb">AttributeError</span><span class="p">,</span> <span class="nb">ValueError</span><span class="p">):</span>
<span class="k">pass</span>
<span class="k">return</span> <span class="n">nm_tpl</span>
<span class="k">class</span> <span class="nc">NamedTuple</span><span class="p">(</span><span class="n">metaclass</span><span class="o">=</span><span class="n">NamedTupleMeta</span><span class="p">):</span>
<span class="n">_root</span> <span class="o">=</span> <span class="bp">True</span>
<span class="k">def</span> <span class="nf">__new__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">typename</span><span class="p">,</span> <span class="n">fields</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
<span class="k">if</span> <span class="n">fields</span> <span class="ow">is</span> <span class="bp">None</span><span class="p">:</span>
<span class="n">fields</span> <span class="o">=</span> <span class="n">kwargs</span><span class="p">.</span><span class="n">items</span><span class="p">()</span>
<span class="k">elif</span> <span class="n">kwargs</span><span class="p">:</span>
<span class="k">raise</span> <span class="nb">TypeError</span><span class="p">(</span><span class="s">"Either list of fields or keywords"</span>
<span class="s">" can be provided to NamedTuple, not both"</span><span class="p">)</span>
<span class="k">return</span> <span class="n">_make_nmtuple</span><span class="p">(</span><span class="n">typename</span><span class="p">,</span> <span class="n">fields</span><span class="p">)</span>
</code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">__new__</code> method here is not of particular interest to us; it exists only to provide an interface similar to <code class="language-plaintext highlighter-rouge">namedtuple</code>. The <code class="language-plaintext highlighter-rouge">_make_nmtuple</code> function called from <code class="language-plaintext highlighter-rouge">__new__</code> is a utility function that internally constructs a <code class="language-plaintext highlighter-rouge">collections.namedtuple</code> and adds type annotations to it. Note that what’s returned from <code class="language-plaintext highlighter-rouge">__new__</code> is not an instance of <code class="language-plaintext highlighter-rouge">NamedTuple</code>; it is a newly created namedtuple class.</p>
<p>Notice that <code class="language-plaintext highlighter-rouge">NamedTuple</code> has a metaclass called <code class="language-plaintext highlighter-rouge">NamedTupleMeta</code>. The <code class="language-plaintext highlighter-rouge">_root</code> attribute here is important to the metaclass, and we’ll say more about it later.</p>
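<p>To make this concrete, here’s a quick sketch of the two ways to create a typed namedtuple, using only the public API (the class names are made up for illustration). The functional form goes through the <code class="language-plaintext highlighter-rouge">__new__</code> method above, while the class form goes through the metaclass machinery we examine next:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from typing import NamedTuple

# Functional form: handled by NamedTuple.__new__ via _make_nmtuple.
Point = NamedTuple('Point', [('x', int), ('y', int)])

# Class form: handled by the NamedTupleMeta metaclass.
class TypedPoint(NamedTuple):
    x: int
    y: int

p = Point(1, 2)
print(p.x, tuple(p))                      # behaves like a plain tuple
print(Point(1, 2) == TypedPoint(1, 2))    # both are just tuples underneath
</code></pre></div></div>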
<h2 id="the-namedtuplemeta-metaclass">The <code class="language-plaintext highlighter-rouge">NamedTupleMeta</code> Metaclass</h2>
<p>Now let’s take a look at the metaclass code:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">NamedTupleMeta</span><span class="p">(</span><span class="nb">type</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">__new__</span><span class="p">(</span><span class="n">cls</span><span class="p">,</span> <span class="n">typename</span><span class="p">,</span> <span class="n">bases</span><span class="p">,</span> <span class="n">ns</span><span class="p">):</span>
<span class="k">if</span> <span class="n">ns</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">'_root'</span><span class="p">,</span> <span class="bp">False</span><span class="p">):</span>
<span class="k">return</span> <span class="nb">super</span><span class="p">().</span><span class="n">__new__</span><span class="p">(</span><span class="n">cls</span><span class="p">,</span> <span class="n">typename</span><span class="p">,</span> <span class="n">bases</span><span class="p">,</span> <span class="n">ns</span><span class="p">)</span>
<span class="n">types</span> <span class="o">=</span> <span class="n">ns</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">'__annotations__'</span><span class="p">,</span> <span class="p">{})</span>
<span class="n">nm_tpl</span> <span class="o">=</span> <span class="n">_make_nmtuple</span><span class="p">(</span><span class="n">typename</span><span class="p">,</span> <span class="n">types</span><span class="p">.</span><span class="n">items</span><span class="p">())</span>
<span class="n">defaults</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">defaults_dict</span> <span class="o">=</span> <span class="p">{}</span>
<span class="k">for</span> <span class="n">field_name</span> <span class="ow">in</span> <span class="n">types</span><span class="p">:</span>
<span class="k">if</span> <span class="n">field_name</span> <span class="ow">in</span> <span class="n">ns</span><span class="p">:</span>
<span class="n">default_value</span> <span class="o">=</span> <span class="n">ns</span><span class="p">[</span><span class="n">field_name</span><span class="p">]</span>
<span class="n">defaults</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">default_value</span><span class="p">)</span>
<span class="n">defaults_dict</span><span class="p">[</span><span class="n">field_name</span><span class="p">]</span> <span class="o">=</span> <span class="n">default_value</span>
<span class="k">elif</span> <span class="n">defaults</span><span class="p">:</span>
<span class="k">raise</span> <span class="nb">TypeError</span><span class="p">(</span><span class="s">"Non-default namedtuple field {field_name} cannot "</span>
<span class="s">"follow default field(s) {default_names}"</span>
<span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="n">field_name</span><span class="o">=</span><span class="n">field_name</span><span class="p">,</span>
<span class="n">default_names</span><span class="o">=</span><span class="s">', '</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">defaults_dict</span><span class="p">.</span><span class="n">keys</span><span class="p">())))</span>
<span class="n">nm_tpl</span><span class="p">.</span><span class="n">__new__</span><span class="p">.</span><span class="n">__annotations__</span> <span class="o">=</span> <span class="n">collections</span><span class="p">.</span><span class="n">OrderedDict</span><span class="p">(</span><span class="n">types</span><span class="p">)</span>
<span class="n">nm_tpl</span><span class="p">.</span><span class="n">__new__</span><span class="p">.</span><span class="n">__defaults__</span> <span class="o">=</span> <span class="nb">tuple</span><span class="p">(</span><span class="n">defaults</span><span class="p">)</span>
<span class="n">nm_tpl</span><span class="p">.</span><span class="n">_field_defaults</span> <span class="o">=</span> <span class="n">defaults_dict</span>
<span class="c1"># update from user namespace without overriding special namedtuple attributes
</span> <span class="k">for</span> <span class="n">key</span> <span class="ow">in</span> <span class="n">ns</span><span class="p">:</span>
<span class="k">if</span> <span class="n">key</span> <span class="ow">in</span> <span class="n">_prohibited</span><span class="p">:</span>
<span class="k">raise</span> <span class="nb">AttributeError</span><span class="p">(</span><span class="s">"Cannot overwrite NamedTuple attribute "</span> <span class="o">+</span> <span class="n">key</span><span class="p">)</span>
<span class="k">elif</span> <span class="n">key</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">_special</span> <span class="ow">and</span> <span class="n">key</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">nm_tpl</span><span class="p">.</span><span class="n">_fields</span><span class="p">:</span>
<span class="nb">setattr</span><span class="p">(</span><span class="n">nm_tpl</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">ns</span><span class="p">[</span><span class="n">key</span><span class="p">])</span>
<span class="k">return</span> <span class="n">nm_tpl</span>
</code></pre></div></div>
<p>Now we know why there’s a <code class="language-plaintext highlighter-rouge">_root</code> attribute in <code class="language-plaintext highlighter-rouge">NamedTuple</code>. The <code class="language-plaintext highlighter-rouge">__new__</code> method of <code class="language-plaintext highlighter-rouge">NamedTupleMeta</code> is also called when <code class="language-plaintext highlighter-rouge">NamedTuple</code> itself is created, but we can’t create a <code class="language-plaintext highlighter-rouge">collections.namedtuple</code> for that. Thus, we check whether this special <code class="language-plaintext highlighter-rouge">_root</code> attribute exists, and skip the rest of the procedure if it does.</p>
<p>When a subclass of <code class="language-plaintext highlighter-rouge">NamedTuple</code> is created, the <code class="language-plaintext highlighter-rouge">__new__</code> method is called again, and this time the rest of the procedure executes. A couple of things happen:</p>
<ul>
<li>Obtain the list of fields in the namedtuple definition. Since we provide an annotation for each field, they’re stored as a dictionary in the <code class="language-plaintext highlighter-rouge">__annotations__</code> special attribute of the class.</li>
<li>Create a namedtuple class using <code class="language-plaintext highlighter-rouge">_make_nmtuple</code>. Note that the returned namedtuple class does not support default values<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup> or contain type annotations for its <code class="language-plaintext highlighter-rouge">__new__</code> method.</li>
<li>Gather default values from <code class="language-plaintext highlighter-rouge">ns</code> (namespace) and set annotations and default argument values for the <code class="language-plaintext highlighter-rouge">__new__</code> method of the namedtuple class.</li>
<li>Add other attributes and methods to the created namedtuple class, so additional methods you defined in the <code class="language-plaintext highlighter-rouge">NamedTuple</code> subclass can also be called from the returned namedtuple class.</li>
</ul>
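<p>These effects are visible from the outside. The following snippet (a small illustration with a made-up <code class="language-plaintext highlighter-rouge">Config</code> class) verifies that the class returned by the metaclass is a genuine namedtuple with annotations and default values attached:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from typing import NamedTuple

class Config(NamedTuple):
    name: str
    count: int = 0   # default value, gathered from the class namespace

# The metaclass returned a real namedtuple class, not a plain class:
assert issubclass(Config, tuple)
assert Config._fields == ('name', 'count')
assert Config._field_defaults == {'count': 0}
assert list(Config.__annotations__) == ['name', 'count']
assert Config('x') == ('x', 0)   # the default is applied by __new__
</code></pre></div></div>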
<h2 id="inheritance-with-a-single-base-class">Inheritance with a Single Base Class</h2>
<p>Let’s first think about what we’re trying to accomplish by inheritance:</p>
<ul>
<li>Automatically generate a constructor that sets all fields, including those from the base class.</li>
<li>Access methods, attributes, and properties from the base class.</li>
<li>Behave correctly in <code class="language-plaintext highlighter-rouge">isinstance</code> and <code class="language-plaintext highlighter-rouge">issubclass</code> checks.</li>
</ul>
<p>If we don’t care about the latter two, the solution is pretty straightforward: we just gather the fields defined in the derived and base classes, and ask <code class="language-plaintext highlighter-rouge">NamedTupleMeta</code> to create a <code class="language-plaintext highlighter-rouge">NamedTuple</code> based on these fields.</p>
<p>Let’s make a first attempt at implementing this. Out of personal preference, I’m going to call our enhanced namedtuple <code class="language-plaintext highlighter-rouge">Options</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">OptionsMeta</span><span class="p">(</span><span class="n">typing</span><span class="p">.</span><span class="n">NamedTupleMeta</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">__new__</span><span class="p">(</span><span class="n">mcs</span><span class="p">,</span> <span class="n">typename</span><span class="p">,</span> <span class="n">bases</span><span class="p">,</span> <span class="n">namespace</span><span class="p">):</span>
<span class="k">if</span> <span class="n">namespace</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">'_root'</span><span class="p">,</span> <span class="bp">False</span><span class="p">):</span>
<span class="c1"># The created class is `Options`, skip.
</span> <span class="k">return</span> <span class="nb">super</span><span class="p">().</span><span class="n">__new__</span><span class="p">(</span><span class="n">mcs</span><span class="p">,</span> <span class="n">typename</span><span class="p">,</span> <span class="n">bases</span><span class="p">,</span> <span class="n">namespace</span><span class="p">)</span>
<span class="c1"># Gather fields from annotations of current class and base class.
</span> <span class="n">fields</span> <span class="o">=</span> <span class="n">collections</span><span class="p">.</span><span class="n">OrderedDict</span><span class="p">()</span>
<span class="n">cur_fields</span> <span class="o">=</span> <span class="n">namespace</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">'__annotations__'</span><span class="p">,</span> <span class="p">{})</span>
<span class="c1"># We only deal with single inheritance for now.
</span> <span class="k">assert</span> <span class="nb">len</span><span class="p">(</span><span class="n">bases</span><span class="p">)</span> <span class="o">==</span> <span class="mi">1</span>
<span class="n">base</span> <span class="o">=</span> <span class="n">bases</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="k">if</span> <span class="nb">hasattr</span><span class="p">(</span><span class="n">base</span><span class="p">,</span> <span class="s">'_fields'</span><span class="p">):</span>
<span class="c1"># Base class is a concrete namedtuple.
</span> <span class="k">for</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">base</span><span class="p">.</span><span class="n">_fields</span><span class="p">:</span>
<span class="c1"># Make sure not to overwrite redefined fields.
</span> <span class="k">if</span> <span class="n">name</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">cur_fields</span><span class="p">:</span>
<span class="n">fields</span><span class="p">[</span><span class="n">name</span><span class="p">]</span> <span class="o">=</span> <span class="n">base</span><span class="p">.</span><span class="n">__annotations__</span><span class="p">[</span><span class="n">name</span><span class="p">]</span>
<span class="k">if</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">base</span><span class="p">.</span><span class="n">_field_defaults</span><span class="p">:</span>
<span class="n">namespace</span><span class="p">.</span><span class="n">setdefault</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">base</span><span class="p">.</span><span class="n">_field_defaults</span><span class="p">[</span><span class="n">name</span><span class="p">])</span>
<span class="n">fields</span><span class="p">.</span><span class="n">update</span><span class="p">(</span><span class="n">cur_fields</span><span class="p">)</span>
<span class="n">namespace</span><span class="p">[</span><span class="s">'__annotations__'</span><span class="p">]</span> <span class="o">=</span> <span class="n">fields</span>
<span class="c1"># Let `NamedTupleMeta` create an annotated `namedtuple` for us.
</span> <span class="c1"># Note that `bases` is not used there so we just set it to `None`.
</span> <span class="n">nm_tpl</span> <span class="o">=</span> <span class="nb">super</span><span class="p">().</span><span class="n">__new__</span><span class="p">(</span><span class="n">mcs</span><span class="p">,</span> <span class="n">typename</span><span class="p">,</span> <span class="bp">None</span><span class="p">,</span> <span class="n">namespace</span><span class="p">)</span>
<span class="k">return</span> <span class="n">nm_tpl</span>
<span class="k">class</span> <span class="nc">Options</span><span class="p">(</span><span class="n">metaclass</span><span class="o">=</span><span class="n">OptionsMeta</span><span class="p">):</span>
<span class="n">_root</span> <span class="o">=</span> <span class="bp">True</span>
<span class="k">def</span> <span class="nf">__new__</span><span class="p">(</span><span class="n">cls</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
<span class="k">if</span> <span class="n">cls</span> <span class="ow">is</span> <span class="n">Options</span><span class="p">:</span>
<span class="c1"># Prevent instantiation of `Options` class.
</span> <span class="k">raise</span> <span class="nb">TypeError</span><span class="p">(</span><span class="s">"Type Options cannot be instantiated; "</span>
<span class="s">"it can be used only as a base class"</span><span class="p">)</span>
<span class="k">return</span> <span class="nb">super</span><span class="p">().</span><span class="n">__new__</span><span class="p">(</span><span class="n">cls</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
</code></pre></div></div>
<p>A few things to notice here:</p>
<ul>
<li>We define a new metaclass that inherits from <code class="language-plaintext highlighter-rouge">NamedTupleMeta</code> so we can call its <code class="language-plaintext highlighter-rouge">__new__</code> method, which takes care of everything for us. The <code class="language-plaintext highlighter-rouge">Options</code> class doesn’t really do anything, and for simplicity, we forbid instantiating it directly, something <code class="language-plaintext highlighter-rouge">NamedTuple</code> allows through its functional syntax.</li>
<li><code class="language-plaintext highlighter-rouge">__annotations__</code> must be an <code class="language-plaintext highlighter-rouge">OrderedDict</code> because the ordering of fields matters: the order determines the index of each field in the underlying tuple object. Here we put base class fields in front of derived ones, but leave out the ones that are redefined.</li>
<li>A limitation of this method is that the base class cannot contain fields with default values, unless: <em>a)</em> they’re redefined in the derived class, or <em>b)</em> every field in the derived class also comes with a default value.</li>
</ul>
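<p>The root cause of this limitation is the namedtuple rule that fields with defaults must come after all fields without defaults; since we put base class fields first, an inherited default value can end up in front of a new non-default field. The restriction is easy to reproduce with <code class="language-plaintext highlighter-rouge">NamedTuple</code> itself (a minimal illustration; the class name is made up):</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from typing import NamedTuple

error_message = ''
try:
    # A non-default field after a default one: exactly the situation our
    # field ordering can produce when the base class has default values.
    class Bad(NamedTuple):
        b: int = 2
        c: int  # no default; rejected at class creation time
except TypeError as e:
    error_message = str(e)

print(error_message)
</code></pre></div></div>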
<p>If you’ve followed what we’ve learned so far, the implementation is actually pretty straightforward. However, we run into problems when we try to use it in practice:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">In</span> <span class="p">[</span><span class="mi">1</span><span class="p">]:</span> <span class="k">class</span> <span class="nc">BaseOptions</span><span class="p">(</span><span class="n">Options</span><span class="p">):</span>
<span class="p">...:</span> <span class="n">a</span><span class="p">:</span> <span class="nb">int</span>
<span class="p">...:</span> <span class="n">b</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">2</span>
<span class="n">In</span> <span class="p">[</span><span class="mi">2</span><span class="p">]:</span> <span class="k">class</span> <span class="nc">DerivedOptions</span><span class="p">(</span><span class="n">BaseOptions</span><span class="p">):</span>
<span class="p">...:</span> <span class="n">b</span><span class="p">:</span> <span class="nb">float</span> <span class="o">=</span> <span class="mf">0.5</span>
<span class="p">...:</span> <span class="n">c</span><span class="p">:</span> <span class="nb">float</span> <span class="o">=</span> <span class="mf">1.0</span>
<span class="n">In</span> <span class="p">[</span><span class="mi">3</span><span class="p">]:</span> <span class="n">BaseOptions</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="n">Out</span><span class="p">[</span><span class="mi">3</span><span class="p">]:</span> <span class="n">BaseOptions</span><span class="p">(</span><span class="n">a</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">b</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
<span class="n">In</span> <span class="p">[</span><span class="mi">4</span><span class="p">]:</span> <span class="n">DerivedOptions</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
<span class="n">Out</span><span class="p">[</span><span class="mi">4</span><span class="p">]:</span> <span class="n">BaseOptions</span><span class="p">(</span><span class="n">a</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">b</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
<span class="n">In</span> <span class="p">[</span><span class="mi">5</span><span class="p">]:</span> <span class="n">DerivedOptions</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">,</span> <span class="mf">0.4</span><span class="p">)</span>
<span class="o">---------------------------------------------------------------------------</span>
<span class="nb">TypeError</span> <span class="n">Traceback</span> <span class="p">(</span><span class="n">most</span> <span class="n">recent</span> <span class="n">call</span> <span class="n">last</span><span class="p">)</span>
<span class="o"><</span><span class="n">ipython</span><span class="o">-</span><span class="nb">input</span><span class="o">-</span><span class="mi">5</span><span class="o">-</span><span class="n">f4db6b51352e</span><span class="o">></span> <span class="ow">in</span> <span class="o"><</span><span class="n">module</span><span class="o">></span>
<span class="o">----></span> <span class="mi">1</span> <span class="n">DerivedOptions</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">,</span> <span class="mf">0.4</span><span class="p">)</span>
<span class="nb">TypeError</span><span class="p">:</span> <span class="n">__new__</span><span class="p">()</span> <span class="n">takes</span> <span class="k">from</span> <span class="mi">2</span> <span class="n">to</span> <span class="mi">3</span> <span class="n">positional</span> <span class="n">arguments</span> <span class="n">but</span> <span class="mi">4</span> <span class="n">were</span> <span class="n">given</span>
</code></pre></div></div>
<p>The error message may seem a bit cryptic, but what happens here is that <code class="language-plaintext highlighter-rouge">DerivedOptions</code> became an alias for <code class="language-plaintext highlighter-rouge">BaseOptions</code>. A deeper investigation shows that <code class="language-plaintext highlighter-rouge">OptionsMeta.__new__</code> is not even called when <code class="language-plaintext highlighter-rouge">DerivedOptions</code> is created. How come?</p>
<p>The truth is, the <code class="language-plaintext highlighter-rouge">nm_tpl</code> returned from <code class="language-plaintext highlighter-rouge">NamedTupleMeta.__new__</code> is a class created by <code class="language-plaintext highlighter-rouge">collections.namedtuple</code>, and its metaclass is the plain <code class="language-plaintext highlighter-rouge">type</code>, not <code class="language-plaintext highlighter-rouge">OptionsMeta</code>. When inheriting from the <code class="language-plaintext highlighter-rouge">nm_tpl</code> class, we’re actually inheriting from a namedtuple, not an <code class="language-plaintext highlighter-rouge">Options</code> subclass, so our metaclass never runs.</p>
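<p>We can check this claim directly (a small illustration): classes produced by <code class="language-plaintext highlighter-rouge">collections.namedtuple</code> have the plain <code class="language-plaintext highlighter-rouge">type</code> as their metaclass, so subclassing such a class never triggers a custom metaclass:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import collections

NT = collections.namedtuple('NT', ['a'])
assert type(NT) is type   # plain metaclass, not a custom one

class Sub(NT):
    pass

# Subclasses inherit the (plain) metaclass of their base:
assert type(Sub) is type
</code></pre></div></div>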
<p>Thus, we must create a new class using the namespace of <code class="language-plaintext highlighter-rouge">nm_tpl</code>, and we do so by directly invoking the <code class="language-plaintext highlighter-rouge">__new__</code> method of <code class="language-plaintext highlighter-rouge">type</code>, which is <code class="language-plaintext highlighter-rouge">NamedTupleMeta</code>’s superclass:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">return</span> <span class="nb">type</span><span class="p">.</span><span class="n">__new__</span><span class="p">(</span><span class="n">mcs</span><span class="p">,</span> <span class="n">typename</span><span class="p">,</span> <span class="n">bases</span> <span class="o">+</span> <span class="n">nm_tpl</span><span class="p">.</span><span class="n">__bases__</span><span class="p">,</span> <span class="n">nm_tpl</span><span class="p">.</span><span class="n">__dict__</span><span class="p">.</span><span class="n">copy</span><span class="p">())</span>
</code></pre></div></div>
<p>To explain this method call:</p>
<ul>
<li>
<p><code class="language-plaintext highlighter-rouge">type.__new__</code> will create a class with metaclass set to <code class="language-plaintext highlighter-rouge">mcs</code> (which is <code class="language-plaintext highlighter-rouge">OptionsMeta</code> in this case).</p>
</li>
<li>
<p>An added benefit here is that we get to set the base class of the created class, in this case, <code class="language-plaintext highlighter-rouge">BaseOptions</code> (from <code class="language-plaintext highlighter-rouge">bases</code>) and <code class="language-plaintext highlighter-rouge">tuple</code> (from <code class="language-plaintext highlighter-rouge">nm_tpl.__bases__</code>). Note that it’s essential to keep <code class="language-plaintext highlighter-rouge">tuple</code> a base class, because <code class="language-plaintext highlighter-rouge">tuple.__new__</code> is called when we create an instance of this namedtuple, and that requires the class to be a subclass of <code class="language-plaintext highlighter-rouge">tuple</code>. If we don’t do that, we get an exception:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">TypeError</span><span class="p">:</span> <span class="nb">tuple</span><span class="p">.</span><span class="n">__new__</span><span class="p">(</span><span class="n">DerivedOptions</span><span class="p">):</span> <span class="n">DerivedOptions</span> <span class="ow">is</span> <span class="ow">not</span> <span class="n">a</span> <span class="n">subtype</span> <span class="n">of</span> <span class="nb">tuple</span>
</code></pre></div> </div>
</li>
<li>
<p>The <code class="language-plaintext highlighter-rouge">__dict__</code> (namespace) of <code class="language-plaintext highlighter-rouge">nm_tpl</code> is used as is. We do a copy because <code class="language-plaintext highlighter-rouge">type.__new__</code> requires this namespace dictionary to be writable (of type <code class="language-plaintext highlighter-rouge">dict</code>), but <code class="language-plaintext highlighter-rouge">__dict__</code> is not (of type <code class="language-plaintext highlighter-rouge">mappingproxy</code>).</p>
</li>
</ul>
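<p>Here’s a tiny standalone demonstration of this trick (with made-up <code class="language-plaintext highlighter-rouge">Meta</code> and <code class="language-plaintext highlighter-rouge">Base</code> classes): calling <code class="language-plaintext highlighter-rouge">type.__new__</code> with an explicit metaclass, base list, and namespace gives us control over both the metaclass and the bases of the resulting class:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>class Meta(type):
    pass

class Base(metaclass=Meta):
    pass

# Create a class explicitly, the same way as the fixed metaclass above.
C = type.__new__(Meta, 'C', (Base, tuple), {'x': 1})

assert type(C) is Meta                   # the metaclass is propagated
assert issubclass(C, Base) and issubclass(C, tuple)
assert C.x == 1                          # namespace entries become attributes
</code></pre></div></div>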
<p>Since we were able to keep the actual base class (<code class="language-plaintext highlighter-rouge">BaseOptions</code>) in the MRO<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> of the derived class, Python automatically takes care of the latter two functionalities we wanted to accomplish by inheritance. We can easily verify this:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">In</span> <span class="p">[</span><span class="mi">1</span><span class="p">]:</span> <span class="k">class</span> <span class="nc">BaseOptions</span><span class="p">(</span><span class="n">Options</span><span class="p">):</span>
<span class="p">...:</span> <span class="n">a</span><span class="p">:</span> <span class="nb">int</span>
<span class="p">...:</span> <span class="o">@</span><span class="nb">property</span>
<span class="p">...:</span> <span class="k">def</span> <span class="nf">foo</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="p">...:</span> <span class="k">return</span> <span class="bp">self</span><span class="p">.</span><span class="n">a</span>
<span class="n">In</span> <span class="p">[</span><span class="mi">2</span><span class="p">]:</span> <span class="k">class</span> <span class="nc">DerivedOptions</span><span class="p">(</span><span class="n">BaseOptions</span><span class="p">):</span>
<span class="p">...:</span> <span class="n">b</span> <span class="p">:</span><span class="nb">int</span>
<span class="n">In</span> <span class="p">[</span><span class="mi">3</span><span class="p">]:</span> <span class="n">x</span> <span class="o">=</span> <span class="n">DerivedOptions</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
<span class="n">In</span> <span class="p">[</span><span class="mi">4</span><span class="p">]:</span> <span class="n">x</span><span class="p">.</span><span class="n">foo</span>
<span class="n">Out</span><span class="p">[</span><span class="mi">4</span><span class="p">]:</span> <span class="mi">1</span>
<span class="n">In</span> <span class="p">[</span><span class="mi">5</span><span class="p">]:</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">BaseOptions</span><span class="p">)</span>
<span class="n">Out</span><span class="p">[</span><span class="mi">5</span><span class="p">]:</span> <span class="bp">True</span>
</code></pre></div></div>
<h2 id="multiple-inheritance">Multiple Inheritance</h2>
<p>The method above also extends to multiple inheritance: we just need to gather fields from all the base classes. However, multiple bases bring problems that did not exist in the single-inheritance case:</p>
<ul>
<li>What if multiple base classes define the same field? Since we’re exploring uncharted waters here, we get to define the behavior, but it has to be intuitive. My opinion is that base classes must not have overlapping fields, unless they’re redefined in the derived class. This guarantees that fields aren’t unexpectedly overwritten depending on the ordering of the base classes. But of course, if you implement it, you’re free to choose whatever strategy pleases you.</li>
<li>What if a base class is not a subclass of <code class="language-plaintext highlighter-rouge">Options</code>? We should still keep it in <code class="language-plaintext highlighter-rouge">bases</code> so it stays in the MRO<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>, and instances can access its methods.</li>
</ul>
<p>Now, let’s try implementing this <code class="language-plaintext highlighter-rouge">OptionsMeta</code> metaclass that supports multiple inheritance:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">OptionsMeta</span><span class="p">(</span><span class="n">typing</span><span class="p">.</span><span class="n">NamedTupleMeta</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">__new__</span><span class="p">(</span><span class="n">mcs</span><span class="p">,</span> <span class="n">typename</span><span class="p">,</span> <span class="n">bases</span><span class="p">,</span> <span class="n">namespace</span><span class="p">):</span>
<span class="k">if</span> <span class="n">namespace</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">'_root'</span><span class="p">,</span> <span class="bp">False</span><span class="p">):</span>
<span class="c1"># The created class is `Options`, skip.
</span> <span class="k">return</span> <span class="nb">super</span><span class="p">().</span><span class="n">__new__</span><span class="p">(</span><span class="n">mcs</span><span class="p">,</span> <span class="n">typename</span><span class="p">,</span> <span class="n">bases</span><span class="p">,</span> <span class="n">namespace</span><span class="p">)</span>
<span class="c1"># Gather fields from annotations of current class and base classes.
</span> <span class="n">cur_fields</span> <span class="o">=</span> <span class="n">namespace</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">'__annotations__'</span><span class="p">,</span> <span class="p">{})</span>
<span class="n">fields</span> <span class="o">=</span> <span class="n">collections</span><span class="p">.</span><span class="n">OrderedDict</span><span class="p">()</span>
<span class="n">field_sources</span> <span class="o">=</span> <span class="p">{}</span> <span class="c1"># which base class each name came from
</span> <span class="n">field_defaults</span> <span class="o">=</span> <span class="p">{}</span>
<span class="k">for</span> <span class="n">base</span> <span class="ow">in</span> <span class="n">bases</span><span class="p">:</span>
<span class="k">if</span> <span class="nb">issubclass</span><span class="p">(</span><span class="n">base</span><span class="p">,</span> <span class="n">Options</span><span class="p">)</span> <span class="ow">and</span> <span class="nb">hasattr</span><span class="p">(</span><span class="n">base</span><span class="p">,</span> <span class="s">'_fields'</span><span class="p">):</span>
<span class="c1"># Base class is a concrete subclass of `Options`.
</span> <span class="k">for</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">base</span><span class="p">.</span><span class="n">_fields</span><span class="p">:</span>
<span class="k">if</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">cur_fields</span><span class="p">:</span>
<span class="c1"># Make sure not to overwrite redefined fields.
</span> <span class="k">continue</span>
<span class="k">if</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">fields</span><span class="p">:</span>
<span class="c1"># Overlapping field that is not redefined.
</span> <span class="k">raise</span> <span class="nb">TypeError</span><span class="p">(</span>
<span class="sa">f</span><span class="s">"Base class </span><span class="si">{</span><span class="n">base</span><span class="si">}</span><span class="s"> contains field </span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s">, which "</span>
<span class="sa">f</span><span class="s">"is also defined in base class "</span>
<span class="sa">f</span><span class="s">"</span><span class="si">{</span><span class="n">field_sources</span><span class="p">[</span><span class="n">name</span><span class="p">]</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
<span class="n">fields</span><span class="p">[</span><span class="n">name</span><span class="p">]</span> <span class="o">=</span> <span class="n">base</span><span class="p">.</span><span class="n">__annotations__</span><span class="p">[</span><span class="n">name</span><span class="p">]</span>
<span class="n">field_sources</span><span class="p">[</span><span class="n">name</span><span class="p">]</span> <span class="o">=</span> <span class="n">base</span>
<span class="k">if</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">base</span><span class="p">.</span><span class="n">_field_defaults</span><span class="p">:</span>
<span class="n">field_defaults</span><span class="p">[</span><span class="n">name</span><span class="p">]</span> <span class="o">=</span> <span class="n">base</span><span class="p">.</span><span class="n">_field_defaults</span><span class="p">[</span><span class="n">name</span><span class="p">]</span>
<span class="n">fields</span><span class="p">.</span><span class="n">update</span><span class="p">(</span><span class="n">cur_fields</span><span class="p">)</span>
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">fields</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
<span class="k">raise</span> <span class="nb">ValueError</span><span class="p">(</span><span class="s">"Options class must contain at least one field"</span><span class="p">)</span>
<span class="k">for</span> <span class="n">name</span><span class="p">,</span> <span class="n">value</span> <span class="ow">in</span> <span class="n">field_defaults</span><span class="p">.</span><span class="n">items</span><span class="p">():</span>
<span class="n">namespace</span><span class="p">.</span><span class="n">setdefault</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span>
<span class="n">namespace</span><span class="p">[</span><span class="s">'__annotations__'</span><span class="p">]</span> <span class="o">=</span> <span class="n">fields</span>
<span class="c1"># Let `NamedTupleMeta` create an annotated `namedtuple` for us.
</span> <span class="c1"># Note that `bases` is not used here so we just set it to `None`.
</span> <span class="n">nm_tpl</span> <span class="o">=</span> <span class="nb">super</span><span class="p">().</span><span class="n">__new__</span><span class="p">(</span><span class="n">mcs</span><span class="p">,</span> <span class="n">typename</span><span class="p">,</span> <span class="bp">None</span><span class="p">,</span> <span class="n">namespace</span><span class="p">)</span>
<span class="c1"># Wrap the return type in `OptionsMeta` so it can be subclassed.
</span> <span class="c1"># Also keep base classes of the `namedtuple` (i.e., the `tuple` class),
</span> <span class="c1"># so we can call `tuple.__new__`.
</span> <span class="n">bases</span> <span class="o">=</span> <span class="n">bases</span> <span class="o">+</span> <span class="n">nm_tpl</span><span class="p">.</span><span class="n">__bases__</span>
<span class="k">return</span> <span class="nb">type</span><span class="p">.</span><span class="n">__new__</span><span class="p">(</span><span class="n">mcs</span><span class="p">,</span> <span class="n">typename</span><span class="p">,</span> <span class="n">bases</span><span class="p">,</span> <span class="n">nm_tpl</span><span class="p">.</span><span class="n">__dict__</span><span class="p">.</span><span class="n">copy</span><span class="p">())</span>
</code></pre></div></div>
<p>This works great when we inherit from non-<code class="language-plaintext highlighter-rouge">Options</code> classes, as we can see from these examples:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">In</span> <span class="p">[</span><span class="mi">1</span><span class="p">]:</span> <span class="k">class</span> <span class="nc">BaseOptions</span><span class="p">(</span><span class="n">Options</span><span class="p">):</span>
<span class="p">...:</span> <span class="n">a</span><span class="p">:</span> <span class="nb">int</span>
<span class="p">...:</span> <span class="o">@</span><span class="nb">property</span>
<span class="p">...:</span> <span class="k">def</span> <span class="nf">foo</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="p">...:</span> <span class="k">return</span> <span class="bp">self</span><span class="p">.</span><span class="n">a</span>
<span class="n">In</span> <span class="p">[</span><span class="mi">2</span><span class="p">]:</span> <span class="k">class</span> <span class="nc">Mixin</span><span class="p">:</span>
<span class="p">...:</span> <span class="k">def</span> <span class="nf">bar</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="p">...:</span> <span class="k">return</span> <span class="bp">self</span><span class="p">.</span><span class="n">a</span> <span class="o">+</span> <span class="bp">self</span><span class="p">.</span><span class="n">b</span>
<span class="n">In</span> <span class="p">[</span><span class="mi">3</span><span class="p">]:</span> <span class="k">class</span> <span class="nc">DerivedOptions</span><span class="p">(</span><span class="n">BaseOptions</span><span class="p">,</span> <span class="n">Mixin</span><span class="p">):</span>
<span class="p">...:</span> <span class="n">b</span><span class="p">:</span> <span class="nb">int</span>
<span class="n">In</span> <span class="p">[</span><span class="mi">4</span><span class="p">]:</span> <span class="n">x</span> <span class="o">=</span> <span class="n">DerivedOptions</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
<span class="n">In</span> <span class="p">[</span><span class="mi">5</span><span class="p">]:</span> <span class="n">x</span><span class="p">.</span><span class="n">foo</span>
<span class="n">Out</span><span class="p">[</span><span class="mi">5</span><span class="p">]:</span> <span class="mi">1</span>
<span class="n">In</span> <span class="p">[</span><span class="mi">6</span><span class="p">]:</span> <span class="n">x</span><span class="p">.</span><span class="n">bar</span><span class="p">()</span>
<span class="n">Out</span><span class="p">[</span><span class="mi">6</span><span class="p">]:</span> <span class="mi">3</span>
<span class="n">In</span> <span class="p">[</span><span class="mi">7</span><span class="p">]:</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">BaseOptions</span><span class="p">)</span>
<span class="n">Out</span><span class="p">[</span><span class="mi">7</span><span class="p">]:</span> <span class="bp">True</span>
<span class="n">In</span> <span class="p">[</span><span class="mi">8</span><span class="p">]:</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">Mixin</span><span class="p">)</span>
<span class="n">Out</span><span class="p">[</span><span class="mi">8</span><span class="p">]:</span> <span class="bp">True</span>
</code></pre></div></div>
<p>But when we try to inherit from two <code class="language-plaintext highlighter-rouge">Options</code> subclasses, something weird happens:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">In</span> <span class="p">[</span><span class="mi">1</span><span class="p">]:</span> <span class="k">class</span> <span class="nc">OptionsA</span><span class="p">(</span><span class="n">Options</span><span class="p">):</span>
<span class="p">...:</span> <span class="n">a</span><span class="p">:</span> <span class="nb">int</span>
<span class="p">...:</span> <span class="n">b</span><span class="p">:</span> <span class="nb">int</span>
<span class="n">In</span> <span class="p">[</span><span class="mi">2</span><span class="p">]:</span> <span class="k">class</span> <span class="nc">OptionsB</span><span class="p">(</span><span class="n">Options</span><span class="p">):</span>
<span class="p">...:</span> <span class="n">c</span><span class="p">:</span> <span class="nb">int</span>
<span class="p">...:</span> <span class="n">d</span><span class="p">:</span> <span class="nb">int</span>
<span class="n">In</span> <span class="p">[</span><span class="mi">3</span><span class="p">]:</span> <span class="k">class</span> <span class="nc">MergedOptions</span><span class="p">(</span><span class="n">OptionsA</span><span class="p">,</span> <span class="n">OptionsB</span><span class="p">):</span>
<span class="p">...:</span> <span class="k">pass</span>
<span class="o">---------------------------------------------------------------------------</span>
<span class="nb">TypeError</span> <span class="n">Traceback</span> <span class="p">(</span><span class="n">most</span> <span class="n">recent</span> <span class="n">call</span> <span class="n">last</span><span class="p">)</span>
<span class="o"><</span><span class="n">ipython</span><span class="o">-</span><span class="nb">input</span><span class="o">-</span><span class="mi">3</span><span class="o">-</span><span class="mi">51</span><span class="n">d384fffb01</span><span class="o">></span> <span class="ow">in</span> <span class="o"><</span><span class="n">module</span><span class="o">></span>
<span class="o">----></span> <span class="mi">1</span> <span class="k">class</span> <span class="nc">MergedOptions</span><span class="p">(</span><span class="n">OptionsA</span><span class="p">,</span> <span class="n">OptionsB</span><span class="p">):</span>
<span class="mi">2</span> <span class="k">pass</span>
<span class="mi">3</span>
<span class="o"><</span><span class="n">ipython</span><span class="o">-</span><span class="nb">input</span><span class="o">-</span><span class="mi">3</span><span class="o">-</span><span class="mi">5</span><span class="n">ff213f4a3b5</span><span class="o">></span> <span class="ow">in</span> <span class="n">__new__</span><span class="p">(</span><span class="n">mcs</span><span class="p">,</span> <span class="n">typename</span><span class="p">,</span> <span class="n">bases</span><span class="p">,</span> <span class="n">namespace</span><span class="p">)</span>
<span class="mi">43</span> <span class="c1"># so we can call `tuple.__new__`.
</span> <span class="mi">44</span> <span class="n">bases</span> <span class="o">=</span> <span class="n">bases</span> <span class="o">+</span> <span class="n">nm_tpl</span><span class="p">.</span><span class="n">__bases__</span>
<span class="o">---></span> <span class="mi">45</span> <span class="k">return</span> <span class="nb">type</span><span class="p">.</span><span class="n">__new__</span><span class="p">(</span><span class="n">mcs</span><span class="p">,</span> <span class="n">typename</span><span class="p">,</span> <span class="n">bases</span><span class="p">,</span> <span class="n">nm_tpl</span><span class="p">.</span><span class="n">__dict__</span><span class="p">.</span><span class="n">copy</span><span class="p">())</span>
<span class="mi">46</span>
<span class="nb">TypeError</span><span class="p">:</span> <span class="n">multiple</span> <span class="n">bases</span> <span class="n">have</span> <span class="n">instance</span> <span class="n">lay</span><span class="o">-</span><span class="n">out</span> <span class="n">conflict</span>
</code></pre></div></div>
<p>Now this is something new: an error message I’d never seen before. It turns out that you cannot inherit from multiple built-in classes whose instance layouts don’t go together at the C level<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>, in this case two different subclasses of <code class="language-plaintext highlighter-rouge">tuple</code>. I can see why this is a problem: such types are implemented in C, with fixed memory layouts and C-level implementations of their special methods.</p>
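<p>A minimal way to trigger this class of error (a hypothetical example unrelated to namedtuples: <code class="language-plaintext highlighter-rouge">int</code> and <code class="language-plaintext highlighter-rouge">str</code> are two built-ins with incompatible C-level instance layouts):</p>

```python
# Hypothetical minimal reproduction: `int` and `str` both have fixed,
# mutually incompatible instance layouts at the C level, so CPython
# rejects a class that tries to inherit from both.
try:
    class Broken(int, str):
        pass
except TypeError as e:
    print(e)  # "multiple bases have instance lay-out conflict"
```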
<p>If we can’t create the type with our bases, how about modifying the bases after creation? It turns out you can’t do that either:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o"><</span><span class="n">ipython</span><span class="o">-</span><span class="nb">input</span><span class="o">-</span><span class="mi">118</span><span class="o">-</span><span class="n">d6cd3ab74257</span><span class="o">></span> <span class="ow">in</span> <span class="n">__new__</span><span class="p">(</span><span class="n">mcs</span><span class="p">,</span> <span class="n">typename</span><span class="p">,</span> <span class="n">bases</span><span class="p">,</span> <span class="n">namespace</span><span class="p">)</span>
<span class="mi">43</span> <span class="c1"># so we can call `tuple.__new__`.
</span> <span class="mi">44</span> <span class="n">options_type</span> <span class="o">=</span> <span class="nb">type</span><span class="p">.</span><span class="n">__new__</span><span class="p">(</span><span class="n">mcs</span><span class="p">,</span> <span class="n">typename</span><span class="p">,</span> <span class="n">nm_tpl</span><span class="p">.</span><span class="n">__bases__</span><span class="p">,</span> <span class="n">nm_tpl</span><span class="p">.</span><span class="n">__dict__</span><span class="p">.</span><span class="n">copy</span><span class="p">())</span>
<span class="o">---></span> <span class="mi">45</span> <span class="n">options_type</span><span class="p">.</span><span class="n">__bases__</span> <span class="o">=</span> <span class="n">bases</span>
<span class="mi">46</span> <span class="k">return</span> <span class="n">options_type</span>
<span class="mi">47</span>
<span class="nb">TypeError</span><span class="p">:</span> <span class="n">__bases__</span> <span class="n">assignment</span><span class="p">:</span> <span class="s">'Options'</span> <span class="nb">object</span> <span class="n">layout</span> <span class="n">differs</span> <span class="k">from</span> <span class="s">'tuple'</span>
</code></pre></div></div>
<p>It seems that we’re out of luck. But actually, here’s a lesser-known piece of evil: you can <a href="http://stupidpythonideas.blogspot.com/2015/12/can-you-customize-method-resolution.html">override the creation of the MRO</a> in the metaclass! The crazy part is that we then have to implement the C3 linearization algorithm ourselves. Luckily, it’s a simple one:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">OptionsMeta</span><span class="p">(</span><span class="n">typing</span><span class="p">.</span><span class="n">NamedTupleMeta</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">__new__</span><span class="p">(</span><span class="n">mcs</span><span class="p">,</span> <span class="n">typename</span><span class="p">,</span> <span class="n">bases</span><span class="p">,</span> <span class="n">namespace</span><span class="p">):</span>
<span class="p">...</span> <span class="c1"># omitted here
</span> <span class="n">new_namespace</span> <span class="o">=</span> <span class="n">nm_tpl</span><span class="p">.</span><span class="n">__dict__</span><span class="p">.</span><span class="n">copy</span><span class="p">()</span>
<span class="n">new_namespace</span><span class="p">[</span><span class="s">'_bases'</span><span class="p">]</span> <span class="o">=</span> <span class="n">bases</span>
<span class="n">options_type</span> <span class="o">=</span> <span class="nb">type</span><span class="p">.</span><span class="n">__new__</span><span class="p">(</span><span class="n">mcs</span><span class="p">,</span> <span class="n">typename</span><span class="p">,</span> <span class="n">nm_tpl</span><span class="p">.</span><span class="n">__bases__</span><span class="p">,</span> <span class="n">new_namespace</span><span class="p">)</span>
<span class="c1"># Writing to `__bases__` triggers an MRO update. This has to be done after
</span> <span class="c1"># class creation because otherwise we can't access `_bases`.
</span> <span class="n">options_type</span><span class="p">.</span><span class="n">__bases__</span> <span class="o">=</span> <span class="nb">tuple</span><span class="p">(</span><span class="n">nm_tpl</span><span class="p">.</span><span class="n">__bases__</span><span class="p">)</span>
<span class="k">return</span> <span class="n">options_type</span>
<span class="k">def</span> <span class="nf">mro</span><span class="p">(</span><span class="n">cls</span><span class="p">):</span>
<span class="n">default_mro</span> <span class="o">=</span> <span class="nb">super</span><span class="p">().</span><span class="n">mro</span><span class="p">()</span>
<span class="c1"># `Options` does not define `_bases`, so we don't do anything about it.
</span> <span class="k">if</span> <span class="nb">hasattr</span><span class="p">(</span><span class="n">cls</span><span class="p">,</span> <span class="s">'_bases'</span><span class="p">):</span>
<span class="c1"># `default_mro` should be `[cls, tuple, object]`.
</span> <span class="c1"># `c3merge` and `c3mro` are implementations of the C3 linearization
</span> <span class="c1"># algorithm, which unluckily aren't provided as APIs.
</span> <span class="k">return</span> <span class="n">c3merge</span><span class="p">([</span>
<span class="n">default_mro</span><span class="p">[:</span><span class="mi">1</span><span class="p">],</span>
<span class="o">*</span><span class="p">[</span><span class="n">base</span><span class="p">.</span><span class="n">__mro__</span> <span class="k">for</span> <span class="n">base</span> <span class="ow">in</span> <span class="n">cls</span><span class="p">.</span><span class="n">_bases</span><span class="p">],</span>
<span class="n">default_mro</span><span class="p">[</span><span class="mi">1</span><span class="p">:]])</span>
<span class="k">return</span> <span class="n">default_mro</span>
<span class="k">def</span> <span class="nf">c3merge</span><span class="p">(</span><span class="n">sequences</span><span class="p">):</span>
<span class="sa">r</span><span class="s">"""Adapted from https://www.python.org/download/releases/2.3/mro/"""</span>
<span class="c1"># Make sure we don't actually mutate anything we are getting as input.
</span> <span class="n">sequences</span> <span class="o">=</span> <span class="p">[</span><span class="nb">list</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">sequences</span><span class="p">]</span>
<span class="n">result</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
<span class="c1"># Clear out blank sequences.
</span> <span class="n">sequences</span> <span class="o">=</span> <span class="p">[</span><span class="n">x</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">sequences</span> <span class="k">if</span> <span class="n">x</span><span class="p">]</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">sequences</span><span class="p">:</span>
<span class="k">return</span> <span class="n">result</span>
<span class="c1"># Find the first clean head.
</span> <span class="k">for</span> <span class="n">seq</span> <span class="ow">in</span> <span class="n">sequences</span><span class="p">:</span>
<span class="n">head</span> <span class="o">=</span> <span class="n">seq</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="c1"># If this is not a bad head (i.e., not in any other sequence)
</span> <span class="k">if</span> <span class="ow">not</span> <span class="nb">any</span><span class="p">(</span><span class="n">head</span> <span class="ow">in</span> <span class="n">s</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span> <span class="k">for</span> <span class="n">s</span> <span class="ow">in</span> <span class="n">sequences</span><span class="p">):</span>
<span class="k">break</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">raise</span> <span class="nb">TypeError</span><span class="p">(</span><span class="s">"inconsistent hierarchy"</span><span class="p">)</span>
<span class="c1"># Move the head from the front of all sequences to the end of results.
</span> <span class="n">result</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">head</span><span class="p">)</span>
<span class="k">for</span> <span class="n">seq</span> <span class="ow">in</span> <span class="n">sequences</span><span class="p">:</span>
<span class="k">if</span> <span class="n">seq</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="n">head</span><span class="p">:</span>
<span class="k">del</span> <span class="n">seq</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="k">return</span> <span class="n">result</span>
</code></pre></div></div>
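<p>As a sanity check, we can run this merge on a classic diamond hierarchy and compare against the MRO Python computes itself (a self-contained sketch; the class names are made up for illustration):</p>

```python
def c3merge(sequences):
    """C3 merge, adapted from https://www.python.org/download/releases/2.3/mro/"""
    # Make sure we don't mutate the input sequences.
    sequences = [list(x) for x in sequences]
    result = []
    while True:
        sequences = [x for x in sequences if x]
        if not sequences:
            return result
        # Find the first "clean" head: one that appears in no other sequence's tail.
        for seq in sequences:
            head = seq[0]
            if not any(head in s[1:] for s in sequences):
                break
        else:
            raise TypeError("inconsistent hierarchy")
        # Move the head from the front of all sequences to the result.
        result.append(head)
        for seq in sequences:
            if seq[0] == head:
                del seq[0]

class A: pass
class B(A): pass
class C(A): pass
class D(B, C): pass

# L(D) = D + merge(L(B), L(C), [B, C])
merged = c3merge([[D], list(B.__mro__), list(C.__mro__), [B, C]])
assert merged == list(D.__mro__)  # [D, B, C, A, object]
```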
<p>Of course, this complexity is only needed to support the fully general case. Normally you wouldn’t have multiple layers of hierarchy for namedtuples, nor would you mix in so many other classes that you need to be careful about the MRO.</p>
<h2 id="arbitrary-order-of-fields">Arbitrary Order of Fields</h2>
<p>Now, to the final goal which you’ve probably forgotten: removing the ordering constraint on fields with default values. This constraint is inherent to Python function signatures: positional parameters with default values must be declared after those without, so a field lacking a default cannot follow one that has a default.</p>
<p>To work around this, we can declare all arguments of the constructor as keyword-only. To me, disallowing positional arguments is actually an improvement, because the order of fields can be ambiguous when there are multiple base classes.</p>
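<p>The keyword-only part itself is plain Python: a bare <code class="language-plaintext highlighter-rouge">*</code> in a signature makes every parameter after it keyword-only, and keyword-only parameters with defaults may appear in any order. A hand-written sketch of what we want the generated constructor to look like (the <code class="language-plaintext highlighter-rouge">Point</code> class here is a made-up example, not part of the final library):</p>

```python
from collections import namedtuple

_Point = namedtuple("_Point", ["y", "x"])

class Point(_Point):
    # Keyword-only constructor: `y` (which has a default) may be declared
    # before `x` (which doesn't), because nothing can be passed positionally.
    def __new__(cls, *, y=0, x):
        return super().__new__(cls, y, x)

p = Point(x=3)
print(p.y, p.x)  # 0 3
```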
<p>How can we programmatically create a method with custom arguments? Let’s dive into the code for <code class="language-plaintext highlighter-rouge">collections.namedtuple</code>, where the magic happens. The code is pretty long so I’m just going to show the relevant parts here. Turns out magic doesn’t exist, everything’s just a hack:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="p">...</span> <span class="c1"># omitted
</span> <span class="n">arg_list</span> <span class="o">=</span> <span class="nb">repr</span><span class="p">(</span><span class="n">field_names</span><span class="p">).</span><span class="n">replace</span><span class="p">(</span><span class="s">"'"</span><span class="p">,</span> <span class="s">""</span><span class="p">)[</span><span class="mi">1</span><span class="p">:</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="c1"># Create all the named tuple methods to be added to the class namespace
</span>
<span class="n">s</span> <span class="o">=</span> <span class="sa">f</span><span class="s">'def __new__(_cls, </span><span class="si">{</span><span class="n">arg_list</span><span class="si">}</span><span class="s">): return _tuple_new(_cls, (</span><span class="si">{</span><span class="n">arg_list</span><span class="si">}</span><span class="s">))'</span>
<span class="n">namespace</span> <span class="o">=</span> <span class="p">{</span><span class="s">'_tuple_new'</span><span class="p">:</span> <span class="n">tuple_new</span><span class="p">,</span> <span class="s">'__name__'</span><span class="p">:</span> <span class="sa">f</span><span class="s">'namedtuple_</span><span class="si">{</span><span class="n">typename</span><span class="si">}</span><span class="s">'</span><span class="p">}</span>
<span class="c1"># Note: exec() has the side-effect of interning the field names
</span> <span class="k">exec</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">namespace</span><span class="p">)</span>
<span class="n">__new__</span> <span class="o">=</span> <span class="n">namespace</span><span class="p">[</span><span class="s">'__new__'</span><span class="p">]</span>
<span class="n">__new__</span><span class="p">.</span><span class="n">__doc__</span> <span class="o">=</span> <span class="sa">f</span><span class="s">'Create new instance of </span><span class="si">{</span><span class="n">typename</span><span class="si">}</span><span class="s">(</span><span class="si">{</span><span class="n">arg_list</span><span class="si">}</span><span class="s">)'</span>
<span class="k">if</span> <span class="n">defaults</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span>
<span class="n">__new__</span><span class="p">.</span><span class="n">__defaults__</span> <span class="o">=</span> <span class="n">defaults</span>
<span class="n">__new__</span><span class="p">.</span><span class="n">__qualname__</span> <span class="o">=</span> <span class="sa">f</span><span class="s">'</span><span class="si">{</span><span class="n">typename</span><span class="si">}</span><span class="s">.__new__'</span>
<span class="p">...</span> <span class="c1"># omitted
</span> <span class="n">class_namespace</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">...</span> <span class="c1"># omitted
</span> <span class="s">'__new__'</span><span class="p">:</span> <span class="n">__new__</span><span class="p">,</span>
<span class="p">}</span>
<span class="p">...</span> <span class="c1"># omitted
</span> <span class="n">result</span> <span class="o">=</span> <span class="nb">type</span><span class="p">(</span><span class="n">typename</span><span class="p">,</span> <span class="p">(</span><span class="nb">tuple</span><span class="p">,),</span> <span class="n">class_namespace</span><span class="p">)</span>
<span class="p">...</span> <span class="c1"># omitted
</span></code></pre></div></div>
<p>Yep, that’s right. The <code class="language-plaintext highlighter-rouge">__new__</code> method for the namedtuple is created by <em>writing the code as a string and calling <code class="language-plaintext highlighter-rouge">exec</code></em>. To be honest, that’s probably the easiest way; and if we insisted on elegant, readable implementations, we wouldn’t have come this far anyway.</p>
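<p>Here’s the same trick in miniature; a hypothetical helper (<code class="language-plaintext highlighter-rouge">make_new</code> is not part of the standard library) that assembles a <code class="language-plaintext highlighter-rouge">__new__</code> from a string of source code:</p>

```python
# A minimal sketch of the exec-based codegen used by `collections.namedtuple`.
# `make_new` and its argument names are hypothetical.
def make_new(typename, field_names):
    arg_list = ", ".join(field_names)
    # The trailing comma keeps singleton tuples valid: `(first,)`.
    source = f"def __new__(_cls, {arg_list}): return _tuple_new(_cls, ({arg_list},))"
    ns = {"_tuple_new": tuple.__new__}
    exec(source, ns)  # compiles `source` and binds `__new__` into `ns`
    new = ns["__new__"]
    new.__qualname__ = f"{typename}.__new__"
    return new

class Pair(tuple):
    pass

Pair.__new__ = make_new("Pair", ["first", "second"])
p = Pair(1, 2)
print(tuple(p))  # (1, 2)
```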
<p>Following their lead, we can also create our own version of <code class="language-plaintext highlighter-rouge">__new__</code> and overwrite theirs:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="c1"># Rewrite `__new__` method to make all arguments keyword-only.
</span> <span class="c1"># This is very hacky code. Do not try this at home.
</span> <span class="n">arg_list</span> <span class="o">=</span> <span class="s">''</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">name</span> <span class="o">+</span> <span class="s">', '</span> <span class="c1"># watch out for singleton tuples
</span> <span class="k">for</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">reordered_fields</span><span class="p">)</span>
<span class="n">s</span> <span class="o">=</span> <span class="p">(</span><span class="sa">f</span><span class="s">"""
def __new__(_cls, *args, </span><span class="si">{</span><span class="n">arg_list</span><span class="si">}</span><span class="s">):
if len(args) > 0:
raise TypeError("Instances of Options class must be created "
"with keyword arguments.")
return _tuple_new(_cls, (</span><span class="si">{</span><span class="n">arg_list</span><span class="si">}</span><span class="s">))
"""</span><span class="p">).</span><span class="n">strip</span><span class="p">()</span> <span class="c1"># remove incorrect indents in the string
</span> <span class="n">new_method_namespace</span> <span class="o">=</span> <span class="p">{</span><span class="s">'_tuple_new'</span><span class="p">:</span> <span class="nb">tuple</span><span class="p">.</span><span class="n">__new__</span><span class="p">,</span>
<span class="s">'__name__'</span><span class="p">:</span> <span class="sa">f</span><span class="s">'namedtuple_</span><span class="si">{</span><span class="n">typename</span><span class="si">}</span><span class="s">'</span><span class="p">}</span>
<span class="k">exec</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">new_method_namespace</span><span class="p">)</span>
<span class="n">__new__</span> <span class="o">=</span> <span class="n">new_method_namespace</span><span class="p">[</span><span class="s">'__new__'</span><span class="p">]</span>
<span class="n">__new__</span><span class="p">.</span><span class="n">__qualname__</span> <span class="o">=</span> <span class="sa">f</span><span class="s">'</span><span class="si">{</span><span class="n">typename</span><span class="si">}</span><span class="s">.__new__'</span>
<span class="n">__new__</span><span class="p">.</span><span class="n">__doc__</span> <span class="o">=</span> <span class="n">nm_tpl</span><span class="p">.</span><span class="n">__new__</span><span class="p">.</span><span class="n">__doc__</span>
<span class="n">__new__</span><span class="p">.</span><span class="n">__annotations__</span> <span class="o">=</span> <span class="n">nm_tpl</span><span class="p">.</span><span class="n">__new__</span><span class="p">.</span><span class="n">__annotations__</span>
<span class="n">__new__</span><span class="p">.</span><span class="n">__kwdefaults__</span> <span class="o">=</span> <span class="p">{</span><span class="n">name</span><span class="p">:</span> <span class="n">namespace</span><span class="p">[</span><span class="n">name</span><span class="p">]</span>
<span class="k">for</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">fields_with_default</span><span class="p">}</span>
<span class="n">nm_tpl</span><span class="p">.</span><span class="n">__new__</span> <span class="o">=</span> <span class="n">__new__</span>
</code></pre></div></div>
<p>As the comment says, this is very dangerous. Don’t try this at home.</p>
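<p>The exec-into-a-namespace trick can be shown in isolation. This is a toy sketch, not part of the post’s code; the function name <code class="language-plaintext highlighter-rouge">greet</code> is made up:</p>

```python
# Synthesize a function from a source string, the same way
# collections.namedtuple builds __new__ (toy example).
src = "def greet(name):\n    return f'Hello, {name}!'"
ns = {'__name__': 'generated_module'}
exec(src, ns)  # defines `greet` inside the `ns` dictionary
greet = ns['greet']
greet.__qualname__ = 'generated_module.greet'

print(greet('world'))  # Hello, world!
```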
<h2 id="summary">Summary</h2>
<p>So far, we’ve delivered on our promises: we now have a super-enhanced version of namedtuple that supports multiple inheritance and arbitrary field orders. You can find the entire working code in <a href="https://gist.github.com/huzecong/df51502a8a6ec0bcc0e605a2ce109008">this GitHub Gist</a>. It’s a bit long, but you don’t really need to know the details; do the Pythonic thing and treat it as a library.</p>
<p>But you may ask, what’s it useful for?</p>
<p>I dunno, but it’s a pretty fun journey, isn’t it?</p>
<h2 id="footnotes">Footnotes</h2>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>There’s actually another level called the meta-metaclass, but it’s rarely useful and I’ve never seen it used in practice. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>This is true for Python 3.6 and lower. Starting from Python 3.7, <code class="language-plaintext highlighter-rouge">collections.namedtuple</code> supports an optional <code class="language-plaintext highlighter-rouge">defaults</code> argument. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
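<p>A quick illustration of the 3.7+ behavior; note the parameter is spelled <code class="language-plaintext highlighter-rouge">defaults</code> (plural) and pairs up with the rightmost fields:</p>

```python
from collections import namedtuple

# `defaults` applies to the rightmost field names, so here it sets y=0.
Point = namedtuple('Point', ['x', 'y'], defaults=[0])

print(Point(1))  # Point(x=1, y=0)
```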
</li>
<li id="fn:3" role="doc-endnote">
<p>The MRO (method resolution order) is Python’s answer to the diamond dependency problem in multiple inheritance. When we access a method of an instance, we find the first class in its MRO that defines such a method, and return the method of that class. In the single inheritance case, the MRO can be thought of as the list of ancestor classes from the derived class to <code class="language-plaintext highlighter-rouge">object</code>, the base class of everything. Please refer to <a href="https://en.wikipedia.org/wiki/C3_linearization">this Wikipedia article</a> for the algorithm used to compute the MRO, the C3 linearization algorithm. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
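<p>A minimal diamond hierarchy makes the linearization concrete (illustrative example, unrelated to the post’s classes):</p>

```python
# Classic diamond: D inherits from B and C, which both inherit from A.
class A: pass
class B(A): pass
class C(A): pass
class D(B, C): pass

# C3 linearization puts D first, then its bases left to right, then A, object.
print([cls.__name__ for cls in D.__mro__])  # ['D', 'B', 'C', 'A', 'object']
```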
</li>
<li id="fn:4" role="doc-endnote">
<p>If you don’t know what this means, you have skipped <a href="#fn:3">footnote 3</a>. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>This is a simplified explanation. <a href="https://stackoverflow.com/questions/48136025/typeerror-multiple-bases-have-instance-lay-out-conflict">This StackOverflow answer</a> gives a pointer to the CPython source code that calculates the best “solid base” for a new class. I’m not familiar with the CPython internals, but my guess is that the solid base is the first class among the MRO with a memory layout different from its base class. Note that adding Python attributes and methods doesn’t affect the memory layout, because that’s equivalent to adding entries to the <code class="language-plaintext highlighter-rouge">__dict__</code> dictionary.</p>
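<p>The conflict is easy to reproduce in isolation (a toy example, unrelated to the post’s classes):</p>

```python
# int and str are both built-in types with incompatible C-level memory
# layouts, so combining them as bases raises the same error as in the post.
try:
    class Broken(int, str):
        pass
except TypeError as exc:
    print(exc)  # multiple bases have instance lay-out conflict
```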
<p>Also note that this is not limited to CPython. PyPy also has <a href="https://bitbucket.org/pypy/pypy/annotate/default/pypy/objspace/std/typeobject.py?at=default&fileviewer=file-view-default#typeobject.py-1064:1086">a similar check</a>. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Zecong Hu
Configuring an External GPU on a MacBook Pro2017-11-18T17:42:05+00:002017-11-18T17:42:05+00:00http://zecong.hu/2017/11/18/using-egpu-on-macbook-pro<p>After hearing that macOS High Sierra officially supports eGPUs, I had been meaning to buy a graphics card, both to make up for not having played any large 3D games in my five years of using Macs, and to train some models on the side. Using the Singles’ Day sale as an excuse, I finally took the plunge and bought an eGPU enclosure and a 1080 Ti.</p>
<p>The setup process, however, was extremely convoluted. To save others some trouble, and as a reference for my future self, I am documenting it here.</p>
<p>Disclaimer: the method recorded here was pieced together from various tutorials online. It may only work for my particular machine and accessories, so treat it as a reference only.</p>
<!--more-->
<h2 id="配置与环境">Hardware and Environment</h2>
<ul>
<li><strong>Host:</strong> 2015 15" MacBook Pro, with an AMD R9 M370X discrete GPU and Intel Iris Pro integrated graphics</li>
<li><strong>OS:</strong> macOS High Sierra 10.13.1 / Windows 10 Fall Creators Update (10.0.10586)</li>
<li><strong>eGPU enclosure:</strong> Sonnet eGFX Breakaway Box (GPU-350W-TB3Z)</li>
<li><strong>GPU:</strong> EVGA GeForce GTX 1080 Ti SC2 (11G-P4-6593-KR)</li>
</ul>
<p>I chose this enclosure because Apple’s own developer-edition eGPU kit is based on it. With the official “blessing”, it felt like a safer bet. I bought the 350 W model (because it was cheaper), which provides only one 8-pin and one 6-pin power connector, so it must be paired with a GPU that uses exactly these connectors.</p>
<p>Also, since the enclosure uses a Thunderbolt 3 interface and ships only with a TB3 male-to-male cable, you have to buy an adapter yourself. Only TB3-male-to-TB2-female adapters are on the market, but since they are bidirectional, they can be combined with a TB2 male-to-male cable.</p>
<h2 id="在-windows-上配置">Setup on Windows</h2>
<p>I have to say, Windows hardware support is far more complete. Setup on Windows is very simple: Boot Camp ships with Thunderbolt drivers, so the eGPU enclosure is plug-and-play; just download and install the latest driver from the NVIDIA website.</p>
<p>There is one big gotcha, though: the two TB2 ports on the MBP are not identical. I haven’t found out what exactly differs, but my enclosure only works when plugged into the port <strong>closer to the power connector</strong>. Plugged into the other port, the card is recognized, but GeForce Experience keeps prompting to install the driver even after installation finishes, and the card cannot be used.</p>
<h3 id="使用内置显示器">Using the Built-in Display</h3>
<p>If you use an external monitor, everything already works at this point. With some further configuration, however, we can have the built-in display rendered by the external GPU. The underlying mechanism seems to be: at boot, the Mac checks whether a discrete GPU is present, and if so the integrated GPU is not used. But for NVIDIA Optimus to let the GPU render for the built-in display, the integrated GPU must be kept running.</p>
<p>For a detailed description, see link<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>
<h4 id="第一步设置启动盘">Step 1: Set the Startup Disk</h4>
<p>This step is done in macOS. In System Preferences > Startup Disk, select the Windows BOOTCAMP partition as the startup disk, then reboot.</p>
<p>If the BOOTCAMP partition does not appear among the options, it may be mounted by a third-party NTFS driver (such as Tuxera NTFS, which I use). Taking Tuxera NTFS as an example: on the Volumes tab of its preference pane, select the BOOTCAMP partition, check “Disable Tuxera NTFS”, then unmount and remount the partition in Disk Utility.</p>
<h4 id="第二步创建-usb-引导盘">Step 2: Create a USB Boot Drive</h4>
<p>This step makes the boot process masquerade as a macOS startup. You need a USB drive no larger than 4 GB, formatted as FAT. If you only have a larger drive at hand, you can carve out a 4 GB partition as follows:</p>
<ol>
<li>Run <code class="language-plaintext highlighter-rouge">diskpart</code> as administrator;</li>
<li>Run <code class="language-plaintext highlighter-rouge">list disk</code> and note the disk number of the USB drive (say, Disk 1);</li>
<li>Run <code class="language-plaintext highlighter-rouge">select disk 1</code>;</li>
<li>Run <code class="language-plaintext highlighter-rouge">clean</code>, which erases all data on the drive and deletes the partition table;</li>
<li>Run <code class="language-plaintext highlighter-rouge">create partition primary size=4000</code>, which creates a 4 GB primary partition; after this step the system may pop up a dialog asking whether to format the drive, which you can simply close;</li>
<li>Run <code class="language-plaintext highlighter-rouge">format fs=fat quick</code>, which quick-formats the partition as FAT;</li>
<li>Run <code class="language-plaintext highlighter-rouge">assign letter = D</code>, which assigns drive letter D to the partition so its filesystem can be accessed.</li>
</ol>
<p>Of course, an easier way is to do all of the above in macOS with Disk Utility.</p>
<p>Then, download <a href="https://github.com/0xbb/apple_set_os.efi/releases"><code class="language-plaintext highlighter-rouge">apple_set_os.efi</code></a>. Create the directory <code class="language-plaintext highlighter-rouge">/EFI/Boot</code> at the root of the USB drive, rename the downloaded file to <code class="language-plaintext highlighter-rouge">bootx64.efi</code>, and place it in that directory.</p>
<h4 id="第三步执行-gpu-switch">Step 3: Run <code class="language-plaintext highlighter-rouge">gpu-switch</code></h4>
<p>Download the Windows version of <a href="https://github.com/0xbb/gpu-switch"><code class="language-plaintext highlighter-rouge">gpu-switch</code></a>. It makes the system use the integrated GPU on the next boot. Run <code class="language-plaintext highlighter-rouge">integrated.bat</code> as administrator.</p>
<h4 id="第四步通过-efi-boot-引导">Step 4: Boot via EFI Boot</h4>
<p>Reboot, hold the left Option key at startup, and choose EFI Boot. The built-in display is now rendered by the external graphics card.</p>
<p>To verify this, open the NVIDIA Control Panel from the desktop context menu. If that entry is missing, or clicking it pops up a “no display is using an NVIDIA GPU” message, the configuration did not succeed.</p>
<h2 id="在-macos-上配置">Setup on macOS</h2>
<p>macOS now has official driver support from NVIDIA. The latest WebDriver version is 378.10.10.10.20.107, which can be downloaded from the <a href="http://www.nvidia.com/download/driverResults.aspx/126538/en-us">NVIDIA website</a>. The matching CUDA driver also needs to be installed.</p>
<p>To search for the latest macOS driver on the NVIDIA site, choose “GeForce 600 Series” as the product series, select “Show all Operating Systems”, then pick the matching macOS version. This is because the driver currently only provides official support for that series; support for newer cards is still in beta.</p>
<p>Note that installing the driver requires System Integrity Protection (SIP) to be enabled. To change it, hold Cmd+R before macOS boots to enter Recovery Mode, open a terminal, and run <code class="language-plaintext highlighter-rouge">csrutil enable</code>; likewise, <code class="language-plaintext highlighter-rouge">csrutil disable</code> turns SIP off. If you have never turned SIP off manually, it is enabled by default.</p>
<p>To use the external GPU, some additional configuration is needed. Download <a href="https://egpu.io/wp-content/uploads/wpforo/attachments/3/3858-nvidia-egpu-v2-1013-1.zip">NVIDIAEGPUSupport</a> and install it with <strong>SIP disabled</strong>. See link<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">2</a></sup> for details.</p>
<p>At this point, however, if the eGPU is connected at boot, the screen will be garbled once you reach the login screen. If the eGPU is connected after boot, the graphics section of System Information only shows “NVIDIA Chip Model” without the concrete model. The fix is similar to the built-in-display procedure in the Windows section: set the startup disk to the macOS partition, run the macOS version of <code class="language-plaintext highlighter-rouge">gpu-switch</code>, and boot from EFI Boot on restart. The login screen then comes up normally, and after logging in you can verify the GPU with <a href="http://cuda-z.sourceforge.net/">CUDA-Z</a>.</p>
<p>One thing to stress: macOS currently does <strong>not fully support hot-plugging</strong>. Disconnecting the eGPU after it has been connected may cause a black screen, a reboot, or the multilingual kernel-panic screen.</p>
<h3 id="使用内置显示器-1">Using the Built-in Display</h3>
<p>So far, even with the method from the Windows section, the built-in display is still not driven by the eGPU. On the About This Mac page, Built-in Display still shows the Intel Iris Pro integrated graphics.</p>
<p>Link<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">3</a></sup> describes a method that requires an HDMI “headless” dummy plug. I don’t have one at hand and haven’t tried it; I’ll update this section once I have.</p>
<h2 id="关于-thunderbolt-2-的性能损失">On the Performance Loss of Thunderbolt 2</h2>
<p>Along the whole GPU→PCI-E→TB3→TB2→host data path, the theoretical maximum bandwidths of each segment are:</p>
<ul>
<li>PCI-E: 126 Gbps</li>
<li>TB3: 32 Gbps</li>
<li>TB2: 16 Gbps</li>
</ul>
<p>TB2 is therefore the bottleneck. Measured under macOS, the transfer speed is about 1200 MB/s, i.e. 9.6 Gbps. Using the built-in display lowers it further. According to the tests in link<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>, a GTX 1080 over a TB2 connection loses roughly 40% of its performance; using an external display reduces the loss to about 20%.</p>
<p>In addition, link<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup> points out that part of the loss may also come from <code class="language-plaintext highlighter-rouge">apple_set_os.efi</code>, and offers a workaround. I haven’t read it closely; refer to it if interested.</p>
<p>For gaming, even at 60% performance most titles run just fine. Under Windows, NieR:Automata at 1920x1600, maximum quality, V-Sync off, still ran fairly smoothly (a subjective impression; I didn’t actually measure the frame rate). Good enough for me.</p>
<p>For deep learning, on the other hand, computation time should far outweigh transfer time, so the bottleneck shouldn’t matter much. That is just my conjecture, though; I haven’t benchmarked it.</p>
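<p>To make the unit conversion above explicit, here is a tiny sanity check in plain Python (numbers taken from the measurements quoted above):</p>

```python
# A quick sanity check of the bandwidth figures quoted above.

def mb_per_s_to_gbps(mb_s):
    """Convert MB/s to Gbps, taking 1 MB = 10**6 bytes and 8 bits per byte."""
    return mb_s * 8 / 1000

measured_gbps = mb_per_s_to_gbps(1200)  # the ~1200 MB/s measured under macOS
tb2_fraction = measured_gbps / 16.0     # against the 16 Gbps TB2 ceiling
```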
<h2 id="参考链接">References</h2>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p><a href="https://egpu.io/forums/mac-setup/how-to-keep-mbps-irisiris-pro-activated-when-booting-into-windows-boot-camp/">https://egpu.io/forums/mac-setup/how-to-keep-mbps-irisiris-pro-activated-when-booting-into-windows-boot-camp/</a> <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p><a href="https://egpu.io/forums/mac-setup/wip-nvidia-egpu-support-for-high-sierra/">https://egpu.io/forums/mac-setup/wip-nvidia-egpu-support-for-high-sierra/</a> <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p><a href="https://egpu.io/how-to-egpu-accelerated-internal-display-macos/">https://egpu.io/how-to-egpu-accelerated-internal-display-macos/</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p><a href="https://egpu.io/forums/mac-setup/pcie-slot-dgpu-vs-thunderbolt-3-egpu-internal-display-test/">https://egpu.io/forums/mac-setup/pcie-slot-dgpu-vs-thunderbolt-3-egpu-internal-display-test/</a> <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p><a href="https://egpu.io/forums/mac-setup/mbp-tb3-port-underperformance-16xxmibs-instead-of-22xxmibs-under-macos-or-windowsapple_set_os-efi/">https://egpu.io/forums/mac-setup/mbp-tb3-port-underperformance-16xxmibs-instead-of-22xxmibs-under-macos-or-windowsapple_set_os-efi/</a> <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Zecong Hu
Research Notes (2017-08-07): http://zecong.hu/2017/08/07/research-notes<p>This post records my notes taken during summer internship @ CMU LTI.</p>
<p>Disclaimer: These notes are not guaranteed to be correct or understandable.</p>
<p>Note: this post is not mobile-friendly; the mobile view is distorted by the formulae.</p>
<!--more-->
\[\renewcommand{\d}{\ \mathrm{d}}\]
<h2 id="tips-in-training">Tips in Training</h2>
<h4 id="dropout">Dropout</h4>
<ul>
<li>
<p>Apply a random mask on parameters: each parameter is zeroed with probability $p$; note that the expected output is scaled by a factor of $1-p$ compared to when dropout is not applied</p>
</li>
<li>
<p><strong>Inverted dropout</strong>: Rescale the output by $1/(1-p)$ during training, so no special treatment is required when using the model</p>
</li>
<li>
<p><strong>Dropout for embeddings</strong>: zero out entire vectors for random word IDs</p>
</li>
<li>
<p><strong>Dropout for LSTMs</strong>: apply to <u>input</u> and <u>hidden state</u>, rather than parameters. Dropout mask is the same for each time step on one training sample <em>(ref: <a href="https://arxiv.org/pdf/1512.05287.pdf">[Gal 2016] A Theoretically Grounded Application of Dropout in Recurrent Neural Networks</a>)</em></p>
</li>
<li>
<p><strong>Dropout for final FC layer in LSTMs</strong>: apply to <u>LSTM output</u>, rather than FC parameters</p>
</li>
</ul>
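<p>As a minimal sketch of the inverted-dropout variant described above (plain Python on a list; a toy stand-in for the framework implementation):</p>

```python
import random

def inverted_dropout(vec, p, train=True):
    """Zero each element with probability p and scale survivors by 1/(1-p),
    so the expectation is unchanged and test time needs no rescaling."""
    if not train or p == 0.0:
        return list(vec)
    keep = 1.0 - p
    return [x / keep if random.random() < keep else 0.0 for x in vec]

random.seed(0)
out = inverted_dropout([1.0] * 10000, p=0.3)
mean = sum(out) / len(out)  # stays close to 1.0 in expectation
```

<p>At test time the function is simply the identity, which is the whole point of the inverted variant.</p>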
<h4 id="evaluating-similarity">Evaluating similarity</h4>
<ul>
<li>
<p>To evaluate similarities of $h_L$ and $h_R$ based on ratings of $[1,K]$, we can jointly train a MLP based on:</p>
\[\begin{align*}
h_\times &= h_L\odot h_R \\
h_+ &= \vert h_L-h_R \vert \\
h_s &= \sigma\left(W^{(\times)}h_\times + W^{(+)}h_+ + b^{(h)}\right) \\
\hat{p_\theta} &= \mathrm{softmax}\left(W^{(p)}h_s + b^{(p)}\right) \\
r &= \{1,2,\ldots,K\} \\
\hat{y} &= r^\top \hat{p_\theta}
\end{align*}\]
<p>i.e. <u>learning an evaluation criterion based on the distance and angle</u> between the pair, and mapping it to a weighted average of ratings. Obviously the resultant prediction $\hat{y}$ will be in the range $[1,K]$.</p>
</li>
<li>
<p><strong>Triplet loss</strong>: <em><u>(more details to be described)</u></em></p>
</li>
</ul>
<h4 id="gumbel-max-trick--gumbel-softmax-distribution">Gumbel-max trick & Gumbel-Softmax distribution</h4>
<ul>
<li>
<p><strong>Gumbel distribution</strong> (unit scale, zero location, $x\in(-\infty,+\infty)$):</p>
<ul>
<li><strong>PDF</strong>: $f(x) = \exp(-x-\exp(-x))$</li>
<li><strong>CDF</strong>: $F(x) = \exp(-\exp(-x))$</li>
<li>Property: If $U\sim \mathrm{Uniform}[0,1]$, then $-\log(-\log U)\sim \mathrm{Gumbel}(0,1)$</li>
</ul>
</li>
<li>
<p>Sampling from the softmax distribution over $\{x_k\}$ is equivalent to adding independent Gumbel noise to each $x_k$ and taking the argmax</p>
</li>
<li>
<p><strong>Proof</strong>: Let $z_k = x_k + y_k,\ {y_k}\stackrel{\mathrm{i.i.d.}}{\sim}\mathrm{Gumbel}(0,1)$, then $P(z_k\text{ is max}) = \prod_{j\neq k} F(z_k-x_j)$.
\(\begin{align*}
P(k\text{ is selected}) &= \int_{-\infty}^{+\infty}f(z_k-x_k)P(z_k\text{ is max})\d{z_k} \\
&= \int_{-\infty}^{+\infty} \exp\left(-z_k+x_k-\exp(-z_k)\sum_{j=1}^{K}\exp(x_j)\right)\d{z_k} \\
&\stackrel{\text{magic}}{=} \frac{\exp(x_k)}{\sum_{j=1}^{K}\exp(x_j)}=\mathrm{softmax}\left(\{x_k\}\right)^{(k)}
\end{align*}\)
where the “magic” step somehow calculates the closed-form solution of the above integration.</p>
</li>
<li>
<p>See also: <a href="https://hips.seas.harvard.edu/blog/2013/04/06/the-gumbel-max-trick-for-discrete-distributions/">https://hips.seas.harvard.edu/blog/2013/04/06/the-gumbel-max-trick-for-discrete-distributions/</a></p>
</li>
<li>
<p><strong>Usage</strong>: Replace sampling from distribution $P(x)=\pi_x$ with argmax operation:
\(z=\underset{i}{\arg\max}(g_i+\log\pi_i)\sim P\)
where $g_i$ are independent samples from the standard Gumbel distribution.</p>
</li>
<li>
<p><strong>Gumbel-Softmax distribution</strong>: Softmax with temperature $\tau$ applied over Gumbel-max:
\(\mathbf{y}=\mathrm{softmax}((\log\pmb{\pi}+\mathbf{g})/\tau)\)
For lower temperatures, Gumbel-Softmax distribution is close to the one-hot distribution of the argmax element (which is the sample given by Gumbel-max trick); for higher temperatures, distribution is close to uniform. <em>(ref: <a href="https://arxiv.org/pdf/1611.01144">[Jang et al. 2016] Categorical Reparameterization with Gumbel-Softmax</a>)</em></p>
<ul>
<li>This is useful when we need a differentiable sample over a discrete distribution. For sample over continuous distributions, we have the <a href="#how-do-we-compute-the-lower-bound">reparameterization trick</a>.</li>
</ul>
</li>
</ul>
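<p>The trick is easy to verify empirically. The sketch below draws samples by adding standard Gumbel noise to the logits and taking the argmax, then compares the empirical frequencies against the softmax probabilities (plain Python; a toy check, not the reparameterized Gumbel-Softmax):</p>

```python
import math
import random

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def gumbel_max_sample(logits):
    """One sample from softmax(logits): add standard Gumbel noise, take argmax."""
    noisy = [x - math.log(-math.log(random.random())) for x in logits]
    return max(range(len(logits)), key=noisy.__getitem__)

random.seed(0)
logits = [1.0, 2.0, 3.0]
n = 20000
counts = [0] * len(logits)
for _ in range(n):
    counts[gumbel_max_sample(logits)] += 1
empirical = [c / n for c in counts]  # approaches softmax(logits)
```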
<h4 id="exposure-bias-and-scheduled-sampling">Exposure bias and scheduled sampling</h4>
<ul>
<li>
<p><strong>Exposure bias</strong>: In sequential models, during training, we feed the ground-truth label at the previous time step as input, no matter what the output prediction was; while during testing, we always feed the previous output. This way, the model is “exposed” to the ground-truth, even when it was not able to make such predictions. The model may also fail to capture relations between the next state and the previous output.</p>
</li>
<li>
<p><strong>Scheduled sampling</strong>: At each time step, use the model’s previous output with probability $1-p$, and use “teacher forcing” (feed the ground truth) with probability $p$.</p>
<p>$p$ is set to a high value at the beginning of training, and is gradually annealed towards 0 (thus the name “scheduled” sampling).
See also: <a href="https://www.evernote.com/shard/s189/sh/c9ac2e3f-a150-4d0c-9a44-16657e5d42cd/5eb49d50695c903ca1b4a04934e63363">https://www.evernote.com/shard/s189/sh/c9ac2e3f-a150-4d0c-9a44-16657e5d42cd/5eb49d50695c903ca1b4a04934e63363</a></p>
</li>
<li>
<p><strong>Drawbacks of scheduled sampling</strong>: When previous label was used, we were using the result of argmax of the softmax at previous time step as input, and naturally we would like to back propagate through such calculations. However, argmax is non-differentiable.</p>
<p>The reason back propagating through argmax is desirable is that, the actual cause for predicting a wrong label at the current time step may be that wrong predictions were made at previous time steps (cascading error).</p>
</li>
<li>
<p><strong>Other caveats</strong>: see <a href="http://www.inference.vc/scheduled-sampling-for-rnns-scoring-rule-interpretation/">A Word of Caution on Scheduled Sampling for Training RNNs</a>.</p>
</li>
<li>
<p><strong>Soft argmax and differentiable scheduled sampling</strong> <em>(ref: <a href="https://arxiv.org/pdf/1704.06970.pdf">[Goyal, Dyer 2017] Differentiable Scheduled Sampling for Credit Assignment</a>)</em>: <em><u>(more details to be described)</u></em></p>
</li>
</ul>
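<p>A minimal sketch of the per-step sampling decision, together with one common annealing schedule (inverse-sigmoid decay; the constant <code>k</code> here is an arbitrary illustrative choice):</p>

```python
import math
import random

def next_input(ground_truth, prev_prediction, p):
    """Scheduled sampling at one time step: feed the ground truth with
    probability p ("teacher forcing"), the model's previous output otherwise."""
    return ground_truth if random.random() < p else prev_prediction

def teacher_forcing_prob(step, k=1000.0):
    """Inverse-sigmoid decay: starts near 1 and anneals towards 0."""
    return k / (k + math.exp(step / k))
```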
<h4 id="truncated-backprop">Truncated Backprop</h4>
<ul>
<li>
<p>Concretely, for every $k_1$ time steps, train on the following $k_2$ time steps. When $k_1<k_2$, there’s overlap between consecutive time steps; sometimes $k_1=k_2$ is desired.</p>
<p><img src="https://r2rt.com/static/images/RNN_tf_truncated_backprop.png" alt="truncated backprop" /></p>
</li>
<li>
<p>The initial state may be <strong>zeroed</strong> with a small probability, so as to bias the model towards being able to start easily from a zero state at test time <em>(ref: <a href="https://arxiv.org/pdf/1707.05589.pdf">[Melis, Dyer 2017] On the State of the Art of Evaluation in Neural Language Models</a>)</em></p>
</li>
<li>
<p><strong>Data preprocessing</strong>: <em><u>(more details to be described)</u></em></p>
</li>
<li>
<p><strong>Pros</strong>: cheaper to train (less memory consumption for computation graphs), and mitigates the vanishing gradient problem; <strong>Cons</strong>: constrained the maximum range for dependencies</p>
</li>
</ul>
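<p>A small sketch of how the $k_1,k_2$ windowing could be generated (plain Python; indices only, no actual training):</p>

```python
def truncated_bptt_windows(seq_len, k1, k2):
    """Every k1 steps, yield the (start, end) of the k2 most recent steps to
    backpropagate through; windows overlap when k1 is smaller than k2."""
    for end in range(k1, seq_len + 1, k1):
        yield max(0, end - k2), end

# overlapping case: backprop over the last 4 steps, every 2 steps
windows = list(truncated_bptt_windows(seq_len=10, k1=2, k2=4))
```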
<h4 id="entropy-cross-entropy-loss-and-kl-divergence">Entropy, Cross Entropy Loss, and KL-Divergence</h4>
<ul>
<li>
<p>Shannon <strong>Entropy</strong> of a probability distribution is defined as</p>
\[H(p)=\mathbb{E}_p[-\log p]=-\sum_{x_i}p(x_i)\log p(x_i)\]
<p>which is the expected number of bits required to represent an element in the set over which the probability distribution is defined. The lower bound for the number of bits required to represent an element $x_i$ is $\log\frac{1}{p(x_i)}=-\log p(x_i)$.</p>
</li>
<li>
<p><strong>Cross-Entropy loss</strong> is defined on two distributions:</p>
\[H(p,q)=\mathbb{E}_p[-\log q]=-\sum_{x_i}p(x_i)\log q(x_i)\]
<p>which can be interpreted as estimating entropy using the wrong probability $q$. When minimizing w.r.t. cross-entropy loss, we’re trying to match our predicted distribution $q$ to the true distribution $p$.</p>
</li>
<li>
<p><strong>KL-divergence</strong> is simply the difference between entropy and cross-entropy loss:</p>
\[\mathrm{KL}(p\ \Vert\ q)=H(p,q)-H(p)=\sum_{x_i}p(x_i)\log\frac{p(x_i)}{q(x_i)}\]
<p>which is the number of extra bits required. Usually minimizing w.r.t. KL-divergence is equivalent to minimizing w.r.t. cross-entropy loss.</p>
</li>
<li>
<p>See also: <a href="https://rdipietro.github.io/friendly-intro-to-cross-entropy-loss/">https://rdipietro.github.io/friendly-intro-to-cross-entropy-loss/</a> and <a href="https://www.countbayesie.com/blog/2017/5/9/kullback-leibler-divergence-explained">https://www.countbayesie.com/blog/2017/5/9/kullback-leibler-divergence-explained</a></p>
</li>
</ul>
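<p>The three quantities and their relation can be checked directly (plain Python, natural log):</p>

```python
import math

def entropy(p):
    """Shannon entropy H(p), in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H(p, q): expected code length when encoding p with the wrong code q."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """KL(p || q) = H(p, q) - H(p): the extra nats paid for using q."""
    return cross_entropy(p, q) - entropy(p)

p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]
```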
<h4 id="tied-input-and-output-embeddings">Tied Input and Output Embeddings</h4>
<ul>
<li>
<p>Let $L$ be the input embedding, such that input to the LSTM is $x_t = Ly^*_{t-1}$</p>
</li>
<li>
<p>Replace the dense layer $y_t=\mathrm{softmax}(Wh_t+b)$ following the LSTM unit by the <strong>transpose</strong> of the embedding, i.e. $y_t=\mathrm{softmax}\left(L^\top h_t\right)$ <em>(ref: <a href="https://arxiv.org/pdf/1611.01462.pdf">[Inan & Khosravi 2016] Tying Word Vectors and Word Classifiers etc.</a>)</em></p>
</li>
<li>
<p>Since input and output are in the same space, it is reasonable to assume they’re related by a linear transformation $A$. Tying embeddings results in minimizing w.r.t. vector similarities:</p>
<ul>
<li>
<p>Let $u_t=Ly^*_t$, i.e. the embedding of the actual output. By minimizing w.r.t. vector similarities, we would like the probability $y_t$ to be related to a similarity metric, concretely $y_t=\tilde{y_t}=\mathrm{softmax}\left(L^\top u_t\right)$, where we use the inner product as the measure of similarity.</p>
</li>
<li>
<p>In order to minimize loss, $h_t$ would be adjusted to be closer to the appropriate column of $L$.</p>
</li>
<li>
<p>If we apply KL-divergence as loss, $\tilde{y_t}$ could be used as the estimated true distribution. Other class labels are also utilized during backprop, compared to the case when one-hot encoding is used.</p>
</li>
</ul>
</li>
</ul>
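<p>A minimal sketch of the tied output layer: the logits are just inner products of the hidden state with the rows of the input embedding $L$, i.e. $L^\top h_t$, so no separate output matrix is needed (toy numbers, plain Python):</p>

```python
def tied_logits(L, h):
    """Logits as L^T h: one inner product between the hidden state and each
    row (word embedding) of the input embedding matrix L."""
    return [sum(l_i * h_i for l_i, h_i in zip(row, h)) for row in L]

# toy vocab of 3 words, embedding/hidden size 2; rows of L are word embeddings
L = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h = [0.2, 0.9]
logits = tied_logits(L, h)  # one logit per word, no separate output matrix
```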
<h4 id="softmax-approximations-by-sampling">Softmax Approximations by Sampling</h4>
<ul>
<li>
<p>Ref to: <a href="http://ruder.io/word-embeddings-softmax/index.html#samplingbasedapproaches">http://ruder.io/word-embeddings-softmax/index.html#samplingbasedapproaches</a></p>
</li>
<li>
<p>Given the correct word $w$, and all candidate words $w_i$. For negative log softmax loss, the formula for loss is:</p>
\[J_w=-\log\frac{\exp(h^\top v_{w})}{\sum_{w_i}\exp(h^\top v_{w_i})}=-h^\top v_w+\log\sum_{w_i}\exp(h^\top v_{w_i})\]
<p>where $v_w$ is the output embedding. Denoting $\mathcal{E}(w)=h^\top v_{w}$, taking gradients w.r.t. parameters would give:</p>
\[\begin{align*}
\nabla_\theta J_w & = -\nabla_\theta \mathcal{E}(w)+\nabla_\theta \log\sum_{w_i}\exp(\mathcal{E}(w_i)) \\
& = -\nabla_\theta\mathcal{E}(w)+\sum_{w_i}\frac{\exp(\mathcal{E}(w_i))}{\sum_{w_i'}\exp(\mathcal{E}(w_i'))}\nabla_\theta\mathcal{E}(w_i) \\
& = -\nabla_\theta\mathcal{E}(w)+\sum_{w_i}P(w_i)\nabla_\theta\mathcal{E}(w_i) \\
& = -\nabla_\theta\mathcal{E}(w)+\mathbb{E}_{w_i\sim P}[\nabla_\theta\mathcal{E}(w_i)]
\end{align*}\]
<p>where $P(w_i)$ is the softmax probability of $w_i$.</p>
<p>Sampling methods reduce computational complexity by approximating the expected term.</p>
</li>
</ul>
<h5 id="importance-sampling">Importance sampling</h5>
<ul>
<li>
<p>Expectation can be calculated using Monte Carlo methods: average of samples multiplied by its probability.</p>
</li>
<li>
<p>To avoid computing actual probabilities (which is the same as calculating softmax), sample from another distribution $Q$ similar to the target distribution $P$, for instance, the unigram distribution.</p>
</li>
<li>
<p>Suppose we’re to calculate $\mathbb{E}_{x\sim P}[f(x)]$, which in continuous form is equivalent to</p>
\[\mathbb{E}_{x\sim P}[f(x)]=\int f(x)p(x)\d x\]
<p>where $p(x)$ is the PDF of distribution $P$. We can calculate the integration w.r.t. a different distribution $Q$ with PDF $q(x)$ by evaluating:</p>
\[\int f(x)p(x)\d x=\int \frac{f(x)p(x)}{q(x)}q(x)\d x=\mathbb{E}_{x\sim Q}\left[\frac{f(x)p(x)}{q(x)}\right]\]
<p>When $Q$ is similar to $P$, doing Monte Carlo integration w.r.t. $Q$ can decrease variance compared to using uniform distribution.</p>
</li>
<li>
<p>To avoid weighting the gradients with $P$, we need to approximate $P$ as well. Denote $P(w)=\frac{\tilde{p}(w)}{Z_p}$, where $Z_p$ is the partition function, and $\tilde{p}(w)=\exp(\mathcal{E}(w))$ is the unnormalized probability of distribution $P$. We can rewrite the expectation as:</p>
\[\begin{align*}
\mathbb{E}_{w_i\sim P}[\nabla_\theta\mathcal{E}(w_i)] & = \mathbb{E}_{\tilde{w}_i\sim Q}\left[\frac{P(\tilde{w}_i)}{Q(\tilde{w}_i)}\nabla_\theta\mathcal{E}(\tilde{w}_i)\right] \\
& \approx \frac{1}{m}\sum_{i=1}^{m}\frac{P(\tilde{w}_i)}{Q(\tilde{w}_i)}\nabla_\theta\mathcal{E}(\tilde{w}_i) \\
& = \frac{Z_q}{Z_p}\frac{1}{m}\sum_{i=1}^{m}\frac{\tilde{p}(\tilde{w}_i)}{\tilde{q}(\tilde{w}_i)}\nabla_\theta\mathcal{E}(\tilde{w}_i)
\end{align*}\]
<p>where $\tilde{w}_i$ are $m$ samples from distribution $Q$ used in a Monte Carlo estimator. We can apply the same technique in approximating the partition function:</p>
\[\begin{align*}
\frac{Z_p}{Z_q} & =\frac{1}{Z_q}\sum_{w}\tilde{p}(w) \\
& = \sum_w \frac{Q(w)}{\tilde{q}(w)}\tilde{p}(w) \\
& = \mathbb{E}_{w\sim Q}\left[\frac{\tilde{p}(w)}{\tilde{q}(w)}\right] \approx \frac{1}{m}\sum_{i=1}^{m} \frac{\tilde{p}(\tilde{w}_i)}{\tilde{q}(\tilde{w}_i)}
\end{align*}\]
</li>
<li>
<p>Combining the above formulae gives us an unbiased estimator of the expectation:</p>
\[\begin{align*}
\mathbb{E}_{w_i\sim P}[\nabla_\theta\mathcal{E}(w_i)] & \approx \sum_{i=1}^{m}\frac{\tilde{p}(\tilde{w}_i)/Q(\tilde{w}_i)}{\sum_k \tilde{p}(\tilde{w}_k)/Q(\tilde{w}_k)} \nabla_\theta\mathcal{E}(\tilde{w}_i) \\
& = \sum_{i=1}^{m}\frac{\exp(\mathcal{E}(\tilde{w}_i))/Q(\tilde{w}_i)}{\sum_k \exp(\mathcal{E}(\tilde{w}_k))/Q(\tilde{w}_k)} \nabla_\theta\mathcal{E}(\tilde{w}_i) \\
& = \nabla\log\sum_{i=1}^{m}\frac{\exp(\mathcal{E}(\tilde{w}_i))}{Q(\tilde{w}_i)}
\end{align*}\]
<p>which gives us our actual objective:</p>
\[J_w \approx -\mathcal{E}(w) + \log\sum_{i=1}^{m}\frac{\exp(\mathcal{E}(\tilde{w}_i))}{Q(\tilde{w}_i)}=-\mathcal{E}(w) + \log\sum_{i=1}^{m}\exp(\mathcal{E}(\tilde{w}_i)-\log Q(\tilde{w}_i))\]
<p>The latter form is numerically more stable, and the log-sum-exp trick could be applied.</p>
</li>
<li>
<p>Note that the new objective is also an approximation of the original one. We can see the denominator of the softmax as an expectation w.r.t. a uniform distribution, and here we’re approximating it with distribution $Q$. This is not very accurate, though; for evaluation, a full softmax is still required.</p>
</li>
<li>
<p>Also refer to <a href="#implementation-details"><strong>Implementation Details</strong></a>. See also <em>Pattern Recognition and Machine Learning</em> Ch. 11.1.4.</p>
</li>
</ul>
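<p>A toy end-to-end check of the importance-sampled objective on a 5-word vocabulary (plain Python; note the extra $1/m$ factor in the estimator below only shifts the loss by the constant $\log m$, leaving gradients unchanged, but makes the value directly comparable to the exact loss):</p>

```python
import math
import random

def full_softmax_loss(scores, w):
    """Exact negative log softmax loss: -E(w) + log Z."""
    return -scores[w] + math.log(sum(math.exp(s) for s in scores))

def importance_sampled_loss(scores, w, Q, m, rng):
    """Approximate log Z with m samples drawn from the proposal Q,
    weighting each sample by 1 / (m * Q(w_i))."""
    samples = rng.choices(range(len(scores)), weights=Q, k=m)
    z_est = sum(math.exp(scores[i]) / (m * Q[i]) for i in samples)
    return -scores[w] + math.log(z_est)

rng = random.Random(0)
scores = [0.5, 1.0, -0.3, 2.0, 0.1]  # toy "vocabulary" of 5 words
Q = [0.2] * 5                        # uniform proposal, for simplicity
approx = importance_sampled_loss(scores, 0, Q, m=50000, rng=rng)
exact = full_softmax_loss(scores, 0)
```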
<h5 id="noise-contrastive-estimation-nce">Noise contrastive estimation (NCE)</h5>
<ul>
<li>
<p>Ref to: <a href="https://datascience.stackexchange.com/questions/13216/intuitive-explanation-of-noise-contrastive-estimation-nce-loss">https://datascience.stackexchange.com/questions/13216/intuitive-explanation-of-noise-contrastive-estimation-nce-loss</a> and <a href="https://arxiv.org/pdf/1410.8251.pdf">https://arxiv.org/pdf/1410.8251.pdf</a></p>
</li>
<li>
<p>Language modeling can be seen as a multinomial classification problem (predicting the label of the next word). We can convert this into a binary classification problem.</p>
</li>
<li>
<p>Train the LM as is but w/o the final output layer. Jointly train an extra binary classifier to distinguish noise (randomly chosen words) against correct words $w$ given the context $c$.</p>
</li>
<li>
<p>For each word, sample $k$ noises $\tilde{w}_{k}$ from noise distribution $Q$. Minimize per-category cross-entropy loss using logistic regression, giving the loss function, and substituting expectation with Monte Carlo sampling:</p>
\[\begin{align*}
J_w & \stackrel{\phantom{M.C.}}{=} - \log P(y=1\mid w,c) - k\cdot\mathbb{E}_{\tilde{w}_{j}\sim Q}[\log P(y=0\mid \tilde{w}_{j},c)] \\
& \stackrel{M.C.}{=} - \log P(y=1\mid w,c) - \sum_{j=1}^{k}\log P(y=0\mid \tilde{w}_{j},c)
\end{align*}\]
<p>The reason we use an expectation for the noise term but not for the positive term is that, when summed over all training data, the positive part equals the entropy computed over the whole dataset (i.e. the entire distribution).</p>
</li>
<li>
<p>So samples come from a mixture of two distributions: the actual empirical distribution $\tilde{P}$ from data (the distribution we’re trying to model), and the noise distribution $Q$. We replace the empirical distribution with the learned distribution $P_\theta$ of our model, which gives:</p>
\[\begin{align*}
P(w\mid c) & = P(y=0,w\mid c)+P(y=1,w\mid c) \\
& =\frac{k}{k+1}Q(w)+\frac{1}{k+1}P_\theta(w\mid c) \\
P(y=1\mid w,c) & = \frac{P(y=1,w\mid c)}{P(w\mid c)}=\frac{P_\theta(w\mid c)}{P_\theta(w\mid c)+k\cdot Q(w)} \\
P(y=0\mid w,c) & = 1-P(y=1\mid w,c)=\frac{k\cdot Q(w)}{P_\theta(w\mid c)+k\cdot Q(w)}
\end{align*}\]
</li>
<li>
<p>Substituting probabilities into the loss function, we can calculate its gradients as follows:</p>
\[\begin{align*}
\nabla J_w & = -\nabla\log P(y=1\mid w,c) - k\cdot\mathbb{E}_{w_j\sim Q}[\nabla\log P(y=0\mid w_j,c)] \\
& = -\nabla\log P(y=1\mid w,c) - \sum_{w_j\in V}k\cdot Q(w_j)\nabla\log P(y=0\mid w_j,c) \\
& = -\frac{k\cdot Q(w)}{P_\theta(w\mid c)+k\cdot Q(w)}\cdot\nabla\log P_\theta(w\mid c)+\sum_{w_j\in V}\frac{k\cdot Q(w_j)}{P_\theta(w_j\mid c)+k\cdot Q(w_j)}\nabla P_\theta(w_j\mid c) \\
& = -\sum_{w_j\in V}\frac{k\cdot Q(w_j)}{P_\theta(w_j\mid c)+k\cdot Q(w_j)}\left(\tilde{P}(w_j\mid c)-P_\theta(w_j\mid c)\right)\nabla\log P_\theta(w_j\mid c)
\end{align*}\]
<p>where the empirical distribution $\tilde{P}(w_j\mid c)$ equals 1 iff $w_j=w$.</p>
<p>We can observe that when $k\rightarrow\infty$, the gradient $\nabla J_w\rightarrow -\sum\left(\tilde{P}(w_j\mid c)-P_\theta(w_j\mid c)\right)\nabla\log P_\theta(w_j\mid c)$, which goes to zero as $P_\theta$ matches $\tilde{P}$.</p>
</li>
<li>
<p>But $P_\theta(w\mid c)=\mathrm{softmax}(h^\top v_w)$, which is what we need to estimate. We can replace it by $P_\theta(w\mid c)=\exp(h^\top v_w)/Z(c)$, where $Z(c)$ is trainable. Or simply, let $Z(c)\equiv 1$, giving $P_\theta(w\mid c)=\exp(h^\top v_w)$.</p>
</li>
<li>
<p><strong>Note</strong>: Performance is poor?</p>
</li>
</ul>
<h5 id="negative-sampling">Negative sampling</h5>
<ul>
<li>
<p>An approximation to NCE, by setting the most expensive term $k\cdot Q(w)\equiv1$, giving:</p>
\[P(y=1\mid w,c)=\frac{\exp(h^\top v_w)}{\exp(h^\top v_w)+1}=\frac{1}{1+\exp(-h^\top v_w)}=\sigma(h^\top v_w)\]
<p>where $\sigma$ is the sigmoid function.</p>
</li>
<li>
<p>Equivalent to NCE only when $k=\lvert V\rvert$ and $Q$ is a uniform distribution.</p>
</li>
<li>
<p><strong>Note</strong>: Inappropriate for language modeling, because probabilistic information is lost. Good for representation learning, as in word2vec.</p>
</li>
</ul>
<h4 id="locality-sensitive-hashing">Locality Sensitive Hashing</h4>
<ul>
<li>
<p>A set of hash functions for approximated nearest neighbor search</p>
</li>
<li>
<p><strong>Hyperplane LSH for cosine similarities</strong>: Draw random vectors from normal distribution. For each stored point, check which side of the plane it is at (the sign of their dot product), and encode such information as a 01-string. Such string is used as the hash signature.</p>
</li>
<li>
<p>Results are not good.</p>
<ul>
<li>For 10k 128-dim points, best point found by LSH ranked ~23 among actual NNs.</li>
<li>This makes it unsuitable for softmax approximations</li>
</ul>
</li>
</ul>
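<p>A minimal sketch of the hyperplane signature (plain Python): nearby directions agree on most bits, while the opposite direction flips every bit:</p>

```python
import random

def lsh_signature(v, planes):
    """Hyperplane LSH: one bit per random hyperplane, given by the sign of
    the dot product; similar directions tend to share many bits."""
    return tuple(int(sum(p_i * v_i for p_i, v_i in zip(p, v)) >= 0)
                 for p in planes)

rng = random.Random(0)
dim, n_planes = 8, 16
planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]

v = [rng.gauss(0, 1) for _ in range(dim)]
v_close = [x + 0.01 * rng.gauss(0, 1) for x in v]  # a tiny perturbation of v
v_opposite = [-x for x in v]                       # the opposite direction

sig = lsh_signature(v, planes)
sig_close = lsh_signature(v_close, planes)
sig_opposite = lsh_signature(v_opposite, planes)
```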
<h4 id="xavier-initializer--he-initializer">Xavier Initializer & He Initializer</h4>
<ul>
<li>
<p><strong>Xavier initializer</strong> was proposed by Xavier Glorot, thus also called Glorot initializer <em>(ref: <a href="http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf">[Glorot & Bengio 2010] Understanding the difficulty of training …</a>)</em></p>
</li>
<li>
<p>When applying a linear transform $W$ to vector $\mathbf{x}$, we have $\mathbf{y}=W\mathbf{x}=\sum_{i=1}^{n}W_i\mathbf{x}_i$, where $n$ is the dimensionality (or the number of input neurons to the FC layer)</p>
</li>
<li>
<p>Assume the input vector has zero mean, and all elements and parameters are IID, we can calculate the variance of $\mathbf{y}$ as follows:</p>
\[\begin{align*}
\mathrm{Var}(\mathbf{y}) & = \mathrm{Var}\left(\sum_{i=1}^{n}W_i\mathbf{x}_i\right)=\sum_{i=1}^{n}\mathrm{Var}(W_i\mathbf{x}_i) \\
& = \sum_{i=1}^{n}\left(\mathbb{E}[\mathbf{x}_i]^2\mathrm{Var}(W_i) + \mathbb{E}[W_i]^2\mathrm{Var}(\mathbf{x}_i)+\mathrm{Var}(W_i)\mathrm{Var}(\mathbf{x}_i)\right) \\
& = n\mathrm{Var}(W_i)\mathrm{Var}(\mathbf{x}_i)
\end{align*}\]
</li>
<li>
<p>This means the variance is scaled by $n\mathrm{Var}(W_i)$ after the transform. In order to preserve variance, Xavier initializer aims to set the variance of the weights to $\mathrm{Var}(W_i)=\frac{1}{n}=\frac{1}{n_\mathrm{in}}$. If we consider backwards pass, we would find that we need $\mathrm{Var}(W_i)=\frac{1}{n_\mathrm{out}}$, so as a compromise, variance is set to:</p>
\[\mathrm{Var}(W_i)=\frac{2}{n_\mathrm{in}+n_\mathrm{out}}\]
<p>where $n_\mathrm{in}$ and $n_\mathrm{out}$ correspond to the dimensions $n$ and $m$ of the transform matrix.</p>
</li>
<li>
<p>To obtain such variance, consider a uniform distribution $U[-x,x]$ whose variance is $\mathrm{Var}(U)=\frac{x^2}{3}$. Solving the equation gives us</p>
\[W\sim U\Bigg[-\frac{1}{f'(0)}\sqrt{\frac{6}{n+m}},\frac{1}{f'(0)}\sqrt{\frac{6}{n+m}}\Bigg]\]
<p>where $f$ is the nonlinearity after the transform.</p>
</li>
<li>
<p><strong>He initializer</strong> was proposed by Kaiming He et al. It simply multiplies the Xavier initializer variance by 2. This is useful for ReLU nonlinearities, whose derivative is undefined at 0. It also makes sense in that ReLU’s derivative is 0 half the time and 1 for the other half. <em>(ref: <a href="https://arxiv.org/pdf/1502.01852">[He 2015] Delving Deep into Rectifiers…</a>)</em></p>
</li>
<li>
<p>Ref to: <a href="http://andyljones.tumblr.com/post/110998971763/an-explanation-of-xavier-initialization">http://andyljones.tumblr.com/post/110998971763/an-explanation-of-xavier-initialization</a></p>
</li>
</ul>
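<p>A small sketch of both initializers (plain Python; drawing from $U[-a,a]$ with $a=\sqrt{6/(n_\mathrm{in}+n_\mathrm{out})}$ gives the Xavier variance $2/(n_\mathrm{in}+n_\mathrm{out})$):</p>

```python
import math
import random

def xavier_uniform(n_in, n_out, rng):
    """Xavier/Glorot: U[-a, a] with a = sqrt(6 / (n_in + n_out)),
    i.e. Var(W) = 2 / (n_in + n_out)."""
    a = math.sqrt(6.0 / (n_in + n_out))
    return [[rng.uniform(-a, a) for _ in range(n_out)] for _ in range(n_in)]

def he_std(n_in):
    """He: Gaussian std with Var(W) = 2 / n_in, suited to ReLU layers."""
    return math.sqrt(2.0 / n_in)

rng = random.Random(0)
n_in, n_out = 400, 200
flat = [w for row in xavier_uniform(n_in, n_out, rng) for w in row]
var = sum(w * w for w in flat) / len(flat)  # close to 2 / (n_in + n_out)
```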
<h4 id="tuning-on-the-development-set">Tuning on the Development Set</h4>
<ul>
<li>
<p><strong>Early-stopping</strong>: When results do not get better on the dev set, simply stop training. Usually there’s a threshold (or <strong>patience</strong>) for how many epochs with worse results are tolerated.</p>
</li>
<li>
<p><strong>Rollback</strong>: When results do not get better, simply load the best previous model and decay the learning rate. The best models are usually saved to disk.</p>
</li>
<li>
<p><strong>Rollback Optimizer</strong>: Regarding the optimizer (we’re only concerned about the optimizer statistics, e.g. parameter momentum, but not the hyper-params e.g. learning rate), 3 strategies are possible:</p>
<ol>
<li>Load the optimizer at the time of the best model snapshot. This requires saving the optimizer state as well.</li>
<li>Reset optimizer statistics. For common optimizers this means zeroing the momentum and moments.</li>
<li>Use the current optimizer as-is. Its statistics were accumulated during the stretch of training where dev performance got worse.</li>
</ol>
<p>Usually the effectiveness of the 3 methods is in the order of their numbering. However, method 2 has the benefit of being able to escape local minima, as the initial step after the reset would be a large one.</p>
</li>
</ul>
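<p>A minimal sketch of early stopping with patience (plain Python; <code>dev_scores</code> is a made-up stand-in for per-epoch dev accuracy):</p>

```python
def train_with_patience(dev_scores, patience=3):
    """Early stopping: stop after `patience` consecutive epochs without
    improvement on the dev set; report the best epoch and score seen."""
    best, best_epoch, bad = float('-inf'), -1, 0
    for epoch, score in enumerate(dev_scores):
        if score > best:
            best, best_epoch, bad = score, epoch, 0  # improvement: reset patience
        else:
            bad += 1
            if bad >= patience:
                break  # patience exhausted
    return best_epoch, best

scores = [0.61, 0.68, 0.72, 0.71, 0.73, 0.70, 0.69, 0.68, 0.71]
result = train_with_patience(scores, patience=3)  # stops before the last epoch
```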
<h2 id="theory--proofs">Theory & Proofs</h2>
<h4 id="on-gradient-vanishingexploding-of-rnns">On Gradient Vanishing/Exploding of RNNs</h4>
<ul>
<li><em>(ref: <a href="https://arxiv.org/pdf/1607.03474">[Zilly 2016] Recurrent Highway Networks</a>, chapter 2)</em></li>
</ul>
<p>A vanilla RNN can be described as</p>
\[y^{(t)}=f\left(Wx^{(t)}+Ry^{(t-1)}+b\right)\]
<p>For simplicity, suppose the loss is defined on the last state only, i.e. $\mathcal{L}=g\left(y^{(T)}\right)$. The gradient w.r.t. parameters would be</p>
\[\frac{\d\mathcal{L}}{\d\theta}=\frac{\d\mathcal{L}}{\d y^{(T)}}\frac{\d y^{(T)}}{\d \theta}=\frac{\d\mathcal{L}}{\d y^{(T)}}\sum_{t_1=1}^{T}\frac{\d y^{(T)}}{\d y^{(t_1)}}\left(\frac{\d y^{(t_1)}}{\d W}+\frac{\d y^{(t_1)}}{\d b}\right)\]
<p>In the formula above, the gradient is expanded using the chain rule, and then expanded along the time axis. We further expand the Jacobian $\frac{\d y^{(T)}}{\d y^{(t_1)}}$:</p>
\[\frac{\d y^{(T)}}{\d y^{(t_1)}}=\prod_{t_1<t\leq T}\frac{\d y^{(t)}}{\d y^{(t-1)}}=\prod_{t_1<t\leq T}R\cdot\mathrm{diag}\left[f'\left(Ry^{(t-1)}\right)\right]\]
<p>Denoting $A=\frac{\d y^{(t)}}{\d y^{(t-1)}}$ as the temporal Jacobian, the upper bound for its norm would be</p>
\[\Vert A\Vert\leq \Vert R\Vert \left\Vert f'\left(Ry^{(t-1)}\right)\right\Vert\leq \sigma_\max\cdot\gamma\]
<p>where $\sigma_\max$ is the largest singular value of $R$, and $\gamma$ is the upper bound on $f’$.</p>
<p>As gradients flow backwards in time, the temporal Jacobians are multiplied together, which is approximately equivalent to raising $A$ to the $T$-th power. The conditions for vanishing/exploding gradients are therefore:</p>
<ul>
<li>
<p><strong>Vanishing gradients</strong>: $\gamma\sigma_\max<1$</p>
</li>
<li>
<p><strong>Exploding gradients</strong>: $\rho(A)>1$, where $\rho$ denotes the spectral radius of $A$</p>
</li>
</ul>
<p><strong>Comparing to MLPs</strong>: The reason deep CNNs/MLPs suffer less from gradient vanishing/exploding than RNNs do is that MLPs use different matrices at different layers, while RNNs use the same matrix in every time step.</p>
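<p>The multiplied-Jacobian argument is easy to verify numerically. The toy sketch below ignores the nonlinearity bound $\gamma$ and simply pushes a gradient vector back through $T$ identical linear steps, after rescaling the matrix so its principal singular value is below or above 1:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((32, 32))
R = (A + A.T) / 2  # symmetric, so the principal singular value equals the spectral radius


def grad_norm_after(R, T):
    """Norm of a gradient vector pushed back through T identical time steps."""
    g = np.ones(R.shape[0])
    for _ in range(T):
        g = R.T @ g
    return float(np.linalg.norm(g))


sigma_max = np.linalg.svd(R, compute_uv=False)[0]
vanish = grad_norm_after(0.5 * R / sigma_max, 60)   # principal singular value 0.5
explode = grad_norm_after(2.0 * R / sigma_max, 60)  # principal singular value 2.0
```

<p>With the singular value at 0.5, the gradient norm decays roughly like $0.5^{60}$; at 2.0 it grows roughly like $2^{60}$.</p>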
<h4 id="on-the-effectiveness-of-lstms">On the Effectiveness of LSTMs</h4>
<ul>
<li><em>(ref: <a href="http://proceedings.mlr.press/v37/jozefowicz15.pdf">[Jozefowicz 2015] An Empirical Exploration of RNN structures</a>, chapter 2)</em></li>
</ul>
<p>In their simplest forms, the RNN calculates the new state $h^{(t)}$ by $h^{(t)}=f(Wh^{(t-1)})$, while the LSTM (without forget gates) calculates the new state by $c^{(t)}=c^{(t-1)}+i_\mathrm{g}^{(t)}f(Wc^{(t-1)})$, $h^{(t)}=o_\mathrm{g}^{(t)}c^{(t)}$.</p>
<p>The temporal Jacobian here would be</p>
\[\frac{\d c^{(t)}}{\d c^{(t-1)}}=1\]
<p>Or to put it simply: to obtain the state at time step $t$, RNNs apply the transformation $f$ $t$ times, while LSTMs calculate an increment at each time step and sum the increments up.</p>
<h2 id="implementation-details">Implementation Details</h2>
<h4 id="about-dynet">About DyNet</h4>
<ul>
<li>
<p>Transpose (<code class="language-plaintext highlighter-rouge">dy.transpose</code>) requires making a copy of the matrix, as do <code class="language-plaintext highlighter-rouge">dy.concatenate_cols</code> and similar functions.</p>
</li>
<li>
<p><code class="language-plaintext highlighter-rouge">lstm.disable_dropout</code> does not work, use <code class="language-plaintext highlighter-rouge">lstm.set_dropouts(0, 0)</code> instead.</p>
</li>
<li>
<p>Load parameters of LSTM initial state (as in truncated backprop) by:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">s</span> <span class="o">=</span> <span class="p">[</span><span class="n">vec</span><span class="p">.</span><span class="n">npvalue</span><span class="p">().</span><span class="n">reshape</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="n">batch_size</span><span class="p">)</span> <span class="k">for</span> <span class="n">vec</span> <span class="ow">in</span> <span class="n">state</span><span class="p">.</span><span class="n">s</span><span class="p">()]</span>
<span class="c1"># dy.renew_cg()
</span><span class="n">state</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">lstm</span><span class="p">.</span><span class="n">initial_state</span><span class="p">().</span><span class="n">set_s</span><span class="p">([</span><span class="n">dy</span><span class="p">.</span><span class="n">inputTensor</span><span class="p">(</span><span class="n">vec</span><span class="p">,</span> <span class="n">batched</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span> <span class="k">for</span> <span class="n">vec</span> <span class="ow">in</span> <span class="n">s</span><span class="p">])</span>
</code></pre></div> </div>
</li>
<li>
<p>Use <code class="language-plaintext highlighter-rouge">dy.affine_transform([b, W, x])</code> for a linear layer with bias; this is more efficient.</p>
</li>
<li>
<p><code class="language-plaintext highlighter-rouge">dy.log_softmax</code> is more efficient than <code class="language-plaintext highlighter-rouge">dy.log(dy.softmax(x))</code>, and prevents numerical problems. Similarly, <code class="language-plaintext highlighter-rouge">dy.pickneglogsoftmax</code> is better than <code class="language-plaintext highlighter-rouge">dy.log_softmax</code> then <code class="language-plaintext highlighter-rouge">dy.pick</code>.</p>
</li>
<li>
<p>Note the difference between <code class="language-plaintext highlighter-rouge">dy.pick</code>, <code class="language-plaintext highlighter-rouge">dy.pick_batch</code> and <code class="language-plaintext highlighter-rouge">dy.pick_batch_elem</code>.</p>
</li>
</ul>
<h4 id="log-probability-domain">Log-probability Domain</h4>
<ul>
<li>
<p>Addition in log domain is done by the log-sum-exp operation $\ln\sum\exp(x_i)$.</p>
</li>
<li>
<p>DyNet has <code class="language-plaintext highlighter-rouge">dy.logsumexp</code>, and so does Numpy.</p>
</li>
<li>
<p>See computation tricks at <a href="https://hips.seas.harvard.edu/blog/2013/01/09/computing-log-sum-exp/">https://hips.seas.harvard.edu/blog/2013/01/09/computing-log-sum-exp/</a></p>
</li>
</ul>
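<p>A minimal numpy version of the stable log-sum-exp trick (shift by the maximum so the largest exponent is $e^0=1$ and nothing overflows):</p>

```python
import numpy as np


def log_sum_exp(x):
    """Numerically stable log(sum(exp(x))): subtract the max before exponentiating."""
    m = np.max(x)
    return float(m + np.log(np.sum(np.exp(x - m))))
```

<p>A naive <code class="language-plaintext highlighter-rouge">np.log(np.sum(np.exp(x)))</code> would overflow to <code class="language-plaintext highlighter-rouge">inf</code> for log-probabilities around $\pm 1000$.</p>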
<h4 id="importance-sampling-1">Importance Sampling</h4>
<ul>
<li>
<p>There’s a chance that the ground-truth $w$ is not included in the approximation of the denominator. This could lead to a negative loss, which is bad for optimization.</p>
</li>
<li>
<p>For a biased solution, we can simply include all targets in the samples. Although effective, this leads to a biased estimator, and deprives the loss of its probabilistic information (it cannot be used to evaluate perplexity).</p>
</li>
<li>
<p>For an unbiased solution, we can modify the proposal distribution to bias towards the targets. What differs from the above solution is that we still <em>sample</em> from the distribution, rather than forcibly modifying the samples.</p>
</li>
<li>
<p>For efficient calculation, the same samples can be shared across the mini-batch and across time steps. <em>(ref: <a href="https://arxiv.org/pdf/1602.02410">[Jozefowicz et al. 2016] Exploring the Limits of Language Modeling</a>)</em></p>
</li>
</ul>
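<p>A toy sketch of the importance-sampling estimate of the softmax denominator: with a proposal $Q$ (uniform here for simplicity; in practice a unigram distribution), $Z=\sum_w e^{s_w}=\mathbb{E}_{w\sim Q}\left[e^{s_w}/Q(w)\right]$ can be approximated from a few thousand samples instead of summing over the full vocabulary:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
logits = rng.standard_normal(10_000)           # scores over a large vocabulary
true_log_z = np.log(np.sum(np.exp(logits)))    # exact log partition (intractable at scale)

vocab, k = len(logits), 2_000
q = np.full(vocab, 1.0 / vocab)                # proposal Q: uniform over the vocabulary
samples = rng.choice(vocab, size=k, p=q)
# Unbiased Monte Carlo estimate of Z: mean of exp(s_w) / Q(w) over samples from Q.
est_z = np.mean(np.exp(logits[samples]) / q[samples])
est_log_z = np.log(est_z)
```

<p>The estimator is unbiased for $Z$ (though $\log$ of it is slightly biased for $\log Z$); a proposal closer to the true softmax distribution would reduce its variance.</p>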
<h2 id="models--structures">Models & Structures</h2>
<h4 id="latent-predictor-networks-by-w-ling-et-al"><a href="https://arxiv.org/pdf/1603.06744.pdf">Latent predictor networks</a> by W. Ling et al</h4>
<ul>
<li>
<p>When using a normal RNN model, after inference is made, we can backtrack from the final state through to the initial state, resulting in a path</p>
</li>
<li>
<p>This is because RNNs generate one token at a time, and at each time step sample the next token according to the calculated probabilities</p>
</li>
<li>
<p>If RNNs can generate multiple tokens in one time step, there may be <u>multiple paths from the initial state to the target state, corresponding to segmentations of the sequence</u>. Path counts can be exponential in the sequence length, and the union of the paths is a directed acyclic graph</p>
</li>
<li>
<p>Paper proposes a method to perform <u>joint training on several predictors of different granularity</u>. The method introduced latent variables for deciding which predictor to use, thus giving it the name</p>
</li>
<li>
<p>To calculate gradients for a time step, <u>summed products of probabilities on the DAG</u> are required, which can be calculated using a dynamic programming algorithm</p>
</li>
<li>
<p>Attention over all different fields in a structured input is used</p>
</li>
<li>
<p>Prediction is done using beam search</p>
</li>
<li>
<p>The authors utilized this technique in code generation tasks for card games, where a character-level LSTM predictor is jointly trained with pointer networks for copying text directly from card descriptions</p>
</li>
</ul>
<h4 id="neural-lattice-language-models-by-jacob-grahams-grad-student"><u>Neural lattice language models</u> by Jacob (Graham’s grad. student)</h4>
<ul>
<li>
<p>Main idea is similar: enabling LSTM models to <u>generate multiple tokens in one time step</u></p>
</li>
<li>
<p>Exact probability is hard to calculate, as LSTMs keep track of the whole context seen from the initial state, so each path would have a different state</p>
</li>
<li>
<p>Paper evaluated different approaches of probability estimations and different representations of multiple-tokens in one time step <em><u>(more details to be described)</u></em></p>
<ul>
<li>
<p>Ancestral sampling from “latent sequence decompositions”: just treat multiword tokens as regular tokens</p>
</li>
<li>
<p>TreeLSTM-style summation: summing predecessors’ hidden states. Cons: loses probabilistic info</p>
</li>
<li>
<p>Weighted expectation: weight summations using prob. dist. learned in ancestral sampling</p>
</li>
</ul>
</li>
<li>
<p><strong>Difference with “<u>latent predictor networks</u>“</strong>:</p>
<ul>
<li>
<p>Latent predictor networks combine multiple predictor models, while this is one unified model</p>
</li>
<li>
<p>The reason why probabilities are easy to calculate in said paper is due to the fact that, although predictors of different granularity are used, <u>all predicted tokens are in the same space, and multiple tokens are fed into the character-level network one-by-one</u>. Hidden states of the char-level network are in turn used in the pointer network. So only O(length) states are required in total</p>
</li>
</ul>
</li>
</ul>
<h4 id="pointer-networks-by-o-vinyals-et-al"><a href="https://arxiv.org/pdf/1506.03134">Pointer Networks</a> by O. Vinyals et al</h4>
<ul>
<li>
<p>Output is a set of tokens from the input, instead of a fixed vocabulary</p>
</li>
<li>
<p>Basically a seq2seq model with attention, but uses attention weights directly as the probability of predicting each input token</p>
</li>
<li>
<p>Can be trained to select ordered subsets of the input, even accomplishing difficult tasks such as convex hulls, Delaunay triangulation, and TSP</p>
</li>
<li>
<p>See also: <a href="http://fastml.com/introduction-to-pointer-networks/">http://fastml.com/introduction-to-pointer-networks/</a></p>
</li>
</ul>
<h4 id="treelstms-by-k-s-tai-r-socher-and-christopher-d-manning"><a href="https://arxiv.org/pdf/1503.00075.pdf">TreeLSTMs</a> by K. S. Tai, R. Socher, and Christopher D. Manning</h4>
<ul>
<li>
<p>A natural generalization of LSTM to tree structures</p>
</li>
<li>
<p>Sum children’s hidden states as $\tilde{h}$, and substitute it for $h$ in the formulas for normal LSTMs</p>
</li>
<li>
<p>Forget gates are different for each child: only that child’s hidden state is used to calculate its forget gate activation</p>
</li>
<li>
<p>Cell state of parent is as usual, summing over cell states of each child with respective forget gates</p>
</li>
<li>
<p>An ordered-children version exists: use different parameters for each child (depending on its index). Such a model has a limit on the maximum branching factor</p>
</li>
<li>
<p><strong>Benefits</strong>: Can make use of sentence structures generated by parsers; better at preserving state, i.e. can cope better with long distance dependencies (since path lengths are shorter on trees)</p>
</li>
</ul>
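<p>A simplified numpy sketch of one Child-Sum TreeLSTM node update as described above. The stacked-weight layout and gate ordering below are my own convention, not from the paper:</p>

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def child_sum_treelstm_node(x, child_h, child_c, W, U, b):
    """One Child-Sum TreeLSTM node update (simplified sketch).

    x: (d,) input; child_h, child_c: (num_children, d) children states;
    W, U: (4d, d) stacked input/recurrent weights (gate order: i, o, u, f); b: (4d,).
    """
    d = x.shape[0]
    h_tilde = child_h.sum(axis=0)             # sum of children hidden states
    a = W @ x
    z = a + U @ h_tilde + b
    i = sigmoid(z[:d])                        # input gate
    o = sigmoid(z[d:2 * d])                   # output gate
    u = np.tanh(z[2 * d:3 * d])               # candidate update
    # Each child's forget gate depends on that child's own hidden state,
    # not on the sum h_tilde.
    f = sigmoid(a[3 * d:] + child_h @ U[3 * d:].T + b[3 * d:])  # (num_children, d)
    c = i * u + (f * child_c).sum(axis=0)     # gated sum over children cell states
    h = o * np.tanh(c)
    return h, c
```

<p>With zero children, <code class="language-plaintext highlighter-rouge">h_tilde</code> and the forget-gate sum vanish and the update reduces to an ungated leaf cell.</p>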
<h4 id="adapted-softmax-by-grave-et-al"><a href="https://arxiv.org/pdf/1609.04309.pdf">Adapted Softmax</a> by Grave et al</h4>
<ul>
<li>Efficient for large vocabularies, and optimized according to empirical analysis of matrix multiplication speed on GPUs:
<ul>
<li>For matrices with dimensions $k$, the empirical formula for matrix multiplication time cost is $g(k)=c_\mathrm{m}+\max(0,\lambda(k-k_0))$, where typically $c_\mathrm{m}=0.40\ \mathrm{ms}$, and $k_0=50$.</li>
</ul>
</li>
<li>Proposed structure is a two level hierarchical softmax:
<ul>
<li>First level contains all common words (~20% of words covering ~80% of the corpus), and representations of clusters</li>
<li>Remaining words are grouped into clusters according to frequency. Words with lower frequency fall into larger clusters.</li>
</ul>
</li>
</ul>
<h4 id="highway-networks-by-srivastava-et-al"><a href="https://arxiv.org/pdf/1505.00387">Highway Networks</a> by Srivastava et al</h4>
<ul>
<li>Simply put, <strong>Highway Networks</strong> add LSTM-style input (a.k.a. transfer) and forget (a.k.a. carry) gates to normal NN layers. The proposed structure uses tied gates (i.e. $f_\mathrm{g}=1-i_\mathrm{g}$). Such a structure can be applied to very deep NNs to help training.</li>
<li>In practice, transfer gates are initialized with a negative bias to bias the network towards carry behavior. The intuition is the same as initializing forget gate biases to 1 or 2 to enable gradient flow at early stages and preserve long-term memories <em>(ref: <a href="https://pdfs.semanticscholar.org/1154/0131eae85b2e11d53df7f1360eeb6476e7f4.pdf">[Gers 1999] Learning to Forget…</a>)</em>.</li>
</ul>
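<p>A minimal numpy sketch of one highway layer with tied gates. Note how a strongly negative transfer-gate bias pushes the layer towards carry behavior at initialization:</p>

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def highway_layer(x, W_h, b_h, W_t, b_t):
    """y = t * H(x) + (1 - t) * x, with the carry gate tied as 1 - t."""
    h = np.tanh(W_h @ x + b_h)       # transform path H(x)
    t = sigmoid(W_t @ x + b_t)       # transfer gate
    return t * h + (1.0 - t) * x     # gated mix of transform and carry
```

<p>With <code class="language-plaintext highlighter-rouge">b_t</code> very negative, $t\approx 0$ and the layer is nearly an identity map, so gradients pass through unobstructed early in training.</p>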
<h2 id="topic-language-models">Topic: Language Models</h2>
<h4 id="character-aware-neural-lm-by-kim-et-al"><a href="https://arxiv.org/pdf/1508.06615">Character-Aware Neural LM</a> by Kim et al</h4>
<ul>
<li>Each word is fed into a character-level CNN:
<ul>
<li>Append SOW and EOW tokens to each word (and zero-pad for batching)</li>
<li>Apply 1-d convolution to low-dimension char embeddings</li>
<li>Max-pool over all features for each feature map, and concatenate</li>
</ul>
</li>
<li>
<p>Output is fed through a highway layer, i.e. feeding a part of the vector directly through to the output, and transforming the rest.</p>
</li>
<li>
<p>Output is then fed into an LSTM-LM, making word-level predictions. Two-layer hierarchical softmax is used for large vocabularies.</p>
</li>
<li><strong>Pros & Cons</strong>:
<ul>
<li>Fewer params than vanilla LSTM-LMs (due to the absence of word vectors).</li>
<li>Can deal with OOV inputs, but not outputs; good for morphologically-rich languages.</li>
<li>Computationally more expensive.</li>
</ul>
</li>
</ul>
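<p>The convolution-plus-max-over-time step can be sketched as follows (a toy single-filter-width version; the paper uses multiple filter widths and batching):</p>

```python
import numpy as np


def char_cnn_features(char_embs, filters):
    """Max-over-time pooled features from a 1-d character convolution.

    char_embs: (word_len, emb_dim) embeddings of one (padded) word;
    filters: (num_filters, width, emb_dim). Returns (num_filters,) features.
    """
    n = char_embs.shape[0]
    num_f, w, _ = filters.shape
    # Slide each filter over every window of `w` consecutive characters.
    conv = np.array([[np.sum(char_embs[i:i + w] * f) for i in range(n - w + 1)]
                     for f in filters])       # (num_filters, n - w + 1)
    return np.tanh(conv).max(axis=1)          # max-pool over positions
```

<p>Each filter contributes exactly one feature per word regardless of word length, which is what makes the word-level representation fixed-size.</p>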
<h2 id="topic-embeddings">Topic: Embeddings</h2>
<h4 id="word2vec-by-mikolov-et-al"><u>word2vec</u> by Mikolov et al</h4>
<ul>
<li>
<p>Reference papers: <a href="https://arxiv.org/pdf/1301.3781.pdf">CBOW & Skip-gram models</a>, <a href="http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf">Negative sampling</a>; see also: <a href="http://web.stanford.edu/class/cs224n/lecture_notes/cs224n-2017-notes1.pdf">CS224n Lecture Notes 1</a></p>
</li>
<li><strong>CBOW</strong> predicts the center word given surrounding (context) words. <strong>Skip-gram</strong> predicts surrounding words given the center word.
Both methods learn two representations $\mathcal{U}$ (output) and $\mathcal{V}$ (input) for each word, generating output probabilities by $\hat{y}=\mathrm{softmax}(\mathcal{UV}x)$, where $x$ is a one-hot vector (for skip-gram, or average of one-hot vectors for CBOW).
<ul>
<li>The “two representation” part, in its essence, is a rank constraint on the matrix. Or in other words, a down projection.</li>
</ul>
</li>
<li>
<p>For <strong>negative sampling</strong>, refer to <a href="#negative-sampling">previous section</a>.</p>
</li>
<li>Intuition for vector additive compositionality (e.g. skip-gram):
<ul>
<li>Due to the linearity in the training objective, summing two vectors would result in summing log probabilities in the output layer.</li>
<li>This is equivalent to the product of context word distributions, so words appearing near both words have higher probabilities.</li>
</ul>
</li>
<li>About the impact of <strong>window size</strong> on word representations:
<ul>
<li>Representations learnt with smaller windows are more aware of <strong>syntactic</strong> relations.</li>
<li>With larger windows, <strong>semantic</strong> relations are well captured.</li>
</ul>
</li>
<li>With <strong>subword information</strong> <em>(ref: <a href="https://arxiv.org/pdf/1607.04606">[Bojanwoski 2016] Enriching Word Vectors with Subword Information</a>)</em>
<ul>
<li>Replace word embeddings in skip-gram with the sum of n-gram embeddings.</li>
<li>Boundary symbols are added to the beginning and end of words</li>
</ul>
</li>
<li><strong><a href="https://arxiv.org/pdf/1607.01759">fastText</a></strong> classifier:
<ul>
<li>Bag of words + Bag of bigrams</li>
<li>Feeds average of word/n-gram embeddings through a FC layer</li>
<li>Uses very low dimensions (10 for sentiment, 50-200 for tags), very fast</li>
</ul>
</li>
</ul>
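<p>The skip-gram negative-sampling objective for a single (center, context) pair can be written down directly (a minimal sketch; real implementations batch this and update both embedding tables by SGD):</p>

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sgns_loss(v_center, u_context, u_negatives):
    """Negative-sampling loss for one (center, context) pair:
    -log sigmoid(u_o . v_c) - sum_k log sigmoid(-u_k . v_c).
    Pushes the true pair's score up and the k sampled negatives' scores down."""
    pos = -np.log(sigmoid(u_context @ v_center))
    neg = -np.sum(np.log(sigmoid(-(u_negatives @ v_center))))
    return float(pos + neg)
```

<p>Negatives are drawn from a smoothed unigram distribution in the original paper; here they are simply passed in as vectors.</p>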
<h4 id="context2vec-by-melamud-et-al"><a href="http://aclweb.org/anthology/K16-1006">context2vec</a> by Melamud et al</h4>
<ul>
<li>
<p>Run a Bi-LSTM on the sentence, and use the prefix and suffix hidden state vectors from both LSTMs as the context representation for a given word.</p>
</li>
<li>
<p><strong>Compared to word2vec</strong>: The two models basically do the same thing. As context2vec uses LSTMs to generate context representations, it is capable of handling larger contexts and dealing with long-distance relationships.</p>
</li>
</ul>
<h4 id="paragraph-vector-by-le--mikolov"><a href="https://arxiv.org/pdf/1405.4053">Paragraph Vector</a> by Le & Mikolov</h4>
<ul>
<li>
<p>Assign embeddings to each word and each paragraph. For each window, concat the paragraph vector and the first few word vectors, and feed them through an FC layer to predict the last word. (Similar to CBOW)</p>
</li>
<li>
<p>To generate representation for a new paragraph, train the vector while freezing other parameters.</p>
</li>
<li>
<p>Another method (similar to skip-gram): sample a random window from the paragraph, and predict words in the window given the paragraph vector.</p>
</li>
<li>
<p>Vectors learnt by the two methods are concatenated for use in downstream tasks.</p>
</li>
</ul>
<h4 id="c2w-character-to-word-by-w-ling-et-al"><a href="http://www.cs.cmu.edu/~lingwang/papers/emnlp2015.pdf">C2W</a> (character to word) by W. Ling et al</h4>
<ul>
<li>
<p>Generate word representations by running a char-level Bi-LSTM.</p>
</li>
<li>
<p>The authors do not generate embeddings directly in an unsupervised fashion; the model is used only in downstream tasks (POS-tagging, LM).</p>
</li>
</ul>
<h4 id="tensor-indexing-model-by-y-zhao--zhiyuan-liu"><a href="https://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/view/9597/9526">Tensor Indexing Model</a> by Y. Zhao & Zhiyuan Liu</h4>
<ul>
<li>
<p>Find common two-word phrases from corpus and replace them by a single phrase token.</p>
</li>
<li>
<p>Train using skip-gram objective, obtaining embeddings in the same space for the whole phrase and words in the phrase.</p>
</li>
<li>
<p>Train a composition model approximating the phrase embedding $\mathbf{z}$ given word embeddings $\mathbf{x}$ and $\mathbf{y}$:
\(\mathbf{z}_i=f(\mathbf{x},\mathbf{y})=\mathbf{x}^\top W_i\mathbf{y}+(M\mathbf{x})_i+(N\mathbf{y})_i\)
and apply rank constraints on matrices $W_i$, giving $W_i\approx U_i^\top V_i+I$.</p>
</li>
<li>
<p>Loss function is MSE of $\mathbf{z}$ and $f(\mathbf{x},\mathbf{y})$, plus a regularization factor.</p>
</li>
</ul>
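<p>The composition function can be implemented in a few lines of numpy (a sketch of the formula above; <code class="language-plaintext highlighter-rouge">einsum</code> contracts the bilinear tensor term):</p>

```python
import numpy as np


def compose(x, y, W, M, N):
    """z_i = x^T W_i y + (Mx)_i + (Ny)_i, with W of shape (d, d, d),
    where W[i] is the bilinear matrix for output dimension i."""
    bilinear = np.einsum('i,kij,j->k', x, W, y)  # z_k = sum_ij x_i W[k,i,j] y_j
    return bilinear + M @ x + N @ y
```

<p>The rank constraint from the paper would further factor each $W_i$ as $U_i^\top V_i+I$ to reduce parameters; that is omitted here.</p>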
<h4 id="structured-word2vec-by-w-ling-et-al"><a href="http://www.cs.cmu.edu/~lingwang/papers/naacl2015.pdf">Structured Word2Vec</a> by W. Ling et al</h4>
<ul>
<li>
<p>Simple modifications to original word2vec model, taking <strong>word order</strong> into consideration.</p>
</li>
<li>
<p><strong>Structured Skip-gram</strong>: Use different matrices for each context position.</p>
</li>
<li>
<p><strong>Continuous Window</strong>: Concat context vectors instead of summing them.</p>
</li>
<li>
<p>Such models are better at capturing syntactic information.</p>
</li>
</ul>
<h4 id="sembei-segmentation-free-word-embeddings-by-t-oshikiri"><a href="http://aclweb.org/anthology/D17-1081">sembei</a> (segmentation-free word embeddings) by T. Oshikiri</h4>
<ul>
<li>
<p>Based on skip-gram, trains n-gram embeddings.</p>
</li>
<li>
<p>Only consider cases when the center and context n-grams are adjacent, use separate matrices for left and right contexts (as in <a href="#structured-word2vec-by-w-ling-et-al">Structured Word2Vec</a>).</p>
<ul>
<li>Theoretically capable of dealing with arbitrarily sized windows: a sample is regarded as positive as long as there exists a segmentation of the corpus into the chosen n-grams. But this is computationally expensive.</li>
</ul>
</li>
<li>
<p><strong>Limitations</strong>: N-grams are the model’s vocabulary, i.e. it provides no means of composition, and thus cannot deal with OOV words.</p>
</li>
</ul>
<h2 id="topic-dependency-parsing">Topic: Dependency Parsing</h2>
<h4 id="task-description">Task Description</h4>
<ul>
<li>Generate a tree structure over a sentence, describing dependency of words.</li>
<li>A directed tree, with a label for each edge, denoting the type of dependency.</li>
<li>There can be multiple roots for a single sentence, especially when the sentence is conjoined by conjunctions (e.g. “and”).</li>
</ul>
<h4 id="methods">Methods</h4>
<ul>
<li>
<p>Methods can be basically categorized into one of the two:</p>
</li>
<li>
<p><strong>Transition-based methods</strong>: Shift-reduce parsers that act like a pushdown automaton; at each time step the model can choose to either SHIFT (push a word onto the stack), or REDUCE (pop the top two words from the stack, assign one of them as the parent of the other, and push the parent back onto the stack). More complicated models support other actions. These methods often make local decisions, and use greedy decoding.</p>
</li>
<li>
<p><strong>Arc-factored graph-based methods</strong>: Assign likelihood for each ordered pair of nodes, and for each label type. Run Chu-Liu-Edmonds’ algorithm for maximum spanning arborescence. These methods usually make global decisions.</p>
</li>
</ul>
<h4 id="deep-biaffine-attention-by-t-dozat--manning"><a href="https://arxiv.org/pdf/1611.01734">Deep Biaffine Attention</a> by T. Dozat & Manning</h4>
<ul>
<li>
<p>Refer to code implementations: <a href="https://github.com/chantera/teras/blob/master/teras/framework/pytorch/model.py">https://github.com/chantera/teras/blob/master/teras/framework/pytorch/model.py</a> and <a href="https://github.com/chantera/biaffineparser/blob/master/pytorch_model.py">https://github.com/chantera/biaffineparser/blob/master/pytorch_model.py</a> (two parts of one code)</p>
</li>
<li>
<p>An arc-factored graph-based method. Similar to self-attention.</p>
</li>
<li>
<p>Runs a Bi-LSTM over the sentence, then passes each state through 4 separate MLPs, generating embeddings for each word in four roles: as dependent and as head, both for predicting the arc head and for predicting the edge label.</p>
<ul>
<li>
<p>Recurrent state: $\mathbf{r}_i$</p>
</li>
<li>
<p>Embeddings: $H^{({\text{arc-dep}})} = [\cdots \mathbf{h}_i^{(\text{arc-dep})} \cdots]$, $\mathbf{h}_i^{\text{arc-dep}} = \mathrm{MLP}^{(\text{arc-dep})}(\mathbf{r}_i)$.</p>
<ul>
<li>and similarly for “arc-head”, “label-dep”, and “label-head”.</li>
<li>This serves as dimensionality reduction.</li>
</ul>
</li>
<li>
<p>Arc scores for word $i$ (i.e. the likelihood of each word being the dependency head of $i$), vector length = sentence length:</p>
\[\begin{align*}
\mathbf{s}_i^{(\text{arc})} & = {H^{(\text{arc-head})}}^\top \left( U^{(\text{arc-1})}\mathbf{h}_i^{(\text{arc-dep})} + \mathbf{u}^{(\text{arc-2})} \right) \\
& = {H^{(\text{arc-head})}}^\top U_b^{(\text{arc})}
\left[\begin{matrix} \mathbf{h}_i^{(\text{arc-dep})} \\ 1 \end{matrix}\right]
\end{align*}\]
</li>
<li>
<p>Score of label type $k$, for word $i$, given its <u>true head</u> $y_i$, vector length = number of label types:</p>
\[\begin{align*}
\mathbf{s}_i^{(\text{label}_k)} & = {\mathbf{h}_{y_i}^{(\text{label-head})}}^\top U_k^{(\text{label-1})} \mathbf{h}_i^{(\text{label-dep})} + {\mathbf{u}^{(\text{label-2,1})}}^\top\mathbf{h}_i^{(\text{label-dep})} + {\mathbf{u}^{(\text{label-2,2})}}^\top\mathbf{h}_{y_i}^{(\text{label-head})} + b_k \\
& = \left[\begin{matrix} \mathbf{h}_{y_i}^{(\text{label-head})} \\ 1 \end{matrix}\right]^\top
U_b^{(\text{label})}
\left[\begin{matrix} \mathbf{h}_i^{(\text{label-dep})} \\ 1 \end{matrix}\right]
- 1 + b_k
\end{align*}\]
</li>
<li>
<p>Scores are equivalent to affine transformations on two vectors, hence the name.</p>
</li>
<li>
<p>Loss = neg log softmax of correct head for each node, plus neg log softmax of correct type for correct head for each node.</p>
</li>
</ul>
</li>
<li>
<p>Needs to run Chu-Liu’s algorithm during prediction. Select argmax type for each predicted edge.</p>
</li>
<li>
<p>When the formulae are expanded, they take the same form as the <a href="#tensor-indexing-model-by-y-zhao--zhiyuan-liu">Tensor Indexing Model</a></p>
</li>
</ul>
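<p>The arc-scoring formula reduces to two matrix products over the whole sentence; a minimal numpy sketch (batching and the label scorer omitted):</p>

```python
import numpy as np


def biaffine_arc_scores(H_dep, H_head, U, u):
    """Biaffine arc scores: S[i, j] = score of word j being the head of word i,
    i.e. S[i, j] = h_j^head . (U h_i^dep) + h_j^head . u.

    H_dep, H_head: (n, d) per-word MLP outputs; U: (d, d); u: (d,) head bias term.
    """
    # (H_head @ U @ H_dep.T)[j, i] = h_j^head U h_i^dep; transpose to index [i, j].
    # The (n,) bias vector broadcasts along the head axis j.
    return (H_head @ U @ H_dep.T).T + H_head @ u   # (n, n)
```

<p>At training time each row is fed through a softmax over candidate heads; at prediction time the score matrix is handed to Chu-Liu-Edmonds’ algorithm.</p>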
<h2 id="topic-variational-auto-encoder">Topic: Variational Auto-encoder</h2>
<h4 id="formulation-of-vae">Formulation of VAE</h4>
<ul>
<li>Ref to: <a href="https://arxiv.org/pdf/1606.05908">Tutorial on Variational Autoencoders</a></li>
</ul>
<h5 id="the-first-formula">The first formula</h5>
<ul>
<li>
<p>To build a generative model, we need to approximate the distribution $P(X)$ where $X$ is our data (things to generate).</p>
</li>
<li>
<p>An intuitive method is to first extract features $\mathbf{z}$ from $X$’s, and use a model parameterized by $\theta$ to recover $X$ given $\mathbf{z}$.</p>
</li>
<li>
<p>The features $\mathbf{z}$ are called <strong>latent variables</strong> (means “hidden”) because they’re not <em>observed</em> but <em>inferred</em>.</p>
</li>
<li>
<p>Following the law of total probability, we have</p>
\[P(X) = \int P_\lambda(\mathbf{z}) P_\theta(X\mid\mathbf{z})\d\mathbf{z} = \mathbb{E}_{\mathbf{z}\sim P_\lambda}[P_\theta(X\mid\mathbf{z})]\]
<p>where $P_\lambda(\mathbf{z})$ is our <strong>prior</strong> knowledge of the space of latent variables, and $P_\theta(X\mid\mathbf{z})$ is the likelihood approximated by our model, parameterized by $\theta$. In modern context, $\theta$ can be seen as the <strong>decoder</strong>.</p>
</li>
<li>
<p>Naturally, our objective would be to maximize the expectation of the marginal log-probability over the data distribution, $\mathbb{E}_{X\sim D}[\log P(X)]$. This is a form of maximum likelihood estimation (MLE).</p>
</li>
</ul>
<h5 id="simplify-calculations-introducing-posterior">Simplify calculations: Introducing posterior</h5>
<ul>
<li>
<p>But such integration is intractable for three reasons:</p>
<ul>
<li>The space of $\mathbf{z}$ is large.</li>
<li>Monte Carlo sampling would not be effective because $P_\theta(X\mid\mathbf{z})$ is likely to be zero for most $\mathbf{z}$’s.</li>
<li>We want to run optimization in mini-batches. Consider maximizing the objective w.r.t. a single data example $X$, we’re effectively increasing the probability of $X$ given any latent $\mathbf{z}$. This is clearly counter-intuitive.</li>
</ul>
</li>
<li>
<p>From the second reason, it is natural to consider using <strong>importance sampling</strong> to speed up the sampling procedure. Thus we introduce a distribution $Q_\phi(\mathbf{z}\mid X)$, which gives high probability to latent $\mathbf{z}$’s that would in turn give high probability to the example $X$. This distribution approximates the true <strong>posterior</strong>. In modern context, $\phi$ can be seen as the <strong>encoder</strong>.</p>
</li>
<li>
<p>With the posterior in mind, we can rewrite the objective</p>
\[P(X) = \mathbb{E}_{\mathbf{z}\sim Q_\phi(\cdot\mid X)}\left[ \frac{P_\lambda(\mathbf{z})P_\theta(X\mid\mathbf{z})}{Q_\phi(\mathbf{z}\mid X)} \right]\]
</li>
</ul>
<h5 id="simplify-calculations-log-probability-domain--kl-divergence">Simplify calculations: Log-probability domain & KL-divergence</h5>
<ul>
<li>
<p>To match our objective, we try to transform everything into the log-probability domain. But the logarithm function cannot be moved inside the expectation.</p>
</li>
<li>
<p>However, <strong>Jensen’s inequality</strong> states that for any convex function $f$ and random variable $X$, we have $f(\mathbb{E}[X]) \leq \mathbb{E}[f(X)]$. For concave functions like $\log$, the inequality is reversed. Thus we derive a lower bound on $\log P(X)$:</p>
\[\begin{align*}
\log P(X) & \geq \mathbb{E}_{\mathbf{z}\sim Q_\phi(\cdot\mid X)}\left[ \log\frac{P_\lambda(\mathbf{z})P_\theta(X\mid\mathbf{z})}{Q_\phi(\mathbf{z}\mid X)} \right] \\
& = \mathbb{E}_{\mathbf{z}\sim Q_\phi(\cdot\mid X)}\left[ \log P_\lambda(\mathbf{z}) + \log P_\theta(X\mid\mathbf{z}) - \log Q_\phi(\mathbf{z}\mid X) \right]
\end{align*}\]
</li>
<li>
<p>Observe that the above formulation can be rewritten as</p>
\[\log P(X) \geq \mathbb{E}_{\mathbf{z}\sim Q_\phi(\cdot\mid X)}[\log P_\theta(X\mid \mathbf{z})] - \mathrm{KL}(Q_\phi(\mathbf{z}\mid X)\Vert P_\lambda(\mathbf{z}))\]
<p>where the second term is the KL-divergence given by $\mathrm{KL}(q\Vert p) = H(q,p)-H(q)$, the difference of cross-entropy and entropy.</p>
<p>The RHS is also known as <strong>Evidence Lower Bound</strong> (ELBO) $\mathcal{L}(X;\phi,\theta,\lambda)$.</p>
</li>
<li>
<p>The lower bound can be interpreted as:</p>
<ul>
<li>Maximizing the likelihood w.r.t. the posterior of latent $\mathbf{z}$ given the training example $X$, and</li>
<li>Regularizing the posterior distribution by pulling it close to the prior.</li>
</ul>
</li>
</ul>
<h5 id="how-do-we-compute-the-lower-bound">How do we compute the lower bound?</h5>
<ul>
<li>
<p>We should first define the form for our prior and posterior. A common choice is to use a Gaussian distribution (or a mixture of Gaussians). There are two reasons for this:</p>
<ul>
<li>For the prior, a univariate Gaussian distribution defined on $\mathbb{R}$ is able to represent any distribution by composing the inverse CDF of the desired distribution with the CDF of a Gaussian. This also holds true for multiple dimensions. So we can trust our decoder to learn this mapping, which shouldn’t be too difficult for neural network models.</li>
<li>For the posterior, using a Gaussian distribution means that we only have to specify the mean $\mu$ and variance $\sigma^2$. Also, such settings gives an analytical solution to the KL-divergence term.</li>
</ul>
</li>
<li>
<p>For the first term, given the assumption that $Q_\phi$ is able to produce a nice estimate of the posterior, we can safely use Monte Carlo sampling, i.e. sample $\mathbf{z}$ from distribution $Q_\phi$, and optimize for $\log P_\theta(X\mid\mathbf{z})$.</p>
<ul>
<li>However, in terms of SGD, this is not acceptable, because “sampling” is non-differentiable.
<ul>
<li>To see why this is true, note that for the term we backprop w.r.t. the negative log likelihood, which is dependent only on $\mathbf{z}$.</li>
<li>However, $\mathbf{z}$ is sampled from $Q_\phi$ but introduced into $P_\theta$ as “input”, which carries no gradient. So the gradient cannot be backpropagated through the sampling procedure.</li>
</ul>
</li>
<li>So a re-parameterization trick must be applied: since $Q_\phi$ is Gaussian $\mathcal{N}(\mu,\sigma^2)$, sampling $z\sim \mathcal{N}(\mu,\sigma^2)$ is equivalent to sampling $\epsilon\sim\mathcal{N}(0,1)$ and compute $z=\mu + \sigma\cdot\epsilon$. Thus gradient is able to flow through the encoder. <em>(ref: <a href="https://arxiv.org/pdf/1312.6114">[Kingma & Welling 2013] Autoencoding Variational Bayes</a>)</em></li>
</ul>
</li>
<li>
<p>For the second term, given Gaussians $P(\mathbf{z})=\mathcal{N}(0,I)$ and $Q_\phi(\mathbf{z}\mid X)=\mathcal{N}(\pmb{\mu}(X),\pmb{\Sigma}(X)=\mathrm{diag}(\pmb{\sigma}^2(X)))$, denoting $n$ as the dimensionality of $\mathbf{z}$, KL-divergence has the following analytical form
\(\mathrm{KL}(Q_\phi(\mathbf{z}\mid X)\Vert P(\mathbf{z})) = \frac{1}{2}\sum_{i=1}^{n}\left(\mu_i^2 + \sigma_i^2 - \log(\sigma_i^2) - 1\right)\)</p>
</li>
</ul>
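<p>As a sanity check, the one-dimensional case of this closed form follows directly from the definition of KL-divergence (the diagonal multivariate case is just a sum of such terms over dimensions):</p>

\[\begin{align*}
\mathrm{KL}(\mathcal{N}(\mu,\sigma^2)\Vert\mathcal{N}(0,1)) & = \mathbb{E}_{z\sim\mathcal{N}(\mu,\sigma^2)}\left[-\frac{1}{2}\log\sigma^2 - \frac{(z-\mu)^2}{2\sigma^2} + \frac{z^2}{2}\right] \\
& = -\frac{1}{2}\log\sigma^2 - \frac{1}{2} + \frac{\mu^2+\sigma^2}{2} = \frac{1}{2}\left(\mu^2 + \sigma^2 - \log\sigma^2 - 1\right)
\end{align*}\]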
<h5 id="when-does-equality-hold-true">When does equality hold true?</h5>
<ul>
<li>
<p>To investigate the problem, subtract the RHS from the LHS. First, we assume that our parameterized model is able to model the ground-truth likelihood; in the following derivation we therefore omit the subscripts on $P$:</p>
\[\begin{align*}
& \phantom{=\;\;\!} \log P(X) - \mathbb{E}_{\mathbf{z}\sim Q_\phi(\cdot\mid X)}\left[ \log P(\mathbf{z}) + \log P(X\mid\mathbf{z}) - \log Q_\phi(\mathbf{z}\mid X) \right] \\
& = \log P(X) - \mathbb{E}_{\mathbf{z}\sim Q_\phi(\cdot\mid X)}\left[ \log P(\mathbf{z}) + \left(\log P(\mathbf{z}\mid X) + \log P(X) - \log P(\mathbf{z})\right) - \log Q_\phi(\mathbf{z}\mid X) \right] \\
& = \mathbb{E}_{\mathbf{z}\sim Q_\phi(\cdot\mid X)}[\log Q_\phi(\mathbf{z}\mid X) - \log P(\mathbf{z}\mid X)] = \mathrm{KL}(Q_\phi(\mathbf{z}\mid X)\Vert P(\mathbf{z}\mid X))
\end{align*}\]
<p>where $P(\mathbf{z}\mid X)$ is the true posterior. So the closer $Q_\phi$ matches the true posterior distribution, the tighter our lower bound is.</p>
</li>
<li>
<p>This also shows another reason why we choose to optimize the lower bound instead: this KL-divergence term is intractable, because we have no idea which $\mathbf{z}$’s give high probability to $X$.</p>
</li>
</ul>
<h5 id="faq">FAQ</h5>
<p>With this knowledge in mind, we can look back at some problems that were glossed over:</p>
<ul>
<li><strong>Why is a simple Gaussian sufficient for the prior?</strong> Apart from the “NN can learn any CDF” reason, we use a Gaussian also because it is a commonly-used distribution with non-zero density at every point in $\mathbb{R}$, and because not all distributions can be re-parameterized.</li>
<li><strong>Why do we constrain the KL-divergence between posterior $Q_\phi$ and prior $P_\lambda$?</strong> At first glance this seems counterproductive: if this term were driven to zero, the KL term on the other side of the equation, namely $\mathrm{KL}(Q_\phi(\mathbf{z}\mid X)\Vert P(\mathbf{z}\mid X))$, would be large, giving us a loose lower bound. The term mainly serves as <strong>regularization</strong>: since we use NNs for $Q_\phi$, we should constrain its form.</li>
</ul>
<h4 id="possible-issues">Possible issues</h4>
<ul>
<li><strong>Over-regularization</strong>
<ul>
<li>KL-divergence takes a simple form for simplistic priors and is much easier to optimize, so the encoder quickly collapses to the Gaussian prior.</li>
<li><strong>Solutions</strong>:
<ul>
<li>Initially set the weight of the KL-divergence term to zero, and gradually anneal it to a predefined scale. This can be seen as first overfitting and then regularizing.</li>
<li>Or, design more complex priors.</li>
</ul>
</li>
</ul>
</li>
<li><strong>Ignoring latent code</strong>
<ul>
<li>For sequential decoders, since the model is fed the ground truth of previous time steps, a powerful decoder may learn to decode without consulting the latent code at all.</li>
<li><strong>Solutions</strong>:
<ul>
<li>Apply dropout on decoder inputs.</li>
<li>Or, constrain the amount of context that the decoder is allowed to see.</li>
</ul>
</li>
</ul>
</li>
</ul>
<h4 id="extensions">Extensions</h4>
<h5 id="conditional-vae">Conditional VAE</h5>
<ul>
<li>Autoencode $X$ given $Y$, for instance generating a user’s content given their previous work.</li>
<li>Simply change $Q_\phi$ and $P_\theta$ to conditional distributions. This means both the encoder and the decoder need to condition on $Y$.</li>
</ul>
<h5 id="discrete-latent-variables">Discrete Latent Variables</h5>
<ul>
<li>
<p>Reparameterization trick fails for discrete distributions.</p>
</li>
<li>
<p>One option is to marginalize over every possible discrete choice.</p>
</li>
<li>
<p>Or, use the <a href="#gumbel-max-trick--gumbel-softmax-distribution">Gumbel-Softmax</a> technique.</p>
</li>
</ul>Zecong HuThis post records my notes taken during summer internship @ CMU LTI. Disclaimer: These notes are not guaranteed to be correct or understandable. Note: This is a non-mobile-friendly post, mobile view is distorted due to formulae.Codeforces Round #3912017-01-14T05:43:00+00:002017-01-14T05:43:00+00:00http://zecong.hu/2017/01/14/codeforces-round-391<p>In the blink of an eye I am a junior now, three years since I retired from competitive programming. With final exams over, a Codeforces round happened to be on, so I joined it as a bit of rehab training.</p>
<p>The result was, predictably, a disaster… Teaming up with classmates did not help either. I attempted five problems and only got two accepted; I am probably no longer cut out for fast-paced coding.</p>
<p>Still, the problems were quite interesting, and our approaches differed from the official ones in places, which I like to think is interesting in its own right.</p>
<!--more-->
<h2 id="a">A</h2>
<p>Nothing much to say. I declared the counting array as <code class="language-plaintext highlighter-rouge">char</code>, which contributed one WA.</p>
<h2 id="b">B</h2>
<p>A classic problem. I managed to be silly in several different ways, for a total of three wrong submissions.</p>
<h2 id="c">C</h2>
<p>The problem, roughly: how many bijections $f$ are there such that for every multiset $S_i$, $S_i=f(S_i)$?</p>
<p>In other words, for every element $x$, its number of occurrences in each multiset must equal that of $f(x)$. We can thus partition the elements into equivalence classes; the answer is the product of the factorials of the class sizes.</p>
<p>The question is how to compute the equivalence classes. We pondered various methods, and finally guessed that perhaps a seemingly brute-force approach actually has acceptable complexity, which led to the following:</p>
<p>Describe each element $x$ with a <code class="language-plaintext highlighter-rouge">vector<pair<int, int>></code>: each pair $(i,k)$ means that $x$ occurs $k$ times in $S_i$; pairs with $k=0$ are not added to the vector. The total length of all the vectors is then $O(\sum{g_i})$. All that remains is counting the distinct vectors. The most direct way is to sort and then scan, and in fact sorting these vectors directly has acceptable complexity, but we did not think this through during the contest and used hashing instead. Amusingly, the first submission used a single hash key and was promptly hacked… forcing us to switch to a double key.</p>
<p>And then I sized an array too small and got RE anyway…</p>
<p>Finally, a proof of the sorting complexity: it is $O(n\log n\cdot d)$, where $d$ is the expected length of a vector. This length is $O(\sum g_i/n)$, so the complexity is $O\left((\sum g_i)\log n\right)$.</p>
<h2 id="d">D</h2>
<p>I only started thinking about this problem with 15 minutes left, and failed to get it accepted before the contest ended.</p>
<p>It is actually simple: since $n\leq 75$, a quick calculation shows that the largest number in a valid partition does not exceed 20. So we can do bitmask DP directly: the state $f[i][S]$ is the number of schemes for the first $i$ bits with $S$ as the set of numbers that have appeared, for a state space of $O(n\cdot2^{20})$. As for transitions, the segment between two cut points has length at most 6 once leading zeros are removed, and an all-zero segment yields no transition; so each transition is $O(1)$.</p>
<h2 id="e">E</h2>
<p>This problem is very interesting.</p>
<p>Define $f_r(n)$ as follows:</p>
\[\begin{align*}
f_0(n) & = \sum_{u\cdot v=n}\left[\gcd(u,v)=1\right] \\
f_{r+1}(n) & = \sum_{u\cdot v=n}\frac{f_r(u)+f_r(v)}{2}
\end{align*}\]
<p>There are $q$ queries; each gives $r$ and $n$ and asks for $f_r(n)$. All values are up to $10^6$.</p>
<p>Observe that if $k$ is the number of distinct prime factors of $n$, then $f_0(n)=2^k$: each prime factor must go entirely into $u$ or entirely into $v$.</p>
<p>The formula for $f_{r+1}$ can be rewritten as:</p>
\[f_{r+1}(n) = \sum_{d\mid n}f_r(d) = \sum_{d\mid n}f_r(d)\cdot 1\]
<p>Writing $1(n)$ for the constant function $1(n)=1$, the above can be expressed as a Dirichlet convolution:</p>
\[f_{r+1}=f_r\ast 1,\quad\text{and hence}\quad f_r=f_0\ast 1^{r}\]
<p>Like ordinary convolution, Dirichlet convolution is associative, so we study the expression for $1^r$.</p>
\[\begin{align*}
1^2(n) & = (1 \ast 1)(n) \\
& = \sum_{d\mid n}1\cdot 1 \\
1^3(n) & = (1^2 \ast 1)(n) \\
& = \sum_{d_2\mid n}\left(\sum_{d_1\mid d_2}1\cdot 1\right)\cdot 1 \\
& \cdots \\
1^r(n) & = \sum_{d_1\mid d_2\mid \cdots\mid d_{r-1}\mid n}1
\end{align*}\]
<p>In other words, $1^r(n)$ counts sequences $d_1,\ldots,d_{r-1}$ of length $r-1$ satisfying $d_i\mid d_{i+1}$ and $d_{r-1}\mid n$.</p>
<p>Let the prime factorization of $n$ be $n=\prod p_i^{k_i}$ and consider a single prime $p_i$: the exponent of $p_i$ in each of $d_1,\ldots,d_{r-1}$ must not exceed $k_i$, and each exponent must not exceed the next. The problem becomes counting non-decreasing sequences of non-negative integers of length $r-1$ whose maximum is at most $k_i$; by stars and bars the count is</p>
\[\binom{k_i+r-1}{r-1}\]
<p>so that</p>
\[1^r(n)=\prod_i \binom{k_i+r-1}{r-1}\]
\[f_{r}(n) = (f_0 \ast 1^{r})(n) = \sum_{d\mid n}f_0(d)\cdot 1^{r}\left(\frac{n}{d}\right)\]
<p>Now, after sieving to precompute smallest prime factors, each query can be answered in $O(\tau(n))$ time ($\tau$ being the divisor count). But this is still too slow: it is the solution I submitted during the contest, and it TLE’d on test 52 of the final tests…</p>
<p>Since $d$ and $\frac{n}{d}$ are both divisors of $n$, write the prime factorization of $d$ as $d=\prod{p_i^{q_i}}$; then $q_i\leq k_i$ and $\frac{n}{d}=\prod{p_i^{k_i-q_i}}$. We can then express $f_0(d)$ as:</p>
\[f_0(d) = \prod_i (1+[q_i>0])\]
<p>Substituting into the formula for $f_r$ gives</p>
\[\begin{align*}
f_r(n) & = \sum_{d\mid n}\left(\prod_i (1+[q_i>0])\right)\left(\prod_i \binom{k_i-q_i+r-1}{r-1}\right) \\
& = \sum_{d\mid n}\prod_i (1+[q_i>0])\binom{k_i-q_i+r-1}{r-1} \\
& = \prod_i\sum_{q_i=0}^{k_i} (1+[q_i>0])\binom{k_i-q_i+r-1}{r-1}
\end{align*}\]
<p>So each prime factor can be handled independently, and the complexity plummets to $O(\log n)$ per query.</p>
<p>Did you notice? This function is multiplicative… Had I realized that from the start, the whole derivation would have been quite simple, with no need to consider associativity of Dirichlet convolution at all…</p>
<p>A long detour, but an interesting line of thought nonetheless 😂</p>Zecong HuIn the blink of an eye I am a junior now, three years since I retired from competitive programming. With final exams over, a Codeforces round happened to be on, so I joined it as a bit of rehab training. The result was, predictably, a disaster… Teaming up with classmates did not help either. I attempted five problems and only got two accepted; I am probably no longer cut out for fast-paced coding. Still, the problems were quite interesting, and our approaches differed from the official ones in places, which I like to think is interesting in its own right.The template You Don't Know2015-09-19T15:27:00+00:002015-09-19T15:27:00+00:00http://zecong.hu/2015/09/19/what-you-dont-know-about-templates<blockquote>
<p>C makes it easy to shoot yourself in the foot; C++ makes it harder, but when you do it blows your whole leg off.</p>
<p>—— Bjarne Stroustrup, creator of C++</p>
</blockquote>
<p>C++ is an extremely powerful and complex language, and <strong>templates</strong> are one of its flagship features. One might say that without templates, C++ would not be C++: it would not have its present flexibility and extensibility, nor would it be criticized for excessive complexity the way it is today.</p>
<p>This article introduces some relatively obscure but very powerful applications of C++ templates. To read it, you need to:</p>
<ul>
<li>have basic programming knowledge and experience;</li>
<li>be able to read C++ code;</li>
<li>be full of curiosity about the unknown.</li>
</ul>
<p>If you meet the conditions above, then get ready to meet the template you don’t know!</p>
<h2 id="从基础开始">Starting with the basics</h2>
<p>For the benefit of readers not yet familiar with them, let’s first briefly go over what C++ templates are. Readers who already know this by heart may take it as a review, or skip this section.</p>
<p>In traditional C, different functions cannot share the same name. The rule is easy to understand, since much of what a function does depends on specific types. But we can’t rule out generic algorithms that do not depend on types at all, and defining a differently-named function for each type, re-implementing the algorithm in each one, would be quite a hassle.</p>
<p>So the ISO C++ committee thought: let’s add a new feature to C++ that lets programmers write <strong>generic</strong> code independent of concrete types. That new feature is the template. In C++, a template function is defined with the following syntax:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span> <span class="o"><</span><span class="k">class</span> <span class="nc">T</span><span class="p">></span>
<span class="n">T</span> <span class="nf">max</span><span class="p">(</span><span class="n">T</span> <span class="n">a</span><span class="p">,</span> <span class="n">T</span> <span class="n">b</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">a</span> <span class="o">></span> <span class="n">b</span><span class="p">)</span> <span class="k">return</span> <span class="n">a</span><span class="p">;</span>
<span class="k">return</span> <span class="n">b</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>From the function’s name and what it does, we can easily infer that it finds and returns the larger of <code class="language-plaintext highlighter-rouge">a</code> and <code class="language-plaintext highlighter-rouge">b</code>. But what on earth is this <code class="language-plaintext highlighter-rouge">T</code>?</p>
<p>In fact <code class="language-plaintext highlighter-rouge">T</code> is the <code class="language-plaintext highlighter-rouge">class T</code> declared on the first line of the code; it stands for <em>an arbitrary type</em>. In other words, this function takes two things of the same type and returns something of that same type.</p>
<p>Now that we have template functions, we can also build a template class:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span> <span class="o"><</span><span class="kt">int</span> <span class="n">N</span><span class="p">,</span> <span class="kt">int</span> <span class="n">M</span><span class="p">,</span> <span class="k">typename</span> <span class="nc">T</span><span class="p">></span>
<span class="k">class</span> <span class="nc">Matrix</span> <span class="p">{</span>
<span class="nl">private:</span>
<span class="n">T</span> <span class="n">array</span><span class="p">[</span><span class="n">N</span><span class="p">][</span><span class="n">M</span><span class="p">];</span>
<span class="nl">public:</span>
<span class="n">Matrix</span><span class="p">();</span>
<span class="k">static</span> <span class="n">Matrix</span> <span class="n">identityMatrix</span><span class="p">();</span>
<span class="n">T</span> <span class="o">&</span><span class="n">elementAtIndex</span><span class="p">(</span><span class="kt">int</span> <span class="n">x</span><span class="p">,</span> <span class="kt">int</span> <span class="n">y</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">array</span><span class="p">[</span><span class="n">x</span><span class="p">][</span><span class="n">y</span><span class="p">];</span>
<span class="p">}</span>
<span class="k">const</span> <span class="n">T</span> <span class="o">&</span><span class="n">elementAtIndex</span><span class="p">(</span><span class="kt">int</span> <span class="n">x</span><span class="p">,</span> <span class="kt">int</span> <span class="n">y</span><span class="p">)</span> <span class="k">const</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">array</span><span class="p">[</span><span class="n">x</span><span class="p">][</span><span class="n">y</span><span class="p">];</span>
<span class="p">}</span>
<span class="p">}</span> <span class="p">;</span>
</code></pre></div></div>
<p>Note that we can specify multiple template parameters, and they do not all have to be <em>type parameters</em> like <code class="language-plaintext highlighter-rouge">T</code>: they can also be <em>non-type parameters</em>, such as the integer <code class="language-plaintext highlighter-rouge">N</code> here, or even nested template parameters.</p>
<p>These very simple examples give a first taste of the power of templates: as long as the type <code class="language-plaintext highlighter-rouge">T</code> is copy-constructible and defines the greater-than operator, this function applies to it. With some effort we could implement a whole suite of generic algorithms with templates and expose a simple interface. Imagine: when we want to sort values of a custom type, no hand-written quicksort is needed; we just define the comparison operator and call a template function. When we want some data structure, we just hand our class name to a template class.</p>
<p>In fact, the C++ STL is exactly such a thing. STL stands for Standard Template Library, and it implements all sorts of generic algorithms and data structures with templates. Always at your service, ready to use as-is; no more reinventing wheels, with or without O2.</p>
<p>Due to limited space, this is all the template basics we can cover. To learn more about templates, I recommend reading this page: <a href="https://isocpp.org/wiki/faq/templates">https://isocpp.org/wiki/faq/templates</a>.</p>
<h2 id="模板元编程">Template metaprogramming</h2>
<p>Templates are of course a fine thing: they are immensely powerful and can do things we wouldn’t have dared to imagine back when writing C. The problem is precisely that they are too powerful; even the people who designed templates did not know just how powerful they were. In fact, the template language itself can be shown to be Turing-complete; that is, using templates alone, we can carry out arbitrary computation at compile time. This is what is known as <strong>template metaprogramming</strong> (TMP).</p>
<h3 id="从斐波那契谈起">Starting with Fibonacci</h3>
<p>If you were asked to compute the Fibonacci sequence in ordinary C++, how would you write it? Probably like this:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="kt">int</span> <span class="n">N</span> <span class="o">=</span> <span class="mi">100</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">a</span><span class="p">[</span><span class="n">N</span> <span class="o">+</span> <span class="mi">1</span><span class="p">];</span>
<span class="n">a</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="n">a</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span> <span class="n">i</span> <span class="o"><=</span> <span class="n">N</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="n">a</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">a</span><span class="p">[</span><span class="n">i</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]</span> <span class="o">+</span> <span class="n">a</span><span class="p">[</span><span class="n">i</span> <span class="o">-</span> <span class="mi">2</span><span class="p">];</span>
</code></pre></div></div>
<p>Or the straightforward recursive version:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">fib</span><span class="p">(</span><span class="kt">int</span> <span class="n">n</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">n</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">n</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)</span> <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
<span class="k">return</span> <span class="n">fib</span><span class="p">(</span><span class="n">n</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="o">+</span> <span class="n">fib</span><span class="p">(</span><span class="n">n</span> <span class="o">-</span> <span class="mi">2</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>But in fact, we can do it with templates:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span> <span class="o"><</span><span class="kt">int</span> <span class="n">N</span><span class="p">></span>
<span class="k">struct</span> <span class="nc">Fib</span> <span class="p">{</span>
<span class="k">static</span> <span class="k">const</span> <span class="kt">int</span> <span class="n">value</span> <span class="o">=</span> <span class="n">Fib</span><span class="o"><</span><span class="n">N</span><span class="o">-</span><span class="mi">1</span><span class="o">>::</span><span class="n">value</span> <span class="o">+</span> <span class="n">Fib</span><span class="o"><</span><span class="n">N</span><span class="o">-</span><span class="mi">2</span><span class="o">>::</span><span class="n">value</span><span class="p">;</span>
<span class="p">}</span> <span class="p">;</span>
<span class="k">template</span> <span class="o"><</span><span class="p">></span>
<span class="k">struct</span> <span class="nc">Fib</span><span class="o"><</span><span class="mi">0</span><span class="o">></span> <span class="p">{</span>
<span class="k">static</span> <span class="k">const</span> <span class="kt">int</span> <span class="n">value</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span> <span class="p">;</span>
<span class="k">template</span> <span class="o"><</span><span class="p">></span>
<span class="k">struct</span> <span class="nc">Fib</span><span class="o"><</span><span class="mi">1</span><span class="o">></span> <span class="p">{</span>
<span class="k">static</span> <span class="k">const</span> <span class="kt">int</span> <span class="n">value</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span> <span class="p">;</span>
</code></pre></div></div>
<p>Inside <code class="language-plaintext highlighter-rouge">Fib</code> we define a static constant <code class="language-plaintext highlighter-rouge">value</code>, the value of the <code class="language-plaintext highlighter-rouge">N</code>-th Fibonacci number. The two template specializations for <code class="language-plaintext highlighter-rouge">N = 0</code> and <code class="language-plaintext highlighter-rouge">N = 1</code> are the base cases of the recursion; the general case computes the value recursively from the formula. The whole computation happens at compile time, and thanks to the template instantiation mechanism the recursion is memoized: no Fibonacci number is computed twice, so the complexity is $O(N)$ rather than $O(fibonacci(N))$.</p>
<p>To use the 10th Fibonacci number, just write <code class="language-plaintext highlighter-rouge">Fib<10>::value</code>. Note that since the computation must finish at compile time, the template arguments must be constants known at compile time.</p>
<p>Also, compilers usually limit the depth of template recursion; on <code class="language-plaintext highlighter-rouge">clang</code> the default is 256 levels. You can set the limit to <code class="language-plaintext highlighter-rouge">N</code> with <code class="language-plaintext highlighter-rouge">-ftemplate-depth=N</code>, but with a limit too large the compiler overflows its own stack… so this isn’t all that useful.</p>
<h3 id="但是这又有什么用呢">But what's the use of it all?</h3>
<p>Sure, it saves run time, but every value must be determined at compile time, and <code class="language-plaintext highlighter-rouge">N</code> can’t be too large either, so it doesn’t seem very useful.</p>
<p>But TMP is far more than compile-time arithmetic. Below are two practical applications of TMP. Please remain seated and relaxed as we open the door to a new world.</p>
<h2 id="crtp">CRTP</h2>
<p>This impressive-looking acronym expands to Curiously Recurring Template Pattern<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>, which, spelled out, sounds rather silly.</p>
<p>Why the name? Let’s look at some code first:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span> <span class="o"><</span><span class="k">class</span> <span class="nc">Derived</span><span class="p">></span>
<span class="k">struct</span> <span class="nc">Base</span> <span class="p">{</span>
<span class="kt">void</span> <span class="n">polymorphism</span><span class="p">()</span> <span class="p">{</span>
<span class="k">static_cast</span><span class="o"><</span><span class="n">Derived</span><span class="o">*></span><span class="p">(</span><span class="k">this</span><span class="p">)</span><span class="o">-></span><span class="n">_poly</span><span class="p">();</span>
<span class="p">}</span>
<span class="k">static</span> <span class="kt">void</span> <span class="n">static_poly</span><span class="p">()</span> <span class="p">{</span>
<span class="n">Derived</span><span class="o">::</span><span class="n">_static_poly</span><span class="p">();</span>
<span class="p">}</span>
<span class="c1">// Default implementation</span>
<span class="kt">void</span> <span class="n">_poly</span><span class="p">();</span>
  <span class="k">static</span> <span class="kt">void</span> <span class="n">_static_poly</span><span class="p">();</span>
<span class="p">}</span> <span class="p">;</span>
<span class="k">struct</span> <span class="nc">Derived</span> <span class="o">:</span> <span class="k">public</span> <span class="n">Base</span><span class="o"><</span><span class="n">Derived</span><span class="o">></span> <span class="p">{</span>
<span class="kt">void</span> <span class="n">_poly</span><span class="p">();</span>
<span class="c1">// void _static_poly();</span>
<span class="p">}</span> <span class="p">;</span>
</code></pre></div></div>
<p>OK, <code class="language-plaintext highlighter-rouge">Base</code> is a normal template class, if somewhat cryptic in content. But what on earth is <code class="language-plaintext highlighter-rouge">Derived</code>? How can it pass itself as a template argument to its own base class? Isn’t that an “I depend on my dad, who depends on me” situation?</p>
<p>Yet this code compiles; it is perfectly legal, for two reasons:</p>
<ul>
<li>Although <code class="language-plaintext highlighter-rouge">Base</code> needs to know the definition of <code class="language-plaintext highlighter-rouge">Derived</code>, it contains no instance of <code class="language-plaintext highlighter-rouge">Derived</code>, directly or indirectly, so its size does not depend on <code class="language-plaintext highlighter-rouge">Derived</code>;</li>
<li>A template class is instantiated at its point of use, by which time the definitions of both <code class="language-plaintext highlighter-rouge">Base</code> and <code class="language-plaintext highlighter-rouge">Derived</code> are known. For this snippet, the compiler can check whether <code class="language-plaintext highlighter-rouge">Derived</code> defines <code class="language-plaintext highlighter-rouge">_poly()</code> and <code class="language-plaintext highlighter-rouge">_static_poly()</code>; if not found there, it looks them up in the base class <code class="language-plaintext highlighter-rouge">Base<Derived></code>.</li>
</ul>
<p>Most people are dumbfounded the first time they see this almost mystical usage. Hold the astonishment for now and let’s see what the pattern can do:</p>
<h3 id="无需-vtable-的编译时多态">Compile-time polymorphism without a VTABLE</h3>
<p>C++ implements polymorphism by storing all of a class’s virtual functions in an array known as the <code class="language-plaintext highlighter-rouge">VTABLE</code>; calling a virtual function at run time actually calls the function the table points to. The drawback is that this table must be maintained, which incurs extra overhead.</p>
<p>With CRTP we can achieve polymorphism while saving the cost of the table. In the code above, both <code class="language-plaintext highlighter-rouge">Base</code> (the base class) and <code class="language-plaintext highlighter-rouge">Derived</code> (the derived class) define a (non-virtual) <code class="language-plaintext highlighter-rouge">_poly()</code>, and in principle the base class cannot reach the derived class’s function. But here the base class has an extra piece of information: the derived class’s name. So before making the call, it casts itself to the derived class. The cast is valid because the object actually is an instance of the derived class (a bit circular; let it sink in). The call therefore resolves to the derived class’s function. If the derived class does not define <code class="language-plaintext highlighter-rouge">_poly()</code>, the compiler finds the identically-named function in the base class; if it does, that definition hides the base class’s version, and the compiler finds the derived class’s function. We have thus achieved polymorphism at compile time.</p>
<p>Note, however, that CRTP’s polymorphism is not true polymorphism. Given an instance of the derived class, CRTP lets functions defined in the base class call the “virtual” functions overridden in the derived class. But if the derived object is only held through a base-class pointer, we cannot reach the derived class’s virtual functions through it. The latter is called run-time polymorphism and is beyond CRTP’s reach.</p>
<h3 id="通用基类">A generic base class</h3>
<p>Sometimes we run into this problem: all we have is a base-class pointer, and we need a deep copy, as in:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">Base</span> <span class="p">{};</span>
<span class="k">struct</span> <span class="nc">Derived</span> <span class="o">:</span> <span class="k">public</span> <span class="n">Base</span> <span class="p">{};</span>
<span class="k">struct</span> <span class="nc">AnotherDerived</span> <span class="o">:</span> <span class="k">public</span> <span class="n">Base</span> <span class="p">{};</span>
<span class="kt">void</span> <span class="nf">copy</span><span class="p">(</span><span class="n">Base</span> <span class="o">*</span><span class="n">p</span><span class="p">)</span> <span class="p">{</span>
<span class="n">Base</span> <span class="o">*</span><span class="n">copy_p</span> <span class="o">=</span> <span class="k">new</span> <span class="o">???</span><span class="p">(</span><span class="o">*</span><span class="n">p</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>And now we’re stuck: what goes in the <code class="language-plaintext highlighter-rouge">???</code>? The base class would be wrong, and we don’t know which derived class it is. What now?</p>
<p>One solution is to declare a virtual function named <code class="language-plaintext highlighter-rouge">clone()</code> and override it in every derived class:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">Base</span> <span class="p">{</span>
<span class="k">virtual</span> <span class="n">Base</span> <span class="o">*</span><span class="n">clone</span><span class="p">()</span> <span class="k">const</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span> <span class="p">;</span>
<span class="k">struct</span> <span class="nc">Derived</span> <span class="o">:</span> <span class="k">public</span> <span class="n">Base</span> <span class="p">{</span>
<span class="k">virtual</span> <span class="n">Derived</span> <span class="o">*</span><span class="n">clone</span><span class="p">()</span> <span class="k">const</span> <span class="p">{</span>
<span class="k">return</span> <span class="k">new</span> <span class="n">Derived</span><span class="p">(</span><span class="o">*</span><span class="k">this</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span> <span class="p">;</span>
<span class="k">struct</span> <span class="nc">AnotherDerived</span> <span class="o">:</span> <span class="k">public</span> <span class="n">Base</span> <span class="p">{</span>
<span class="k">virtual</span> <span class="n">AnotherDerived</span> <span class="o">*</span><span class="n">clone</span><span class="p">()</span> <span class="k">const</span> <span class="p">{</span>
<span class="k">return</span> <span class="k">new</span> <span class="n">AnotherDerived</span><span class="p">(</span><span class="o">*</span><span class="k">this</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span> <span class="p">;</span>
</code></pre></div></div>
<p>This code works, but it is far from elegant: the same thing must be written in every class, which means a lot of duplicated code and plenty of chances for mistakes.</p>
<p>Let’s analyze why it has to be written this way. The problem seems to be that the base class doesn’t know what the object really is, so each derived class must define the virtual function. Isn’t that exactly the problem CRTP solves? Let’s change the code to the following<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">Base</span> <span class="p">{</span>
<span class="k">virtual</span> <span class="n">Base</span> <span class="o">*</span><span class="n">clone</span><span class="p">()</span> <span class="k">const</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">};</span>
<span class="k">template</span> <span class="o"><</span><span class="k">class</span> <span class="nc">Derived</span><span class="p">></span>
<span class="k">struct</span> <span class="nc">BaseCRTP</span> <span class="o">:</span> <span class="k">public</span> <span class="n">Base</span> <span class="p">{</span>
<span class="k">virtual</span> <span class="n">Base</span> <span class="o">*</span><span class="n">clone</span><span class="p">()</span> <span class="k">const</span> <span class="p">{</span>
<span class="k">return</span> <span class="k">new</span> <span class="n">Derived</span><span class="p">(</span><span class="k">static_cast</span><span class="o"><</span><span class="n">Derived</span> <span class="k">const</span> <span class="o">&></span><span class="p">(</span><span class="o">*</span><span class="k">this</span><span class="p">));</span>
<span class="p">}</span>
<span class="p">}</span> <span class="p">;</span>
<span class="k">struct</span> <span class="nc">Derived</span> <span class="o">:</span> <span class="k">public</span> <span class="n">BaseCRTP</span><span class="o"><</span><span class="n">Derived</span><span class="o">></span> <span class="p">{};</span>
</code></pre></div></div>
<p>Suppose we have an instance of <code class="language-plaintext highlighter-rouge">Derived</code> which, after some twists and turns, ends up as a <code class="language-plaintext highlighter-rouge">Base</code> pointer. If we now call <code class="language-plaintext highlighter-rouge">clone()</code> through this pointer, the virtual function mechanism takes us to the <code class="language-plaintext highlighter-rouge">clone()</code> of <code class="language-plaintext highlighter-rouge">BaseCRTP<Derived></code>. At this point the derived class’s name is known, so the deep copy can be performed.</p>
<p>If we analyze why this works, we find that each time we define a derived class, the compiler instantiates a copy of <code class="language-plaintext highlighter-rouge">BaseCRTP</code> for it. So there is nothing magical here: the compiler simply generates the code we would otherwise have written by hand. Other applications of CRTP, such as instance counters, rest on the same principle.</p>
<h2 id="sfinae">SFINAE</h2>
<p>Another fancy-looking acronym; let’s expand it first: Substitution failure is not an error<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>. That’s no longer a phrase but a full sentence: a substitution failure is not treated as a compile error. When substitution fails, the compiler does not report an error; it merely removes that template from the set of candidate overloads and stops considering the failed one.</p>
<p>When instantiating a template, the compiler must substitute concrete types or values for the template parameters, and this substitution can fail, as in the following example:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span> <span class="o"><</span><span class="k">class</span> <span class="nc">T</span><span class="p">></span>
<span class="kt">void</span> <span class="nf">defined_foo</span><span class="p">(</span><span class="k">typename</span> <span class="n">T</span><span class="o">::</span><span class="n">foo</span><span class="p">)</span> <span class="p">{}</span>
<span class="k">template</span> <span class="o"><</span><span class="k">class</span> <span class="nc">T</span><span class="p">></span>
<span class="kt">void</span> <span class="nf">defined_foo</span><span class="p">(</span><span class="n">T</span><span class="p">)</span> <span class="p">{}</span>
<span class="k">struct</span> <span class="nc">Foo</span> <span class="p">{</span>
<span class="k">typedef</span> <span class="kt">int</span> <span class="n">foo</span><span class="p">;</span>
<span class="p">}</span> <span class="p">;</span>
</code></pre></div></div>
<p>Here we define two versions of <code class="language-plaintext highlighter-rouge">defined_foo()</code> taking different parameters. The <code class="language-plaintext highlighter-rouge">typename</code> keyword is there to remove ambiguity, telling the compiler that <code class="language-plaintext highlighter-rouge">T::foo</code> is most definitely a type name. Since these are template functions, the compiler must perform type substitution, and substitution failures can occur here, for example:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">defined_foo</span><span class="o"><</span><span class="n">Foo</span><span class="o">></span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
<span class="n">defined_foo</span><span class="o"><</span><span class="kt">int</span><span class="o">></span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
</code></pre></div></div>
<p>On the first line, the <code class="language-plaintext highlighter-rouge">Foo</code> class has a type <code class="language-plaintext highlighter-rouge">foo</code>, so substitution succeeds in the first template and fails in the second; on the second line, the type <code class="language-plaintext highlighter-rouge">int</code> obviously contains no nested type, so substitution fails in the first template and succeeds in the second. In both cases exactly one of the overloaded templates substitutes successfully, and no error is reported.</p>
<p>Looks like a perfectly natural and simple rule, right? You would never guess what it can be used for:</p>
<h3 id="编译时自省">Compile-time introspection</h3>
<p>First, what is introspection? Confucius said: “When you see a worthy person, strive to equal them; when you see an unworthy person, <em>examine yourself</em> within”, which has absolutely nothing to do with the introspection we are about to discuss.</p>
<p>Introspection means a program knowing about itself. In Python, for example, a program can inspect its own code at run time, query a class’s name and members, and even add or remove members on the fly. For a strongly typed language like C++ this is clearly unrealistic, but through SFINAE we can achieve introspection to a certain degree.</p>
<p>The example above is in fact introspection of a sort: we can decide at compile time whether a class satisfies some condition, and call different functions accordingly. The STL has a mysterious header named <code class="language-plaintext highlighter-rouge"><type_traits></code> (new in C++11) that contains many type-querying facilities such as <code class="language-plaintext highlighter-rouge">is_array</code> and <code class="language-plaintext highlighter-rouge">is_class</code>. These are template classes implemented with SFINAE, each containing a member named <code class="language-plaintext highlighter-rouge">value</code> that tells whether the predicate is true or false.</p>
<p>This may all sound abstract, so let’s look at a concrete, real-world example:</p>
<h3 id="boostenable_if">boost::enable_if</h3>
<p>Code first, explanation after:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span><span class="o"><</span><span class="kt">bool</span> <span class="n">Cond</span><span class="p">,</span> <span class="k">class</span> <span class="nc">T</span> <span class="o">=</span> <span class="kt">void</span><span class="p">></span>
<span class="k">struct</span> <span class="nc">enable_if_c</span>
<span class="p">{</span> <span class="k">typedef</span> <span class="n">T</span> <span class="n">type</span><span class="p">;</span> <span class="p">};</span>
<span class="k">template</span><span class="o"><</span><span class="k">class</span> <span class="nc">T</span><span class="p">></span>
<span class="k">struct</span> <span class="nc">enable_if_c</span><span class="o"><</span><span class="nb">false</span><span class="p">,</span> <span class="n">T</span><span class="o">></span> <span class="p">{};</span>
<span class="k">template</span><span class="o"><</span><span class="k">class</span> <span class="nc">Cond</span><span class="p">,</span> <span class="k">class</span> <span class="nc">T</span> <span class="o">=</span> <span class="kt">void</span><span class="p">></span>
<span class="k">struct</span> <span class="nc">enable_if</span> <span class="o">:</span> <span class="n">enable_if_c</span><span class="o"><</span><span class="n">Cond</span><span class="o">::</span><span class="n">value</span><span class="p">,</span> <span class="n">T</span><span class="o">></span> <span class="p">{};</span>
<span class="c1">// === 我是分割线 ===</span>
<span class="k">template</span><span class="o"><</span><span class="k">class</span> <span class="nc">T</span><span class="p">></span>
<span class="k">typename</span> <span class="n">enable_if</span><span class="o"><</span><span class="n">is_floating_point</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">,</span> <span class="n">T</span><span class="o">>::</span><span class="n">type</span>
<span class="nf">frobnicate</span><span class="p">(</span><span class="n">T</span> <span class="n">t</span><span class="p">)</span> <span class="p">{}</span>
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">is_floating_point</code>也是<code class="language-plaintext highlighter-rouge"><type_traits></code>里的东西,其作用正如其名。<code class="language-plaintext highlighter-rouge">enable_if</code>的实现是,如果第一个参数的条件为<code class="language-plaintext highlighter-rouge">true</code>,那么其中会包含一个<code class="language-plaintext highlighter-rouge">type</code>类型,为第二个参数的类型;否则不会有这个类型。拿分割线下的函数举例:如果第一个参数的条件为<code class="language-plaintext highlighter-rouge">false</code>,也就是<code class="language-plaintext highlighter-rouge">T</code>不是浮点类型,那么在使用这个函数时就会产生替换失败。所以<code class="language-plaintext highlighter-rouge">enable_if</code>的作用可以理解为限制模板所能够接受的类型。</p>
<p>不过这个写法有点丑陋,毕竟返回类型这么长;所以另一个常见的写法是添加一个虚设的(dummy)参数,并在参数里实现 SFINAE<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span> <span class="o"><</span><span class="k">class</span> <span class="nc">T</span><span class="p">></span>
<span class="n">T</span> <span class="nf">frobnicate</span><span class="p">(</span><span class="n">T</span> <span class="n">t</span><span class="p">,</span> <span class="k">typename</span> <span class="n">enable_if</span><span class="o"><</span><span class="n">is_floating_point</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">,</span> <span class="n">T</span><span class="o">>::</span><span class="n">type</span> <span class="o">*</span> <span class="o">=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{}</span>
</code></pre></div></div>
<h3 id="type_traits">type_traits</h3>
<p>整个<code class="language-plaintext highlighter-rouge"><type_traits></code>库过于庞大,以个人之力无法看穿,只能挑选一些浅薄的所见所得与大家分享。下面的代码均来自<code class="language-plaintext highlighter-rouge"><type_traits></code>库,为了便于阅读,删去了一部分不影响理解的内容。</p>
<p>首先,我们需要两个类<code class="language-plaintext highlighter-rouge">true_type</code>和<code class="language-plaintext highlighter-rouge">false_type</code>,用来区分结果。这两个类中应当定义<code class="language-plaintext highlighter-rouge">value</code>,类型为<code class="language-plaintext highlighter-rouge">bool</code>,值分别为<code class="language-plaintext highlighter-rouge">true</code>和<code class="language-plaintext highlighter-rouge">false</code>。</p>
<p>然后,我们要考虑 cv 修饰符(<code class="language-plaintext highlighter-rouge">const</code>和<code class="language-plaintext highlighter-rouge">volatile</code>)的问题,它们不应影响我们对类型的判断。<code class="language-plaintext highlighter-rouge">remove_cv</code>的作用是去除类型中的 cv 修饰符,它的实现如下:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// remove_const</span>
<span class="k">template</span> <span class="o"><</span><span class="k">class</span> <span class="nc">_Tp</span><span class="p">></span> <span class="k">struct</span> <span class="nc">remove_const</span> <span class="p">{</span><span class="k">typedef</span> <span class="n">_Tp</span> <span class="n">type</span><span class="p">;};</span>
<span class="k">template</span> <span class="o"><</span><span class="k">class</span> <span class="nc">_Tp</span><span class="p">></span> <span class="k">struct</span> <span class="nc">remove_const</span><span class="o"><</span><span class="k">const</span> <span class="n">_Tp</span><span class="o">></span> <span class="p">{</span><span class="k">typedef</span> <span class="n">_Tp</span> <span class="n">type</span><span class="p">;};</span>
<span class="c1">// remove_volatile</span>
<span class="c1">// ...</span>
<span class="c1">// remove_cv</span>
<span class="k">template</span> <span class="o"><</span><span class="k">class</span> <span class="nc">_Tp</span><span class="p">></span> <span class="k">struct</span> <span class="nc">remove_cv</span> <span class="p">{</span>
<span class="k">typedef</span> <span class="k">typename</span> <span class="n">remove_volatile</span><span class="o"><</span><span class="k">typename</span> <span class="n">remove_const</span><span class="o"><</span><span class="n">_Tp</span><span class="o">>::</span><span class="n">type</span><span class="o">>::</span><span class="n">type</span> <span class="n">type</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>
<p>这里利用模板的匹配功能,获得了类型去掉 cv 修饰符后的名字,并记在了<code class="language-plaintext highlighter-rouge">remove_cv::type</code>中。</p>
<p>接下来就可以判断了。先看一个简单的,<code class="language-plaintext highlighter-rouge">is_null_pointer</code>:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span> <span class="o"><</span><span class="k">class</span> <span class="nc">_Tp</span><span class="p">></span> <span class="k">struct</span> <span class="nc">__is_nullptr_t_impl</span> <span class="o">:</span> <span class="k">public</span> <span class="n">false_type</span> <span class="p">{};</span>
<span class="k">template</span> <span class="o"><</span><span class="p">></span> <span class="k">struct</span> <span class="nc">__is_nullptr_t_impl</span><span class="o"><</span><span class="n">nullptr_t</span><span class="o">></span> <span class="o">:</span> <span class="k">public</span> <span class="n">true_type</span> <span class="p">{};</span>
<span class="k">template</span> <span class="o"><</span><span class="k">class</span> <span class="nc">_Tp</span><span class="p">></span> <span class="k">struct</span> <span class="nc">_LIBCPP_TYPE_VIS_ONLY</span> <span class="n">is_null_pointer</span>
<span class="o">:</span> <span class="k">public</span> <span class="n">__is_nullptr_t_impl</span><span class="o"><</span><span class="k">typename</span> <span class="n">remove_cv</span><span class="o"><</span><span class="n">_Tp</span><span class="o">>::</span><span class="n">type</span><span class="o">></span> <span class="p">{};</span>
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">nullptr_t</code>是 C++11 中加入的空指针类型。这个判断很简单,不用多说。</p>
<p>下面则是一个特别玄的判断:<code class="language-plaintext highlighter-rouge">is_base_of</code>,判断<code class="language-plaintext highlighter-rouge">B</code>是否是<code class="language-plaintext highlighter-rouge">D</code>的基类。由于<code class="language-plaintext highlighter-rouge"><type_traits></code>中的代码过于复杂,下面给出的代码是 StackOverflow 上一个问题里<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>的简化版代码:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="kt">char</span> <span class="p">(</span><span class="o">&</span><span class="n">yes</span><span class="p">)[</span><span class="mi">1</span><span class="p">];</span>
<span class="k">typedef</span> <span class="kt">char</span> <span class="p">(</span><span class="o">&</span><span class="n">no</span><span class="p">)[</span><span class="mi">2</span><span class="p">];</span>
<span class="k">template</span> <span class="o"><</span><span class="k">class</span> <span class="nc">B</span><span class="p">,</span> <span class="k">class</span> <span class="nc">D</span><span class="p">></span>
<span class="k">struct</span> <span class="nc">Host</span> <span class="p">{</span>
<span class="k">operator</span> <span class="n">B</span><span class="o">*</span><span class="p">()</span> <span class="k">const</span><span class="p">;</span>
<span class="k">operator</span> <span class="n">D</span><span class="o">*</span><span class="p">();</span>
<span class="p">}</span> <span class="p">;</span>
<span class="k">template</span> <span class="o"><</span><span class="k">class</span> <span class="nc">B</span><span class="p">,</span> <span class="k">class</span> <span class="nc">D</span><span class="p">></span>
<span class="k">struct</span> <span class="nc">is_base_of</span> <span class="p">{</span>
<span class="k">template</span> <span class="o"><</span><span class="k">class</span> <span class="nc">T</span><span class="p">></span>
<span class="k">static</span> <span class="n">yes</span> <span class="n">test</span><span class="p">(</span><span class="n">D</span><span class="o">*</span><span class="p">,</span> <span class="n">T</span><span class="p">);</span>
<span class="k">static</span> <span class="n">no</span> <span class="n">test</span><span class="p">(</span><span class="n">B</span><span class="o">*</span><span class="p">,</span> <span class="kt">int</span><span class="p">);</span>
<span class="k">static</span> <span class="k">const</span> <span class="kt">bool</span> <span class="n">value</span> <span class="o">=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">test</span><span class="p">(</span><span class="n">Host</span><span class="o"><</span><span class="n">B</span><span class="p">,</span><span class="n">D</span><span class="o">></span><span class="p">(),</span> <span class="kt">int</span><span class="p">()))</span> <span class="o">==</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">yes</span><span class="p">);</span>
<span class="p">}</span> <span class="p">;</span>
</code></pre></div></div>
<p>我们把<code class="language-plaintext highlighter-rouge">true_type</code>和<code class="language-plaintext highlighter-rouge">false_type</code>改成了<code class="language-plaintext highlighter-rouge">yes</code>和<code class="language-plaintext highlighter-rouge">no</code>,通过其内存大小来判断类型;这个并不重要。</p>
<p>要理解这段代码的原理,首先我们需要知道 C++ 标准中,有多个可选函数时会优先选择哪方:</p>
<ul>
<li><strong>原则1:</strong>如果两个函数参数类型相同,而 cv 修饰符不同,则优先选择与传入参数 cv 修饰符匹配的一方;</li>
<li><strong>原则2:</strong>如果原则1无法区分,且两个类型转换函数返回类型不同,则优先选择与目标参数匹配的一方;</li>
<li><strong>原则3:</strong>如果原则2无法区分,优先选择非模板函数。</li>
</ul>
<p>现在我们来分析一下代码的原理。<code class="language-plaintext highlighter-rouge">Host</code>的两个类型转换函数的原型分别为</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">B *(Host<B, D> const &)</code></li>
<li><code class="language-plaintext highlighter-rouge">D *(Host<B, D> &)</code></li>
</ul>
<p>假设<code class="language-plaintext highlighter-rouge">B</code>是<code class="language-plaintext highlighter-rouge">D</code>的基类,那么<code class="language-plaintext highlighter-rouge">D *</code>可以转换为<code class="language-plaintext highlighter-rouge">B *</code>,反之则不行。对于第一个<code class="language-plaintext highlighter-rouge">test</code>函数,可选的转换函数只有第二个;而对于第二个<code class="language-plaintext highlighter-rouge">test</code>函数,两个转换函数都可选,根据原则1,编译器会选择第二个转换函数(因为默认的传入参数为<code class="language-plaintext highlighter-rouge">*this</code>,为非<code class="language-plaintext highlighter-rouge">const</code>类型)。此时第一个<code class="language-plaintext highlighter-rouge">test</code>函数的目标类型与转换函数的返回类型匹配,而第二个的不匹配,根据原则2,编译器会选择第一个<code class="language-plaintext highlighter-rouge">test</code>函数,故得到<code class="language-plaintext highlighter-rouge">yes</code>。</p>
<p>假设<code class="language-plaintext highlighter-rouge">B</code>不是<code class="language-plaintext highlighter-rouge">D</code>的基类,那么<code class="language-plaintext highlighter-rouge">D *</code>不可以转换为<code class="language-plaintext highlighter-rouge">B *</code>,反之或许可以。对于第二个<code class="language-plaintext highlighter-rouge">test</code>函数,可选的转换函数只有第一个;对于第一个<code class="language-plaintext highlighter-rouge">test</code>函数,可选的转换函数有第二个,也可能有第一个,但一定会选择第二个。此时根据原则3,编译器会选择第二个<code class="language-plaintext highlighter-rouge">test</code>函数,故得到<code class="language-plaintext highlighter-rouge">no</code>。</p>
<h2 id="总结">总结</h2>
<p>写了这么多,其实也只涉及了模板的冰山一角。由此可见这一功能的强大与复杂,也不难理解为何模板一直处于争论的中心,甚至有这么一个笑话:“Java 程序员聚在一起谈面向对象和设计模式,C++ 程序员聚在一起谈模板和语言规范到底是怎么回事”,嘲笑的就是 C++ 令人咂舌的复杂程度。</p>
<p>但一码归一码,C++ 还是一门被广泛使用的语言,因此适当的了解还是必要的。而本文中提到的用法,可能大家没有见过,但在工程中的确是普遍存在的。就拿 CRTP 来说,Boost 库的文法分析库 Spirit、计算几何库 CGAL 的整个核心中都使用了 CRTP;而 SFINAE 更是已经进入了 C++ 标准。即便自己不会写出这样的代码,至少在见到的时候也应该明白是在干什么;退一步讲,把这篇文章当成是普通的科普文章,图个乐呵也好。</p>
<h2 id="参考文献">参考文献</h2>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p><a href="https://en.wikipedia.org/wiki/Curiously_recurring_template_pattern">Wikipedia - Curiously recurring template pattern</a> <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p><a href="https://katyscode.wordpress.com/2013/08/22/c-polymorphic-cloning-and-the-crtp-curiously-recurring-template-pattern/">Katy’s Code - C++: Polymorphic cloning and the CRTP (Curiously Recurring Template Pattern)</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p><a href="https://en.wikipedia.org/wiki/Substitution_failure_is_not_an_error">Wikipedia - Substitution failure is not an error</a> <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p><a href="http://accu.org/content/conf2013/Jonathan_Wakely_sfinae.pdf">ACCU 2013 - Jonathan Wakely - SFINAE Functionality Is Not Arcane Esoterica</a> <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p><a href="http://stackoverflow.com/questions/2910979/how-does-is-base-of-work">StackOverflow - How does <code class="language-plaintext highlighter-rouge">is_base_of</code> work?</a> <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Zecong Hu