今天智平在群里发了一个signal的考题考大家,就是下面程序是什么输出:
import signal
import sys
count = 0
def signal_handler(signum, frame):
    global count
    count = -1
    print 'sig', count
def main():
    signal.signal(signal.SIGALRM, signal_handler)
    signal.setitimer(signal.ITIMER_REAL, 1, 1)
    global count
    while True:
        for line in sys.stdin:
            count += 1
            print count
if __name__ == "__main__":
    main()
然后执行tail -f /home/logs/nginx/access.log | python a.py,
问输出是 1,2,3,4,5,-1,6,7,8… 这样, 还是1,2,3,4,5,-1,0,1,2…
我们经过测试,是第一种情况。非常奇葩。
伟大的大神叠哥说道:
count += 1 有 read count、plus 1、store count 三步,会不会是因为实现上的什么原因,那个signal_handler总是插入到这三个操作中间去执行了?
因为,发现在count += 1 前面加一条无意义的赋值语句,结果就符合预期了。就输出1,2,3,4,5,-1,0,1,2…
while True: for line in sys.stdin: x = count count += 1 print count
为啥加上x = count,输出就变了呢?我觉得大神叠哥一语道出了问题的本质,于是想去求证一下是否真的如此:signal_handler总是插入到这三个操作中间去执行。
我们来看字节码
我们首先来看看signal_handler的字节码:
In [8]: dis.dis(signal_handler)
  3           0 LOAD_CONST               1 (-1)
              3 STORE_GLOBAL             0 (count)
  4           6 LOAD_CONST               2 ('sig')
              9 PRINT_ITEM
             10 LOAD_GLOBAL              0 (count)
             13 PRINT_ITEM
             14 PRINT_NEWLINE
             15 LOAD_CONST               0 (None)
             18 RETURN_VALUE
然后我们再看看两个程序main()的字节码。这里列出字节码主要供下一节参考使用,现在可以不用太仔细看。
非预期的程序
我们看看输出1,2,3,4,5,-1,6,7,8…的字节码
def main():
    signal.signal(signal.SIGALRM, signal_handler)
    signal.setitimer(signal.ITIMER_REAL, 1, 1)
    global count
    while True:
        for line in sys.stdin:
            count += 1
            print count
In [4]: dis.dis(main)
  2           0 LOAD_GLOBAL              0 (signal)
              3 LOAD_ATTR                0 (signal)
              6 LOAD_GLOBAL              0 (signal)
              9 LOAD_ATTR                1 (SIGALRM)
             12 LOAD_GLOBAL              2 (signal_handler)
             15 CALL_FUNCTION            2
             18 POP_TOP
  3          19 LOAD_GLOBAL              0 (signal)
             22 LOAD_ATTR                3 (setitimer)
             25 LOAD_GLOBAL              0 (signal)
             28 LOAD_ATTR                4 (ITIMER_REAL)
             31 LOAD_CONST               1 (1)
             34 LOAD_CONST               1 (1)
             37 CALL_FUNCTION            3
             40 POP_TOP
# while True:
  5          41 SETUP_LOOP              45 (to 89)
        >>   44 LOAD_GLOBAL              5 (True)
             47 POP_JUMP_IF_FALSE       88
# for line in sys.stdin:
  6          50 SETUP_LOOP              32 (to 85)
             53 LOAD_GLOBAL              6 (sys)
             56 LOAD_ATTR                7 (stdin)
             59 GET_ITER
        >>   60 FOR_ITER                21 (to 84)
             63 STORE_FAST               0 (line)
# count += 1
  7          66 LOAD_GLOBAL              8 (count)
             69 LOAD_CONST               1 (1)
             72 INPLACE_ADD
             73 STORE_GLOBAL             8 (count)
# print count
  8          76 LOAD_GLOBAL              8 (count)
             79 PRINT_ITEM
             80 PRINT_NEWLINE
             81 JUMP_ABSOLUTE           60  # 这里回到偏移60的指令
        >>   84 POP_BLOCK
        >>   85 JUMP_ABSOLUTE           44
        >>   88 POP_BLOCK
        >>   89 LOAD_CONST               0 (None)
             92 RETURN_VALUE
预期输出的程序
我们再来看看1,2,3,4,5,-1,0,1,2…的程序的字节码:
def main():
    signal.signal(signal.SIGALRM, signal_handler)
    signal.setitimer(signal.ITIMER_REAL, 1, 1)
    global count
    while True:
        for line in sys.stdin:
            x = count
            count += 1
            print count
In [6]: dis.dis(main)
  2           0 LOAD_GLOBAL              0 (signal)
              3 LOAD_ATTR                0 (signal)
              6 LOAD_GLOBAL              0 (signal)
              9 LOAD_ATTR                1 (SIGALRM)
             12 LOAD_GLOBAL              2 (signal_handler)
             15 CALL_FUNCTION            2
             18 POP_TOP
  3          19 LOAD_GLOBAL              0 (signal)
             22 LOAD_ATTR                3 (setitimer)
             25 LOAD_GLOBAL              0 (signal)
             28 LOAD_ATTR                4 (ITIMER_REAL)
             31 LOAD_CONST               1 (1)
             34 LOAD_CONST               1 (1)
             37 CALL_FUNCTION            3
             40 POP_TOP
# while True:
  5          41 SETUP_LOOP              51 (to 95)
        >>   44 LOAD_GLOBAL              5 (True)
             47 POP_JUMP_IF_FALSE       94
# for line in sys.stdin:
  6          50 SETUP_LOOP              38 (to 91)
             53 LOAD_GLOBAL              6 (sys)
             56 LOAD_ATTR                7 (stdin)
             59 GET_ITER
        >>   60 FOR_ITER                27 (to 90)
             63 STORE_FAST               0 (line)
# x = count
  7          66 LOAD_GLOBAL              8 (count)
             69 STORE_FAST               1 (x)
# count += 1
  8          72 LOAD_GLOBAL              8 (count)
             75 LOAD_CONST               1 (1)
             78 INPLACE_ADD
             79 STORE_GLOBAL             8 (count)
# print count
  9          82 LOAD_GLOBAL              8 (count)
             85 PRINT_ITEM
             86 PRINT_NEWLINE
             87 JUMP_ABSOLUTE           60
        >>   90 POP_BLOCK
        >>   91 JUMP_ABSOLUTE           44
        >>   94 POP_BLOCK
        >>   95 LOAD_CONST               0 (None)
             98 RETURN_VALUE
增加调试代码
有了程序的字节码后,我们需要打印一下程序的执行流程,来深入看看究竟发生了啥事。我们可以先下载Python 2.7.3的源代码回来,然后修改:
首先打印处理的字节码,我们打开Python/ceval.c,在下面PyEval_EvalFrameEx中的swtich (opcode),前面加上printf("[opcode: %d]n", opcode);来打印处理的opcode。
然后我们修改一下signalmodule模块,在处理signal的时候,打印一条log。我们打开Modules/signalmodule.c,在其中的PyErr_CheckSignals里面的result = PyEval_CallObject(Handlers[i].func, arglist);语句前加上printf("-> call signal handlern");,这样就可以在调用signal handler的时候打印一条log。
然后就是编译Python了。编译完,我们就用编译好的Python来做实验!Yeah~
打印程序执行流程
接下来,我们就执行这怪异的程序,我们先来看看输出1,2,3,4,5,-1,6,7,8…这个不符合预期的程序的字节码处理流程1
非预期版本的执行流程
[opcode: 113 JUMP_ABSOLUTE] [opcode: 93 FOR_ITER] [opcode: 116 LOAD_GLOBAL] [opcode: 100 LOAD_CONST] [opcode: 55 INPLACE_ADD] [opcode: 97 STORE_GLOBAL] [opcode: 116 LOAD_GLOBAL] [opcode: 71 PRINT_ITEM] [opcode: 72 PRINT_NEWLINE] [opcode: 113 JUMP_ABSOLUTE] [opcode: 93 FOR_ITER] [opcode: 116 LOAD_GLOBAL] # 将count压入栈 -> call signal handler [opcode: 100 LOAD_CONST] # 这里进入到了signal_handler [opcode: 97 STORE_GLOBAL] # count = -1 [opcode: 100 LOAD_CONST] [opcode: 71 PRINT_ITEM] [opcode: 116 LOAD_GLOBAL] [opcode: 71 PRINT_ITEM] [opcode: 72 PRINT_NEWLINE] [opcode: 100 LOAD_CONST] [opcode: 83 RETURN_VALUE] [opcode: 100 LOAD_CONST] # 回到main [opcode: 55 INPLACE_ADD] # 将之前栈顶的count+1,存到全局中的count [opcode: 97 STORE_GLOBAL] [opcode: 116 LOAD_GLOBAL] [opcode: 71 PRINT_ITEM] [opcode: 72 PRINT_NEWLINE] [opcode: 113 JUMP_ABSOLUTE]
经过几十秒后,我们可以看到,除了第一次外,后续的signal handler,都在同一个地方被调用的:
[opcode: 93 FOR_ITER] [opcode: 116 LOAD_GLOBAL] # 将count压入栈 -> call signal handler
这里解释器刚刚将全局的count压入栈,就切换到signal handler处理。
所以这里可以出现问题的原因就很明显了。在main()中,已经将全局的count压入栈,而切换到signal_handler中,修改了全局的count为-1,也是无意义的。因为切回到main()的时候,是操作的是栈顶部的count副本,栈顶+1后,将栈顶的值又写入全局的count中,就把signal_handler设置的count=-1给覆盖了。所以就会一直递增下去。
可预测版本的执行流程
[opcode: 113 JUMP_ABSOLUTE] [opcode: 93 FOR_ITER] [opcode: 116 LOAD_GLOBAL] [opcode: 125 STORE_FAST] [opcode: 116 LOAD_GLOBAL] [opcode: 100 LOAD_CONST] [opcode: 55 INPLACE_ADD] [opcode: 97 STORE_GLOBAL] [opcode: 116 LOAD_GLOBAL] [opcode: 71 PRINT_ITEM] [opcode: 72 PRINT_NEWLINE] [opcode: 113 JUMP_ABSOLUTE] [opcode: 93 FOR_ITER] [opcode: 116 LOAD_GLOBAL] # 将全局count压入栈顶 -> call signal handler [opcode: 100 LOAD_CONST] # 进入signal handler [opcode: 97 STORE_GLOBAL] [opcode: 100 LOAD_CONST] [opcode: 71 PRINT_ITEM] [opcode: 116 LOAD_GLOBAL] [opcode: 71 PRINT_ITEM] [opcode: 72 PRINT_NEWLINE] [opcode: 100 LOAD_CONST] [opcode: 83 RETURN_VALUE] [opcode: 125 STORE_FAST] # 回到main(), x = 栈顶 [opcode: 116 LOAD_GLOBAL] # 又将全局count压入栈,此时的全局的count是-1 [opcode: 100 LOAD_CONST] # 继续压入1 [opcode: 55 INPLACE_ADD] # 栈顶的两个值求和,count+1 [opcode: 97 STORE_GLOBAL] # 将栈顶的值写到全局count中,此时全局count为0 [opcode: 116 LOAD_GLOBAL] [opcode: 71 PRINT_ITEM] [opcode: 72 PRINT_NEWLINE] [opcode: 113 JUMP_ABSOLUTE]
我们可以看到,返回main()后,x是旧值,不过在count += 1的时候,又重新从全局字典中读取了count,所以此时的count是正确的。
于是这就解释了上述两程序的怪异行为。
新的疑问
于是这里又引入了一个新的疑问,为什么每次signal handler都在同一个指令后触发?嗯,这个我们就需要去看看Python解释器的实现。我们首先看看signal是怎么触发的。
signal
我们打开Modules/signalmodule.c,这个是signal模块的实现。
static void
trip_signal(int sig_num)
{
    Handlers[sig_num].tripped = 1;
    if (is_tripped)
        return;
    /* Set is_tripped after setting .tripped, as it gets
       cleared in PyErr_CheckSignals() before .tripped. */
    is_tripped = 1;
    Py_AddPendingCall(checksignals_witharg, NULL);
    if (wakeup_fd != -1)
        write(wakeup_fd, "