Python aiohttp: A Million Concurrent Requests (Part 2) - 爱开源

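Sync vs. async

Now for the main event. Let's check whether async is really worth the extra coding hassle: how do the sync and async clients compare in efficiency, and how many requests per minute can the async client issue?

To find out, we first set up an async aiohttp server. It serves the full HTML text of Frankenstein by Mary Shelley and adds a random delay to each response: some get no delay, others up to 3 s. This resembles a real app. Some apps respond with a fixed latency, but in general the latency varies from response to response.

The server code: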

#!/usr/local/bin/python3.5
import asyncio
from datetime import datetime
from aiohttp import web
import random

# set seed to ensure async and sync client get same distribution of delay values
# and tests are fair
random.seed(1)

async def hello(request):
    name = request.match_info.get("name", "foo")
    n = datetime.now().isoformat()
    delay = random.randint(0, 3)
    await asyncio.sleep(delay)
    headers = {"delay": str(delay)}
    # opening file is not async here, so it may block, to improve
    # efficiency of this you can consider using asyncio Executors
    # that will delegate file operation to separate thread or process
    # and improve performance
    # https://docs.python.org/3/library/asyncio-eventloop.html#executor
    # https://pymotw.com/3/asyncio/executors.html
    with open("frank.html", "rb") as html_body:
        print("{}: {} delay: {}".format(n, request.path, delay))
        # content_type is a keyword argument, so aiohttp emits a real
        # Content-Type header instead of a custom "content_type" one
        response = web.Response(body=html_body.read(), headers=headers,
                                content_type="text/html")
        return response

app = web.Application()
app.router.add_route("GET", "/{name}", hello)
web.run_app(app)
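
The comments in the handler point out that open() blocks the event loop and suggest an executor. A minimal sketch of that idea follows, assuming Python 3.7+ for asyncio.run. The names read_file and read_file_async are hypothetical helpers, not part of the original server, and a temporary file stands in for frank.html.

```python
import asyncio
import os
import tempfile


def read_file(path):
    # Plain blocking read; this is the call that would stall the event loop.
    with open(path, "rb") as f:
        return f.read()


async def read_file_async(path):
    # Hand the blocking read to the loop's default thread pool executor,
    # so other coroutines keep running while the read is in progress.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, read_file, path)


# Demo with a throwaway file standing in for frank.html.
with tempfile.NamedTemporaryFile(delete=False, suffix=".html") as tmp:
    tmp.write(b"<html>Frankenstein</html>")
demo_path = tmp.name

body = asyncio.run(read_file_async(demo_path))
os.remove(demo_path)
print(len(body))
```

In the handler, html_body.read() would become something like await read_file_async("frank.html"). For a small file that the OS already has cached, the gain may well be negligible.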

The sync client code:

import requests
r = 100
url = "http://localhost:8080/{}"
for i in range(r):
    res = requests.get(url.format(i))
    delay = res.headers.get("DELAY")
    d = res.headers.get("DATE")
    print("{}:{} delay {}".format(d, res.url, delay))

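On my machine, the code above took 2 minutes 45 seconds, while the async code took only 3.48 s.

Interestingly, the async run time comes very close to the longest delay configured on the server. If you watch the printed output, you can see how big the async client's advantage is. Some responses have zero delay, others 3 s. In sync mode the client blocks and waits, and your machine does nothing. The async client wastes no time: when a response is delayed, it goes and does something else. The log shows this too. The zero-delay responses arrive first, then the ones with a 1 s delay, and finally those with the maximum delay.

Testing the limits

Now that we know async performs better, let's try to find its limits and try to make it crash. I'll send 1000 async requests. I'm curious how many requests my client can handle.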
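
Why the async total lands near the longest delay can be illustrated without a server. The sketch below is only a simulation under stated assumptions: asyncio.sleep stands in for the HTTP request, the 0 to 3 s delays are scaled down by a made-up SCALE factor so it finishes quickly, and fake_request is a hypothetical name.

```python
import asyncio
import random
import time

SCALE = 0.01  # shrink the 0-3 s delays so the demo finishes quickly


async def fake_request(delay):
    # Stand-in for an HTTP GET: just wait out the server-side delay.
    await asyncio.sleep(delay * SCALE)
    return delay


async def run_concurrently(delays):
    # All waits overlap, so the total is roughly the longest single delay.
    start = time.perf_counter()
    await asyncio.gather(*(fake_request(d) for d in delays))
    return time.perf_counter() - start


async def run_sequentially(delays):
    # Each wait finishes before the next starts, so the delays add up.
    start = time.perf_counter()
    for d in delays:
        await fake_request(d)
    return time.perf_counter() - start


random.seed(1)
delays = [random.randint(0, 3) for _ in range(100)]
concurrent_time = asyncio.run(run_concurrently(delays))
sequential_time = asyncio.run(run_sequentially(delays))
print("sum of delays: {} units, max delay: {} units".format(sum(delays), max(delays)))
print("sequential {:.2f}s, concurrent {:.2f}s".format(sequential_time, concurrent_time))
```

The sequential run behaves like the requests client (delays add up), while the concurrent run behaves like the aiohttp client (delays overlap).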

> time python3 bench.py
2.68user 0.24system 0:07.14elapsed 40%CPU (0avgtext+0avgdata 53704maxresident)k
0inputs+0outputs (0major+14156minor)pagefaults 0swaps

1000 requests took 7 s. Pretty good. What about 10K? Unfortunately, that failed:

responses are <_GatheringFuture finished exception=
    ClientOSError(24, 'Cannot connect to host localhost:8080 ssl:
    False [Can not connect to localhost:8080 [Too many open files]]')>
Traceback (most recent call last):
  File "/home/pawel/.local/lib/python3.5/site-packages/aiohttp/connector.py", line 581, in _create_connection
  File "/usr/local/lib/python3.5/asyncio/base_events.py", line 651, in create_connection
  File "/usr/local/lib/python3.5/asyncio/base_events.py", line 618, in create_connection
  File "/usr/local/lib/python3.5/socket.py", line 134, in __init__
OSError: [Errno 24] Too many open files

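Not good. It looks like I've run into the 10K connections problem.

The traceback says too many files are open, which probably means too many open sockets. Why files? Sockets are just file descriptors, and the operating system limits how many a process may hold. How many is too many? I checked from Python and found the limit is 1024. How can we get around it? A crude approach is to raise the limit, but that doesn't sound very smart. A better way is to add some synchronization that caps the number of concurrent requests. So I added an asyncio.Semaphore() with a maximum of 1000 concurrent tasks.

The modified client code: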
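
One way to see the descriptor limit from Python is the standard resource module (Unix only). This is a small sketch and not necessarily how the author checked; the value printed depends on the system and shell settings.

```python
import resource

# RLIMIT_NOFILE is the per-process cap on open file descriptors.
# Sockets count against it, which is why 10K connections hit
# "Too many open files".
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit: {}, hard limit: {}".format(soft, hard))
```

The soft limit is the one a process actually runs into. It is the same value that ulimit -n reports in a shell.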

# modified fetch function with semaphore
import random
import asyncio
from aiohttp import ClientSession


async def fetch(url):
    async with ClientSession() as session:
        async with session.get(url) as response:
            delay = response.headers.get("DELAY")
            date = response.headers.get("DATE")
            print("{}:{} with delay {}".format(date, response.url, delay))
            return await response.read()


async def bound_fetch(sem, url):
    # getter function with semaphore
    async with sem:
        await fetch(url)


async def run(loop, r):
    url = "http://localhost:8080/{}"
    tasks = []
    # create instance of Semaphore
    sem = asyncio.Semaphore(1000)
    for i in range(r):
        # pass Semaphore to every GET request
        task = asyncio.ensure_future(bound_fetch(sem, url.format(i)))
        tasks.append(task)

    responses = asyncio.gather(*tasks)
    await responses

number = 10000
loop = asyncio.get_event_loop()
future = asyncio.ensure_future(run(loop, number))
loop.run_until_complete(future)

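Now we can handle 10k connections. It took 23 s and raised a few exceptions, but either way that's pretty good.

What about 100K? This really strained my machine, but surprisingly it worked well. The server stayed quite stable, although memory usage was high and CPU usage stayed around 100% throughout. Interestingly, the server used noticeably less CPU than the client. Here is the ps output: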
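
To see both effects in isolation (the semaphore capping in-flight work, and some requests failing without aborting the whole run), here is a stdlib-only sketch. Everything in it is a stand-in: fake_fetch replaces the real HTTP call, LIMIT is a small substitute for the 1000 used in the client, and the failures are simulated. Passing return_exceptions=True to asyncio.gather is one possible way to collect errors, not something the client above does.

```python
import asyncio

LIMIT = 10  # small stand-in for the 1000 used in the client


async def fake_fetch(i, state):
    # Track how many "requests" are in flight at the same time.
    state["active"] += 1
    state["peak"] = max(state["peak"], state["active"])
    try:
        await asyncio.sleep(0.001)
        if i % 25 == 0:
            # Simulate an occasional connection failure.
            raise OSError("simulated failure for request {}".format(i))
        return i
    finally:
        state["active"] -= 1


async def bound_fetch(sem, i, state):
    # Only LIMIT coroutines can be inside this block at once.
    async with sem:
        return await fake_fetch(i, state)


async def run(r):
    sem = asyncio.Semaphore(LIMIT)
    state = {"active": 0, "peak": 0}
    tasks = [bound_fetch(sem, i, state) for i in range(r)]
    # return_exceptions=True places exceptions in the result list
    # instead of raising the first one out of gather().
    results = await asyncio.gather(*tasks, return_exceptions=True)
    failures = [res for res in results if isinstance(res, Exception)]
    return state["peak"], len(results) - len(failures), len(failures)


peak, ok, failed = asyncio.run(run(100))
print("peak in flight: {}, ok: {}, failed: {}".format(peak, ok, failed))
```

The peak never exceeds LIMIT, and the failed requests show up as exception objects that can be counted or retried afterwards.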

pawel@pawel-VPCEH390X ~/p/l/benchmarker> ps ua | grep python
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
pawel     2447 56.3  1.0 216124 64976 pts/9    Sl+  21:26   1:27 /usr/local/bin/python3.5 ./test_server.py
pawel     2527  101  3.5 674732 212076 pts/0   Rl+  21:26   2:30 /usr/local/bin/python3.5 ./bench.py

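Eventually, for some reason, it crashed after running for 5 minutes. It had produced close to 100K lines of output, so locating the traceback was hard. It looks like some responses were not closed properly. I'm not sure of the exact cause (client or server error).

After some scrolling, I found this exception in the client log.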

  File "/usr/local/lib/python3.5/asyncio/futures.py", line 387, in __iter__
    return self.result()  # May raise too.
  File "/usr/local/lib/python3.5/asyncio/futures.py", line 274, in result
    raise self._exception
  File "/usr/local/lib/python3.5/asyncio/selector_events.py", line 411, in _sock_connect
    sock.connect(address)
OSError: [Errno 99] Cannot assign requested address

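I'm not sure what happened here. My first guess was that the test server had died. A reader suggested that the exception occurs because the operating system ran out of available ports. I had capped concurrent connections at 1k, but some sockets may still have been in a closing state and unavailable to the kernel, which would cause this problem.

Still pretty good, isn't it? 100k requests in 5 minutes, which works out to 20k requests per minute.

Finally I tried 1M requests. I was really afraid my laptop would blow up ^_^. I deliberately lowered the delay to between 0 and 1 s. It took 52 minutes in the end.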
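
If port exhaustion is the cause, the size of the pool of local ports is worth knowing. The sketch below reads it on Linux. The /proc path is Linux-specific and ephemeral_port_range is a hypothetical helper name; on other systems the function simply returns None.

```python
from pathlib import Path

# Linux-specific location of the ephemeral (local) port range.
PORT_RANGE_FILE = Path("/proc/sys/net/ipv4/ip_local_port_range")


def ephemeral_port_range(path=PORT_RANGE_FILE):
    # Returns (low, high), or None where the file is missing or unreadable.
    try:
        low, high = path.read_text().split()
        return int(low), int(high)
    except (OSError, ValueError):
        return None


port_range = ephemeral_port_range()
if port_range is None:
    print("port range not available on this system")
else:
    low, high = port_range
    print("ephemeral ports: {}-{} ({} total)".format(low, high, high - low + 1))
```

Because fetch() above opens a new ClientSession, and therefore a new connection, for every request, a run of 100K requests can plausibly cycle through far more ports than the 1000 that are in use at any moment.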

1913.06user 1196.09system 52:06.87elapsed 99%CPU (0avgtext+0avgdata 5194260maxresident)k
265144inputs+0outputs (18692major+2528207minor)pagefaults 0swaps

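This means our client sent about 19,230 requests per minute. Not bad, right? Note that the client's performance was limited by the server, which seems to have crashed several times.

Wrapping up

As you can see, an async HTTP client is quite powerful. Making 1M requests is not that hard, and the advantage over sync mode is huge.

I'm curious how this compares with other languages or async frameworks. At some point I may compare Twisted Treq with aiohttp. And how much concurrency can async libraries in other languages support? Some Java async framework, for example? Or a C++ framework? Or some Rust HTTP client?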
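
The per-minute figures quoted in this post are plain division, and can be checked as follows. The helper name requests_per_minute is made up for this example. The last calculation uses the full elapsed time from the time output (52:06.87) rather than a flat 52 minutes, which gives a slightly lower rate.

```python
def requests_per_minute(total_requests, minutes):
    # Average throughput over the whole run.
    return total_requests / minutes


rate_100k = requests_per_minute(100000, 5)
rate_1m = requests_per_minute(1000000, 52)

# Same run, using the exact elapsed time of 52 min 6.87 s.
elapsed_minutes = 52 + 6.87 / 60
rate_1m_exact = requests_per_minute(1000000, elapsed_minutes)

print(int(rate_100k), int(rate_1m), round(rate_1m_exact))
```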

Please credit when reposting: 爱开源 » Python aiohttp: A Million Concurrent Requests (Part 2)
