WebSockets

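WebSocket is a standard protocol for two-way data transfer between a client and a server. The WebSocket protocol does not run over HTTP; after an initial HTTP upgrade handshake it runs directly on top of TCP.

Why use WebSockets?

A WebSocket connection allows full-duplex communication between a client and a server: once the connection is established, either side can push data to the other. WebSockets, along with the related Server-sent Events (SSE) and WebRTC data channels, matter because HTTP was not designed to keep a connection open so that the server can push data to a web browser. Previously, most web applications would implement long polling via frequent AJAX requests, as shown in the diagram below.

Long polling via AJAX is incredibly inefficient for some applications.

Compared with long polling, server push is more efficient and easier to scale because the web browser does not have to keep sending AJAX requests to ask for updates.

WebSockets are more efficient than long polling for server sent updates.

The diagram above shows a server pushing data to the client. Because a WebSocket is a full-duplex connection, the client can also push data to the server, as shown in the diagram below.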

WebSockets also allow client push in addition to server pushed updates.

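The WebSockets approach works well for certain kinds of web applications such as chat rooms, which is why chat is so often used as the example application for WebSockets.

Implementing WebSockets

Both the web browser and the server must implement the WebSockets protocol to establish and maintain the connection.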
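
To give a sense of what implementing the protocol involves, here is a minimal sketch of one step of the opening handshake defined in RFC 6455: the server shows it understood the upgrade request by hashing the client's Sec-WebSocket-Key header together with a fixed GUID and returning the result in the Sec-WebSocket-Accept header. The function name compute_accept is made up for this illustration; in practice a server library handles this step, along with message framing, for you.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key):
    """Return the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Sample key taken from the handshake example in RFC 6455.
    print(compute_accept("dGhlIHNhbXBsZSBub25jZQ=="))
```

The browser performs the same calculation and drops the connection if the values do not match, which is one reason both sides have to implement the protocol.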

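A multi-threaded or multi-process server does not scale well for WebSockets because it is designed to open a connection, handle a request as quickly as possible and then close the connection. An asynchronous server such as Tornado or Green Unicorn monkey patched with gevent is well suited to a WebSockets server-side implementation.

On the client side, it is not necessary to use a JavaScript library for WebSockets. Web browsers that implement WebSockets expose the functionality through the WebSocket object.

JavaScript client libraries

  • Socket.io’s client-side JavaScript library can be used to connect to a server-side WebSockets implementation.
  • web-socket-js is a Flash-based client-side WebSockets implementation.

Python implementations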

Nginx WebSocket proxying

Nginx officially supports WebSocket proxying as of version 1.3.13. However, you have to configure the Upgrade and Connection headers so that the upgrade request is passed through Nginx to your WSGI server. It can be tricky to set this up the first time.

Here are the configuration settings I use in my Nginx file as part of my WebSockets proxy.

# this is where my WSGI server sits answering only on localhost
# usually this is Gunicorn monkey patched with gevent
upstream app_server_wsgiapp {
  server localhost:5000 fail_timeout=0;
}

server {

  # typical web server configuration goes here

  # this section is specific to the WebSockets proxying
  location /socket.io {
    proxy_pass http://app_server_wsgiapp/socket.io;
    proxy_redirect off;

    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_read_timeout 600;
  }
}

Note that if you run into any issues with the above example configuration you’ll want to check the official HTTP proxy module documentation.

The following resources are also helpful for setting up the configuration properly.

Open source Python examples with WebSockets

  • The python-websockets-example contains code to create a simple web application that provides WebSockets using Flask, Flask-SocketIO and gevent.
  • The Flask-SocketIO project has a chat web application that demos sending server generated events as well as input from users via a text box input on a form.

General WebSockets resources

  • The official W3C candidate draft for WebSockets API and the working draft for WebSockets are good reference material but can be tough for those new to the WebSockets concepts. I recommend reading the working draft after looking through some of the more beginner-friendly resources list below.
  • WebSockets 101 by Armin Ronacher provides a detailed assessment of the subpar state of HTTP proxying in regards to WebSockets. He also discusses the complexities of the WebSockets protocol including the packet implementation.
  • The “Can I Use?” website has a handy WebSockets reference chart for which web browsers and specific versions support WebSockets.
  • Mozilla’s Developer Resources for WebSockets is a good place to find documentation and tools for developing with WebSockets.
  • WebSockets from Scratch gives a nice overview of the protocol then shows how the lower-level pieces work with WebSockets, which are often a black box to developers who only use libraries like Socket.IO.
  • websocketd is a WebSockets server aiming to be the “CGI of WebSockets”. Worth a look.

Python-specific WebSockets resources

MkDocs

MkDocs is a Python-based static site generator that combines Markdown content with Jinja2 templates to produce static websites. MkDocs’ source code is available on GitHub.

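What’s great about MkDocs?

MkDocs uses a YAML configuration file and can use themes to easily change the look and feel of the generated documentation.

In addition to YAML configuration and swappable themes, MkDocs has a search feature. Search is not easy to add to other static site generators, but with MkDocs it can be included without plugins or code changes.

MkDocs resources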

  • The official Getting Started with MkDocs is likely the best place to go when you are just getting set up with your first site that uses this project.
  • Building Markdown-based Sites with MkDocs provides an initial perspective on using MkDocs to build a static website along with some of the advantages and disadvantages that come with using this static site generator.
  • Mkdocs documentation is a quick tutorial to get MkDocs installed and modify the initial mkdocs.yml file.
  • MkDocs making strides is a post from one of the project’s core committers on some changes that greatly improved the project such as site regeneration during development when a file is modified, search, the command-line client and packageable theming.

Lektor

Lektor is a static site generator with content management system (CMS) and web framework features.

Lektor’s source code is hosted on GitHub.

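How is Lektor different from other static site generators?

Most static site generators, including Pelican, are built with programmers as the primary users. Lektor tries to be more accessible to non-programmers by providing an admin panel for creating and updating site content.

Lektor resources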

  • Introducing Lektor is the background story for what motivated Armin Ronacher to start hacking on his own static site generator project after jumping around from Django to WordPress for hosting content. The post also includes details on the differences in the project compared to other static site generators.
  • Hello, Lektor is a wonderful getting started and overview post. The post walks through the files Lektor generates, the admin content management system and pulling data into pages from the Meetup API.
  • The official Lektor quickstart contains the first commands to use to generate a new project scaffold. There is also a getting started screencast that walks through installing and initial steps for getting set up with the project.
  • Converting to Lektor provides a quick hack for converting exported content in XML from Silvrback to a format that Lektor can use to generate a new site.
  • Lektor Static CMS, put the fun back into Content Management is a short overview as the first part in what aims to be a continuing series on how to use Lektor as a content management system.
  • In Experiences Migrating to Lektor the author gives his impression of Lektor after moving his 400+ articles over from a home-grown blogging engine. He talks a bit about how he went from deploying on GitHub Pages to surge.sh and finally over to Netlify.

Pelican

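Pelican is a static site generator written in Python that uses Jinja as its templating engine and takes content written in Markdown or reStructuredText to produce static websites.

Pelican’s source code is available on GitHub.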

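Why is Pelican a useful tool?

Static websites are much easier to deploy than sites built with a web framework that depend on a backend database. Static websites can also load faster because serving a page requires no database queries or middleware execution on the server.

A server that hosts a static website only needs to respond to HTTP requests with existing files. No dynamic data has to be generated on the server side.

Pelican resources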

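Static site generators

A static site generator combines a markup language, such as Markdown or reStructuredText, with a templating engine such as Jinja to produce HTML files. The HTML files can be served by a web server or a content delivery network (CDN) without any dependency on a WSGI server.

Why are static site generators useful?

Static content such as HTML, CSS and JavaScript files can be distributed through a CDN at low cost. If a static website is hit by a large amount of concurrent traffic, the CDN can absorb the load without dropping connections.

For example, the website in the figure below was served through a CDN and kept working normally with close to 400 concurrent connections.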

Example of how static websites scale with a CDN based on Full Stack Python on Hacker News front page traffic.

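How do static site generators work?

Static site generators allow a developer to create HTML files by writing content in a markup language and coding template files. The generator then combines the markup and the templates to produce HTML. The output HTML does not need to be maintained by hand because it is regenerated every time the markup or templates are modified.

As shown in the figure below, the Pelican static site generator takes reStructuredText files and Jinja2 template files as input and combines them to output static HTML files.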

Example of how static site generators work with a markup language and templates.
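
The same idea can be sketched in a few lines with only the Python standard library. This is not how Pelican is implemented: string.Template stands in for Jinja2, the hand-rolled render_body function stands in for a real markup parser, and the page format (title on the first line, then paragraphs separated by blank lines) is invented for the example.

```python
import html
from string import Template

# Stand-in for a Jinja2 template file.
PAGE_TEMPLATE = Template(
    "<html><head><title>$title</title></head>"
    "<body><h1>$title</h1>$body</body></html>"
)


def render_body(markup):
    """Turn blank-line separated paragraphs into <p> elements."""
    paragraphs = [p.strip() for p in markup.split("\n\n") if p.strip()]
    return "".join("<p>{}</p>".format(html.escape(p)) for p in paragraphs)


def build_page(source):
    """Combine one content file (title line plus body) with the template."""
    title, _, body = source.partition("\n")
    return PAGE_TEMPLATE.substitute(
        title=html.escape(title.strip()), body=render_body(body)
    )


if __name__ == "__main__":
    content = "Hello\n\nFirst paragraph.\n\nSecond paragraph."
    print(build_page(content))
```

A real generator runs a step like build_page over every content file and writes each result to an output directory, which is why the HTML never has to be edited by hand.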

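What’s the downside of using static site generators?

The major downside is that server-side code cannot be executed after the site is generated (because the site is static).

Content that is usually backed by a database, such as comments and sessions, can only be handled through third party services. For example, if you want comments on a static website you need to embed Disqus’s form and depend completely on their service.

Many web applications cannot be built as only a static site. However, a static site generator can create part of a web application while the remaining dynamic parts are served by a WSGI server. If the two are combined well, such a site can perform better than one where every page is generated by the WSGI server.

Python implementations

Static site generators have been implemented in many programming languages. The ones listed here are primarily coded in Python:

  • Pelican is a commonly used Python static site generator. Its primary templating engine is Jinja and it supports content written in Markdown, reStructuredText and AsciiDoc.
  • Lektor is a static content management system and static site generator that uses Jinja as its templating engine.
  • MkDocs
  • Nikola (source code) takes in reStructuredText, Markdown or Jupyter (IPython) Notebooks and combines them with Mako or Jinja2 templates to output static sites.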
  • Acrylamid (source code) uses incremental builds to generate static sites faster than recreating every page after each change is made to the input files.
  • Hyde (source code) started out as a Python rewrite of the popular Ruby-based Jekyll static site generator. Today the project has moved past those “clone Jekyll” origins. Hyde supports Jinja as well as other templating languages and places more emphasis on metadata within the markup files to instruct the generator how to produce the output files. Check out the Hyde-powered websites page to see live examples created with Hyde.
  • Grow SDK (source code) uses projects, known as pods, which contain a specific file and directory structure so the site can be generated. The project remains in the “experimental” phase.
  • Complexity (source code) is a site generator for users who like to work in HTML. It uses HTML for templating but has some functionality from Jinja for inheritance. Works with Python 2.6+, 3.3+ and PyPy.
  • Cactus (source code) uses the Django templating engine that was originally built with front-end designers in mind. It works with both Python 2.x and 3.x.

Open source static site generator examples

Static site generator resources

Static site generators can be implemented in any programming language. The following resources either are general to any programming ecosystem or provide a unique angle on how to use a static site generator.

  • Static vs Dynamic Websites does an excellent job of showing the differences between a dynamic website that uses a database backend to produce content in response to a request compared with static sites that are pregenerated. There is also a second part in the series where generic static site generator concepts are explained.
  • Staticgen lists static website generators of all programming languages sorted by various attributes such as the number of GitHub stars, forks and issues.
  • The title is a bit grandiose, but there’s some solid detail in this article on why static website generators are the next big thing. I’d argue static website generators have been big for a long time now.
  • Static site generators can be used for a range of websites from side projects up to big sites. This blog post by WeWork on why they use a static site generator explains it from the perspective of a large business.
  • Ditching WordPress and becoming one of the cool kids is one developer’s experience moving away from WordPress and onto Pelican with reStructuredText for his personal blog.
  • Static websites with Flask explains how to use Frozen-Flask to generate a static site based on content from the web framework and a data source backend. This approach is an alternative to using a purpose-built static website generator such as Pelican, Lektor or MkDocs.

Static site deployment resources

Deploying a static site is far less complicated than a traditional web application deployment, but you still need to host the files somewhere accessible. You’ll also need to set up DNS to point a domain name to your site as well as provide HTTPS support. These guides walk through various ways of handling the static site deployment.

  • Randall Degges’ Ultimate Guide to Deploying Static Sites to Amazon Web Services walks through all the steps you need to get your site up and running on S3. The guide also shows how to set up SSL certificates to ensure your site can be loaded via HTTPS.
  • Deploying a Static Site on AWS, with S3 and CloudFront provides a really nice tutorial with screenshots to get any type of static site configured on AWS using S3 and CloudFront.
  • Static site hosting with S3 and Cloudflare shows how to set up an S3 bucket with Cloudflare in front as a CDN that serves the content with HTTPS. You should be able to accomplish roughly the same situation with Amazon CloudFront, but as a Cloudflare user I like their service for these static site configurations.
  • Google Cloud provides a tutorial on how to use them to host your static site. Note that you cannot currently use HTTPS on Google Storage servers, which is a huge downside.

AWS Lambda

Amazon Web Services (AWS) Lambda is a compute service that executes arbitrary Python code in response to developer-defined AWS events, such as inbound API calls or file uploads to AWS’ Simple Storage Service (S3).

Why is Lambda useful?

Lambda is often used as a “serverless” compute architecture, which allows developers to upload their Python code instead of spinning up and configuring servers, deploying their code and scaling based on traffic.
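
As a rough sketch of what that uploaded code can look like, the function below reacts to an S3 file upload notification. Lambda calls a handler function with an event and a context argument; the name lambda_handler, the shape of the return value and the sample event at the bottom are assumptions made for this illustration, and the event is trimmed down to the fields the handler reads.

```python
def lambda_handler(event, context):
    """Collect the bucket and key of each uploaded object in an S3 event."""
    uploads = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        uploads.append(
            {"bucket": s3_info["bucket"]["name"], "key": s3_info["object"]["key"]}
        )
    return {"uploaded": uploads}


if __name__ == "__main__":
    # Hand-written stand-in for the event Lambda would pass in.
    sample_event = {
        "Records": [
            {"s3": {"bucket": {"name": "example-bucket"}, "object": {"key": "photo.jpg"}}}
        ]
    }
    print(lambda_handler(sample_event, None))
```

Because the handler is a plain function, it can be exercised locally with a hand-built event before it is uploaded.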

Python on AWS Lambda

Lambda only had support for JavaScript, specifically Node.js, when it was first released in late 2014. Python 2 developers were welcomed to the platform less than a year after its release, in October 2015. Lambda now has support for both Python 2.7 and 3.6.

Python-specific AWS Lambda resources

General AWS Lambda resources

Redis Queue (RQ)

Redis Queue (RQ) is a Python task queue implementation that uses Redis to keep track of tasks in the queue that need to be executed.

RQ is an implementation of the task queue concept.

RQ resources

Celery

Celery is a task queue implementation for Python web applications used to asynchronously execute work outside the HTTP request-response cycle.

Celery is an implementation of the task queue concept.

Why is Celery useful?

Task queues, and the Celery implementation in particular, are among the trickier parts of a Python web application stack to understand.

If you are a junior developer it can be unclear why moving work outside the HTTP request-response cycle is important. In short, you want your WSGI server to respond to incoming requests as quickly as possible because each request ties up a worker process until the response is finished. Moving work off those workers by spinning up asynchronous jobs as tasks in a queue is a straightforward way to improve WSGI server response times.

What’s the difference between Celeryd and Celerybeat?

Celery can be used to run batch jobs in the background on a regular schedule. A key concept in Celery is the difference between the Celery daemon (celeryd), which executes tasks, and Celerybeat, which is a scheduler. Think of Celeryd as a tunnel-vision set of one or more workers that handle whatever tasks you put in front of them. Each worker will perform a task and when the task is completed will pick up the next one. The cycle will repeat continuously, only waiting idly when there are no more tasks to put in front of them.

Celerybeat on the other hand is like a boss who keeps track of when tasks should be executed. Your application can tell Celerybeat to execute a task at time intervals, such as every 5 seconds or once a week. Celerybeat can also be instructed to run tasks on a specific date or time, such as 5:03pm every Sunday. When the interval or specific time is hit, Celerybeat will hand the job over to Celeryd to execute on the next available worker.
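
The division of labor can be sketched with the standard library alone. This is an analogy, not Celery’s API: beat plays the scheduler, worker plays a single Celeryd worker, a queue.Queue stands in for the message broker, and time is simulated with integer ticks so the example runs instantly. All of the names are made up for the illustration.

```python
import queue
import threading


def beat(schedule, task_queue, ticks):
    """Scheduler: on each simulated tick, enqueue every task that is due."""
    for tick in range(1, ticks + 1):
        for name, interval in schedule.items():
            if tick % interval == 0:
                task_queue.put((tick, name))


def worker(task_queue, completed):
    """Worker: handle whatever job comes next until told to stop."""
    while True:
        job = task_queue.get()
        if job is None:  # sentinel meaning "no more work"
            break
        completed.append(job)  # a real worker would execute the task here


def run(schedule, ticks):
    """Wire one scheduler and one worker together through a shared queue."""
    task_queue = queue.Queue()
    completed = []
    thread = threading.Thread(target=worker, args=(task_queue, completed))
    thread.start()
    beat(schedule, task_queue, ticks)
    task_queue.put(None)
    thread.join()
    return completed


if __name__ == "__main__":
    # "send_report" is due every 2 ticks, "cleanup" every 3 ticks.
    print(run({"send_report": 2, "cleanup": 3}, ticks=6))
```

The scheduler never runs a task itself and the worker never decides when a task is due, which mirrors the split between Celerybeat and Celeryd described above.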

Celery tutorials

Celery is a powerful tool that can be difficult to wrap your mind around at first. Be sure to read up on task queue concepts then dive into these specific Celery tutorials.

Celery deployment resources

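Task queues

Task queues manage background work that must be executed outside the usual HTTP request-response cycle.

Why are task queues necessary?

Tasks are handled asynchronously either because they are not initiated by an HTTP request or because they are long-running jobs that would dramatically reduce the performance of an HTTP response.

For example, a web application could poll the GitHub API every 10 minutes to collect the top 100 starred repositories. A task queue would handle this work, calling the GitHub API and storing the results in a persistent database.

Another example is when a database query would take too long during the HTTP request-response cycle. The query could be performed in the background on a fixed interval with the results stored in the database. When an HTTP request comes in, it only needs to fetch the precalculated result instead of re-executing the complex, time-consuming query.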
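
A minimal sketch of that precompute pattern, using only the standard library: a dict stands in for the database table that stores results, and refresh_report is the function a task queue would call on a fixed interval. The names slow_query, refresh_report, handle_request and RESULTS are hypothetical and exist only for this example.

```python
import time

# Stand-in for the database table that holds precomputed results.
RESULTS = {}


def slow_query():
    """Pretend to be an expensive database query."""
    time.sleep(0.05)
    return sum(range(1000))


def refresh_report():
    """Background job: run the slow query and store the result."""
    RESULTS["report"] = {"value": slow_query(), "computed_at": time.time()}


def handle_request():
    """Request handler: read the stored value, never run the slow query."""
    cached = RESULTS.get("report")
    if cached is None:
        return {"status": "pending"}
    return {"status": "ok", "value": cached["value"]}


if __name__ == "__main__":
    print(handle_request())  # nothing has been precomputed yet
    refresh_report()         # a task queue would trigger this on a schedule
    print(handle_request())
```

The request handler stays fast regardless of how long slow_query takes, at the cost of serving a value that may be as old as the refresh interval.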

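Other types of jobs for task queues include:

  • breaking up large database inserts into independent database operations
  • aggregating collected data on a fixed interval, such as every 15 minutes
  • running preprocessing jobs on a regular schedule

Task queue projects

Celery is the de facto standard Python task queue. It can be overly complicated for simple applications, but it is worth the effort to learn.

Hosted task queue services

Open source examples that use task queues

Task queue resources

Task queue learning checklist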

  1. Pick a slow function in your project that is called during an HTTP request.
  2. Determine if you can precompute the results on a fixed interval instead of during the HTTP request. If so, create a separate function you can call from elsewhere then store the precomputed value in the database.
  3. Read the Celery documentation and the links in the resources section below to understand how the project works.
  4. Install a message broker such as RabbitMQ or Redis and then add Celery to your project. Configure Celery to work with the installed message broker.
  5. Use Celery to invoke the function from step one on a regular basis.
  6. Have the HTTP request function use the precomputed value instead of the slow running code it originally relied upon.