Docker

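Docker is an open source infrastructure management platform for running and deploying software.

Why is Docker important?

Docker can package an application together with the operating system dependencies it requires, which simplifies deployment. In the long run it can serve as an abstraction layer on top of any server, whether that server is hosted on Amazon Web Services, Google Compute Engine, Linode, Rackspace or elsewhere.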

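Python projects in Docker images

Docker resources

Python-specific Docker resources

Configuration management

Configuration management involves modifying servers from one existing state to another and automating how an application is deployed.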
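The idea of moving a server from its current state to a desired state can be sketched in a few lines of Python. This is only an illustration, not how any particular tool works: the `ensure_line` helper, the `app.conf` file name and the `workers = 4` setting are all hypothetical. The point it shows is idempotence, meaning the change is applied only when the server is not already in the desired state.

```python
from pathlib import Path
import tempfile


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` is present in the file at `path`.

    Returns True if the file had to be changed and False if it was
    already in the desired state, so repeated runs are safe.
    """
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True


# Hypothetical desired state: one setting in an application config file.
config_file = Path(tempfile.mkdtemp()) / "app.conf"
first_run = ensure_line(config_file, "workers = 4")
second_run = ensure_line(config_file, "workers = 4")
print(first_run, second_run)
```

The first call changes the file and the second call finds nothing to do, which is the behavior configuration management tools aim for when the same configuration is applied many times.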

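Configuration management tools

Tools include Puppet, Chef, SaltStack and Ansible. Puppet and Chef are written in Ruby, while SaltStack and Ansible are written in Python.

Ad hoc tasks

Configuration management tools such as Chef, Puppet, Ansible and SaltStack are not well suited to ad hoc tasks that require interactive responses. Fabric and Invoke are used for interactive operations, such as querying the database from the Django manage.py shell.

Configuration management tool comparisons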

Ansible

Ansible is an open source configuration management and application deployment tool built in Python.

Ansible resources

Configuration management learning checklist

  1. Learn about configuration management in the context of deployment automation and infrastructure-as-code.
  2. Pick a configuration management tool and stick with it. My recommendation is Ansible because it is by far the easiest tool to learn and use.
  3. Read your configuration management tool’s documentation and, when necessary, the source code.
  4. Automate the configuration management and deployment for your project. Note that this is by far the most time consuming step in this checklist but will pay dividends every time you deploy your project.
  5. Hook the automated deployment tool into your existing deployment process.

Jenkins

Jenkins is a continuous integration (CI) server often used to automate building, testing and deploying Python applications.

Official Jenkins CI logo. Licensed under Creative Commons Attribution-ShareAlike 3.0 Unported License.

Jenkins is an implementation of the continuous integration concept.

Jenkins resources

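Continuous integration

Continuous integration automates the building, testing and deploying of applications. Software projects, whether built by one person or by a whole team, can use continuous integration to make sure important steps such as unit testing run automatically rather than by hand.

Why is continuous integration important?

When continuous integration (CI) is part of a software development project, it can significantly reduce deployment time by minimizing the steps that require human intervention. The only drawbacks are the time needed for the initial setup and the ongoing maintenance afterwards.

Automated testing

Another major advantage of CI is that testing can be automated as part of the deployment process. Running a comprehensive suite of unit and integration tests helps prevent broken deployments. Any bug that is accidentally introduced and caught by a unit test is reported and kept out of the deployment.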

The automated testing of checked-in source code can be thought of like the bumper guards in bowling that prevent code quality from going too far off track. CI combined with unit and integration tests checks that code modifications do not break existing tests, which helps ensure the software works as intended.

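Continuous integration example

The following diagram gives a high-level view of how continuous integration and deployment can work: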

One potential way for continuous integration to work with source control and a deployment environment.

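In the diagram above, when new code is committed to the source repository, a hook notifies the continuous integration server that new code needs to be built (the continuous integration server can also poll the source repository if a notification is not possible).

The continuous integration server pulls the code, builds it and runs the tests. If all tests pass, the continuous integration server begins the deployment process. Finally, the deployment completes with service restarts and related deployment activities.

There are many other ways a continuous integration setup can be structured. The above is just one relatively simple example.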
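The pull, build, test and deploy sequence can be sketched as a list of steps that stops at the first failure, so a failing test run never reaches deployment. This is a minimal illustration rather than how any real CI server is implemented. The `run_pipeline` function and the placeholder steps are assumptions made for the example; a real server would run shell commands in place of the lambdas.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_pipeline(steps: List[Step]) -> List[str]:
    """Run steps in order and stop at the first one that fails.

    Returns the names of the steps that completed successfully.
    """
    completed = []
    for name, step in steps:
        if not step():
            break
        completed.append(name)
    return completed


# Placeholder steps that simply report success or failure.
passing = [
    ("pull", lambda: True),
    ("build", lambda: True),
    ("test", lambda: True),
    ("deploy", lambda: True),
]
failing_tests = [
    ("pull", lambda: True),
    ("build", lambda: True),
    ("test", lambda: False),
    ("deploy", lambda: True),
]

print(run_pipeline(passing))
print(run_pipeline(failing_tests))
```

With the failing test step, only the pull and build steps complete and the deploy step is never run.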

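Open source CI projects

There are many free and open source continuous integration servers. Many are not written in Python but work well for Python applications. Polyglot organizations (those using more than one language and ecosystem) often use a single CI server for all of their projects, regardless of the programming language each project is written in.

Jenkins CI resources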

Jenkins is commonly used as a continuous integration server implementation for Python projects because it is open source and programming language agnostic. Learn more via the following resources or on the dedicated Jenkins page.

General continuous integration resources

Green Unicorn (Gunicorn)

Green Unicorn, commonly shortened to "Gunicorn", is a Web Server Gateway Interface (WSGI) server for running Python web applications.

Official Green Unicorn (Gunicorn) logo.

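Why is Gunicorn important?

Gunicorn is one of many WSGI server implementations. It is stable, commonly used in web deployments and has powered some of the largest Python web applications in the world, such as Instagram.

Gunicorn implements the PEP 3333 WSGI server standard specification so that it can run Python web applications that implement the application side of the interface. For example, if you write a web application with a framework such as Django, Flask or Bottle, then your application implements the WSGI specification.

How does Gunicorn know how to run my web application?

Gunicorn knows how to run a web application based on the hook between the WSGI server and the WSGI-compliant web application.

Here is a typical Django web application run by Gunicorn. We use django_defaults as an example Django project. Within a subdirectory of the django_defaults project there is a wsgi.py file with the following contents: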

"""
WSGI config for django_defaults project.

It exposes the WSGI callable as a module-level variable named ``application``.

For more information on this file, see
https://docs.djangoproject.com/en/1.8/howto/deployment/wsgi/
"""

import os

from django.core.wsgi import get_wsgi_application

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "django_defaults.settings")

application = get_wsgi_application()

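The wsgi.py file is generated by the django-admin.py startproject command when the Django project is created. Through wsgi.py, Django exposes an application variable, and the WSGI server uses application as the hook to run the web application, as shown in the following diagram: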

Gunicorn WSGI server invoking a Django WSGI application.
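A WSGI server is usually pointed at this hook with a "module:variable" string such as django_defaults.wsgi:application. The sketch below shows one way such a string could be resolved to a callable. It is not Gunicorn's actual loader: the `load_wsgi_app` helper and the stand-in `demo_wsgi` module are hypothetical and exist only to keep the example self-contained.

```python
import importlib
import sys
import types


def load_wsgi_app(target: str):
    """Resolve a "module.path:variable" string to a Python object."""
    module_name, _, variable = target.partition(":")
    module = importlib.import_module(module_name)
    # Fall back to the conventional name when no variable is given.
    return getattr(module, variable or "application")


def application(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


# Stand-in for a project's wsgi.py, registered so it can be imported.
demo = types.ModuleType("demo_wsgi")
demo.application = application
sys.modules["demo_wsgi"] = demo

app = load_wsgi_app("demo_wsgi:application")
print(app is application)
```

Once the server holds a reference to the callable, it can invoke it for every incoming request without knowing anything else about the framework behind it.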

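What is the "pre-fork" worker model?

Gunicorn is based on the pre-fork worker model. The pre-fork worker model means that a master process spins up workers to handle requests but otherwise does not control how those workers handle them. Each worker is independent of the controller.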
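The pre-fork idea can be sketched with `os.fork` and one shared listening socket. This is a simplified, Unix-only illustration and not Gunicorn's implementation: the `prefork_demo` function is hypothetical, each worker answers a single request and then exits, and the master also plays the client so that the example is self-contained.

```python
import os
import socket


def prefork_demo(num_workers: int = 2) -> list:
    """Fork workers that share one listening socket, send one request
    per worker and return the PIDs of the workers that answered."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(num_workers)
    address = listener.getsockname()

    children = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:
            # Worker: accept and answer one request without the master.
            conn, _ = listener.accept()
            conn.recv(1024)
            conn.sendall(str(os.getpid()).encode())
            conn.close()
            os._exit(0)
        children.append(pid)

    # Master: it only manages workers, so it stops using the socket.
    listener.close()

    # For the demo only, the master now acts as the client.
    answered_by = []
    for _ in range(num_workers):
        with socket.create_connection(address) as client:
            client.sendall(b"ping")
            answered_by.append(int(client.recv(1024)))

    for pid in children:
        os.waitpid(pid, 0)
    return answered_by


worker_pids = prefork_demo(2)
print(len(worker_pids))
```

The workers are created before any request arrives, and whichever worker is free picks up the next connection from the shared socket.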

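Gunicorn resources

WSGI servers

A Web Server Gateway Interface (WSGI) server implements the web server side of the WSGI interface for running Python web applications.

Why is WSGI necessary?

A traditional web server does not understand or have any way to run Python web applications. In the late 1990s, a developer named Grisha Trubetskoy created an Apache module called mod_python to execute Python code. From then until the early 2000s, Apache configured with mod_python ran most Python web applications.

However, mod_python was not a standard. It was just an implementation that allowed Python code to run on a web server. As mod_python's development stalled and security issues were discovered, the community recognized that a more consistent way to run Python web applications was needed.

The Python community therefore came up with WSGI as a standard interface that modules and containers could implement. WSGI is now the accepted approach for running Python web applications.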

WSGI server invoking a WSGI application.

As shown in the diagram above, a WSGI server invokes a callable object provided by the WSGI application, as defined by the PEP 3333 standard.
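A minimal sketch of that callable, with the calling code playing the role of the server, might look like the following. The `application` function follows the signature described in PEP 3333. The hand-built `environ` dictionary and the `start_response` recorder are simplifications made for illustration, since a real server supplies a much fuller environment.

```python
def application(environ, start_response):
    """A minimal WSGI application callable."""
    body = f"Hello from {environ['PATH_INFO']}".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


# Play the role of the server: record what the application reports.
captured = {}


def start_response(status, headers, exc_info=None):
    captured["status"] = status
    captured["headers"] = headers


# Simplified environ with only the keys this application reads.
environ = {"REQUEST_METHOD": "GET", "PATH_INFO": "/hello"}
response_body = b"".join(application(environ, start_response))
print(captured["status"], response_body)
```

The server passes in the request environment and a callback, and the application returns an iterable of byte strings that make up the response body.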

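WSGI's purpose

Why use WSGI and not just point a web server directly at an application?

  • WSGI gives you flexibility. Application developers can swap out web stack components for others. For example, a developer can switch from Green Unicorn to uWSGI without modifying the application or framework that implements WSGI. According to PEP 3333, the availability of such an API for Python separates the choice of framework from the choice of web server and frees developers to focus on their preferred area of specialization.
  • WSGI servers promote scaling. Serving thousands of requests for dynamic content at once is the job of the WSGI server, not the application framework. The WSGI server handles requests coming from the web server and decides how to pass them to the application framework. This separation of responsibilities is important for scaling web traffic efficiently.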

WSGI Server - Web server - Browser
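One way to see the flexibility described above is WSGI middleware: a component that wraps any WSGI application and can therefore be reused across frameworks and servers. The sketch below is illustrative only. The `HeaderMiddleware` class, the `hello_app` function and the X-Served-By header are made up for the example, while `setup_testing_defaults` comes from the standard library's wsgiref package.

```python
from wsgiref.util import setup_testing_defaults


class HeaderMiddleware:
    """Wrap any WSGI application and add one response header."""

    def __init__(self, app, name, value):
        self.app = app
        self.header = (name, value)

    def __call__(self, environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # Append the extra header, then defer to the real callback.
            return start_response(status, headers + [self.header], exc_info)

        return self.app(environ, patched_start_response)


def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


wrapped = HeaderMiddleware(hello_app, "X-Served-By", "demo")

# Stand in for the server: record the status and headers it receives.
seen = {}


def record(status, headers, exc_info=None):
    seen["status"], seen["headers"] = status, headers


environ = {}
setup_testing_defaults(environ)  # fills in a minimal valid environ
body = b"".join(wrapped(environ, record))
print(seen["headers"])
```

Because the wrapper only relies on the WSGI calling convention, the same class would work unchanged around a Django, Flask or Bottle application.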

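WSGI is by design a standard interface for running Python code. As a developer you only need to know:

  • WSGI stands for Web Server Gateway Interface
  • A WSGI container is a separate running process that runs on a different port than your web server
  • Your web server is configured to pass requests to the WSGI container, which runs your web application and then passes the response (for example, as HTML) back to the requester

If you are using a standard web framework such as Django, Flask or Bottle, or almost any other Python framework, you do not need to know how the framework implements the WSGI standard. Likewise, if you are using a standard WSGI container such as Green Unicorn, uWSGI, mod_wsgi or gevent, you do not need to know how it implements the WSGI standard.

However, as you become a more experienced Python web developer, learning the WSGI standard and how these frameworks and containers implement it should be part of your studies.

Official WSGI specifications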

The WSGI standard v1.0 is specified in PEP 0333. As of September 2010, WSGI v1.0 is superseded by PEP 3333, which defines the v1.0.1 WSGI standard. If you’re working with Python 2.x and you’re compliant with PEP 0333, then you’re also compliant with 3333. The newer version is simply an update for Python 3 and has instructions for how unicode should be handled.

wsgiref in Python 2.x and wsgiref in Python 3.x are the reference implementations of the WSGI specification built into Python’s standard library so it can be used to build WSGI servers and applications.
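As a rough illustration of the reference implementation, the sketch below serves a tiny application with `wsgiref.simple_server` and fetches it once over HTTP. The application body, the `QuietHandler` class and the choice of port 0 (which asks the operating system for any free port) are assumptions made for the example. The wsgiref server is meant for development and testing rather than production use.

```python
import http.client
import threading
from wsgiref.simple_server import WSGIRequestHandler, make_server


class QuietHandler(WSGIRequestHandler):
    """Request handler that skips the default per-request log line."""

    def log_message(self, format, *args):
        pass


def application(environ, start_response):
    body = b"served by wsgiref"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


# Binding to 127.0.0.1 keeps the server reachable from this machine only.
server = make_server("127.0.0.1", 0, application, handler_class=QuietHandler)
port = server.server_port

# Serve exactly one request in the background, then stop.
thread = threading.Thread(target=server.handle_request)
thread.start()

conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
conn.request("GET", "/")
response = conn.getresponse()
status, payload = response.status, response.read()
conn.close()

thread.join()
server.server_close()
print(status, payload)
```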

Example web server configuration

A web server's configuration specifies which requests should be passed to the WSGI server to process. Once the WSGI server has processed a request and generated a response, the response is passed back through the web server and on to the browser.

For example, this Nginx web server’s configuration specifies that Nginx should handle static assets (such as images, JavaScript, and CSS files) under the /static directory and pass all other requests to the WSGI server running on port 8000:

# this specifies that there is a WSGI server running on port 8000
upstream app_server_djangoapp {
    server localhost:8000 fail_timeout=0;
}

# Nginx is set up to run on the standard HTTP port and listen for requests
server {
  listen 80;

  # nginx should serve up static files and never send to the WSGI server
  location /static {
    autoindex on;
    alias /srv/www/assets;
  }

  # requests that do not fall under /static are passed on to the WSGI
  # server that was specified above running on port 8000
  location / {
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Host $http_host;
    proxy_redirect off;

    if (!-f $request_filename) {
      proxy_pass http://app_server_djangoapp;
      break;
    }
  }
}

Note that the above code is a simplified version of a production-ready Nginx configuration. For real SSL and non-SSL templates, take a look at the Underwear web server templates on GitHub.

WSGI servers

There is a list of WSGI servers on the WSGI Read the Docs page. The following WSGI servers are recommended by the community:

  • Green Unicorn is a pre-fork worker model based server ported from the Ruby Unicorn project.
  • uWSGI is gaining steam as a highly-performant WSGI server implementation.
  • mod_wsgi is an Apache module implementing the WSGI specification.
  • CherryPy is a pure Python web server that also functions as a WSGI server.

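WSGI resources

WSGI servers learning checklist

  1. Understand that WSGI is a standard Python specification for applications and servers to implement.
  2. Pick a WSGI server. Green Unicorn is a good one to start with.
  3. Add the WSGI server to your server deployment.
  4. Configure the web server to pass requests to the WSGI server.
  5. Test that the WSGI server responds to local requests but not to requests that come directly from outside your server.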

Caddy

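Caddy is a relatively new HTTP server created in 2015 and written in Go. Its design philosophy emphasizes the HTTP/2 protocol and HTTPS.

How can Caddy be used with Python deployments?

Caddy can be used both for testing during local development and in production deployments, as an HTTP server and as a reverse proxy with the proxy directive.

General Caddy resources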

  • A look inside Caddy shows and explains some of the Go code written to build the server.
  • The official Caddy server docs are the spot to look for what directives can be placed into a Caddy configuration file.
  • Caddy a modern web server supporting HTTP/2 is a quick synopsis on installing Caddy along with a short example configuration file.
  • HTTP 2.0 on localhost with Caddy shows how to use a self-signed certificate with Caddy to do local development with an HTTP/2 web server.
  • Is Caddy free? explains the donation and sponsorships model that Caddy uses to continue development on the server. The gist is that the server is free to clone, download and use. Sponsors and optional donations are currently used to fund ongoing development.

Nginx

Nginx, pronounced "engine-X", is the second most common web server among the top 100,000 websites. Nginx is also often used as a reverse proxy that handles requests for Python WSGI servers or even other web servers such as Apache.

Official Nginx logo.

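How is Nginx used in a Python web app deployment?

Nginx is commonly used to serve static files such as images, CSS and JavaScript files.

Nginx is also typically configured as a reverse proxy, which passes incoming HTTP requests to a WSGI server. The WSGI server produces dynamic content by running the Python application. When the WSGI server passes back its response, often in HTML, JSON or XML format, the reverse proxy relays it to the client.

The request and response cycle with a reverse proxy server and a WSGI server is shown in the following diagram: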

Python web application deployments rely on Nginx either as a web server or reverse proxy for WSGI servers.

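In general, the client does not know and does not need to know that a Python web application generated the response. The response could have been produced by a backend system written in any language, not only Python.

Should I use Nginx or the Apache HTTP Server?

First, to be clear: both are excellent open source projects and either one can serve your web application well. In fact, many of the world's largest companies use both in their web server deployments.

Nginx configuration files tend to be easier to write.

Securing Nginx

The default configuration after a standard Nginx installation is a good base for security. However, setting up ciphers and redirects for the first time can be confusing. The following are worth reading:

Nginx resources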

Nginx can be used without Python so there are a massive number of fantastic resources available for installing, configuring and optimizing this web server implementation. The following resources are ones that I collected during my own struggle while learning how to use Nginx after I had used Apache HTTP Server for several years.

  • The Nginx chapter in the Architecture of Open Source Applications book explains why Nginx is built to scale a certain way and the lessons learned along the development journey.
  • Inside Nginx: How we designed for performance and scale is a blog post from the developers behind Nginx on why they believe their architecture model is more performant and scalable than other approaches used to build web servers.
  • Test-driving web server configuration is a good story for how to iteratively apply configuration changes, such as routing traffic to Piwik for web analytics, reverse proxying to backend application servers and terminating TLS connections appropriately. It is impressive to read a well-written software development article like this from a government agency, although UK’s Government Digital Service as well as USA’s 18F and US Digital Service foster a far more credible culture than most typical agencies.
  • Nginx for Developers: An Introduction provides the first steps to getting an initial Nginx configuration up and running.
  • A faster Web server: ripping out Apache for Nginx explains how Nginx can be used instead of Apache in some cases for better performance.
  • Nginx vs Apache: Our view is a first-party perspective written by the developers behind Nginx as to the differences between the web servers.
  • Rate Limiting with Nginx covers how to mitigate against brute force password guessing attempts using Nginx rate limits.
  • Nginx with dynamic upstreams is an important note for setting up your upstream WSGI server(s) if you’re using Nginx as a reverse proxy with hostnames that change.
  • Nginx Caching shows how to set up Nginx for caching HTTP requests, which is often done by Varnish but can also be handled by Nginx with the proxy_cache and related directives.
  • Nginx web server tutorials are oldies but goodies on setting up previous versions of Nginx.
  • Dynamic log formats in nginx explains how to use the HttpSetMiscModule module to transform variables in Nginx and map input to controlled output in the logs. The author uses this technique for pixel tracking but there are other purposes this method could be used for such as advanced debugging.
  • Detecting Bots in Apache & Nginx Logs is an awesome tutorial that shows how to filter web crawlers and bots from your traffic logs when using them for web traffic analytics.

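Nginx distributions

Apache HTTP Server

The Apache HTTP Server is a widely deployed web server that can be used on its own as a web server or, with a WSGI module such as mod_wsgi installed, to run Python web applications.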

Apache HTTP Server logo.

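Why is the Apache HTTP Server important?

For more than 20 years Apache has been the most commonly used web server. Its huge installed base means there is a large number of tutorials and open source modules for it.

Apache development began in mid-1994 as an offshoot of the NCSA HTTP Server project. By early 1996, Apache had overtaken the earlier NCSA server.

Apache HTTP Server resources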

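Web servers

Web servers receive Hypertext Transfer Protocol (HTTP) requests from clients and respond with data such as HTML, XML or JSON.

Why are web servers necessary?

Web servers and clients communicate in a standardized language. This standard language is why an old Mozilla Netscape browser can still talk to a modern Apache or Nginx web server.

The basic language and request and response cycle between client and server have changed little since Tim Berners-Lee invented the Web at CERN in 1989. Modern browsers and web servers have simply extended the language of the Web.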

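Web server implementations

Web servers can be implemented in many ways. Each of the following web servers has its own features, extensions and configuration style:

  • Apache HTTP Server has been the most commonly deployed web server on the Internet for more than 20 years.
  • Nginx is the second most commonly used web server among the top 100,000 websites and often serves as a reverse proxy for Python WSGI servers.
  • Caddy is a newer web server focused on serving the HTTP/2 protocol with HTTPS.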

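Client requests

A client that sends a request to a web server is usually a browser such as Internet Explorer, Firefox or Chrome, but it can also be:

  • a headless browser, commonly used for testing, such as phantomjs
  • a command line utility such as wget or cURL
  • a text-based web browser such as Lynx
  • a web crawler

The web server processes requests from clients. The result of that processing is a response code and usually a content response. Some status codes, such as 204 (No Content), never include content, and an error such as 403 (Forbidden) is returned in place of the requested content.

In a simple case, the client requests a static asset such as a picture or JavaScript file. The web server authorizes access and returns the file with a 200 status code. If the client has already requested the file and the file has not changed since, the web server passes back a 304 (Not Modified) response, which tells the client that it already has the latest version of that file.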
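The choice between a 200 and a 304 response can be sketched with an entity tag (ETag) comparison, one of the mechanisms HTTP provides for this check. The `serve_static` function and the way the tag is derived from a content hash are assumptions made for illustration. Real web servers have their own ways of computing validators and also support date-based checks.

```python
import hashlib
from http import HTTPStatus
from typing import Optional, Tuple


def serve_static(
    content: bytes, if_none_match: Optional[str] = None
) -> Tuple[HTTPStatus, str, bytes]:
    """Return (status, etag, body) for a request for a static file."""
    etag = '"' + hashlib.sha256(content).hexdigest()[:16] + '"'
    if if_none_match == etag:
        # The client's cached copy is current, so no body is sent.
        return HTTPStatus.NOT_MODIFIED, etag, b""
    return HTTPStatus.OK, etag, content


css = b"body { margin: 0; }"

# First request: the client has nothing cached.
first_status, etag, first_body = serve_static(css)

# Second request: the client sends back the tag it received earlier.
second_status, _, second_body = serve_static(css, if_none_match=etag)

print(int(first_status), int(second_status))
```

The second request returns 304 with an empty body, which is what saves bandwidth when a file has not changed.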

Web server and web browser request-response cycle

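The web server sends data in response to the browser's requests. On the first request, the browser accesses the "www.fullstackpython.com" address and the web server responds with the index.html file. That HTML file contains references to other files, such as style.css and script.js.

Sending static assets (such as CSS and JavaScript files) can use a large share of bandwidth, which is why a content delivery network (CDN) is used when necessary.

Web server resources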

  • HTTP/1.1 Specification
  • A reference with the full list of HTTP status codes is provided by W3C.
  • If you’re looking to learn about web servers by building one, here’s part one, part two and part three of a great tutorial that shows how to code a web server in Python.
  • rwasa is a newly released web server written in Assembly with no external dependencies that is tuned to be faster than Nginx. The benchmarks are worth a look to see whether this server could fit your needs if you want the fastest performance and can accept the trade-off of an as yet untested web server.

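Web servers learning checklist

  1. Choose a web server. Nginx and Apache are both good choices.
  2. Create an SSL certificate. You can use a self-signed certificate for testing and buy one from Digicert for production. Configure the web server to require SSL.
  3. Configure the web server to serve static files such as CSS, JavaScript and images.
  4. Once you set up the WSGI server, configure the web server to pass requests for dynamic content through to it.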