Apache Cassandra

Apache Cassandra is a column-family NoSQL data store occasionally used for persisting data in Python web applications and data projects.

Apache Cassandra project logo.

Apache Cassandra is an implementation of the NoSQL database concept. Learn more in the data chapter or view the table of contents for all topics.

Python with Cassandra resources

General Cassandra resources

Apache Cassandra can be used independently of Python applications for data storage and querying. The learning curve for getting started is similar to other NoSQL data stores but scaling, performance and monitoring can be challenging. The following resources focus on addressing those issues based on teams that have felt the pain and often released their resulting tools as open source projects.

MongoDB

MongoDB is a document-oriented NoSQL database that is often used for storing, querying and analyzing persistence data in Python applications.

MongoDB logo.

MongoDB is an implementation of the NoSQL database concept. Learn more in the data chapter or view the table of contents for all topics.

General MongoDB introductions

It is worth taking some time to learn the ins and outs of MongoDB before connecting it to your Python application. The following tutorials are not specific to Python and will have you work directly with the MongoDB command line and query language.

MongoDB security

NoSQL databases can be a weak spot in a production deployment environment, especially when default settings are built for ease of development instead of proper access control. MongoDB is no exception with its loose default security controls so make sure to lock down your instances.

Python with MongoDB resources

MongoDB is straightforward to use in a Python application when a driver such as PyMongo is installed. The following tutorials show how to install, configure and start using MongoDB with Python.

Redis

Redis is an in-memory key-value pair database typically classified as a NoSQL database. Redis is commonly used for caching, transient data storage and as a holding area for data during analysis in Python applications.

Redis logo.

Redis is an implementation of the NoSQL database concept. Learn more in the data chapter or view the table of contents for all topics.

Redis tutorials

Redis is easy to install and start using compared to most other persistent backends, but it’s useful to follow a walkthrough if you have never previously used Redis or any NoSQL data store.

Redis with Python

Redis is easier to use with Python if you have a code library client that bridges from your code to your Redis instace. The following libraries and resources provide more information on handling data in a Redis instance with your Python code.

  • Redis-py is a solid Python client to use with Redis.
  • Walrus is a higher-level Python wrapper for Redis with some caching, querying and data structure components build into the library.
  • Writing Redis in Python with Asyncio shows a detailed example for how to use the new Asyncio standard library in Python 3.4+ for working with Redis.

Redis Security

Redis should be customized out of its default configuration to secure it against unauthorized and unauthenticated users. These resources provide some advice on Reids security and guarding against data breaches.

  • Pentesting Redis servers shows that security is important not only on your application but also the databases you’re using as well.
  • Redis, just as with any relational or NoSQL database, needs to be secured based on security guidelines. There is also a post where the main author of Redis cracks its security to show the tradeoffs purposely made between ease of use and security in the default settings.
  • Is your Redis server safe? is a tool to test that your Redis instances are locked down. The tool is based on the blog post a few things about Redis security.
  • For God’s sake, secure your Mongo/Redis/etc! digs into the unfortunate default security settings that come with many NoSQL databases which can be used to compromise your systems. Make sure to not only install your dependencies such as Redis, but automate modifying default settings to lock them down against attackers.

General Redis resources

Once you have configured Redis, become comfortable using it and locked it down against malicious actors, you will want to learn more about operating, scaling and collecting metrics. The following resources should help you get started in those areas.

  • Redis-playbook is an Ansible playbook for installing, configuring and securing a Redis instance.
  • GitHub wrote a retrospective on moving persistent data out of Redis and into MySQL that is worth a read as you scale up your Redis usage.
  • This video on Scaling Redis at Twitter is a detailed look behind the scenes with a massive Redis deployment.
  • Real World Redis Tips provides some guidance from Heroku’s engineers from deploying Redis at scale. The tips include setting an explicit idle connection timeout, using a connection pooler and avoiding using KEYS in favor of SCAN.
  • How to collect Redis metrics shows how to use the Redis CLI client to grab key metrics on latency.
  • You should revise your Redis max connections setting is a retrospective from a hard web application failure due to Redis connections maxing out on Heroku, and how to avoid this in your own applications by modifying your redis.conf settings.

NoSQL Data Stores

Relational databases store the vast majority of web application persistent data. However, there are several alternative classifications of storage representations.

  1. Key-value pair
  2. Document-oriented
  3. Column-family table
  4. Graph

These persistent data storage representations are commonly used to augment, rather than completely replace, relational databases. The underlying persistence type used by the NoSQL database often gives it different performance characteristics than a relational database, with better results on some types of read/writes and worse performance on others.

Key-value Pair

Key-value pair data stores are based on hash map data structures.

Key-value pair data stores

  • Redis is an open source in-memory key-value pair data store. Redis is often called “the Swiss Army Knife of web application development.” It can be used for caching, queuing, and storing session data for faster access than a traditional relational database, among many other use cases. Learn more on the Redis page.
  • Memcached is another widely used in-memory key-value pair storage system.

Key-value pair resources

Redis resources

Document-oriented

A document-oriented database provides a semi-structured representation for nested data.

Document-oriented data stores

  • MongoDB is an open source document-oriented data store with a Binary Object Notation (BSON) storage format that is JSON-style and familiar to web developers. PyMongo is a commonly used client for interfacing with one or more MongoDB instances through Python code. MongoEngine is a Python ORM specifically written for MongoDB that is built on top of PyMongo.
  • Riak is an open source distributed data store focused on availability, fault tolerance and large scale deployments.
  • Apache CouchDB is also an open source project where the focus is on embracing RESTful-style HTTP access for working with stored JSON data.

Document-oriented data store resources

Column-family table

A column-family table class of NoSQL data stores builds on the key-value pair type. Each key-value pair is considered a row in the store while the column family is similar to a table in the relational database model.

Column-family table data stores

Graph

A graph database represents and stores data in three aspects: nodes, edges and properties.

A node is an entity, such as a person or business.

An edge is the relationship between two entities. For example, an edge could represent that a node for a person entity is an employee of a business entity.

A property represents information about nodes. For example, an entity representing a person could have a property of “female” or “male”.

Graph data stores

  • Neo4j is one of the most widely used graph databases and runs on the Java Virtual Machine stack.
  • Cayley is an open source graph data store written by Google primarily written in Go.
  • Titan is a distributed graph database built for multi-node clusters.

Graph data store resources

NoSQL third-party services

  • MongoHQ provides MongoDB as a service. It’s easy to set up with either a standard LAMP stack or on Heroku.

NoSQL data store resources

  • NoSQL databases: an overview explains what NoSQL means, how data is stored differently than in relational systems and what the Consistency, Availability and Partition-Tolerance (CAP) Theorem means.
  • CAP Theorem overview presents the basic constraints all databases must trade off in operation.
  • This post on What is a NoSQL database? Learn By Writing One in Python is a detailed article that breaks the mystique behind what some forms of NoSQL databases are doing under the covers.
  • The CAP Theorem series explains concepts related to NoSQL such as what is ACID compared to CAP, CP versus CA and high availability in large scale deployments.
  • NoSQL Weekly is a free curated email newsletter that aggregates articles, tutorials, and videos about non-relational data stores.
  • NoSQL comparison is a large list of popular, BigTable-based, special purpose, and other datastores with attributes and the best use cases for each one.
  • Relational databases such as MySQL and PostgreSQL have added features in more recent versions that mimic some of the capabilities of NoSQL data stores. For example, check out this blog post on using MySQL as a key-value pair store and this post on storing JSON data in PostgreSQL.

NoSQL data stores learning checklist

  1. Understand why NoSQL data stores are better for some use cases than relational databases. In general these benefits are only seen at large scale so they may not be applicable to your web application.
  2. Integrate Redis into your project for a speed boost over slower persistent storage. Storing session data in memory is generally much faster than saving that data in a traditional relational database that uses persistent storage. Note that when memory is flushed the data goes away so anything that needs to be persistent must still be backed up to disk on a regular basis.
  3. Evaluate other use cases such as storing transient logs in a document-oriented data store such as MongoDB.

Peewee

Peewee (source code) is a object-relational mapper (ORM) implementation for bridging data stored in relational database tables with Python objects.

Peewee logo.

What makes Peewee a useful ORM?

Peewee can be an easier library to wrap your head around than SQLAlchemy and other ORMs. It is designed to be easier to hack on and understand, similar to how Bottle is a smaller, one-file web framework compared to the comprehensive Django framework. If you are just getting started with web development, it may be worth using Peewee for your database mapping and operations, especially if you use a microframework such as Flask or Bottle.

Peewee can be used with pretty much any web framework (although using it with Django would currently be complicated due to its tight built-in ORM coupling) or without a web framework. In the latter case Peewee is good for pulling data out of a relational database in a script or Jupyter notebook.

Any of the common relational database backends such as PostgreSQL, MySQL or SQLite are supported, although a database driver is still required. The chart below shows a few example configurations that could use Peewee as an ORM.

Example SQLAlchemy configurations with different web frameworks.

Peewee is an implementation of the object-relational mapping (ORM) concept. Learn more in the data chapter or view all topics.

How does Peewee compare to other Python ORMs?

The analogy used by the core Peewee author is that Peewee is to SQLAlchemy as SQLite is to PostgreSQL. An ORM does not have to work for every exhaustive use case in order to be useful.

Peewee resources

Peewee is a much newer library than several other Python ORMs. For example, Peewee’s first public commit was in 2010, compared to 2005 for SQLAlchemy. The project is still over five years old though and matured substantially in development during that time. However, there are typically less resources and examples available to demonstrate how to use Peewee in your projects than some other ORMs that have been around for a longer period of time.

Many of the best resources come from the project’s author, Charles Leifer, on his blog and on the official site. There are also hundreds of questions answered on the Stack Overflow peewee tag, so as usual that can be a rich source of examples for your Peewee-powered Python applications.

SQLAlchemy

SQLAlchemy (source code) is a well-regarded database toolkit and object-relational mapper (ORM) implementation written in Python. SQLAlchemy provides a generalized interface for creating and executing database-agnostic code without needing to write SQL statements.

SQLAlchemy logo.

Why is SQLAlchemy a good ORM choice?

SQLAlchemy isn’t just an ORM- it also provides SQLAlchemy Core for performing database work that is abstracted from the implementation differences between PostgreSQL, SQLite, etc. In some ways, the ORM is a bonus to Core that automates commonly-required create, read, update and delete operations.

SQLAlchemy can be used with or without the ORM features. Any given project can choose to just use SQLAlchemy Core or both Core and the ORM. The following diagram shows a few example configurations with various application software stacks and backend databases. Any of these configurations can be a valid option depending on what type of application you are coding.

Example SQLAlchemy configurations with different web frameworks.

A benefit many developers enjoy with SQLAlchemy is that it allows them to write Python code in their project to map from the database schema to the applications’ Python objects. No SQL is required to create, maintain and query the database. The mapping allows SQLAlchemy to handle the underlying database so developers can work with their Python objects instead of writing bridge code to get data in and out of relational tables.

SQLAlchemy is an implementation of the object-relational mapping (ORM) concept. Learn more in the data chapter or view all topics.

How does SQLAlchemy code compare to raw SQL?

Below is an example of a SQLAlchemy model definition from the open source compare-python-web-frameworks project that uses SQLAlchemy with Flask and Flask-SQLAlchemy.

class Contact(db.Model):
    __tablename__ = 'contacts'
    id = db.Column(db.Integer, primary_key=True)
    first_name = db.Column(db.String(100))
    last_name = db.Column(db.String(100))
    phone_number = db.Column(db.String(32))

    def __repr__(self):
        return '<Contact {0} {1}: {2}>'.format(self.first_name,
                                               self.last_name,
                                               self.phone_number)

SQLAlchemy handles the table creation that otherwise we would have had to write a create table statement like this one to do the work:

CREATE TABLE CONTACTS(
   ID INT PRIMARY KEY        NOT NULL,
   FIRST_NAME     CHAR(100)  NOT NULL,
   LAST_NAME      CHAR(100)  NOT NULL,
   PHONE_NUMBER   CHAR(32)   NOT NULL,
);

By using SQLAlchemy in our Python code, all records can be obtained with a line like contacts = Contact.query.all() instead of a plain SQL such as SELECT * FROM contacts. That may not look like much of a difference in syntax but writing the queries in Python is often faster and easier for many Python developers once multiple tables and specific filtering on fields for queries have to be written. In addition, SQLAlchemy abstracts away idiosyncratic differences between database implementations in SQLite, MySQL and PostgreSQL.

Using SQLAlchemy with Web Frameworks

There is no reason why you cannot use the SQLAlchemy library in any application that requires a database backend. However, if you are building a web app with Flask, Bottle or another web framework then take a look at the following extensions. They provide some glue code along with helper functions that can reduce the boilerplate code needed to connect your application’s code with the SQLAlchemy library.

SQLAlchemy resources

The best way to get comfortable with SQLAlchemy is to dig in and write a database-driven application. The following resources can be helpful if you are having trouble getting started or are starting to run into some edge cases.

  • There is an entire chapter in the Architecture of Open Source Applications book on SQLAlchemy. The content is detailed and well worth reading to understand what is executing under the covers.
  • The SQLAlchemy cheatsheet has many examples for querying, generating database metadata and many other common (and not so common) operations when working with Core and the ORM.
  • 10 reasons to love SQLAlchemy is a bit of a non-critical lovefest for the code library. However, the post makes some good points about the quality of SQLAlchemy’s documentation and what a pleasure it can be to use it in a Python project.
  • Large web apps in Python: A good architecture goes into issues that expanding codebases face, such as where to put business logic and how to automate database testing. Each of the topics in the article are discussed in the context of a recent project the author worked on that heavily relied on SQLAlchemy.
  • SQLAlchemy and Django explains how one development team uses the Django ORM for most of their standard queries but relies on SQLAlchemy for really advanced queries.
  • SQLAlchemy and data access in Python is a podcast interview with the creator of SQLAlchemy that covers the project’s history and how it has evolved over the past decade.
  • Most Flask developers use SQLAlchemy as an ORM to relational databases. If you’re unfamiliar with SQLAlchemy questions will often come up such as what’s the difference between flush and commit? that are important to understand as you build out your app.
  • SQLAlchemy in batches shows the code that a popular iOS application runs in background batch scripts which uses SQLAlchemy to generate playlists. They provide some context and advice for using SQLAlchemy in batch scripts.
  • Fake Data for your Flask SQLAlchemy App shows how to use Mixer to generate a slew of random data to test your SQLAlchemy models.
  • Getting PostgreSQL transactions under control with SQLAlchemy provides a quick introduction to the tool Chryso that they are working on to provide better transaction management in SQLAlchemy connections.

SQLAlchemy compared to other ORMs

SQLAlchemy is one of many Python object-relational mapper (ORM) implementations. Several open source projects and articles are listed here to make it a bit easier to understand the differences between these implementations.

  • SQLAlchemy vs Other ORMs provides a detailed comparison of SQLAlchemy against alternatives.
  • If you’re interested in the differences between SQLAlchemy and the Django ORM I recommend reading SQLAlchemy and You by Armin Ronacher.
  • This GitHub project named PythonORMSleepy implements the same Flask application with several different ORMs: SQLAlchemy, Peewee, MongoEngine, stdnet and PonyORM. Looking through the code is helpful for understanding the varying approaches each library takes to accomplish a similar objective.
  • Quora has several answers to the question of which is better and why: Django ORM or SQLALchemy based on various developers’ experiences.

Object-relational mappers (ORMs)

An object-relational mapper (ORM) is a code library that automates the transfer of data stored in relational databases tables into objects that are more commonly used in application code.

Diagram showing how object-relational mappers bridge the database and Python objects.

Why are ORMs useful?

ORMs provide a high-level abstraction upon a relational database that allows a developer to write Python code instead of SQL to create, read, update and delete data and schemas in their database. Developers can use the programming language they are comfortable with to work with a database instead of writing SQL statements or stored procedures.

For example, without an ORM a developer would write the following SQL statement to retrieve every row in the USERS table where the zip_code column is 94107:

SELECT * FROM USERS WHERE zip_code=94107;

The equivalent Django ORM query would instead look like the following Python code:

# obtain everyone in the 94107 zip code and assign to users variable
users = Users.objects.filter(zip_code=94107)

The ability to write Python code instead of SQL can speed up web application development, especially at the beginning of a project. The potential development speed boost comes from not having to switch from Python code into writing declarative paradigm SQL statements. While some software developers may not mind switching back and forth between languages, it’s typically easier to knock out a prototype or start a web application using a single programming language.

ORMs also make it theoretically possible to switch an application between various relational databases. For example, a developer could use SQLite for local development and MySQL in production. A production application could be switched from MySQL to PostgreSQL with minimal code modifications.

In practice however, it’s best to use the same database for local development as is used in production. Otherwise unexpected errors could hit in production that were not seen in a local development environment. Also, it’s rare that a project would switch from one database in production to another one unless there was a pressing reason.

While you’re learning about ORMs you should also read up on deployment and check out the application dependencies page.

Do I have to use an ORM for my web application?

Python ORM libraries are not required for accessing relational databases. In fact, the low-level access is typically provided by another library called a database connector, such as psycopg (for PostgreSQL) or MySQL-python (for MySQL). Take a look at the table below which shows how ORMs can work with different web frameworks and connectors and relational databases.

Examples of how varying Python ORMs can work with different connectors and backends.

The above table shows for example that SQLAlchemy can work with varying web frameworks and database connectors. Developers can also use ORMs without a web framework, such as when creating a data analysis tool or a batch script without a user interface.

What are the downsides of using an ORM?

There are numerous downsides of ORMs, including

  1. Impedance mismatch
  2. Potential for reduced performance
  3. Shifting complexity from the database into the application code

Impedance mismatch

The phrase “impedance mismatch” is commonly used in conjunction with ORMs. Impedance mismatch is a catch-all term for the difficulties that occur when moving data between relational tables and application objects. The gist is that the way a developer uses objects is different from how data is stored and joined in relational tables.

This article on ORM impedance mismatch does a solid job of explaing what the concept is at a high level and provides diagrams to visualize why the problem occurs.

Potential for reduced performance

One of the concerns that’s associated with any higher-level abstraction or framework is potential for reduced performance. With ORMs, the performance hit comes from the translation of application code into a corresponding SQL statement which may not be tuned properly.

ORMs are also often easy to try but difficult to master. For example, a beginner using Django might not know about the select_related() function and how it can improve some queries’ foreign key relationship performance. There are dozens of performance tips and tricks for every ORM. It’s possible that investing time in learning those quirks may be better spent just learning SQL and how to write stored procedures.

There’s a lot of hand-waving “may or may not” and “potential for” in this section. In large projects ORMs are good enough for roughly 80-90% of use cases but in 10-20% of a project’s database interactions there can be major performance improvements by having a knowledgeable database administrator write tuned SQL statements to replace the ORM’s generated SQL code.

Shifting complexity from the database into the app code

The code for working with an application’s data has to live somewhere. Before ORMs were common, database stored procedures were used to encapsulate the database logic. With an ORM, the data manipulation code instead lives within the application’s Python codebase. The addition of data handling logic in the codebase generally isn’t an issue with a sound application design, but it does increase the total amount of Python code instead of splitting code between the application and the database stored procedures.

Python ORM Implementations

There are numerous ORM implementations written in Python, including

  1. SQLAlchemy
  2. Peewee
  3. The Django ORM
  4. PonyORM
  5. SQLObject

There are other ORMs, such as Canonical’s Storm, but most of them do not appear to currently be under active development. Learn more about the major active ORMs below.

Django’s ORM

The Django web framework comes with its own built-in object-relational mapping module, generally referred to as “the Django ORM” or “Django’s ORM”.

Django’s ORM works well for simple and medium-complexity database operations. However, there are often complaints that the ORM makes complex queries much more complicated than writing straight SQL or using SQLAlchemy.

It is technically possible to drop down to SQL but it ties the queries to a specific database implementation. The ORM is coupled closely with Django so replacing the default ORM with SQLAlchemy is currently a hack workaround. Note though that some of the Django core committers believe it is only a matter of time before the default ORM is replaced with SQLAlchemy. It will be a large effort to get that working though so it’s likely to come in Django 1.9 or later.

Since the majority of Django projects are tied to the default ORM, it is best to read up on advanced use cases and tools for doing your best work within the existing framework.

SQLAlchemy

SQLAlchemy is a well-regarded Python ORM because it gets the abstraction level “just right” and seems to make complex database queries easier to write than the Django ORM in most cases. There is an entire page on SQLAlchemy that you should read if you want to learn more about using the library.

Peewee

Peewee is a Python ORM implementation that is written to be “simpler, smaller and more hackable” than SQLAlchemy. Read the full Peewee page for more information on the Python ORM implementation.

Pony

Pony ORM is another Python ORM available as open source, under the Apache 2.0 license.

SQLObject

SQLObject is an ORM that has been under active open source development since before 2003.

Schema migrations

Schema migrations, for example when you need to add a new column to an existing table in your database, are not technically part of ORMs. However, since ORMs typically lead to a hands-off approach to the database (at the developers peril in many cases), libraries to perform schema migrations often go hand-in-hand with Python ORM usage on web application projects.

Database schema migrations are a complex topic and deserve their own page. For now, we’ll lump schema migration resources under ORM links below.

General ORM resources

  • This detailed overview of ORMs is a generic description of how ORMs work and how to use them.
  • This example GitHub project implements the same Flask application with several different ORMs: SQLAlchemy, Peewee, MongoEngine, stdnet and PonyORM.
  • Martin Fowler addresses the ORM hate in an essay about how ORMs are often misused but that they do provide benefits to developers.
  • The Rise and Fall of Object Relational Mapping is a talk on the history of ORMs that doesn’t shy away from some controversy. Overall I found the critique of conceptual ideas worth the time it took to read the presentation slides and companion text.
  • If you’re confused about the difference between a connector, such as MySQL-python and an ORM like SQLAlchemy, read this StackOverflow answer on the topic.
  • What ORMs have taught me: just learn SQL is another angle in the ORM versus embedded SQL / stored procedures debate. The author’s conclusion is that while working with ORMs such as SQLAlchemy and Hibernate (a Java-based ORM) can save time up front there are issues as a project evolves such as partial objects and schema redundancies. I think the author makes some valid points that some ORMs can be a shaky foundation for extremely complicated database-backed applications. However, I disagree with the overriding conclusion to eschew ORMs in favor of stored procedures. Stored procedures have their own issues and there are no perfect solutions, but I personally prefer using an ORM at the start of almost every project even if it later needs to be replaced with direct SQL queries.
  • The Vietnam of Computer Science provides the perspective from Ted Neward, the originator of the phrase “Object/relational mapping is the Vietnam of Computer Science” that he first spoke about in 2004. The gist of the argument against ORMs is captured in Ted’s quote that an ORM “represents a quagmire which starts well, gets more complicated as time passes, and before long entraps its users in a commitment that has no clear demarcation point, no clear win conditions, and no clear exit strategy.” There are follow up posts on Coding Horror and another one from Ted entitled thoughts on Vietnam commentary.
  • Turning the Tables: How to Get Along with your Object-Relational Mapper coins the funny but insightful phrase “database denial” to describe how some ORMs provide a usage model that can cause more issues than they solve over straight SQL queries. The post then goes into much more detail about the problems that can arise and how to mitigate or avoid them.

SQLAlchemy and Peewee resources

A comprehensive list of SQLAlchemy and Peewee ORM resources can be found on their respective pages.

Django ORM resources

Pony ORM resources

SQLObject resources

SQLite

SQLite is an open source relational database included with the Python standard library as of Python 2.5. The pysqlite database driver is also included with the standard library so that no further external dependencies are required to access a SQLite database from within Python applications.

SQLite logo.

SQLite is an implementation of the relational database concept. Learn more in the data chapter or view the table of contents for all topics.

SQLite resources

MySQL

MySQL is an open source relational database implementation for storing and retrieving data.

MySQL logo.

MySQL or PostgreSQL?

MySQL is a viable open source database implementation for Python web applications. MySQL has a slightly easier initial learning curve than PostgreSQL. However, PostgreSQL’s design is often preferred by Python web developers, especially when data migrations are run as an application evolves.

MySQL is an implementation of the relational database concept. Learn more in the data chapter or view the table of contents for all topics.

Python Drivers for MySQL

Accessing MySQL from a Python application requires a database driver (also called a “connector”). While it is possible to write a driver as part of your application, in practice most developers use an existing open source driver.

There was a major issue with MySQL drivers since the introduction of Python 3. One of the most popular libraries called MySQLdb did not work in its existing form with Python 3 and there were no plans to update it. Therefore a fork of MySQLdb named mysqlclient added Python 3 compatibility.

The mysqlclient fork was good in that existing MySQLdb users could drop mysqlclient into existing projects that were upgrading to Python 3. However, the fork often causes confusion when searching for which Python driver to use with MySQL. Many developer simply decide to use PostgreSQL because there is better support for Python drivers in the PostgreSQL community.

With that driver support context in mind, it’s absolutely possible to build a Python 3 web application with MySQL as a backend. Here is a list of drivers along with whether it supports Python 2, 3 or both.

  • mysqlclient is a fork of MySQLdb that supports Python 2 and 3.
  • MySQL Connector is Oracle’s “official” (Oracle currently owns MySQL) Python connector. The driver supports Python 2 and 3, just make sure to check the version guide for what releases work with which Python versions.
  • MySQLdb supports Python 2 and was frequently used by Python web applications before the mass migration to Python 3 began.
  • PyMySQL is a pure Python (no C low-level code) implementation that attempts to be a drop-in replacement for MySQLdb. However, some MySQL APIs are not supported by the driver so whether or not your application can use this connector will depend on what you’re building.

What organizations use MySQL?

The database is deployed in production at some of the highest trafficked sites such as Twitter, Facebook and many others major organizations. However, since MySQL AB, the company that developed MySQL, was purchased by Sun Microsystems (which was in turn purchased by Oracle), there have been major defections away from the database by Wikipedia and Google. MySQL remains a viable database option but I always recommend new Python developers learn PostgreSQL if they do not already know MySQL.

Python-specific MySQL resources

General MySQL resources

PostgreSQL

PostgreSQL, often written as “Postgres” and pronounced “Poss-gres”, is an open source relational database implementation frequently used by Python applications as a backed for data storage and retrieval.

PostgreSQL logo.

How does PostgreSQL fit within the Python stack?

PostgreSQL is the default database choice for many Python developers, including the Django team when testing the Django ORM. PostgreSQL is often viewed as more feature robust and stable when compared to MySQL, SQLServer and Oracle. All of those databases are reasonable choices. However, because PostgreSQL tends to be used by Python developers the drivers and example code for using the database tend to be better documented and contain fewer bugs for typical usage scenarios. If you try to use an Oracle database with Django, you’ll see there is far less example code for that setup compared to PostgreSQL backend setups.

PostgreSQL is an implementation of the relational database concept. Learn more in the data chapter or view the table of contents for all topics.

Why is PostgreSQL a good database choice?

PostgreSQL’s open source license allows developers to operate one or more databases without licensing cost in their applications. The open source license operating model is much less expensive compared to Oracle or other proprietary databases, especially as replication and sharding become necessary at large scale. In addition, because so many people ranging from independent developers to multinational organizations use PostgreSQL, it’s often easier to find developers with PostgreSQL experience than other relational databases.

The PostgreSQL core team also releases frequent updates that greatly enhance the database’s capabilities. For example, in the PostgreSQL 9.4 release the jsonb type was added to enhance JavaScript Object Notation (JSON) storage capabilities so that in many cases a separate NoSQL database is not required in an application’s architecture.

Connecting to PostgreSQL with Python

To work with relational databases in Python you need to use a database driver, which is also referred to as a database connector. The most common driver library for working with PostgreSQL is psycopg2. There is a list of all drivers on the PostgreSQL wiki, including several libraries that are no longer maintained. If you’re working with the asyncio Python stdlib module you should also take a look at the aiopg library which wraps psycopg2’s asychronouos features together.

To abstract the connection between tables and objects, many Python developers use an object-relational mapper (ORM) with to turn relational data from PostgreSQL into objects that can be used in their Python application. For example, while PostgreSQL provides a relational database and psycopg is the common database connector, there are many ORMs that can be used with varying web frameworks, as shown in the table below.

Examples of how varying Python ORMs can work with PostgreSQL and the psycopg2 connector.

Learn more about Python ORMs on that dedicated topic page.

PostgreSQL data safety

If you’re on Linux it’s easy to get PostgreSQL installed using a package manager. However, once the database is installed and running your responsibility is just beginning. Before you go live with a production application, make sure to:

  1. Lock down access with a whitelist in the pg_hba.conf file
  2. Enable replication to another database that’s preferrably on different infrastructure in a separate location
  3. Perform regular backups and test the restoration process
  4. Ensure your application prevents SQL injection attacks

When possible have someone qualified do a PostgreSQL security audit to identify the biggest risks to your database. Small applications and bootstrapped companies often cannot afford a full audit in the beginning but as an application grows over time it becomes a bigger target.

The data stored in your database is the lifeblood of your application. If you have ever accidentally dropped a production database or been the victim of malicious activity such as SQL injection attacks, you’ll know it’s far easier to recover when a bit of work has been performed beforehand on backups, replication and security measures.

Python-specific PostgreSQL resources

Many quickstarts and tutorials exist specifically for Django, Flask and other web application frameworks. The ones below are some of the best walkthroughs I’ve read.

General PostgreSQL resources

PostgreSQL tutorials not specific to Python are also really helpful for properly handling your data.

PostgreSQL monitoring and performance

Monitoring one or more PostgreSQL instances and trying to performance tune them is a rare skillset. Here are some resources to get you started if you have to handle these issues in your applications.