Web analytics

Web analytics involves collecting, processing and visualizing web data to enable critical thinking about how users interact with a web application.

Why is web analytics important?

User clients, especially web browsers, generate significant data while users read and interact with webpages. The data provides insight into how visitors use the site and why they stay or leave. The key concept in analytics is learning about your users so you can improve your web application to better suit their needs.

Web analytics concepts

It’s easy to get overwhelmed by both the number of analytics services and the numerous types of data points collected. Focus on just a handful of metrics when you’re starting out. As your application scales and you understand more about your users, add analytics services to gain further insight into their behavior with advanced visualizations such as heatmaps and action funnels.

User funnels

If your application is selling a product or service, you can ultimately build a user funnel (often called a “sales funnel” prior to a user becoming a customer) to better understand why people buy or don’t buy what you’re selling. With a funnel you can visualize drop-off points where visitors leave your application before taking some action, such as purchasing your service.
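
As a rough illustration of drop-off points, the sketch below computes the share of visitors lost between consecutive funnel steps. The step names and visitor counts are made up for the example, and `funnel_dropoff` is a hypothetical helper, not part of any analytics service.

```python
def funnel_dropoff(step_counts):
    """Return (step, visitors, share lost since the previous step) per step."""
    report = []
    previous = None
    for step, visitors in step_counts:
        if previous is None or previous == 0:
            lost = 0.0  # the first step has nothing to compare against
        else:
            lost = (previous - visitors) / previous
        report.append((step, visitors, lost))
        previous = visitors
    return report


# Hypothetical visitor counts for each step of a checkout funnel.
steps = [
    ("landing page", 1000),
    ("pricing page", 400),
    ("checkout", 100),
    ("purchase", 50),
]

for step, visitors, lost in funnel_dropoff(steps):
    print(f"{step:<14}{visitors:>6}  lost {lost:.0%}")
```

The step with the largest share lost is usually the first place to look when deciding what to improve.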

Open source web analytics projects

  • Piwik is a web analytics platform you can host yourself. Piwik is a solid choice if you cannot use Google Analytics or want to customize your own web analytics platform.
  • Open Web Analytics is another self-hosted platform that integrates through a JavaScript snippet that tracks users’ interactions with the webpage.

Hosted web analytics services

  • Google Analytics is a widely used free analytics tool for website traffic.
  • Clicky provides real-time analytics comparable to Google Analytics’ real-time dashboard.
  • MixPanel’s analytics platform focuses on mobile and sales funnel metrics. A developer codes the data points to collect into the server side or client side code. MixPanel captures that data and provides metrics and visualizations based on it.
  • KISSmetrics’ analytics provides context for who is visiting a website and what actions they are taking while on the site.
  • Heap is a recently founded analytics service with a free introductory tier to get started.
  • CrazyEgg is a tool for understanding a user’s focus while using a website based on heatmaps generated from mouse movements.

Web analytics learning checklist

  1. Add Google Analytics or Piwik to your application. Both are free, and while Piwik is not as powerful as Google Analytics, you can self-host it, which is the only option in many environments.
  2. Think critically about the factors that will make your application successful. These factors will vary based on whether it’s an internal enterprise app, an e-commerce site or an information-based application.
  3. Add metrics generated from your web traffic based on the factors that drive your application’s success. You can add these metrics with either some custom code or with a hosted web analytics service.
  4. Continuously reevaluate whether the metrics you’ve chosen are still the appropriate ones defining your application’s success. Improve and refine the metrics generated by the web analytics as necessary.

Caching

Caching can reduce the load on servers by storing the results of common operations and serving the precomputed answers to clients.

For example, instead of retrieving data from database tables that rarely change, you can store the values in memory. Retrieving values from an in-memory location is far faster than retrieving them from a database, which stores them on a persistent disk such as a hard drive. When the cached values change, the system can invalidate the cache and re-retrieve the updated values for future requests.

A cache can be created for multiple layers of the stack.
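
The read-then-invalidate cycle described above can be sketched in a few lines. This is a minimal illustration that assumes a plain dictionary as the in-memory store and another dictionary standing in for the database; `ReadThroughCache` and `load_from_database` are hypothetical names.

```python
class ReadThroughCache:
    """Keep loaded values in memory until they are explicitly invalidated."""

    def __init__(self, load):
        self._load = load    # function that fetches a value from slow storage
        self._values = {}    # the in-memory store

    def get(self, key):
        if key not in self._values:
            self._values[key] = self._load(key)   # cache miss: go to storage
        return self._values[key]

    def invalidate(self, key):
        self._values.pop(key, None)


# A dictionary stands in for a database table that rarely changes.
database = {"country:us": "United States"}
queries = []   # records every trip to the "database"


def load_from_database(key):
    queries.append(key)
    return database[key]


cache = ReadThroughCache(load_from_database)
cache.get("country:us")            # first read hits the database
cache.get("country:us")            # second read is served from memory
database["country:us"] = "USA"     # the underlying value changes
cache.invalidate("country:us")     # so the cached copy is thrown away
cache.get("country:us")            # and the next read reloads it
```

In a real deployment the dictionary would typically be replaced by a shared store such as memcached or Redis so every application process sees the same cached values.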

Caching backends

  • memcached is a common in-memory caching system.
  • Redis is a key-value in-memory data store that can easily be configured for caching with libraries such as django-redis-cache and the similarly-named, but separate project django-redis.

Caching resources

  • “Caching: Varnish or Nginx?” reviews some considerations such as SSL and SPDY support when choosing between the reverse proxies Nginx and Varnish.
  • Caching is Hard, Draw me a Picture has diagrams of how web request caching layers work. The post is relevant reading even though the author is describing his Microsoft code as the impetus for writing the content.
  • While caching is a useful technique in many situations, it’s important to also note that there are downsides to caching that many developers fail to take into consideration.

Caching learning checklist

  1. Analyze your web application for the slowest parts. It’s likely there are complex database queries that can be precomputed and stored in an in-memory data store.
  2. Leverage your existing in-memory data store already used for session data to cache the results of those complex database queries. A task queue can often be used to precompute the results on a regular basis and save them in the data store.
  3. Incorporate a cache invalidation scheme so the precomputed results remain accurate when served up to the user.

Monitoring

Monitoring tools capture, analyze and display information about a web application’s execution. Every application has issues that arise at all levels of the web stack. Monitoring tools provide transparency so developers and operations teams can respond and fix problems.

Why is monitoring necessary?

Capturing and analyzing data about your production environment is critical to proactively deal with stability, performance, and errors in a web application.

Difference between monitoring and logging

Monitoring and logging are very similar in their purpose of helping to diagnose issues with an application and aid the debugging process. One way to think about the difference is that logging happens based on explicit events while monitoring is a passive background collection of data.

For example, when an error occurs, that event is explicitly logged through code in an exception handler. Meanwhile, a monitoring agent instruments the code and gathers data not only about the logged exception but also the performance of the functions.

This distinction between logging and monitoring is vague and not necessarily the only way to look at it. Pragmatically, both are useful for maintaining a production web application.
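
One way to picture the difference is the sketch below, where an exception handler logs an explicit event while a decorator passively records the duration of every call. The `monitored` decorator and the `timings` dictionary are hypothetical stand-ins for what a real monitoring agent would do.

```python
import functools
import logging
import time

logger = logging.getLogger("app")
timings = {}   # function name -> list of call durations in seconds


def monitored(func):
    """Monitoring: record how long every call takes, whether or not it fails."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            timings.setdefault(func.__name__, []).append(elapsed)
    return wrapper


@monitored
def divide(a, b):
    return a / b


def safe_divide(a, b):
    try:
        return divide(a, b)
    except ZeroDivisionError:
        # Logging: an explicit event written by the exception handler.
        logger.exception("division failed for a=%r b=%r", a, b)
        return None
```

Calling `safe_divide(1, 0)` produces one log record, while `timings` gains an entry for every call to `divide`, successful or not.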

Monitoring layers

There are several important resources to monitor on the operating system and network level of a web stack.

  1. CPU utilization
  2. Memory utilization
  3. Persistent storage consumed versus free
  4. Network bandwidth and latency
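
A small sketch of sampling some of these resources with only the Python standard library follows. It covers CPU count, load average and disk usage; memory and network figures usually need a third-party package such as psutil and are left out here. The `system_snapshot` function name is an assumption for the example.

```python
import os
import shutil


def system_snapshot(path="."):
    """Collect a few operating system level numbers for the given path."""
    disk = shutil.disk_usage(path)
    snapshot = {
        "cpu_count": os.cpu_count(),
        "disk_used_bytes": disk.used,
        "disk_free_bytes": disk.free,
    }
    # os.getloadavg() exists only on Unix-like systems and may be unavailable.
    if hasattr(os, "getloadavg"):
        try:
            snapshot["load_average_1m"] = os.getloadavg()[0]
        except OSError:
            pass
    return snapshot


print(system_snapshot())
```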

Application level monitoring encompasses several aspects. The amount of time and resources dedicated to each aspect will vary based on whether an application is read-heavy, write-heavy, or subject to rapid swings in traffic.

  1. Application warnings and errors (500-level HTTP errors)
  2. Application code performance
  3. Template rendering time
  4. Browser rendering time for the application
  5. Database querying performance

Open source monitoring projects

  • Sentry started life as a Python-only monitoring project but can now be used for any programming language.
  • statsd is a Node.js network daemon that listens for metrics and aggregates them for transfer into another service such as Graphite.
  • Graphite stores time-series data and displays them in graphs through a Django web application.
  • Bucky measures the performance of a web application from end users’ browsers and sends that data back to the server for collection.
  • Sensu is an open source monitoring framework written in Ruby that can monitor web applications written in any programming language.
  • Graph Explorer by Vimeo is a Graphite-based dashboard with added features and a slick design.
  • PacketBeat sniffs protocol packets. Elasticsearch then allows developers to search the collected data and visualize what’s happening inside their web application using the Kibana user interface.
  • Munin is a client plugin-based monitoring system that sends monitoring traffic to the Munin node where the data can be analyzed and visualized. Note this project is written in Perl so Perl 5 must be installed on the node collecting the data.
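
To give a feel for how an application feeds a daemon such as statsd, here is a minimal sketch of its plain-text line format, `name:value|type`, sent over UDP. The metric names are invented, and the host and port are assumptions (8125 is the port statsd commonly listens on). A maintained client library is a better choice in production.

```python
import socket


def format_metric(name, value, metric_type):
    """Build a statsd line such as 'page.views:1|c'."""
    return f"{name}:{value}|{metric_type}"


def send_metric(line, host="127.0.0.1", port=8125):
    """Fire and forget a metric line over UDP; no reply is expected."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


counter = format_metric("page.views", 1, "c")        # counter increment
timer = format_metric("db.query_time", 320, "ms")    # timing in milliseconds
```

Because UDP is connectionless, a slow or missing metrics daemon does not block the application that emits the metrics.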

Hosted monitoring services

Hosted monitoring software takes away the burden of deploying and operating the software yourself. However, hosted monitoring costs money (often a significant amount) and takes your application’s data out of your hands, so these services are not the right fit for every project.

  • Sentry is the hosted version of the open source tool that is used to monetize and support further development.
  • New Relic provides application and database monitoring as well as plugins for capturing and analyzing data about other developer tools in your stack, such as Twilio.
  • Rollbar instruments both the server side and client side to capture and report exceptions. The pyrollbar code library provides quick integration for Python web applications. There are also specific instructions for common web frameworks such as Django and Pyramid.
  • Status.io focuses on uptime and response metrics transparency for web applications.
  • StatusPage.io (yes, there’s both a Status.io and a StatusPage.io) provides status pages that are easy to set up for monitoring application uptime.
  • PagerDuty alerts a designated person or group if there are stability, performance, or uptime issues with an application.
  • Opbeat is built for Django. It combines performance metrics, release tracking, and error logging into a single simple service.

Monitoring learning checklist

  1. Review the software-as-a-service and open source monitoring tools above. Third party services tend to be easier to set up and host the data for you. Open source projects give you more control but you’ll need to have additional servers ready for the monitoring.
  2. My recommendation is to install New Relic’s free option with the trial period to see how it works with your app. It’ll give you a good idea of the capabilities for application-level monitoring tools.
  3. As your app scales, take a look at setting up one of the open source monitoring projects such as StatsD with Graphite. The combination of those two projects will give you fine-grained control over the system metrics you’re collecting and visualizing.

DevOps

DevOps is the combination of application development and operations, which minimizes or eliminates the disconnect between software developers who build applications and systems administrators who keep infrastructure running.

Why is DevOps important?

When the Agile methodology is properly used to develop software, a new bottleneck often appears during the frequent deployment and operations phases. New updates and fixes are produced so fast in each sprint that infrastructure teams can be overwhelmed with deployments and push back on the pace of delivery. To alleviate some of these issues, application developers are asked to work closely with operations folks to automate the delivery from development to production.

DevOps tooling resources

  • DevOps: Python tools to get started is a presentation slideshow that explains that while DevOps is a culture, it can be supported by tools such as Fabric, Jenkins, BuildBot and Git which when used properly can enable continuous software delivery.
  • A look at DevOps tools landscape provides an introductory overview of the tooling that is typically required to perform DevOps. The tools range from source control systems and continuous integration to containers and orchestration. For an Atlassian-centric perspective on tooling, take a look at this post on how to choose the right DevOps tools, which is biased towards their tools but still has some good insight such as using automated testing to provide immediate awareness of defects that require fixing.

General DevOps resources

  • DevOps vs. Platform Engineering considers DevOps an ad hoc approach to developing software while building a platform is a strict contract. I see this as “DevOps is a process”, while a “platform is code”. Running code is better than any organizational process.
  • So, you’ve been paged provides their development team’s “Communicate -> Learn -> Act” structure for handling production issues based on lessons learned from their years of experience dealing with incidents.
  • Operations for software developers for beginners gives advice to developers who have never done operations work or been on call for outages before in their career. The advantage of DevOps is greater ownership for developers who built the applications running in production. The disadvantage, of course, is that greater ownership also leads to much greater responsibility when something breaks!
  • Why are we racing to DevOps? is a very high level summary of the benefits of DevOps to IT organizations. It’s not specific to Python and doesn’t dive into the details, but it’s a decent start for figuring out why IT organizations consider DevOps the hot new topic after adopting an Agile development methodology.

Twilio

Twilio is a web application programming interface (API) that software developers can use to add communications such as phone calling, messaging, video and two-factor authentication into their Python applications.

Why is Twilio a good API choice?

Interacting with the standard telephone networks to send and receive phone calls and text messages without Twilio is extremely difficult if you do not know the unique telecommunications protocols such as Session Initiation Protocol (SIP). Twilio’s API abstracts the telecommunications pieces so as a developer you can simply use your favorite programming languages and frameworks in your application. For example, here’s how you can send an outbound SMS using a few lines of Python code:

# import the Twilio helper library (installed with pip install twilio)
# version 6 and later name the client class Client; older releases
# used TwilioRestClient
from twilio.rest import Client

# replace the placeholder strings in the following code line with
# your Twilio Account SID and Auth Token from the Twilio Console
client = Client("ACxxxxxxxxxxxxxx", "zzzzzzzzzzzzz")

# change the "from_" number to your Twilio number and the "to" number
# to any phone number you want to send the message to
client.messages.create(to="+19732644152", from_="+12023358536",
                       body="Hello from Python!")

Learn more about the above code in the How to Send SMS Text Messages with Python tutorial.

How is Twilio’s documentation for Python developers?

Twilio is a developer-focused company, rather than a traditional “enterprise company”, so their tutorials and documentation are written by developers for fellow developers.

API Integration

The majority of production Python web applications rely on several externally hosted application programming interfaces (APIs). APIs are also commonly referred to as third party services or external platforms. Examples include Twilio for messaging and voice services, Stripe for payment processing and Disqus for embedded webpage comments.

There are many articles about proper API design but best practices for integrating APIs are less commonly written about. However, this subject continuously grows in importance because APIs provide critical functionality across many implementation areas.

Hosted API testing services

  • Runscope is a service specifically designed for APIs that assists developers with automated testing and traffic inspection.
  • Apiary provides a blueprint for creating APIs so they are easier to test and generate clean documentation.

API integration learning checklist

  1. Pick an API known for top notch documentation. Here’s a list of ten APIs that are a good starting point for beginners.
  2. Read the API documentation for your chosen API. Figure out a simple use case for how your application could be improved by using that API.
  3. Before you start writing any code, play around with the API through the command line with cURL or in the browser with Postman. This exercise will help you get a better understanding of API authentication and the data required for requests and responses.
  4. Evaluate whether to use a helper library or work with Requests. Helper libraries are usually easier to get started with while Requests gives you more control over the HTTP calls.
  5. Move your API calls into a task queue so they do not block the HTTP request-response cycle for your web application.
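
The last step can be sketched as follows. A thread pool from the standard library is used here only as a stand-in for a real task queue, and `call_external_api` is a hypothetical placeholder for a slow HTTP call to a third party service.

```python
from concurrent.futures import ThreadPoolExecutor

# A thread pool is only a stand-in for a real task queue here.
executor = ThreadPoolExecutor(max_workers=2)


def call_external_api(order_id):
    """Placeholder for a slow HTTP call to a third party service."""
    return {"order_id": order_id, "status": "sent"}


def handle_request(order_id):
    """Queue the API call and respond right away instead of waiting for it."""
    future = executor.submit(call_external_api, order_id)
    return "202 Accepted", future
```

The web request returns immediately while the API call runs in the background. With a dedicated task queue the work would also survive a restart of the web process, which a thread pool does not offer.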

API Creation

Creating and exposing APIs allows your web application to interact with other applications through machine-to-machine communication.

API creation frameworks

  • Django REST framework and Tastypie are the two most widely used API frameworks to use with Django. The edge currently goes to Django REST framework based on rough community sentiment. Django REST framework continues to knock out great releases after the 3.0 release mark when Tom Christie ran a successful Kickstarter campaign.
  • Flask-RESTful is widely used for creating web APIs with Flask. It was originally open sourced and explained in a blog post by Twilio then moved into its own GitHub organization so engineers from outside the company could be core contributors.
  • Flask API is another common library for exposing APIs from Flask web applications.
  • Sandman is a widely used tool to automatically generate a RESTful API service from a legacy database without writing a line of code (though it’s easily extensible through code).
  • Cornice is a REST framework for Pyramid.
  • Restless is a lightweight API framework that aims to be framework agnostic. The general concept is that you can use the same API code for Django, Flask, Bottle, Pyramid or any other WSGI framework with minimal porting effort.
  • Eve is a Python REST framework built with Flask, MongoDB and Redis. The framework’s primary author Nicola Iarocci gave a great talk at EuroPython 2014 that introduced the main features of the framework.
  • Falcon is a fast and lightweight framework well suited to create RESTful APIs.
  • Hug is built on top of Falcon and Python 3 with an aim to make developing Python-driven APIs as simple as possible, but no simpler. Hug leverages Python 3 annotations to automatically validate and convert incoming and outgoing API parameters.
  • Pycnic is a JSON-API-only framework designed with REST in mind.

API testing projects

Building, running and maintaining APIs requires as much effort as building, running and maintaining a web application. API testing frameworks are the equivalent of browser testing in the web application world.

  • zato-apitest invokes HTTP APIs and provides hooks for running through other testing frameworks.

Hosted API testing services

  • Runscope is an API testing SaaS application that can test both your own APIs and external APIs that your application relies upon.
  • API Science is focused on deep API testing, including multi-step API calls and monitoring of external APIs.
  • SmartBear has several API monitoring and testing tools.

API creation learning checklist

  1. Pick an API framework appropriate for your web framework. For Django I recommend Django REST framework and for Flask I recommend Flask-RESTful.
  2. Begin by building out a simple use case for the API. Generally the use case will either involve data that users want in a machine-readable format or a backend for alternative clients such as an iOS or Android mobile app.
  3. Add an authentication mechanism through OAuth or a token scheme.
  4. Add rate limiting to the API if data usage volume could be a performance issue. Also add basic metrics so you can determine how often the API is being accessed and whether it is performing properly.
  5. Provide ample documentation and a walkthrough for how the API can be accessed and used.
  6. Figure out other use cases and expand based on what you learned with the initial API use case.
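
For the rate limiting step, one simple scheme is a fixed window counter per client. The sketch below is an illustration under that assumption, not what any particular API framework implements. The `clock` parameter is a hypothetical hook that lets the time source be swapped in tests.

```python
import time


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per client in each time window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self._windows = {}   # client id -> (window start, request count)

    def allow(self, client_id):
        now = self.clock()
        start, count = self._windows.get(client_id, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0    # a new window begins
        if count >= self.limit:
            self._windows[client_id] = (start, count)
            return False
        self._windows[client_id] = (start, count + 1)
        return True
```

An API view would call `allow()` with the client’s token or IP address and respond with HTTP status 429 (Too Many Requests) when it returns `False`.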

Bots

Bots are software programs that combine requests, which are typically provided as text, with contextual data, such as geolocation and payment information, to appropriately handle the request and respond. Bots are often also called “chatbots”, “assistants” or “agents.”
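
A toy sketch of that idea follows, in which the reply depends on both the request text and contextual data about the user. The `handle_message` function, the keywords and the `city` context key are all invented for illustration. Real bots typically sit behind a messaging platform’s API.

```python
def handle_message(text, context):
    """Combine the request text with contextual data to choose a reply."""
    words = [word.strip("?!.,") for word in text.lower().split()]
    if "weather" in words:
        city = context.get("city")
        if city is None:
            return "Which city are you in?"
        return f"Looking up the weather for {city}."
    if "help" in words:
        return "Try asking about the weather."
    return "Sorry, I did not understand that."
```

The same text, such as "What is the weather?", yields a different response depending on whether the context already contains the user’s location.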

Open source Slack bot examples

  • Limbo is an awesome Slack chatbot that provides a base for Python code that otherwise would require boilerplate to handle the Slack API events firehose.
  • python-rtmbot is the bot framework for building Slack bots with the Real Time Messaging (RTM) API over WebSockets.

Additional Bots resources

  • Slack bot token leakage exposing business critical information is a detailed look at a search on GitHub for Slack tokens that are used mostly for bots but must be kept secret. Otherwise those tokens expose the entire Slack team’s messaging to outside parties.
  • The Economist wrote a general piece on why bots look like they’ll gain adoption in various market segments. The piece doesn’t have much technical depth but it’s a good overview of how some businesses are looking at the opportunity.
  • Bots won’t replace apps is a fantastic piece by WeChat’s product manager on how text-based bots alone typically do not provide a good user experience. Instead, chat apps with automated responses, user data and basic web browser functionality are what has allowed bot concepts to bloom in Asian markets. There’s a lot of good information in this post to unpack.

Microservices

Microservices are an application architecture style where independent, self-contained programs with a single purpose each can communicate with each other over a network. Typically, these microservices are able to be deployed independently because they have strong separation of responsibilities via a well-defined specification with significant backwards compatibility to avoid sudden dependency breakage.
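
As a minimal sketch of a single-purpose service with a small, versioned interface, here is a WSGI application that answers one JSON endpoint. The `/v1/status` path and the response fields are assumptions made for the example.

```python
import json


def status_service(environ, start_response):
    """A single-purpose service: report its status as JSON at /v1/status."""
    if environ.get("PATH_INFO") == "/v1/status":
        status, payload = "200 OK", {"status": "ok"}
    else:
        status, payload = "404 Not Found", {"error": "not found"}
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]

# To try it locally, pass status_service to any WSGI server, for example
# wsgiref.simple_server.make_server("127.0.0.1", 8000, status_service).
```

Keeping the version in the URL is one common way to evolve the interface without breaking services that still depend on the older contract.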

Why are microservices getting so much buzz?

Microservices follow in a long trend of software architecture patterns that become all the rage. Previously, CORBA and (mostly XML-based) service-oriented architectures (SOA) were the hip buzzwords among ivory tower architects.

However, microservices have more substance because they are typically based on RESTful APIs that are far easier for actual software developers to use compared with the previous complicated XML-based schemas thrown around by enterprise software companies. In addition, successful applications often begin as a monolith with a single, shared application codebase and deployment. Only after the application proves its usefulness is it then broken down into microservice components to ease further development and deployment. This approach is called the “monolith-first” or “MonolithFirst” pattern.

Microservice resources

  • Martin Fowler’s microservices article is one of the best in-depth explanations for what microservices are and why to consider them as an architectural pattern.
  • Why microservices? presents some of the advantages, such as the dramatically increased number of deployments per day, that a well-done microservices architecture can provide in the right situation. Many organizational environments won’t allow this level of flexibility but if yours is one that will, it’s worth considering these points.
  • On monoliths and microservices provides some advice on using microservices in a fairly early stage of a software project’s lifecycle.
  • Developing a RESTful microservice in Python is a good story of how an aging Java project was replaced with a microservice built with Python and Flask.
  • Microservices: The essential practices first goes over what a monolith application looks like then dives into what operations you need to support potential microservices. For example, you really need to have continuous integration and deployment already set up. This is a good high-level overview of the topics many developers aren’t aware of when they embark on converting a monolith to microservices.
  • Using Nginx to Load Balance Microservices explains how an Nginx instance can use configuration values from etcd updated by confd as the values are modified. This setup can be useful for load balancing microservices as the backend services are brought up and taken down.
  • How Microservices have changed and why they matter is a high level overview of the topic with some quotes from various developers around the industry.
  • The State of Microservices Today provides some general trends and broad data showing the increasing popularity of microservices heading into 2016. This is more of an overview of the term than a tutorial but useful context for both developers and non-developers.
  • bla bla microservices bla bla is a transcript for a killer talk on microservices that breaks down the important first principles of distributed systems, including asynchronous communication, isolation, autonomicity, single responsibility, exclusive state, and mobility. The slides along with the accompanying text go into how reality gets messy and how to embrace the constraints inherent in distributed systems.

Application Programming Interfaces

Application programming interfaces (APIs) provide machine-readable data transfer and signaling between applications.

Why are APIs important?

HTML, CSS and JavaScript create human-readable webpages. However, those webpages are not easily consumable by other machines.

Numerous scraping programs and libraries exist to rip data out of HTML but it’s simpler to consume data through APIs. For example, if you want the content of a news article it’s easier to get the content through an API than to scrape the text out of the HTML.
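
The difference in effort can be seen in a small comparison. The JSON document and HTML page below are invented sample data, and `ArticleScraper` is a hypothetical scraper that only works for this exact markup.

```python
import json
from html.parser import HTMLParser

api_response = '{"title": "Launch day", "body": "The product shipped."}'
html_page = (
    "<html><body><h1>Launch day</h1>"
    "<div class='ad'>Buy now</div>"
    "<p class='article'>The product shipped.</p></body></html>"
)

# API: one call turns the response into a dictionary.
article = json.loads(api_response)


class ArticleScraper(HTMLParser):
    """Collect text inside <p class="article">; breaks if the markup changes."""

    def __init__(self):
        super().__init__()
        self.in_article = False
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "p" and ("class", "article") in attrs:
            self.in_article = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_article = False

    def handle_data(self, data):
        if self.in_article:
            self.text.append(data)


scraper = ArticleScraper()
scraper.feed(html_page)
scraped_body = "".join(scraper.text)
```

Both routes end with the same text, but the scraper depends on the page’s tag and class names, which can change whenever the site is redesigned.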

Key API concepts

There are several key concepts that get thrown around in the APIs world. It’s best to understand these ideas first before diving into the API literature.

  • Representational State Transfer (REST)
  • Webhooks
  • JavaScript Object Notation (JSON) and Extensible Markup Language (XML)
  • Endpoints

Webhooks

A webhook is a user-defined HTTP callback to a URL that executes when a system condition is met. The call alerts the second system via a POST or GET request and often passes data as well.

Webhooks are important because they enable two-way communication initiation for APIs. Webhooks are flexible because they are defined by the API user rather than by the API itself.

For example, in the Twilio API when a text message is sent to a Twilio phone number Twilio sends an HTTP POST request webhook to the URL specified by the user. The URL is defined in a text box on the number’s page on Twilio as shown below.

Webhook definition in the Twilio API.
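
From the point of view of the system that sends webhooks, the mechanism can be sketched as a registry of user-supplied URLs plus a function that POSTs to them when an event occurs. Everything here is a generic illustration. The event name, the JSON payload format and the `subscribe`, `build_webhook_request` and `fire` helpers are assumptions and do not describe how Twilio implements its webhooks.

```python
import json
import urllib.request

# Callback URLs are supplied by API users, not fixed by the API itself.
subscriptions = {}   # event name -> list of user-supplied URLs


def subscribe(event, url):
    subscriptions.setdefault(event, []).append(url)


def build_webhook_request(url, payload):
    """Create the HTTP POST that carries the event data to the callback URL."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fire(event, payload, send=urllib.request.urlopen):
    """Call every URL registered for the event once its condition is met."""
    for url in subscriptions.get(event, []):
        send(build_webhook_request(url, payload))
```

The `send` parameter defaults to a real HTTP call but can be replaced with any callable, which makes the flow easy to exercise without a network connection.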

API open source projects

  • Swagger is an open source project written in Scala that defines a standard interface for RESTful APIs.

API resources

  • Zapier has an APIs 101 free guide for what APIs are, why they are valuable and how to use them properly.
  • GET PUT POST is a newsletter just about APIs. Past issues have included interviews with the developers behind Stripe, Dropbox and Coinbase.
  • What RESTful actually means does a fantastic job of laying out the REST principles in plain language terms while giving some history on how they came to be.
  • What is a webhook? by Nick Quinlan is a plain English explanation for what webhooks are and why they are necessary in the API world.
  • Simplicity and Utility, or, Why SOAP Lost provides context for why JSON-based web services are more common today than SOAP which was popular in the early 2000s.
  • API tools for every occasion provides a list of 10 tools, all new in 2015, that are really helpful when working with APIs.

APIs learning checklist

  1. Learn the API concepts of machine-to-machine communication with JSON and XML, endpoints and webhooks.
  2. Integrate an API such as Twilio or Stripe into your web application. Read the API integration section for more information.
  3. Use a framework to create an API for your own application. Learn about web API frameworks on the API creation page.
  4. Expose your web application’s API so other applications can consume data you want to share.