Web analytics

Web analytics involves collecting, processing, and visualizing web data to enable critical thinking about how users interact with a web application.

Why is web analytics important?

User clients, especially web browsers, generate significant data while users read and interact with webpages. The data provides insight into how visitors use the site and why they stay or leave. The key concept behind analytics is learning about your users so you can improve your web application to better suit their needs.

Web analytics concepts

It’s easy to get overwhelmed by both the number of analytics services and the many types of data points they collect. Focus on just a handful of metrics when you’re starting out. As your application scales and you understand more about your users, add analytics services that give further insight into their behavior through advanced visualizations such as heatmaps and action funnels.

User funnels

If your application sells a product or service, you can build a user funnel (often called a “sales funnel” for the stages before a user becomes a customer) to better understand why people do or don’t buy what you’re selling. With a funnel you can visualize drop-off points where visitors leave your application before taking some action, such as purchasing your service.
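Drop-off points are simple arithmetic once you have a visitor count per funnel step. The sketch below assumes hypothetical step names and counts (not from any real service) and computes, for each step, the share of visitors kept from the previous step and the share lost.

```python
# Hypothetical funnel steps and visitor counts; replace them with numbers
# exported from your own analytics service.
FUNNEL = [
    ("Viewed pricing page", 1000),
    ("Started sign-up", 400),
    ("Entered payment details", 150),
    ("Completed purchase", 120),
]


def funnel_report(steps):
    """Return (step, visitors, conversion from previous step, drop-off) tuples."""
    report = []
    previous = None
    for name, visitors in steps:
        if previous is None:
            conversion = 1.0  # the first step is the baseline
        else:
            conversion = visitors / previous if previous else 0.0
        report.append((name, visitors, conversion, 1.0 - conversion))
        previous = visitors
    return report


def biggest_drop_off(steps):
    """Return the name of the step where the largest share of visitors left."""
    report = funnel_report(steps)
    return max(report[1:], key=lambda row: row[3])[0]


if __name__ == "__main__":
    for name, visitors, conversion, drop in funnel_report(FUNNEL):
        print(f"{name:<25} {visitors:>5}  kept {conversion:6.1%}  lost {drop:6.1%}")
    print("Largest drop-off at:", biggest_drop_off(FUNNEL))
```

With these made-up numbers the payment step loses 62.5% of the visitors who reached it, which is where you would start investigating.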

Open source web analytics projects

  • Piwik is a web analytics platform you can host yourself. Piwik is a solid choice if you cannot use Google Analytics or want to customize your own web analytics platform.
  • Open Web Analytics is another self-hosted platform that integrates through a JavaScript snippet that tracks users’ interactions with the webpage.

Hosted web analytics services

  • Google Analytics is a widely used free analytics tool for website traffic.
  • Clicky provides real-time analytics comparable to Google Analytics’ real-time dashboard.
  • MixPanel’s analytics platform focuses on mobile and sales funnel metrics. A developer builds the data points that need to be collected into the server-side or client-side code. MixPanel captures that data and provides metrics and visualizations based on it.
  • KISSmetrics’ analytics provides context for who is visiting a website and what actions they take while on the site.
  • Heap is a recently founded analytics service with a free introductory tier to get started.
  • CrazyEgg is a tool for understanding a user’s focus while using a website, based on heatmaps generated from mouse movements.

Web analytics learning checklist

  1. Add Google Analytics or Piwik to your application. Both are free, and while Piwik is not as powerful as Google Analytics, you can self-host it, which is the only option in many environments.
  2. Think critically about the factors that will make your application successful. These factors will vary based on whether it’s an internal enterprise app, an e-commerce site or an information-based application.
  3. Add metrics generated from your web traffic based on the factors that drive your application’s success. You can add these metrics with either some custom code or with a hosted web analytics service.
  4. Continuously reevaluate whether the metrics you’ve chosen are still the appropriate ones defining your application’s success. Improve and refine the metrics generated by the web analytics as necessary.
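Step 3 mentions adding metrics with custom code. As a minimal sketch, assuming your web server writes access logs in the Common Log Format (as Apache and Nginx can), a few lines of Python can turn raw traffic into page views per path and a count of unique visitor IPs. The sample log lines and the choice of these two metrics are illustrative assumptions.

```python
import re
from collections import Counter

# Common Log Format: ip ident user [time] "METHOD path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+'
)

# Made-up sample lines for illustration.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /pricing HTTP/1.1" 200 2326',
    '203.0.113.5 - - [10/Oct/2023:13:56:01 +0000] "GET /signup HTTP/1.1" 200 1043',
    '198.51.100.7 - - [10/Oct/2023:13:57:12 +0000] "GET /pricing HTTP/1.1" 200 2326',
    '198.51.100.7 - - [10/Oct/2023:13:57:15 +0000] "POST /signup HTTP/1.1" 302 0',
]


def traffic_metrics(lines):
    """Count successful GET page views per path and unique visitor IPs."""
    page_views = Counter()
    visitors = set()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if not match or match["method"] != "GET":
            continue  # skip malformed lines and non-GET requests
        if match["status"].startswith("2"):
            page_views[match["path"]] += 1
            visitors.add(match["ip"])
    return page_views, len(visitors)


views, unique_visitors = traffic_metrics(SAMPLE_LOG)
print(views.most_common(), unique_visitors)
```

Counting by IP address is a rough proxy for visitors; hosted services use cookies or other identifiers, so treat this as a starting point rather than an equivalent.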

Caching

Caching can reduce the load on servers by storing the results of common operations and serving the precomputed answers to clients.

For example, instead of retrieving data from database tables that rarely change, you can store the values in memory. Retrieving values from memory is far faster than retrieving them from a database (which stores them on persistent storage such as a hard drive). When the cached values change, the system can invalidate the cache and re-retrieve the updated values for future requests.
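The pattern described above is often called cache-aside: check memory first, fall back to the slow source on a miss, and drop the entry when the underlying data changes. Here is a minimal in-process sketch; the `load_country_codes` function and the dict standing in for a database are hypothetical.

```python
import time


class InMemoryCache:
    """A minimal in-process cache with per-entry expiry (not thread-safe)."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        """Return the cached value, calling loader() on a miss or after expiry."""
        entry = self._entries.get(key)
        now = self.clock()
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: skip the slow lookup
        value = loader()  # cache miss: fetch from the "database"
        self._entries[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key):
        """Drop a key so the next request re-retrieves the updated value."""
        self._entries.pop(key, None)


# Hypothetical stand-in for a rarely changing database table.
database = {"country_codes": ["CA", "FR", "US"]}
queries = 0


def load_country_codes():
    global queries
    queries += 1  # count how often we actually hit the "database"
    return list(database["country_codes"])


cache = InMemoryCache(ttl_seconds=60)
cache.get_or_load("country_codes", load_country_codes)
cache.get_or_load("country_codes", load_country_codes)  # served from memory
database["country_codes"].append("DE")
cache.invalidate("country_codes")  # the underlying data changed
print(cache.get_or_load("country_codes", load_country_codes), queries)
```

Python’s built-in `functools.lru_cache` also memoizes results, but it has no per-key expiry or per-key invalidation, which is why this sketch keeps its own dictionary. A shared backend such as memcached or Redis replaces the dictionary once more than one process needs the cache.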

A cache can be created at multiple layers of the stack.

Caching backends

  • memcached is a common in-memory caching system.
  • Redis is a key-value in-memory data store that can easily be configured for caching with libraries such as django-redis-cache and the similarly-named, but separate project django-redis.
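For Django projects, django-redis is wired in through the standard `CACHES` setting. The fragment below is a sketch for `settings.py` based on the django-redis documentation; the Redis URL (local host, port 6379, database 1) and the 300-second timeout are assumptions for a development setup.

```python
# settings.py fragment (assumes the django-redis package is installed and a
# Redis server is listening locally on the default port).
CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379/1",
        "TIMEOUT": 300,  # default expiry in seconds for cached values
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
        },
    }
}
```

Application code then uses Django’s regular cache API (`from django.core.cache import cache`, followed by `cache.set(...)` and `cache.get(...)`), so switching backends later only touches this setting.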

Caching resources

  • “Caching: Varnish or Nginx?” reviews some considerations, such as SSL and SPDY support, when choosing between Nginx and Varnish as a reverse proxy.
  • Caching is Hard, Draw me a Picture has diagrams of how web request caching layers work. The post is relevant reading even though the author’s own Microsoft code was the impetus for writing it.
  • While caching is a useful technique in many situations, it’s important to also note that there are downsides to caching that many developers fail to take into consideration.

Caching learning checklist

  1. Analyze your web application for the slowest parts. It’s likely there are complex database queries that can be precomputed and stored in an in-memory data store.
  2. Use the in-memory data store you already rely on for session data to cache the results of those complex database queries. A task queue can often precompute the results on a regular basis and save them in the data store.
  3. Incorporate a cache invalidation scheme so the precomputed results remain accurate when served up to the user.
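One common invalidation scheme for precomputed results is versioned cache keys, swapped in here as an illustration rather than the only option: the write path bumps a version number, so readers look up a new key and never see the stale result. In this sketch a plain dict stands in for the in-memory data store (for example Redis), `precompute_top_products` plays the role of a periodic task queue job, and the order data is made up.

```python
from collections import Counter

# A dict stands in for the shared in-memory store (an assumption for the sketch).
store = {}

# Hypothetical order table; in a real app this is the slow, complex query.
orders = [("widget", 3), ("gadget", 5), ("widget", 4), ("doohickey", 1)]


def current_version():
    return store.get("top_products:version", 0)


def precompute_top_products(limit=2):
    """Run periodically (e.g. from a task queue) to refresh the result."""
    totals = Counter()
    for product, quantity in orders:
        totals[product] += quantity
    key = f"top_products:v{current_version()}"
    store[key] = [name for name, _ in totals.most_common(limit)]


def top_products():
    """Serve the precomputed result; None means it has not been rebuilt yet."""
    return store.get(f"top_products:v{current_version()}")


def record_order(product, quantity):
    """Write path: bump the version so stale results are never served."""
    orders.append((product, quantity))
    store["top_products:version"] = current_version() + 1


precompute_top_products()
print(top_products())  # ['widget', 'gadget']
record_order("doohickey", 10)
print(top_products())  # None until the next precompute run
precompute_top_products()
print(top_products())  # ['doohickey', 'widget']
```

Old versions stay in the store after a bump, so in a real data store you would give each versioned key an expiry so that stale entries age out on their own.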

Monitoring

Monitoring tools capture, analyze and display information about a web application’s execution. Every application has issues that arise at all levels of the web stack. Monitoring tools provide transparency so developers and operations teams can respond to and fix problems.

Why is monitoring necessary?

Capturing and analyzing data about your production environment is critical to proactively deal with stability, performance, and errors in a web application.

Difference between monitoring and logging

Monitoring and logging are very similar in their purpose of helping to diagnose issues with an application and aid the debugging process. One way to think about the difference is that logging happens based on explicit events while monitoring is a passive background collection of data.

For example, when an error occurs, that event is explicitly logged through code in an exception handler. Meanwhile, a monitoring agent instruments the code and gathers data not only about the logged exception but also the performance of the functions.
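The difference can be sketched in a few lines. Below, the `logger.exception` call is an explicit event the developer wrote into an exception handler, while the `monitored` decorator passively records timing for every call whether it succeeds or fails. The decorator is a hand-written stand-in for a real monitoring agent, and `load_invoice` is a hypothetical function.

```python
import functools
import logging
import time

logger = logging.getLogger("app")

# Timings gathered passively for every call; a real monitoring agent would
# ship these to a service instead of keeping them in a dict.
timings = {}


def monitored(func):
    """Record how long each call takes, whether or not it fails."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            timings.setdefault(func.__name__, []).append(elapsed)
    return wrapper


@monitored
def load_invoice(invoice_id):
    try:
        if invoice_id < 0:
            raise ValueError(f"invalid invoice id {invoice_id}")
        return {"id": invoice_id, "total": 42}
    except ValueError:
        # Logging: an explicit event written by the developer in a handler.
        logger.exception("could not load invoice %s", invoice_id)
        return None


load_invoice(1)
load_invoice(-1)
print(len(timings["load_invoice"]))  # both calls were timed, only one was logged
```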

This distinction between logging and monitoring is vague and not necessarily the only way to look at it. Pragmatically, both are useful for maintaining a production web application.

Monitoring layers

There are several important resources to monitor at the operating system and network levels of a web stack.

  1. CPU utilization
  2. Memory utilization
  3. Persistent storage consumed versus free
  4. Network bandwidth and latency
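A few of these numbers are available from Python’s standard library alone. This sketch collects disk usage and, on Unix-like systems, the one-minute load average, which is only a rough proxy for CPU utilization; the choice of root path `/` is an assumption.

```python
import os
import shutil


def system_snapshot(path="/"):
    """Collect a few OS-level numbers using only the standard library."""
    disk = shutil.disk_usage(path)
    snapshot = {
        "cpu_count": os.cpu_count(),
        "disk_used_pct": 100.0 * disk.used / disk.total,
        "disk_free_bytes": disk.free,
    }
    # os.getloadavg() exists only on Unix-like systems.
    if hasattr(os, "getloadavg"):
        load_1m, _, _ = os.getloadavg()
        snapshot["load_1m"] = load_1m
    return snapshot


print(system_snapshot())
```

Memory and network counters are not exposed portably by the standard library; the third-party psutil package provides them through `psutil.virtual_memory()` and `psutil.net_io_counters()`. In production, an agent would sample these values on a schedule and send them to a time-series store rather than printing them.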

Application level monitoring encompasses several aspects. The amount of time and resources dedicated to each aspect will vary based on whether an application is read-heavy, write-heavy, or subject to rapid swings in traffic.

  1. Application warnings and errors (500-level HTTP errors)
  2. Application code performance
  3. Template rendering time
  4. Browser rendering time for the application
  5. Database querying performance
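Two of these aspects, 500-level errors and code performance, are often summarized as an error rate and response-time percentiles. The sketch below computes both from a hypothetical list of (status code, response time) records; a tail percentile such as p95 exposes slow outliers that an average hides.

```python
import statistics

# Hypothetical request records: (HTTP status code, response time in ms).
request_log = [
    (200, 120), (200, 95), (500, 30), (200, 310), (404, 15),
    (200, 101), (503, 2000), (200, 88), (200, 140), (200, 99),
]


def app_metrics(records):
    """Summarize the 500-level error rate and response-time percentiles."""
    durations = [ms for _, ms in records]
    errors = sum(1 for status, _ in records if 500 <= status <= 599)
    # With n=100, quantiles() returns the 1st through 99th percentile cut points.
    percentiles = statistics.quantiles(durations, n=100, method="inclusive")
    return {
        "error_rate": errors / len(records),
        "p50_ms": statistics.median(durations),
        "p95_ms": percentiles[94],
    }


print(app_metrics(request_log))
```

`statistics.quantiles` requires Python 3.8 or later. The inclusive method is used here because the default exclusive method can extrapolate beyond the largest observation on small samples like this one.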

Open source monitoring projects

  • Sentry started life as a Python-only monitoring project but can now be used for any programming language.
  • statsd is a node.js network daemon that listens for metrics and aggregates them for transfer into another service such as Graphite.
  • Graphite stores time-series data and displays them in graphs through a Django web application.
  • Bucky measures the performance of a web application from end users’ browsers and sends that data back to the server for collection.
  • Sensu is an open source monitoring framework written in Ruby but usable with web applications written in any programming language.
  • Graph Explorer by Vimeo is a Graphite-based dashboard with added features and a slick design.
  • PacketBeat sniffs protocol packets. Elasticsearch then allows developers to search the collected data and visualize what’s happening inside their web application using the Kibana user interface.
  • Munin is a client plugin-based monitoring system that sends monitoring traffic to the Munin node where the data can be analyzed and visualized. Note that this project is written in Perl, so Perl 5 must be installed on the node collecting the data.
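statsd listens for metrics sent as short text lines over UDP, in the form `name:value|type`, where `c` is a counter, `ms` a timer and `g` a gauge; its default port is 8125. The sketch below formats and sends such lines using only the standard library. So that it runs without statsd installed, a local UDP socket stands in for the daemon; the metric names are hypothetical.

```python
import socket


def statsd_line(name, value, metric_type):
    """Format one metric in the statsd text protocol, e.g. 'hits:1|c'."""
    if metric_type not in {"c", "ms", "g"}:
        raise ValueError(f"unsupported metric type: {metric_type}")
    return f"{name}:{value}|{metric_type}"


def send_metric(line, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; statsd listens on port 8125 by default."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


# Stand-in for the statsd daemon so the example runs on its own.
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as listener:
    listener.bind(("127.0.0.1", 0))  # let the OS pick a free port
    port = listener.getsockname()[1]
    send_metric(statsd_line("app.signup.completed", 1, "c"), port=port)
    send_metric(statsd_line("app.page.render", 87, "ms"), port=port)
    listener.settimeout(2)
    received = [listener.recv(1024).decode("ascii") for _ in range(2)]

print(received)
```

UDP is used so that a slow or missing metrics daemon never blocks the web application; lost packets only cost a data point. In practice a client library such as the `statsd` package on PyPI handles formatting and sampling for you, and statsd aggregates the values before forwarding them to a backend such as Graphite.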

Hosted monitoring services

Hosted monitoring software takes away the burden of deploying and operating the software yourself. However, hosted monitoring services cost money (often a significant amount) and take your application’s data out of your hands, so these services are not the right fit for every project.

  • Sentry is also available as a hosted service, which the project uses to generate revenue and fund further development of the open source tool.
  • New Relic provides application and database monitoring as well as plugins for capturing and analyzing data about other developer tools in your stack, such as Twilio.
  • Rollbar instruments both the server side and client side to capture and report exceptions. The pyrollbar code library provides quick integration for Python web applications. There are also specific instructions for common web frameworks such as Django and Pyramid.
  • Status.io focuses on uptime and response metrics transparency for web applications.
  • StatusPage.io (yes, there’s both a Status.io and a StatusPage.io) provides easy-to-set-up status pages for monitoring application uptime.
  • PagerDuty alerts a designated person or group if there are stability, performance, or uptime issues with an application.
  • Opbeat is built for Django. It combines performance metrics, release tracking, and error logging into a single, simple service.

Monitoring learning checklist

  1. Review the software-as-a-service and open source monitoring tools above. Third-party services tend to be easier to set up, and they host the data for you. Open source projects give you more control, but you’ll need additional servers ready for the monitoring.
  2. My recommendation is to install New Relic’s free option with the trial period to see how it works with your app. It’ll give you a good idea of the capabilities of application-level monitoring tools.
  3. As your app scales, take a look at setting up one of the open source monitoring projects, such as StatsD with Graphite. The combination of those two projects will give you fine-grained control over the system metrics you’re collecting and visualizing.

DevOps

DevOps is the combination of application development and operations, which minimizes or eliminates the disconnect between software developers who build applications and systems administrators who keep infrastructure running.

Why is DevOps important?

When the Agile methodology is properly used to develop software, a new bottleneck often appears during the frequent deployment and operations phases. New updates and fixes are produced so fast in each sprint that infrastructure teams can be overwhelmed with deployments and push back on the pace of delivery. To alleviate some of these issues, application developers are asked to work closely with operations folks to automate delivery from development to production.

DevOps tooling resources

  • DevOps: Python tools to get started is a presentation slideshow explaining that while DevOps is a culture, it can be supported by tools such as Fabric, Jenkins, BuildBot and Git, which, when used properly, can enable continuous software delivery.
  • A look at DevOps tools landscape provides an introductory overview of the tooling typically required to practice DevOps, ranging from source control systems and continuous integration to containers and orchestration. For an Atlassian-centric perspective, take a look at this post on how to choose the right DevOps tools. It is biased toward Atlassian’s own products but still has good insight, such as using automated testing to provide immediate awareness of defects that require fixing.

General DevOps resources

  • DevOps vs. Platform Engineering considers DevOps an ad hoc approach to developing software while building a platform is a strict contract. I see this as “DevOps is a process”, while a “platform is code”. Running code is better than any organizational process.
  • So, you’ve been paged provides their development team’s “Communicate -> Learn -> Act” structure for handling production issues based on lessons learned from their years of experience dealing with incidents.
  • Operations for software developers for beginners gives advice to developers who have never done operations work or been on call for outages. The advantage of DevOps is greater ownership for developers who built the applications running in production. The disadvantage, of course, is that greater ownership also brings much greater responsibility when something breaks!
  • Why are we racing to DevOps? is a very high level summary of the benefits of DevOps to IT organizations. It’s not specific to Python and doesn’t dive into the details, but it’s a decent start for figuring out why IT organizations consider DevOps the hot new topic after adopting an Agile development methodology.