There has been much research showing that page performance (how fast a page loads) can have a significant effect on a company's bottom line. However, most of this research has focused on e-commerce sites. I have long wanted to see how page performance affects users on a news site, and Vox.com gave me the opportunity to do some research.
What I measured
To perform this study, I used mPulse, a product that lets you correlate page performance with user behavior. JavaScript on the page takes measurements on every page load and sends them back to the service. This kind of tool is an example of Real User Measurement, often referred to as RUM.
The particular user behavior I focused on is bounce rate: the percentage of users who leave the site after viewing only one page. We want users to view many pages on our site, so finding factors that reduce bounce rate is important. Research has shown that bounce rate is also influenced by other factors, particularly the route by which a user reaches the website. Traffic from social media, such as Facebook and Twitter, typically has a high bounce rate. This makes sense: a user who clicks a link in their social media stream typically views the article they clicked on and then returns to reading their stream. So in this research, I separate the traffic that can be identified as coming from Facebook and Twitter from what we term "organic" traffic. For this study, I define organic traffic as traffic that starts on the vox.com home page. Organic traffic is more likely to visit more than one page on the website.
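As a rough illustration of this classification, the Python sketch below assigns a session to a traffic source from the referrer and landing URL of its first page view, then computes bounce rate as the percentage of sessions with exactly one page view. The referrer host lists and the `classify_source` and `bounce_rate` helpers are hypothetical, not what mPulse or Vox actually used. Real social traffic can also arrive with no referrer at all (for example, from native apps), which this sketch would count as organic or other.

```python
from urllib.parse import urlparse

# Hypothetical referrer hosts for illustration; real social traffic may use
# other hosts or send no referrer at all.
FACEBOOK_HOSTS = {"facebook.com", "www.facebook.com", "m.facebook.com", "l.facebook.com"}
TWITTER_HOSTS = {"t.co", "twitter.com", "mobile.twitter.com"}


def classify_source(referrer: str, landing_url: str) -> str:
    """Assign a session to a traffic source based on its first page view."""
    host = urlparse(referrer).hostname or ""
    if host in FACEBOOK_HOSTS:
        return "facebook"
    if host in TWITTER_HOSTS:
        return "twitter"
    # Organic, as defined in this study: the session starts on the home page.
    if urlparse(landing_url).path in ("", "/"):
        return "organic"
    return "other"


def bounce_rate(sessions: list[list[str]]) -> float:
    """Percentage of sessions (lists of viewed URLs) with exactly one page view."""
    if not sessions:
        return 0.0
    bounced = sum(1 for pages in sessions if len(pages) == 1)
    return 100.0 * bounced / len(sessions)
```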
I set a cookie to track the original route by which a user reached the website: organic, Twitter, Facebook, or other. I also used the cookie to track whether this was the user's first visit to the site or a return visit. Because the cookie had not been in place for long, some users may have visited the site before I started keeping records, and some users may have cleared their cookies and thus appeared as new visitors. Even so, we can treat this as an approximate sample of the behavior of first-time visitors.
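The sketch below shows one way such a cookie scheme could work. It is written as server-side Python for brevity, although a browser-side script is the more likely place for it, and it is not the code used in this study. It assumes two hypothetical cookies: a session cookie (no expiry) that records the route and visitor status at the start of a session, and a persistent cookie that marks a browser as having visited before. As with the real cookie, clearing cookies makes a returning visitor look new.

```python
from http.cookies import SimpleCookie

# Hypothetical cookie names, chosen for illustration only.
SESSION_COOKIE = "vx_session"  # no Max-Age: the browser drops it when the session ends
SEEN_COOKIE = "vx_seen"        # persistent marker: this browser has visited before
ONE_YEAR = 365 * 24 * 60 * 60


def track_visit(cookie_header: str, source: str) -> tuple[str, bool, list[str]]:
    """Return (route, is_first_visit, Set-Cookie values) for one page view.

    `source` is the traffic source of this page view; it only matters on the
    first page view of a session.
    """
    jar = SimpleCookie()
    jar.load(cookie_header)
    if SESSION_COOKIE in jar:
        # Mid-session: reuse the route and visitor status recorded at session start.
        route, status = jar[SESSION_COOKIE].value.split(":", 1)
        return route, status == "new", []

    # First page view of a session: no persistent marker means a new visitor.
    status = "return" if SEEN_COOKIE in jar else "new"
    out = SimpleCookie()
    out[SESSION_COOKIE] = f"{source}:{status}"
    out[SEEN_COOKIE] = "1"
    out[SEEN_COOKIE]["max-age"] = ONE_YEAR
    return source, status == "new", [morsel.OutputString() for morsel in out.values()]
```

Storing the first-visit flag in the session cookie matters: otherwise the persistent marker, set on the first page view, would make the second page of a first visit look like a return visit.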
For organic traffic overall, the green line in the graph below shows bounce rate as a function of load time. Load time is defined as the point at which the browser fires the onload event. The chart was compiled using mPulse.
The blue area chart shows the number of sessions that experienced a given load time. The erratic data below one second has almost no bearing on our analysis because it is based on very few sessions.
The graph seems to show a clear correlation between load time and bounce rate: for each additional second of load time, the bounce rate increases by 1.5 percent. As I examined the data further, I began to wonder which way the correlation worked. Did a slower page load increase the bounce rate, or was the bounce rate influencing the calculation of load time? I learned from the mPulse team that its load-time measurements include all the data collected over a user session. This made me suspect a "cache effect" in the numbers. If a user visits multiple pages during a visit, the second and subsequent pages will likely load faster, because many of the resources needed to display any page are already in the browser's cache from the first page view. Because these resources do not need to be sent over the network, the onload time will be lower than if the visitor had come to that page directly. So a user who visits several pages will have a lower overall median load time than if several users each visited the same set of pages separately. A bounced session, by definition, contains only its first page view, so it tends to look slower than a multi-page session even when the pages themselves are equally fast.
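A toy example makes the cache effect concrete. The sketch below, using made-up page views rather than study data, compares two per-session summaries: the median load time over every page view in the session, and the load time of the first page view alone. The `(session_id, timestamp_ms, load_time_ms)` record layout is a hypothetical simplification of a RUM beacon.

```python
from statistics import median

# Hypothetical, simplified page-view record: (session_id, timestamp_ms, load_time_ms).
PageView = tuple[str, int, int]


def first_page_load_times(views: list[PageView]) -> dict[str, int]:
    """Load time of the earliest page view in each session."""
    first: dict[str, tuple[int, int]] = {}
    for session_id, ts, load_ms in views:
        if session_id not in first or ts < first[session_id][0]:
            first[session_id] = (ts, load_ms)
    return {sid: load_ms for sid, (_, load_ms) in first.items()}


def all_page_median(views: list[PageView]) -> dict[str, float]:
    """Median load time over every page view in each session, cached views included."""
    by_session: dict[str, list[int]] = {}
    for session_id, _, load_ms in views:
        by_session.setdefault(session_id, []).append(load_ms)
    return {sid: median(times) for sid, times in by_session.items()}


# Made-up example: both sessions start on a slow (6 s) page; session "a" then
# views two more pages that load quickly thanks to the browser cache.
views = [("a", 0, 6000), ("a", 1, 1500), ("a", 2, 1200), ("b", 0, 6000)]
print(first_page_load_times(views))  # {'a': 6000, 'b': 6000}
print(all_page_median(views))        # {'a': 1500, 'b': 6000}
```

Both toy sessions start on the same 6-second page, yet the all-pages median credits the multi-page session `a` with 1.5 seconds while the single-page (bounced) session `b` stays at 6 seconds. Summarized this way, bounced sessions look slower even though their first page was no slower.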
To gauge whether the cache effect influenced the numbers, I changed my measurement technique to include only the load time of the first page in each user's session. This chart shows bounce rate versus load time with this data.
The effect of load time on bounce rate is much less clear here: there is no consistent increase in bounce rate as load time grows. These results run counter to the prevailing wisdom of the performance community, so I wanted to make sure the data was correct before publishing it. In addition, a bounce rate close to 90 percent for organic traffic does not match the bounce rate we see in other tools. Luckily, mPulse can deliver its raw data to an Amazon S3 bucket, so I performed my own analysis of the data.
The raw data comes from an mPulse-enhanced version of the output of boomerang, an open-source performance measurement library. Boomerang is a JavaScript library that reports back hundreds of measurements about page performance and the content of the page. It can be set up to associate the data with meaningful metadata about the pages it measures. Through mPulse, I configured it to record what type of page (Home Page, Article, Video Post, etc.) was being visited and what the source of the page view was (organic, Facebook, Twitter, etc.).
I decided to use Elasticsearch to analyze the boomerang data. To get the data into Elasticsearch, I used Logstash, another tool in the Elastic stack. Logstash takes in data from a variety of sources, transforms it, and then pushes it into Elasticsearch or other tools. I used the S3 input plugin so that data placed into a specific S3 bucket served as the input for my analysis. In Logstash, I trimmed out the fields I did not need and determined the session ID for each data point, because the session ID could appear in either of two separate fields in the raw output. I ran Logstash on an AWS EC2 instance and sent the output to a Found Elasticsearch instance.
After loading the boomerang data into an index, I created a second, entity-centric index, a technique I learned about from Mark Harwood at Elasticon 2015. I built this second index using the scan and scroll technique, reading every page-load data point (almost 30 million) and building from them an index containing one document per session. This turned the large set of raw page-impression data into a much smaller data set of session summaries, each describing one user's visit to the website. A small Ruby program coordinated the processing of the data.
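The processing described above was done with Logstash and a Ruby program against Elasticsearch. As a simplified, in-memory illustration of the same idea, the Python sketch below trims each raw page-view record, takes the session ID from whichever of two fields holds it, and folds the page views into one summary document per session. The field names (`session_id_a`, `session_id_b`, `load_time_ms`, and so on) are hypothetical placeholders; the real boomerang/mPulse field names are not given here. Against a live cluster, the input loop would typically be a scroll over the page-view index (for example, `elasticsearch.helpers.scan` in the official Python client), and the resulting documents would be bulk-indexed into the session index.

```python
from collections import defaultdict
from typing import Iterable, Optional

# Hypothetical field names: the raw output carried the session ID in one of
# two fields, but the real names are not reproduced here.
SESSION_ID_FIELDS = ("session_id_a", "session_id_b")
# Fields kept after trimming (also hypothetical).
KEEP_FIELDS = ("timestamp", "url", "page_type", "source", "load_time_ms")


def normalize(record: dict) -> Optional[dict]:
    """Trim a raw page-view record and give it a single `session_id` field."""
    session_id = next((record[f] for f in SESSION_ID_FIELDS if record.get(f)), None)
    if session_id is None:
        return None  # the page view cannot be attributed to a session
    slim = {k: record[k] for k in KEEP_FIELDS if k in record}
    slim["session_id"] = session_id
    return slim


def build_sessions(records: Iterable[dict]) -> dict[str, dict]:
    """Fold page-view records into one summary (entity-centric) document per session."""
    views: dict[str, list[dict]] = defaultdict(list)
    for raw in records:  # against Elasticsearch: a scan/scroll over the page-view index
        rec = normalize(raw)
        if rec is not None:
            views[rec["session_id"]].append(rec)

    sessions = {}
    for sid, pages in views.items():
        pages.sort(key=lambda p: p["timestamp"])  # assumes every record has a timestamp
        first = pages[0]
        sessions[sid] = {
            "session_id": sid,
            "source": first.get("source"),
            "first_page_type": first.get("page_type"),
            "first_load_time_ms": first.get("load_time_ms"),
            "page_views": len(pages),
            "distinct_urls": len({p.get("url") for p in pages}),
        }
    return sessions
```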
With this second, smaller index, I was able to run aggregations to answer my question: does the load time of the first page a user sees affect the bounce rate? In the chart below, I compare the effect of first page load time on bounce rate for organic traffic overall with that for organic traffic visiting the site for the very first time. Note that the vertical axis shows only a narrow range of bounce rates, to highlight the differences.
For organic traffic overall (the red line), the first page load time has no clear effect on bounce rate. If we restrict the analysis to organic traffic on its very first visit to the site, we do see a small effect: bounce rate increases by about 0.2 percent for each extra second of first page load time.
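With one document per session, a comparison like this reduces to a bucketed average. The Python sketch below groups sessions into one-second buckets of first page load time and reports the bounce rate and session count for each bucket. It assumes each session document carries hypothetical `first_load_time_ms` and boolean `bounced` fields, and that the sessions have already been filtered (for example, to organic first visits). `ES_QUERY` shows a roughly equivalent Elasticsearch request using a `histogram` aggregation with an `avg` sub-aggregation; there, `bounced` is assumed to be stored as a numeric 0/1 field, so the average is the bounce fraction per bucket. All field names are assumptions.

```python
from collections import defaultdict
from typing import Iterable


def bounce_rate_by_load_time(
    sessions: Iterable[dict], bucket_s: float = 1.0
) -> dict[float, tuple[float, int]]:
    """Map each first-page load-time bucket (seconds) to (bounce rate %, session count)."""
    totals: dict[float, list[int]] = defaultdict(lambda: [0, 0])  # [bounced, total]
    for s in sessions:
        bucket = (s["first_load_time_ms"] / 1000.0) // bucket_s * bucket_s
        totals[bucket][1] += 1
        if s["bounced"]:
            totals[bucket][0] += 1
    return {b: (100.0 * bounced / n, n) for b, (bounced, n) in sorted(totals.items())}


# Roughly equivalent Elasticsearch request body (hypothetical field names;
# `bounced` assumed to be a numeric 0/1 field).
ES_QUERY = {
    "size": 0,
    "query": {"term": {"source": "organic"}},
    "aggs": {
        "by_first_load": {
            "histogram": {"field": "first_load_time_ms", "interval": 1000},
            "aggs": {"bounce_rate": {"avg": {"field": "bounced"}}},
        }
    },
}
```

Keeping the session count next to each bounce rate makes it easy to spot buckets that rest on too few sessions to be meaningful.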
There are several key takeaways from this chart.
- For organic traffic, there appears to be only a small correlation between bounce rate and page load time.
- The bounce rates in this chart are similar to what we see in other tools, which shows the importance of checking your numbers. I believe the high bounce rates that mPulse reported in the second chart above are due to limitations in the tool that made it very difficult to isolate the load time of the very first page in a session.
- There does appear to be a cache effect occurring in tools that use the page load time for all pages in the session. By focusing on the load time of the first page visited, I found that the correlation of bounce rate to page load time was much smaller than what mPulse reported.
- It is important to check your data and make sure you are measuring what you think you are measuring. For example, by examining the raw data, I saw several cases where the same page was loaded multiple times during a session. Mobile browsers commonly reload the page when you switch browser tabs or reopen the browser. Such a session should not count as a non-bounce session unless a distinct second URL is visited during the session. By doing my own analysis, I was able to rule out faulty conclusions and to distinguish browser behavior from user behavior.
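The rule in the last point, that a session is a bounce unless it visits at least two distinct URLs, could be sketched as follows. Treating URLs that differ only in query string, fragment, or trailing slash as the same page is an assumption made for illustration; whether that is right depends on how a site uses query parameters. The helper names are hypothetical.

```python
from urllib.parse import urlsplit


def canonical(url: str) -> str:
    """Reduce a URL to host + path (assumption: query and fragment don't change the page)."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return f"{parts.netloc.lower()}{path}"


def is_bounce(urls: list[str]) -> bool:
    """A session bounces unless it visits at least two distinct pages.

    Reloads of the same page (e.g. a mobile browser restoring a tab) do not count.
    """
    return len({canonical(u) for u in urls}) < 2


# A tab restore reloads the same article: still a bounce.
print(is_bounce(["https://www.vox.com/a", "https://www.vox.com/a#top"]))  # True
# Home page followed by an article: not a bounce.
print(is_bounce(["https://www.vox.com/", "https://www.vox.com/a"]))       # False
```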
The next chart shows the impact of first page load time on bounce rate for different traffic sources.
When we consider all forms of traffic to the site, first page load time does have a significant impact on bounce rate: roughly a 1.2 percent increase in bounce rate for each additional second of load time. For traffic from Facebook, the effect is smaller, although Facebook traffic has a higher overall bounce rate than any other source. Strangely, Twitter traffic appears to show a negative correlation between bounce rate and load time. The Twitter data set was smaller than those for the other sources, though it still encompassed over 170,000 sessions during the two weeks of measurement.
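The per-second figures quoted above are slopes of bounce rate against load time. The exact fitting method behind the charts is not stated, so the sketch below is only one reasonable choice, not the method used here: an ordinary least-squares slope weighted by the number of sessions in each load-time bucket, so that sparsely populated buckets (such as the sub-second data mentioned earlier) carry little weight. The slope is in the units of the bounce-rate axis per second, that is, percentage points per second.

```python
def weighted_slope(points: list[tuple[float, float, int]]) -> float:
    """Session-weighted least-squares slope of bounce rate vs. load time.

    Each point is (load_time_s, bounce_rate_pct, session_count) for one bucket.
    Returns percentage points of bounce rate per second of load time.
    """
    total = sum(w for _, _, w in points)
    if total == 0:
        raise ValueError("no sessions")
    x_mean = sum(w * x for x, _, w in points) / total
    y_mean = sum(w * y for _, y, w in points) / total
    cov = sum(w * (x - x_mean) * (y - y_mean) for x, y, w in points)
    var = sum(w * (x - x_mean) ** 2 for x, _, w in points)
    if var == 0:
        raise ValueError("need at least two distinct load-time buckets")
    return cov / var


# These illustrative points lie exactly on y = 10 + 2x, so the slope is 2
# (up to floating-point rounding) whatever the weights are.
slope = weighted_slope([(0, 10.0, 1), (1, 12.0, 1), (2, 14.0, 3)])
```

Running the same fit separately on the buckets for each traffic source gives one slope per source, which is the kind of comparison made in the paragraph above.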
Conclusions
So, does page performance matter for a news site? For organic traffic, once we filter out the cache effect, page performance has only a small effect on bounce rate. Across all traffic, however, we see the classic pattern of bounce rate worsening as page load times increase. Knowing this, we can better direct our efforts to optimize the website. Since this data was collected, Vox has made many significant improvements in load time.
It is key to know the context of your data, so that you are really measuring what you think you are measuring. Data can keep us honest about what we hope to accomplish through optimization, but it is not a substitute for sound judgement. This experiment looked at one aspect of user behavior, but there are many others to consider, such as the likelihood that a visitor will come back and how long the user spends on the page. We know from personal experience that slow pages are frustrating, so page performance will always be important at Vox Media.