One of the main culprits behind slow-loading pages is the presence of redirects. The site tells the browser that the content has moved, and a second request has to be made before the content can actually be downloaded.
A few years ago, I attempted to write a query to find the webpage in the HTTP Archive with the highest number of redirects before the first HTML response. I was never totally satisfied with the results, and I keep coming back to it from time to time to identify the biggest culprits.
I think I have it working now. The inner (bottom) query identifies the requestid of the “firsthtml” request for each page. I then join it to a query over all requests with the same pageid (meaning the same website), keeping only those with a requestid smaller than that of the firsthtml request. I further limit the results to non-HTTP 200 responses (~95% of these are 301 and 302 redirects).
select allreq.pageid, requestid, status
from [httparchive:summary_requests.2018_10_15_mobile] allreq
join (
  // gets the requestid of the firsthtml request for each pageid
  select pageid, requestid as htmlreq, firsthtml
  from [httparchive:summary_requests.2018_10_15_mobile]
  where firsthtml = true
) firsthtmlreq
on (allreq.pageid = firsthtmlreq.pageid)
// keep only non-200 requests made before the first HTML response
where allreq.requestid < htmlreq and allreq.status != 200
order by allreq.pageid desc, requestid asc
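For readers without BigQuery access, the same filtering logic can be sketched in plain Python. This is a minimal sketch, assuming each request row is a dict with `pageid`, `requestid`, `status` and `firsthtml` keys mirroring the summary_requests columns used above, and at most one firsthtml row per page; the function name `pre_html_non200` is hypothetical.

```python
from typing import Iterable


def pre_html_non200(rows: Iterable[dict]) -> list[tuple[int, int, int]]:
    """Return (pageid, requestid, status) for every non-200 request that was
    made before the first HTML response of the same page."""
    rows = list(rows)
    # Inner subquery: requestid of the firsthtml request for each page
    # (assumes at most one firsthtml row per page).
    first_html = {r["pageid"]: r["requestid"] for r in rows if r["firsthtml"]}
    # JOIN on pageid, then WHERE requestid < htmlreq AND status != 200.
    # Pages with no firsthtml row drop out, as they would in an inner join.
    hits = [
        (r["pageid"], r["requestid"], r["status"])
        for r in rows
        if r["pageid"] in first_html
        and r["requestid"] < first_html[r["pageid"]]
        and r["status"] != 200
    ]
    # ORDER BY pageid DESC, requestid ASC.
    hits.sort(key=lambda t: (-t[0], t[1]))
    return hits
```

Requests made after the first HTML response (a 404 for an image, say) are excluded by the `requestid` comparison, which is exactly what keeps the result focused on redirects in front of the page.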
Counting these requests per page, and joining to the page URL from the summary_pages table, gives the final query:
select redirects.pageid, count(redirects.pageid) cnt, pages.url
from (
  // every non-200 request made before the first HTML response
  select allreq.pageid as pageid, requestid, status
  from [httparchive:summary_requests.2018_10_15_mobile] allreq
  join (
    // gets the requestid of the firsthtml request for each pageid
    select pageid, requestid as htmlreq
    from [httparchive:summary_requests.2018_10_15_mobile]
    where firsthtml = true
  ) firsthtmlreq
  on (allreq.pageid = firsthtmlreq.pageid)
  where allreq.requestid < htmlreq and allreq.status != 200
) redirects
join (
  select url, pageid
  from [httparchive:summary_pages.2018_10_15_mobile]
) pages
on (pages.pageid = redirects.pageid)
// where redirects.pageid = 35053890
// one row per page: total pre-HTML redirects, highest first
group by redirects.pageid, pages.url
order by cnt desc
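The counting and URL join can be sketched the same way. This is a minimal sketch, assuming the per-request hits are (pageid, requestid, status) tuples and the summary_pages data is available as a dict from pageid to URL; `redirect_counts` and the example.com URLs in the usage note are hypothetical.

```python
from collections import Counter


def redirect_counts(hits, pages):
    """hits: iterable of (pageid, requestid, status) tuples for non-200
    requests made before the first HTML response.
    pages: dict mapping pageid -> page URL (the summary_pages join).
    Returns (pageid, count, url) tuples, most redirects first."""
    # COUNT per pageid, over every status code.
    counts = Counter(pageid for pageid, _requestid, _status in hits)
    # Inner join: pages with no URL entry are dropped, like the SQL JOIN.
    result = [(pid, n, pages[pid]) for pid, n in counts.items() if pid in pages]
    # ORDER BY cnt DESC; ties are broken by pageid so the order is stable.
    result.sort(key=lambda t: (-t[1], t[0]))
    return result
```

For example, with hits `[(3, 30, 301), (1, 10, 301), (1, 11, 302)]` and pages `{1: "https://a.example/", 3: "https://b.example/"}`, page 1 comes first with a count of 2.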
There are a few false positives, but we can quickly spot a number of interesting trends in the results.
And the Winner is:
The “winner” (it’s really hard to call a website with this many redirects a winner) has 13 redirects before successfully requesting HTML. It actually fails when re-run in WebPageTest, with the error “Too Many redirects”.
But what is the site actually doing? Let’s take a look in DevTools:
- https://www.site.com redirects to https://www.site.com/en-US, as the site is not originally in English.
- The www English site sees that I am testing on a mobile device, so it redirects me to http://m.site.com.
- Now here, they are doing something correctly: they are using “Upgrade-Insecure-Requests: 1” to enforce HTTPS on all pages. However, you’ll notice that the previous step redirected to an http:// site.
- http://m.site.com redirects to https://m.site.com, because of the directive to enforce HTTPS.
- Now https://m.site.com recognizes that the browser is set to English, so it redirects to https://m.site.com/en-US.
- We now enter a loop of sorts: the English site redirects back to http://m.site.com, and the http → https → /en-US cycle repeats several times before the page fully loads.
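The chain described in the list above can be replayed as a small simulation. This is a sketch, not the site’s real logic: the `RULES` table is a hypothetical reconstruction from the steps listed, and it ignores whatever state (cookies, for instance) eventually lets the real site break out of the loop. With purely rule-based redirects the cycle never ends on its own, so only the client’s redirect cap stops it; the default of 20 follows the Fetch standard’s redirect limit.

```python
# Hypothetical rule table reconstructed from the steps above.
RULES = {
    "https://www.site.com": "https://www.site.com/en-US",  # language redirect
    "https://www.site.com/en-US": "http://m.site.com",     # mobile redirect (to http!)
    "http://m.site.com": "https://m.site.com",             # HTTPS enforcement
    "https://m.site.com": "https://m.site.com/en-US",      # language redirect again
    "https://m.site.com/en-US": "http://m.site.com",       # back to http: the loop
}


def follow(url, rules, max_redirects=20):
    """Follow redirects starting at url.

    Returns (chain, outcome): chain lists every URL visited, and outcome is
    "loaded" when a URL with no redirect rule is reached, or
    "too many redirects" once max_redirects redirects have been followed
    and another one is still pending."""
    chain = [url]
    while url in rules:
        if len(chain) - 1 >= max_redirects:
            return chain, "too many redirects"
        url = rules[url]
        chain.append(url)
    return chain, "loaded"
```

`follow("https://www.site.com", RULES)` gives up after 20 redirects. Deleting the last rule, so that the English mobile page serves content instead of bouncing back to http, makes the same call finish in 4 redirects.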
On an emulated 3G connection, we see that the redirects add about 10.5 seconds to the time to first byte.
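Why do redirects cost so much on a slow network? Each redirect needs at least one round trip, and a redirect to a new host or scheme (www.site.com to http://m.site.com, then to https://m.site.com) can also require a DNS lookup, a TCP handshake and a TLS handshake. The back-of-envelope model below counts those round trips. Its costs are simplifying assumptions, not measurements: DNS and TCP at 1 RTT each, a TLS 1.2 handshake at 2 RTTs, connections reused per (scheme, host), and DNS results cached. It is not meant to reproduce the 10.5 second figure; the function names and the 300 ms RTT in the usage note are illustrative.

```python
from urllib.parse import urlsplit


def redirect_rtts(chain):
    """Estimate round trips spent on the redirecting requests in `chain`
    (every URL except the last, which is the page that finally loads)."""
    resolved, connected = set(), set()
    rtts = 0
    for url in chain[:-1]:
        parts = urlsplit(url)
        origin = (parts.scheme, parts.hostname)
        if parts.hostname not in resolved:
            rtts += 1                  # DNS lookup (assumed 1 RTT)
            resolved.add(parts.hostname)
        if origin not in connected:
            rtts += 1                  # TCP handshake
            if parts.scheme == "https":
                rtts += 2              # TLS 1.2 handshake (assumed 2 RTTs)
            connected.add(origin)
        rtts += 1                      # the request that returns the 3xx
    return rtts


def redirect_delay_ms(chain, rtt_ms):
    """Convert the round-trip estimate into milliseconds for a given RTT."""
    return redirect_rtts(chain) * rtt_ms
```

For a single pass through the chain above (www.site.com, /en-US, http://m.site.com, https://m.site.com, ending at https://m.site.com/en-US) the model counts 13 round trips, or 3.9 seconds at an assumed 300 ms RTT, and every extra loop through the cycle adds more.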
Trends in the Data
Logout Before Login
There are a lot of login pages on the internet. Many of the pages with a lot of redirects are sites that want to ensure you are completely logged out of every domain before logging you in. The waterfalls look like this:
The page cycles through all the domains you might be logged into and logs you out of each. In the above screenshot, we see requests to the portal, talent, and onboarding subdomains; each one logs you out, and only then can you start logging in. This appears to be the main method used by many e-commerce and email login systems (there are many of them in the results). Sometimes these requests are cached for subsequent visits, but many sites still have the same number of redirects on later visits.
Journals
All of the Elsevier Health Journals use the same initial load setup. As with the login pages above, the sites first check whether the user is logged in properly. Then several cookies are added serially, each with its own 302 redirect. As a result, all of these journals have 7 redirects on initial load:
Finally, one of the top offenders in the data is CVS.com. When testing with WebPageTest, I initially saw no redirects (sometimes the HTTP Archive catches a site in a funny state that no longer exists). But when I tested in my own browser (in the EU), the site did fail. It turns out that the redirect to http://www.cvs.com/international.html starts an infinite loop that quickly fails in Chrome:
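A redirect chain copied out of DevTools or a HAR file can be checked for this kind of loop by looking for the first URL that appears twice. This is a minimal sketch, assuming the chain is already available as an ordered list of URLs; `find_loop` and the example.com URLs in the usage note are hypothetical and are not the actual CVS chain.

```python
def find_loop(chain):
    """Return (start, end) indices of the first repeated URL, so that
    chain[start] == chain[end] and chain[start:end] is one full cycle.
    Returns None when no URL repeats."""
    first_seen = {}
    for i, url in enumerate(chain):
        if url in first_seen:
            return first_seen[url], i
        first_seen[url] = i
    return None
```

For `["https://example.com/", "https://example.com/intl", "http://example.com/", "https://example.com/"]` it returns `(0, 3)`, pointing at the three-step cycle; a chain with no repeated URL returns `None`.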
Conclusion
Using multiple redirects on page load increases the time to first byte, and can seriously affect the load time of your website (especially on mobile!). Optimizing login procedures (minimizing the number of logouts before logging in) can speed up the appearance of content on the page.
Finally, test every version of your website. If you redirect to an English or an international version, those visitors may not be your main target audience, but since you built the page, you should ensure that it loads without an endless chain of redirects.