HTTPArchive and Requests to Localhost

The HTTPArchive is a great tool for learning about what makes the web tick.  Every 2 weeks, the top 1M sites are tested on Chrome and on Android with WebPageTest, and the results are open source and easily searchable with Google BigQuery.  My role at AT&T on Video Optimizer has me studying how video is transported over mobile networks, so I have been digging into the Archive to learn about video streaming (post coming soon).  But as I was looking at video requests and responses from webpages, I saw a funny request that resulted in a failure.

It reminded me of this tweet from a few years ago:

[Screenshot of the tweet]



Yes, it was a request to localhost – in production. When building websites, developers often use a server on their local computer to test and develop. The IP address used to connect to that local server is 127.0.0.1, which can also be reached by typing ‘localhost’ in the browser.  The problem is, of course, that the URL “localhost” or “127.0.0.1” only works on the ONE computer local to the user – it cannot be shared.  So any request to 127.0.0.1 in production has the browser looking for a local webserver on the customer’s computer – one that will probably NOT have the information the developer was hoping to share.
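To make the idea concrete, here is a minimal Python sketch of how one might flag a URL that points at the local machine. The function name is_loopback_url is my own invention for illustration, and the check is an assumption about what counts as "local": the name localhost (or a subdomain of it) and any loopback IP literal.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_url(url: str) -> bool:
    """Return True if the URL's host is 'localhost' or a loopback IP address.

    Hypothetical helper for illustration; not part of any HTTPArchive tooling.
    """
    host = urlsplit(url).hostname  # lower-cased host without the port, or None
    if host is None:
        return False
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        # Covers 127.0.0.0/8 and the IPv6 loopback ::1.
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # The host is an ordinary domain name, not an IP literal.
        return False
```

Parsing the host first avoids false positives on URLs that merely mention 127.0.0.1 somewhere in their path or query string.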


So, I got sidetracked from studying video.  How many of these exist in the entire HTTPArchive?  Let’s build a SQL query to find out:


select
sum(if(url contains "http://127",1,0)) as http,
sum(if(url contains "https://127",1,0)) as https,
sum(if(status <200, 1,0)) as total
from (
select url, respSize, mimetype, status, pageid
from httparchive.runs.latest_requests
where (url contains "127.0.0.1")
order by status desc
)


In this query, I am searching every request in the latest run (the “httparchive.runs.latest_requests” data set) and filtering on URLs that contain “127.0.0.1”.  The outer query uses if statements: if the URL contains “http://127”, add 1 to the total, otherwise add 0.  This gives me a count of the URLs with “http://127” in them.  I do the same for https, and also calculate a total.
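For readers who do not use BigQuery, the same conditional-sum idea can be sketched in Python. The rows below are made-up sample data (not results from the Archive), and count_requests is a hypothetical name; the point is only to show how adding 1 or 0 per row produces the three counts.

```python
def count_requests(rows):
    """Mirror sum(if(cond, 1, 0)): add 1 for each matching row, otherwise 0."""
    http = sum(1 if "http://127" in row["url"] else 0 for row in rows)
    https = sum(1 if "https://127" in row["url"] else 0 for row in rows)
    total = sum(1 if row["status"] < 200 else 0 for row in rows)
    return {"http": http, "https": https, "total": total}


# Hypothetical sample rows, invented for this example.
sample_rows = [
    {"url": "http://127.0.0.1:8080/style.css", "status": 0},
    {"url": "https://127.0.0.1/script.js", "status": 0},
    {"url": "http://127.0.0.1/img/logo.png", "status": 0},
]

counts = count_requests(sample_rows)
print(counts)  # {'http': 2, 'https': 1, 'total': 3}
```

Note that “http://127” does not match an https URL, because the “s” sits between “http” and “://”, so the two counts do not overlap.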

But, Let’s Have Some Fun First

Rather than just announce the results – I thought a poll on Twitter might be a fun idea:


So I posted the poll, and then started looking at the URLs in the list.  I noticed that some use HTTP and others use HTTPS.  This prompted a second poll:

But Then, Things Got Even More Interesting

Looking at the raw list of urls – over 50% had the same format:

Hmm, why are so many connections to Spotify pointing to localhost?  Wouldn’t Spotify be aware of an issue this massive?  On each website with this error, I see the same failure occur multiple times, each with an incremented port number.

What the heck is going on? Well, it turns out that on some of these websites, there is a Spotify play widget:

[Screenshot of the Spotify play widget]

When you press play, it connects to the Spotify app on your computer (if you have it running), and the app shows the same data as the website and keeps playing the playlist when you move away from the homepage. It’s pretty cool, and the way these 2 services connect is through a local service run by the Spotify app that the browser connects to.  Of course, the computers powering WebPageTest do not have Spotify running on them – but wouldn’t you expect this widget to fail gracefully for those users?
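I don’t know how the widget is actually implemented, but the pattern described above (try a local port, move to the next one on failure, give up quietly if nothing answers) can be sketched roughly like this in Python. The function names and the port range are hypothetical, chosen only for the example.

```python
import socket
from typing import Callable, Iterable, Optional


def port_is_open(port: int, host: str = "127.0.0.1", timeout: float = 0.2) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or otherwise unreachable.
        return False


def find_local_helper(ports: Iterable[int],
                      probe: Callable[[int], bool] = port_is_open) -> Optional[int]:
    """Try each port in order; return the first that answers, or None."""
    for port in ports:
        if probe(port):
            return port
    return None  # nothing is listening: the caller should fall back quietly


# Hypothetical port range and a fake probe, so the example needs no network.
attempts = []


def fake_probe(port: int) -> bool:
    attempts.append(port)  # every failed port is one more logged request
    return False


found = find_local_helper(range(50000, 50005), probe=fake_probe)
print(found, len(attempts))  # None 5
```

Each port that fails is a separate request, which is consistent with one page producing several failed localhost requests in the data.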

It turns out the service does degrade gracefully on a regular computer.  In playing with WebPageTest, I found that the widget DOES fail gracefully when testing on Chrome in Dulles, but the error arises when testing on Chrome in EC2, so it likely has something to do with the configuration of the servers on Amazon. Since the HTTPArchive runs its tests in the cloud, we see these errors there, and they appeared in my query results.

OK, The Answer: How Many Requests to 127.0.0.1?

OK, including the connections to Spotify, we see 4,934 connections directed to http(s)://127.0.0.1.  Of those, 2,619 are connections to the Spotify link (it retries many different ports if it fails, adding to the request count), leaving 2,315 other connections to localhost.

So, the middle answer in my poll was right BOTH with and without Spotify. <whew!>  Imagine if I had messed up a Twitter poll!



Breaking these down by HTTP:HTTPS ratio:

The localhost calls to Spotify are 2,559:60 – about 98% HTTP.

The calls not pointing to the local Spotify server are almost exactly the opposite:

234 HTTP:2,081 HTTPS.  So, developers who point to localhost are doing it securely, using HTTPS connections 90% of the time.
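As a quick sanity check on the arithmetic, here are the counts from this post run through a few lines of Python. The variable and function names are mine; the numbers are the ones reported above.

```python
TOTAL = 4934                              # all requests to 127.0.0.1
SPOTIFY_HTTP, SPOTIFY_HTTPS = 2559, 60    # requests to the local Spotify service
OTHER_HTTP, OTHER_HTTPS = 234, 2081       # every other localhost request

spotify_total = SPOTIFY_HTTP + SPOTIFY_HTTPS   # 2619
other_total = OTHER_HTTP + OTHER_HTTPS         # 2315

# The two groups should account for every request in the total.
assert spotify_total + other_total == TOTAL


def share(part: int, whole: int) -> float:
    """Percentage of `whole` made up by `part`."""
    return 100 * part / whole


print(f"Spotify over HTTP: {share(SPOTIFY_HTTP, spotify_total):.1f}%")  # 97.7%
print(f"Others over HTTPS: {share(OTHER_HTTPS, other_total):.1f}%")     # 89.9%
```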

