The HTTPArchive is a great tool for learning about what makes the web tick. Every 2 weeks, the top 1M sites are tested on Chrome and on Android with WebPageTest, and the results are open source and easily searchable with Google BigQuery. My role at AT&T on Video Optimizer has me studying how video is transported over mobile networks, so I have been digging into the Archive to learn about video streaming (post coming soon). But as I was looking at video requests and responses from webpages, I saw a funny request that resulted in a failure.
It reminded me of this tweet from a few years ago:
Yes, it was a request to localhost – in production. When building websites, developers often run a server on their own computer to test and develop. The IP address used to connect to that local server is 127.0.0.1, which can also be reached by typing ‘localhost’ in the browser. The problem is, of course, that the URL “localhost” or “127.0.0.1” only works on the ONE computer where that server is running – it cannot be shared. So any request to 127.0.0.1 in production has the browser looking for a local webserver on the customer’s computer – one that will probably NOT have the information the developer was hoping to share.
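To make the "local only" point concrete, here is a minimal Python sketch that checks whether a URL points back at the machine making the request. The function name `points_to_local_machine` is my own, and the example URLs are made up for illustration.

```python
import ipaddress
from urllib.parse import urlsplit


def points_to_local_machine(url: str) -> bool:
    """Return True if the URL's host is 'localhost' or a loopback IP address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        # 127.0.0.0/8 (and ::1 for IPv6) always refers to the local machine.
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, so it is an ordinary host name.
        return False


# Illustrative URLs only, not taken from the HTTPArchive.
print(points_to_local_machine("http://127.0.0.1:8000/app.js"))  # True
print(points_to_local_machine("https://localhost/api"))         # True
print(points_to_local_machine("https://example.com/"))          # False
```

A check like this in a build or deploy step is one way such URLs could be caught before they reach production.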
So, I got sidetracked from studying video. How many of these exist in the entire HTTPArchive? Let’s build a SQL query to find out:
select
  sum(if(url contains "http://127", 1, 0)) as http,
  sum(if(url contains "https://127", 1, 0)) as https,
  sum(if(status < 200, 1, 0)) as total
from (
  select url, respSize, mimetype, status, pageid
  from httparchive.runs.latest_requests
  where url contains "127.0.0.1"
  order by status desc
)
In this query, the inner select searches every request in the latest run (the “httparchive.runs.latest_requests” data set) and keeps only the URLs containing “127.0.0.1”. The outer select contains if statements: if the URL contains “http://127”, add 1 to the sum, otherwise add 0. This gives me a count of the URLs with “http://127” in them. I do the same for https, and also calculate a total number (the rows with a status code below 200).
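If the sum(if(...)) pattern is unfamiliar, the same logic can be sketched in a few lines of Python. The rows below are hypothetical placeholders, not real HTTPArchive data, and the status values are only there to exercise the status < 200 condition.

```python
# Hypothetical sample rows of (url, status); not real HTTPArchive data.
SAMPLE_REQUESTS = [
    ("http://127.0.0.1:8080/script.js", 0),
    ("https://127.0.0.1/track", 0),
    ("http://127.0.0.1/ok.css", 200),
    ("https://example.com/index.html", 200),
]


def count_localhost_requests(requests):
    # Inner select: keep only rows whose URL contains "127.0.0.1".
    matches = [(url, status) for url, status in requests if "127.0.0.1" in url]
    # Outer select: each sum(if(cond, 1, 0)) is just a conditional count.
    return {
        "http": sum(1 for url, _ in matches if "http://127" in url),
        "https": sum(1 for url, _ in matches if "https://127" in url),
        "total": sum(1 for _, status in matches if status < 200),
    }


print(count_localhost_requests(SAMPLE_REQUESTS))
# {'http': 2, 'https': 1, 'total': 2}
```

Note that a plain substring match would also catch URLs that merely mention 127.0.0.1 somewhere in the path or query string, so the counts are best read as an upper bound.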
But Let’s Have Some Fun First
Rather than just announce the results – I thought a poll on Twitter might be a fun idea:
HTTP or HTTPS
So I posted the poll, and then I started looking at the URLs in the list. I noticed that some use HTTP and others use HTTPS. This prompted a second poll:
But Then, Things Got Even More Interesting
Looking at the raw list of URLs, over 50% had the same format:
Hmm, why are so many connections to Spotify pointing to localhost? Wouldn’t Spotify be aware of an issue this massive? On each website with this error, I see the same failure occur multiple times, each with an incremented port number.
What the heck is going on? Well, it turns out that on websites like www.premierleague.com/home, there is a Spotify play widget:
When you press play, it connects to the Spotify app on your computer (if you have it running), and the app shows the same data as the website and keeps playing the playlist when you move away from the homepage. It’s pretty cool, and the way that these 2 services connect is through a local service run by the Spotify app that the browser connects to. Of course, the computers powering WebPageTest do not have Spotify running on them – but wouldn’t you expect this widget to fail gracefully for those users?
It turns out, the service does degrade gracefully on a regular computer. In playing with WebPageTest, the widget DOES fail gracefully when testing on Chrome in Dulles, but the error arises when we test on Chrome in EC2, so it likely has something to do with the configuration of the servers on Amazon. Since the HTTPArchive runs its tests in the cloud, we see these errors there, and they appeared in my data query.
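As a rough illustration of "try a series of local ports, and fall back if nothing answers", here is a small Python sketch. It is not Spotify's actual implementation: the function names, the timeout, and the fallback message are all assumptions of mine.

```python
import socket


def find_local_service(ports, timeout=0.2):
    """Return the first port on 127.0.0.1 that accepts a TCP connection, else None."""
    for port in ports:
        try:
            with socket.create_connection(("127.0.0.1", port), timeout=timeout):
                return port
        except OSError:
            # Nothing is listening here (refused or timed out): try the next port.
            continue
    return None


def play(ports):
    """Hypothetical widget logic: use the local app if present, else degrade gracefully."""
    port = find_local_service(ports)
    if port is None:
        return "fallback: no local app found, play in the browser instead"
    return f"handing off to the local app on port {port}"
```

Each failed attempt in a loop like this is a separate request to 127.0.0.1, which would explain why one page can contribute several failures with different port numbers.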
OK, The Answer: How Many Requests to 127.0.0.1?
OK, including the connections to Spotify, we see 4,934 connections directed to http(s)://127.0.0.1. Of these, 2,619 are connections to the Spotify link (it retries many different ports if it fails, adding to the request count), leaving 2,315 other connections to localhost.
So, the middle answer in my poll was right BOTH with and without Spotify. <whew!> Imagine if I had messed up a Twitter poll!
Breaking these down by HTTP:HTTPS ratio:
The localhost calls to Spotify are 2559 HTTP:60 HTTPS – nearly 98% HTTP.
The calls not pointing to the local Spotify server are almost exactly the opposite:
234 HTTP:2081 HTTPS. So, developers who point to localhost are doing it securely, using HTTPS connections 90% of the time.
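For anyone who wants to check the arithmetic, the counts reported above can be plugged into a few lines of Python. The variable names are mine; the numbers are the ones from this post.

```python
# Counts reported in this post.
total = 4934
spotify_http, spotify_https = 2559, 60
other_http, other_https = 234, 2081

spotify = spotify_http + spotify_https  # 2619
other = other_http + other_https        # 2315
assert spotify + other == total


def share(part, whole):
    """Percentage of `whole` made up by `part`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(share(spotify, total))         # 53.1 (Spotify share of all localhost requests)
print(share(spotify_http, spotify))  # 97.7 (HTTP share of the Spotify requests)
print(share(other_https, other))     # 89.9 (HTTPS share of the non-Spotify requests)
```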