Wednesday, October 23, 2013

Cookie counting

Understanding cookies through user studies

A cookie is a small piece of data stored in the browser by websites. Although cookies are mostly invisible, they serve many purposes such as saving items in shopping carts, authenticating to websites, and displaying targeted ads or other personalized content. Understanding more about how websites use cookies allows us to write tools that manage cookies effectively.

In June 2013, the Mozilla User Research team ran a paid study of 573 Firefox users that included data on cookie and browsing events. The user population was census-balanced and included only US users. The study ran for a median of 18.8 days, during which time we observed 12.4 million attempts to set cookies by examining HTTP Set-Cookie headers. Each Set-Cookie header counts as a single event, even though it may contain multiple cookies; a site can store the same information split across several cookies or combined into a single cookie, so counting headers rather than individual cookies loses little. Set-Cookie headers are not the only method for setting cookies, but they are sufficiently prevalent to be representative. We did not observe read events due to volume constraints. We observed 2.84 million pages loaded, measured by counting tab-ready events.

N = 573    Tab-ready events    Set-Cookie events    Tab-ready events/day
Median     3552                12297                189
Total      2842270             12439439
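The event counts above could be reproduced from raw response logs along these lines. This is only a sketch: the record format (a URL paired with a list of header name/value tuples) is an assumption for illustration, not the study's actual pipeline.

```python
# Sketch: tally Set-Cookie events from observed HTTP responses.
# Each Set-Cookie header line counts as one event, even if it
# carries multiple pieces of information (record format is hypothetical).
from collections import Counter

def count_set_cookie_events(responses):
    """responses: iterable of (url, headers) pairs, where headers is a
    list of (name, value) tuples as they appeared on the wire."""
    events = Counter()
    for url, headers in responses:
        for name, _value in headers:
            if name.lower() == "set-cookie":
                events[url] += 1  # one event per header line
    return events

responses = [
    ("http://example.com/", [("Content-Type", "text/html"),
                             ("Set-Cookie", "a=1"),
                             ("Set-Cookie", "b=2; Path=/")]),
]
print(sum(count_set_cookie_events(responses).values()))  # 2
```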

Counting origins

Throughout this post we use top-level domains (from the Public Suffix list) plus one component to count origins. For example, we consider foo.example.com and bar.example.com to represent the same origin. The public suffix mechanism is not perfect, because a single organization may own many origins (e.g., doubleclick.net and google.com both belong to Google). In total, study users visited 40682 unique origins (counted by tab-ready events) and received set-cookie events from 32786 unique origins. Below is the distribution of cookie events per tab event.
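The "public suffix plus one component" canonicalization can be sketched as follows. A real implementation would load the full Public Suffix list; this toy version hardcodes a few suffixes purely for illustration.

```python
# Sketch: canonicalize a hostname to "public suffix + one label".
# PUBLIC_SUFFIXES is a toy stand-in for the real Public Suffix list.
PUBLIC_SUFFIXES = {"com", "net", "org", "co.uk"}

def etld_plus_one(host):
    labels = host.lower().split(".")
    # Find a matching public suffix, then keep one more label.
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in PUBLIC_SUFFIXES and i > 0:
            return ".".join(labels[i - 1:])
    return host

print(etld_plus_one("foo.example.com"))  # example.com
print(etld_plus_one("bar.example.com"))  # example.com
print(etld_plus_one("example.co.uk"))    # example.co.uk
```

Note that foo.example.com and bar.example.com collapse to the same origin, as described above.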

Who uses cookies?

Building effective cookie management tools requires understanding who sets cookies. Cookie activity is difficult to characterize because sites vary widely in both the number of cookies they set and the amount of third-party content (which may set cookies on behalf of the third-party site) that they include. Although each page load event incurs on average around 4.4 Set-Cookie events, many sites incur an order of magnitude more.

The graph below shows the 20 origins responsible for the most set-cookie events. These origins represent 0.06% of unique cookie-setting origins and are responsible for 42.7% of set-cookie events seen in the study data. Set-cookie attempts are either first-party, where the origin of the cookie being set is the same as the one in the location bar, or third-party, where the origins don't match.
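The first-party/third-party classification amounts to comparing two canonicalized origins. A minimal sketch, using a toy last-two-labels stand-in for proper Public Suffix canonicalization:

```python
# Sketch: classify a set-cookie attempt as first- or third-party by
# comparing the cookie origin with the location-bar origin. The
# etld_plus_one here is a toy stand-in (last two labels only).
def etld_plus_one(host):
    return ".".join(host.lower().split(".")[-2:])

def party(cookie_host, location_bar_host):
    same = etld_plus_one(cookie_host) == etld_plus_one(location_bar_host)
    return "first-party" if same else "third-party"

print(party("www.cnn.com", "cnn.com"))         # first-party
print(party("ad.doubleclick.net", "cnn.com"))  # third-party
```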

Who uses third-party cookies?

Third-party cookies have many purposes. For example, social widget implementations usually rely on third-party cookies to display personalized content, and inline ads rely on third-party cookies to provide targeted ads and perform frequency capping. Of the 12.4 million set-cookie events in the study, 50.4% are for third-party cookies (shown in red in the graph above).

The graph below shows the top 20 origins setting third-party cookies, responsible for 41.1% of third-party set-cookie events. adnxs.com belongs to AppNexus, an ad exchange. Facebook sets mostly first-party cookies, but because Facebook's social widgets are included on many sites, Facebook sets many third-party cookies (which may have originally been created in a first-party context). Of the top 20 origins, 18 primarily offer advertising services.

It is interesting to compare this data to Table IV from Eubank et al.'s survey on third-party cookies. In the Eubank survey, the authors used simulated data from crawling Alexa's top 500 websites, included all types of third-party embedded data, and did not canonicalize domains using the public suffix list. Even though the methodology is different, many origins in the top 20 overlap.

How many third-party cookies are from origins the user knows?

One interesting question is whether or not users intentionally accept cookies, especially in the case of third-party cookies. We examine two possible heuristics for estimating whether a user interaction with an origin is intentional:
  1. The user has already accepted cookies from the origin (pre-existing cookie condition)
  2. The user has visited the origin by entering it into the location bar (simulated history condition)
Both of these conditions rely on previous interactions. Any potential changes to the way browsers handle third-party cookies must consider what to do with previous interactions (in this case, existing cookies and location bar history).

Pre-existing cookie condition

We did not ask study participants to clear cookies before beginning the study. Of the third-party set-cookie events, 90.8% were sent to users who had already accepted cookies from that origin. The graph below shows this percentage for the top 20 origins that set third-party cookies. In this graph, nearly all origins are above 75%, with the exception of doubleclick.net. This dip can be explained by a handful of users who have a particular security add-on installed.
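The pre-existing cookie condition can be sketched by replaying third-party set-cookie events in time order and checking whether the (user, origin) pair has been seen before. The tuple format is a hypothetical simplification of the study data.

```python
# Sketch: fraction of third-party set-cookie events that went to an
# origin the user had already accepted a cookie from. Input is a
# time-ordered list of (user, cookie_origin) tuples (hypothetical format).
def preexisting_fraction(events):
    seen = set()  # (user, origin) pairs already holding a cookie
    hits = total = 0
    for user, origin in events:
        total += 1
        if (user, origin) in seen:
            hits += 1
        seen.add((user, origin))
    return hits / total if total else 0.0

events = [("u1", "adnxs.com"), ("u1", "adnxs.com"),
          ("u1", "doubleclick.net"), ("u2", "adnxs.com")]
print(preexisting_fraction(events))  # 0.25
```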

Simulated history condition

Another heuristic for evaluating if a user has interacted with a site is whether that origin has appeared in the location bar, as measured by tab-ready events. This lets us count third-party origins that have previously appeared in a first-party context.

For each user, we take the entire set of origins extracted from tab-ready events to simulate that user’s history, then count whether the origins in the Set-Cookie events appear in the simulated history. The graph below shows this percentage for the top 20 origins of third-party cookies.
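The simulated-history computation described above can be sketched like this; the input shapes ((user, origin) tuples for both tab-ready and set-cookie events) are assumptions for illustration.

```python
# Sketch: build a simulated history per user from tab-ready origins,
# then count how many third-party set-cookie events name an origin
# already present in that user's history.
def history_fraction(tab_events, cookie_events):
    history = {}  # user -> set of origins seen in the location bar
    for user, origin in tab_events:
        history.setdefault(user, set()).add(origin)
    hits = sum(1 for user, origin in cookie_events
               if origin in history.get(user, set()))
    return hits / len(cookie_events) if cookie_events else 0.0

tabs = [("u1", "facebook.com"), ("u1", "cnn.com")]
cookies = [("u1", "facebook.com"), ("u1", "adnxs.com")]
print(history_fraction(tabs, cookies))  # 0.5
```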


Overall, 19.6% of third-party cookie events came from origins in users’ simulated history over the course of the study. Not surprisingly, nearly all users had visited facebook.com and youtube.com, which are currently ranked the 2nd and 3rd most visited sites according to Alexa. Interestingly, adnxs.com also appeared frequently in simulated histories, even though adnxs.com is currently ranked 576 in the US according to Alexa. A look through the tab-ready events suggests that adnxs.com entered users' histories through redirects and popups.

How long do cookies live?

The Set-Cookie HTTP header has an optional expiration time that tells the browser how long to keep the cookie. From the graph below, many cookies are long-lived, possibly longer-lived than the installation of the operating system or browser. 20% of third-party cookie expiration times were one week or less, and 51% of third-party cookie expiration times were longer than 6 months.
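The lifetime buckets above can be illustrated with a small parser. This sketch handles only the Max-Age attribute; real headers also use Expires (an absolute date), and a cookie with neither attribute is a session cookie.

```python
# Sketch: bucket cookie lifetimes from a Set-Cookie header value.
# Only Max-Age is handled here; Expires parsing is omitted.
def lifetime_bucket(set_cookie_value):
    max_age = None
    for part in set_cookie_value.split(";")[1:]:
        name, _, value = part.strip().partition("=")
        if name.lower() == "max-age" and value.lstrip("-").isdigit():
            max_age = int(value)
    if max_age is None:
        return "session"
    if max_age <= 7 * 24 * 3600:
        return "<= 1 week"
    if max_age > 182 * 24 * 3600:
        return "> 6 months"
    return "1 week - 6 months"

print(lifetime_bucket("id=1; Max-Age=3600"))      # <= 1 week
print(lifetime_bucket("id=1; Max-Age=63072000"))  # > 6 months
print(lifetime_bucket("id=1"))                    # session
```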

What's next?

Data from real users is crucial to understanding how websites use cookies and therefore what kind of technical solutions to cookie management make sense (or if indeed we should be concentrating on cookies at all). We hope that this is just the start of using data to shape our technologies and policies. Please join dev-privacy to continue the discussion.

Many thanks to Gregg Lind for deploying the study and to Jonathan Mayer, Alex Fowler, John Jensen, and Chris Karlof for reviewing this post.

Wednesday, July 17, 2013

Be who you want, when you want

Back in May I presented a paper on contextual identity, coauthored with Sid Stamm, at the Web 2.0 Security and Privacy workshop. Contextual identity is the notion that people choose how to present themselves depending on context, such as their audience or location. In contrast, external forces (such as naming policies imposed by social networks) push toward merging all of your identities into a single one. Although this is often convenient or desired, conflating all your identities can lead to serious privacy violations.

The desire for spontaneous, positive human interaction often requires sharing personal information, and sharing information doesn't negate the need or desire for privacy. We still have far to go when it comes to understanding how typical users think about privacy, publicity and identity, though I am delighted that the Mozilla User Research team has recently made inroads into understanding user data types.

For your amusement, my talk slides are below. Be on the lookout for Snoop Lion and the popemobile!

Thursday, May 30, 2013

Blushproof

Personal embarrassment, Firefox, and you.

Have you ever been personally embarrassed because your friends or housemates found out something about your browsing history? Even sites that you visit by accident stick around, in the form of your browser history and other local storage. Adult sites are not the only kind of site that have the potential to embarrass, either: it could be something as simple as not wanting your coworkers (or anyone else who has the opportunity to look over your shoulder) to know that you enjoy making ikebana arrangements to blow off steam, or are a huge fan of Hannah Montana.
This domain is now parked, btw

Avoiding personal embarrassment.

Life is full of surprises, some of which are terrifying rather than delightful. Fortunately Firefox has the tools to help, starting with Private Browsing Mode. Private Browsing Mode is intended to be a temporary mode that erases data that accrues while you're in it.
Firefox also has a feature to forget a single website, which will erase data associated with that site. Note that reaching "Forget About This Site" requires multiple steps (Go to the Firefox menu, select "History", then "Show all history", then navigate to the site in the Library window that you'd like to forget, then right-click, then select "Forget About This Site.")


What if you forget to forget?

Like any other feature, both Private Browsing Mode and "Forget About This Site" only help if you remember to use them. A person might forget to use Private Browsing Mode when visiting a potentially embarrassing site, forget to clear history, or not know these features exist in the first place!

Try out Blushproof!

Fortunately, there's a better solution: Blushproof! This is joint work with David Keeler, who is also responsible for implementing recent advances in Click-to-Play and HSTS. The source code is freely available on GitHub, and you can install it by visiting the Blushproof add-on page.



How it works

Blushproof helps both prevent mistakes and recover from mistakes that might cause you to blush! Blushproof comes with a blushlist of potentially embarrassing sites and search terms, and prompts you to enter Private Browsing Mode when visiting one of those sites.

A potentially embarrassing search term


It also lets you forget about sites more easily, and add them to your own personal blushlist.


For more information about how Blushproof works, please visit the wiki.

Many thanks to Gregg Lind for the name, initial prototype, and ideation, and to Zach Carter for the awesome logo. Please give it a whirl, and let us know if you find any issues!

Wednesday, March 6, 2013

Can't live with them, can't live without them

Passwords have been around for approximately forever, and despised for nearly that long. However, while great strides have been made in improving password-based authentication, these improvements are not a panacea, often come with maintenance costs of their own, and sometimes even serve as additional attack vectors. While we should keep striving to improve authentication, it is also important to recognize that passwords are not going away any time soon, to understand the drawbacks of existing password solutions, and to try to improve them.

Many of the best practices for passwords (prohibiting reuse, requiring unguessable passwords, being able to remember passwords) seem impossible without a password manager. Firefox has included a password manager since inception. The built-in password manager detects the presence of a login form and prompts the user to store the password via a notification.
We use data from the same Test Pilot study as in the last post, this time focusing on password statistics. Approximately 5.5% of users have disabled the password manager, which is enabled by default. However, are the remaining 94.5% of users actually using the password manager with intent?

To answer this question, let's first examine the number of users who have stored at least one password in the password manager (as obtained by querying nsILoginManager for all logins). The graph below shows the distribution of the number of passwords stored in the password manager for users who have no more than 30 passwords. This graph represents 96% of the nearly 12K users in the study.
The graph above shows 73.4% of users store at least one password in the password manager, but it's not at all clear that this represents intentional use: after all, 13.9% of users store only a single password, and it is doubtful that a password manager is necessary or beneficial in the single-password case. We can also take a look at the distribution of the number of sites stored in the password manager, for users who have no more than 30 sites stored.


Interestingly, the site distribution has a slightly longer tail than the password distribution, so this graph represents only 89% of users. The shape of the graph is very similar, however, and lends credence to the hypothesis that much of the information stored in the password manager represents accidental use, if we believe that the password manager is not beneficial in the case of a single site.

Because this study did not collect how frequently the password manager triggered on login forms, we can't definitively conclude that storing only one password represents accidental use. Alternative explanations, ranked in order of increasing plausibility according to my personal prejudice:
  1. I only use this browser for work and I don't care about my work password.
  2. I have a secure, memorizable password scheme but can't remember the requirements for this one site.
  3. I only have one main password but it doesn't meet this one site's requirements.
Does this data hint at anything interesting about password reuse? Let's examine the mean number of passwords stored versus the number of sites.
This graph represents 97.5% of users and omits 43 outliers who have more than 100 passwords stored. The vertical error bars represent the standard deviation from the mean. This graph falls far south of x=y, the ideal case of storing one password per site. So we can conclude that even while using a password manager, people still reuse passwords across sites.
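The passwords-versus-sites comparison can be sketched as follows, using hypothetical per-user password-manager dumps (each user a list of (site, password) pairs); the real study queried nsILoginManager rather than this toy format.

```python
# Sketch: mean number of distinct passwords as a function of the
# number of distinct sites stored, per user. Input format is hypothetical.
from collections import defaultdict
from statistics import mean

def passwords_vs_sites(users):
    by_site_count = defaultdict(list)
    for logins in users:
        sites = {site for site, _ in logins}
        passwords = {pw for _, pw in logins}
        by_site_count[len(sites)].append(len(passwords))
    return {n: mean(counts) for n, counts in sorted(by_site_count.items())}

users = [
    [("a.com", "pw1"), ("b.com", "pw1")],  # 2 sites, 1 password (reuse)
    [("a.com", "pw1"), ("b.com", "pw2")],  # 2 sites, 2 passwords
    [("a.com", "pw1")],                    # 1 site, 1 password
]
print(passwords_vs_sites(users))  # {1: 1, 2: 1.5}
```

A mean below the x=y line, as in the toy output, indicates password reuse across sites.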

This level of reuse is not necessarily due to user choice. For example, subdomains on the same intranet frequently require the same password, due to LDAP linkage. This in itself is not a security problem if the security guarantees are identical across subdomains. However, it is a problem when those intranets outsource services to outside vendors through LDAP, requiring password reuse at external parties. Note to future study authors: please include counts for effective TLDs in addition to domains in order to account for this case.

In summary, it seems that even though 94.5% of people have the password manager enabled, far fewer users gain any benefit from it. Over the years I have heard the following arguments against using password managers:
  • I only use one password so I don't need one.
  • They don't work across all my devices.
  • They don't work across all my browsers.
  • I don't trust local password managers against local attacks.
  • I don't trust cloud password managers because I don't trust third parties.
The first argument is especially egregious, considering the combined forces of account hijacking, phishing, and password database hacks. The second and third arguments can't be solved with a local password manager, or even a browser-specific password manager. The fourth argument can be addressed somewhat with a master password, but only 1 out of 12K users had a master password enabled (security.ask_for_password in about:config), so either that feature is undiscoverable, unusable, or regarded as too insecure to be effective. It is clear from the data that not enough people take advantage of password managers. I look forward to further progress from the identity team to solve some of these issues.

Many thanks to Paul Sawaya and Tanvi Vyas for advice on this post, and to Paul for writing the code to capture password manager statistics.


Thursday, February 21, 2013

Writing for the 98%

How often do users change their default preferences? According to one 2011 study of MS Word users, users rarely do. One could conclude from this that users never change their defaults, and so having user-mutable preferences at all is a waste of effort.

However, this view is reductive! To illustrate, compare the risk of having unintended MS Word settings with having unintended Facebook settings. Both applications will probably continue to function, but one is much more likely to surprise and dismay the user. Thus, users are more highly motivated to change social network settings than editor settings. In fact, according to Pew, 71% of people under 30 self-report changing their social network settings (and 55% of those over 50 do). Perhaps the lesson here isn't that users never change defaults, but rather that users change defaults when it's important enough for them to do so.

It's not always clear what the default for any given preference or feature should be. When the vast majority of users clearly express a preference for one default over another, then the question is easy. When the split is less one-sided, then choosing a default preference can be fraught. This is particularly true when a substantial minority of users are operating under a drastically different risk model. It is hard for one default set of security and privacy preferences to meet the needs of a heterogeneous user base where one person is concerned about invasive network attacks, and one person is concerned about shoulder-surfing.

Which security and privacy preferences are compelling enough for Firefox users to change? Back in December Ilana Segall from the Test Pilot team ran a study measuring those preferences that are exposed in about:config and also in the UI. The population for this study is approximately 12000 users who are on the Aurora and Beta channels, and a relatively small number of users who have opted in by installing Test Pilot on the release channel. In other words, this population is not representative of the Firefox user base, since nearly all Firefox users are on the release channel. In fact, the Test Pilot population is probably much more likely to change settings in general, since they are passionate enough to test Firefox pre-release.


Data collection started around 12/18/2012 and lasted approximately one week, covering nearly 12000 users. Here's a series of snapshots of the current preferences UI with no changes to defaults. The preference as named in about:config is listed on the left, along with the percent of users who changed it.
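The percent-changed figures could be computed along these lines. This is a sketch: the per-user preference snapshots and the defaults table are hypothetical inputs, not the actual Test Pilot data format.

```python
# Sketch: for each preference, the percent of users whose stored value
# differs from the default. A missing key means the default is in effect.
def percent_changed(snapshots, defaults):
    changed = {pref: 0 for pref in defaults}
    for prefs in snapshots:
        for pref, default in defaults.items():
            if prefs.get(pref, default) != default:
                changed[pref] += 1
    n = len(snapshots)
    return {pref: 100.0 * count / n for pref, count in changed.items()}

defaults = {"privacy.donottrackheader.enabled": False}
snapshots = [{"privacy.donottrackheader.enabled": True}, {}, {}, {}]
print(percent_changed(snapshots, defaults))
# {'privacy.donottrackheader.enabled': 25.0}
```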


Obviously, no one touches the default security settings very much. There are at least three plausible, possibly-overlapping interpretations: Firefox predicted the most useful default settings correctly, Firefox is doing a poor job converting user actions into saved preferences, or the population who cares about browser security preferences is really that small. The most frequently changed preference on this tab, signon.rememberSignons, controls whether or not Firefox prompts the user to remember passwords. The question of when or why people use the password manager is complex and I'll save it for a later discussion. It is also interesting that fewer people disable Google SafeBrowsing checks for malware than for phishing (browser.safebrowsing.malware.enabled and browser.safebrowsing.enabled, respectively). Presumably these are disabled for privacy or performance reasons. Are users who disable one and not the other making a mistake, or do these users consider themselves phish-proof but not drive-by-download-proof? If it is a mistake, why do we allow users to construct a set of preferences that are internally inconsistent in reasoning?

The privacy preferences tab is more complicated than the security preferences tab.
A whopping 11.3% of users enabled Do Not Track (privacy.donottrackheader.enabled). This is an astonishingly high number of users to enable an HTTP header that broadcasts user intent but is unable to enforce anything client-side. It is a testament to DNT advocates that adoption is this high, but even though this preference is changed by a large minority of users, it does not follow that Firefox should enable it by default.

A more modest number of users (1.2%) change the autocomplete settings of the location bar (browser.urlbar.autocomplete.enabled) to use a smaller subset of "History and Bookmarks", 1.5% of users have completely disabled browser history (places.history.enabled), and 2.75% of users use custom history settings (privacy.sanitize.sanitizeOnShutdown). The custom history settings are even more complicated:


An astute reader will have noticed that there are already two ways to autostart in private browsing mode: using "Never remember history", and checking "Always use private browsing mode" in the screenshot above, which automatically disables browsing, download, search, and form history. Approximately 5% of users always use private browsing mode (browser.privatebrowsing.autostart). 0.83% of users have modified their cookie settings to be something other than "Accept all cookies," including 7 hardcore users who don't accept cookies at all. One intriguing explanation is that more users are concerned with local attacks (shoulder-surfing or accidental disclosure on shared devices, by people that they know) than remote attacks.

It is troubling that there are two sets of parallel preferences, privacy.cpd (short for clear private data) and privacy.clearOnShutdown, that control custom history settings. Both appear to be used in the same way in the code, but Firefox allows users to enter an internally inconsistent state by maintaining multiple sets of preferences for the same functionality.

Finally let's take a look at the "Encryption" settings under the Advanced preferences tab.
Unsurprisingly, users touch this panel least of all. 0.02% have disabled SSL 3.0 (security.enable_ssl3), 1.6% have disabled TLS 1.0 (security.enable_tls), and 1.0% of users have opted to automatically select a personal certificate (security.default_personal_cert). Is it really worth having a preference panel that benefits fewer than 2% of users overall?

Similarly with the Online Certificate Status Protocol preferences:

1.75% of users have disabled OCSP (security.OCSP.enabled), and 0.03% of users require it (security.OCSP.require). Without knowing anything about OCSP other than its name and a few recent stories about certificate authorities, a reasonable person might conclude that requiring OCSP by default is a good thing. I am guessing that requiring it by default breaks in some cases. However, with so few people requiring OCSP, no one is likely to gather enough implementation experience to require OCSP by default for everyone.

For fun I also measured browser.search.defaultenginename, the preference that controls which search engine to use when the string in the location bar is not a URL. This preference probably has the most impact on privacy besides the preferences exposed in the privacy panel. Interestingly, it is modified 43.7% of the time (from chrome://browser-region/locale/region.properties, which is how complex preferences get set). The topmost occurring replacements include Ask, Babylon, and AVG, all of which produce interesting search results when combined with "firefox search" on various search providers. One might question whether actual user intent underlies changing such an inaccessible preference so frequently, and if not, what to do about it. That, too, is a topic that deserves further discussion.

I'll close with a few open questions for the reader:
  • Is there a better way to choose default preferences when a large minority of users expresses a different opinion (e.g., DNT)?
  • Is it worth the engineering effort, UX effort, and screen real estate to make user-visible (to say nothing of discoverable) preferences if fewer than 2% of users benefit?
  • How can we use this data to make Firefox better?
Thanks for reading!