Wednesday, September 10, 2014

Making decisions with limited data

It is challenging but possible to make decisions with limited data. For example, take the rollout saga of public key pinning.

The first implementation of public key pinning included enforcing pinning on addons.mozilla.org. In retrospect, this was a bad decision because it broke the Addons Panel and generated pinning warnings 86% of the time. As it turns out, the pinset was missing some Verisign certificates used by services.addons.mozilla.org, and the pinning enforcement on addons.mozilla.org included subdomains. Having more data lets us avoid bad decisions.

To enable safer rollouts, we implemented a test mode for pinning. In test mode, pinning violations are counted but not enforced. With sufficient telemetry, it is possible to measure how badly sites would break without actually breaking the site.

Due to privacy restrictions in telemetry, we do not collect per-organization pinning violations except for Mozilla sites that are operationally critical to Firefox. This means that it is not possible to distinguish pinning violations for Google domains from Twitter domains, for example. I do not believe that collecting the aggregated number of pinning violations for sites on the Alexa top 10 list constitutes a privacy violation, but I look forward to the day when technologies such as RAPPOR make it easier to collect actionable data in a privacy-preserving way.

Fortunately for us, Chrome has already implemented pinning on many high-traffic sites. This is fantastic news, because it means we can import Chrome’s pin list in test mode with relatively high assurance that the pin list won’t break Firefox, since it is already in production in Chrome.

Given sufficient test mode telemetry, we can decide whether to enforce pins instead of just counting violations. If the pinning violation rate is sufficiently low, it is probably safe to promote the pinned domain from test mode to production mode.
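As a minimal sketch of the kind of promote-or-hold decision described above (the function name and the 0.1% threshold are hypothetical, not Firefox's actual logic):

```python
# Sketch of the promote/hold decision from test-mode telemetry.
# The threshold and names are illustrative, not Firefox's real logic.

def should_promote(violation_count, total_evaluations, max_violation_rate=0.001):
    """Promote a pinned domain from test mode to production mode only
    if the observed pinning violation rate is below a small threshold."""
    if total_evaluations == 0:
        return False  # no data yet; keep counting in test mode
    rate = violation_count / total_evaluations
    return rate < max_violation_rate

# Example: 12 violations out of 100,000 pin evaluations is a 0.012% rate.
print(should_promote(12, 100_000))   # True under a 0.1% threshold
print(should_promote(500, 100_000))  # False: 0.5% is too risky to enforce
```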

Because the current implementation of pinning in Firefox relies on built-in static pinsets and we are unable to count violations per-pinset, it is important to track changes to the pinset file in the dashboard. Fortunately, HighStock supports event markers, which somewhat alleviates this problem, and David Keeler also contributed some tooltip code to roughly associate dates with Mercurial revisions. Armed with the timeseries of pinning violation rates, along with event markers for the dates when we promoted organizations to production mode (or when high-traffic organizations like Dropbox were added in test mode via a new import from Chromium), we can see whether pinning is working or not.

Telemetry is useful for forensics, but in our case, it is not useful for catching problems as they occur. This limitation is due to several difficulties, which I hope will be overcome by more generalized, comprehensive SSL error-reporting and HPKP:
  • Because pinsets are static and built-in, there is sometimes a 24-hour lag between making a change to a pinset and reaching the next Nightly build.
  • Telemetry information is only sent back once per day, so we are looking at a 2-day delay between making a change and receiving any data back at all.
  • Telemetry dashboards (as accessible from telemetry.js and telemetry.mozilla.org) need about a day to aggregate, which adds another day.
  • Update uptake rates are slow. The median time to update Nightly is around 3 days, getting to 80% takes 10 days or longer.
Due to these latency issues, pinning violation rates take at least a week to stabilize. Thankfully, telemetry is on by default in all pre-release channels as of Firefox 31, which gives us a lot more confidence that the pinning violation rates are representative.

Despite all the caveats and limitations, using these simple tools we were able to successfully roll out pinning to pretty much all of the sites we’ve attempted (including AMO, our unlucky canary) as of Firefox 34, and we look forward to expanding coverage.

Thanks for reading, and don’t forget to update your Nightly if you love Mozilla! :)

Tuesday, August 26, 2014

Firefox 32 supports Public Key Pinning

Public Key Pinning helps ensure that people are connecting to the sites they intend. Pinning allows site operators to specify which certificate authorities (CAs) issue valid certificates for them, rather than accepting any one of the hundreds of built-in root certificates that ship with Firefox. If any certificate in the verified certificate chain corresponds to one of the known good certificates, Firefox displays the lock icon as normal.
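The core check can be sketched as follows: a pinset is a set of SHA-256 hashes of Subject Public Key Infos (SPKIs), and the connection passes if any certificate in the verified chain hashes to a member of the set. This is a simplified illustration with fake SPKI bytes, not Firefox's actual implementation:

```python
import hashlib

def chain_matches_pinset(chain_spki_ders, pinset_sha256):
    """Return True if any certificate in the verified chain has a
    Subject Public Key Info whose SHA-256 hash appears in the pinset."""
    for spki in chain_spki_ders:
        if hashlib.sha256(spki).digest() in pinset_sha256:
            return True
    return False

# Toy example with placeholder SPKI bytes:
good_ca_spki = b"fake-ca-spki"
pinset = {hashlib.sha256(good_ca_spki).digest()}
print(chain_matches_pinset([b"leaf-spki", good_ca_spki], pinset))  # True
print(chain_matches_pinset([b"leaf-spki"], pinset))                # False
```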

Pinning helps protect users from man-in-the-middle attacks and rogue certificate authorities. When the root certificate for a pinned site does not match one of the known good CAs, Firefox rejects the connection with a pinning error. This type of error can also occur if a CA mis-issues a certificate.

Pinning errors can be transient. For example, if a person is signing into a WiFi network, they may see an error like the one below when visiting a pinned site. The error should disappear if the person reloads the page after the WiFi access is set up.

Firefox 32 and above support built-in pins, which means that the list of acceptable certificate authorities must be set at build time for each pinned domain. Pinning is enforced by default. Sites may advertise their support for pinning with the Public Key Pinning Extension for HTTP (HPKP), which we hope to implement soon. Pinned domains include addons.mozilla.org and Twitter in Firefox 32, and Google domains in Firefox 33, with more domains to come. That means that Firefox users can visit Mozilla, Twitter, and Google domains more safely. For the full list of pinned domains and rollout status, please see the Public Key Pinning wiki.
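Once HPKP lands, a site will be able to advertise its own pins with a response header shaped like the sketch below (the base64 values are placeholders; each pin-sha256 is the base64-encoded SHA-256 hash of a certificate's Subject Public Key Info, and the spec requires at least one backup pin):

```
Public-Key-Pins: pin-sha256="d6qzRu9zOECb90Uez27xWltNsj0e1Md7GkYYkVoZWmM=";
                 pin-sha256="E9CZ9INDbd+2eRQozYqqbQ2yXLVKB9+xcprMF+44U1g=";
                 max-age=5184000; includeSubDomains
```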

Thanks to Camilo Viecco for the initial implementation and David Keeler for many reviews!

Wednesday, July 23, 2014

Download files more safely with Firefox 31


Did you know that the estimated cost of malware is hundreds of billions of dollars per year? Even without data loss or identity theft, the time and annoyance spent dealing with infected machines is a significant cost.

Firefox 31 offers improved malware detection. Firefox has integrated Google’s Safe Browsing API for detecting phishing and malware sites since Firefox 2. In 2012 Google expanded their malware detection to include downloaded files and made it available to other browsers. I am happy to report that improved malware detection has landed in Firefox 31, and will have expanded coverage in Firefox 32.

In preliminary testing, this feature cuts the amount of undetected malware by half. That’s a significant user benefit.

What happens when you download malware? Firefox checks URLs associated with the download against a local Safe Browsing blocklist. If the binary is signed, Firefox checks the verified signature against a local allowlist of known good publishers. If no match is found, Firefox 32 and later query the Safe Browsing service with download metadata (NB: this happens only on Windows, because the signature verification APIs used to suppress remote lookups are only available on Windows). If malware is detected, the Download Manager blocks access to the downloaded file and removes it from disk, displaying an error in the Downloads Panel.
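The sequence above can be sketched as a simple decision function. All names here are hypothetical stand-ins for illustration, not real Firefox internals:

```python
# Sketch of the download-check sequence: local blocklist first, then the
# publisher allowlist for signed binaries, then the remote lookup.

def check_download(url_list, is_signed, signature, local_blocklist,
                   publisher_allowlist, remote_lookup):
    """Return 'blocked' or 'allowed' for a downloaded file."""
    # 1. Check every URL associated with the download against the
    #    local Safe Browsing blocklist.
    if any(url in local_blocklist for url in url_list):
        return "blocked"
    # 2. A verified signature from a known good publisher suppresses
    #    the remote lookup entirely.
    if is_signed and signature in publisher_allowlist:
        return "allowed"
    # 3. Otherwise, query the remote Safe Browsing service with
    #    download metadata (Windows-only in Firefox 32).
    return "blocked" if remote_lookup(url_list) else "allowed"

verdict = check_download(
    ["http://example.test/file.exe"], False, None,
    local_blocklist=set(), publisher_allowlist=set(),
    remote_lookup=lambda urls: False)
print(verdict)  # allowed
```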

How can I turn this feature off? This feature respects the existing Safe Browsing preference for malware detection, so if you’ve already turned that off, there’s nothing further to do. Below is a screenshot of the new, beautiful in-content preferences (Preferences > Security) with all Safe Browsing integration turned off. I strongly recommend against turning off malware detection, but if you decide to do so, keep in mind that phishing detection also relies on Safe Browsing.

Many thanks to Gian-Carlo Pascutto and Paolo Amadini for reviews, and the Google Safe Browsing team for helping keep Firefox users safe and secure!

Wednesday, October 23, 2013

Cookie counting

Understanding cookies through user studies

A cookie is a small piece of data stored in the browser by websites. Although cookies are mostly invisible, they serve many purposes such as saving items in shopping carts, authenticating to websites, and displaying targeted ads or other personalized content. Understanding more about how websites use cookies allows us to write tools that manage cookies effectively.

In June 2013, the Mozilla User Research team ran a paid study of 573 Firefox users that included data on cookie and browsing events. The user population was census-balanced and included only US users. The study ran for a median of 18.8 days, during which time we observed 18.4 million attempts to set cookies by examining HTTP Set-Cookie headers. Each Set-Cookie header counts as a single event, even though it may contain multiple cookies. Storing multiple pieces of information across separate cookies or combining them into a single cookie are equally powerful. Set-Cookie headers are not the only method for setting cookies, but they are sufficiently prevalent to be representative. We did not observe read events due to volume constraints. We observed 2.84 million pages loaded, measured by counting tab-ready events.
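The counting rule above (one event per Set-Cookie header, regardless of how many cookies the header carries) can be sketched like this; the response representation is a simplification for illustration:

```python
# Count Set-Cookie events across observed HTTP responses. Each
# Set-Cookie header is one event, even if it sets multiple cookies.

def count_set_cookie_events(responses):
    """responses: iterable of header lists, each a list of
    (header_name, header_value) pairs from one HTTP response."""
    events = 0
    for headers in responses:
        events += sum(1 for name, _ in headers if name.lower() == "set-cookie")
    return events

responses = [
    [("Content-Type", "text/html"), ("Set-Cookie", "sid=abc; Path=/")],
    [("Set-Cookie", "a=1"), ("Set-Cookie", "b=2; Secure")],
]
print(count_set_cookie_events(responses))  # 3
```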

N = 573   Tab-ready events   Set-Cookie events   Tab-ready events/day
Median    3552               12297               189
Total     2842270            12439439

Counting origins

Throughout this post we use top-level domains (from the Public Suffix list) plus one component to count origins. For example, we consider foo.example.com and bar.example.com to represent the same origin. The public suffix mechanism is not perfect, because a single organization may own many origins (e.g., doubleclick.net and google.com both belong to Google). In total, study users visited 40682 unique origins (counted by tab-ready events) and received set-cookie events from 32786 unique origins. Below is the distribution of cookie events per tab event.
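The "public suffix plus one component" canonicalization can be sketched as below. The real Public Suffix list has thousands of entries; this sketch hard-codes a tiny subset for illustration:

```python
# Reduce a hostname to its registrable domain: the longest matching
# public suffix plus one more label. Toy suffix list for illustration.

PUBLIC_SUFFIXES = {"com", "net", "org", "co.uk"}

def canonical_origin(host):
    """Return the public-suffix-plus-one origin for a hostname."""
    labels = host.lower().split(".")
    # Scanning from the left finds the longest matching suffix first.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES and i > 0:
            return ".".join(labels[i - 1:])
    return host

print(canonical_origin("foo.example.com"))  # example.com
print(canonical_origin("bar.example.com"))  # example.com
print(canonical_origin("a.b.co.uk"))        # b.co.uk
```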

Who uses cookies?

Building effective cookie management tools requires understanding who sets cookies. Cookie activity is difficult to characterize because sites vary widely in both the number of cookies they set and the amount of third-party content (which may set cookies on behalf of the third-party site) that they include. Although each page load event incurs on average around 3.6 Set-Cookie events, many sites incur an order of magnitude more.

The graph below shows the 20 origins responsible for the most set-cookie events. These origins represent 0.05% of unique cookie-setting origins and are responsible for 42.7% of set-cookie events seen in the study data. Set-cookie attempts are either first-party, where the origin of the cookie being set is the same as the one in the location bar, or third-party, where the origins don't match.

Who uses third-party cookies?

Third-party cookies have many purposes. For example, social widget implementations usually rely on third-party cookies to display personalized content, and inline ads rely on third-party cookies to provide targeted ads and perform frequency capping. Of the 12.4 million set-cookie events in the study, 50.4% are for third-party cookies (shown in red in the graph above).

The graph below shows the top 20 origins setting third-party cookies, responsible for 41.1% of third-party set-cookie events. adnxs.com belongs to AppNexus, an ad exchange. Facebook sets mostly first-party cookies, but because Facebook's social widgets are included on many sites, Facebook sets many third-party cookies (which may have originally been created in a first-party context). Of the top 20 origins, 18 primarily offer advertising services.

It is interesting to compare this data to Table IV from Eubank et al.'s survey on third-party cookies. In the Eubank survey, the authors used simulated data from crawling Alexa's top 500 websites, included all types of third-party embedded data, and did not canonicalize domains using the public suffix list. Even though the methodology is different, many origins in the top 20 overlap.

How many third-party cookies are from origins the user knows?

One interesting question is whether or not users intentionally accept cookies, especially in the case of third-party cookies. We examine two possible heuristics for estimating whether a user interaction with an origin is intentional:
  1. The user has already accepted cookies from the origin (pre-existing cookie condition)
  2. The user has visited the origin by entering it into the location bar (simulated history condition)
Both of these conditions rely on previous interactions. Any potential changes to the way browsers handle third-party cookies must consider what to do with previous interactions (in this case, existing cookies and location bar history).

Pre-existing cookie condition

We did not ask study participants to clear cookies before beginning the study. Of the third-party set-cookie events, 90.8% of them were sent to users who had already accepted cookies from that origin. The graph below shows this percentage for the top 20 origins that set third-party cookies. In this graph, nearly all origins are above 75% with the exception of doubleclick.net. This dip can be explained by a handful of users who have a particular security addon installed.

Simulated history condition

Another heuristic for evaluating if a user has interacted with a site is whether that origin has appeared in the location bar, as measured by tab-ready events. This lets us count third-party origins that have previously appeared in a first-party context.

For each user, we take the entire set of origins extracted from tab-ready events to simulate that user’s history, then count whether the origins in the Set-Cookie events appear in the simulated history. The graph below shows this percentage for the top 20 origins of third-party cookies.
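The per-user computation just described can be sketched as follows, with made-up data for illustration:

```python
# For each user, the set of first-party origins from tab-ready events
# is the simulated history; count how many third-party Set-Cookie
# events come from origins already in that history.

def simulated_history_rate(users):
    """users: iterable of (tab_origins, cookie_event_origins) pairs,
    where the second element lists one origin per Set-Cookie event."""
    in_history = total = 0
    for tab_origins, cookie_events in users:
        history = set(tab_origins)
        for origin in cookie_events:
            total += 1
            in_history += origin in history
    return in_history / total if total else 0.0

users = [
    ({"facebook.com", "news.test"}, ["facebook.com", "adnxs.com"]),
    ({"youtube.com"}, ["youtube.com"]),
]
print(simulated_history_rate(users))  # 2 of 3 events -> ~0.667
```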


Overall, 19.6% of third-party cookie events came from origins in users’ simulated history in the course of the study. Not surprisingly, nearly all users had visited facebook.com and youtube.com, which are currently ranked 2nd and 3rd most visited sites according to Alexa. Interestingly, adnxs.com also appeared much of the time in simulated histories, even though the rank of adnxs.com is currently 576 in the US according to Alexa. From looking through tab-ready events, adnxs.com appeared in redirects and popups.

How long do cookies live?

The Set-Cookie HTTP header has an optional expiration time that tells the browser how long to keep the cookie. As the graph below shows, many cookies are long-lived, possibly longer-lived than the installation of the operating system or browser. 20% of third-party cookie expiration times were one week or less, and 51% were longer than 6 months.
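The bucketing used above can be sketched like this; for simplicity the sketch takes precomputed lifetimes in days rather than parsing Expires/Max-Age attributes:

```python
# Split cookie lifetimes into the buckets used in this post:
# one week or less, in between, and longer than six months.

WEEK, SIX_MONTHS = 7, 182  # days

def lifetime_buckets(lifetimes_days):
    """Return (short, middle, long) counts for a list of lifetimes."""
    short = sum(1 for d in lifetimes_days if d <= WEEK)
    long_ = sum(1 for d in lifetimes_days if d > SIX_MONTHS)
    middle = len(lifetimes_days) - short - long_
    return short, middle, long_

# One session cookie (0 days), one 30-day cookie, two 10-year cookies:
print(lifetime_buckets([0, 30, 3650, 3650]))  # (1, 1, 2)
```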

What's next?

Data from real users is crucial to understanding how websites use cookies and therefore what kind of technical solutions to cookie management make sense (or if indeed we should be concentrating on cookies at all). We hope that this is just the start of using data to shape our technologies and policies. Please join dev-privacy to continue the discussion.

Many thanks to Gregg Lind for deploying the study and to Jonathan Mayer, Alex Fowler, John Jensen, and Chris Karlof for reviewing this post.

Wednesday, July 17, 2013

Be who you want, when you want

Back in May I presented a paper on contextual identity, coauthored with Sid Stamm, at the Web 2.0 Security and Privacy workshop. Contextual identity is the notion that people choose how to present themselves depending on context, such as their audience or location. In contrast, external forces (such as naming policies imposed by social networks) promote the idea of merging all your identities into one. Although this is often convenient or desired, conflating all your identities can lead to serious privacy violations.

The desire for spontaneous, positive human interaction often requires sharing personal information, and sharing information doesn't negate the need or desire for privacy. We still have far to go when it comes to understanding how typical users think about privacy, publicity and identity, though I am delighted that the Mozilla User Research team has recently made inroads into understanding user data types.

For your amusement, my talk slides are below. Be on the lookout for Snoop Lion and the popemobile!

Thursday, May 30, 2013

Blushproof

Personal embarrassment, Firefox, and you.

Have you ever been personally embarrassed because your friends or housemates found out something about your browsing history? Even sites that you visit by accident stick around, in the form of your browser history and other local storage. Adult sites are not the only kind of site with the potential to embarrass, either: it could be something as simple as not wanting your coworkers (or anyone else who has the opportunity to look over your shoulder) to know that you enjoy making ikebana arrangements to blow off steam, or are a huge fan of Hannah Montana.
This domain is now parked, btw

Avoiding personal embarrassment.

Life is full of surprises, some of which are terrifying rather than delightful. Fortunately Firefox has the tools to help, starting with Private Browsing Mode. Private Browsing Mode is intended to be a temporary mode that erases data that accrues while you're in it.
Firefox also has a feature to forget a single website, which will erase data associated with that site. Note that reaching "Forget About This Site" requires multiple steps (Go to the Firefox menu, select "History", then "Show all history", then navigate to the site in the Library window that you'd like to forget, then right-click, then select "Forget About This Site.")


What if you forget to forget?

Like any other feature, both Private Browsing Mode and "Forget About This Site" are easy to misuse. A person might forget to use Private Browsing Mode when visiting a potentially embarrassing site, forget to clear history, or not know these features exist in the first place!

Try out Blushproof!

Fortunately, there's a better solution, Blushproof! This is joint work with David Keeler, who is also responsible for implementing recent advances in Click-to-Play and HSTS. The source code is freely available on Github, and you can install it by visiting the Blushproof add-on page.



How it works

Blushproof helps with both preventing mistakes and recovering from mistakes that might cause you to blush! Blushproof comes with a blushlist of potentially embarrassing sites and search terms, and prompts you to enter Private Browsing Mode when visiting one of those sites.

A potentially embarrassing search term


It also lets you forget about sites more easily, and add them to your own personal blushlist.


For more information about how Blushproof works, please visit the wiki.

Many thanks to Gregg Lind for the name, initial prototype, and ideation, and to Zach Carter for the awesome logo. Please give it a whirl, and let us know if you find any issues!

Wednesday, March 6, 2013

Can't live with them, can't live without them

Passwords have been around for approximately forever, and despised for nearly that long. However, while great strides have been made in improving password-based authentication, these improvements are not a panacea, often come with maintenance costs of their own, and sometimes even serve as additional attack vectors. While we should keep striving to improve authentication, it is also important to recognize that passwords are not going away any time soon, to understand the drawbacks of existing password solutions, and to try to improve them.

Many of the best practices for passwords (prohibiting reuse, requiring unguessable passwords, being able to remember passwords) seem impossible without a password manager. Firefox has implemented a password manager since inception. The built-in password manager detects the presence of a login form and prompts the user to store the password via a notification.
We use data from the same Test Pilot study as in the last post, this time focusing on password statistics. Approximately 5.5% of users have disabled the password manager, which is enabled by default. However, are the remaining 94.5% of users actually using the password manager with intent?

To answer this question, let's first examine the number of users who have stored at least one password in the password manager (as obtained by querying nsILoginManager for all logins). The graph below shows the distribution of the number of passwords stored in the password manager for users who have no more than 30 passwords. This graph represents 96% of the nearly 12K users in the study.
The graph above shows that 73.4% of users store at least one password in the password manager, but it's not at all clear that this represents intentional use: after all, 13.9% of users store only a single password, and it is doubtful that a password manager is necessary or beneficial in the single-password case. We can also take a look at the distribution of the number of sites stored in the password manager, for users who have no more than 30 sites stored.


Interestingly, the site distribution has a slightly longer tail than the password distribution, so this graph represents only 89% of users. The shape of the graph is very similar, however, and lends credence to the hypothesis that much of the information stored in the password manager represents accidental use, if we believe that the password manager is not beneficial in the case of a single site.

Because this study did not collect how frequently the password manager triggered on login forms, we can't definitively conclude that users storing only one password represents accidental use. Alternative explanations, ranked in order of increasing plausibility according to my personal prejudice:
  1. I only use this browser for work and I don't care about my work password.
  2. I have a secure, memorizable password scheme but can't remember the requirements for this one site.
  3. I only have one main password but it doesn't meet this one site's requirements.
Does this data hint at anything interesting about password reuse? Let's examine mean number of passwords stored versus the number of sites.
This graph represents 97.5% of users and omits 43 outliers who have more than 100 passwords stored. The vertical error bars represent the standard deviation from the mean. This graph falls far south of x=y, the ideal case of storing one password per site. So we can conclude that even while using a password manager, people still reuse passwords across sites.
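The computation behind that graph can be sketched as below: for each user, count distinct sites and distinct passwords, then average the password counts per site count. The data here is made up for illustration; a user whose mean falls below the x=y line is reusing passwords:

```python
# Average distinct passwords per distinct-site count across users,
# the quantity plotted against x=y in the graph above.

from collections import defaultdict
from statistics import mean

def mean_passwords_by_site_count(users):
    """users: iterable of login lists, each a list of (site, password)."""
    by_sites = defaultdict(list)
    for logins in users:
        n_sites = len({site for site, _ in logins})
        n_passwords = len({pw for _, pw in logins})
        by_sites[n_sites].append(n_passwords)
    return {n: mean(counts) for n, counts in sorted(by_sites.items())}

users = [
    # Heavy reuse: three sites, one password.
    [("a.test", "hunter2"), ("b.test", "hunter2"), ("c.test", "hunter2")],
    # Partial reuse: three sites, two passwords.
    [("a.test", "p1"), ("b.test", "p2"), ("c.test", "p1")],
]
print(mean_passwords_by_site_count(users))  # {3: 1.5}, well below x=y
```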

This level of reuse is not necessarily due to user choice. For example, subdomains on the same intranet frequently require the same password, due to LDAP linkage. This in itself is not a security problem if the security guarantees are identical across subdomains. However, it is a problem when those intranets outsource services to outside vendors through LDAP, requiring password reuse at external parties. Note to future study authors: please include counts for effective TLDs in addition to domains in order to account for this case.

In summary, it seems that even though 94.5% of people have the password manager enabled, far fewer users gain any benefit from it. Over the years I have heard the following arguments against using password managers:
  • I only use one password so I don't need one.
  • They don't work across all my devices.
  • They don't work across all my browsers.
  • I don't trust local password managers against local attacks.
  • I don't trust cloud password managers because I don't trust third parties.
The first argument is especially egregious, considering the combined forces of account hijacking, phishing, and password database hacks. The second and third arguments can't be solved with a local password manager, or even a browser-specific password manager. The fourth argument can be addressed somewhat with a master password, but only 1 out of 12K users had a master password enabled (security.ask_for_password in about:config), so either that feature is undiscoverable, unusable, or regarded as too insecure to be effective. It is clear from the data that not enough people take advantage of password managers. I look forward to further progress from the identity team to solve some of these issues.

Many thanks to Paul Sawaya and Tanvi Vyas for advice on this post, and to Paul for writing the code to capture password manager statistics.