When it comes to websites, we have ever more sophisticated techniques at our disposal to block the ads that sometimes track our wanderings around the internet. But most of us spend much of our time these days in mobile apps that offer no transparency on how we’re being tracked or sold–nor tools for blocking that behavior.
We must rely on operating system makers–primarily Apple and Google–to promulgate guidelines to developers on legitimate practices when it comes to tracking behavior, asking for personal information, and transferring data to remote servers. OS makers are also responsible for enforcing those requirements. The rules in place are very broad, and except for abuses that can be quickly checked by in-house reviewers, come into play most often when users and researchers report violations.
Apple’s rules, for instance, require apps to obtain a user’s permission before transmitting personal data and to describe how and where the data will be used. Apple doesn’t police these rules by intercepting network communications or demanding to audit remote databases. (The company declined an interview for this story.)
“When applications ask for permissions, that is not really done in a manageable way,” says Franziska Roesner, an assistant professor in computer science and engineering at the University of Washington, who researches computer security and privacy. “iOS doesn’t know necessarily whether it’s reasonable for an application to use your location, and that’s why they ask the user,” she says.
Apple has to rely on a developer’s disclosure as to what’s being done with that location data. Some of Roesner’s work tries to match up an app’s purpose and interface elements with the kind of permission being asked, to make sure a request isn’t misused.
Many developers embed functionality in the form of third-party analytics packages and ad-technology code, which may associate seemingly innocuous user details with information collected from other sources. Thus, even if the data sent from an app seems benign in isolation, it might uniquely identify a user or be used for purposes that the developer is unaware of. Developers typically haven’t audited this code and couldn’t tell you in detail what it does.
A recent case study was the app Meitu, made by a firm of the same name, which applies anime-like styling to people’s facial photos. The free app was available in China for years, but an English-language version went viral. When security researchers examined the software’s innards, they found that it was laden with analytics and ad packages, only some of which were linked to working code, and that it asked for extensive permissions in Android and iOS.
Meitu told me at the time of the kerfuffle that it included certain geolocation and app-checking code to comply with advertising network requirements in China, where jailbroken devices can be used to defraud advertisers, and advertisers may demand that their messages be geofenced to appear only in certain regions. Apple confirmed that the app was and remains in compliance.
In the U.S., the Federal Trade Commission can’t intervene on behalf of consumers unless there’s a suspicion that a company has either broken a law regarding information privacy, including COPPA (the Children’s Online Privacy Protection Rule), or that a company has made a representation about what it does and lied. The FTC’s site has a page on its lawsuits and results on data privacy, including ones related to apps.
So what’s a user to do? Academics are on it. Two complementary efforts, which are moving toward closer cooperation, aim to give people with mobile devices more control: monitoring app connections, helping to expose bad actors and poorly secured transfers of private data, and allowing private information to be scrubbed or blocked altogether before it is sent.
LISTENING IN ON YOUR BEHALF
A team led by Northeastern University’s Dave Choffnes, an assistant professor in its College of Computer and Information Science, developed ReCon, a sort of virtual private network (VPN) for personally identifiable data (PII in the field’s jargon). Unlike a regular VPN, which protects data in a secure tunnel between a user’s device and a data center or corporate server to prevent snoopers, ReCon also uses the VPN connection to act as a scanning proxy to examine all the data passing between your smartphone and the rest of the internet. It works by installing a network profile in iOS or Android, just like regular VPN services.
ReCon can fully examine the contents of the unencrypted connections, which would also be in the clear for anyone on a public Wi-Fi network or other points of network interception when a VPN isn’t in use.
Choffnes and his colleagues found some surprising practices. For instance, he says, GrubHub unintentionally sent user passwords to Crashlytics, a Google-owned firm that helps developers pinpoint code failures. When informed, GrubHub revised its code and had Crashlytics delete all the associated data that contained passwords.
The group extracts data from app communications, and tries to determine what parts of it are PII. This is both harder and easier than it might sound. Most data is sent in a structured way, using an API and often in the standard JSON format, which groups data into a label (the “key”) and its associated value. But the team also applies machine learning, allowing it to identify PII more broadly, even when it appears without using any standard structure format, or shows up in surprising places.
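The structured half of that approach can be illustrated with a minimal sketch. It is not ReCon’s code: the key list, the function name `find_pii`, and the sample payload are all hypothetical, and a simple key lookup like this would miss the unstructured cases the team relies on machine learning to catch.

```python
import json
import re

# Hypothetical list of key tokens that often label personal data.
PII_KEY_TOKENS = {"password", "email", "lat", "lon", "latitude",
                  "longitude", "imei", "mac", "phone"}

def find_pii(payload, path=""):
    """Walk decoded JSON and return (path, value) pairs whose key looks like PII."""
    hits = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            child = f"{path}.{key}" if path else key
            if isinstance(value, (dict, list)):
                # Nested structure: keep descending.
                hits.extend(find_pii(value, child))
            else:
                # Split keys such as "user_email" into tokens before matching,
                # so that "platform" is not mistaken for "lat".
                tokens = re.split(r"[_\-\s]+", key.lower())
                if any(token in PII_KEY_TOKENS for token in tokens):
                    hits.append((child, value))
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            hits.extend(find_pii(item, f"{path}[{index}]"))
    return hits

# Made-up request body of the kind an app might send to an analytics service.
body = ('{"user": {"user_email": "a@example.com", "platform": "ios"}, '
        '"events": [{"lat": 47.6, "name": "open"}]}')
for where, value in find_pii(json.loads(body)):
    print(where, "=", value)
```

The sketch flags `user.user_email` and `events[0].lat` while ignoring `platform` and `name`. A camelCase key such as `userEmail` would slip past it, which hints at why fixed rules alone fall short.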
The ReCon project publishes some data derived from a few hundred early users, listing apps, the kind of data they passed, a severity score, whether a developer was notified, and when misbehavior was fixed (if indeed it was).
For those who have installed the app, ReCon has a web-based console that allows users to block or modify information that’s sent. For instance, a user can block all examples of a given kind of PII, or block all location data sent from a given app. However, because some apps fail without location coordinates, the team is looking into coarsening GPS information instead of blocking it entirely. An app’s backend still gets relevant information, “but other parties aren’t able to pin down where you are to a few meters,” Choffnes notes.
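One simple way to coarsen coordinates is to round them. The sketch below is an illustration only, not the ReCon team’s method: the function name `coarsen` and the choice of two decimal places are assumptions. Two decimal places of latitude correspond to a bit more than a kilometer, so the result still indicates a neighborhood without pinpointing a building.

```python
def coarsen(latitude, longitude, places=2):
    """Round coordinates so they identify a general area, not an exact spot.

    `places` is the number of decimal places kept; 2 is an assumed default
    that leaves roughly kilometer-level precision in latitude.
    """
    return round(latitude, places), round(longitude, places)

# A precise fix in downtown Seattle, reduced to a coarse one.
print(coarsen(47.606209, -122.332069))
```

A proxy applying this idea would rewrite the coordinates in an outgoing request rather than drop them, so an app that fails without location data keeps working.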
Of course, examining a flow of data from users itself raises massive privacy red flags, which is part of the evolution of ReCon. Its creators don’t ask for passwords, try to avoid storing the values sent, and check only to see whether, say, a password is obviously being passed without encryption. The group ultimately wants to perform distributed machine learning without users disclosing private or secret information, such as domains they’re visiting.
BEFORE IT EVER GETS OFF THE PHONE
The Haystack Project, a multi-institution academic collaboration housed at the International Computer Science Institute (ICSI) at the University of California, Berkeley, starts with an Android app that captures data right at the source. (It’s not yet available for iOS.)
Like ReCon, Haystack’s Lumen Privacy Monitor app acts like a VPN, but it shunts data internally, rather than sending it off the device for analysis. Because it’s under the user’s control, the app can be given permission to intercept HTTPS connections and analyze everything sent between apps and servers. ReCon, sitting outside the device and the network, can’t read those encrypted connections, although it can identify the destination of a connection, the rough payload size, and the frequency of communication.
Narseo Vallina-Rodriguez, a researcher at ICSI, says that the fact that Lumen doesn’t send data off the device means that its developers have to be more careful about bogging down a smartphone with processing tasks. At the moment, the tool measures and reports what apps are doing, though it could offer blocking controls in the future.
The app reports back fully anonymized pieces of information, allowing researchers to understand the kind of personal information that’s being extracted and transmitted. “We’re seeing tons of things like some applications linking the MAC address of the Wi-Fi access point as a proxy for location,” says Vallina-Rodriguez. (A base station’s MAC address identifies it uniquely, and is used by Wi-Fi location databases run by Apple, Google, and other firms.)
While certain kinds of personal data require an app to trigger Android to ask for user permission, “we have found applications and third-party services that are somehow using inside channels without user awareness,” says Vallina-Rodriguez. He notes that an Android file that contains a variety of system-information details, like buffer size, may also have unique network identifiers, including an IP address, and that’s being sent without user consent.
Fortunately, the two projects have both a friendly competition and plans for collaboration. The efforts will likely remain separate, but each may incorporate aspects of the other or combine data to get a bigger picture of app behavior.
And the ReCon team would like to develop software for a network appliance, such as a Raspberry Pi acting as a sniffer or proxy, or firmware for a network router, to let someone see the interactions across all network devices, especially Internet of Things equipment, which has all sorts of privacy and security issues of its own.
Both ReCon and Lumen are working on obtaining more funding to improve the projects and make them viable for a large-scale consumer rollout.
POLICIES INSTEAD OF GOBBLEDYGOOK
As informative as ReCon and Lumen are, what apps are doing with our data remains an impenetrable subject. Many privacy experts and researchers point to the use of dense legal documents to define disclosure without those being linked to verifiable discrete elements that software (or humans) could check. The privacy disclosures are nearly impossible for typical users to parse, and even lawyers trained in the field might need to devote hours of effort to confirming whether they’re being followed.
Choffnes notes, “The privacy policies, which is what they’re claiming they’ll do, tend to be written in a very broad way,” which gives companies wiggle room to avoid running afoul of FTC deceptive business-practice regulations. The vaguer the policies are, the lower the chance that a company has failed to disclose information it grabs.
“As long as you disclose, almost anything goes,” says Stacey Gray, the policy counsel at the Future of Privacy Forum, a group that pulls from industry, consumer advocates, and other stakeholders. “If you’re not being deceptive in your policy, you can do almost anything.”
Where matters become especially murky, she says, is where data usage is “unexpected, inappropriate, or sensitive.” A restaurant-finding app might ask for location to give you recommendations around you. But if it’s also selling your location as a revenue stream without disclosing it, “that’s unexpected or inappropriate.” You wouldn’t give up that right intentionally, but it could be easily hidden in a miasma of legal terminology.
Gray also points out unintended consequences, where the app maker and a third-party ad tech network could both act within reasonable terms, but an unrelated party could violate privacy. She cites a situation in May 2016 in which a company claimed to be able to use advertising targeting to find women in the vicinity of Planned Parenthood clinics and serve them ads about anti-abortion religious counseling services. That action is possibly legal, but certainly not desired by the users, ad networks, or publishers involved. (The service’s operator said he could place ads on Facebook pages; Facebook stated it could find no record of him or his ads.)
The same conflicts that have driven the ad-blocking wars make it unlikely that the business models behind mobile apps will provide more transparency, making the research behind ReCon and Lumen all the more important. As Choffnes explains, “Most of the advances in this area are coming from academia; these are things that are a clear public good and would not come out of the business community, because their incentives are aligned to promote this behavior instead of consumer privacy.”