Balancing user privacy and analytics
April 7, 2022

KA Blog WeCountOnYou

by David Barnett

Analytics at Khan Academy

Providing a free, world-class education to anyone, anywhere is a lofty goal, and one that all of us at Khan Academy pursue with passion. Achieving it means trying to make the best decisions for individual learners across many countries, regions, districts, and schools. In order to do the right thing (and understand what that right thing is) across so many demographics, we must understand our successes and our failures.

The key to understanding our successes and failures is data. We have designed our systems to give our analysts the tools they need without compromising our commitment to privacy. In this post, I’ll go through some examples of how privacy-protecting analytics can be achieved.

Forget personal data while analyzing user actions

There are many reasons for an organization to store personal data. In Khan Academy’s case, you might want us to email you results, tell you about new features, or just let you know about a new assignment from your teacher! However, if you decide to sever your relationship with the site, we want to protect your privacy by no longer keeping your personal information around.

It can be tough to balance privacy protection with the desire to know, in a more general sense, what people have been doing on our site. Fortunately, this isn’t an unsolvable problem. Let’s look at an example! (Note: The data and schema below are fictional.)

id  first_name  last_name  email  state  hours_used
1   Nadda       Realname          MI     100
2   Madus       Allup             CA     38
3   Imagine     Aryname           MI     15
4   Justin      Mymind            OH     68
5   Will        Ibereal           CA     103

Given this data, we might be interested in learning some basic things about who is using our site and how:

  • How many users do we have?
  • How many users do we have per state?
  • What is the total number of hours of usage by state?
  • How do states rank by average number of hours used?

Of course, there are plenty of other things we could ask and answer, but the questions above have one thing in common: despite being important and interesting, none of them require the analyst to know anything personal about any user. Even without diving into SQL or other query methods, we can see that none of the columns containing personally identifiable information (PII) are needed to answer any of our proposed questions.
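For illustration, each of the four questions maps to a simple aggregate query that touches only the `id`, `state`, and `hours_used` columns. This sketch uses Python’s built-in `sqlite3` module and a hypothetical `activity` table loaded with the fictional rows above; it is not Khan Academy’s actual schema or tooling.

```python
import sqlite3

# Only the non-PII columns (id, state, hours_used) of the fictional table above.
ROWS = [(1, "MI", 100), (2, "CA", 38), (3, "MI", 15), (4, "OH", 68), (5, "CA", 103)]


def answer_questions(rows):
    """Answer the four questions without ever loading a PII column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE activity (id INTEGER PRIMARY KEY, state TEXT, hours_used INTEGER)")
    conn.executemany("INSERT INTO activity VALUES (?, ?, ?)", rows)

    total_users = conn.execute("SELECT COUNT(*) FROM activity").fetchone()[0]
    users_per_state = dict(conn.execute(
        "SELECT state, COUNT(*) FROM activity GROUP BY state ORDER BY state"))
    hours_per_state = dict(conn.execute(
        "SELECT state, SUM(hours_used) FROM activity GROUP BY state ORDER BY state"))
    states_by_avg_hours = [state for (state,) in conn.execute(
        "SELECT state FROM activity GROUP BY state ORDER BY AVG(hours_used) DESC")]
    conn.close()
    return total_users, users_per_state, hours_per_state, states_by_avg_hours


total, per_state, hours, ranking = answer_questions(ROWS)
print(total)      # 5
print(per_state)  # {'CA': 2, 'MI': 2, 'OH': 1}
print(hours)      # {'CA': 141, 'MI': 115, 'OH': 68}
print(ranking)    # ['CA', 'OH', 'MI']
```

The ranking follows from the averages: CA (70.5 hours), OH (68), MI (57.5).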

The best way to avoid misuse of personal data, whether intentional or unintentional, is to not give anyone (external or internal) unnecessary access to that data in the first place.
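One common way to apply this in a relational database is to hand analysts a view that leaves out the PII columns entirely. Below is a minimal sketch with Python’s built-in `sqlite3`; the `users` table and `users_for_analytics` view are hypothetical names. SQLite has no user permissions, so in a server database you would additionally grant analysts access to the view but not to the underlying table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        first_name TEXT,
        last_name TEXT,
        email TEXT,
        state TEXT,
        hours_used INTEGER
    )""")
# Email left NULL: the fictional table above does not show email values.
conn.execute("INSERT INTO users VALUES (1, 'Nadda', 'Realname', NULL, 'MI', 100)")

# The view projects away every PII column; analysts query this, not `users`.
conn.execute("CREATE VIEW users_for_analytics AS SELECT id, state, hours_used FROM users")

cur = conn.execute("SELECT * FROM users_for_analytics")
visible_columns = [col[0] for col in cur.description]
rows = cur.fetchall()
print(visible_columns)  # ['id', 'state', 'hours_used']
print(rows)             # [(1, 'MI', 100)]
```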

Our approach is to encrypt each user’s personal data with an encryption key unique to that user, so that analysts can do their work without compromising personal information. These keys can be stored separately and locked down even further, so that they are accessible only to a few analysts and used only when we need to communicate directly with the user.
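A minimal sketch of per-user field encryption, assuming the third-party `cryptography` package (`pip install cryptography`) and AES-GCM as the cipher; the post does not say which algorithm Khan Academy uses. The `keys` dict plays the role of a separate keys table, and `store_user`, `encrypt_field`, and `decrypt_field` are hypothetical helper names. Ciphertexts are base64 encoded so they fit in ordinary text columns.

```python
import base64
import os

# Third-party dependency (assumption): pip install cryptography
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

PII_FIELDS = ("first_name", "last_name", "email")
NONCE_SIZE = 12  # 96-bit nonce, the usual size for AES-GCM

# Stand-in for the keys table: user id -> that user's own 256-bit key.
keys = {}


def encrypt_field(key: bytes, plaintext: str) -> str:
    """Encrypt one value, then base64-encode nonce + ciphertext for text storage."""
    nonce = os.urandom(NONCE_SIZE)  # fresh nonce for every encryption
    ciphertext = AESGCM(key).encrypt(nonce, plaintext.encode("utf-8"), None)
    return base64.b64encode(nonce + ciphertext).decode("ascii")


def decrypt_field(key: bytes, stored: str) -> str:
    """Reverse encrypt_field; raises InvalidTag if the key is wrong."""
    raw = base64.b64decode(stored)
    nonce, ciphertext = raw[:NONCE_SIZE], raw[NONCE_SIZE:]
    return AESGCM(key).decrypt(nonce, ciphertext, None).decode("utf-8")


def store_user(user: dict) -> dict:
    """Generate a per-user key and return a row with only the PII fields encrypted."""
    key = AESGCM.generate_key(bit_length=256)
    keys[user["id"]] = key
    row = dict(user)
    for field in PII_FIELDS:
        if row.get(field) is not None:
            row[field] = encrypt_field(key, row[field])
    return row


row = store_user({
    "id": 1,
    "first_name": "Nadda",
    "last_name": "Realname",
    "email": "nadda@example.com",  # hypothetical placeholder address
    "state": "MI",
    "hours_used": 100,
})
print(row["state"], row["hours_used"])            # MI 100
print(decrypt_field(keys[1], row["first_name"]))  # Nadda
```

Because `state` and `hours_used` are stored in the clear, the aggregate questions still work on the encrypted table, while reading a name requires the matching entry in `keys`.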

Now the tables might look something like this:

id  encryption_key
1   igwaordks
2   wiorjdfklv
3   fmnaasdnf
4   lkvjwekjsd
5   fhqwhgads

id  first_name  last_name  email  state  hours_used
1  Ipymv9XvfAWC6OAOZ6SBjwRkcrB1MN24=  yFaR17EO2luqxSP4CZEXjSOiUj1j4UeQ=  kprB9exzIqFtwqTTa0VIqfc7DlwCW1ssQG4/o2fNCLsu2iVp5C3Si  MI  100
2  8yaTJ1DSPglQwJPn7aKEz1rjjS2YbeUGo=  9vVhxLK4aQzhh+BxiiTgbrYbkKHDRo7RU=  6z1rn2zawbS5JomjVFPVvFi8iCn1hiTDuJmusLC4vc4ME+/3ddX88  CA  38
3  9LJfdVIjSoVITCYttPKoUB5GzQCet0n58=  6h+IJ6QXVIeSciFUPyvHoVsfjSUMEflk=  9XT2X2KnPWCNa7NStM73q3jtB2KJA3g1LzK3E4LZWP1V3nOdnVUHZf  MI  15
4  viSyRXduVhun8fSgWqm1q6BA5h4haDPQ=  FcMEsH1wRPRWfPJ4Y47tyQ4iUVq+4neE=  AHbTTPAUMR3zs2leuxhDqr3ixuSwCFXSx0W52bm5EuJsc69NkzmvZ  OH  68
5  iQ2KNmKoBFpCG2oJNfGiwSXMOMH1ke8U=  8rVZDpsKFDxyxrTKo41MXFMe48XIG30FU=  vG/Q90GW24WLQFR0mzZQdGjIlDKRjcNBse6t00ewy6IEDidhpn4yA  CA  103

Astute observers will notice that this data looks base64 encoded rather than encrypted. That’s because we base64 encoded these values after encryption to make storage (not to mention display in this article) simpler.

Only people with access to the keys table can tell how funny the names I made up are. Everyone else can’t, and that’s okay, because it’s none of their business. It is their business to answer questions about how people are using the site, and they can still answer all of the questions we listed above.

It’s common for a provider to store user information in multiple places, such as separate databases, backups, an object store, or a data warehouse. With that many locations, it makes sense to simplify the anonymization process with a single point of control.

If we always store user data encrypted and keep the decryption keys in one place, then we only have to worry about deleting the user’s decryption key. Once the user’s key is removed from the keys table, there is no way to recover that user’s encrypted personal data, no matter how many copies of it exist elsewhere. And since we don’t require any personal information for analytics purposes, we don’t lose the ability to answer our general, aggregated questions about their usage.
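A minimal sketch of this "delete the key, not the data" idea (often called crypto-shredding), again assuming AES-GCM from the third-party `cryptography` package. The in-memory `keys` dict stands in for the keys table, and the `primary` and `backup` lists stand in for copies of user data in different stores; all of these names are hypothetical.

```python
import base64
import os

# Third-party dependency (assumption): pip install cryptography
from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

keys = {}  # stand-in for the keys table: the single point of control


def encrypt_for(user_id: int, text: str) -> str:
    """Encrypt with the user's key and base64-encode nonce + ciphertext."""
    nonce = os.urandom(12)
    ciphertext = AESGCM(keys[user_id]).encrypt(nonce, text.encode("utf-8"), None)
    return base64.b64encode(nonce + ciphertext).decode("ascii")


def try_decrypt(user_id: int, stored: str):
    """Return the plaintext, or None if the user's key has been deleted."""
    key = keys.get(user_id)
    if key is None:
        return None
    raw = base64.b64decode(stored)
    return AESGCM(key).decrypt(raw[:12], raw[12:], None).decode("utf-8")


def forget_user(user_id: int) -> None:
    """Delete only the key; the encrypted copies are left where they are."""
    keys.pop(user_id, None)


# Two hypothetical stores holding copies of the same encrypted rows.
primary, backup = [], []
for user_id, name, hours in [(1, "Nadda", 100), (2, "Madus", 38)]:
    keys[user_id] = AESGCM.generate_key(bit_length=256)
    row = {"id": user_id, "first_name": encrypt_for(user_id, name), "hours_used": hours}
    primary.append(row)
    backup.append(dict(row))

forget_user(1)

print(try_decrypt(1, backup[0]["first_name"]))  # None
print(try_decrypt(2, backup[1]["first_name"]))  # Madus
print(sum(r["hours_used"] for r in primary))    # 138

# Trying some other key does not help: AES-GCM authentication fails.
raw = base64.b64decode(backup[0]["first_name"])
try:
    AESGCM(AESGCM.generate_key(bit_length=256)).decrypt(raw[:12], raw[12:], None)
except InvalidTag:
    print("unrecoverable without the original key")
```

Deleting one dict entry makes the copied name unreadable in every store, while the non-PII `hours_used` column stays usable for aggregates. In a real system this only holds if the deleted key is not retained anywhere else, for example in backups of the keys table itself.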

This approach has allowed us to respect our users’ right to privacy while still providing essential information to our data analysts and nonprofit leadership.

We at Khan Academy love working with data! Are you interested in working with our data or any of our other tools and teams? Our team comes from a wide variety of backgrounds, and we actively foster a cross-disciplinary environment because we believe that’s where the magic happens. Khan Academy currently employs around 200 full-time staff, including the creators of our educational content, who come from teaching backgrounds. Learn more and explore open positions.

