It turns out people have researched how to do algorithmic recommendations without users having to reveal their personal preferences, and I am intrigued. Apparently, in principle we could have the good parts of, say, Netflix suggesting more things you might want to watch, without exposing ourselves to entities like Facebook selling all our data.
See "Distributed Differential Privacy and Applications" by Narayan, for example. (Also that's the first CC-BY licensed PhD thesis I've seen!)
@b_cavello Okay, I've now skimmed the Leaking in Data Mining paper and watched Octavio Good's talk. They were both interesting and I learned things, but I'm not yet seeing how either one is related to either deidentification or differential privacy. Could you explain more?
At this point I'm nervous about any deidentification technique that doesn't have a differential privacy proof. There have been too many successful reidentification attacks; this feels like "don't roll your own crypto" again.
@b_cavello I still don't see the relation, but I agree that the use of adversarial networks to limit over-training was a really interesting part of that talk. I've seen stuff before about trying to remove bias from word2vec embeddings so that for example "doctor" doesn't get associated to "man" and "nurse" doesn't get associated to "woman", and I could imagine using the GAN approach to try to tackle that kind of problem too.
The social network of the future: No ads, no corporate surveillance, ethical design, and decentralization! Own your data with Mastodon!