It turns out people have researched how to do algorithmic recommendations without users having to reveal their personal preferences, and I am intrigued. Apparently, in principle we could have the good parts of, say, Netflix suggesting more things you might want to watch, without exposing ourselves to entities like Facebook selling all our data.
See "Distributed Differential Privacy and Applications" by Narayan, for example. (Also that's the first CC-BY licensed PhD thesis I've seen!)
@cstanhope Yup, that's the one! I haven't really gotten past its related-work section yet because I got sidetracked by reading one of the papers it references, so it's also good as a bit of a survey paper.
@b_cavello Okay, I've now skimmed the Leakage in Data Mining paper and watched Octavio Good's talk. They were both interesting and I learned things, but I'm not yet seeing how either one is related to either deidentification or differential privacy. Could you explain more?
At this point I'm nervous about any deidentification technique that doesn't have a differential privacy proof. There have been too many successful reidentification attacks; this feels like "don't roll your own crypto" again.
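For context on what a differential privacy proof actually guarantees, here is a minimal sketch of the standard textbook Laplace mechanism applied to a counting query. It is not taken from any paper mentioned in this thread. The function names `laplace_noise` and `private_count` are hypothetical illustrations. The key point is that adding or removing one person changes a count by at most 1, so Laplace noise with scale 1/ε makes the released number ε-differentially private.

```python
import math
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale) by inverse-CDF sampling."""
    # u is uniform on [-0.5, 0.5); reject -0.5 exactly to avoid log(0).
    u = rng.random() - 0.5
    while u == -0.5:
        u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a count of matching records with epsilon-differential privacy.

    A counting query has sensitivity 1 (one person joining or leaving the
    data changes the true answer by at most 1), so Laplace noise with
    scale 1/epsilon is enough for the epsilon-DP guarantee.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical viewing data: did each user watch a particular show?
    watched = [{"user": i, "watched_show": i % 3 == 0} for i in range(30)]
    print(private_count(watched, lambda r: r["watched_show"], epsilon=0.5, rng=rng))
```

Smaller ε means more noise and a stronger guarantee. Note that the guarantee comes from the proof about the noise distribution, not from stripping identifiers, which is exactly why reidentification attacks don't break it.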
@jamey I don't think they're directly linked, exactly, but related in goal. The idea of training systems to ignore particular data to me seems hopeful for developing less biased models.
@b_cavello I still don't see the relation, but I agree that the use of adversarial networks to limit over-training was a really interesting part of that talk. I've seen stuff before about trying to remove bias from word2vec embeddings so that, for example, "doctor" doesn't get associated with "man" and "nurse" doesn't get associated with "woman", and I could imagine using the GAN approach to try to tackle that kind of problem too.
@jamey Definitely excited to read the paper, but my intuition has long told me that MinHash combined with an anonymising multicast network could be used for this purpose.
@alcinnz I wasn't familiar with MinHash, but after skimming Wikipedia, I think I see what you mean. Are you suggesting multicast so all participants see all hashes and can compute the result independently, or what? How would that go together?
@jamey I'm thinking of using minhash to select a "clique" of similar peers and using the anonymising network to send recommendations within that clique.
@alcinnz Oh! So maybe your hash becomes kind of like your network/group address? That seems neat.
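One way to make "your hash becomes your group address" concrete is MinHash plus the usual locality-sensitive hashing (LSH) banding trick. The sketch below is an assumption about how such a design could look, not anything alcinnz specified: each user's set of liked items becomes a MinHash signature, the signature is cut into bands, and each band is hashed into a group name. The `recs/<band>/<hash>` naming scheme and all function names are hypothetical, and SHA-256 with a seed prefix stands in for a proper random hash family.

```python
import hashlib


def _h(seed: int, item: str) -> int:
    """Seeded 64-bit hash built from SHA-256 (stand-in for a random hash family)."""
    digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items: set, num_hashes: int = 64) -> list:
    """For each of num_hashes seeded hash functions, keep the minimum over the set."""
    if not items:
        raise ValueError("need at least one item")
    return [min(_h(seed, it) for it in items) for seed in range(num_hashes)]


def estimated_jaccard(sig_a: list, sig_b: list) -> float:
    """Fraction of agreeing positions estimates |A & B| / |A | B|."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def group_addresses(sig: list, bands: int = 16) -> list:
    """LSH banding: hash each band of the signature into a hypothetical group name.

    Two users who agree on every row of at least one band end up sharing
    that band's group, so similar users tend to meet in some group.
    """
    if len(sig) % bands:
        raise ValueError("signature length must be divisible by bands")
    rows = len(sig) // bands
    addrs = []
    for b in range(bands):
        chunk = ",".join(str(x) for x in sig[b * rows:(b + 1) * rows])
        addrs.append(f"recs/{b}/" + hashlib.sha256(chunk.encode()).hexdigest()[:16])
    return addrs


if __name__ == "__main__":
    alice = {f"show{i}" for i in range(100)}
    bob = {f"show{i}" for i in range(80)} | {f"show{i}" for i in range(100, 120)}
    sa, sb = minhash_signature(alice), minhash_signature(bob)
    print("estimated Jaccard:", estimated_jaccard(sa, sb))  # true value is 80/120
    print("shared groups:", set(group_addresses(sa)) & set(group_addresses(sb)))
```

With b bands of r rows each, two users whose item sets have Jaccard similarity s share at least one group with probability 1 - (1 - s^r)^b, so the band and row counts control how "tight" each clique is.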
Do you have a reference for this kind of multicast anonymising network? I'm not sure how that would work.
@alcinnz Ah, yes, I keep looking at the IPFS PubSub stuff because it sure seems like that ought to be useful for something. 😅
The design goals and implementation details aren't well documented as far as I've found, but superficially it doesn't look like it provides any anonymity at present. At the least it looks like you can tell who the peers are in a group you've joined?
@jamey My understanding is that at the moment they use simple flood networking. So if you receive a message from someone, you can't tell whether they're actually in the group or simply forwarding a message from it.
@alcinnz The way I interpreted the little bit they wrote about how it works, nodes only forward "floodsub" messages for groups they're actually in, so you can effectively only join groups that at least one of your peers participates in too. But I dunno!
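The two readings of floodsub above lead to different delivery behavior, and a toy graph model makes the difference visible. This is a hypothetical simulation of the two interpretations discussed here, not libp2p's actual implementation or API. With `relay_unsubscribed=True` every node relays (the reading where a forwarder might not be a member); with `False` only subscribers relay (the reading where you can only reach a group through peers already in it).

```python
from collections import deque


def flood_reach(peers: dict, subscriptions: dict, origin: str, topic: str,
                relay_unsubscribed: bool = False) -> set:
    """Return the subscribers of `topic` that receive a message published by `origin`.

    peers maps each node to the set of nodes it is directly connected to.
    subscriptions maps each node to the set of topics it has joined.
    If relay_unsubscribed is False, a node only accepts and re-forwards a
    message when it is subscribed to the topic (toy model, not a spec).
    """
    seen = {origin}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for neighbor in peers.get(node, set()):
            if neighbor in seen:
                continue
            subscribed = topic in subscriptions.get(neighbor, set())
            if subscribed or relay_unsubscribed:
                seen.add(neighbor)
                queue.append(neighbor)
    # Only subscribers actually deliver the message to the application.
    return {n for n in seen if topic in subscriptions.get(n, set())}


if __name__ == "__main__":
    # a -- b -- c, where a and c like cats but b doesn't.
    peers = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
    subs = {"a": {"cats"}, "b": set(), "c": {"cats"}}
    print(flood_reach(peers, subs, "a", "cats"))
    print(flood_reach(peers, subs, "a", "cats", relay_unsubscribed=True))
```

Under the subscribers-only reading, `c` never hears from `a` even though both are in the group, which is one way a similarity clique could end up fragmented.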
@jamey Rereading it again, I think you're right.
If so, that could cause some flakiness with the recommendations engine I described. Oh, well. I'll figure it all out later.