We can all give an example of a time when Google has spookily known exactly what we’re looking for before we’ve searched for it, sometimes before we’ve even realised that we want it. I was talking to a friend about Jesus College, Cambridge, and the next day Instagram suggested loads of Christian memes to me… This was funny at the time, but it also shows that the ad data held by Facebook (which owns Instagram) includes ‘sensitive’ information (under Europe’s GDPR, ‘sensitive’ data is data linked to a person’s politics, religion, race or sexuality), which can be dangerous for minority groups.
In Saudi Arabia, where homosexuality can be punished with death, 540 000 people were tagged as having ‘an interest in homosexuality’, and this is quite controversial. On the one hand, Facebook never claims that someone is gay, only that they are interested in the topic. By the same logic, my being interested in dogs does not mean that I have one. Consequently, Facebook argues that it isn’t putting anyone at risk. On the other hand, Ed Boal at Stephenson Law in Bristol says that the technical distinction Facebook is drawing is actually of very little significance. He says ‘Facebook is in the wrong for sure as far as EU data protection law is concerned’.
Imagine we have a magic formula of the form y = a·x1 + b·x2 + c·x3 + … where y is how much you like an advert (think of your rating out of 10, for example) and the x terms (x1, x2, x3, …) are different tags. For example, x1 measures how related the advert is to dogs and x2 measures how related it is to cats. There can be thousands of x terms and, according to the article, 2000 of them relate to ‘sensitive’ information, one of which is ‘interested in homosexuality’. Now, the coefficients (a, b, c, …) vary from user to user. They represent how important each tag is to that user.
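To make the formula concrete, here is a minimal sketch in Octave/MATLAB of how one rating could be computed for one user and one advert. The three tags, their scores and the coefficient values are all made up for illustration; nothing here comes from Facebook's actual system.

```octave
% Hypothetical tag scores for one advert: [dogs; cats; holidays], each from 0 to 1
x = [0.9; 0.1; 0.0];

% Hypothetical coefficients (a, b, c) for one user:
% cares a lot about dogs, barely about cats
coeffs = [8; 1; 5];

% Predicted rating y = a*x1 + b*x2 + c*x3, written as a dot product
y = coeffs' * x;
printf('Predicted rating: %.1f\n', y);   % prints "Predicted rating: 7.3"
```

A different user would have different values in coeffs, so the same advert would get a different predicted rating.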
You can imagine that these coefficients will not all be the same. After all, the link between content being dog-related and me liking it may be weaker than the link between content being holiday-related and me liking it. For example, I would have a high value of a but a low value of b, because I care about whether there are dogs in what I’m shown but don’t particularly mind whether there are cats. Now, there’s an interesting algorithm (gradient descent), which I’d urge you all to look into, that finds the optimum value of these coefficients for each user. It does this by looking at how much you liked previous adverts or films with certain values of x1, x2, … This feedback could take the form of a list of adverts you’ve clicked on, or even what percentage of a film you watched.
For those of you who are curious, the code for this algorithm (written in Octave/MATLAB, for a single x term) is as follows!
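% Assumed layout: data(:,1) is the x term, data(:,2) is the rating y
y = data(:,2);
m = length(y); % number of examples
X = [ones(m, 1), data(:,1)]; % Add a column of ones to x
theta = zeros(2, 1); % initialise fitting parameters

% Some gradient descent settings
iterations = 1500;
alpha = 0.01;

for iter = 1:iterations
    % Compute both updates first so theta changes simultaneously
    temp_theta0 = theta(1)-alpha*(1/m)*sum(X*theta-y);
    temp_theta1 = theta(2)-alpha*(1/m)*sum((X*theta-y).*X(:,2));
    theta(1) = temp_theta0;
    theta(2) = temp_theta1;
end

J = 1 / (2 * m) * sum(((X * theta) - y) .^ 2); % cost after fitting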
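The code above fits a single x term, whereas the formula earlier has many. As a rough sketch of how the same idea might extend to several tags, the update can be written in vectorised form so that every coefficient is adjusted at once. The toy data below is invented for illustration (ratings were generated as 2 + 5·dogs + 1·cats), and the learning rate and iteration count are simply values that happen to work for it.

```octave
% Toy data (made up): each row is one advert, columns are tag scores [dogs, cats]
tags = [1 0; 0 1; 1 1; 0 0];
% Made-up ratings, generated as 2 + 5*dogs + 1*cats
y = [7; 3; 8; 2];

m = length(y);                  % number of examples
X = [ones(m, 1), tags];         % column of ones for the constant term
theta = zeros(size(X, 2), 1);   % one coefficient per column of X

alpha = 0.1;                    % learning rate (assumed, tuned for this toy data)
iterations = 5000;

for iter = 1:iterations
    % Gradient of the squared-error cost with respect to all coefficients at once
    gradient = (1 / m) * X' * (X * theta - y);
    theta = theta - alpha * gradient;
end

disp(theta');   % theta approaches [2 5 1] on this toy data
```

Because the gradient is computed for all coefficients in one matrix product, there is no need for the temporary variables used in the two-parameter version.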
Recommender system algorithms are nothing new, and the threat that data can pose to our privacy and security is a widespread fear. The jury is still out on how we should deal with a powerful private technology sector that holds unfathomable quantities of data. How can we maintain privacy? Are there limits on the spheres in which technology firms can operate? But one thing is for sure: we can no longer sacrifice the wellbeing of the user.