Last year we published a SHERPA report that, amongst other things, explained how AI systems can be attacked. Our report elaborated on a number of known attack methods, including fooling a model into incorrectly classifying an input, obtaining information about the data that was used to train a model, and model poisoning. Of these attacks, model poisoning stands out as the most serious, and that’s what this article will focus on.
Machine learning models shape reality
Online-trained machine learning models are deployed in many services that we use on a daily basis. Those that you’re probably most familiar with are recommenders – machine learning models that recommend actions based on how you interact with a system. An example would be YouTube.
YouTube recommends videos to a viewer based on what’s popular in their region, or what videos the user has previously watched. As more videos are watched, the system will more accurately recommend content that was viewed, liked, or subscribed to by others who’ve interacted with similar content. If a user is logged into the site, underlying mechanisms will gather more and more data, over time, in order to fine-tune recommendations, and suggest new content to the user when they visit the site in the future. The same logic applies to other video platforms such as Netflix, and music streaming services such as Spotify.
Online shopping services such as Amazon recommend products in a similar way. When a user views an item, the site recommends items that other users browsed or purchased after viewing or purchasing that same item. As a user shops more, the system fine-tunes recommendations to match the user’s shopping habits with the shopping habits of similar shoppers.
Social networks such as Facebook, Twitter, Instagram, and TikTok are almost completely based on recommenders. Every action on a social network, be it a post, view, like, retweet, or follow, creates input for recommendation algorithms that suggest other accounts to follow, send notifications about recent activity, and even determine what content a user sees on their timeline.
Online-trained models are also used in other less-visible systems such as credit card fraud prevention, network intrusion detection, spam filtering, medical diagnostics, and fault detection. They may also be found in online review sites (like TripAdvisor) and chat bots such as Microsoft’s Tay, which Twitter users quickly manipulated into becoming racist.
What all of these services have in common is how they work under the hood. They’re powered by machine learning models that are continually updated or trained on new data created by the users of that service. Every like, retweet, hashtag, or follow on Twitter serves as input for a subsequent model update. Every video viewed, liked, or commented, and every channel subscribed-to on YouTube provides input to retrain a model. And so on. Details of how these models work and how they’re trained differ between each service and piece of functionality, but the underlying mechanisms are always somewhat similar.
If recommendations can be manipulated, so can we
Recommendation algorithms and other similar systems can be easily manipulated by performing actions that pollute the input to the next model update. For instance, if you want to attack an online shopping site to recommend product B to a shopper who viewed or purchased product A, all you have to do is view A and then B multiple times (or add both A and B to a wish list or shopping basket). If you want a hashtag to trend on a social network, simply post and/or retweet that hashtag a great deal. If you want that new fake political account to get noticed, simply have a bunch of other fake accounts follow it and continually engage with its content.
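To make the shopping example concrete, here is a minimal sketch of a toy "viewed together" recommender and what happens when fake sessions are added to its input. Everything in it is an assumption made for illustration: the `train` and `recommend` functions, the session format, and the traffic numbers are hypothetical, and real recommenders are far more complex than a co-occurrence count.

```python
from collections import Counter, defaultdict
from itertools import permutations


def train(sessions):
    """Count how often each pair of items appears in the same session."""
    co_views = defaultdict(Counter)
    for session in sessions:
        for a, b in permutations(set(session), 2):
            co_views[a][b] += 1
    return co_views


def recommend(co_views, item, k=1):
    """Return the k items most often seen alongside `item`."""
    return [other for other, _ in co_views[item].most_common(k)]


# Hypothetical organic traffic: shoppers who view A mostly also view C.
organic = [["A", "C"]] * 50 + [["A", "D"]] * 10
before = recommend(train(organic), "A")

# Attack: fake sessions that view A and then B, repeated until B outranks C.
poison = [["A", "B"]] * 60
after = recommend(train(organic + poison), "A")

print(before, after)
```

In this toy setup the attacker only needs to out-count the organic co-views of the most popular neighbour, which is why the volume of fake activity matters more than its sophistication.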
During the run up to the 2019 UK general elections, disinformation about an incident in a hospital in Leeds was spread by a coordinated group of fake accounts posting replies to prominent journalists and politicians on Twitter. This reply-spam is known to trick people (into believing it, engaging with it, etc.), especially if they’re busy or in a panic. If a high-profile Twitter account engages with disinformation such as this, it is given legitimacy. Right-wing Twitter groups subsequently picked up on what was happening and continued to post similar messages from their own accounts (and sockpuppets), thus amplifying the disinformation further (and potentially hiding the original malicious actions). This sort of activity is extremely common on social networks.
How algorithms are manipulated
Recommendation algorithms can be attacked in a variety of ways, depending on the motive of the attacker. Adversaries can use promotion attacks to trick a recommender system into promoting a product, piece of content, or user to as many people as possible. Conversely, they can perform demotion attacks in order to cause a product, piece of content, or user to be promoted less than it should. Algorithmic manipulation can also be used for social engineering purposes. In theory, if an adversary has knowledge about how a specific user has interacted with a system, an attack can be crafted to target that user with a recommendation such as a YouTube video, malicious app, or imposter account to follow. As such, algorithmic manipulation can be used for a variety of purposes including disinformation, phishing, scams, altering of public opinion, promotion of unwanted content, and discrediting individuals or brands. You can even pay someone to manipulate Google’s search autocomplete functionality.
Attacks against recommendation algorithms are often conducted via automation or large-scale coordination. Since a fairly large amount of input data is often required to alter the target model in a meaningful way, fake accounts may be used to perform such an attack. Note, however, that systems that employ online-trained models often contain safeguards designed to prevent common poisoning attacks. As such, adversaries must first probe the system’s detection capabilities before launching a real attack. This can be done with throw-away accounts. Once a system’s automated detection capabilities are understood, the adversary can freely craft multiple fake accounts that look and behave like normal users.
Poisoning attack detection methods are often designed to notice sudden, large changes in their input data. In order to evade detection, attackers can use a “boiling frog” strategy of slowly feeding corrupted data into a model over a period of time.
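The difference between a sudden attack and a "boiling frog" attack can be sketched with a very simple detector. The code below assumes a hypothetical detector that compares today's interaction count for an item against the average of the previous seven days; the function names, the 1.5x threshold, and the traffic figures are all invented for illustration and are not taken from any real system.

```python
def flags_spike(history, today, window=7, max_ratio=1.5):
    """Flag `today` if it exceeds max_ratio times the trailing-window mean."""
    recent = history[-window:]
    baseline = sum(recent) / len(recent)
    return today > max_ratio * baseline


def run(daily_counts, warmup=7):
    """Return the day indices on which the detector fires."""
    alerts = []
    for day in range(warmup, len(daily_counts)):
        if flags_spike(daily_counts[:day], daily_counts[day]):
            alerts.append(day)
    return alerts


normal = [100] * 7

# Sudden attack: interactions jump from 100 to 400 per day overnight.
sudden = normal + [100, 100, 400, 400]

# Boiling frog: interactions grow by 10% per day for 15 days.
gradual = normal + [round(100 * 1.1 ** d) for d in range(1, 16)]

sudden_alerts = run(sudden)
gradual_alerts = run(gradual)
print(sudden_alerts, gradual_alerts, gradual[-1])
```

Both series end at roughly four times the normal volume, but only the sudden jump trips the detector, because the slow ramp drags the baseline up with it.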
AI poisoning attacks aren’t theoretical, they’re an industry
Numerous attacks are already being performed against recommenders, search engines, and other similar online services. In fact, an entire industry exists to support these attacks. With a simple web search, it is possible to find inexpensive purchasable services to manipulate app store ratings, post fake restaurant reviews, post comments on websites, inflate online polls, boost engagement of content or accounts on social networks, and much more. The prevalence and low cost of these services indicates that they are widely used.
The fact that a model has been poisoned often goes unnoticed. When an online shop starts recommending product B alongside product A, it’s highly unlikely anyone will notice. Shoppers may consider it odd that product B was recommended to them, and just move on. By the way, attacks against online shopping algorithms are not just theory – Amazon’s recommendation algorithm has been manipulated to recommend anti-vaccination literature alongside medical publications and in medical categories, spread white supremacy, anti-semitism, and islamophobia, aid a 4-chan troll campaign, and recommend QAnon-related materials. It was also recently discovered that people were using creative naming schemes to bypass Amazon’s detection logic in order to sell Boogaloo-related merchandise. This last example, while not an algorithm manipulation attack, is still interesting, since it demonstrates how adversaries are able to defeat measures put in place to prevent specific types of content from being published, or in this case, sold.
Why it’s so hard to fix a poisoned model
If the owner of a system does notice that something is wrong, and suspects their model has been attacked, how do they go about fixing it? In a majority of cases, the process involved is non-trivial. If the owner of an online shop notices that their site has started recommending product B alongside product A, and they’re suspicious that they’ve been the victim of an attack, the first thing they’d need to do is look through historical data to determine why the model started making this recommendation. To do this, they’d need to gather all instances of product B being viewed, liked, or purchased alongside product A. Then they’d need to determine whether the users that generated those interactions look like real users or fake users – something that is probably extremely difficult to do if the attacker knows how to make their fake accounts look and behave like real people. If they were able to conclude that this was an attack, they’d then want to fix their model.
Fixing a poisoned model, in most cases, involves retraining. You take an old version of the model (from before the day when the attack started), and train it against all accumulated data between that past date and the present day, but with the malicious entries removed. You then deploy the fixed model into production and resume business. Note that both cleaning the poisoned input data and retraining the model can take quite some time. If at some point in the future you discover a new attack, you’ll need to perform the same steps over again. Social networks and other large online sites are under attack on numerous fronts, on an almost constant basis. It would be infeasible for them to constantly retrain their models on cleaned data as new attacks are discovered. And so they don’t. Even in the simplest of cases, fixing a poisoned model is an unrealistic proposition, and thus poisoned models are simply left as they are.
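The repair procedure described above can be sketched as follows, again using a toy co-view counter as the "model". The `repair` function, the event tuple layout `(day, account, item_a, item_b)`, and the account names are hypothetical, and the sketch assumes the hard part (deciding which accounts are malicious) has already been done.

```python
from collections import Counter


def apply_events(model, events):
    """Update co-view counts in place from (day, account, a, b) events."""
    for _, _, a, b in events:
        model[(a, b)] += 1
    return model


def repair(snapshot, snapshot_day, event_log, bad_accounts):
    """Rebuild from a pre-attack snapshot, replaying only clean events."""
    clean = [
        e for e in event_log
        if e[0] > snapshot_day and e[1] not in bad_accounts
    ]
    # Copy the snapshot so the stored checkpoint is left untouched.
    return apply_events(Counter(snapshot), clean)


# Hypothetical checkpoint taken at the end of day 2, before the attack.
snapshot = Counter({("A", "C"): 5})

# Everything logged since then, including the attacker's fake sessions.
event_log = [
    (3, "u1", "A", "C"),
    (3, "bot1", "A", "B"),
    (4, "bot2", "A", "B"),
    (4, "u2", "A", "C"),
    (5, "bot1", "A", "B"),
]

poisoned = apply_events(Counter(snapshot), event_log)
repaired = repair(snapshot, 2, event_log, bad_accounts={"bot1", "bot2"})
print(poisoned[("A", "B")], repaired[("A", "B")], repaired[("A", "C")])
```

Even in this toy form, the repair needs a retained checkpoint, the full event log since that checkpoint, and a reliable list of malicious accounts. At the scale of a large platform, each of those is expensive to maintain.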
‘Fake’ manipulation can become ‘real’
It is worth noting that, in the case of social networks, algorithm poisoning can happen via both “fake” activity and actual organic activity. Coordinated actors amplify a piece of content just enough to be picked up by real users, who then further amplify it. Sophisticated actors (such as nation states) ride on top of the efforts of established manipulators in order to push their own agenda – this is known to have happened during high-profile political events of the past few years. Large-scale manipulation is being used right now to spread disinformation about the coronavirus, disseminate fake stories about protest incidents, and recruit people into extremist groups.
As an example, on July 28th 2020 (while I was writing this post), an anti-mask, anti-vax disinformation campaign was being amplified on Facebook.
This push on Facebook caused a similar wave of engagement on Twitter, causing the top two trends in the United States to look like the screenshot below. Note that the top trend was likely hard-coded in an attempt to counter the disinformation that was spreading – it persisted for many hours after the second trend had subsided.
Social networks reacted by removing links to the story, but not before it had made the rounds. As such, many people will have viewed it and understood it to be true – they wouldn’t notice the fact that it was debunked and removed from social media later on. This is how many fake stories and conspiracy theories are spread, and this is why groups like QAnon are so prevalent right now. Reactively deny-listing content is a band-aid fix for a problem that really should have been solved at a deeper level – the mechanisms used to amplify content in this manner need to be understood and addressed so that campaigns such as these can be blocked before they have an actual effect on the content people see.
Stop the poisoning before it starts
Given that fixing an already poisoned model is, in most cases, infeasible, the logical course is thus to detect attacks as they happen. If an attack can be detected, poisoned inputs can be discarded before the next model update, thus retaining the model’s integrity. A variety of proposed mechanisms and practices for detecting and preventing model poisoning attacks exist. These include rate-limiting, regression testing, input validity checks, manual moderation, and various anomaly detection and statistical methodologies. I wrote about some of these in our SHERPA report. We’ll describe some others in an upcoming blog post. Despite this, detecting algorithmic manipulation is still a very difficult task.
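As a small illustration of the first item on that list, here is a sketch of per-account rate-limiting applied to a batch of events before it reaches the next model update. The `rate_limit` function, the cap of three events per account, and the event format are assumptions made for this example only.

```python
from collections import Counter


def rate_limit(events, max_per_account=3):
    """Keep at most max_per_account events per account in one update batch."""
    seen = Counter()
    kept = []
    for account, item in events:
        if seen[account] < max_per_account:
            seen[account] += 1
            kept.append((account, item))
    return kept


# Hypothetical batch: two ordinary users and one very busy automated account.
batch = (
    [("u1", "A"), ("u1", "C")]
    + [("bot", "B")] * 100
    + [("u2", "A"), ("u2", "C")]
)

kept = rate_limit(batch)
bot_events = sum(1 for account, _ in kept if account == "bot")
print(len(batch), len(kept), bot_events)
```

A cap like this blunts a single noisy account, but it does nothing against an attacker who spreads the same activity across many fake accounts, which is one reason rate-limiting alone is not enough.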
One way to approach the problem is to develop and understand attack mechanisms and how they affect model input data, and the model itself. Once an attack is understood, it should be possible to develop defences against it. As part of the SHERPA project, we are studying attacks against online distributed machine learning models. We’re simulating real attacks against anomaly detection systems in order to study how they work and how effective they are. We will use this data to develop mechanisms to detect those attacks. This work is currently ongoing, and we hope to publish our first results in the near future, so stay tuned!
When considering social networks, detecting poisoning attacks is only part of the problem that needs to be solved. In order to detect that users of a system are intentionally creating bad training data, a way of identifying accounts that are fake or specifically coordinating to manipulate the platform is also required. There are also issues outside of algorithmic manipulation that social networks need to address at scale, such as online harassment and hate speech.
Don’t underestimate the impact—or how many issues need solving
I want to conclude this article by reiterating that threats arising from the manipulation of recommenders, especially those used by social networks, hold broad societal implications. It is widely understood that algorithmic manipulation has led to entirely false stories, conspiracy theories, and genuine news pieces with altered figures, statistics, or online polls being circulated as real news. These disinformation mechanisms, which continue to work to great effect even today, do damage to public health, divide society, breed hatred and extremism, and may ultimately threaten civic order. This fact was even mentioned in the UK Russia Report. They’re also a powerful weapon that is undoubtedly being researched and improved upon by a variety of groups, organizations (the next “Cambridge Analyticas”), and nation states. I think it’s safe to say that the companies operating social networks are aware of these problems, and are attempting to address them.
However, recent findings demonstrate that there are a lot of issues that still need solving, and that a lot more effort could be put into this area.