New research shows how the AI in online recommendations can be manipulated
Recommendation engines are used by sites and apps across the internet to nudge people into purchasing more products or consuming more content. The choices they present can influence users, both by reinforcing existing preferences and by introducing new items in the hope of increasing engagement.
To test the integrity of these systems, we examined how an adversary might manipulate recommendation mechanisms. Our research attempted to poison the collaborative filtering models that many sites and apps use for recommendations, applying techniques that could push a piece of content to a higher position in a person’s social media timeline or search results by latching on to another piece of content.
The simulated attacks effectively increased the chances that a user would be recommended another specific user, based on retweet activity. Even a very small number of retweets was enough to manipulate the recommendation algorithm into promoting accounts whose content was shared through the injected retweets. We found that these attacks could represent effective mechanisms for spreading disinformation, or for growing follow-back rings that coordinate their activity to boost specific content.
Our findings also suggest that artificial intelligence models likely used by all social media platforms to increase site engagement are already being manipulated by motivated users. And these platforms are vulnerable to much greater exploitation by adversaries who might take a more experimental approach to testing the best ways to influence the AI in these mechanisms.
The dangers of disinformation
YouTube, Facebook, TikTok, and Twitter play a key role in influencing audiences and shaping public opinion. These sites can already be manipulated in various ways, which include:
- services that sell retweets, likes, views, subscribes, etc.
- organized groups, such as troll factories
- covert ad hoc groups that use coordinated activity to influence political discourse or spread disinformation
Many adversaries treat social media as an information warfare game, continually devising ways to amplify and spread content, often achieving amplification through coordinated actions performed by fake accounts. To do this more effectively, they often work around systems designed to prevent manipulation, the publishing of disinformation, and other actions prohibited by the platforms’ terms of service.
Billions of people around the globe get their news from social media. And the ability to accelerate the spread of disinformation is especially worrisome given that untruths seem to have an inherent virality. A 2018 study published in Science found that falsehoods are 70 percent more likely to be retweeted on Twitter than the truth. A more recent study found that Covid-19 misinformation spread faster than the truth. This phenomenon could be encouraged by recommendations fueling the efforts of groups purposely spreading bad information. These experiments illustrate the difficulty of differentiating between intentional manipulation of recommendation mechanisms and what has become standard user practice on these platforms.
Regardless, these findings present a warning about how these mechanisms could be abused to sway public opinion, especially during time-sensitive events that give manipulators brief windows to spread misinformation effectively before their activity is detected: the period leading up to an election, an attempt to time the rise or fall of a stock price, or the middle of a mass vaccination campaign that relies on winning the trust of the vast majority of a country’s population.
Why focus on recommendations?
Machine learning models on e-commerce sites learn to recommend products to shoppers based on the items they’ve browsed or previously purchased. The recommendation mechanisms on social networks operate in a similar way. They train models based on their users’ interactions with other users and content to provide curated timelines, ranked search results, and recommendations of users to follow and groups to join. Some of these tools utilize machine learning techniques that create models based on the way users behave on the platform.
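To make that concrete, the following sketch (our illustration rather than any platform’s actual code; the event log and names are invented) shows how raw interaction events might be turned into the user-item matrix such a model trains on.

```python
# Illustrative only: turn a hypothetical log of (user, item) interaction
# events, such as purchases or retweets, into a user-item count matrix.
import numpy as np

events = [
    ("alice", "item_1"), ("alice", "item_2"),
    ("bob", "item_2"), ("bob", "item_3"),
    ("carol", "item_1"), ("carol", "item_3"),
]

users = sorted({u for u, _ in events})
items = sorted({i for _, i in events})
user_idx = {u: k for k, u in enumerate(users)}
item_idx = {i: k for k, i in enumerate(items)}

# Each cell counts how often a user interacted with an item; this matrix is
# the raw material a recommendation model is trained on.
interactions = np.zeros((len(users), len(items)))
for u, i in events:
    interactions[user_idx[u], item_idx[i]] += 1

print(interactions)
```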
The output of these models reflects the behavior contained in their training sets. Consciously or unconsciously, groups that actively spread political messaging online seem to have some sense of how these models work and have utilized them to their benefit. Understanding how that manipulative behavior might affect those underlying models presents a unique opportunity to see how social engineering, disinformation, scams, and even legitimate marketing tactics spread content.
To illustrate how simple manipulation techniques can be used to affect recommendations on a social network, we collected data from Twitter and used it to train models which were then used to implement simple recommendation mechanisms. We then performed experiments where we poisoned the original datasets, trained new models with the poisoned data, and observed how recommendations changed.
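The sketch below illustrates that poison-retrain-observe loop in miniature. It is not the code from our study: a toy user-by-account retweet-count matrix and plain cosine similarity stand in for the real dataset and trained model.

```python
# Illustrative poison-retrain-compare loop. Rows are retweeting users,
# columns are retweeted accounts; cosine similarity between columns stands
# in for a trained recommendation model.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

clean = np.array([
    [3, 0, 1, 0],
    [0, 2, 0, 1],
    [1, 0, 4, 0],
], dtype=float)

def account_similarities(retweets):
    # The "model": how similar each retweeted account is to every other one.
    return cosine_similarity(retweets.T)

baseline = account_similarities(clean)

# Poisoning: users who already retweet account 0 are made to also retweet
# account 3, so the two accounts' retweet patterns become similar.
poisoned = clean.copy()
poisoned[0, 3] += 5
poisoned[2, 3] += 5

after = account_similarities(poisoned)
print("account 3 vs account 0 similarity before/after:",
      baseline[3, 0], after[3, 0])
```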
Collaborative filtering—what’s that?
We chose to study recommendation mechanisms based on collaborative filtering models.
Collaborative filtering is a machine learning technique that can be used to build a model that encodes similarities between users and content based on how users have previously interacted with them. In our experiments, user preference data was represented by how often users retweeted other users’ content.
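As a rough illustration of what such a model looks like, the sketch below factorizes a toy user-by-account retweet-count matrix into latent vectors. TruncatedSVD stands in for whatever factorization a real platform (or our full implementation) might use; the data and parameters are invented for the example.

```python
# Illustrative collaborative filtering via matrix factorization. Entry (u, a)
# of the sparse matrix is how often user u retweeted account a (toy data).
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
retweets = csr_matrix(rng.poisson(0.3, size=(50, 20)).astype(float))

svd = TruncatedSVD(n_components=8, random_state=0)
user_factors = svd.fit_transform(retweets)   # one latent vector per user
account_factors = svd.components_.T          # one latent vector per account

# Users and accounts now share a latent space; vectors that end up close
# together are treated as "similar" when generating recommendations.
print(user_factors.shape, account_factors.shape)
```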
We then added some poisoned data. This involved injecting additional retweets between selected accounts into the original dataset in order to cause a specific account to be recommended to a small group of control users. Accounts targeted for poisoning, and the actions those accounts performed, were selected in a variety of ways. The goal was to determine what types of user behavior most efficiently manipulated our recommendation mechanism.
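The following sketch shows one plausible form of that injection step, assuming the dataset is stored as (retweeter, retweeted_account, count) rows. The strategy shown, amplifier accounts retweeting both the target and “bridge” accounts that the control users already engage with, is illustrative; the strategies we actually varied are described in the full report, and all names here are invented.

```python
# Illustrative poisoning step: inject retweets so that amplifier accounts
# retweet both the target and "bridge" accounts the control users already
# engage with. Dataset rows are (retweeter, retweeted_account, count).
def inject_retweets(rows, amplifiers, target, bridge_accounts, n_retweets=3):
    """Return a poisoned copy of the dataset with extra retweets added."""
    poisoned = list(rows)
    for amp in amplifiers:
        # Each amplifier retweets the account we want promoted...
        poisoned.append((amp, target, n_retweets))
        # ...and accounts the control users already interact with, so the
        # model learns to place the target near those accounts.
        for bridge in bridge_accounts:
            poisoned.append((amp, bridge, n_retweets))
    return poisoned

rows = [("user_a", "acct_1", 4), ("user_b", "acct_2", 2)]
poisoned_rows = inject_retweets(rows, ["amp_1", "amp_2"], "target_acct", ["acct_1"])
print(len(rows), "rows ->", len(poisoned_rows), "rows after injection")
```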
On Twitter, coordinated groups are often formed through “follow-back” mechanisms to amplify content, keywords, phrases, or hashtags. The attacks modeled in our simulations were designed to operate in a similar manner to how coordinated groups behave on Twitter. This result may illustrate how these groups settled upon their current modus operandi.
What we found
By selecting appropriate accounts to retweet, and by varying both the number of accounts performing retweets and the number of retweets they published, it was possible to alter similarity values between specific accounts such that they were suggested to members of our control group. Our full report includes visual representations demonstrating how these mechanisms work in practice.
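One way to quantify such an effect (a sketch under our own simplifying assumptions, not the exact measurement used in the report) is to check whether the target account enters each control user’s top-N recommendations after the model is retrained on poisoned data.

```python
# Illustrative measurement: does the target account enter each control user's
# top-N recommendations once the model is retrained on poisoned data?
import numpy as np

def top_n_accounts(user_vec, account_factors, n=5):
    """Rank accounts for one user by dot product in the latent space."""
    scores = account_factors @ user_vec
    return np.argsort(scores)[::-1][:n]

def hit_rate(user_factors, account_factors, control_users, target, n=5):
    """Fraction of control users whose top-N recommendations include the target."""
    hits = sum(
        target in top_n_accounts(user_factors[u], account_factors, n)
        for u in control_users
    )
    return hits / len(control_users)

# Toy latent factors standing in for models trained on clean and poisoned data.
rng = np.random.default_rng(1)
user_factors = rng.normal(size=(50, 8))
clean_accounts = rng.normal(size=(20, 8))

# Simulate the poisoning effect: after retraining on injected retweets, the
# target account's vector has drifted toward the control users' vectors.
control_users, target = [0, 1, 2], 7
poisoned_accounts = clean_accounts.copy()
poisoned_accounts[target] = user_factors[control_users].sum(axis=0)

print("hit rate before:", hit_rate(user_factors, clean_accounts, control_users, target))
print("hit rate after: ", hit_rate(user_factors, poisoned_accounts, control_users, target))
```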
Twitter does seem to be aware of how the site’s mechanisms have been manipulated, possibly in nearly the exact ways we tested. Many of the accounts in this dataset have since been suspended by Twitter, including, in some cases, specific accounts we focused on during our experiments.
However, our experiments intentionally utilize very simple mechanisms, and are only designed to approximate how a social network recommendation system might work. If our approximations do reflect how real recommendation mechanisms work, it may be possible for groups seeking to manipulate those mechanisms to further alter their behavior in ways that amplify content more effectively. It is, however, not possible to determine how close our implementations were to the actual mechanisms used in social networks, and thus our results should be viewed as illustrative.
A full report including methodology, results, and the code used to perform these experiments can be found at https://github.com/r0zetta/collaborative_filtering/