As part of the Horizon 2020 SHERPA project, I’ve been studying adversarial attacks against smart information systems (systems that utilize a combination of big data and machine learning). Social networks fall into this category – they’re powered by recommendation algorithms (often based on machine learning techniques) that process large amounts of data in order to display relevant information to users. As such, I’ve been trying to determine how attackers game these systems. This post is a follow-up to F-Secure’s recent report about brexit-related amplification.
In this article, I’d like to share a methodology I’ve been developing to observe “behind the scenes” amplification on Twitter. I will illustrate how I applied this methodology in an attempt to discover political disinformation around pro-leave brexit topics, and, hopefully, to further clarify how much overlap exists between accounts promoting far-right ideology in the US and accounts pushing pro-leave ideology in the UK.
Step 1: Collect a list of candidate accounts
I wrote a simple crawler using the Twitter API. It does the following:
- Pull a new target account from a queue of targets
- Gather Twitter user objects of the accounts the target is following
- For each user object, run a set of string matches and regular expressions against text fields (name, description, screen_name)
- In this experiment, string searches included things like “yellowvest”, “hate the eu”, “istandwithtommy”, “voted brexit”, “qanon”, and “maga”.
- Check the list of accounts followed against a pre-compiled list of roughly 500 influencer accounts. This list was collected over several months by hand-visiting Twitter accounts identified through data analysis and manual research.
- If any of the above matched, add the account name to the queue (if it isn’t already on it, and if we haven’t already queried the account), and save the user object for later processing steps.
- Also, save a node-edge graph representation of the network, as it is crawled
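The crawl loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual crawler: `fetch_following` is a hypothetical callable standing in for the Twitter API request that returns user objects for the accounts a target follows, `influencers` is a set of lowercase screen names, and the sketch assumes that a followed account counts as a match if its text fields contain a keyword or if it appears on the influencer list. It also records graph edges only for matched accounts, which is a choice made here to keep the example small.

```python
import re
from collections import deque

# Keywords quoted in the text above; the full list used in the crawl is not published.
KEYWORDS = ["yellowvest", "hate the eu", "istandwithtommy", "voted brexit", "qanon", "maga"]
KEYWORD_RE = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)
TEXT_FIELDS = ("name", "description", "screen_name")


def matches_keywords(user):
    """Return True if any text field of a user object contains a keyword."""
    return any(KEYWORD_RE.search(user.get(field) or "") for field in TEXT_FIELDS)


def crawl(seeds, fetch_following, influencers, max_targets=100):
    """Breadth-first crawl; fetch_following is a hypothetical API wrapper."""
    queue = deque(seeds)
    queued = set(seeds)  # every name ever queued, so no account is queried twice
    saved = {}           # screen_name -> user object, kept for later processing
    edges = []           # (follower, followed) pairs for the node-edge graph
    for _ in range(max_targets):  # hard cap, since the queue outgrows the crawl
        if not queue:
            break
        target = queue.popleft()
        for user in fetch_following(target):
            name = user["screen_name"]
            if matches_keywords(user) or name.lower() in influencers:
                saved[name] = user
                edges.append((target, name))
                if name not in queued:
                    queued.add(name)
                    queue.append(name)
    return saved, edges
```

Plain substring matching is cheap but over-matches (for example, “maga” also appears in “magazine”), so the saved objects still need the later filtering steps.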
The crawler was seeded with a couple of random troll accounts I found while browsing Twitter.
As with any Twitter account crawl, the process never finishes – the length of the queue grows faster than it can be consumed. I ran the crawler for just a couple of days (between April 2nd and 3rd 2019), during which it collected about 260,000 “interesting” Twitter user objects. For this experiment, I filtered the objects into a few different groups:
- By filtering on account creation date, I collected a list of recently created accounts (created during or after March 2019). This yielded a list of just over 2,000 accounts.
- By filtering on strings in description and name fields, I collected lists of accounts self-associating with Tommy Robinson. This yielded a list of just over 900 accounts.
- By filtering based on who the accounts were following, I collected a list of accounts that followed at least 10 of the pre-compiled list of influencer accounts. This yielded a list of just over 23,000 accounts.
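The three filters above might look something like the following sketch. It assumes the saved user objects are dictionaries carrying the standard v1.1 fields (`created_at`, `name`, `description`), and that a separate hypothetical mapping `following` holds, for each crawled account, the screen names it follows. The search terms passed to `self_associating` are illustrative; the exact strings used in the experiment are not listed.

```python
from datetime import datetime, timezone

# Timestamp layout used in Twitter v1.1 user objects,
# e.g. "Mon Nov 29 21:18:15 +0000 2010".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def created_on_or_after(users, cutoff):
    """Keep user objects whose account was created at or after cutoff."""
    return [
        u for u in users
        if datetime.strptime(u["created_at"], TWITTER_TIME_FORMAT) >= cutoff
    ]


def self_associating(users, terms):
    """Keep user objects whose name or description contains any term."""
    lowered = [t.lower() for t in terms]

    def text(u):
        return " ".join((u.get(f) or "") for f in ("name", "description")).lower()

    return [u for u in users if any(t in text(u) for t in lowered)]


def following_at_least(following, influencers, minimum=10):
    """Return account names that follow at least `minimum` influencers."""
    return [
        name for name, followed in following.items()
        if len(set(followed) & influencers) >= minimum
    ]


# Example cutoff for the "new accounts" group: created during or after March 2019.
MARCH_2019 = datetime(2019, 3, 1, tzinfo=timezone.utc)
```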
Step 2: Observe candidate account behaviour
I wrote a second script that uses the Twitter API’s statuses/filter functionality to follow the activity of a list of candidate accounts collected in the previous step. I ran this script with the first two candidate lists – Twitter’s standard API allows a maximum of 5,000 accounts to be followed in this way. Full tweet objects were pre-processed (abbreviated to include only the metadata I was interested in looking at), and written sequentially to disk for later processing. I allowed this script to run for a few days (between April 2nd and 3rd 2019), in order to capture a representative sample of the candidate accounts’ activities. I then used Python and Jupyter notebooks to analyze the collected data.
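As a rough illustration of the pre-processing step, the sketch below reduces a full v1.1 tweet object to a handful of fields and serializes it as one JSON line, ready to be appended to a file. The choice of fields and the output key names (`screen_name`, `retweeted_screen_name`) are assumptions made for this example, since the metadata actually kept is not listed. The `follow_parameter` helper only builds the comma-separated `follow` value that statuses/filter expects and enforces the 5,000-account limit mentioned above; it does not contact the API.

```python
import json

MAX_FOLLOW = 5000  # standard-tier limit on followed accounts, as noted above


def follow_parameter(user_ids):
    """Build the comma-separated `follow` value for statuses/filter."""
    ids = [str(i) for i in user_ids]
    if len(ids) > MAX_FOLLOW:
        raise ValueError(f"{len(ids)} accounts exceeds the {MAX_FOLLOW} limit")
    return ",".join(ids)


def abbreviate(tweet):
    """Reduce a full tweet object to the fields this sketch cares about."""
    retweeted = tweet.get("retweeted_status")
    return {
        "id_str": tweet["id_str"],
        "created_at": tweet["created_at"],
        "screen_name": tweet["user"]["screen_name"],
        "text": tweet.get("text", ""),
        # None for original tweets, the retweeted author's name otherwise
        "retweeted_screen_name": retweeted["user"]["screen_name"] if retweeted else None,
    }


def to_line(tweet):
    """Serialize an abbreviated tweet as one JSON line for sequential writes."""
    return json.dumps(abbreviate(tweet)) + "\n"
```

Writing one JSON object per line keeps the capture append-only, so an interrupted run loses at most the record being written.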
The “new accounts” group tweeted about a variety of topics, and retweeted accounts from both the US and UK. Here’s a graph visualization of their activity:
Names in larger fonts indicate accounts that were retweeted more often.
In comparison, the list of “tommy” accounts promoted very samey content, giving rise to this rather awkward visual:
When analyzing the collected data, I immediately noticed that both groups were promoting a few almost brand new accounts. Here’s the most prominent of them – BringUkip. Notice how the account’s pinned tweet doesn’t even spell UKIP party leader Gerard Batten’s name correctly. I suspect this isn’t an official UKIP account.
During the (less than two days) collection period, close to 850 of the 2000 accounts in the “new accounts” group retweeted BringUkip almost 1800 times. In that same time span, over 600 of the approximately 900 accounts in the “tommy” group retweeted BringUkip close to 1000 times.
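Figures like these can be derived from the captured records with a simple tally. The sketch below is one way to do it, assuming each record is a dictionary with a `screen_name` key (the retweeting account) and a `retweeted_screen_name` key (the account being retweeted, or `None` for original tweets); both key names are hypothetical and depend on how the tweets were abbreviated.

```python
from collections import Counter, defaultdict


def amplification_stats(records):
    """Map each retweeted account to (retweet count, distinct retweeters)."""
    counts = Counter()
    retweeters = defaultdict(set)
    for record in records:
        target = record.get("retweeted_screen_name")
        if target:  # skip original tweets
            counts[target] += 1
            retweeters[target].add(record["screen_name"])
    return {target: (counts[target], len(retweeters[target])) for target in counts}
```

Comparing the two numbers per account separates a few very active retweeters from broad amplification by many accounts.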
Here are some of the other brand new accounts being amplified by the “new accounts” group:
Some really surprising phenomena can be observed in the above chart. For instance, Fish_171, an account that was created on March 2nd 2019 (almost exactly a month ago at time of writing) has over 12,000 followers (and follows almost 11,000 accounts). WillOfThePpl11, an account that was created on March 16th 2019 (about two weeks ago at time of writing) has published over 10,000 Tweets, which works out to more than 500 per day.
Here’s a similar chart, but for the “tommy” group. Not as much new account amplification was being done by those accounts:
Here are a few of the other new accounts being promoted by these groups. Red26478680 was created on March 30th 2019 (4 days ago at time of writing).
I happened to spot this account while browsing Twitter on my way into work this morning. It was promoting itself in a reply to Breaking911 (one of the accounts on my list of influencers):
Another hotly promoted account is HomeRuleNow, an account that was created on March 29th 2019 (5 days ago).
This account self-identifies with #Bluehand – a group that Twitter appears to have been actively clamping down on. This account accrued over 1000 followers in the five days since it was created.
Here’s one more – AGirlToOne:
This account was created on March 23rd 2019 (just over a week ago). It was retweeted by over 1000 of the approximately 2000 “new accounts” users during the collection period.
While browsing through a random selection of timelines in the “new accounts” group, I found some interesting things. Here is a tweet from one of the users (ianant4) discussing how to evade Twitter’s detection mechanisms:
I also found eggman25503141:
This account is quite bot-like. All tweets have the same format, starting with “YOU WONT SEE THIS ON #BBCNEWS”, an extra line break, and then a sentence, followed by a few pictures. All of this account’s tweets promote extreme anti-Islamic content. Most of this account’s tweets have received no engagement. Except this one:
The above tweet received, at the time of capture, 48 replies, 374 retweets, and 254 likes. Many of the accounts collected in the first step of my collection process participated in retweeting the above.
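The repeated-format pattern noted for eggman25503141 is the kind of signal that can be checked mechanically. As a rough sketch (not something used in this research), the function below measures what share of an account's tweets open with the same first line; a value close to 1.0 indicates templated output. The function name and the decision to compare only the first line are assumptions made for illustration.

```python
from collections import Counter


def shared_opening_ratio(texts):
    """Fraction of tweets that share the single most common first line."""
    openings = [t.split("\n", 1)[0].strip().upper() for t in texts if t.strip()]
    if not openings:
        return 0.0
    most_common_count = Counter(openings).most_common(1)[0][1]
    return most_common_count / len(openings)
```

A high ratio alone does not prove automation, since some people post in a fixed style by hand, so it is best treated as a prompt for manual review.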
For fun, here’s one of the seed accounts used.
The methodology outlined in this article works well because the studied demographic uses plenty of self-identifying keywords in their Twitter descriptions. For other subject matter, gathering a candidate list may be more problematic.
Given that the account list I gathered was based on following-follower relationships between Twitter accounts, it is clear that many pro-brexit Twitter accounts and MAGA accounts follow each other. The content these accounts promote is, however, somewhat separate, and dependent on the account’s persona (MAGA versus brexit). Some accounts I looked at promoted both types of content, but they were in the minority. Most of the accounts observed in this research were/are being operated, at least in part, by actual human beings. In between heavily retweeting content, these accounts occasionally publish original tweets, converse with each other, and troll other people. I believe the results of this piece of research strongly suggest the existence of a well-organized and potentially large network of individuals who are creating and operating multiple Twitter accounts in order to purposefully promote political content directly under our noses.