Britons who favor Brexit are being supported by far-right Twitter users from outside the United Kingdom. That is the conclusion of Andy Patel, a researcher at the F-Secure Artificial Intelligence Center of Excellence, after examining 24 million tweets related to Brexit. In episode 22 of Cyber Security Sauna, Patel discusses his research, the spread of misinformation, and how social media is often nothing more than an echo chamber for people with the same opinions. Listen to the episode or read on for the transcript, and don't forget to subscribe and leave a review!
Janne: Can you tell us a little about your research – what were you looking for, and how did you do the research?
Andy: So I have been collecting tweets, basically just from a standard Twitter API stream with the search term Brexit, writing those tweets, or an abbreviated form of the metadata in those tweets to disk, and then loading up a portion of, or all of those tweets, and analyzing the data.
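As a rough illustration of "an abbreviated form of the metadata," the sketch below trims one tweet down to a small record and appends it to a file, one JSON object per line. It is not the script used in the research. The input field names assume the classic Twitter API v1.1 tweet JSON, and the output record layout, the function names, and the file name in the comment are hypothetical.

```python
import json


def abbreviate(tweet):
    """Keep only the fields needed for later analysis (assumed v1.1 tweet JSON)."""
    retweeted = tweet.get("retweeted_status")
    entities = tweet["entities"]
    return {
        "id": tweet["id_str"],
        "created_at": tweet["created_at"],
        "user": tweet["user"]["screen_name"],
        "text": tweet["text"],
        "hashtags": [h["text"].lower() for h in entities["hashtags"]],
        "urls": [u["expanded_url"] for u in entities["urls"]],
        "mentions": [m["screen_name"] for m in entities["user_mentions"]],
        "retweet_of": retweeted["user"]["screen_name"] if retweeted else None,
        "retweet_of_id": retweeted["id_str"] if retweeted else None,
        "reply_to": tweet.get("in_reply_to_screen_name"),
    }


def append_record(path, record):
    """Append one record to a JSON-lines file, e.g. append_record("brexit.jsonl", record)."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")


# A made-up tweet, shaped like the streaming API's output.
sample = {
    "id_str": "1",
    "created_at": "Mon Jan 14 10:00:00 +0000 2019",
    "text": "RT @bob: Vote today #Brexit",
    "user": {"screen_name": "alice"},
    "entities": {
        "hashtags": [{"text": "Brexit"}],
        "urls": [],
        "user_mentions": [{"screen_name": "bob"}],
    },
    "retweeted_status": {"id_str": "0", "user": {"screen_name": "bob"}},
    "in_reply_to_screen_name": None,
}
record = abbreviate(sample)
```

Storing one record per line means the file can later be read back one tweet at a time without loading everything into memory.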
Okay, so basically you just grabbed every tweet about Brexit, and saw what you could find.
Yeah. Initially a lot of it was just about counters. So I would run analysis across the data for a 24-hour period, and I would count how many times I’d seen a user tweeting, how many times a user published a retweet instead of an original tweet, how many times each user was retweeted, how many times a hashtag was used, URLs that were shared, words that were seen in tweets, so it tokenized all the words, things like that. How many times a user was replied to, and just tried to build a picture of the trends that happened every 24-hour period.
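The counting described here can be sketched with Python's `collections.Counter`. This is a minimal illustration under assumptions, not the code from the research: the record layout (`user`, `retweet_of`, `reply_to`, `hashtags`, `urls`, `text`) is hypothetical, and the word tokenizer is a deliberately crude regular expression.

```python
import re
from collections import Counter


def daily_counters(records):
    """Tally per-user and per-item activity for one batch (e.g. 24 hours) of tweets."""
    names = ("tweeted", "retweets_made", "was_retweeted",
             "was_replied_to", "hashtags", "urls", "words")
    counts = {name: Counter() for name in names}
    for r in records:
        counts["tweeted"][r["user"]] += 1
        if r["retweet_of"]:
            counts["retweets_made"][r["user"]] += 1
            counts["was_retweeted"][r["retweet_of"]] += 1
        if r["reply_to"]:
            counts["was_replied_to"][r["reply_to"]] += 1
        counts["hashtags"].update(r["hashtags"])
        counts["urls"].update(r["urls"])
        # Crude tokenizer: lowercase runs of letters and apostrophes.
        counts["words"].update(re.findall(r"[a-z']+", r["text"].lower()))
    return counts


# Made-up records for illustration.
records = [
    {"user": "alice", "retweet_of": "bob", "reply_to": None,
     "hashtags": ["brexit"], "urls": [], "text": "RT @bob: Brexit vote today"},
    {"user": "alice", "retweet_of": None, "reply_to": "carol",
     "hashtags": ["brexit", "nodeal"], "urls": ["https://example.com/a"],
     "text": "@carol no deal is risky"},
    {"user": "dave", "retweet_of": "bob", "reply_to": None,
     "hashtags": ["brexit"], "urls": [], "text": "RT @bob: Brexit vote today"},
]
counts = daily_counters(records)
top_hashtags = counts["hashtags"].most_common(5)
```

Calling `most_common(n)` on any of the counters gives the "top N" lists that make day-to-day trends visible.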
So what does that tell you?
It gave me an idea of which accounts were tweeting a lot, which accounts were amplifying a lot of content, which accounts were influencers, most of those being well-known personalities, like Theresa May, or Jeremy Corbyn, or some of the more prominent accounts that are on either the Leave or Remain side of the conversation. And, of course, the hashtags kind of give an indication of what people are talking about. So if some piece of news comes up, then we can see for that day that that hashtag was more prominent.
So this is like a baseline, but it’s a baseline of a situation where whatever amplification or phenomenon you’re looking at is already taking place.
Yeah, so if I looked at that previous 24 hours’ worth of data, every day, I started to get an idea of what was normal. What accounts got retweeted a lot, what accounts participated a lot in the conversation, and what hashtags were most frequently seen. So when things changed, you can see that something happened and start looking into it. For instance, if there was something that happened in Parliament that day. I was also reading a lot of the tweets just searching for Brexit in the actual UI, just scrolling down reading through to understand what was going on. I think that is obviously important when you’re trying to understand the conversations that are happening.
The other thing I did quite heavily was graph analysis, so looking at the interactions between users. So for instance if User A retweets User B, that creates a link between two nodes in a graph. Or if user A replies to User B, or mentions or so on, then you can build a node edge graph of all the interactions that happened during that particular portion of Twitter activity, and then you can visualize it. And by visualizing it you can see which participants participated in which parts of the conversation. By doing that I could very easily distinguish between pro-Remain conversation, pro-Leave conversation, conversation about the Labour party, things like this. And that formed the basis of the further research, which was then to understand what was going on in each of these conversations, what sort of activity, what the users were doing, what the prominent hashtags in those communities were, things like that.
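One way to picture the node-edge graph is as a weighted edge list, where each retweet, reply, or mention adds weight to a directed link from the acting account to the target account. The sketch below is an illustration under assumptions: the record layout is hypothetical, and the `Source`, `Target`, `Weight` column names are chosen on the assumption that the visualization tool's spreadsheet import accepts them.

```python
import csv
import io
from collections import Counter


def interaction_edges(records):
    """Count directed interactions: (acting account, target account) -> weight."""
    edges = Counter()
    for r in records:
        # A set ensures one tweet adds at most one unit of weight per target,
        # even if it both retweets and mentions the same account.
        targets = set(r["mentions"])
        if r["retweet_of"]:
            targets.add(r["retweet_of"])
        if r["reply_to"]:
            targets.add(r["reply_to"])
        targets.discard(r["user"])  # ignore self-interactions
        for target in targets:
            edges[(r["user"], target)] += 1
    return edges


def edges_to_csv(edges):
    """Serialize the edge list as CSV text for import into a graph tool."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])
    return out.getvalue()


# Made-up records for illustration.
records = [
    {"user": "alice", "retweet_of": "bob", "reply_to": None, "mentions": ["bob"]},
    {"user": "carol", "retweet_of": "bob", "reply_to": None, "mentions": ["bob"]},
    {"user": "alice", "retweet_of": None, "reply_to": "carol",
     "mentions": ["carol", "dave"]},
]
edges = interaction_edges(records)
csv_text = edges_to_csv(edges)
```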
So once you had the groundwork in place, what were some of the first things you started noticing?
Well, I noticed for instance, and this is in the research, accounts that were quite highly amplified that perhaps aren’t well-known personalities within the UK political sphere, and I dug into what was going on there. For instance, Twitter accounts that are linked to websites that pretend to be real news websites but aren’t, or just accounts that are pushing a certain political agenda.
You also started noticing that some of these accounts were not just talking about Brexit, but other topics as well.
Yeah. Something that I’ve noticed over the last half year was things like these France protests. One of the days when I looked at the data for the last 24-hour period, I saw that the France protest hashtag was fairly high up in the list of hashtags that had been most seen on that day. That was actually traced back to a single tweet from a US right wing account that then got retweeted by like five or six thousand accounts. But it happened to have hashtag Brexit in it, so it showed up in my collection.
So what’s the common denominator here? Why would someone who cares passionately about Brexit one way or the other, care about the yellow vests or MAGA or whatever?
Yeah, exactly. That’s what makes this whole thing a bit odd. Because you have accounts that tweet about Donald Trump, tweet about Make America Great Again, tweet about protests in France, tweet about Brexit, specifically go into detail tweeting about how “World Trade Organization No-Deal Brexit is good for Britain,” and these same accounts are tweeting about many different things happening in the world. And these accounts are not necessarily from the UK, or from France or from the US. They might be from one of those places, or elsewhere. And you also see tweets about AfD in Germany, things like this.
What am I missing? Why would somebody care about all those topics?
Well, indeed. Why would they? And that’s sort of what I’ve been trying to highlight in the research, that there are these accounts that seem to be very interested in all of these more right-leaning political causes, and these accounts are pushing all of those agendas, not just pro-Leave or pro-No-Deal-Leave, or pro-AfD.
That’s very interesting. I’m wondering if I just have my conspiracy hat screwed too tightly on my head, because I’m thinking the only thing in common with all these topics is that they tend to encourage discontent and division inside each of these countries.
Yeah, it’s very much the case. It’s populist right wing ideas, these sort of things.
Hmm. So it’s almost like there’s somebody with an agenda like that, somebody who would benefit from discord in these nations. Is that a fair assessment?
Yeah, I think it does look semi-organized in that way.
If there is an actor that wants to amplify a specific message, do you think it’s more likely that they’ll create original content and amplify that, or that they’ll just pick an existing opinion that they like, and try to push that out of proportion?
I think it’s a bit of both, because if a well-known Twitter personality tweets something that resonates amongst that group, then it’s obviously going to get a lot of amplification anyway. But in the case of accounts that aren’t what I would consider high profile, then it does look a bit more suspicious. It looks like it’s possible that there was some sort of behind-the-scenes coordinated effort to amplify that particular tweet. An everyday person can have their tweet go viral if they just happen to tweet the right thing at the right time, and someone who has a lot of followers retweets it and so on. So you do see those things happening from time to time.
Is it sometimes quite hard to spot what’s artificial amplification and what’s just somebody striking a chord?
It is, yes, it is. And if you think about it, if someone wanted to buy retweets for a particular tweet, they’ll pay a company – and it’s very easy, you can just Google for “buy retweets.” So you’ll pay a company and then they’ll, over some period of time, be it 24 hours or a week or whatever, they’ll have their fleet of accounts retweet that tweet. And if you – let’s say during this two month period I collected like 25 million tweets. So if you are trying to find out if someone paid for a retweet, it’s possible that that fleet of accounts that belonged to the service that provided the pay for retweet, those accounts may only show up once in the entire data set. Just having retweeted that one tweet over some period of time during the time that the data was being collected. And finding that is also really difficult because probably half of the accounts that were seen in that data set may have only been seen once.
Yeah, I would think that if there was artificial amplification on just one tweet of a particular account, it would be more conspicuous than if it was happening over time, because isn’t it a clearer anomaly from the baseline in that case?
Well, if you think about how you would try and discover such activity, what you could do is say “Look for all accounts that have only been seen once,” and then of those, “Show me only the accounts that have retweeted something” and then of those, “Show me which tweets were retweeted.” So you would then have a list of tweets that were retweeted by accounts that were only seen once in the whole data set. How would you then figure out which of those were real users and which of those belong to a fleet of users that are owned by a retweet service? You know, it’s very difficult, unless you manually inspect those accounts. And very often they’re built to look like normal people’s accounts.
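The three-step filter described above can be sketched as follows. The record fields (`user`, `retweet_of_id`) are hypothetical, and, as the answer stresses, the output is only a list of candidates for manual inspection. It does not tell real users apart from a paid retweet fleet.

```python
from collections import Counter, defaultdict


def tweets_boosted_by_one_off_accounts(records):
    """Map each retweeted tweet ID to the accounts that appear exactly once
    in the data set and whose single appearance was a retweet of that tweet."""
    records = list(records)  # two passes are needed
    seen = Counter(r["user"] for r in records)
    boosted = defaultdict(list)
    for r in records:
        if seen[r["user"]] == 1 and r["retweet_of_id"]:
            boosted[r["retweet_of_id"]].append(r["user"])
    return dict(boosted)


# Made-up records for illustration.
records = [
    {"user": "regular", "retweet_of_id": None},
    {"user": "regular", "retweet_of_id": "t1"},   # seen twice: excluded
    {"user": "oneoff_a", "retweet_of_id": "t1"},
    {"user": "oneoff_b", "retweet_of_id": "t1"},
    {"user": "oneoff_c", "retweet_of_id": None},  # seen once, but not a retweet
]
candidates = tweets_boosted_by_one_off_accounts(records)
```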
But aren’t there telltale signs you can see that this might be a bot account? I’ve heard that the ratio of original tweets and retweets tends to be different when there’s a bot trying to amplify a message.
Well, I don’t know if that’s a clear indication, because some people just do like to hit the retweet button a lot. It’s almost impossible, unless you can actually find the person who’s behind that account and talk to them, it’s almost impossible to determine whether said account is real, if it’s a real person, or if it’s automated or whatever.
I think that’s crazy. Intuitively I would think that you would be able to tell bots from people. But I guess, like what you’re saying, like how a lot of people just retweet stuff anyway, I guess that makes sense. I would hope that my Twitter account doesn’t look like a bot, but maybe it does, I don’t know.
Well, I can check for you if you want. (Laughing)
Do you have a metric, like how botty is an account?!
(Laughing) I can run scripts against your account and show you the results. Most people’s accounts don’t look like the ones I highlighted in that research.
So is this being orchestrated by a single actor, like a person or a group of people with an agenda, or do you think there are multiple parties benefiting from artificial amplification for their own purposes?
I mean, that’s sort of difficult to say. Obviously some of it is people agreeing with those ideas. Some of it is you know, what you might know as originating from things like 4chan, and /r/The_Donald and these sort of groups. Some of it may even be like nation states leveraging what’s already there. But it’s impossible to say what’s what in that respect.
Did you notice any difference between the organic tweets and the artificially amplified tweets in the pro-Leave and pro-Stay camps of Brexit?
Like some of the accounts on the Remain side that do tweet a lot are followed by people that I follow. So I’m going to assume, given the people that are following them, that they are real people. Those accounts are real people, they’re just very, very into Twitter. But then the accounts that tweeted quite heavily on the other side, a lot of them in my opinion displayed suspicious behavior. You know, like accounts that were tweeting from US time zone, or these sort of things.
So are you saying that one side of the Brexit conversation is being artificially inflated, and the other side is just benefiting from people who are super into Twitter?
I think that there’s people who are super into Twitter on both sides, but I think that – my gut feeling is that there is more artificial inflation on the Leave side.
So with artificial amplification of tweets and opinions, you always want to talk about attribution, like who’s behind it. And I know you’re a careful guy, you don’t want to go out on a limb too much, but if I pushed you to say who’s behind this, what would you say?
I mean, to me this looks like the global far right.
That doesn’t sound like a very orchestrated bunch to me.
They actually are probably more organized than people give them credit for.
Yeah. A friend of mine who also researches this field found plenty of interesting posts on different forums and also things on Pastebin where they detail meme making, advertising, all kinds of topics around trying to put out messages that are engaging, create memes, these sorts of things. How to organize around that, instructions on how to make fake BLM accounts and pretend you’re part of the other side so you can infiltrate that part of the organization, and talking about which divides on the US Democratic-thinking side were good to push on. You know, like Bernie Sanders, or Black Lives Matter, or Hillary Clinton, they had like a list of like “We can push people apart on these issues.” Lots and lots of references to books that you should read, all kinds of things like this. It was like a training guide for doing this stuff.
Like an alt-right internet influencing training guide.
Yeah. And it definitely looked like there was some thinking and organization and collaboration behind this stuff. And then researchers in Germany also have found people talking on Discord servers about this same thing, like organizing around certain causes, instructing people to go post comments on YouTube, these sorts of things.
That is very interesting.
A lot of the organization doesn’t happen in plain sight, but it can be found if you look for it. I think there was a recent publication in Germany about the Bavarian elections and all the right wing campaigns that have happened there. I haven’t actually read the whole thing yet, but there are many many detailed campaigns that they investigated, and it was all far right.
This is very interesting, because back in my day I’ve come across similar guides, how to behave, how to create specific content, stuff like that, on the far left as well, but you’re saying that lately you’re seeing more of it on the far right side.
Well, I mean maybe it’s just because that’s what I’ve been looking at, and because of the people that I’ve been collaborating with recently and what they’ve found.
That’s very interesting because these extreme ends of the political spectrum don’t strike me as very uniform people. To me they appear more like crackpots with very specific agendas. And for them to be able to coordinate their efforts across the board, across all these different kinds of topics in all these different countries, it just seems bizarre.
Well, if you look at the yellow vest thing, that started off as a working person’s protest about petrol prices. And then it got co-opted by the right wing, and now you see people in London wearing yellow vests, going around and harassing people. That coordination has happened, people have thought about it, and how to use these things, and you can now find kelta liivi in Finnish Twitter.
The Finnish hashtag for that.
Yeah. So there is some sort of thought around these things and how to co-opt these. And again, with the Macron leaks and those things.
This is turning into a very political episode of Cyber Security Sauna. So to get us back on the security topics, you’re saying it’s hard for people to tell the difference between what’s being amplified naturally, and what’s artificially amplified. So what is a Twitter user to do? How can I tell what’s disinformation and what’s not, and try not to be a part of the problem here?
Good question. To be honest I don’t have a definitive way of identifying – for a layperson to identify what’s real and not. I think the idea of social media is that – and there have been studies about this – that people do share links to articles that they haven’t even read, if the headline is clickbaity and it aligns with their thinking. And then of course you tend to be in a bit of a bubble anyway most likely, and so the people that see that link being shared might also share it, and there’s not a lot of fact checking behind these things. And so what happens then is that someone shares a clickbaity link headline, it sort of ends up going viral by virtue of a lot of people agreeing with it, or “Oh my God, people need to see this.” And then by the time it’s refuted, it’s sort of too late. And so that’s the idea. I don’t know if you saw the Brexit: The Uncivil War film?
They were just pushing, you know, “Turkey and NHS, Turkey and NHS.” And all their advertising was about “80 million people are coming to the UK when Turkey joins the EU, and NHS doesn’t have money, but if we leave the EU we’ll have a lot more money to give the NHS.” And that was just the two things. And those two things went viral, and they were factually incorrect. But that’s how they got people on their side. So it’s by sharing things that resonate with people’s views, either on immigration or whatnot, that slowly pull people into then becoming part of these groups. Following more people who have the same beliefs or who are sharing the same things, and so it’s just sort of sucking people into these different conversational spheres. And that’s why you see from my graph analysis, for instance, you can see that there are very separated conversations and there’s not much talking between the two.
So what does that mean? We just have two sort of bubbles of opinion echo chambering each other’s views, but not engaging in a real discourse?
Somewhat, yeah. And there was a very interesting article about this, that a journalist joined one of these far right Twitterspheres in the US, QAnon Twittersphere, and observed what goes on in there, and he was able to get a lot of followers by live streaming something. He noticed that there was a lot of this sort of followback culture. So people posted these lists of accounts, and if you follow them they will follow you back. He ended up having many, many followers, but also he was following thousands of accounts. So at that point, what you see is just the highest amplified content. Which is of course the most well-known personalities in those spheres who get lots of retweets. And so your timeline is just this stuff that you then click retweet on. And all those people tend to retweet the same things. So this kind of behavior can also cause an amplification. And there’s nothing automated or fake in that. It’s just the way that the community works, it’s the way that the recommendations that the software gives you works.
Well, that’s the takeaway I’m getting from all this. That Twitter doesn’t seem to be a great tool for the exchange of ideas, but more an echo chamber for reinforcing what I already think, or what I already think I know.
Yeah, yeah. I suppose it is a bit like that.
So what did the work you were doing for the research look like? What were your tools, what were your methods of research?
To collect tweets, it was just a simple Python script. I think I used Tweepy, which is a wrapper for the Twitter API. To do the analysis work, I just used Python in Jupyter notebooks. So I just created myself a helper module that contained like a hundred functions for doing different things, and every time I did a bit more analysis or needed to do something else, I created a function that then I ended up putting into this helper library.
So what did that helper do?
Simple things like reading in the data, getting these counters out of the data, “Look for a person that tweeted this” or “Look for the people who retweeted this person” or “Look for the people who use this hashtag,” all kinds of helper functions.
But that’s more a script than an AI thingy.
All that type of analysis was just scripting. And I also had helper functions to generate simple graphs, like bar charts or line graphs, things like that. But then the machine learning part of it is the community detection part of graph analysis. So once you have this node edge graph structure built, I used Gephi to visualize it, but I also used Python igraph to do community detection on it. And what it does is you give it the data structure that represents the nodes and edges, and it spits you out communities, and each community basically has a label (in this case it would be a number, like 0, 1, 2, 3, 4) and then a list of the nodes that belong to that community, which are then Twitter user accounts. And from there then you have a list of accounts that belong to this community, a list of accounts that belong to that community, and from there I can say, “Okay, tell me what hashtags this list of accounts published over the data set, tell me which accounts this list of accounts retweeted the most, tell me which URLs this list of accounts shared,” these sorts of things.
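To make the input and output shapes concrete, here is a small pure-Python sketch. It does not use igraph, and the interview does not say which community detection algorithm was run, so weighted label propagation is swapped in here purely as a simple stand-in. The edge weights, account names, and record fields are made up for illustration.

```python
from collections import Counter, defaultdict


def detect_communities(weighted_edges, max_rounds=20):
    """Weighted label propagation (stand-in algorithm): each node repeatedly
    adopts the label that carries the most edge weight among its neighbours."""
    neighbours = defaultdict(Counter)
    for (a, b), weight in weighted_edges.items():
        neighbours[a][b] += weight
        neighbours[b][a] += weight
    label = {node: node for node in neighbours}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(neighbours):
            votes = Counter()
            for other, weight in neighbours[node].items():
                votes[label[other]] += weight
            top = max(votes.values())
            # Deterministic tie-break: smallest label among the heaviest.
            best = min(l for l, w in votes.items() if w == top)
            if best != label[node]:
                label[node] = best
                changed = True
        if not changed:
            break
    groups = defaultdict(list)
    for node, node_label in label.items():
        groups[node_label].append(node)
    # Number the communities 0, 1, 2, ... with a sorted member list each.
    ordered = sorted(groups.items())
    return {i: sorted(members) for i, (_, members) in enumerate(ordered)}


def most_retweeted_by(records, members):
    """Which accounts did this list of accounts retweet the most?"""
    members = set(members)
    return Counter(r["retweet_of"] for r in records
                   if r["user"] in members and r["retweet_of"])


# Two tightly linked groups joined by one weak link.
edges = {("a", "b"): 3, ("b", "c"): 3, ("a", "c"): 3,
         ("x", "y"): 3, ("y", "z"): 3, ("x", "z"): 3,
         ("c", "x"): 1}
communities = detect_communities(edges)

records = [
    {"user": "a", "retweet_of": "b"},
    {"user": "c", "retweet_of": "b"},
    {"user": "x", "retweet_of": "y"},
]
favourites = most_retweeted_by(records, communities[0])
```

The result has the same shape as described in the answer: a numeric label per community and a list of accounts belonging to it, which can then be used to filter the rest of the data set.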
So from breaking down all of the users in the data set into these lists of accounts that belong to different communities, I could then pull out data from that entire data set based on just asking what these people did. And since the data set was huge, I can’t fit it into memory. So I just made a simple yield function that iterates through all the data, and takes a couple of hours or something, and then it spits out something and then I saved it and I did analysis on that. So that’s why it kind of took a long time. Towards the end of my research it was four times a day that I would set something up to run, my computer would get hot for two hours, and then I’d come back and be able to work up the data.
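A "simple yield function" of this kind could look like the generator below, which reads one record at a time so the full data set never has to sit in memory. This is a sketch under assumptions: it presumes one JSON object per line, and the field names and the file name in the comment are hypothetical.

```python
import json
from collections import Counter


def iter_records(lines):
    """Yield one parsed record at a time from an iterable of JSON lines.
    Passing an open file handle keeps memory use flat for very large files."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def hashtags_used_by(lines, accounts):
    """Stream through the data once and count hashtags used by the given accounts."""
    accounts = set(accounts)
    counts = Counter()
    for record in iter_records(lines):
        if record["user"] in accounts:
            counts.update(record["hashtags"])
    return counts


# With a real file this would be:
#   with open("brexit.jsonl", encoding="utf-8") as handle:
#       counts = hashtags_used_by(handle, community_accounts)
sample_lines = [
    '{"user": "alice", "hashtags": ["brexit", "nodeal"]}\n',
    '\n',
    '{"user": "bob", "hashtags": ["brexit"]}\n',
    '{"user": "alice", "hashtags": ["brexit"]}\n',
]
counts = hashtags_used_by(sample_lines, ["alice"])
```

The trade-off is the one mentioned in the answer: every new question means another full pass over the data, so results are worth saving once computed.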
All right, this has been very interesting. Thanks for being with us again.
Thanks for having me.
That was our show for today. I hope you enjoyed it. Make sure you subscribe to the podcast, and you can reach us with questions and comments on Twitter @CyberSauna. Thanks for listening.