As I mentioned in a previous post, I’m writing scripts designed to analyze patterns in Twitter streams. One of the goals of my research is to follow Twitter activity around a newsworthy event, such as an election. For example, last weekend France went to the polls to vote for a new president. And so I tuned the parameters of my scripts to see what I could find.
The script in question receives a stream of Tweets based on a list of search parameters. Here are the parameters I gave it:
[‘macron’, ‘lepen’, ‘presidentielle2017’, ‘presidentielles2017’, ‘MarineLePen’, ‘Marine2017’, ‘ ElectionPresidentielle2017’ ‘enmarche’, ‘aunomdupeuple’, ‘jevote’, ’emmanuelmacron’, ‘choisirlafrance’, ‘MLP’, ‘debat2017’, ‘debatpresidentiel’, ‘jevotepour’]
I kicked the script off on the afternoon of Friday May 5th, just before 14:00 French time, and terminated it at 22:00 on Sunday May 7th, a few hours after election results had been called. The script received a stream of Twitter status objects matching the search terms above. The number of Tweets per hour varied from about 18,000 (in the middle of the night, French time) to as much as 79,000 (in the last few hours before the polls closed). Processing involved extracting metadata such as tweet language, hashtags, URLs, and mentions to a set of output files.
Quite quickly after starting the script it became apparent that there were a fair number of URLS pointing to English language political opinion pieces being shared on the stream. As the weekend went on, it was obvious that a majority of them were positive of Le Pen and negative of Macron. Here are some examples of the sort of headlines that were being shared:
- “BREAKING: WikiLeaks confirms leaked Macron emails authentic”
- “BREAKING: Macron emails lead to allegations of drug use, homosexual adventurism and Rothschild money”
- “Betting Markets Flip to Marine Le Pen in Final Hours Before Election”
- “French Police Defy Their Unions to Vote For Le Pen”
- “BOMBSHELL REPORT : Email Leak Shows Macron on Gay Lifestyle Mailing List”
One article, who’s headline read “Macron Whistleblower Dies Under Suspicious Circumstances”, insinuated that a member of the Macron campaign had been assassinated using a “heart-attack gun”. Here’s a quote from that story:
“Intelligence agencies have been using ‘heart-attack gun’ technology for years, according to a Congressional testimony video filmed in 1975. Could it be that Corinne Erhel was the victim of such technology?”
Right. Anyway, moving on…
Regardless of the configured search terms, my scripts tend to always pick up a fair amount of URLS pointing to non-authoritative opinion pieces. This stuff is usually “background noise”, but last weekend, the volume had definitely been turned up. It wasn’t until late Sunday evening that stories in French, by French publications started to show up in the URL feed.
Since I was monitoring data about the French elections, I figured it would be interesting to see how many Tweets were in French as opposed to English. On the whole, there were more Tweets flagged as ‘fr’ by Twitter than those flagged as ‘en’. One particular moment during the weekend caught my eye, though. Have a look at this graph that depicts Tweets by language between the afternoon of Saturday May 6th and the afternoon of Sunday May 7th.
The orange line is clearly what we’d expect – after midnight on the 6th of May, the number of Tweets in French start to drop off as people presumably went to sleep. That number then picks up again on the morning of Sunday May 7th, as people began their day. The blue line shows Tweets in English, which spike at 01:00 French time. I don’t know what caused this spike, but the time zone lines up with early evening on the American continent.
Interesting patterns were also observed with regards to hashtags. When I started the script up, and for the first few hours, top hashtags included #Macron, #LePen and #Presidentielle2017. Later in the evening of Friday May 5ht, the #MacronGate hashtag started showing up. DFR Lab wrote a great article explaining the mechanisms behind this phenomenon. I highly recommend reading it. (tl;dr Bots!) The data I collected also points to patterns indicating the use of automation to push this hashtag. For instance, take a look at the following graph.
The above graph shows the number of times my script saw one of the four hashtags during each hour between 03:00 and 11:00 French time on May 7th, 2017. What you’ll notice is that the #Macron, #LePen, and #Presidentielle2017 hashtags were low-volume during the night (again, as expected, since everyone was probably asleep), and picked up as folks woke up. However, the #MacronLeaks hashtag maintained a fairly steady volume across this entire time-slice. In fact, the #Macron hashtag remained at the same steady volume all the way from its introduction on Friday evening until the election results were called. It then dropped like a stone to less than 5% of its previous volume during that hour, as the bot infrastructure was shut off.
Both the URLs and #MacronLeaks hashtags were predominantly shared by “American Alt-Right” Twitter accounts. In some cases, these accounts even tweeted/retweeted in French. At the end of the whole weekend, the most shared URL was a link to a YouTube video entitled “The Truth About Macron”. Next was the pastebin page containing links to the stolen Macron data. Seven out of the ten top shared URLs were links to non-authoritative news sources. Luckily, DFR Labs’ article made it into sixth position.
While the above analysis looks to be pretty doom and gloom, things really aren’t as bad as you might think. A vast majority of Twitter users probably wouldn’t have noticed the URL and hashtag flooding going on at all. Why? Well, performing a search in Twitter provides “Top” results by default, which ranks Tweets using an algorithm. And that algorithm appears to filter by some sort of quality (that tends to separate the wheat from the chaff). All that spamming by bot accounts going on in the background doesn’t appear to register. The same also goes for the “News” tab and the list of top 10 trending hashtags. The only place you’ll readily see the background noise is in the “Latest” tab.
So, if all that noise no longer generates much signal, why even still create it in the first place? The answer lies in the fact that the press and the media do spend the effort to dig into raw data looking for a story to run. When they find this otherwise “hidden” data, they run with it. In effect, the press are doing the bots’ jobs for them.
The French presidential election was an ideal moment for me to refine the scripts I’ve been writing to find the usage patterns associated with “active measures” in upcoming elections and world events. The UK general election is in just a few weeks, so I’ll get to see how well my changes work. I’m sure I’ll have sometime interesting to report on after that event happens!