During the 2016 United States presidential election, the Russian government carried out a massive disinformation campaign that reached more than 150 million people. Miles McCain ’19 and Jeffrey Shen ’19 explored this topic in their Independent Project (IP), presenting a machine learning model they created on Wednesday in the Underwood Room.
The result of McCain and Shen’s research was a digital model that distinguishes between “organic” and “troll” content from Russian sources. Shen explained specifically how their system functions and the level of success that they found.
“Our main output for the term was a machine learning model which can distinguish between Russian troll content and organic content in our data set with 90 percent accuracy. We want to clarify that in that it’s not meant to detect Russian trolls. It just so happens that in our data set, it’s able to identify them with 90 percent accuracy,” said Shen in an interview with The Phillipian.
Gayatri Rajan ’22 said she found McCain and Shen’s approach intriguing, as it related to her past coding projects. Rajan’s familiarity with the subject, however, did not lessen her interest.
“I have done a couple machine learning projects in the past, so it was really interesting to figure out how they approached the idea of explainable machine learning. That is something that a lot of researchers are sort of striving towards, because you need to be able to see. Because if it is a black box, then it is obviously very open to bias,” said Rajan.
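The explainability Rajan describes can be illustrated with a toy sketch. The snippet below is not McCain and Shen’s actual model; it is an invented, minimal bag-of-words logistic-regression classifier trained on made-up “troll” and “organic” example tweets. Because each word carries a single learned weight, the model is not a black box: the highest-weight words directly explain what pushes a prediction toward “troll.”

```python
# A minimal, hypothetical sketch -- NOT the students' model.
# Bag-of-words logistic regression; word weights double as an explanation.
import math

# Invented toy data: 1 = "troll", 0 = "organic" (labels are illustrative only).
docs = [
    ("breaking the election is rigged share now", 1),
    ("patriots wake up the media lies share this", 1),
    ("they are hiding the truth about the election", 1),
    ("had a great coffee with friends this morning", 0),
    ("excited for the game tonight go team", 0),
    ("new recipe turned out great this morning", 0),
]

vocab = sorted({w for text, _ in docs for w in text.split()})
idx = {w: i for i, w in enumerate(vocab)}

def featurize(text):
    """Count each known word's occurrences in the text."""
    x = [0.0] * len(vocab)
    for w in text.split():
        if w in idx:
            x[idx[w]] += 1.0
    return x

X = [featurize(t) for t, _ in docs]
y = [label for _, label in docs]
w = [0.0] * len(vocab)  # one weight per word
b = 0.0                 # bias term

def predict(x):
    """Probability that a featurized text is 'troll'."""
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

# Plain batch gradient descent on the logistic loss.
lr = 0.5
for _ in range(200):
    grad_w = [0.0] * len(vocab)
    grad_b = 0.0
    for x, target in zip(X, y):
        err = predict(x) - target
        grad_b += err
        for i, xi in enumerate(x):
            if xi:
                grad_w[i] += err * xi
    b -= lr * grad_b / len(docs)
    for i in range(len(vocab)):
        w[i] -= lr * grad_w[i] / len(docs)

# The model "explains itself": the top-weighted words are the features
# that most strongly push a prediction toward the "troll" label.
top_words = [word for _, word in sorted(zip(w, vocab), reverse=True)[:3]]
print("most troll-indicative words:", top_words)
```

The weight inspection at the end is the simplest form of the explainability Rajan mentions: instead of only emitting a score, the model reveals which inputs drove it, making bias easier to spot.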
During the presentation, McCain provided a brief background of the Russian troll accounts. McCain explained that such accounts attempted to influence the result of the election through the use of Twitter.
“Russia used an organization called the Internet Research Agency [I.R.A.], which ran in a building in St. Petersburg. It was deeply integrated with the Russian government. There were three objectives to using the I.R.A. accounts. The first one was to incite division, exploiting the existing divisions between different groups, causing further conflicts. The second was to generate distrust in the nature of truth itself. Their ultimate objective was to influence the presidential election in favor of [President Donald] Trump. They did this by spreading pro-conservative messaging across social media and [diluting] the democratic vote,” said McCain.
According to McCain, social media acted as a convenient platform for the Russian government to exert political influence through disinformation campaigns.
During the talk, McCain said, “The Russian disinformation campaign has really latched onto social media, because the barriers of entry to form the sort of influencing that took place using American newspapers can now be done much more cheaply via Twitter and reach a lot more people. These messages were not sponsored by any organization. Instead, they were naturally shared and retweeted by someone and sort of proliferated.”
Jaswin Hargun ’20 noted the seemingly innocuous yet deceptive nature of “troll” tweets that were addressed during the talk. Nevertheless, Hargun said she believed McCain and Shen’s model was successful.
“I was pretty surprised by the tweet they showed at the start when they were discussing how some Russian troll tweets are seemingly not troll tweets. They’re not politicized. They just seem very normal, very wholesome, but their model was able to detect that that was at least partly a troll,” said Hargun.
Developing the machine learning model did not come without challenges, according to Shen. Shen noted the difficulties of acquiring the Russian data needed to build the model, especially compared with other countries such as China.
In an interview with The Phillipian, Shen said, “There are technical challenges… one main thing with Russia is the data present. With China, you can sort of create your own data in that you can monitor U.R.L.s, you can see what posts are. But for Russia, you have to rely on the fact that government agencies or tech companies will release their own data. So right before we started our project, we were actually really fortunate that Twitter released a massive data set of confirmed I.R.A. troll tweets, which is what made our machine learning model possible.”
For McCain, a significant discovery from the research was the ubiquity of the disinformation. He considered how the Russian campaign reached people on all sides of the American political spectrum.
“One thing that really stuck out to us was firstly that the Russian trolls did not post purely conservative content. They did not serve only to direct votes towards Donald Trump. They also worked to direct Democratic voters towards, for example, Jill Stein. What they really wanted to do, according to our research, [was] generate distrust in the American political system and prevent Hillary Clinton from winning. However that was accomplished is sort of a secondary question,” said McCain in an interview.
McCain continued, “The important part and the main interesting finding that we had is that this was not something that only a certain demographic fell for. Almost every single American demographic was targeted and fell victim to the Russian disinformation.”
While McCain and Shen had success with the model, Shen acknowledged that there are many opportunities to expand the scope of their project.
“This is sort of just an entire can of worms by itself. This issue, there’s so many other ways you can go and analyze the content. One sort of thing is that the content Russia uses changes over time, and our model is trained only on 2018 content, so I guess expanding the scope of this work to sort of study new periods of time. This is just one specific area. Our work was mainly centered around Twitter. There was obviously content on Facebook, on other platforms. It’s just really about exploring it,” said Shen.
Editor’s Note: Jeffrey Shen is a Digital Editor for The Phillipian.