By PATRICK STEPHENSON
BRUSSELS – The so-called Islamic State may be on the run in Syria and Iraq, but “lone-wolf” extremists acting in its name continue to exploit Europe’s cyberspace to radicalise others and to facilitate attacks. The most recent major attack, on 17 August, involved extremists in Barcelona who used vehicles to kill 16 innocents and injure 150 others. According to news reports, the attackers – mostly young men – were inspired in part by online terrorist propaganda.
Deleting that propaganda has become a singular focus for Western law enforcement agencies and counter-terrorist specialists, in particular images and videos showing beheadings or bomb-making instructions. But like the proverbial game of Whack-a-Mole, content suppressed on one platform simply pops up on another.
However, a Dartmouth professor thinks he has the answer. It’s called eGLYPH, a technology that searches the internet for banned content. By extracting unique digital signatures from banned images and videos and then comparing them to the signatures of newly uploaded content, the technology can take terrorist propaganda offline, and keep it that way.
The technology’s inventor, Dr Hany Farid, explained how it works via Skype during a 17 October conference here organised by the Counter Extremism Project.
He began by recounting how Salman Abedi – the UK-based suicide bomber who killed 22 people during a pop concert in Manchester – learned how to make bombs from watching YouTube and Facebook video tutorials. The social media giants have repeatedly taken down the videos, only for them to reappear – even after the Manchester attack.
Facial-recognition technology is often proposed as a solution, but today’s facial-recognition algorithms searching for recurrent content are easily fooled, according to Farid. He showed the example of a photo correctly identified by an algorithm as Reese Witherspoon; with only a few small adjustments, the same algorithm predicted with “100 percent certainty” that the photo was of Russell Crowe.
Another problem with image analysis is the huge processing time required, which cannot keep up with the sheer amounts of material entering the Internet. “Four hundred hours of YouTube videos are uploaded every minute,” observed Farid. “We can’t turn over the problem [wholly] to computers because we are just not there yet. So how do we accurately and quickly identify content in the face of billions of uploads per day?”
He said eGLYPH is based on “robust hashing” technology originally designed to identify and flag online images of child pornography. Robust hashing extracts a digital signature composed of 64 numbers from an image of known content that is marked as illegal and stored in a database. “That’s the easy part,” he said.
The harder part is extracting a similar signature from all unanalysed content being uploaded at any given moment. By greatly simplifying an image to its most essential features and extracting a unique signature, Farid said his technology can compare signatures of banned content against all incoming traffic. “The comparison is fast and accurate, with error rates of 1 in 100 billion.”
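The two steps Farid describes – reducing an image to a compact 64-value signature, then comparing signatures against a database – can be sketched in miniature. The toy Python example below uses a simple “average hash”, an illustrative stand-in and not the actual eGLYPH or PhotoDNA algorithm; the images and threshold are invented for the demonstration.

```python
def average_hash(pixels):
    """pixels: an 8x8 grid of greyscale values (0-255).
    Returns a 64-bit signature, one bit per pixel."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    # Each bit records whether a pixel is brighter than the image's
    # mean - a coarse feature that survives small edits such as
    # re-encoding or mild brightness changes.
    return [1 if p > mean else 0 for p in flat]

def hamming(sig_a, sig_b):
    """Number of differing bits between two signatures."""
    return sum(a != b for a, b in zip(sig_a, sig_b))

# A known banned image (toy 8x8 gradient) and a slightly brightened
# copy, standing in for a re-uploaded near-duplicate.
banned = [[16 * (r + c) for c in range(8)] for r in range(8)]
altered = [[v + 5 for v in row] for row in banned]

sig_db = average_hash(banned)    # stored in the database
sig_new = average_hash(altered)  # extracted from a new upload

print(hamming(sig_db, sig_new))  # → 0: flagged as a near-duplicate
```

A real perceptual hash works on full-size images and is engineered for far lower collision rates, but the principle is the same: near-duplicates produce signatures a small distance apart, while unrelated images do not.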
For its original mission of stopping child pornography, robust hashing technology was first deployed in 2009 to identify illegal images. By 2013, it was being deployed against images uploaded to Facebook, Twitter and Dropbox. According to Farid, his team disrupted over 10 million child pornography images in 2016 – up from one million in 2013. “Maybe it goes on to the dark web, but there’s no reason for it to be widely available,” he said, referring to the internet’s illicit layer of highly encrypted or hard-to-access websites and content.
Robust hashing was initially developed for images alone, not the giant video files regularly uploaded to the web – a more complex task, but not an impossible one. “A video is just a bunch of images,” he said, adding that typically there are 24 to 30 frames for every second of video. “This increases the complexity [of computer processing] by three orders of magnitude.”
Farid’s solution was to discard 90 percent of a video by ignoring frames that are not substantially different from those before or after them, focusing instead on frames featuring different shots or scenes and reducing the complexity “by an order of magnitude”, he said. The technology also incorporates techniques that biologists use to sequence DNA, comparing ‘sub-strings’ of extracted digital signatures.
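The frame-pruning idea can be illustrated with a toy example: keep only frames that differ substantially from the last frame kept. Here frames are flat lists of greyscale values, and the difference threshold is an assumption made for the sketch, not a detail of eGLYPH.

```python
def mean_abs_diff(frame_a, frame_b):
    """Average per-pixel difference between two equally sized frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)

def keyframes(frames, threshold=10.0):
    """Keep the first frame, plus any frame that changes the scene."""
    kept = [frames[0]]
    for frame in frames[1:]:
        if mean_abs_diff(frame, kept[-1]) > threshold:
            kept.append(frame)  # a new shot or scene: worth hashing
    return kept

# Toy video: one shot of 30 near-identical dark frames, then a cut
# to a second shot of 30 near-identical bright frames.
shot1 = [[50 + (i % 3)] * 64 for i in range(30)]
shot2 = [[200 + (i % 3)] * 64 for i in range(30)]
video = shot1 + shot2

print(len(keyframes(video)))  # → 2: sixty frames reduce to two
```

Only the surviving keyframes then need to be hashed and checked against the database, which is where the claimed order-of-magnitude saving comes from.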
eGLYPH is now ready to go, according to Farid, who said it can analyse short videos of two to three minutes in real time, with longer content sliced up and sent to different computers for simultaneous analysis. “As long as I have enough computers, I can process everything in real time…[meaning] the technology can work at the back end, analysing billions of videos every day.”
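The scaling approach Farid describes – slicing a long video into chunks and analysing them simultaneously – can be sketched with Python’s standard thread pool. The chunk size and the stand-in analysis function are assumptions for the illustration, not properties of his system.

```python
from concurrent.futures import ThreadPoolExecutor

def analyse_chunk(chunk):
    """Stand-in for hashing every keyframe in a chunk of frames;
    here it simply reports how many frames it processed."""
    return len(chunk)

def chunks(frames, size):
    """Slice a long frame sequence into fixed-size pieces."""
    return [frames[i:i + size] for i in range(0, len(frames), size)]

long_video = list(range(600))  # toy "video" of 600 frames
with ThreadPoolExecutor() as pool:
    # map() farms the chunks out to workers and returns results
    # in the original order, so the per-chunk signatures could be
    # recombined afterwards.
    results = list(pool.map(analyse_chunk, chunks(long_video, 100)))

print(results)  # → [100, 100, 100, 100, 100, 100]
```

In a real deployment the workers would be separate machines rather than threads, but the pattern of splitting, fanning out, and recombining is the same.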
The only drawback to eGLYPH’s approach is that it requires a continuously updated database of banned content to draw upon. “The technology is only as good as the database,” he said.
So far, Farid’s team has compiled 637 pieces of content – the “worst of the worst”, with beheadings, executions, and bomb-making videos. “We find one new piece of violent content per day,” he said. “If we had more people, we’d find more content.”
To help enlarge the database of terrorist-driven imagery, the Counter Extremism Project has asked the US National Centre for Reporting Extremism to house a comprehensive database of extremist content for hashing and digital signature extraction.
If Farid’s technology is as effective as touted, then building that database should be a priority for all concerned.