Trojan Source bug threatens the security of all source code
Researchers at the University of Cambridge recently found a bug that involves Unicode, the standard for encoding text.
Unicode is a digital text standard that allows computers to exchange information regardless of the language used. Currently, Unicode defines more than 143,000 characters across 154 language scripts.
The weakness involves Unicode's bidirectional ("Bidi") algorithm, which handles the display of text that mixes scripts such as English, which is read from left to right, and Arabic, which is read from right to left.
Computers have a predetermined way of handling this kind of text. They use Bidi override characters, which switch the display direction of text from left-to-right to right-to-left and vice versa.
In some cases, the default ordering set by the Bidi algorithm is not sufficient. In those cases, Bidi override control characters make it possible to switch the display ordering of a group of characters.
Overrides can even cause characters from a single script to be displayed in an order different from their logical encoding.
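To make the gap between logical encoding and display order concrete, here is a minimal Python sketch. It only inspects the stored code points; how the text actually looks on screen depends on the renderer. The sample string and variable names are illustrative assumptions, not taken from the research.

```python
import unicodedata

# Illustrative sample: RIGHT-TO-LEFT OVERRIDE, three Latin letters,
# then POP DIRECTIONAL FORMATTING to end the override.
RLO = "\u202e"
PDF = "\u202c"
text = RLO + "abc" + PDF

# The logical (stored) order is unchanged: a, b, c sit between the controls.
# A Bidi-aware renderer would typically draw the letters reversed.
for ch in text:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Removing the two invisible controls recovers the plain letters.
visible = text.replace(RLO, "").replace(PDF, "")
```

The point of the sketch is that the file contents and what a person sees can disagree, while any program reading the text works only with the stored order.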
- Exploiting Control Sequence and String Literals
Here's where it gets interesting: most programming languages allow these Bidi overrides in comments and strings.
This is a flaw because compilers and interpreters do not check the contents of comments and treat control characters inside strings as ordinary data; hence, a malicious sequence of characters can be added and the result still looks like perfectly good code.
Also, most programming languages allow string literals that may contain arbitrary characters, including control characters. Hence the attack was named "Trojan Source."
The researchers wrote in their paper:
Therefore, by placing Bidi override characters exclusively within strings and comments, we can smuggle them into source code in a manner that most compilers will accept. Our key insight is to reorder source code characters such that the resulting display order also represents syntactically valid source code.
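Since the attack relies on Bidi control characters hidden inside comments and strings, one simple defence is to search source text for them. The following is a minimal sketch of such a scan, not the researchers' own tooling; the function name `find_bidi_controls` and the sample snippet are hypothetical.

```python
import unicodedata

# Bidi control characters: embeddings, overrides, isolates, and their terminators.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
}

def find_bidi_controls(source: str) -> list:
    """Return (line, column, character name) for each Bidi control found."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((line_no, col, unicodedata.name(ch)))
    return hits

# Hypothetical snippet: the comment on line 2 hides an override and an isolate.
sample = 'is_admin = False\n# check \u202e\u2066 begin\nprint("ok")\n'
hits = find_bidi_controls(sample)
for line_no, col, name in hits:
    print(f"line {line_no}, column {col}: {name}")
```

A scan like this flags every occurrence, including legitimate right-to-left text in comments, so a real project would probably need an allow-list or a manual review step.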
This vulnerability is especially dangerous for projects like Linux and others that accept contributions from outside contributors, because anyone can submit code containing Bidi overrides.
A reviewer would not recognise the change as a threat and would merge it into trusted code, which could later compromise the whole software.
The vulnerability can therefore be used to affect an entire piece of software rather than one specific piece of target code.
This type of method is not new; in 2011, a similar technique appeared that also abused Unicode. It disguised the file extensions of malware disseminated via email. The attacker sent an .exe file to the victim by email, and the victim saw what looked like an ordinary file of a commonly used type, such as .docx or .jpg.
In reality, however, it was an .exe file that infected the victim's system once opened. A fine example of a "Trojan Horse."
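The file-extension trick can be illustrated with a short Python sketch. The file name is hypothetical, and `naive_display` is a deliberately rough stand-in for a real renderer: it simply reverses everything after the override character rather than implementing the full Unicode Bidi algorithm.

```python
import os

RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE

def naive_display(name: str) -> str:
    """Rough approximation: reverse everything after the first RLO.

    This is NOT the full Unicode Bidi algorithm, only enough to show the effect.
    """
    head, sep, tail = name.partition(RLO)
    return head + tail[::-1] if sep else name

# Hypothetical malicious file name.
filename = "invoice_" + RLO + "xcod.exe"

shown = naive_display(filename)           # roughly what a victim might see
real_ext = os.path.splitext(filename)[1]  # the extension actually stored in the name
print(shown, real_ext)
```

The stored name still ends in .exe, which is what the operating system acts on, while the displayed name appears to end in .docx.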
Now that the vulnerability has been found, the public knows that this method can be used against people. Once a vulnerability is public, anyone can try to exploit it in software and tools that have not yet been updated.
In October 2021, Cooltechzone's Dmytro Cherkashyn wrote about the rise of viruses written in non-standard languages, which are especially dangerous for developers' environments.
What this exploit tricks first is the human reader, as humans are the most vulnerable link. The compiler or interpreter then processes the code in its hidden logical order rather than the order the reader saw.
Programmers often copy and paste code from the internet; hence this vulnerability is a significant potential source of real-world security exploits.
The attack can be countered by updating compilers and interpreters, but their maintainers are slow to ship such updates.
Matthew Green, an associate professor at the Johns Hopkins Information Security Institute, says a widespread vulnerability scan was done. Fortunately, it did not find a single exploit based on this technique.
This suggests that attackers were not yet aware of the exploit. But it should be noted that people can exploit this vulnerability now that the information has become public.
In my opinion, websites like GitHub, GitLab, and Atlassian should update their tools so that Trojan Source code does not reach the wider market in the first place.
These websites are well known for helping programmers share code.
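As one possible illustration of what such a tool update could look like, a code-review view might replace each invisible Bidi control with a visible marker so that reviewers see it. This is only a sketch under that assumption and does not describe how any of these platforms actually works; `reveal_bidi` and the sample line are hypothetical.

```python
import re

# Bidi embeddings, overrides, isolates, and their terminators.
BIDI_PATTERN = re.compile("[\u202a-\u202e\u2066-\u2069]")

def reveal_bidi(text: str) -> str:
    """Replace each invisible Bidi control with a visible <U+XXXX> marker."""
    return BIDI_PATTERN.sub(lambda m: f"<U+{ord(m.group()):04X}>", text)

# Hypothetical line from a submitted patch.
line = 'access = "user\u202e \u2066# admin only\u2069 \u2066"'
revealed = reveal_bidi(line)
print(revealed)
```

Making the characters visible leaves the decision to the reviewer, which avoids rejecting legitimate right-to-left text outright.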