The fundamentals of implementing strong Device Identification

The fundamentals of implementing strong Device Identification

Some weeks ago, IBM Trusteer reported a massive fraud operation against banks, which lead to losses in the order of millions of dollars. In a nutshell, attackers managed to 1) steal end user credentials and 2) bypass the fraud management scheme of the banks in order to automate millions of small transactions, which would go through unnoticed by the banks’ risk assessment infrastructure (and, potentially, also by the users). The first step (stealing the bank customer’s credentials) was apparently done using some sort of social engineering attack (that involved installing malware in the end-user’s device). Whereas in the second step attackers used emulators to “trick” the fraud management system of the different banks. We believe it is fair to say that, in this case, the fraud management system failed to do its job, which is, precisely, to protect against situations like this one, where users have been tricked or deceived. So, let’s dig a bit more into this.

Identifying the weak spot

Let’s first try to understand what the anti-fraud system was supposed to be doing. From the reading, one can understand that it was supposed to detect situations where a user account was being used in a device other than the legitimate user’s one. This is a typical scenario for fraud management: if we see the user is using an iPhone, whereas normally they are connecting from a Samsung… Well, that is strange, isn’t it? So, this is why the hackers used emulators: because emulators gave them the ability to “trick” the anti-fraud solution to believe that they were using the end user’s device, when in fact they were not. So, how might have this worked out? And let’s be clear here, we are entering the land of speculation as the researchers didn’t give enough details to know exactly how it worked. But it is safe to assume that attackers:
  1. Somehow managed to plant malware into the user’s device (e.g., via a phishing scam)
  2. Used this malware to steal credentials, device information and get access to SMS
  3. With the information harvested, set up an automated scheme that a) set up an emulator that looked (to the eyes of the anti-fraud system) exactly like a user’s device. b) logged in to the banking app using the stolen credentials. Since the device seemed to be the end user’s one, the anti-fraud system didn’t react. c) performed a small transaction, that would not raise any suspicious in the banks’ system (and, likely, would go unnoticed by the user), and, finally d) was able to repeat these steps millions of times in a matter of days, effectively stealing millions of dollars.
There are multiple things that could have been done differently in this scenario, but here we want to focus on how the attackers managed to “spoof” the device. Or, in other words, how strong was the device identification mechanism. In order to answer this question, let’s first put ourselves in the shoes of the attacker and let’s try to understand how they would go about “spoofing” an end user device. The relevant step here is not so much 3.b, but rather step 2: collecting the right device information to be used later on. Because the most important thing the attacker needs to know is which information does the fraud management system use to identify a device. At the end, this is the information the hackers need to extract (using the malware) from the end user device to be able to spoof it. Most likely, the fraud management system uses many different values to identify a device: some of them are obvious, but other ones not so much. Among the obvious ones we can find things like the device manufacturer, the device model, the operating system version, the Android ID (for Android devices) or the IDFV (for iOS ones) … But these are not really interesting, as any device identification system would use them. The real strength of the device identification system resides on the not-so-obvious device identifiers, especially how many of them are used and, also, how well protected are they. Notice that it is not that important which parameter is being used, but how difficult it is for an attacker to know that it is being used. At the end, the ultimate goal of the attacker is to write a piece of malware that will gather it later on – and this is what we want to prevent.

Exploiting the weakness

So: how could an attacker know the device identifiers that are being used?
  • The easiest approach would likely be to sniff the information transmitted from the app to the backend, using a MITM attack. By analyzing the traffic, the attacker would get a good understanding of which identifiers are being used.
  • Another option would be to read through the source code of the application, in order to find the routines that gather this information and send it to the backend systems. Since the source code is likely to not be available to the attacker, the alternative will be the decompiled binary. It is not as useful as the source code, but a good amount of information can be extracted from a decompiled binary performing static analysis.
  • Finally, the last option available (because it’s the most difficult one) to the hacker would be to use dynamic analysis. Examples of dynamic analysis include debugging the application while it is executing, using a hooking framework (like Frida), etc.
Notice that the options listed above are not mutually exclusive: the attacker needs to succeed in only one of them in order to succeed in the whole attack. So, solutions that prevent only one of the scenarios will make the attacker’s life difficult but will not, ultimately, prevent them from succeeding.

Protecting your app

Considering all of this, we could say that a strong device identification scheme should:
  1. Use many, non-obvious, device identifiers. The more device identifiers, the better.
  2. Protect against MITM attacks. Sniffing network traffic yields a lot of information, so a good protection on this front is a must.
  3. Obfuscate the sensitive parts of the binary. In the case of mobile apps, attackers will always have access to the binary, so proper obfuscation to prevent static analysis is crucial.
  4. Prevent against dynamic analysis. When static analysis is not enough, attackers will fall back to dynamic analysis: debugging, instrumenting, modifying the app, etc. This means that preventing against debuggers, hooking frameworks (such as Frida) and code injection, among others, is also important.
Taking a broader perspective, we can conclude that proper endpoint protection is a must for fraud-prevention solutions based on end-user device identification. Otherwise, attackers will manage to feed “fake” data that will bypass the protections implemented in the backend.