Thursday, January 04, 2018

Why The Aadhar Data Breach Is A Very Big Deal

A sting operation by the newspaper The Tribune found that, for a nominal payment of 500 rupees, its team was able to get access to an Aadhar portal that was intended for use only by authorised officials responsible for helping citizens retrieve lost or forgotten data.

Aadhar is a relatively new unique identifier for all residents of India

When the news broke, the organisation in charge of India's unique ID database, the Unique Identification Authority of India (UIDAI), played down suggestions that a data breach had taken place. Their main contention was that the biometric data (fingerprints and iris scans) of residents was stored in a secure and encrypted manner, and that it was not exposed through the mechanism used by the Tribune.

Let us analyse what happened, and why it is in fact a big deal.

There are four primary security threats that organisations have to guard against:

  • Disclosure (unauthorised persons gaining access to sensitive information)
  • Deception (the system being presented with false data and made to accept it)
  • Disruption (interruption in normal operations and loss of service)
  • Usurpation (unauthorised persons gaining control of the system)

What the UIDAI is saying is that there has been no Disclosure of authentication tokens (biometric data) that could lead to a future Deception or Usurpation attack. But importantly, they have not denied that a breach has occurred which could have given an unauthorised user access to certain kinds of information. In fact, some of their officials have confirmed this:

Sanjay Jindal, Additional Director-General, UIDAI Regional Centre, Chandigarh, accepting that this was a lapse, told The Tribune: "Except the Director-General and I, no third person in Punjab should have a login access to our official portal. Anyone else having access is illegal, and is a major national security breach."


In other words, an attack could have successfully taken place resulting in the Disclosure of certain kinds of information.

What kind of information? According to the Tribune article, they were able to retrieve the following data on provision of an Aadhar number:

  • Name
  • Address
  • Post code (PIN)
  • Photo
  • Phone number
  • Email address

In fact, this is probably the minimal set of data that is accessible through the portal.

It took just Rs 500, paid through Paytm, and 10 minutes in which an “agent” of the group running the racket created a "gateway" for this correspondent and gave a login ID and password. Lo and behold, you could enter any Aadhaar number in the portal, and instantly get all particulars that an individual may have submitted to the UIDAI (Unique Identification Authority of India), including name, address, postal code (PIN), photo, phone number and email.


Reading between the lines, it doesn't appear that an API (a non-visual means by which a software program could retrieve data) was provided. It looks like what was provided were login credentials for an administrator's portal, in other words, access to a secure web page.

What is the worst that could happen if such access were granted to a non-authorised user? How much data could they steal? In other words, how severe is the Disclosure exposure?

To a layperson, this may not seem like a huge exposure. How many people's data can a person steal by sitting at a portal and entering Aadhar numbers on a screen? A few hundred, perhaps a couple of thousand records. Unfortunately, a potential data thief is much more efficient.

I'm not a security expert, but even I would potentially be able to steal the entire Aadhar database (i.e., not biometric data but the set of data listed above) in a matter of hours or days, without much personal effort on my part. Hackers may have much more efficient means at their disposal, but I would probably use a web application testing tool like Selenium, which I use for user testing of software as part of my day job. Selenium is built on top of a basic browser "engine", and has a number of programmable features built around it. It can be "trained" by a user to follow a certain sequence of steps, and it can then repeat that sequence of steps ad nauseam, using different data values based on its controlling script. Best of all, Selenium looks just like a regular browser to any web server, so the server would have no idea that an automated tool is logging in and moving around the site, and not a human user.

An article from DeveloperFusion explains how Selenium works

Assuming I have been granted login credentials into the Aadhar administrator's portal (on payment of the going rate of 500 rupees), I would first put Selenium into "learning mode", where it would record my actions to be replayed later. I would use it just as I would use a regular browser, except that this browser is recording my every action, including the data values I am entering. Based on its observation of my actions, it would then know which URL to navigate to, what username and password to enter on the login form, which menu item to select to get to the search screen, etc. Then it would learn about the repeatable part of the task, which is the entry of an Aadhar number in a certain field, and a click on a Search or Retrieve button. When the system responded with the data of the resident (name, address, etc.), I would stop the learning mode. (In practice, I would train Selenium to deal with invalid Aadhar numbers also, since it should be able to recognise when a query was unsuccessful.)

I would then look into the script that Selenium generated to describe the sequence of actions that it had learnt. I would modify this script to make Selenium repeat its search in a loop.

Aadhar numbers are 12-digit numeric strings, which means they can theoretically be any number in the range "000000000000" to "999999999999", a trillion (10^12) numbers in all. I would modify the script to loop through all trillion numbers, perhaps in lots of a million at a time, and I would get it to extract the data from within specific HTML tags on the result screen. If these tags had helpful "id" attributes, it would make my job much easier, otherwise I would have to rely on the relative position of each field's tag within the returned page (known as a DOM search). This is how I would get Selenium to "read" the returned name, address, postcode, phone number, email address, etc.
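To make the size of that number space concrete, here is a minimal sketch of how a trillion 12-digit strings split into lots of a million. It only generates zero-padded strings and makes no queries; the names ID_DIGITS, BATCH_SIZE and batch are my own, chosen for illustration.

```python
from itertools import islice
from typing import Iterator

ID_DIGITS = 12                  # length of the numeric string
KEYSPACE = 10 ** ID_DIGITS      # "000000000000" .. "999999999999"
BATCH_SIZE = 1_000_000          # "lots of a million at a time"


def batch(batch_index: int, size: int = BATCH_SIZE) -> Iterator[str]:
    """Lazily yield the zero-padded candidate strings in one batch."""
    start = batch_index * size
    stop = min(start + size, KEYSPACE)
    for n in range(start, stop):
        yield f"{n:0{ID_DIGITS}d}"


# Ceiling division: how many batches cover the whole number space.
TOTAL_BATCHES = -(-KEYSPACE // BATCH_SIZE)

first_three = list(islice(batch(0), 3))
print(TOTAL_BATCHES, first_three)
```

With lots of a million, the trillion candidates fall into a million batches, and because each batch is a lazy generator, none of them has to be held in memory at once.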

The last thing I would do within the loop is to record the Aadhar number and all the retrieved fields into a local database on my machine, provided the query returned a value. I expect that only about a billion of the trillion numbers I use in my loop will return valid data, since there are just about a billion Indian residents.
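The recording step can be sketched as follows, assuming a local SQLite table and a stand-in lookup function. Everything here is hypothetical: lookup() simply reads from a dictionary of made-up sample data instead of talking to any portal, and the table and column names are my own. The point is only that numbers returning nothing are skipped, which by the figures above would be roughly 999 out of every 1,000 candidates.

```python
import sqlite3
from typing import Dict, Optional

# Made-up sample data standing in for query results (an assumption,
# not real records and not a real interface).
FAKE_RESULTS: Dict[str, Dict[str, str]] = {
    "000000000002": {"name": "Example Person", "pin": "110001"},
}


def lookup(number: str) -> Optional[Dict[str, str]]:
    """Hypothetical query: returns None when the number is unassigned."""
    return FAKE_RESULTS.get(number)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resident (aadhar TEXT PRIMARY KEY, name TEXT, pin TEXT)"
)


def record(number: str) -> bool:
    """Store one result; return False if the query returned nothing."""
    row = lookup(number)
    if row is None:
        return False
    conn.execute(
        "INSERT OR REPLACE INTO resident VALUES (?, ?, ?)",
        (number, row["name"], row["pin"]),
    )
    return True


# Try five candidate numbers; only the one present in FAKE_RESULTS is stored.
hits = sum(record(f"{n:012d}") for n in range(5))
conn.commit()
stored = conn.execute("SELECT COUNT(*) FROM resident").fetchone()[0]
print(hits, stored)
```

Using the number itself as the primary key means a repeated query overwrites the earlier row rather than duplicating it.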

And this is how I would collect the basic personal and contact details of every single resident of India. Unless UIDAI has a throttling or choking mechanism to prevent such a rapid-fire query of data, it's possible that I will be able to get away with this over a couple of days at most.
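For readers wondering what a throttling mechanism looks like, here is a minimal sketch of a per-account sliding-window limiter. I have no knowledge of what UIDAI actually runs; the class name, the limit of 3 queries per 60 seconds and the account label are all assumptions chosen to keep the example small.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SlidingWindowLimiter:
    """Allow at most max_requests per account within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._history: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, account: str, now: float) -> bool:
        history = self._history[account]
        # Forget requests that have aged out of the window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_requests:
            return False          # over the limit: reject the query
        history.append(now)
        return True


limiter = SlidingWindowLimiter(max_requests=3, window_seconds=60.0)
# Timestamps are passed in explicitly so the behaviour is reproducible.
decisions = [limiter.allow("operator-1", t) for t in (0.0, 1.0, 2.0, 3.0, 61.0)]
print(decisions)
```

The fourth query is rejected because three have already been made inside the window, and the fifth is allowed again once the earliest ones have aged out. A limit of this kind, set near what a human operator could plausibly type, would turn a rapid-fire loop into a very slow one for any single account.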

More sophisticated hackers would hide behind multiple IP addresses, and use multiple sets of user credentials, over a longer time period, so as to keep any auditing system on the UIDAI side from detecting that a bulk theft of records was underway.

Since the Aadhar numbers are permanent identifiers for residents, the data in this database is likely to remain useful for decades to whoever steals it. It can form the foundation of a database to which other data can be added.

And that brings us to the even bigger threat that this breach enables.

The Indian government has also committed another cardinal sin from a privacy angle. It has mandated the linking of Aadhar numbers to several other important kinds of data, for example, bank account numbers and mobile SIM cards. In fact, although the courts have ruled that such linking is to be voluntary, the government, banks and telcos are not endorsing that liberal message at all. Residents are being virtually threatened into providing their Aadhar numbers to their telecom providers and their banks. Aadhar is also being used across various arms and services of the government, such as the Tax department and the Public Distribution System.

Now, the databases of such organisations are generally a lot less secure than the biometric data stored by UIDAI. It is unlikely that banks and telcos are being diligent enough to encrypt their data. Besides, with just a few large banks and telcos operating in the country, it is relatively easy for a malicious organisation, such as the secret service agency of a foreign power, to perform the necessary "social engineering" required to access this data. No conventional "hacking" is even necessary for such simple data theft. The same goes for government departments holding Aadhar-linked data.

Now, if one had a list of bank account numbers with Aadhar numbers against them, it would be a trivial matter to map these to the Aadhar database stolen from UIDAI. One would then have the personal and contact details of every resident of India, plus their bank account numbers!
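The reason this mapping is trivial is that a shared permanent identifier acts as a join key. A minimal sketch with made-up records shows the idea; the field names and the link function are my own, and none of the values are real.

```python
from typing import Any, Dict, List

# Made-up identity records keyed by the shared identifier.
identity: Dict[str, Dict[str, Any]] = {
    "000000000002": {"name": "Example Person", "phone": "0000000000"},
    "000000000007": {"name": "Another Person", "phone": "1111111111"},
}

# Made-up rows from a second, separately obtained data set.
bank_rows: List[Dict[str, str]] = [
    {"aadhar": "000000000002", "account": "ACC-001"},
    {"aadhar": "000000000002", "account": "ACC-002"},
    {"aadhar": "000000000009", "account": "ACC-003"},  # no identity match
]


def link(
    people: Dict[str, Dict[str, Any]], rows: List[Dict[str, str]]
) -> Dict[str, Dict[str, Any]]:
    """Attach each row's account to the person with the same identifier."""
    merged = {key: {**value, "accounts": []} for key, value in people.items()}
    for row in rows:
        person = merged.get(row["aadhar"])
        if person is not None:
            person["accounts"].append(row["account"])
    return merged


merged = link(identity, bank_rows)
print(merged["000000000002"])
```

One dictionary lookup per row is all it takes, so the cost grows only linearly with the size of the second data set. Every additional data set carrying the same identifier can be attached in the same way.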

Repeat this for mobile SIM cards, ration cards, PAN cards (tax), etc.

Now one has put together a very lucrative set of data that a number of hostile powers would be very interested in. The cost of acquiring this data, as I have shown, is well within their budgets.

The Tribune's report also claims:

Spotting an opportunity to make a quick buck, more than one lakh VLEs (Village-Level Enterprises) are now suspected to have gained this illegal access to UIDAI data to provide “Aadhaar services” to common people for a charge, including the printing of Aadhaar cards. However, in wrong hands, this access could provide an opportunity for gross misuse of the data.

Indeed, if the vulnerability has been around for the last few months, as suspected, it would not be unreasonable to assume that sensitive information on every Indian resident is now sitting in a Big Data lab in more than one foreign country, subject to sophisticated analysis and insight mining. In fact, if organisations like the NSA have not already acquired this data, my opinion of their competence will plummet.

It's nothing short of a national security disaster.

Still, from the muted press around it, it appears that no one has a realistic handle on how grave the breach is. Critics of the government have seized on The Tribune's initial exposé to claim that Aadhar has been a massive security failure, but without understanding the nuances of what we have seen in this analysis. Supporters of the government have seized on the UIDAI's reassurances to downplay the significance of the breach, again without a sophisticated analysis of the potential exposure. Both sides are being irresponsible.

My conclusion is that a serious breach of data security has been proven by The Tribune. This vulnerability has probably been around for a few months, enough time for a competent organisation or set of individuals to steal the master data (about a billion records), and create a foundation to add more useful data as it is acquired, since the Aadhar number is a permanent identifier that is likely to be associated with every significant product or service that residents may own or use.

A security assessment of the implications is imperative. At the very least, linking of Aadhar numbers to services must be halted.

And it would not be unreasonable to demand that someone somewhere should resign.

Update 24/03/2018: It appears that nothing has been done to fix the Aadhaar breach. If anything, even more breaches have been discovered, as described in this ZDNet article.