Generating South African ID Numbers

Introduction

An ID number is a basic requirement for various online applications. There are two ways we can take advantage of ID numbers – we can verify them using their check digit to ensure that the user entered their ID number correctly, and we can extract information from them to speed up the rest of the application process. No one likes entering the same data over and over again, after all.

First off, we need to understand exactly how they work. The information available from the Department of Home Affairs is historically patchy, which has produced a range of implementations and restrictions in various blogs. The more recent ID books come with an explanation of what the ID numbers actually mean, which we can combine with what we already know to produce a reasonable approximation.

This post will examine the existing methods of producing SA ID numbers found on the web in conjunction with the ID book information and present some JavaScript functions to verify an entered ID number, to extract relevant information from that ID number and to generate new, valid ID numbers that can be used in testing.

Initial code that was used to begin this investigation can be found at Marcin Pajdzik’s post.

Chrome Extension

If you’re not interested in the methods used but just need to generate or analyse ID numbers, check the Chrome extension here.

If you’re interested in looking at / extending / forking, the source is up on GitHub.

The Numbers

Let’s say we have the ID number 8501016184086. The information contained in the green ID books would break this down as follows.

850101: “Year of birth, Month of birth and Day of birth, 1 January 1985”

6: “0-4, Female, 5-9, Male”

184: No explanation given

0: “0, SA Citizen, 1, Non SA Citizen”

8: “Usually 8. If yours is a 9, please ensure that you have a letter confirmation authentication of your ID document from Home Affairs.”

6: No explanation given.

From this we can extract a fairly large amount of information, such as gender, citizenship, etc. The three digits following the gender code appear to be sequence digits which would obviously be needed for people who are born on the same day, so those are easy enough to generate and validate.

The option of SA citizen or non-SA citizen refers to whether the person is a citizen or a permanent resident – an ID number would not be assigned otherwise.

The digit which is flagged as usually being 8 was historically used to indicate race, but this has been dropped. This would explain why it is apparently possible to find ID numbers with values other than 8 or 9. It would seem that the 9 is also used in cases where the sequence number would have exceeded the available space.

The final check is the checksum digit which is calculated using the Luhn algorithm.

Luhn Algorithm

Overview

This algorithm allows for the detection of most transpositions of digits (that is, digits being swapped) as well as digits which are incorrect. There are some transpositions which it will not detect – these are given on the Wikipedia page. The basic idea behind the algorithm is to sum the digits in a special way, multiply the result by 9 and then take that number modulo 10 to produce the final digit. The algorithm is intended purely as a check in case of mistakes, and not as any sort of encryption or for other security purposes.

The final number generated by the Luhn algorithm can be verified in two ways. You can strip off the last digit, run the algorithm again and check that the digit calculated matches the digit removed, or you can re-sum all of the digits, check digit included, and check that the result modulo 10 is 0 – that is, the result is a multiple of 10.

Summing Step

The algorithm works by summing the digits of the ID number in a special manner that allows us to detect the transposition of digits. The idea behind this is fairly simple – let’s have a look at a smaller example. Note that when the Luhn algorithm performs summing, the first digit summed is the digit on the right, whereas the ID number calculation appears to go from the left instead. For the 12 digits that precede an ID number’s check digit the distinction makes no difference, as both directions end up doubling the same digits.

Say the correct number we are after is 123. One method of summing these would be to simply sum them, giving us a value of 6. If someone was to enter a digit incorrectly, say as 223, the new summed value would be 7, which would show that there had been an error. This basic approach works for the case where a single digit is entered incorrectly.

Now, let’s do the same to a “mistake” combination of 213. In this case, we find that we still get a value of 6 – obviously this method will not work, as we have not detected an error in the case of transposition. What the Luhn algorithm does instead while summing the numbers is to double the value of every second digit – this gives us a sum of 1 + 4 + 3 = 8 for the correct number, and a value of 2 + 2 + 3 = 7 for the incorrect number.

If the correct number was padded with this digit, giving us a value of 1238, and the user entered 2138, we could confirm that the number entered was incorrect as the checksum digit on the end would not correspond with the calculated digit.
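The difference between the two ways of summing can be sketched in a few lines of JavaScript. The function names plainSum and weightedSum are made up for this illustration, and the sketch leaves out the step of splitting doubled values of 10 or more into their digits, which is not needed for this small example.

```javascript
// Plain digit sum: swapping two digits leaves the result unchanged.
function plainSum(digits) {
  let total = 0;
  for (const ch of digits) {
    total += Number(ch);
  }
  return total;
}

// Weighted sum: every second digit (counting from the left) is doubled.
function weightedSum(digits) {
  let total = 0;
  for (let i = 0; i < digits.length; i++) {
    const multiple = (i % 2) + 1; // 1, 2, 1, 2, ...
    total += multiple * Number(digits[i]);
  }
  return total;
}

console.log(plainSum("123"), plainSum("213"));       // 6 6
console.log(weightedSum("123"), weightedSum("213")); // 8 7
```

The plain sum cannot tell 123 and 213 apart, while the weighted sum can.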

Calculating the Digit

Now that we have a number that represents our checked number, we need to find a way to append it. The value was less than 10 in our examples above, but this will not necessarily always be the case. In addition, if you recall from earlier, our final number can be checked by resumming all of the digits and checking that we have a value that is a multiple of 10 – we need to alter our checksum digit slightly to ensure that this is the case.

Basically, we need the checksum digit to add the missing numbers to bring our sum up to a multiple of 10. So if our current value was 27 we would need our checksum digit to be 3, giving us a total value of 30.

This can be calculated in one of two ways. In the first, we can take the units portion of our number and subtract it from 10. In our previous example this would mean taking 7 and subtracting it from 10. 10 - 7 = 3, so our checksum digit is 3. (If the units portion is 0, the sum is already a multiple of 10 and the checksum digit is simply 0.)

The other way of calculating the digit involves multiplying our sum by 9, and then taking the unit digit. In our previous example, our current value was 27. Multiplying by 9 gives us 243 – if we take the unit digit we arrive at our previous answer of 3 again. But why does this work?

Let’s call our original number a. This means that a = 27.

Now, if we multiply a by 9, we produce 9a.

9\times a = 9a

If we then add our original a back onto this value, we get

9a + a = 10a

This resulting value is obviously a multiple of 10, so we’ve satisfied our original requirement by using a value of 9 \times a. In this case we would produce 243 though – we obviously cannot use that as the check digit, as we would end up appending several new digits.

Instead, we use the number modulo 10 – that is, 9a \mod 10. The reason why we can use just this value is fairly simple – we are trying to reach a multiple of 10, which depends only on the unit column of the total. Only the unit digit of the value we add affects that column, so we can drop the other columns of 9a. This is equivalent to taking the number modulo 10, and gives us our final formula of

(9 \times a) \mod 10 = checksum
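As a quick check that the two ways of calculating the digit agree, here is a small sketch. The function names are hypothetical and exist only for this comparison.

```javascript
// Method 1: subtract the units digit from 10 (a units digit of 0 gives 0).
function checkDigitBySubtraction(sum) {
  return (10 - (sum % 10)) % 10;
}

// Method 2: multiply by 9 and keep the units digit.
function checkDigitByNine(sum) {
  return (sum * 9) % 10;
}

console.log(checkDigitBySubtraction(27), checkDigitByNine(27)); // 3 3

// The two methods agree for every sum in this range.
for (let sum = 0; sum <= 200; sum++) {
  if (checkDigitBySubtraction(sum) !== checkDigitByNine(sum)) {
    throw new Error("methods disagree for " + sum);
  }
}
```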

Comprehension

Now that we understand how the Luhn algorithm works, we can use it to validate our ID number as well as to produce a valid checksum digit for new ID numbers.

To be thorough, you can perform some basic date checks as well – see the Java code for an example of this. We can check that the month falls in a valid range (1-12) and that the day given for that month is not more than the number of days in the month.
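A rough JavaScript sketch of such a date check is given below. The name isValidDate is hypothetical, and because the century is not known at this point the sketch makes an assumption about leap years: 29 February is accepted whenever the two-digit year is divisible by 4.

```javascript
// Checks the YYMMDD portion of an ID number.
// Assumption: the century is unknown, so 29 February is accepted whenever the
// two-digit year is divisible by 4 (this includes 00, as 2000 was a leap year).
function isValidDate(yymmdd) {
  if (!/^\d{6}$/.test(yymmdd)) {
    return false;
  }
  const year = Number(yymmdd.substring(0, 2));
  const month = Number(yymmdd.substring(2, 4));
  const day = Number(yymmdd.substring(4, 6));

  if (month < 1 || month > 12) {
    return false;
  }
  const february = year % 4 === 0 ? 29 : 28;
  const daysInMonth = [31, february, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];
  return day >= 1 && day <= daysInMonth[month - 1];
}

console.log(isValidDate("850101")); // true
console.log(isValidDate("851301")); // false: there is no month 13
console.log(isValidDate("850230")); // false: February never has 30 days
```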

Let’s take a look at some JavaScript to extract information from an ID number.

Extracting Information

Now that we know how the ID number is formed, we can extract various pieces of information from it. To recap, we can obtain

  • whether the ID number itself is valid (that is, has no typing errors),
  • the person’s birthdate,
  • the person’s gender,
  • and whether they are a South African citizen or permanent resident.

Let’s put together some code to extract this info – first off, validating the ID number using the Luhn algorithm above. By the end of this, we will have a function that will return an object containing information on the ID number.

Is the ID valid – the Luhn Algorithm in Code

This code is exactly what was discussed before – we loop through our string, starting on the left hand side as noted earlier. We keep a running count of how many numbers we’ve added, which allows us to determine which numbers should be doubled. At each step we set our current multiple to be the count modulo 2, plus 1 – this just means we alternate between 1 and 2. The +inputString[i] portion is shorthand for converting a string to a number. If the value calculated is 10 or higher, we sum its digits separately by adding the result of dividing by 10 (rounded down) to the value modulo 10.

Finally, we multiply the total by 9 and take the value modulo 10 for our final digit.

We can use this to check if an ID number is valid by recalculating the checksum.

To do this, we extract the first part of the ID number using .substring(), then regenerate the Luhn digit based on that and check that it matches our ID number’s final digit.

Let’s quickly create our main function so that we can begin returning this information. This time we will include all functions we’ve seen so far, but going forwards we’ll just include the new functions. A complete listing will be provided at the end.
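What follows is a minimal sketch of these functions, reconstructed from the description above, so it may differ in detail from the original listing. The name checkIDNumber and the 13-digit format check are assumptions made for this sketch.

```javascript
// Calculates the check digit for the digits that precede it.
function generateLuhnDigit(inputString) {
  let total = 0;
  let count = 0;
  for (let i = 0; i < inputString.length; i++) {
    const multiple = (count % 2) + 1; // alternates 1, 2, 1, 2, ...
    count++;
    let temp = multiple * +inputString[i]; // unary + converts the character to a number
    // Sum the digits of the product; values below 10 are left unchanged.
    temp = Math.floor(temp / 10) + (temp % 10);
    total += temp;
  }
  return (total * 9) % 10;
}

// Hypothetical helper: true if the final digit matches the recalculated check digit.
function checkIDNumber(idNumber) {
  // Assumption: anything other than exactly 13 digits is rejected outright.
  if (!/^\d{13}$/.test(idNumber)) {
    return false;
  }
  const withoutCheckDigit = idNumber.substring(0, idNumber.length - 1);
  const checkDigit = +idNumber[idNumber.length - 1];
  return generateLuhnDigit(withoutCheckDigit) === checkDigit;
}

// Main function: for now it only reports whether the number is valid.
function extractFromID(idNumber) {
  return { valid: checkIDNumber(idNumber) };
}

console.log(generateLuhnDigit("850101618408"));    // 6
console.log(extractFromID("8501016184086").valid); // true
console.log(extractFromID("8501016184087").valid); // false
```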

Running the extractFromID function on a string containing an ID number will produce an object containing one property, valid, which will indicate if the number is valid or not.

Birthdate

The birthdate would seem like one of the easiest values to extract directly, although we have one potential cause of trouble. The birthdate is stored in the format of YYMMDD, meaning that we cannot be entirely certain of the century – that is, does 13 mean 1913 or 2013? It is likely, however, that if the two-digit year given is less than the last two digits of our current year, it refers to a year from 2000 onwards.
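One way this guess could be coded is sketched below. The optional today parameter is a hypothetical addition that makes the century guess repeatable, and treating a two-digit year equal to the current one as 20YY is an assumption rather than a documented rule.

```javascript
// Returns the birthdate encoded in the first six digits (YYMMDD) as a Date.
// `today` is a hypothetical optional parameter so the guess can be tested.
function getBirthdate(idNumber, today = new Date()) {
  const year = Number(idNumber.substring(0, 2));
  const month = Number(idNumber.substring(2, 4));
  const day = Number(idNumber.substring(4, 6));

  const currentYear = today.getFullYear() % 100; // tens and units of the current year
  // Assumption: a two-digit year up to and including the current one means 20YY.
  const century = year <= currentYear ? 2000 : 1900;

  return new Date(century + year, month - 1, day); // Date months are zero-based
}

const reference = new Date(2020, 5, 15);
console.log(getBirthdate("8501016184086", reference).getFullYear()); // 1985
// Only the first six digits matter here; the rest of this string is filler.
console.log(getBirthdate("1301010000000", reference).getFullYear()); // 2013
```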

Gender

The gender can be extracted directly, by checking if the 7th digit is below 5 (female) or 5 and above (male).
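A one-line sketch of that check follows; the string return values are a choice made for this example.

```javascript
// The 7th digit (index 6) encodes gender: 0-4 female, 5-9 male.
function getGender(idNumber) {
  return Number(idNumber[6]) < 5 ? "female" : "male";
}

console.log(getGender("8501016184086")); // male
```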

Citizenship

In a similar manner, you can extract the citizenship from the 11th digit – 0 for South African citizen, 1 for a permanent resident.
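The same idea in code, with isCitizen as a hypothetical name:

```javascript
// The 11th digit (index 10) encodes citizenship: 0 citizen, 1 permanent resident.
function isCitizen(idNumber) {
  return idNumber[10] === "0";
}

console.log(isCitizen("8501016184086")); // true
```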

All together

Here’s the complete code for extracting info from the ID number.
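The listing below is a compact sketch of how the pieces might fit together, not the original code. The helper names checkIDNumber and isCitizen, the optional today parameter, and the choice to return only { valid: false } for an invalid number are all assumptions.

```javascript
// Check digit for the digits before it, working from the left.
function generateLuhnDigit(inputString) {
  let total = 0;
  for (let i = 0; i < inputString.length; i++) {
    const temp = ((i % 2) + 1) * +inputString[i];
    total += Math.floor(temp / 10) + (temp % 10);
  }
  return (total * 9) % 10;
}

// Hypothetical helper: 13 digits, with a matching check digit.
function checkIDNumber(idNumber) {
  return /^\d{13}$/.test(idNumber) &&
    generateLuhnDigit(idNumber.substring(0, 12)) === +idNumber[12];
}

// Assumption: two-digit years up to the current one are read as 20YY.
function getBirthdate(idNumber, today = new Date()) {
  const year = +idNumber.substring(0, 2);
  const month = +idNumber.substring(2, 4);
  const day = +idNumber.substring(4, 6);
  const century = year <= today.getFullYear() % 100 ? 2000 : 1900;
  return new Date(century + year, month - 1, day);
}

function getGender(idNumber) {
  return +idNumber[6] < 5 ? "female" : "male";
}

function isCitizen(idNumber) {
  return idNumber[10] === "0";
}

function extractFromID(idNumber) {
  if (!checkIDNumber(idNumber)) {
    return { valid: false }; // nothing else is extracted from an invalid number
  }
  return {
    valid: true,
    birthdate: getBirthdate(idNumber),
    gender: getGender(idNumber),
    citizen: isCitizen(idNumber)
  };
}

const info = extractFromID("8501016184086");
console.log(info.valid, info.gender, info.citizen); // true male true
```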

Generating an ID number

Generating an ID number is fairly simple given what we know – we just need to convert from some convenient input into the numbers above.

We do some basic validation to check the date of birth string – this will most likely be fine for testing purposes. Next we get a random number between 0 and 4 for female, or between 5 and 9 for male.

To get our citizenship number, we use the ! shorthand to convert citizen to a boolean value (and invert it), then the + shorthand to convert that value to a number. We get 3 random digits, pad them with leading 0’s if needed (thanks Sarkos), then combine what we have. Following the ID book’s description we then add an 8, although this could be tweaked.

Finally, we add the Luhn check digit and return our valid ID number.
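A sketch of the generator along these lines is shown below. The names generateID and getRandom, the parameter order and the six-digit check on the date string are assumptions for this example rather than the original listing.

```javascript
// Check digit for the digits before it, working from the left.
function generateLuhnDigit(inputString) {
  let total = 0;
  for (let i = 0; i < inputString.length; i++) {
    const temp = ((i % 2) + 1) * +inputString[i];
    total += Math.floor(temp / 10) + (temp % 10);
  }
  return (total * 9) % 10;
}

// Hypothetical helper: random integer from 0 up to (but excluding) range.
function getRandom(range) {
  return Math.floor(Math.random() * range);
}

// dob is a YYMMDD string; male and citizen are booleans.
function generateID(dob, male, citizen) {
  // Basic validation only: six digits, with no calendar check.
  if (!/^\d{6}$/.test(dob)) {
    throw new Error("Date of birth must be six digits in YYMMDD format");
  }
  const gender = getRandom(5) + (male ? 5 : 0); // 0-4 female, 5-9 male
  const citBit = +!citizen;                     // citizen -> 0, otherwise 1
  const sequence = String(getRandom(1000)).padStart(3, "0");
  const total = dob + gender + sequence + citBit + "8";
  return total + generateLuhnDigit(total);
}

console.log(generateID("850101", true, true)); // a random, valid 13-digit number
```

Because the gender and sequence digits are random, each call returns a different number.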

Java Implementation

As per request, here’s the Java implementation of the code to validate ID numbers. There’s a static method extractInformation that takes in the String version of the ID number and returns an IDNumberDetails object containing the information on that number.

 

Conclusion

We’ve had a look at the structure of an ID number and what the various pieces mean. We also examined the means by which you can calculate the check digit and the algorithm which it (almost) follows, along with JavaScript code for all of the above.

Don’t forget about the handy Chrome extension, and feel free to leave comments with improvements or questions.

Thanks for reading guys!

29 comments on “Generating South African ID Numbers”
  1. Thomas says:

    Do you have a PHP version of this perhaps?

    • Evan Knowles says:

      I don’t I’m afraid – just Java and JavaScript.

    • William says:

      … here PHP that worked for me …

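      function zaID($idnr) {
          // Check first if there are 13 digits ONLY
          if (!preg_match("/^([0-9]){2}([0-1][0-9])([0-3][0-9])([0-9]){4}([0-1])([0-9]){2}?$/", $idnr)) {
              return false;
          }
          // Do Luhn check if true for 13 digits, if not then bail
          $d = -1;
          $a = 0;
          // Sum the digits in the odd positions (1st, 3rd, ..., 11th)
          for ($i = 0; $i < 6; $i++)
              $a += (int) substr($idnr, $i * 2, 1);
          // The loop that builds $b was lost from the posted comment. This is an assumed
          // reconstruction: join the even-position digits into one number and double it.
          $b = 0;
          for ($i = 0; $i < 6; $i++)
              $b = $b * 10 + (int) substr($idnr, 2 * $i + 1, 1);
          $b *= 2;
          // Sum the digits of the doubled number
          $c = 0;
          while ($b > 0) {
              $c += $b % 10;
              $b = (int) ($b / 10);
          }
          $c += $a;
          $d = 10 - ($c % 10);
          if ($d == 10)
              $d = 0;
          return $d == substr($idnr, strlen($idnr) - 1, 1);
      }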

  2. Jacob says:

    May you provide me with the Java code if possible.

  3. Martin says:

    Thanks Evan.

    Here is the C# code if anyone needs it…

    void Main()
    {
        var dob = "750519";

        var male = true;

        var citizen = true;

        // One shared Random instance, so calls made in quick succession do not repeat values
        var rng = new Random();
        Func<int, int> getRandom = (range) => rng.Next(range);

        var gender = getRandom(5) + (male ? 5 : 0);
        var citBit = !citizen ? 1 : 0;
        // Pad the sequence number to three digits
        var random = getRandom(1000).ToString("D3");

        var total = "" + dob + Convert.ToString(gender) + random + Convert.ToString(citBit) + "8";
        total += generateLuhnDigit(total);

        total.Dump();

        Clipboard.SetText(total);
    }

    // Define other methods and classes here
    string generateLuhnDigit(string inputString) {
        var total = 0;
        var count = 0;
        for (var i = 0; i < inputString.Length; i++) {
            var multiple = count % 2 + 1;
            count++;
            // Subtract '0' to get the digit's value; Convert.ToInt32 on a char returns its character code
            var temp = multiple * (inputString[i] - '0');
            temp = temp / 10 + temp % 10;
            total += temp;
        }

        total = (total * 9) % 10;

        return Convert.ToString(total);
    }

  4. Hi Evan, could you put this on GitHub? (Maybe even the extension code with the javascript) I’d like to fork it and add a couple of features.

  5. Flavier says:

    Hi Evan,

    Outstanding work on getting a java version.
    Consider this ID number that I came up with : 0000000005678

    It was found to be valid with the Java code you provided. Care to comment?

  6. Lucia` says:

    What does an id number start with if you are born after the year 2000? if you start with 00 then it will be 1900

  7. tjeuten says:

    Please check out this C# project I’m setting up. Once it gets to a stable and usable level, I’ll be publishing a nuget package of it. Feel free to contribute if you wish.

    https://github.com/TJSoft/IDNumberValidation

  8. Nish says:

    Hi Guys,

    Do any of you have the SA ID Number Validation for MS SQL.

    Thanks,

  9. CJ says:

    How do i get this value ( var getBirthdate/ var getGender= function(idNumber) ) into the birthdate input text box with onclick of submit button after adding the IdNumber in the relevant textbox.. Pls assist this beginner.

    • Lawrence says:

      How would the res/layout/activity_main.xml look like for the java Implementations and where do i declare them in the MainActivity.java. Still a beginner

  10. Waheed says:

    Thanks Evan Knowles. Can you provide code or suggest anything for validating Namibian id.

  11. Phillip says:

    Thanks Evan,

  12. Tyler says:

    I have a problem with the birth date calculation. If the first 6 numbers indicate the date at which the person was registered, wouldn’t it be a problem for permanent residents?
    They could be registered at the age of 50, and have an ID no. saying they are 2 years old.

    • Evan Knowles says:

      The first six numbers are just the birthdate as far as I understand, not the registered date. South Africans tend to get IDs at around 16 in any case, so that would throw our calculations off as well.

      Also, my wife is a permanent resident, and her ID number begins with her birthdate as well.
