
## Introduction

An ID number is a basic requirement for various online applications. There are two ways we can take advantage of ID numbers – we can verify them using their check digit to ensure that the user entered their ID number correctly, and we can extract information from them to speed up the rest of the application process. No one likes entering the same data over and over again, after all.

First off, we need to understand exactly how they work. The information available from the Department of Home Affairs is historically patchy, which has produced a range of implementations and restrictions in various blogs. The more recent ID books come with an explanation of what the ID numbers actually mean, which we can combine with what we already know to produce a reasonable approximation.

This post will examine the existing methods of producing SA ID numbers found on the web in conjunction with the ID book information and present some JavaScript functions to verify an entered ID number, to extract relevant information from that ID number and to generate new, valid ID numbers that can be used in testing.

Initial code that was used to begin this investigation can be found at Marcin Pajdzik’s post.

## Chrome Extension

If you’re not interested in the methods used but just need to generate or analyse ID numbers, check the Chrome extension here.

If you’re interested in looking at / extending / forking, the source is up on GitHub.

## The Numbers

```
8501016184086
```

Let’s say we have the ID number above. The information contained in the green ID books would break this down as follows.

850101: “Year of birth, Month of birth and Day of birth, 1 January 1985”

6: “0-4, Female, 5-9, Male”

184: No explanation given

0: “0, SA Citizen, 1, Non SA Citizen”

8: “Usually 8. If yours is a 9, please ensure that you have a letter confirmation authentication of your ID document from Home Affairs.”

6: No explanation given.

From this we can extract a fairly large amount of information, such as gender, citizenship, etc. The three digits following the gender code appear to be sequence digits which would obviously be needed for people who are born on the same day, so those are easy enough to generate and validate.

The option of SA citizen or non-SA citizen refers to whether the person is a citizen or a permanent resident – an ID number would not be assigned otherwise.

The digit which is flagged as usually being 8 was historically used to indicate race, but this has been dropped. This would explain why it is apparently possible to find ID numbers with values other than 8 or 9. It would seem that the 9 is also used in cases where the sequence number would have exceeded the available space.

The final check is the checksum digit which is calculated using the Luhn algorithm.
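As a rough sketch of the breakdown above, the fields can be sliced out of the string by position. The function name `splitIDNumber` and the property names are made up for this illustration; the positions simply follow the ID book layout described in this section.

```javascript
// Hypothetical helper: slices a 13-digit ID number string into its fields.
var splitIDNumber = function(idNumber) {
  return {
    birthdate: idNumber.substring(0, 6),      // YYMMDD
    gender: idNumber.substring(6, 7),         // 0-4 female, 5-9 male
    sequence: idNumber.substring(7, 10),      // sequence digits
    citizenship: idNumber.substring(10, 11),  // 0 citizen, 1 non-citizen
    usuallyEight: idNumber.substring(11, 12), // usually 8, sometimes 9
    checkDigit: idNumber.substring(12, 13)    // checksum digit
  };
};

console.log(splitIDNumber("8501016184086"));
```

Every value is kept as a string here so that leading zeros survive; converting to numbers can wait until a field is actually interpreted.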

## Luhn Algorithm

### Overview

This algorithm allows for the detection of most transpositions of digits (that is, digits being swapped) as well as digits which are incorrect. There are some transpositions which it will not detect – these are given on the Wiki page. The basic idea behind the algorithm is to sum the digits in a special way, multiply the result by 9 and then take that number modulus 10 to produce the final digit. The algorithm is intended purely as a check in case of mistakes, and not as any sort of encryption or for other security purposes.

The final number generated by the Luhn algorithm can be verified in two ways. You can strip off the last digit, run the algorithm again and check that the digit calculated matches the digit removed, or you can resum all of the digits and check that the result modulus 10 is 0 – that is, the result is a multiple of 10.
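Both checks can be sketched in a few lines. This is an illustration rather than the final code: `weightedSum`, `verifyByRecalculating` and `verifyByResumming` are names invented here, the digits are summed from the left as this post does for ID numbers, and the resumming check assumes a number with an odd total length (such as a 13-digit ID number) so that the check digit itself is not doubled.

```javascript
// Sums digits left to right, doubling every second digit and adding the
// digits of any two-digit product.
var weightedSum = function(digits) {
  var total = 0;
  for (var i = 0; i < digits.length; i++) {
    var value = (i % 2 + 1) * +digits[i];
    total += Math.floor(value / 10) + (value % 10);
  }
  return total;
};

// Way 1: strip the last digit, recalculate it and compare.
var verifyByRecalculating = function(number) {
  var body = number.substring(0, number.length - 1);
  var expected = (weightedSum(body) * 9) % 10;
  return expected === +number[number.length - 1];
};

// Way 2: sum every digit, check digit included, and test for a multiple of 10.
// Assumes an odd total length, so the check digit is not doubled.
var verifyByResumming = function(number) {
  return weightedSum(number) % 10 === 0;
};

console.log(verifyByRecalculating("8501016184086")); // true
console.log(verifyByResumming("8501016184086"));     // true
```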

### Summing Step

The algorithm works by summing the digits of the ID number in a special manner that allows us to detect the transposition of digits. The idea behind this is fairly simple – let’s have a look at a smaller example. Note that when the Luhn algorithm performs summing, the first digit summed is the digit on the right. The ID numbers, however, appear not to follow this and instead start from the left.

Say the correct number we are after is 123. One method of summing these would be to simply sum them, giving us a value of 6. If someone was to enter a number incorrectly, say as 223, the new summed value would be 7, which would show that there had been an error. This basic approach works for the case where a single number is entered incorrectly.

Now, let’s do the same to a “mistake” combination of 213. In this case, we find that we still get a value of 6 – obviously this method will not work, as we have not detected an error in the case of transposition. What the Luhn algorithm does instead while summing the numbers is to double the value of every second digit – this gives us a sum of $1 + (2 \times 2) + 3 = 8$ for the correct number, and a value of $2 + (2 \times 1) + 3 = 7$ for the incorrect number.

If the correct number was padded with this digit, giving us a value of 1238, and the user entered 2138, we could confirm that the number entered was incorrect as the checksum digit on the end would not correspond with the calculated digit.
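The difference between the two ways of summing is easy to see in code. The sketch below uses two throwaway functions, `plainSum` and `doubledSum` (names chosen for this example), and leaves out the digit-splitting of two-digit products since none occur for these inputs.

```javascript
// Plain sum of the digits: cannot see a swap.
var plainSum = function(digits) {
  var total = 0;
  for (var i = 0; i < digits.length; i++) total += +digits[i];
  return total;
};

// Sum with every second digit (counting from the left) doubled.
var doubledSum = function(digits) {
  var total = 0;
  for (var i = 0; i < digits.length; i++) total += (i % 2 + 1) * +digits[i];
  return total;
};

console.log(plainSum("123"), plainSum("213"));     // 6 6
console.log(doubledSum("123"), doubledSum("213")); // 8 7
```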

### Calculating the Digit

Now that we have a number that represents our checked number, we need to find a way to append it. The value was less than 10 in our examples above, but this will not necessarily always be the case. In addition, if you recall from earlier, our final number can be checked by resumming all of the digits and checking that we have a value that is a multiple of 10 – we need to alter our checksum digit slightly to ensure that this is the case.

Basically, we need the checksum digit to add the missing numbers to bring our sum up to a multiple of 10. So if our current value was 27 we would need our checksum digit to be 3, giving us a total value of 30.

This can be calculated in one of two ways. In the first, we can take the units portion of our number and subtract it from 10. In our previous example this would mean taking 7 and subtracting it from 10: $10 - 7 = 3$, so our checksum digit is 3.

The other way of calculating the digit involves multiplying our sum by 9, and then taking the unit digit. In our previous example, our current value was 27. Multiplying by 9 gives us 243 – if we take the unit digit we arrive at our previous answer of 3 again. But why does this work?

Let’s call our original number $x$. This means that, in our example, $x = 27$.

Now, if we multiply by 9, we produce $9x$.

If we then add our original $x$ back onto this value, we get $9x + x = 10x$.

This resulting value is obviously a multiple of 10, so we’ve satisfied our original requirement by using a value of $9x$. In this case we would produce 243 though – we obviously cannot use that as the check digit, as we would end up appending several new digits.

Instead, we use the number modulus 10 – that is, $9x \bmod 10$. The reason why we can use just this value is fairly simple – we are trying to get a number that is a multiple of 10. To do this, we obviously need to add on a value that is less than 10 as we need to increase the unit column. This means that the unit column is the only column that is important in this equation, so we can drop the other columns. This is equivalent to taking the number modulus 10, and gives us our final formula of $\text{check digit} = 9x \bmod 10$.
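The two ways of calculating the digit can be compared directly. In this sketch `digitBySubtraction` and `digitByMultiplication` are names made up for the comparison; note that the subtraction version needs an extra modulus so that a sum already ending in 0 yields 0 rather than 10.

```javascript
// Check digit by subtraction; the outer % 10 turns a result of 10 into 0.
var digitBySubtraction = function(sum) {
  return (10 - (sum % 10)) % 10;
};

// Check digit by multiplying by 9 and keeping the unit digit.
var digitByMultiplication = function(sum) {
  return (sum * 9) % 10;
};

console.log(digitBySubtraction(27), digitByMultiplication(27)); // 3 3
```

Either way, adding the digit back onto the sum lands on a multiple of 10, which is the property the verification step relies on.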

### Comprehension

Now that we understand how the Luhn algorithm works, we can use it to validate our ID number as well as to produce a valid checksum digit for new ID numbers.

To be thorough, you can perform some basic date checks as well – see the Java code for an example of this. We can check that the month falls in a valid range (1-12) and that the day given for that month is not more than the number of days in the month.
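A JavaScript version of the same date check might look like the sketch below. `isPlausibleDate` is a hypothetical helper, not part of the code later in this post, and because the century is unknown it assumes `2000 + year` when deciding whether February has 29 days.

```javascript
// Hypothetical helper: checks the YYMMDD portion of an ID number.
var isPlausibleDate = function(yymmdd) {
  var year = +yymmdd.substring(0, 2);
  var month = +yymmdd.substring(2, 4);
  var day = +yymmdd.substring(4, 6);
  if (month < 1 || month > 12) return false;
  // Day 0 of the following month is the last day of this month.
  // Assumption: the century is unknown, so 2000 + year is used for leap years.
  var daysInMonth = new Date(2000 + year, month, 0).getDate();
  return day >= 1 && day <= daysInMonth;
};

console.log(isPlausibleDate("850101")); // true
console.log(isPlausibleDate("000000")); // false
```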

Let’s take a look at some JavaScript to extract information from an ID number.

## Extracting Information

Now that we know how the ID number is formed, we can extract various pieces of information from it. To recap, we can obtain

- whether the ID number itself is valid (that is, has no typing errors),
- the person’s birthdate,
- the person’s gender,
- and whether they are a South African citizen or permanent resident.

Let’s put together some code to extract this info – first off, validating the ID number using the Luhn algorithm above. By the end of this, we will have a function that will return an object containing information on the ID number.

### Is the ID valid – the Luhn Algorithm in Code

```javascript
var generateLuhnDigit = function(inputString) {
  var total = 0;
  var count = 0;
  for (var i = 0; i < inputString.length; i++) {
    var multiple = count % 2 + 1;
    count++;
    var temp = multiple * +inputString[i];
    temp = Math.floor(temp / 10) + (temp % 10);
    total += temp;
  }

  total = (total * 9) % 10;

  return total;
};
```

This code is exactly what was discussed before – we loop through our string, starting on the left-hand side. We keep a running count of how many digits we’ve added, which allows us to determine which digits should be doubled. At each step we set our current multiple to be the count modulus 2, plus 1 – this just means we alternate between 1 and 2. The +inputString[i] portion is shorthand for converting a string to a number. If the doubled value has two digits, we sum the digits separately by adding the result of the division by 10 (rounded down) to the modulus by 10.

Finally, we multiply the total by 9 and get the value modulus 10 for our final digit.

We can use this to check if an ID number is valid by recalculating the checksum.

```javascript
var checkIDNumber = function(idNumber) {
  var number = idNumber.substring(0, idNumber.length - 1);
  return generateLuhnDigit(number) === +idNumber[idNumber.length - 1];
}
```

To do this, we extract the first part of the ID number using .substring(), then regenerate the Luhn digit based on that and check that it matches our ID number’s final digit.

Let’s quickly create our main function so that we can begin returning this information. This time we will include all functions we’ve seen so far, but going forwards we’ll just include the new functions. A complete listing will be provided at the end.

```javascript
var extractFromID = function(idNumber) {
  var generateLuhnDigit = function(inputString) {
    var total = 0;
    var count = 0;
    // Loop from the left and sum the digits of each product,
    // exactly as in the standalone generateLuhnDigit.
    for (var i = 0; i < inputString.length; i++) {
      var multiple = count % 2 + 1;
      count++;
      var temp = multiple * +inputString[i];
      temp = Math.floor(temp / 10) + (temp % 10);
      total += temp;
    }

    total = (total * 9) % 10;

    return total;
  }

  var checkIDNumber = function(idNumber) {
    var number = idNumber.substring(0, idNumber.length - 1);
    return generateLuhnDigit(number) === +idNumber[idNumber.length - 1];
  }

  var result = {};
  result.valid = checkIDNumber(idNumber);
  return result;
}
```

Running the extractFromID function on a string containing an ID number will produce an object containing one property, valid, which will indicate if the number is valid or not.

### Birthdate

The birthdate would seem like one of the easiest values to extract directly, although we have one potential cause of trouble. The birthdate is stored in the format of YYMMDD, meaning that we cannot be entirely certain of the year – that is, does 13 mean 1913 or 2013? It is likely, however, that if the year given is less than our current year (in the units / tens columns) that the year refers to a year after 2000.

```javascript
var getBirthdate = function(idNumber) {
  var year = idNumber.substring(0, 2);
  var currentYear = new Date().getFullYear() % 100;

  var prefix = "19";
  if (+year < currentYear)
    prefix = "20";

  var month = idNumber.substring(2, 4);
  var day = idNumber.substring(4, 6);
  return new Date(prefix + year + "/" + month + "/" + day);
};
```
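The century rule is easier to experiment with when the current year is passed in rather than read from the clock. A small sketch of the same rule, with `guessFullYear` being a name made up for this example:

```javascript
// Same century rule as getBirthdate, with the current year passed in.
var guessFullYear = function(twoDigitYear, currentFullYear) {
  var currentYear = currentFullYear % 100;
  return (twoDigitYear < currentYear ? 2000 : 1900) + twoDigitYear;
};

console.log(guessFullYear(85, 2024)); // 1985
console.log(guessFullYear(13, 2024)); // 2013
console.log(guessFullYear(24, 2024)); // 1924, since 24 is not less than 24
```

The last call shows the limits of the guess: a birth in the current year lands in the previous century, and anyone older than a hundred is placed a century too late.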

### Gender

The gender can be extracted directly, by checking if the 7th digit is below 5 (female) or 5 and above (male).

```javascript
var getGender = function(idNumber) {
  return +idNumber.substring(6, 7) < 5 ? "female" : "male";
};
```

### Citizenship

In a similar manner, you can extract the citizenship from the 11th digit – 0 for South African citizen, 1 for a permanent resident.

```javascript
var getCitizenship = function(idNumber) {
  return +idNumber.substring(10, 11) === 0 ? "citizen" : "resident";
};
```

### All together

Here’s the complete code for extracting info from the ID number.

```javascript
var generateLuhnDigit = function(inputString) {
  var total = 0;
  var count = 0;
  for (var i = 0; i < inputString.length; i++) {
    var multiple = count % 2 + 1;
    count++;
    var temp = multiple * +inputString[i];
    temp = Math.floor(temp / 10) + (temp % 10);
    total += temp;
  }

  total = (total * 9) % 10;

  return total;
};

var extractFromID = function(idNumber) {
  var checkIDNumber = function(idNumber) {
    var number = idNumber.substring(0, idNumber.length - 1);
    return generateLuhnDigit(number) === +idNumber[idNumber.length - 1];
  };

  var getBirthdate = function(idNumber) {
    var year = idNumber.substring(0, 2);
    var currentYear = new Date().getFullYear() % 100;

    var prefix = "19";
    if (+year < currentYear)
      prefix = "20";

    var month = idNumber.substring(2, 4);
    var day = idNumber.substring(4, 6);
    return new Date(prefix + year + "/" + month + "/" + day);
  };

  var getGender = function(idNumber) {
    return +idNumber.substring(6, 7) < 5 ? "female" : "male";
  };

  var getCitizenship = function(idNumber) {
    return +idNumber.substring(10, 11) === 0 ? "citizen" : "resident";
  };

  var result = {};
  result.valid = checkIDNumber(idNumber);
  result.birthdate = getBirthdate(idNumber);
  result.gender = getGender(idNumber);
  result.citizen = getCitizenship(idNumber);
  return result;
};
```

## Generating an ID number

Generating an ID number is fairly simple given what we know – we just need to convert from some convenient input into the numbers above.

```javascript
var generateID = function(dob, male, citizen) {
  var getRandom = function(range) {
    return Math.floor(Math.random() * range);
  };

  if (!/[0-9][0-9][0-1][0-9][0-3][0-9]/.test(dob)) {
    return "Please check your date of birth string.";
  }

  var gender = getRandom(5) + (male ? 5 : 0);
  var citBit = +!citizen;
  var random = getRandom(1000);

  if (random < 10) random = "00" + random;
  else if (random < 100) random = "0" + random;

  var total = "" + dob + gender + random + citBit + "8";
  total += generateLuhnDigit(total);
  return total;
};
```

We do some basic validation to check the date of birth string – this will most likely be fine for testing purposes. Next we get a random number between 0 and 4 for female and 5 and 9 for male.

To get our citizenship number, we use the ! shorthand to negate citizen (which also converts it to a boolean value), then the + shorthand to convert that value to a number, so a citizen gets 0 and a non-citizen gets 1. We get 3 random digits, pad them if they have leading 0’s (thanks Sarkos) then combine what we have. The ID book says the second-last digit is usually 8, so we add an 8, although this could be tweaked.

Finally, we add the Luhn check digit and return our valid ID number.
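For repeatable tests it can help to have a variant without the random parts. The sketch below is one way to do that: `buildID` and `luhnDigit` are names invented here, the caller supplies the gender digit and sequence number directly, and the second-last digit is fixed at 8 as above.

```javascript
// Luhn digit as calculated in this post (left to right, every second digit doubled).
var luhnDigit = function(digits) {
  var total = 0;
  for (var i = 0; i < digits.length; i++) {
    var value = (i % 2 + 1) * +digits[i];
    total += Math.floor(value / 10) + (value % 10);
  }
  return (total * 9) % 10;
};

// Hypothetical deterministic variant of generateID: the caller supplies the
// gender digit (0-9) and the sequence number (0-999) instead of random values.
var buildID = function(dob, genderDigit, sequence, citizen) {
  var padded = ("00" + sequence).slice(-3); // left-pad to three digits
  var body = dob + genderDigit + padded + (citizen ? 0 : 1) + "8";
  return body + luhnDigit(body);
};

console.log(buildID("850101", 6, 184, true)); // 8501016184086
```

Feeding in the fields of the example number from earlier reproduces it exactly, which is a quick sanity check on the check digit calculation.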

## Java Implementation

As per request, here’s the Java implementation of the code to validate ID numbers. There’s a static method extractInformation that takes in the String version of the ID number and returns an IDNumberDetails object containing the information on that number.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IDNumberValidatorUtility {

    /**
     * Private constructor to hide implicit public one.
     */
    private IDNumberValidatorUtility() {
    }

    private static int generateLuhnDigit(String input) {
        int total = 0;
        int count = 0;
        for (int i = 0; i < input.length(); i++) {
            int multiple = (count % 2) + 1;
            count++;
            int temp = multiple * Integer.parseInt(String.valueOf(input.charAt(i)));
            temp = (int) Math.floor(temp / 10) + (temp % 10);
            total += temp;
        }

        total = (total * 9) % 10;

        return total;
    }

    public static boolean validate(String idNumber) {
        try {
            Pattern pattern = Pattern.compile("[0-9]{13}");
            Matcher matcher = pattern.matcher(idNumber);

            if (!matcher.matches()) {
                return false;
            }

            if (!validateDate(idNumber.substring(0, 6))) {
                return false;
            }

            int lastNumber = Integer.parseInt(String.valueOf(idNumber.charAt(idNumber.length() - 1)));
            String numberSection = idNumber.substring(0, idNumber.length() - 1);

            return lastNumber == generateLuhnDigit(numberSection);
        } catch (Exception ex) {
            return false;
        }
    }

    private static boolean validateDate(String date) {
        int year = Integer.parseInt(date.substring(0, 2));
        int month = Integer.parseInt(date.substring(2, 4));
        if (month < 1 || month > 12) {
            return false;
        }

        int day = Integer.parseInt(date.substring(4, 6));

        // Calendar months are zero-based; use day 1 so that an out-of-range
        // day cannot roll the calendar over into the next month.
        Calendar myCal = new GregorianCalendar(year, month - 1, 1);
        int maxDays = myCal.getActualMaximum(Calendar.DAY_OF_MONTH);

        if (day < 1 || day > maxDays) {
            return false;
        }

        return true;
    }

    private static Date getBirthdate(String idNumber) {
        int year = Integer.parseInt(idNumber.substring(0, 2));
        int currentYear = Calendar.getInstance().get(Calendar.YEAR) % 100;

        int prefix = 1900;
        if (year < currentYear) {
            prefix = 2000;
        }
        year += prefix;

        int month = Integer.parseInt(idNumber.substring(2, 4));
        int day = Integer.parseInt(idNumber.substring(4, 6));

        Calendar calendar = Calendar.getInstance();
        calendar.setTimeInMillis(0);
        calendar.set(Calendar.YEAR, year);
        calendar.set(Calendar.MONTH, month - 1);
        calendar.set(Calendar.DAY_OF_MONTH, day);

        return calendar.getTime();
    }

    public static IDNumberDetails extractInformation(String idNumber) {
        if (!validate(idNumber)) {
            return new IDNumberDetails(idNumber, false);
        }

        Date birthDate = getBirthdate(idNumber);
        boolean male = Integer.parseInt(idNumber.substring(6, 7)) >= 5;
        boolean citizen = Integer.parseInt(idNumber.substring(10, 11)) == 0;

        return new IDNumberDetails(idNumber, birthDate, male, citizen, true);
    }

    public static class IDNumberDetails {
        private String idNumber;
        private boolean valid;
        private Date birthDate;
        private boolean male;
        private boolean citizen;

        /**
         * @param idNumber
         * @param valid
         */
        public IDNumberDetails(String idNumber, boolean valid) {
            super();
            this.idNumber = idNumber;
            this.valid = valid;
        }

        public IDNumberDetails(String idNumber, Date birthDate, boolean male, boolean citizen, boolean valid) {
            this.idNumber = idNumber;
            this.birthDate = birthDate;
            this.valid = valid;
            this.male = male;
            this.citizen = citizen;
        }

        /**
         * @return the birthDate
         */
        public Date getBirthDate() {
            return birthDate;
        }

        /**
         * @return the male
         */
        public boolean isMale() {
            return male;
        }

        /**
         * @return the citizen
         */
        public boolean isCitizen() {
            return citizen;
        }

        /**
         * @return the idNumber
         */
        public String getIdNumber() {
            return idNumber;
        }

        /**
         * @return the valid
         */
        public boolean isValid() {
            return valid;
        }
    }
}
```

## Conclusion

We’ve had a look at the structure of an ID number and what the various pieces mean. We also examined the means by which you can calculate the check digit and the algorithm which it (almost) follows, along with JavaScript code for all of the above.

Don’t forget about the handy Chrome extension, and feel free to leave comments with improvements or questions.

Thanks for reading guys!

Do you have a PHP version of this perhaps?

I don’t I’m afraid – just Java and JavaScript.

… here PHP that worked for me …

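```php
function zaID($idnr) {
    // Check first if there are 13 digits ONLY
    if (preg_match("/^([0-9]){2}([0-1][0-9])([0-3][0-9])([0-9]){4}([0-1])([0-9]){2}?$/", $idnr)) {
        // Do Luhn check if true for 13 digits, if not then bail
        $d = -1;
        $a = 0;
        // Sum the digits in the odd positions (1st, 3rd, ..., 11th)
        for ($i = 0; $i < 6; $i++)
            $a += substr($idnr, $i * 2, 1);
        // Assumed reconstruction: the loop header and the lines building $b
        // were lost from the comment as posted. This version joins the
        // even-position digits into one number, doubles it and sums its digits.
        $b = 0;
        for ($i = 0; $i < 6; $i++)
            $b = $b * 10 + substr($idnr, 2 * $i + 1, 1);
        $b *= 2;
        $c = 0;
        while ($b > 0) {
            $c += $b % 10;
            $b = (int) ($b / 10); // integer division so the loop terminates promptly
        }
        $c += $a;
        $d = 10 - ($c % 10);
        if ($d == 10)
            $d = 0;
        if ($d == substr($idnr, strlen($idnr) - 1, 1))
            return true;
    }
    else return false;
}
```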

May you provide me with the Java code if possible.

I’ve updated it with the Java code.

Thanks Evan.

Here is the C# code if anyone needs it…

```csharp
void Main()
{
    var dob = "750519";
    var male = true;
    var citizen = true;

    Func<int, int> getRandom = (range) => {
        return Convert.ToInt32(Math.Floor((double)new Random().Next(range)));
    };

    var gender = getRandom(5) + (male ? 5 : 0);
    var citBit = !citizen ? 1 : 0;
    var random = Convert.ToString(getRandom(1000));

    if (Convert.ToInt32(random) < 10) random = "00" + random;
    else if (Convert.ToInt32(random) < 100) random = "0" + random;

    var total = "" + dob + Convert.ToString(gender) + random + Convert.ToString(citBit) + "8";
    total += generateLuhnDigit(total);
    total.Dump();
    Clipboard.SetText(total);
}

// Define other methods and classes here
string generateLuhnDigit(string inputString) {
    var total = 0;
    var count = 0;
    for (var i = 0; i < inputString.Length; i++) {
        var multiple = count % 2 + 1;
        count++;
        // Subtract '0' to get the digit's value; Convert.ToInt32 on a char
        // returns its character code instead.
        var temp = multiple * (inputString[i] - '0');
        temp = Convert.ToInt32(Math.Floor((double)temp / 10) + (temp % 10));
        total += temp;
    }

    total = (total * 9) % 10;

    return Convert.ToString(total);
}
```

Thanks Martin!

Hi Evan, could you put this on GitHub? (Maybe even the extension code with the javascript) I’d like to fork it and add a couple of features.

Absolutely – I was planning on giving it a facelift in any case actually. I’ll put it up this afternoon.

Hey Cameron – I’ve updated with a link to the GitHub repo with the full extension source.

Looks good, thanks!

Hi Evan,

Outstanding work on getting a java version.

Consider this ID number that I came up with : 0000000005678

It was found to be valid with the Java code you provided. Care to comment?

I’d say that the code could do with some sanity checks on the produced date – thanks for pointing it out, I’ll have a look into fixing it.

I’ve updated with your suggestions, thanks.

Thank you, I have checked your update and it is top class. Always a pleasure to use a good API when there is really no need to reinvent the wheel.

Just a random question, why don’t you use advert all over your web like google adsense?

I don’t really get enough traffic – it wouldn’t benefit me and it would just annoy people.

I see, good choice there. Thank you, your content is really helpful.

What does an id number start with if you are born after the year 2000? if you start with 00 then it will be 1900

As far as I can tell, it’s still 00. You’d need a person to exceed a hundred years of age for this to become a problem, so they’re just being hopeful for now.

Please check out this C# project I’m setting up. Once it gets to a stable and usable level, I’ll be publishing a nuget package of it. Feel free to contribute if you wish.

https://github.com/TJSoft/IDNumberValidation

Hi Guys,

Do any of you have the SA ID Number Validation for MS SQL.

Thanks,

Hi Nish,

I believe the answer you’re looking for can be found here https://www.facebook.com/anytofreedev

This is based on this article Nish.

How do i get this value ( var getBirthdate/ var getGender= function(idNumber) ) into the birthdate input text box with onclick of submit button after adding the IdNumber in the relevant textbox.. Pls assist this beginner.

How would the res/layout/activity_main.xml look like for the java Implementations and where do i declare them in the MainActivity.java. Still a beginner

Thanks Evan Knowles. Can you provide code or suggest anything for validating Namibian id.

I’m not familiar with them – do you have any samples?

Thanks Evan,

I have a problem with the birth date calculation. If the first 6 numbers indicate the date at which the person was registered, wouldn’t it be a problem for permanent residents?

They could be registered at the age of 50, and have an ID no. saying they are 2 years old.

The first six numbers are just the birthdate as far as I understand, not the registered date. South Africans tend to get IDs at around 16 in any case, so that would throw our calculations off as well.

Also, my wife is a permanent resident, and her ID number begins with her birthdate as well.