Friday, November 2, 2012

How to generate account number?

Requirements for `account number` generator:
  1. Issue pseudo random consistent number (must be unique for dozen millions of records)
  2. Easy check validity (without a need to make a database call)
We will use Feistel cipher to generate pseudo random number (positive only) from a sequential numbers (e.g. returned by nextval() for a posgresql sequence). This algorithm is taken as basis for the `make_feistel_number` function available in wheezy.core package. Setup environment before proceed:
$ virtualenv env
$ env/bin/easy_install wheezy.core
$ env/bin/python
Let play a bit how numbers are generated (notice, the function is reversal and consistent, all numbers are unique, no collision):
>>> from wheezy.core.feistel import make_feistel_number
>>> from wheezy.core.feistel import sample_f
>>> feistel_number = make_feistel_number(sample_f)
>>> feistel_number(1)
573852158
>>> feistel_number(2)
1788827948
>>> feistel_number(1788827948)
2
We will use Luhn algorithm to generate a single digit checksum. This algorithm is taken as basis for the module available in wheezy.core package.
>>> from wheezy.core.luhn import luhn_checksum
>>> luhn_checksum(123456789)
7
There are two more useful functions to sign a number and also check it validity:
>>> from wheezy.core.luhn import luhn_sign
>>> from wheezy.core.luhn import is_luhn_valid
>>> luhn_sign(123456789)
1234567897
>>> is_luhn_valid(1234567897)
True
>>> is_luhn_valid(34518893)
False
Now let just make account number looking nice (pad with zeros, prefix with something meaningful, etc):
>>> account_number = lambda n: 'Z%011d' % luhn_sign( \
...     feistel_number(n))
>>> account_number(1)
'Z05738521581'
>>> account_number(2)
'Z17888279480'
>>> account_number(3)
'Z07395350007'
Per discussion in python mail list, there was discovered alternative, human readable representation of the same number:
>>> from base64 import b32encode
>>> human_format = lambda n: 'Z%s-%s' % (b32encode( \
...     chr((n >> 24) & 255) + \
...     chr((n >> 16) & 255))[:4], \
...     b32encode(chr((n >> 8) & 255) + \
...     chr(n & 255))[:4])
>>> human_format(5738521581)
'ZKYFA-4PWQ'
>>> human_format(17888279480)
'ZFI4Q-PO4A'
>>> human_format(7395350007)
'ZXDGA-CX3Q'
Side by side:
Z05738521581 = ZKYFA-4PWQ
Z17888279480 = ZFI4Q-PO4A
Z07395350007 = ZXDGA-CX3Q
Yes, it optimized for speed, you must provide your own `sample_f` implementation for security reasons. Source code available here.