6.21.2007

implementing the luhn checksum, differently

recently i came across some discussion and implementations of luhn's mod 10 checksum algorithm in comp.lang.scheme that puzzled me and piqued my interest. wikipedia has a useful entry including a straight-forward c# implementation based on an informal description, and pointers to other implementations.

alas, this description seems to have encouraged everyone to implement the algorithm more or less the same way, with minor variations: mostly right-to-left scan with a toggle to decide digit processing, but sometimes the string is reversed, or processed from left-to-right with a toggle. [some schemers have done the usual: convert string to a list of digits, reverse it and scan it with a toggle. cool!] wikipedia entry helpfully includes the pre-computed table to eliminate the unnecessary multiply/compare/subtract, but evidently this has gone unnoticed.

lunh checking again

here are some rather basic [for me anyway] observations on implementing the algorithm:

  • if we know the size of the string, we know enough to scan the string left to right, without a toggle, without reversing or backwards [right to left] scanning.
  • if the size of the string is even [eg. most credit card numbers] we can scan normally from left to right, two digits at a time, first one transformed, and next one untransformed.
  • if the size of the string is odd, the first digit is always an untransformed digit. initialize the sum with this digit, and process the rest of the string normally [as above]
  • if we do not have the size of the string, we can still check the string in a single left-to-right pass.

here is an implementation that uses the pre-calculated numeric transformation table, and a toggle-free left-to-right scan. [for simplicity, i excluded the isdigit check but assumed string length is not known in advance.]


static int ltab[] = { 0, 2, 4, 6, 8, 1, 3, 5, 7, 9 };

int
luhn(char *str) {
        int sum = 0;

        if (!str || !*str || !*(str + 1))
                return 0;   /* less than minimum */
        /*
         * if the length is odd, add the value of the
         * first digit and skip
         */
        if (strlen(str) & 1)
                sum = *str++ - '0';

        while (*str) {
                sum += ltab[*str++ - '0'];
                sum += *str++ - '0';
        }

        return (sum % 10) == 0;
}

even with isdigit check implemented as a first pass (during which we can also calculate the length of the string) this runs almost twice as fast as most naive implementations seen around.

suppose we do not know the string length in advance, and maybe it is costly to do multiple scans [assume many long strings or list of digits as some lispers would have it]. we want to see if a given string is (a) all numeric, and (b) passes the luhn checksum, all in a single pass. since we cannot decide if we have to perform the luhn transform for the first digit or not, we do it both ways and calculate two sums:


static int ltab[] = { 0, 2, 4, 6, 8, 1, 3, 5, 7, 9 };

/*
 * luhn without prior pass for string length
 */
int
luhn(const char *str) {
        int sum[2] = {0,0};
        int flip = 0;
        char c;

        if (!str || !*str || !*(str + 1))
                return 0;
        /*
         * calculate two alternating sums. we do not know
         * which one we will end up using until the end
         */
        while (c = *str++) {
                if (!isdigit(c))
                     return 0;
                int n = c - '0';
                sum[flip] += ltab[n];
                sum[flip = !flip] += n;
        }

        return (sum[flip] % 10) == 0;
}

luhn checking in awk and python

when i first decided to implement the algorithm, i used awk to prototype several of my approaches, including the two above. here is another approach with an extended lookup table that works well with awk and python and possibly with other scripting languages i like less.

# luhn - checks if a string of digits is a valid credit card number
# unlike other right-to-left scanning toggle and calculate
# implementations, this one does less than half the work
# author: ozan s. yigit
# insert bsd copyright here

BEGIN {
# generate all two digit sequences, with appropriate
# luhn translation of the first digit.

    for (i = 0; i < 10; i++)
        for (n = 0; n < 10; n++) {
        t = i * 2;
        if (t > 9)
            t = t - 9
        pairmap[i n] = t + n
    }
}

function luhn(digits,    sum, n, i)
{
    i = 1           # index
    sum = 0
    n = length(digits)
    # if the length is odd, save+skip the first char
    if ((n % 2) > 0)
        sum = substr(digits, i++, 1)

    while (i <= n) {
        pair = substr(digits, i, 2)
        ## print i ": ", pair, "->", pairmap[pair]
        sum += pairmap[pair]
        i += 2
    }
    ## print sum
    return sum % 10 == 0
}

/^[0-9]+$/ {
    if (luhn($0))
        print $0 ": ok."
    else
        print $0 ": no."
}

here is basically the same thing in python.


# author: ozan s. yigit
# insert bsd copyright here
pairmap = {
"00": 0, "01": 1, "02": 2, "03": 3, "04": 4, "05": 5, "06": 6, "07": 7,
"08": 8, "09": 9, "10": 2, "11": 3, "12": 4, "13": 5, "14": 6, "15": 7,
"16": 8, "17": 9, "18":10, "19":11, "20": 4, "21": 5, "22": 6, "23": 7,
"24": 8, "25": 9, "26":10, "27":11, "28":12, "29":13, "30": 6, "31": 7,
"32": 8, "33": 9, "34":10, "35":11, "36":12, "37":13, "38":14, "39":15,
"40": 8, "41": 9, "42":10, "43":11, "44":12, "45":13, "46":14, "47":15,
"48":16, "49":17, "50": 1, "51": 2, "52": 3, "53": 4, "54": 5, "55": 6,
"56": 7, "57": 8, "58": 9, "59":10, "60": 3, "61": 4, "62": 5, "63": 6,
"64": 7, "65": 8, "66": 9, "67":10, "68":11, "69":12, "70": 5, "71": 6,
"72": 7, "73": 8, "74": 9, "75":10, "76":11, "77":12, "78":13, "79":14,
"80": 7, "81": 8, "82": 9, "83":10, "84":11, "85":12, "86":13, "87":14,
"88":15, "89":16, "90": 9, "91":10, "92":11, "93":12, "94":13, "95":14,
"96":15, "97":16, "98":17, "99":18
}

def luhncheck(number):
    n = len(number)
    if n < 2:   # less than minimum
        return 0
    i = 0
    sum = 0
    if n & 1:   # odd length
        sum = int(number[i])
        i = 1

    while i < n:
        s = i
        i += 2
        ##      print number[s:i], "->", pairmap[number[s:i]]
        sum += pairmap[number[s:i]]

    return(sum % 10) == 0

## print luhncheck("1111")
## print luhncheck("8763")
## print luhncheck("446667651")
## print luhncheck("471036814")
## print luhncheck("23813103131311229292929228")

for fun luhn and duff

this is the fastest version of luhn checksum in C i happen to have. not surprisingly, it uses duff's device.

/*
* luhn check using duff's device
* author: ozan s. yigit
*/
static int ltab[] = { 0, 2, 4, 6, 8, 1, 3, 5, 7, 9 };

int luhn(char *str, int len)
{
        int sum = 0;
        int loop;

        if (len < 2)
                return 0;

#define LUHN    ltab[*str++ - '0']
#define NORM    *str++ - '0'

        loop = (len + 8 - 1) >> 3;

        switch (len & (8 - 1)) {
        case 0:
                do {
        sum += LUHN;
        case 7: sum += NORM;
        case 6: sum += LUHN;
        case 5: sum += NORM;
        case 4: sum += LUHN;
        case 3: sum += NORM;
        case 2: sum += LUHN;
        case 1: sum += NORM;
                } while (--loop);
        }

        return (sum % 10) == 0;
}
[notes: alas, code you see here is copyright. you can do anything you like with it so long as you give proper credit. (creative commons attribution, share alike) i usually leave trivial code of this sort in the public domain, but i am increasingly unhappy seeing public domain code being smothered with GPL that hides the original intent of its authors.
]

1 comment:

Anonymous said...

Has anyone figured out a way to verify card number via RegExps (using substitution)?

OIur email filter allows us to put any RegExp we want, but does not allow arbitrary code...