Rcjp’s Weblog

October 6, 2007

Hash Tables

Filed under: c — rcjp @ 11:21 am

Just a quick note on how we used to use hash tables for the compiler code at Salford:

#include <stdio.h>
#include <stdlib.h>

/* point this at a debugging malloc if required */
#define newm_array(T,N) (T*)malloc(sizeof(T)*(N))
#define HASHSIZE    1024

struct symbol_list {
   symbol_list * next;
   int symbol ;
   char* name ;
   int size ;
};

symbol_list** base;

int main()
{
    ... we have some name we want to hash...

    base=newm_array(symbol_list*, HASHSIZE);

    /* clear hash table */
    i=0;
    while(i<HASHSIZE) {
        base[i]=NULL;
        i++;
    }

    int ih=0;
    char* p=name;
    while (*p) {       /* compute hash */
        ih=(ih<<1)+*p;
        p++;
    }
    ih=and(ih,HASHSIZE-1);
    symbol_list* x=base[ih];
    while (x != NULL) x=x->next; /* find empty chain */

    ... set x = name...
}

though might be better to make some functions

static unsigned hash(char *s)
{
    unsigned hashval;

    for (hashval = 0; *s != ''; s++)
        hashval = *s + 31*hashval;

    return hashval % HASHSIZE;
}

static entry *lookup(char *s)
{
    entry *np;

    for (np = hashtab[hash(s)]; np != null; np = np->next)
        if (wstrcmp(s, np->name) == 0)
            return np;
    return null;
}

October 1, 2007

Simple diff of 2 files

Filed under: c, lisp, python — rcjp @ 7:59 pm

A few days ago I wanted to compare to files each of which had one word per line (they were completion files for the rlwrap utility incidentally) thats easy enough with the unix shell commands, infact you can do it in one line

diff -iyw --suppress-common-lines <(sort -f file1) <(sort -f file2)

but I thought as a quick programming exercise I’d do it in C++/C, python and Common Lisp…

Calculating the difference is very easy using C++’s set_difference etc. but it is a surprise, I think for most programmers anyway, that you have to write your own case insensitive string comparison function. C++ sure is a peculiar mix of high and low level programming.

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <iterator>
#include <functional>

// 
// not sure there is any advantage to inherit from binary_function here?
// (infact we could just define a function rather than a struct and operator)
//
struct lessthan_nocase :
  public std::binary_function<const std::string&, const std::string&, bool>
{
    bool operator()(const std::string& s1, const std::string& s2) const
    {
        std::string::const_iterator p1 = s1.begin();
        std::string::const_iterator p2 = s2.begin();

        while(p1 != s1.end() && p2 != s2.end()) {
            if (toupper(*p1) != toupper(*p2)) return toupper(*p1) < toupper(*p2);
            ++p1;
            ++p2;
        }
        return s1.size() < s2.size();
    }
};

//
// set_difference only work on sorted containers
//
void sorted_readfile(const char* file, std::vector<std::string>& fvec)
{
    std::ifstream f(file);
    std::istream_iterator<std::string> finput(f), fend;

    copy(finput, fend, back_inserter(fvec));
    sort(fvec.begin(), fvec.end(), lessthan_nocase());
}

//
// dump results (show words if < MAXSHOW)
//
const unsigned int MAXSHOW = 10;

void display_diff(std::string title, std::vector<std::string> v)
{
    std::cout << title;
    if (v.size() < MAXSHOW) {
        std::cout << std::endl << std::string(title.size(), '-') << std::endl;
        copy(v.begin(), v.end(),
             std::ostream_iterator<std::string>(std::cout, "\n"));
    } else {
        std::cout << " =  " << v.size() << " words" << std::endl;
    }
    std::cout << std::endl;
}

int main(int argc, char* argv[])
{
    if (argc != 3) {
        std::cerr << "Usage: tdiff filename1 filename2" << std::endl;
        exit(1);
    }

    std::vector<std::string> f1;
    sorted_readfile(argv[1], f1);

    std::vector<std::string> f2;
    sorted_readfile(argv[2], f2);

    std::vector<std::string> notinf1;
    set_difference(f1.begin(), f1.end(), f2.begin(), f2.end(),
            back_inserter(notinf1), lessthan_nocase());
    display_diff("words in 1st but not in 2nd", notinf1);

    std::vector<std::string> notinf2;
    set_difference(f2.begin(), f2.end(), f1.begin(), f1.end(),
            back_inserter(notinf2), lessthan_nocase());
    display_diff("words in 2nd but not in 1st", notinf2);

    std::vector<std::string> symdiff;
    set_symmetric_difference(f2.begin(), f2.end(), f1.begin(), f1.end(),
            back_inserter(symdiff), lessthan_nocase());
    display_diff("symmetric difference", symdiff);

    std::vector<std::string> inter;
    set_intersection(f2.begin(), f2.end(), f1.begin(), f1.end(),
            back_inserter(inter), lessthan_nocase());
    display_diff("intersection", inter);

    return 0;
}

In C, the only thing to trip you up is getting the casting of the void * pointers in strcmp_nocase correct before trying to dereference them. Also I’ve cheated and used the non-ansi strcasecmp (sometimes called stricmp).

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

#define MAXLINE  256
#define MAXWORDS 10000
#define MAXSHOW  10

typedef enum {false, true} bool;

bool strempty(char *str)
{
    char *p = str;
    while (*p) if (!isspace(*p++)) return false;
    return true;
}

int strcmp_nocase(const void *s1, const void *s2)
{
    /* The actual arguments to this function are "pointers to
       pointers to char", but strcmp() arguments are "pointers
       to char", hence the following cast plus dereference */
    return strcasecmp( *(char * const *)s1, * (char * const *) s2);
}

int sorted_readfile(char *filename, char **words)
{
    FILE *f = fopen(filename, "r");

    if (!f) {
        fprintf(stderr, "can't open %s\n", filename);
        exit(1);
    }

    int count = 0;
    char line[MAXLINE], *q;
    while (fgets(line, MAXLINE, f) && (count < MAXWORDS)) {
        if ((q=strpbrk(line,"\r\n"))) *q=0;     /* words dont include newlines */
        if (strempty(line)) continue;           /* skip blank lines */
        words[count] = (char *) malloc(strlen(line)+1);
        /* or words[count++] = strdup(line)*/
        strcpy(words[count], line);
        count++;
    }

    fclose(f);

    if (count == MAXWORDS) {
        fprintf(stderr, "increase MAXWORDS buffer size\n");
        exit(1);
    }
    qsort(words, count, sizeof(char *), strcmp_nocase);
    return count;
}

void display_diff(char *title, char **v, int vsize)
{
    int i;
    printf("%s", title);

    if (vsize < MAXSHOW) {
        printf("\n");
        for (i=0; i<strlen(title); i++) putchar('-');
        printf("\n");
        for (i=0; i<vsize; i++)
            printf("%s\n", v[i]);
    } else {
        printf(" = %d words\n", vsize);
    }
    printf("\n");
}

int main(int argc, char *argv[])
{
    if (argc != 3) {
        fprintf(stderr, "Usage: tdiff filename1 filename2\n");
        exit(1);
    }
    int i;
    char **f1words;
    f1words = malloc(MAXWORDS*sizeof(char*));
    int nf1words = sorted_readfile(argv[1], f1words);

    char **f2words;
    f2words = malloc(MAXWORDS*sizeof(char*));
    int nf2words = sorted_readfile(argv[2], f2words);

    char **diff;
    int ndiff = 0;
    diff = malloc(MAXWORDS*sizeof(char*));
    for (i=0; i < nf1words; i++) {
        if (!bsearch(&f1words[i], f2words, nf2words,
                     sizeof(char*), strcmp_nocase)) {
            /* could just point into f1words instead */
            diff[ndiff++] = strdup(f1words[i]);
        }
    }
    display_diff("words in 1st but not in 2nd", diff, ndiff);

    return 0;
}

In python, most of it is easy, though its perhaps not very pythonesque to cram some things on one line like I’ve done here. Making set case independent requires the __hash__ and __eq__ functions (I think that is all we need in this case) get overridden to use a saved lowercase version of the supplied string (see the python reference) .

import string

class NoCaseStr(str):
    def __init__(self, s):
        str.__init__(self, s)
        # keep a copy of the lower case string
        self.loweredstr = s.lower()

    def __eq__(self, s):
        return self.loweredstr == s.lower()

    def __hash__(self):
        return hash(self.loweredstr)

def display_diff(title, dset, Maxshow=10):
    if len(dset) < Maxshow:
        print title
        print len(title) * '-'
        print '\n'.join(dset)
    else:
        print title, '=', len(dset)
    print

def read_words(f):
    w = set(NoCaseStr(string.strip(word)) for word in open(f).readlines())
    w.discard('')# slightly clumsy... '  \n' gets stripped to '' so discard it
    return w

def tdiff(f1, f2):
    w1 = read_words(f1)
    w2 = read_words(f2)
    display_diff('in 1st but not in 2nd', w1.difference(w2))
    display_diff('in 2nd but not in 1st', w2.difference(w1))
    display_diff('symmetric difference',  w1.symmetric_difference(w2))
    display_diff('intersection',          w1.intersection(w2))

if __name__ == '__main__':
    import sys
    if len(sys.argv) < 3:
        print 'Usage: %s file1 file2' % sys.argv[0]
        sys.exit(1)

    tdiff(sys.argv[1], sys.argv[2])

My favourite language – Common Lisp has everything you need to do with without overriding anything. string-equal is case insensitive (as opposed to string=)

(defun read-words (file)
  (with-open-file (str file)
    (loop for line = (read-line str nil nil)
       for word = (string-trim " " line)
       while line
       unless (string= word "")
       collect word)))

(defun display-diff (title diff &optional (max-show 10))
  (format t "~A: ~V{~A~^ ~}~%" title max-show diff))

(defun tdiff (f1 f2)
  (let ((w1 (read-words f1))
        (w2 (read-words f2)))
    (display-diff "in 1st but not in 2nd" (set-difference w1 w2 :test #'string-equal))
    (display-diff "in 2nd but not in 1st" (set-difference w2 w1 :test #'string-equal))
    (display-diff "intersection" (intersection w1 w2 :test #'string-equal))))

(tdiff "/home/r/tmp/f1" "/home/r/tmp/f2")

June 8, 2007

gtk containers

Filed under: c — rcjp @ 7:05 pm

The gtk glib library includes some standard container like linked lists…


/* Compile with cc doubly.c  `pkg-config --cflags --libs gtk+-2.0`  */

#include <glib.h>
#include <string.h>
#include <stdio.h>

void myfunc(gpointer data, gpointer user_data)
{
    printf("value %s\n", (char *)data);
}

int main()
{
    GList *list = NULL;
    GList *it = NULL;

    list = g_list_append(list, "one");
    list = g_list_append(list, "two");
    list = g_list_append(list, "three");

    for (it = list; it != NULL; it = it->next)
        printf("looping - %s\n", (char *)it->data);

    /* or alternatively  */

    g_list_foreach(list, (GFunc) myfunc, NULL);

    printf("length of list is %d\n", g_list_length(list ));
    return 0;
}

May 10, 2007

Sign Extending Unsigned Numbers

Filed under: c — rcjp @ 2:26 pm

Just playing about really, but I thought it would be worth re-affirming in my brain that you have to be careful using assembler instructions like CLW, CLQ etc. when dealing with unsigned numbers since they propagate the sign bit.:

    #include <stdio.h>
    void extend(unsigned char x)
    {
        int y;
        asm ("xor %%eax, %%eax;\n\t"      /* clear EAX */
             "movb %1, %%al;\n\t"         /* move the byte x into AL */
             "cbw;\n\t"
             "movl %%eax, %0;\n\t"
             :"=r"(y)                     /* output only */
             :"r"(x)                      /* input */
             :"%eax");                    /* clobbered regs */

        printf("x=%d, cbw extended=%d\n", x, y);
    }

    int main(void)
    {
        extend(127);
        extend(128);

        return 0;
    }

gives

    ~/c/tmp> ./a.out
    x=127, cbw extended=127
    x=128, cbw extended=65408

there is no binary flag for printf in C, but in Common Lisp:

    CL-USER> (format t "~20B~%" 128)
                10000000
    NIL
    CL-USER> (format t "~20B~%" 65408)
        1111111110000000
    NIL

Instead of doing it in inline C, I thought I’d try nasm + gcc

            extern  printf

            SECTION .data
    fmt:    db "cbw %d (previously %d)", 10, 0 

            SECTION .text
            global main        ; needed for gcc 
    main:
            push    ebp
            mov     ebp,esp

            xor     eax,eax
            mov     ax, 128
            push    eax
            cbw
            push    eax
            push    dword fmt
            call    printf

            add     esp, 12    ; pop 3 pushes
            mov     esp, ebp   ; clean stack frame
            pop     ebp
            mov     eax,0      ; no error return
            ret

building and running with

    ~/c/tmp> nasm -f elf nasm2.asm
    ~/c/tmp> gcc -o nasm2 nasm2.o
    ~/c/tmp> nasm2
    cbw 65408 (previously 128)

April 27, 2007

Finding open oflag Settings

Filed under: c, python — rcjp @ 11:03 am

In tracings you see the numerical value for flags like the oflag settings to the C open commands e.g.

    5901 open("/proc/asound/cards", 32768, 0666) = 7

to find out what they mean I just quickly did:

    In [36]: def oflag_to_string(oflag):
       ....:     for c in dir(os):
       ....:         if c.startswith('O_'):
       ....:             if eval('os.'+c) & oflag:
       ....:                 print c
       ....:
       ....:

    In [37]: oflag_to_string(32768)
    O_LARGEFILE

but afterwards realised I could have just done an strace since that nicely converts the values for you.

January 15, 2007

Swapping Spaces for Punctuation

Filed under: c, lisp, python — rcjp @ 10:17 am

Just a quicky. In Common Lisp…

    (defun spacify-punc (string)
      (substitute-if #\Space #'punc-p string))

    (defun punc-p (char) (find char "*,.:;-()"))

then use it…

    CL-USER> (spacify-punc "this, string. and: more")
    "this  string  and  more"

in python…

    def spacify_punc(char):
        if char not in "*,.:;-()":
            return char
        else:
            return ' '

    print ''.join(spacify_punc(c) for c in "this, string. and: more")

in C possibly something like…

    #include <stdio.h>
    #include <string.h>

    int main()
    {
      char words[] = "this, string. and: more";
      char *answer = strdup(words);
      unsigned int i;

      /* using sizeof (instead of strlen) so we include the NULL */
      for (i=0; i < sizeof(words); i++)
        answer[i] = strchr("*,.:;-()", words[i]) ? ' ': words[i];

      printf("'%s' becomes '%s'\n", words, answer);
      return 0;
    }

All three languages have functions to detect punctuation, but we are only looking for a subset in this case.

I think I like the CL solution best but the order of args to substitute-if always throws me, thank heavens for SLIME’s argument prompting.

January 6, 2007

Twenty Questions Game in C

Filed under: c — rcjp @ 3:33 pm

Tree traversal is really really easy in Common Lisp so for a bit more of a challenge I wrote this in C. The game is for the user to think of an object and answer questions while the program tries to guess. The code is just a binary search tree, but you have to look ahead to see if you need to add in a piece of tree rather than just a straighforward leaf node.

    #include <stdio.h>
    #include <string.h>
    #include <stdbool.h>
    #include <stdlib.h>

    #define LINE_MAX 512

    typedef struct node NODE;
    struct node {
        char qu[LINE_MAX];
        NODE *yes, *no;
    };

    /*
     * print message and read stdin into LINE_MAX reply buffer
     * trimming newline
     */
    void getreply(char *message, char reply[LINE_MAX])
    {
        puts(message);
        fflush(stdout);
        fgets(reply, LINE_MAX, stdin);
        strtok(reply, "\n");        /* trim off \n */
    }

    bool ask(char *question, bool object)
    {
        char line[LINE_MAX], reply[LINE_MAX], qu[LINE_MAX];

        if (object)
            sprintf(qu, "Is it %s?", question);
        else
            strcpy(qu, question);
        getreply(qu, line);
        sscanf(line, "%s", reply);
        if (!strcmp(reply, "yes") || !strcmp(reply, "y"))
            return true;
        return false;
    }

    int main()
    {
        NODE info = { "Is it an animal?",
            &(NODE) {"Is it slimy?",
                     &(NODE) {"a fish", NULL, NULL},
                     &(NODE) {"a bear", NULL, NULL}},
            NULL
        };

        char object[LINE_MAX], clarify[LINE_MAX];
        bool yesno;
        do {
            NODE *curr = &info, *next, *newnode;
            while (curr) {
                yesno = ask(curr->qu, false);
                next = yesno ? curr->yes : curr->no;
                if (!next) {
                    /*  
                     * we need to add a leaf 
                     */
                    curr->no = malloc(sizeof(NODE));
                    getreply("What is it? ", curr->no->qu);
                    break;
                }
                if (!next->yes && !next->no) {
                    /* 
                     * we might need to add a new node in here 
                     */
                    if (ask(next->qu, true))
                        break;      /* the leaf was right.. new game */

                    newnode = malloc(sizeof(NODE));
                    getreply("What is it? ", object);
                    sprintf(clarify, "What question distiguishes %s from %s?",
                            object, next->qu);
                    getreply(clarify, newnode->qu);

                    /* check which way round this new question applies */
                    sprintf(clarify, "A %s, %s", object, newnode->qu);
                    if (ask(clarify, false)) {
                        newnode->yes = malloc(sizeof(NODE));
                        strcpy(newnode->yes->qu, object);
                        newnode->no = next;
                    } else {
                        newnode->no = malloc(sizeof(NODE));
                        strcpy(newnode->no->qu, object);
                        newnode->yes = next;
                    }
                    if (yesno)
                        curr->yes = newnode;
                    else
                        curr->no = newnode;
                    break;
                }
                curr = next;
            }
        }
        while (ask("New Game", false));
        return 0;
    }


    /*   Sample run...

        Is it an animal?
        y
        Is it slimy?
        n
        Is it a bear?
        y
        New Game
        y

        Is it an animal?
        n
        What is it?
        a box
        New Game
        y

        Is it an animal?
        n
        Is it a box?
        n
        What is it?
        a banana
        What question distiguishes a banana from a box?
        is it yellow?
        A a banana, is it yellow?
        y
        New Game
        y

        Is it an animal?
        n
        is it yellow?
        y
        Is it a banana?
        n
        What is it?
        the sun
        What question distiguishes the sun from a banana?
        is it cold?
        A the sun, is it cold?
        n
        New Game
        y

        Is it an animal?
        n
        is it yellow?
        y
        is it cold?
        n
        Is it the sun?
        y
        New Game
        n
        r@laptop:~/c/tmp$
    */

I started thinking about how this would look in a C++ object type program but, to be honest, I don’t enjoy trying to think out the right class structure; and what is annoying is that you can’t start coding and explore the problem as you code, because you’ll come to a grinding halt if you’ve got the structure wrong. I guess I just don’t think OOP is fun coding, at least not in C++.

December 30, 2006

Lyaponov Exponent

Filed under: c — rcjp @ 6:42 pm

Most introductions to chaos theory will talk about the logistic mapping x_{n+1}=r\, x_n(1-x_n) and then calculate the lyapunov exponent which is a measure of how rapidly small deviations in the initial conditions give exponential changes in results giving an indication how chaotic the system is. It is calculated from \lambda = \frac{1}{N} \sum^N_{n=1} \log(\frac{d x_{n+1}}{d x_n}) The program below calculates this for 2.95<r<4.0 after iterating for 1000 times to stabalise things then using N=30.

The code is slightly contrived; I wrote it to play with passing functions through C++ templates. The idea being that logis and deriv (its derivative) since they are templated, can get inlined by the compiler; though Id’ guess a much more straightforward C program would probably optimise just as well. In addition the std::bind1st(std::mem_fun(&IterMap::lyaponov_func),this) didn’t exactly trip off my fingers.

#include <iostream>
#include <fstream>
#include <algorithm>
#include <vector>
#include <numeric>
#include <functional>
#include <cmath>

typedef double (*CALC)(double, double);

class IterMap
{
    std::vector<double> data;
    const int nstabalise;
    const int nrecord;
    double r;
 public:
    IterMap(int rec, int stab=1000) : nstabalise(stab), nrecord(rec) {}
    template<CALC derivfunc>double lyaponov_func(double x);
    template<CALC mapfunc> void stabalise(double x);
    template<CALC mapfunc, CALC derivfunc> double lyapunov_exp(double r);
};

template<CALC mapfunc> void IterMap::stabalise(double x)
{
    for(int i = 0; i < nstabalise; ++i)
        x = mapfunc(r, x);

    for(int i = 0; i < nrecord; ++i) {
        x = mapfunc(r, x);
        data.push_back(x);
    }
}

template<CALC derivfunc> double IterMap::lyaponov_func(double x)
{
    return log(fabs( derivfunc(r,x) ));
}

template<CALC mapfunc, CALC derivfunc> double IterMap::lyapunov_exp(double ra)
{
    r = ra;
    stabalise<mapfunc>(0.1);

    // used mem_func, else have to declare func static

    transform(data.begin(), data.end(), data.begin(),
            std::bind1st(std::mem_fun(&IterMap::lyaponov_func<derivfunc>),this));
    double avg = accumulate(data.begin(), data.end(), 0.0) / data.size();
    data.clear();
    return avg;
}

////////////////////////////////////////////////////////////////////////////////

double logis(double r, double x)
{
    return r * x * (1.0 - x);
}

double deriv(double r, double x)
{
    return r - 2.0*r*x;
}

int main(int argc, char* argv[])
{
    std::ofstream out("ldata", std::ios::out);
    IterMap bir(30);
    for (double r = 2.95; r <= 4.0; r+=0.0005)
        out << r << " " << bir.lyapunov_exp<logis,deriv>(r) << std::endl;

    return 0;
}

December 29, 2006

Quick C++ Word Count Trick

Filed under: c — rcjp @ 2:19 pm

Probably not really useful in actual code but

#include <iostream>
#include <iterator>
#include <string>

typedef std::istream_iterator<std::string> in;

int main()
{
    std::cout << std::distance(in(std::cin), in());
}


will print the number of words piped in.

December 22, 2006

String Searching/Replacing

Filed under: c, lisp, python — rcjp @ 9:04 am

just some quick notes on how various languages handle replacing/splicing a string…

In C++ chopping, searching and replacing in a string is fairly easy

    #include <iostream>
    #include <string>

    int main()
    {
        std::string s = "some one with more than one that ones.";

        s.erase(0,3);
        s.replace(s.find("one"), 3, "three");

        std::cout << s << std::endl;
    }

in straight C doing the same thing is a bit more fiddly

    #include <stdio.h>
    #include <string.h>

    int main()
    {
        char* s = "some one with more than one that ones.";

        char buf[255];
        strcpy(buf, s+3);  /* chop off the first 3 chars */

        char *p = strstr(buf, "one");
        if (p)
        {
            char tmp[255];
            *p = (char) 0;
            strcpy(tmp, buf);
            strcat(tmp, "three");
            strcat(tmp, p+3);  /* skip over length of "one" */
            strcpy(buf, tmp);
        }

        printf("%s\n", buf);
        return 0;
    }

you have to think about the size of the temporary buffer unless you malloc something based on the size of s – but then you have to know how much bigger your operations will make the string.

Perhaps suprisingly there isn’t any standard function to replace strings in Common Lisp – I guess its one of those things your are supposed to deal with yourself since if you know you your replacement word is the same size you can destructively alter the string (using setf on the subseq), otherwise you have to build a new string (since you may be replacing a word with a bigger word). So CL leaves things to you to figure out the best approach.

You can get the position of a string within another with

    * (search "one" "there one is one more than ones")
    6

and then build up the string a la C…

    (let* ((str "some one with more than one that ones.")
           (word "one")
           (p (search word str)))
      (if p
          (concatenate 'string (subseq str 3 p)
                               "three"
                               (subseq str (+ p (length word))))
          p))

In python

    In [3]: "there one is one more than ones".replace("one", "three", 1)
    Out[3]: 'there three is one more than ones'

where we are using a count=1 otherwise it would replace all instances. We can even chop off the first three chars and then replace all in one go

    In [5]: "there one is one more than ones"[3:].replace("one", "three")
    Out[5]: 're three is three more than threes'

Replacing all occurances in C++ is a bit more work

    std::string::size_type n;  // or   size_t n; 
    std::string word = "one";
    while((n=s.find(word)) != std::string::npos)
        s.replace(n, word.size(), "three");

Replacing all strings in C or Common Lisp is quite alot more work and probably better to write a utility function and keep it somewhere or use a library.
http://en.wikibooks.org/wiki/Programming:Common_Lisp/Strings has

    (defun replace-all (string part replacement &key (test #'char=))
      (with-output-to-string (out)
        (loop with part-length = (length part)
           for old-pos = 0 then (+ pos part-length)
           for pos = (search part string :start2 old-pos :test test)
           do
             (write-string string out :start old-pos
                                      :end (or pos (length string)))
           when pos do
             (write-string replacement out)
           while pos)))

or maybe use cl-ppcre

December 21, 2006

Compiler warnings GCC/SCC

Filed under: c — rcjp @ 8:51 am

Got Salford SCC running under WINE (except for the /break debugger option) which is impressive.

Just tried running it on a couple of test programs and it didn’t like

    typedef struct box {
        const char *name;
        const int count;
    } BOX;

    r@laptop:~/c/exercises$ scc bsearch.c /ansi_c/link
    [Salford SCC/WIN32 Ver 3.16 Copyright (c) Salford Software Ltd 2002]
       0008       const int count;
    *** A structure member may not be const or volatile
        1 ERRORS  [<BSEARCH> SCC/WIN32 Ver 3.16]
    *** Compilation failed

I’m not quite sure about this though. I was initialising with

    struct box ruler = { "ruler", 0 };

C++ has member initialisers which are required for const member objects as in

    typedef struct box {
        const char *name;
        const int count;
        box();
    } BOX;

    box::box () : name(""), count(5){}

as you can’t do

    box::box () {
        name = "";
        count = 5;
    }

because that would mean that the constructors for the member object are called for count and then altered in the box constructor and you can’t alter a const! But there aren’t the same concepts in C, hmmm.

I also realised that gcc’s -Wall doesn’t mean what I thought it did – warn about everything. Infact the recommended compiler options for finding problems are (see http://www.network-theory.co.uk/docs/gccintro/index.html)

gcc -ansi -pedantic -Wall -W -Wconversion -Wshadow -Wcast-qual -Wwrite-strings

using -std=c99 if using c99 features. I wonder why c99 isn’t the default yet?

December 19, 2006

Reading in Files

Filed under: c, lisp — rcjp @ 5:52 pm

I was catching up on Dr.Dobb’s columns and came across Walter Browns clean way of reading lines from files in Andrew Koenig’s c++ blog.

    #include <iostream>    // for cout 
    #include <fstream>     // for ifstream 
    #include <string>      // for string 

    int main()
    {
        std::ifstream file("t2.cpp");

        for(std::string s; getline(file, s);)
        {
            std::cout << s << '\n';
        }
    }


which is a nice alternative to the usual while loop as the string is scoped in the for loop that uses it.

Of course you can do something similar in C, provided you use -std=c99 compiler flag to gcc to allow declarations in for loops…

    #include <stdio.h>
    #define LINE_MAX 256

    int main()
    {
        FILE *fp=fopen("t1.c","r");

        for (char line[LINE_MAX]; fgets(line, LINE_MAX, fp);)
        {
            printf("line = %s\n", line);
        }
    }


Production code should always check the results of fopen, and fgets may return a NULL because of a problem reading the file – not because it reached the end, so you should check with feof. But in ‘quick and dirty’ mode I skip these and C will segmentation fault trying to read from a missing file. C++ will just quietly carry on, neither of which is ideal.

Common Lisp will throw you into the debugger when

    (defun test ()
      (with-open-file (stream "badfilename" :direction :input)
        (loop for line = (read-line stream nil nil)
           while line do
             (format t "line = ~A~%" line))))


fails to find the file. (I’ve never really liked those nil’s that follow read-line though, the first says the end of file isn’t an error and the second is what you want to signify the end of input. I think I’d prefer it if they were keyword arguments – more descriptive.)

If you are using a lisp compiler that supports restarts (unfortunately CMUCL/SBCL don’t as far as I can tell, but clisp and Lispworks do) you can quickly patch the running code if you don’t feel like going back to the source just yet…

    r@laptop:~/lisp/tmp$ clisp -q
    [1]> (load "test.lisp")
    ;; Loading file test.lisp ...
    ;; Loaded file test.lisp
    T
    [2]> (test)

    *** - OPEN: file #P"/home/r/tmp/badfilename" does not exist
    The following restarts are available:
    ABORT          :R1      ABORT
    Break 1 [3]>


first find out where you are with :w

    Break 1 [3]> :w
    <1> #<SYSTEM-FUNCTION OPEN>
    EVAL frame for form (OPEN "badfilename" :DIRECTION :INPUT)
    Break 1 [3]>


now you can see the actual open command the with-open-file macro got expanded into, and use :rt to return the correct filename by typing in the correct value of OPEN at the “Values:” prompt and carry on execution

    Break 1 [3]> :rt
    Values: (OPEN "correctfilename" :DIRECTION :INPUT)


There is an interesting document showing some more advanced restarting of code here using acl but is worth reading even if you don’t have that implementation.

October 3, 2006

Word Frequency Comparison

Filed under: c, lisp, python — rcjp @ 2:49 pm

Quite a common programming exercise is to write a word frequency counter. For a bit of fun I thought I’d code it in a few languages for comparison. There are essentially two parts to the problem: splitting up text into real words and keeping a tally of the occurances. Before we look at the python here is an interesting piece of shell script from the Classic Shell Scripting Oreilly book:

#! /bin/sh
# Read a text stream on standard input, and output a list of
# the n (default: 25) most frequently occurring words and
# their frequency counts, in order of descending counts, on
# standard output.
#
# Usage:
#       wf [n]
tr -cs A-Za-z\' '\n' |        # Replace nonletters with newlines
  tr A-Z a-z |                # Map uppercase to lowercase
    sort |                    # Sort the words in ascending order
      uniq -c |               # Eliminate duplicates, showing their counts
        sort -k1,1nr -k2 |    # Sort by descending count, and then by ascending word
          sed ${1:-25}q       # Print only the first n (default: 25) lines;


You can then use e.g. head -n10 to get the top ten words, or wf 999999 < texfile | wc -l to find the number of unique words etc. It is impressive that it can be done, but I think I’d forget what those args to sort did pretty quick.

Python

There was an entry in the python shootout (which I can’t seem to find now online?)

import sys

def main():
    i_r = map(chr, range(256))

    trans = [' '] * 256
    o_a, o_z = ord('a'), (ord('z')+1)
    trans[ord('A'):ord('Z')+1] = i_r[o_a:o_z]
    trans[o_a:o_z] = i_r[o_a:o_z]
    trans = ''.join(trans)

    count = {}
    for line in sys.stdin:
        for word in line.translate(trans).split():
            try:
                count[word] += 1
            except KeyError:
                count[word] = 1

    l = sorted(zip(count.itervalues(), count.iterkeys()), reverse=True)

    print '\n'.join(["%7s %s" % (count, word) for count, word in l])

main()


and a sample run would be…

    r@laptop:~$ python py/work/wordfreq.py < testwords
   5621 the
   2739 a
   2585 of
   2242 and
   2155 it
   2086 to
   1938 he
   1750 said
   1495 you
   1387 was
   1221 in
   1132 that


But I don’t think I would have come up with that translate solution. My, probably slower, effort was:

import re

def wordfreq_fromfile(filename):
    wordfreq(file(filename).read())

def wordfreq(str):
    freq={}
    words = re.compile(r"[\w'-]+")
    for word in words.finditer(str):
        w = word.group(0).lower()
        freq[w] = freq.get(w,0) + 1
    # make a list of most common words
    common = sorted(freq, key=freq.get, reverse=True)
    print '\n'.join("%7s %s" % (freq[word], word) for word in common)


and I think it is more obvious what that code is doing.

C++

The standard library has a map container handy to keep the word/occurance, but you can’t sort that so we switch to a vector at the end

#include <iostream>
#include <fstream>
#include <map>
#include <string>
#include <vector>
#include <algorithm>
#include <iterator>
#include <iomanip>

typedef std::map<std::string,int>  Dict;
typedef std::pair<std::string,int> DictItem;
typedef std::vector<DictItem>      VDict;

bool wordfreq_greater(const DictItem& a, const DictItem& b)
{
    return a.second > b.second;
}

// cout will only look for << operators in std namespace
// (could've just defined format in print_words rather than <<)

namespace std{
    std::ostream& operator << (std::ostream& out, const DictItem& d)
    {
        return out << std::setw(7) << d.second << " : " << d.first;
    }
}

template <typename T>
void print_words(T iter, std::string title, int maxcount)
{
    std::cout << title << std::endl
              << std::string(title.size(), '-') << std::endl;
    for(int i = 0; i < maxcount; ++iter, ++i)
        std::cout << *iter << std::endl;
    std::cout << std::endl;
}

int main(int argc, char *argv[])
{
    std::ifstream essay(argv[1]);
    if (!essay) {
        std::cerr << "please supply a filename for input" << std::endl;
        exit(1);
    }

    Dict dict;
    std::string word;
    while (essay >> word)    // count up frequency of words
        ++dict[word];

    // print the first 10 words, map sorts alphabetically
    // by default.. map< std::string,int,std::less<std::string> > dict;

    print_words<Dict::const_iterator> (dict.begin(), "Alphabetically", 10);

    // copy the dict to a vector so we can sort it

    VDict vdict(dict.begin(), dict.end());
    sort(vdict.begin(), vdict.end(), wordfreq_greater);

    print_words<VDict::const_iterator>(vdict.begin(), "Frequency", 10);
}


when run with a sample text file prints something like:

Alphabetically
--------------
      3 : !
    230 : "
      1 : ""allo,
      1 : "'E's
      1 : "'Ha'
      1 : "'Yet
      2 : "...
      2 : "/
      2 : "7
     38 : "A

Frequency
---------
   4836 : the
   2529 : a
   2509 : of
   1997 : to
   1843 : and
   1342 : was
   1304 : said
   1156 : he
   1093 : in
    917 : it


so it could probably do with some work on getting punctuation right.

Common Lisp

Common Lisp doesn’t have regular expressions included, but its usually not too much trouble to just write a quick bit of parsing code

    (defvar *freq* (make-hash-table))

    (defun word-letter-p (c)
      (and (characterp c) (alpha-char-p c)))

    (defun gobble-punctuation (stream)
      (loop for c = (peek-char t stream nil)
         until (word-letter-p c)
         while c do (read-char stream)))

    (defun getword (stream)
      (gobble-punctuation stream)
      (loop for c = (read-char-no-hang stream nil nil)
         while (word-letter-p c)
         collect c into letters
         finally (return (format nil "~{~C~}" letters))))

    (defun wordfreq (stream)
      (clrhash *freq*)
      (loop for word simple-string = (getword stream)
           while (string-not-equal word "")
           do (incf (the fixnum (gethash (intern (string-downcase word)) *freq* 0))))
      (let ((freqlist nil))
        (maphash #'(lambda (k v) (push (cons v k) freqlist)) *freq*)
        (loop for (k . v) in (sort freqlist  #'> :key #'car)
           repeat 15 ;; just dump out the top 15 words
           do (format t "~4A, ~A~%" k v))))

    (defun wordfreq-string (str)
      (with-input-from-string (s str)
        (wordfreq s)))

    (defun wordfreq-file (filename)
      (with-open-file (s filename :direction :input)
        (wordfreq s)))

    (wordfreq-file "/home/r/py/finished/testwords")


of course there are regexp libraries for Common Lisp like Edi Weitz’s cl-ppcre:

    (eval-when (:compile-toplevel :load-toplevel :execute)
      (require :cl-ppcre))

    (defvar *freq* (make-hash-table))

    (defun wordfreq (stream)
      (clrhash *freq*)
      (loop for line simple-string = (read-line stream nil nil)
         while line
         do (cl-ppcre:do-matches-as-strings (word "[\\w'-]+" line)
              (incf (the fixnum (gethash (intern (string-downcase word)) *freq* 0)))))
      (let ((freqlist nil))
        (maphash #'(lambda (k v) (push (cons v k) freqlist)) *freq*)
        (loop for (k . v) (fixnum . symbol) in (sort freqlist  #'> :key #'car)
           repeat 15 ;; just dump out the top 15 words
           do (format t "~4A, ~A~%" k v))))

    (defun wordfreq-string (str)
      (with-input-from-string (s str)
        (wordfreq s)))

    (defun wordfreq-file (filename)
      (with-open-file (s filename :direction :input)
        (wordfreq s)))

    (wordfreq-file "/home/r/py/finished/testwords")


None of these programs are really finished… it’d be very fiddly to get them to deal correctly will all kinds of punctuations marks etc, also I haven’t bothered to profile any of them – computers are so quick these days that speed is almost irrelevent for this kind of stuff.

July 23, 2006

Creating Functions

Filed under: c, lisp — rcjp @ 2:34 pm

In the introduction of Paul Graham’s ANSI Common Lisp he writes that you can’t do the following in C


(defun addn (n)
  #'(lambda (x) (+ x n)))

(setf add-ten (addn 10))
(setf add-twenty (addn 20))

(format t "value ~d~%" (funcall add-ten 4))
(format t "value ~d~%" (funcall add-twenty 4))

(This will print 14 and 24). Although you can have function pointers in C, there isn’t really a convenient place to store the captured n variable. You can use a global variable, but that will get overridden by another function


#include <stdio.h>

int nglobal = 0;
int addn(int n) { return (n+nglobal); }

int main()
{
    int (* myfunc)(int);

    nglobal = 10;
    myfunc = addn;

    printf("value %d\n", (* myfunc)(4));
    return 0;
}

you can get further with C++ though by using a class instance variable


#include <iostream>

class addn
{
  int nInstance;
public:
  addn(int n) : nInstance(n) {}
  int operator()(int i) {return nInstance + i;}
};

int main()
{
  addn add_ten(10);
  addn add_twenty(20);

  std::cout << "value " << add_ten(4) << std::endl;
  std::cout << "value " << add_twenty(4) << std::endl;
  return 0;
}

but it is clumsy compared to the Common Lisp. The C++ function objects in the standard library aren’t particularly pretty either, compare:

#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v;
    v.push_back(5);
    v.push_back(4);
    v.push_back(11);
    v.push_back(14);

    // count numbers less than 10

    std::cout << count_if(v.begin(),  v.end(),
                 bind2nd(std::less<int>(),  10)) << std::endl;
}


in python

In [9]: sum(itertools.imap(lambda x: x<10, [5,4,11,14]))
Out[9]: 2


and in Common Lisp this is just

(count-if (lambda (x) (< x 10)) '(5 4 11 14))

June 2, 2006

Using a shared lib from cmucl

Filed under: c, lisp, unix — rcjp @ 1:24 pm

library code mylib2.c

#include <stdio.h>

int mytest()
{
    printf("inside c lib !!!\n");
    return 2;
}

test.lisp

(defun fact (n)
  (if (= n 1)
      1
      (* n (fact (- n 1)))))

(ext:load-foreign "/usr/local/lib/libmylib2.so.1")
(alien:alien-funcall (alien:extern-alien "mytest" (function c-call:void) ))

test2.c

#include <stdio.h>

int main()
{
    printf("inside main program\n");
    mytest();
    printf("back inside main program\n");
}

build using

#!/bin/sh
cc -fPIC -c mylib2.c
gcc -shared -Wl,-soname,libmylib2.so.1 -o libmylib2.so.1   mylib2.o
# 
# now use the lib
#
# need to copy libmylib2.so.1 to /usr/local/lib or somewhere
# then as root do 'ldconfig' 
cc test2.c -L. -lmylib2.so.1 -o test2

Using an archive library

Filed under: c, unix — rcjp @ 1:19 pm

main program that calls into the archive test1.c

#include <stdio.h>

int main()
{
    printf("inside main program\n");
    mytest();
    printf("back inside main program\n");
}

the code in the archive mylib1.c

#include <stdio.h>

int mytest()
{
    printf("inside c lib !!!\n");
    return 2;
}

build with the script

#!/bin/sh
cc -c mylib1.c
ar -cvq mylib1.a mylib1.o
#
# create a symbol table to link with
# 
ranlib mylib1.a
# 
# now use the lib
#
cc test1.c mylib1.a -o test1

July 3, 2005

ACCU Review

Filed under: c — rcjp @ 3:07 pm

Maximizing .NET Performance by Nick Wienholt

(1-59059-141-0) Apress, 280pp @£32.00
(appeared in CVu Aug 2005 Vol 17 No 4)

The experienced C/C++ coder usually has a reasonable idea of the cost of using certain instructions and can visualise the generated assembler. However, with a .NET language the translation of instructions into Microsoft Intermediate Language (MSIL) calling into the Common Language Runtime (CLR) adds another layer between your code and the machine. Add in garbage collection and performance penalties are not obvious without investigation. This analysis is the subject of the book.

Assuming no previous experience of performance testing, it begins with basic definitions of white box/black box testing, profiling etc. itemising the factors to consider when benchmarking code. Most of this is common sense provided you have a healthy sceptical nature for performance figures. Described in the appendix, and available on the publisher’s website, is the benchmarking testbed used throughout the book.

If you find yourself wondering how to balance execution speed against convenient code use, then this book will prove invaluable in helping you make an informed decision avoiding disappointing performance. If your completed programs are too slow, it is usually too late to refactor code, changing deep internal structures: hence, there is only one short chapter at the end of the book devoted to solving existing performance problems.

It is not only the choice of data structure that need investigating; the .NET framework provides alternative solutions for many problems e.g. when considering thread safety, there are several methods for synchronisation to choose from. There are chapters devoted to studying the comparative merits of strings manipulations, collections, garbage collection, exceptions, IO and remoting amongst others. Throughout the book are references to numbered tests backing up the authors conclusions. Optimising is a notoriously controversial topic with more opinions than facts and I find it extremely refreshing to see frequent referenced tests. It also encourages you to investigate other performance characteristics of the .NET Framework using the provided testbed.

Most of the example code is in C# or VB but C++ is included with sections to cover the effects of converting unmanaged to managed code. There is also an interesting comparison of how all three languages take advantage of the CLR. The book is well written, perhaps occasionally a little verbose, but it is analytical without being dry, clearly typeset, includes plenty of valuable information and can be read cover-to-cover or used as a reference. In short, you should read this book before writing performance dependent .NET code. Recommended.

April 20, 2005

ACCU Book Review

Filed under: c — rcjp @ 3:11 pm

Microsoft Visual C# .Net 2003 Unleashed

by Lonny Kruger & Kevin Hoffman (0-6723-2676-0) SAMS @ £43.99
(appeared in CVu June 2005 Vol 17 No 3)

This book attempts to cover the vast range of the .NET framework using C# as well as the C# language itself. It has nine sections totalling 916 pages plus a comprehensive index and starts with a walk around Visual Studio. This is very difficult to do in words and it may have been better to point to one of the downloadable guided tours on the msdn website.

The syntax of C# described in the next 300 pages takes you from basic control structures through to COM and code reflection. The introduction to C# chapter is bizarre – the third program shows a SQL connection class implementing IDisposable and we do not see ‘hello world’ until the end of the chapter! Using undefined terms such as ‘metadata’ will only confuse a beginner and there is no glossary of terms. Yet the beginner is clearly the target audience for the book: chapter 3 has a two-page program to demonstrate the use of logical AND and OR with a score of Console.Write statements – surely the most confusing way to demonstrate such a simple concept.

The remaining parts cover Windows Forms, Web Applications, Data Access, Web Services, Secure Applications, Enterprise and Connected Applications and Debugging and Testing. It is odd, given the scope of the book, that it omits any instructions on setting up IIS needed run many of the examples and instructs the reader to look elsewhere.

For some reason the chapters start on the left hand page with a useless ‘What you need’ section containing entries such as ‘Recommended Software – Visual Studio.NET’ and ‘Skills Required – C# coding’. Most of the code layout is poor with messy indentation, multiple blank lines and regions of highlighted code that make reading the programs painful. The authors have often not even been bothered enough to remove the auto generated comments for class descriptions etc.

It seems in a desperate scramble to reach the thousand-page count for the book, the authors wasted space wherever possible in the early chapters. For instance, in the chapter on Windows Forms the same picture of their demo window appears no less than nine times.

The later chapters are considerably better and provide a useful introduction for the more advanced features of .NET that often require a separate book to cover in detail.

However, taking the book as a whole, it fails as a beginner’s guide with often trivial and uninspired examples and no glossary. That, along with too many worthless descriptions such as ‘The Call Stack window displays the functions on the call stack’ and sloppy formatting of code make the book feel carelessly put together. Not recommended.

November 8, 2004

Shellcode

Filed under: c — rcjp @ 3:22 pm

Some notes from the Shellcoders handbook…


                   +-------------------+
  High Address --> | env pointer argc? |
                   +-------------------+
                   |      stack        |
                   |        |          |
                   |        |          |
                   |        v          |
                   +-------------------+
                   |        ^          |
                   |        |          |
                   |        |          |
                   |       heap        |
                   +-------------------+
                   |       .bss        |
                   +-------------------+
                   |      .text        |
                   +-------------------+
                   |    shared libs    |
                   +-------------------+

o Stack Overflows
  EBP - frame pointer
  to investigate the stack do
  gcc -mpreferred-stack-boundary=2 -ggdb tt.c -o tt

  Before calling a function
  -------------------------
  o save EBP on the stack
  o store ESP in EBP (EBP can now be used to reference the args inside
                      this function even though the stack pointer ESP
                      changes when you push/pop etc.)
  o maybe sub ESP for some space for local variables -----------------+
  o maybe save some regs                                              |
  o sub ESP by required stack amount for returned values -----------+ |
  o push args for function                                          | |
  o call function (which implicitly first pushes the return address | |
                   on the stack)                                    | |
                                                                    | |
              +---------------------+                               | |
  EBP ------> |   old value of EBP  |                               | |
              +---------------------+                               | |
              |     locals...       |   ^                           | |
              +---------------------+   |-----------------------------+
              |     locals...       |   v                           |
              +---------------------+                               |
              | saved regs EBX etc. |                               |
              +---------------------+                               |
              |  returned value 1   |    ^                          |
              +---------------------+    |                          |
              |  returned value 2   |    |--------------------------+
              +---------------------+    v
              |    EIP of address   |
              |    just after CALL  |
              +---------------------+

 o space on the stack e.g. for local variables is usually rounded up
   to 4 byte blocks

Inside the called function
--------------------------
 Assuming this called function doesn't call any other functions
 (otherwise it will look very like the situation above)

  o push EBP on the stack
  o store ESP in EBP (EBP can now be used to reference the args inside
                      this function even though the stack pointer ESP
                      changes when you push/pop etc.)
  o maybe sub ESP for some space for local variables -----------------+
  o do the work of the function -                                     |
    note the function can refer to the arguments passed to it by:     |
    and to save the return values so the calling function can use them|
    it does:                                                          |
                                                                      |
              .                     .                                 |
              .                     .                                 |
              +---------------------+                                 |
              |   function arg 2... |                                 |
              +---------------------+                                 |
              |    EIP of address   |                                 |
              |    after CALL       |                                 |

              +---------------------+                                 |
  EBP ------> |   pushed  EBP       |                                 |
              +---------------------+                                 |
              |   function locals...|   ^                             |
              +---------------------+   |-----------------------------+
  ESP ------> |   function locals...|   v
              +---------------------+                                

    After that it just needs to clean up the stack before returning
  o point ESP to where push's (i.e. jump over function locals)
  o pop EBP etc.
  o ret from function (which implicitly first pops the EIP return address
                       off the stack)                                    

If you didn't allocate enough memory for the function locals the
memory will be overwritten upwards (increasing memory addresses)
overwriting 'pushed EBP' and 'EIP of address after CALL' etc.

ACCU Book Review

Filed under: c — rcjp @ 3:16 pm

The Shellcoder’s Handbook: Discovering and Exploiting Security Holes

by Jack Koziol, Dave Aitel, David Litchfield, Chris Anley, Sinan “noir” Eren, Neel Mehta and Riley Hassell ISBN (0-7645-4468-3)

appeared in CVu December 2004 Vol 16 No 6

It is forgivable, looking at the main title, to think that this book is a reference for writing bash or korn shell scripts, but in fact ‘shellcode’ is the name given to the piece of code that is run after gaining control of a vulnerable program. Shellcode is so named because often the injected codes are instructions that will launch a root shell under unix.

If you’ve ever wondered about the story behind the security holes announced seemingly daily this book will show you why they occur, how the exploits work and the methods that led them to be discovered in the first place.

The book is split into four parts: the first hundred pages covers an introduction to exploitation on Linux x86 systems, the second hundred looks at Windows and another hundred covering Solaris and HP Tru64 systems. The third part looks at how to discover vulnerabilities with some useful tools and a final more advanced section looks at alternative shellcodes, database and kernel hacking.

There are a number of typos in the text and no errata page has yet appeared on the publisher’s website, indeed the links to resources mentioned throughout the book have yet to appear either, although the example code is there for download. The text is well written and structured with a conclusion at the end of each chapter.

Much of the book is assembler, often embedded in C code, or occasionally python scripts and although there is a brief review you should already be fairly comfortable reading assembler, or be prepared to learn quickly, to enjoy this book.

Many of the ideas are fairly simple – overfilling buffers that are processing user input, but the low level nature, restricted memory spaces and unknown elements, such as where the code will be executing in memory, often create layers of dependent problems magnifying the complexity. It can take considerable skill and ingenuity to turn a vulnerability into an exploit, not to mention a certain amount of luck, unsurprisingly its often thought of as a black art.

This book then is essentially a compendium of the techniques and resources used by several clearly experienced hackers; the aim being to teach a creative approach rather than list known exploits. What comes across in the tone of the book is the authors’ desire for the reader to succeed and enjoy the challenge as much as they obviously do. There is quite a bit of hand holding and encouragement early on to get past the point where most people give up but it is also a rich source of information with index and deserves the title ‘handbook’.

For programmers who have no interest in creating their own exploits, is there anything in this book? Well yes, the section on vulnerability discovery contains interesting information about the authors’ favourite tools, there’s a chapter on fuzzing (generating automated test input to discover bugs in your program) and source code auditing showing many common faults in C code. But the direction of the book is very clear – to subvert a target system.
Writing shellscripts is surprisingly good fun and the book will appeal to those who enjoy tricky programming puzzles and those who want an advanced but accessible low level security perspective on the programs they write and the operating systems they use. Highly recommended.

Older Posts »

Blog at WordPress.com.