Rcjp’s Weblog

March 24, 2009

Redirecting stdout within ipython

Filed under: python — rcjp @ 12:09 am

A friend was asking if I knew a way of grabbing the output from a python function within ipython. I couldn’t see an easy way (the normal shell way of ‘>’ redirection won’t work). So the following is a quick hack on IPython’s OutputTrap:

A sample session: it just defines a function that prints something, and I’m also using the ipython trick of preceding the command with a comma so the args are passed as strings…

In [1]: def myfunc():
   ...:     print 'hello'
   ...:     
   ...:     

In [2]: run grab

In [3]: ,grab dumpfile myfunc()

and grab.py is…

import sys
from IPython import OutputTrap


def grab(fname, cmd):
    dump = OutputTrap.OutputTrap('dump','','',trap_out=1,quiet_out=1)
    dump.trap_all()
    try:
        eval(cmd, sys._getframe(1).f_globals, sys._getframe(1).f_locals)
    except:
        print sys.exc_info()

    dump.release_all()
    file(fname,'w').writelines( dump.summary() )
    dump.flush_all()

… and the output of myfunc() appears in the file ‘dumpfile’ so it seems to work, but note I haven’t done much testing.
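For comparison, here is a minimal sketch of the same idea outside ipython, assuming Python 3 and only the standard library. The helper name grab is reused from above purely for illustration, and it takes the function object rather than a command string:

```python
import contextlib
import io


def grab(fname, func, *args, **kwargs):
    """Call func and write whatever it prints on stdout into fname."""
    buffer = io.StringIO()
    # redirect_stdout swaps sys.stdout for the duration of the with block
    with contextlib.redirect_stdout(buffer):
        func(*args, **kwargs)
    with open(fname, "w") as out:
        out.write(buffer.getvalue())
    return buffer.getvalue()


def myfunc():
    print("hello")
```

Calling grab('dumpfile', myfunc) should leave the printed text in ‘dumpfile’. Like the hack above, this only catches output that goes through sys.stdout.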

April 2, 2008

Gaussian PIL Image Filter

Filed under: python — rcjp @ 6:54 pm

I could not see a gaussian filter in the python imaging library, but it’s simple enough to write one…

from PIL import Image, ImageFilter
from numpy import *

def gaussian_grid(size = 5):
    """
    Create a square grid of integers of gaussian shape
    e.g. gaussian_grid() returns
    array([[ 1,  4,  7,  4,  1],
           [ 4, 20, 33, 20,  4],
           [ 7, 33, 55, 33,  7],
           [ 4, 20, 33, 20,  4],
           [ 1,  4,  7,  4,  1]])
    """
    m = size/2
    n = m+1  # remember python is 'upto' n in the range below
    x, y = mgrid[-m:n,-m:n]
    # multiply by a factor to get 1 in the corner of the grid
    # ie for a 5x5 grid   fac*exp(-0.5*(2**2 + 2**2)) = 1
    fac = exp(m**2)
    g = fac*exp(-0.5*(x**2 + y**2))
    return g.round().astype(int)

class GAUSSIAN(ImageFilter.BuiltinFilter):
    name = "Gaussian"
    gg = gaussian_grid().flatten().tolist()
    filterargs = (5,5), sum(gg), 0, tuple(gg)


im = Image.open('/home/rcjp/tmp/test.png')
im1 = im.filter(GAUSSIAN)
im1.save('/home/rcjp/tmp/testfiltered.png')
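To check the numbers in the docstring without numpy or PIL, here is a small sketch of the same grid calculation in plain Python 3. It folds the scale factor into the exponent, which is my rearrangement rather than the code above:

```python
import math


def gaussian_grid(size=5):
    """Square grid of integers with a gaussian profile and 1 in each corner."""
    m = size // 2
    # exp(m*m) * exp(-0.5*(x*x + y*y)) written as a single exponential
    return [[round(math.exp(m * m - 0.5 * (x * x + y * y)))
             for x in range(-m, m + 1)]
            for y in range(-m, m + 1)]


grid = gaussian_grid()
weights = [w for row in grid for w in row]
# the filter divides by this total so that overall brightness is preserved
total = sum(weights)
```

For the default 5x5 grid the weights add up to 331, which is the scale value the GAUSSIAN class passes as sum(gg).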

February 25, 2008

K language – ultimate coding brevity?

Filed under: python — rcjp @ 5:56 pm

After reading a thread on c.l.l and, over the last week, more than I care to on arc’s emphasis on brevity, I can’t decide if I’m impressed or appalled by the K language.

As K seems to be a proprietary language I wouldn’t normally look at it, but thankfully the Wikipedia entry has a link to a screencast by Michael Schidlowsky solving the birthday problem: how many people do you need in a room to, more often than not, have two or more with the same birthday? In K, to simulate 1000 rooms with 10 people in them, it’s

#+/{~(#x)=#?x}' {(1000,x) _draw 365} 10

using ipython I’d do:

In [1]: from random import randint
In [2]: def samebday(n): return n != len(set(randint(1,365) for i in range(n)))
In [3]: def bday1000(n): return [samebday(n) for i in range(1000)].count(True)
In [4]: bday1000(23)
Out[4]: 496
In [5]: %timeit bday1000(23)
10 loops, best of 3: 367 ms per loop


Casting the list of random ‘birthdays’ (just integers from 1..365) to a python set removes any duplicates, so checking whether the length has changed shows whether two or more were the same. The tipping point is 23 according to Wikipedia, and the bday1000 function simulating 1000 rooms shows 496 of those rooms had common birthdays, which is roughly half of them. Not as short as K, but concise enough for me I think.
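The simulation can be cross-checked against the exact probability. This is a short sketch in Python 3 under the same assumptions as the simulation (365 equally likely birthdays, no leap day); the function name is my own:

```python
def p_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_all_distinct = 1.0
    for i in range(n):
        # person i must avoid the i birthdays already taken
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct
```

p_shared_birthday(23) comes out just above one half and p_shared_birthday(22) just below it, which agrees with 23 being the tipping point.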

October 17, 2007

Show Nearest X11 System Colours

Filed under: python, utils — rcjp @ 1:16 pm

[Image: ColourInfo]
Just a little utility to show the nearest system rgb colours (taken from rgb.txt) on an X11 system compared to the one under the cursor.

I couldn’t work out how to listen to mouse messages when I don’t own the window under the mouse pointer (on X Windows anyway), so I cheated: this code takes a snapshot of the screen and displays it in the background. Obviously you can’t move/click on any windows when this thing is running! Quit with ‘q’ or ‘ESC’ or just close the window.

#!/usr/bin/env python
#
# Takes a screenshot image of the root window and displays a table of
# nearest system colours compared to that under the mouse pointer
# can specify an argument 1..29 to show more colours (default 6)
# The window title shows the exact rgb values of pointer in hex
#
import gtk, re
from sys import argv

RGBCOLOURS = '/etc/X11/rgb.txt'   # rgb colour data

class ScreenColour(object):

    syscolours = {}  # hold system rgb.txt relating colours to names

    def __init__(self, rgbfile=RGBCOLOURS):
        # 1 pixel buffer for pixel under mouse pointer
        self.pix = gtk.gdk.Pixbuf(gtk.gdk.COLORSPACE_RGB,False, 8, 1, 1)
        for line in file(rgbfile).readlines():
            if not line.startswith('!'):
                rgbname = re.compile(r'\s*(\d+)\s*(\d+)\s*(\d+)\s*(.+)')
                (r,g,b,name) = rgbname.match(line).groups()
                self.syscolours[name.rstrip()] = (int(r),int(g),int(b))

    def cmp_screencolour(self,col,basecol):
        """Numerical difference between colours col(name) and basecol(r,g,b)"""
        return sum(abs(a-b) for a, b in zip(basecol, ScreenColour.syscolours[col]))

    def pixelinfo(self):
        """Returns the (r,g,b) value of colour under mouse pointer"""
        (_, x, y, _) = gtk.gdk.display_get_default().get_pointer()
        self.pix.get_from_drawable(gtk.gdk.screen_get_default().get_root_window(),
                                   gtk.gdk.colormap_get_system(),
                                   x,y, 0,0, 1,1)
        col = self.pix.get_pixels_array()
        return (int(col[0,0,0]), int(col[0,0,1]), int(col[0,0,2]))

    def nearest_colours(self, n, basecol):
        """Return the nearest n system colours compared to basecol"""
        nearest = self.syscolours.keys()
        nearest.sort(key=lambda (c): self.cmp_screencolour(c, basecol))
        return nearest[:n]

class ColourInfoWindow(ScreenColour):

    """Draw a table of colours nearest to that under mouse pointer"""
    label = []
    eb = []
    oldrgb = (-1,-1,-1)

    def delete_event(self, widget, event, data=None):
        gtk.main_quit()
        return False

    def __init__(self, tablesize):
        self.tablesize = tablesize
        self.image = gtk.Window()
        self.image.set_decorated(False)
        self.image.add_events(gtk.gdk.POINTER_MOTION_MASK)
        self.image.connect("motion_notify_event", self.event_handler)
        #
        # set background from a screen shot of the root window
        #
        screen = gtk.gdk.Pixbuf(gtk.gdk.COLORSPACE_RGB, False, 8,
                                gtk.gdk.screen_width(), gtk.gdk.screen_height())
        screen.get_from_drawable(gtk.gdk.get_default_root_window(),
                                 gtk.gdk.colormap_get_system(), 0, 0, 0, 0,
                                 gtk.gdk.screen_width(), gtk.gdk.screen_height())
        pixmap, mask = screen.render_pixmap_and_mask()
        self.image.set_app_paintable(True)
        self.image.realize()
        self.image.window.set_back_pixmap(pixmap, False)
        del pixmap
        self.image.fullscreen()
        self.image.show()

        self.window = gtk.Window(gtk.WINDOW_TOPLEVEL)
        self.window.set_size_request(200, 40*self.tablesize)
        self.window.set_resizable(False)
        self.window.set_transient_for(self.image)  # stay above screen image
        self.window.set_title("Colour Info")
        self.window.connect("delete_event", self.delete_event)
        self.window.add_events(gtk.gdk.KEY_PRESS_MASK)
        self.window.connect("key-press-event", self.event_keys)

        self.Colour = ScreenColour()
        vbox = gtk.VBox(True,2)
        for i in range(self.tablesize):
            self.label.append(gtk.Label("Unknown"))
            self.eb.append(gtk.EventBox())
            self.eb[i].add(self.label[i])
            vbox.add(self.eb[i])
        self.window.add(vbox)
        self.window.show_all()
        self.update_colours()

    def update_colours(self):
        rgb = self.Colour.pixelinfo()
        if self.oldrgb != rgb:
            self.oldrgb = rgb
            nearest = self.Colour.nearest_colours(self.tablesize, rgb)
            self.window.set_title("%02X %02X %02X" % rgb)
            for i in range(self.tablesize):
                try:
                    self.eb[i].modify_bg(gtk.STATE_NORMAL,
                                         gtk.gdk.color_parse(nearest[i]))
                    #
                    # if brighter than 128,128,128 switch to
                    # black text so it stays visible
                    #
                    if (sum(self.syscolours[nearest[i]]) > 384):
                        w = '<span foreground="black">%s</span>' % nearest[i]
                    else:
                        w = '<span foreground="white">%s</span>' % nearest[i]
                    self.label[i].set_markup(w)
                except ValueError:
                    print 'unknown colour... ', nearest[i]

    def event_handler(self, widget, event=None):
        if event and event.type == gtk.gdk.MOTION_NOTIFY:
            self.update_colours()

    def event_keys(self, widget, event=None):
        if event:
            if event.keyval == gtk.gdk.keyval_from_name("Escape") \
               or event.keyval == gtk.gdk.keyval_from_name("q"):
                  self.window.destroy()
                  self.image.destroy()
                  gtk.main_quit()

if __name__ == "__main__":
    if len(argv) == 2 and (0<int(argv[1])<30):
        ColourInfoWindow(int(argv[1]))
    else:
        ColourInfoWindow(6)
    gtk.main()
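The colour matching itself doesn’t need gtk. Here is a minimal Python 3 sketch of the same sum-of-absolute-differences ranking used by cmp_screencolour and nearest_colours, with a small hand-picked palette standing in for rgb.txt (the palette and function names are just for illustration):

```python
# Hand-picked illustrative palette; the real utility reads rgb.txt instead.
PALETTE = {
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "red": (255, 0, 0),
    "green": (0, 255, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
}


def colour_distance(a, b):
    """Sum of the absolute differences of the r, g and b components."""
    return sum(abs(p - q) for p, q in zip(a, b))


def nearest_colours(n, basecol, palette=PALETTE):
    """Names of the n palette colours closest to basecol."""
    ranked = sorted(palette,
                    key=lambda name: colour_distance(palette[name], basecol))
    return ranked[:n]
```

This is a crude distance that treats the three channels equally; it is what the utility above does, not a perceptual colour difference.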

October 10, 2007

Number reversing game

Filed under: python — rcjp @ 3:10 pm

Here is the reverse game: the object is to repeatedly reverse the first few numbers of a random sequence of 1..9 until the whole sequence is in order.

import random

numbers = random.sample(range(1,10), 9)
steps = 0

while numbers != sorted(numbers):
    print " ".join(map(str, numbers))
    nflip = int(raw_input("Flip how many? "))
    numbers[:nflip] = reversed(numbers[:nflip])
    steps += 1

print "Finished in %d steps." % steps

""" sample run...

In [6]: run finished/game.py
2 3 4 5 9 7 6 1 8
Flip how many? 4
5 4 3 2 9 7 6 1 8
Flip how many? 6
7 9 2 3 4 5 6 1 8
Flip how many? 8
1 6 5 4 3 2 9 7 8
Flip how many? 5
3 4 5 6 1 2 9 7 8
Flip how many? 7
9 2 1 6 5 4 3 7 8
Flip how many? 9
8 7 3 4 5 6 1 2 9
Flip how many? 8
2 1 6 5 4 3 7 8 9
Flip how many? 3
6 1 2 5 4 3 7 8 9
Flip how many? 6
3 4 5 2 1 6 7 8 9
Flip how many? 3
5 4 3 2 1 6 7 8 9
Flip how many? 5
Finished in 11 steps.
"""

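The game can always be finished. Below is a sketch of one simple (not minimal) strategy in Python 3: bring the largest unsorted number to the front, then flip it down into place. The function names flip and solve are my own and are not part of the game script:

```python
def flip(numbers, nflip):
    """Return a copy of numbers with the first nflip entries reversed."""
    return numbers[:nflip][::-1] + numbers[nflip:]


def solve(numbers):
    """Return a list of flip sizes that sorts numbers."""
    numbers = list(numbers)
    moves = []
    for size in range(len(numbers), 1, -1):
        biggest = numbers.index(max(numbers[:size]))
        if biggest == size - 1:
            continue                      # already in its final place
        if biggest != 0:
            moves.append(biggest + 1)     # bring the biggest to the front
            numbers = flip(numbers, biggest + 1)
        moves.append(size)                # then flip it down into place
        numbers = flip(numbers, size)
    return moves


moves = solve([2, 3, 4, 5, 9, 7, 6, 1, 8])
```

This needs at most two flips per number, so it is a safe fallback rather than a way to beat a good human score.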
October 1, 2007

Simple diff of 2 files

Filed under: c, lisp, python — rcjp @ 7:59 pm

A few days ago I wanted to compare two files, each of which had one word per line (they were completion files for the rlwrap utility, incidentally). That’s easy enough with the unix shell commands; in fact you can do it in one line

diff -iyw --suppress-common-lines <(sort -f file1) <(sort -f file2)

but I thought as a quick programming exercise I’d do it in C++/C, python and Common Lisp…

Calculating the difference is very easy using C++’s set_difference etc. but it is a surprise, I think for most programmers anyway, that you have to write your own case insensitive string comparison function. C++ sure is a peculiar mix of high and low level programming.

#include <algorithm>   // sort, copy, set_difference etc.
#include <cctype>      // toupper
#include <cstdlib>     // exit
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <iterator>
#include <functional>

//
// not sure there is any advantage to inherit from binary_function here?
// (in fact we could just define a function rather than a struct and operator)
//
struct lessthan_nocase :
  public std::binary_function<const std::string&, const std::string&, bool>
{
    bool operator()(const std::string& s1, const std::string& s2) const
    {
        std::string::const_iterator p1 = s1.begin();
        std::string::const_iterator p2 = s2.begin();

        while(p1 != s1.end() && p2 != s2.end()) {
            if (toupper(*p1) != toupper(*p2)) return toupper(*p1) < toupper(*p2);
            ++p1;
            ++p2;
        }
        return s1.size() < s2.size();
    }
};

//
// set_difference only works on sorted containers
//
void sorted_readfile(const char* file, std::vector<std::string>& fvec)
{
    std::ifstream f(file);
    std::istream_iterator<std::string> finput(f), fend;

    copy(finput, fend, back_inserter(fvec));
    sort(fvec.begin(), fvec.end(), lessthan_nocase());
}

//
// dump results (show words if < MAXSHOW)
//
const unsigned int MAXSHOW = 10;

void display_diff(std::string title, std::vector<std::string> v)
{
    std::cout << title;
    if (v.size() < MAXSHOW) {
        std::cout << std::endl << std::string(title.size(), '-') << std::endl;
        copy(v.begin(), v.end(),
             std::ostream_iterator<std::string>(std::cout, "\n"));
    } else {
        std::cout << " =  " << v.size() << " words" << std::endl;
    }
    std::cout << std::endl;
}

int main(int argc, char* argv[])
{
    if (argc != 3) {
        std::cerr << "Usage: tdiff filename1 filename2" << std::endl;
        exit(1);
    }

    std::vector<std::string> f1;
    sorted_readfile(argv[1], f1);

    std::vector<std::string> f2;
    sorted_readfile(argv[2], f2);

    std::vector<std::string> notinf1;
    set_difference(f1.begin(), f1.end(), f2.begin(), f2.end(),
            back_inserter(notinf1), lessthan_nocase());
    display_diff("words in 1st but not in 2nd", notinf1);

    std::vector<std::string> notinf2;
    set_difference(f2.begin(), f2.end(), f1.begin(), f1.end(),
            back_inserter(notinf2), lessthan_nocase());
    display_diff("words in 2nd but not in 1st", notinf2);

    std::vector<std::string> symdiff;
    set_symmetric_difference(f2.begin(), f2.end(), f1.begin(), f1.end(),
            back_inserter(symdiff), lessthan_nocase());
    display_diff("symmetric difference", symdiff);

    std::vector<std::string> inter;
    set_intersection(f2.begin(), f2.end(), f1.begin(), f1.end(),
            back_inserter(inter), lessthan_nocase());
    display_diff("intersection", inter);

    return 0;
}

In C, the only thing to trip you up is getting the casting of the void * pointers in strcmp_nocase correct before trying to dereference them. Also I’ve cheated and used the non-ansi strcasecmp (sometimes called stricmp).

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>    /* strcasecmp */
#include <ctype.h>

#define MAXLINE  256
#define MAXWORDS 10000
#define MAXSHOW  10

typedef enum {false, true} bool;

bool strempty(char *str)
{
    char *p = str;
    while (*p) if (!isspace(*p++)) return false;
    return true;
}

int strcmp_nocase(const void *s1, const void *s2)
{
    /* The actual arguments to this function are "pointers to
       pointers to char", but strcmp() arguments are "pointers
       to char", hence the following cast plus dereference */
    return strcasecmp( *(char * const *)s1, * (char * const *) s2);
}

int sorted_readfile(char *filename, char **words)
{
    FILE *f = fopen(filename, "r");

    if (!f) {
        fprintf(stderr, "can't open %s\n", filename);
        exit(1);
    }

    int count = 0;
    char line[MAXLINE], *q;
    while (fgets(line, MAXLINE, f) && (count < MAXWORDS)) {
        if ((q=strpbrk(line,"\r\n"))) *q=0;     /* words dont include newlines */
        if (strempty(line)) continue;           /* skip blank lines */
        words[count] = (char *) malloc(strlen(line)+1);
        /* or words[count++] = strdup(line)*/
        strcpy(words[count], line);
        count++;
    }

    fclose(f);

    if (count == MAXWORDS) {
        fprintf(stderr, "increase MAXWORDS buffer size\n");
        exit(1);
    }
    qsort(words, count, sizeof(char *), strcmp_nocase);
    return count;
}

void display_diff(char *title, char **v, int vsize)
{
    int i;
    printf("%s", title);

    if (vsize < MAXSHOW) {
        printf("\n");
        for (i=0; i<strlen(title); i++) putchar('-');
        printf("\n");
        for (i=0; i<vsize; i++)
            printf("%s\n", v[i]);
    } else {
        printf(" = %d words\n", vsize);
    }
    printf("\n");
}

int main(int argc, char *argv[])
{
    if (argc != 3) {
        fprintf(stderr, "Usage: tdiff filename1 filename2\n");
        exit(1);
    }
    int i;
    char **f1words;
    f1words = malloc(MAXWORDS*sizeof(char*));
    int nf1words = sorted_readfile(argv[1], f1words);

    char **f2words;
    f2words = malloc(MAXWORDS*sizeof(char*));
    int nf2words = sorted_readfile(argv[2], f2words);

    char **diff;
    int ndiff = 0;
    diff = malloc(MAXWORDS*sizeof(char*));
    for (i=0; i < nf1words; i++) {
        if (!bsearch(&f1words[i], f2words, nf2words,
                     sizeof(char*), strcmp_nocase)) {
            /* could just point into f1words instead */
            diff[ndiff++] = strdup(f1words[i]);
        }
    }
    display_diff("words in 1st but not in 2nd", diff, ndiff);

    return 0;
}

In python, most of it is easy, though it’s perhaps not very pythonesque to cram some things on one line like I’ve done here. Making a set case independent requires overriding the __hash__ and __eq__ functions (I think that is all we need in this case) so that they use a saved lowercase version of the supplied string (see the python reference).

import string

class NoCaseStr(str):
    def __init__(self, s):
        str.__init__(self, s)
        # keep a copy of the lower case string
        self.loweredstr = s.lower()

    def __eq__(self, s):
        return self.loweredstr == s.lower()

    def __hash__(self):
        return hash(self.loweredstr)

def display_diff(title, dset, Maxshow=10):
    if len(dset) < Maxshow:
        print title
        print len(title) * '-'
        print '\n'.join(dset)
    else:
        print title, '=', len(dset)
    print

def read_words(f):
    w = set(NoCaseStr(string.strip(word)) for word in open(f).readlines())
    w.discard('')# slightly clumsy... '  \n' gets stripped to '' so discard it
    return w

def tdiff(f1, f2):
    w1 = read_words(f1)
    w2 = read_words(f2)
    display_diff('in 1st but not in 2nd', w1.difference(w2))
    display_diff('in 2nd but not in 1st', w2.difference(w1))
    display_diff('symmetric difference',  w1.symmetric_difference(w2))
    display_diff('intersection',          w1.intersection(w2))

if __name__ == '__main__':
    import sys
    if len(sys.argv) < 3:
        print 'Usage: %s file1 file2' % sys.argv[0]
        sys.exit(1)

    tdiff(sys.argv[1], sys.argv[2])
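An alternative that avoids subclassing str is to key a dictionary on the case-folded word and keep the original spelling as the value. This is a sketch assuming Python 3 (str.casefold does not exist in Python 2), and it takes any iterable of lines rather than a filename; the function names are mine:

```python
def read_words(lines):
    """Map each case-folded word to the first spelling seen."""
    words = {}
    for line in lines:
        word = line.strip()
        if word:                          # skip blank lines
            words.setdefault(word.casefold(), word)
    return words


def diff_words(lines1, lines2):
    """Case-insensitive set operations on two word lists."""
    w1, w2 = read_words(lines1), read_words(lines2)
    both = {**w2, **w1}                   # spelling lookup for either side
    return {
        "in 1st but not in 2nd": sorted(w1[k] for k in w1.keys() - w2.keys()),
        "in 2nd but not in 1st": sorted(w2[k] for k in w2.keys() - w1.keys()),
        "symmetric difference": sorted(both[k] for k in w1.keys() ^ w2.keys()),
        "intersection": sorted(w1[k] for k in w1.keys() & w2.keys()),
    }
```

Dictionary key views support the set operators directly, so the four results fall out without any custom __hash__ or __eq__.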

My favourite language, Common Lisp, has everything you need without overriding anything: string-equal is case insensitive (as opposed to string=).

(defun read-words (file)
  (with-open-file (str file)
    (loop for line = (read-line str nil nil)
       for word = (string-trim " " line)
       while line
       unless (string= word "")
       collect word)))

(defun display-diff (title diff &optional (max-show 10))
  (format t "~A: ~V{~A~^ ~}~%" title max-show diff))

(defun tdiff (f1 f2)
  (let ((w1 (read-words f1))
        (w2 (read-words f2)))
    (display-diff "in 1st but not in 2nd" (set-difference w1 w2 :test #'string-equal))
    (display-diff "in 2nd but not in 1st" (set-difference w2 w1 :test #'string-equal))
    (display-diff "intersection" (intersection w1 w2 :test #'string-equal))))

(tdiff "/home/r/tmp/f1" "/home/r/tmp/f2")

April 27, 2007

Finding open oflag Settings

Filed under: c, python — rcjp @ 11:03 am

In traces you see the numerical value of flags such as the oflag argument to the C open call, e.g.

    5901 open("/proc/asound/cards", 32768, 0666) = 7

to find out what they mean I just quickly did:

    In [36]: def oflag_to_string(oflag):
       ....:     for c in dir(os):
       ....:         if c.startswith('O_'):
       ....:             if eval('os.'+c) & oflag:
       ....:                 print c
       ....:
       ....:

    In [37]: oflag_to_string(32768)
    O_LARGEFILE

but afterwards realised I could have just done an strace since that nicely converts the values for you.
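If you do want to decode a value by hand, here is a slightly tidier sketch of the same loop in Python 3, using getattr instead of eval and returning the names rather than printing them. The function name and the stricter bit test are my own choices:

```python
import os


def oflag_names(oflag):
    """Names of the os.O_* constants whose bits are all set in oflag."""
    names = []
    for name in dir(os):
        if name.startswith("O_"):
            value = getattr(os, name)
            # skip zero-valued flags and require every bit of the flag to be
            # present, so multi-bit masks are not reported on a partial match
            if value and (oflag & value) == value:
                names.append(name)
    return names


names = oflag_names(os.O_WRONLY | os.O_CREAT)
```

The constants are those of the platform Python is running on, so a number copied from a trace on a different architecture may not decode the same way, and any flag whose value is 0 on the current platform (O_RDONLY, for instance) can never be reported.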

April 17, 2007

reStructured Text

Filed under: python, utils — rcjp @ 4:20 pm

rst is a very easy to use ascii markup language. With some python script or the rst2html tool you can quickly generate html. For example, the following html (between the horizontal lines) was generated by the program below and uses the python docutils module on the ascii testtext python variable (but would normally be a file read in from somewhere):


The Title

My first sentence and some more.

  • bullet one
  • two
    • sublists, just like bullet lists must be separated by blank lines
    • second sublist
  • three

and more paragraph text.

Referees: A.N. Other
Boo Boo

Some literal indented code:

program(1)
for i = 1, 10
    nothing
done

And some more text

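+------------+------------+-----------+
| Header 1   | Header 2   | Header 3  |
+============+============+===========+
| body row 1 | column 2   | column 3  |
+------------+------------+-----------+
| body row 2 | Cells may span columns.|
+------------+------------+-----------+
| body row 3 | Cells may  | • Cells   |
+------------+ span rows. | • contain |
| body row 4 |            | • blocks. |
+------------+------------+-----------+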

Run with:

rst2html ttt2 > k.html

For more details look at docutils quickref. and also check the latest
developments validator and latex


and the python code…
#
# some example rst text manipulated with python below...
#
testtext="""
The Title
=========

My first sentence and some more.

- bullet one
- two 

  - sublists, just like bullet lists must be separated by blank lines
  - second sublist

- three

and more paragraph text.

:Referees:
    A.N. Other
    Boo Boo

Some literal indented code::

    program(1)
    for i = 1, 10
        nothing
    done

And some more text

+------------+------------+-----------+
| Header 1   | Header 2   | Header 3  |
+============+============+===========+
| body row 1 | column 2   | column 3  |
+------------+------------+-----------+
| body row 2 | Cells may span columns.|
+------------+------------+-----------+
| body row 3 | Cells may  | - Cells   |
+------------+ span rows. | - contain |
| body row 4 |            | - blocks. |
+------------+------------+-----------+

Run with::

    rst2html ttt2 > k.html

For more details look at docutils_ quickref. and also check the latest
developments validator_ and latex_

.. _Python: http://www.python.org/
.. _docutils: http://docutils.sourceforge.net/docs/user/rst/quickref.html#bullet-lists
.. _validator: http://docutils.sourceforge.net/sandbox/dugui/
.. _latex: http://docutils.sourceforge.net/sandbox/latex_directive/
"""

from docutils.core import publish_string, publish_parts
#
# quickly dump out the html using my own stylesheet
#
output = open("test-rst.html", 'w')
customcss = {'embed_stylesheet':False, 'stylesheet_path':'include/mystyle.css'}
output.write(publish_string(testtext, writer_name = 'html',
                            settings_overrides=customcss))
output.close()
#
# just dump the body part, no stylesheet at all
#
output = open("test-rst-body.html", 'w')
parts = publish_parts(testtext, writer_name = 'html',
                      settings_overrides=dict(embed_stylesheet=False,
                                              stylesheet_path=None,))
output.write(parts['html_body'])
output.close()


Pulling out the body text is handy for generating html that you are inserting inside another document, like this blog entry for example.

April 5, 2007

Patterns

Filed under: physics, python — rcjp @ 3:56 pm

I think I got the matrix formula for this pattern from “Introduction to Dynamics” by Percival and Richards (it’s certainly the pattern I remember from the book cover)

[Image: Pattern]

You can get different patterns if you move the slider. Python gtk code…

import pygtk
pygtk.require('2.0')
import gtk, math

class Pattern(object):
    x, y = 0, 0
    points=[]
    alpha = 76.11
    def __init__(self, xsize, ysize):
        self.xsize, self.ysize = xsize, ysize
        window = gtk.Window(gtk.WINDOW_TOPLEVEL)
        window.set_title("Pattern")
        window.connect("destroy", lambda w: gtk.main_quit())
        vbox = gtk.VBox(homogeneous=False, spacing=5)
        window.add(vbox)
        self.area = gtk.DrawingArea()
        self.area.set_size_request(xsize, ysize)
        self.area.connect("expose-event", self.area_expose_cb)
        self.area.connect("size-allocate", self.calculate_points)

        slider = gtk.Adjustment(value=self.alpha, lower=0.0,
                                upper=100.0, step_incr=0.01)
        slider.connect("value-changed", self.slider_changed)
        self.hscale = gtk.HScale(slider)
        vbox.pack_start(self.area, expand=True, fill=True, padding=0)
        vbox.pack_start(self.hscale, expand=False)
        window.show_all()

    def slider_changed(self, slider):
        self.alpha = slider.get_value()
        self.calculate_points()

    def calculate_points(self, area=None, event=None):
        x, y, self.xsize, self.ysize = self.area.get_allocation()
        xscale = self.xsize/2.0
        yscale = self.ysize/2.0
        m = 52
        angle = self.alpha*math.pi/180.0
        c = math.cos(angle)
        s = math.sin(angle)
        self.points=[]
        try:
            for j in range(m):
                x = 0
                y = j/float(m)
                for n in range(200):
                    w = x
                    x = x*c - (y - x*x)*s
                    y = w*s + (y - w*w)*c
                    if abs(x)>4 or abs(y)>4: raise StopIteration()
                    if x>1 or y>1: continue
                    self.points.append((int(x*xscale+xscale), int(y*yscale+yscale)))
        except StopIteration:
            pass    # an orbit escaped; keep the points collected so far
        self.area.queue_draw()  # redraw even when no orbit escaped

    def area_expose_cb(self, area, event):
        blue = self.area.get_colormap().alloc_color("#0000FF")
        pointgc = self.area.window.new_gc()
        pointgc.set_foreground(blue)
        area.window.draw_point(pointgc, 20, 10)
        self.area.window.draw_points(pointgc, self.points)

def main():
    gtk.main()
    return 0

if __name__ == '__main__':
    Pattern(600, 500)
    main()
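The iteration at the heart of calculate_points can be tried without gtk. Here is a sketch in Python 3 that follows a single starting point; I believe this is the quadratic area-preserving map usually credited to Hénon, but treat that attribution as my assumption. Unlike the code above, it only stops the one orbit when it escapes:

```python
import math


def orbit(x, y, alpha_degrees, steps=200, escape=4.0):
    """Iterate the map from (x, y), returning the points visited."""
    angle = math.radians(alpha_degrees)
    c, s = math.cos(angle), math.sin(angle)
    points = []
    for _ in range(steps):
        # rotate the point (x, y - x*x) through the angle alpha;
        # the tuple assignment uses the old x on both sides
        x, y = x * c - (y - x * x) * s, x * s + (y - x * x) * c
        if abs(x) > escape or abs(y) > escape:
            break
        points.append((x, y))
    return points
```

The origin is a fixed point of the map, and with alpha set to 0 each step leaves x alone and subtracts x*x from y, which makes a handy sanity check.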

March 30, 2007

Old Blog Code

Filed under: python — rcjp @ 1:10 pm

Just in case there is some useful code in it, this is what I used to generate my old blog before I gave up and moved to wordpress. I had a nested directory of text files containing keywords that this python expanded into html tags. It even used to spawn off gimp to generate heading images.

import os, sys, re
global bloghome
bloghome = 'http://www.codephrenic.co.uk/blog/'   # where the blog is destined for (see below)
blogbase = '/home/r/blog'                         # where the files get generated to

# html tag translations
# e.g. link["www.physics.org", "the physics website"]
#      bigpic[{'src':'/mypic.jpg', 'title':'blah', 'width':100, 'height':100}]
tags = {
    'title'     : '<div class="header"><h2>%s</h2></div>',
    'entry'     : '<div class="entry">%s</div>',
    'date'      : '<div class="date">%s</div>',
    'list'      : '<ul>%s</ul>',
    'item'      : '<li>%s</li>',
    'link'      : '<a href="%s">%s</a>',
    'pic'       : '<img src="%s" alt="%s" width="50" height="50"/>',
    'picright'  : '<img class="floatright" src="%s" width="50" height="50"/>',
    'bigpic'    : '<img src="%(src)s" alt="%(title)s" width="%(width)d" height="%(height)d"/>',
    'italic'    : '<i>%s</i>',
    'tt'        : '<tt>%s</tt>',
    'bold'      : '<b>%s</b>'
    }

blogheader = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
    <head>
        <title>rcjp blog</title>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
        <link rel="stylesheet" type="text/css" href="blog.css" />
        <link rel="stylesheet" type="text/css" href="titles.css" />
    </head>
    <body><div id="blogpage">
"""

blogfooter = '</div></body></html>'

def make_heading_image(title, category):
    print 'processing image %s' % category
    # change any  ' -> '\''
    cmd = """gimp --batch-interpreter plug_in_script_fu_eval -i -b '(blogheading "%s" 100 "Trebuchet MS" "%s")' -b '(gimp-quit 0)'""" % (title.replace("'", "'\\''"), blogbase+"/images/"+category+".png")
#     f = os.popen("%s" % cmd, 'w')
    f = os.popen(cmd, 'w')
    f.flush()
    f.close()

def func_expand(match):
    fname, body = match.group(1,2)
    try:
        if body.startswith('"') or body.startswith('{'):
            return tags[fname] % (eval(body))
        else:
            return tags[fname] % (body)
    except(TypeError), val:
        sys.exit('***ERROR: Tag "%s" has been supplied wrong arguments "%s"' % (fname, body))
    except(KeyError), val:
        sys.exit('***ERROR: Unknown tag "%s"' % fname)

def expand_string(str):
    # look for any functions of the form  func[...]
    # embedded that we should expand into html tags
    func = re.compile(r'(\w+)\s*?\[([^\[]*?)\]')
    found = 1
    while found > 0:
        (str, found) = func.subn(func_expand, str)
    return str

def escape_string(text):
    # change > to &gt; etc.
    # and deal with code[], it needs separate treatment because
    # the regexp approach in expand_string will find [] inside
    # code segments and try to expand it 
    searchstr = 'code['
    pos = text.find(searchstr)
    newtext = ''
    while pos >= 0:
        bcount = 1             # count the number of [ we have to skip
        newtext = text[:pos]+'<code>'
        pos += len(searchstr)
        while pos < len(text):  # find matching ] bracket
            if text[pos] == '[':
                bcount += 1
                newtext += '['
            elif text[pos] == ']':
                bcount -= 1
                if bcount == 0:
                    newtext += '</code>' + text[pos+1:]
                    break
                newtext += ']'
            elif text[pos] == '<':
                newtext += '&lt;'
            elif text[pos] == '>':
                newtext += '&gt;'
            elif text[pos] == '&':
                newtext += '&amp;'
            else:
                newtext += text[pos]
            pos += 1
        if pos == len(text):
            sys.exit('Could not find matching bracket')
        text = newtext
        pos = text.find(searchstr)
    return text

def paragraphs(lines, is_separator=str.isspace, joiner=''.join):
    paragraph = []
    for line in lines:
        if is_separator(line):
            if paragraph:
                yield joiner(paragraph)
                paragraph = []
        else:
            paragraph.append(line)
    if paragraph:
        yield joiner(paragraph)

def process_entry(filename):
    print 'processing ', filename
    blogfile = open(filename, 'r')
    dir, name = os.path.split(filename)

    iday = int(name[6:8])    # turn 08 into 8 etc
    day = str(iday)
    if 4 <= iday <= 20 or 24 <= iday <= 30: # get the day suffix
        day+='th'
    else:
        day+= ['st', 'nd', 'rd'][iday % 10 - 1]

    months = {'01':'January', '02':'February', '03':'March', '04':'April', '05':'May', '06':'June',
              '07':'July', '08':'August', '09':'September', '10':'October', '11':'November', '12':'December'}
    month = months[name[4:6]]

    date = '%s %s %s' % (day, month, name[0:4])
    entry = 'entry[date[%s]%s]' % (date, blogfile.read())
    blogfile.close()

    text = ''
    for para in paragraphs(expand_string(escape_string(entry)).splitlines(True)):
        if para.startswith('<'):
            text += para
        else:
            text += '<p>%s</p>' % para
    return text

def build_blogfiles(categories):
    #
    # Create the rhs html links pane for all the categories and the archive years
    #
    category = ''
    archive = ''
    for cat in categories.keys():
        link = '<li><a href="%s" title="%s">%s</a></li>' % (bloghome + cat + '.html', cat, cat)
        if cat.isdigit():
            archive += link
        else:
            category += link
    archivepane = '<div id="archive"> <h2>Archive</h2> <ul>%s</ul> </div>' % archive
    categorypane = '<div id="category"><h2>Category</h2> <ul>%s</ul></div>' % category
    #
    # titles.css contains entries like
    # h1#titleimg2006 { background-image: url(images/titleimg2006.png); }
    #
    titles = open(blogbase+'/titles.css', 'w')

    for cat, files in categories.iteritems():
        #
        # create an image for each category and a reference to it in titles.css
        #
        blogheading = "rcjp's blog - %s" % cat
        blogheadingID = 'titleimg' + cat             # can't have an id of '2006' since it's a number
        blogheadingImageFile = 'images/'+blogheadingID+'.png'
        make_heading_image(blogheading, blogheadingID)
        titles.write("h1#%s { background-image: url(%s); }\n" % (blogheadingID, blogheadingImageFile))

        archive = open(blogbase+'/%s.html' % cat, 'w')
        archive.write(blogheader)
        archive.write('<h1 class="pageheading" id="%s"> <span>%s</span></h1>' %
                      (blogheadingID, blogheading))
        archive.write(archivepane + categorypane)
        archive.write('<div id="content">')
        for f in categories[cat]:
            archive.write(process_entry(f))
        archive.write('</div>')
        archive.write(blogfooter)
        archive.close()
    titles.close()

def find_entries(blogroot):
    '''walk directories below blogroot to find .txt files for processing
       the subdirectory gives the category for the blog entry
       the filename e.g. 20031126.txt YYYYMMDD gives the date for the entry'''
    categories = {}
    for path, dirs, files in os.walk(blogroot):
        if 'tmp' in dirs:           # ignore files in any 'tmp' subdirectory
            dirs.remove('tmp')

        for f in files:
            filename, ext = os.path.splitext(f)
            if ext == '.txt':
                fullname = os.path.join(path, f)
                dir, cat = os.path.split(path)
                year = filename[0:4]
                categories.setdefault(cat, []).append(fullname)
                categories.setdefault(year, []).append(fullname)
    return categories

if __name__ == '__main__':
    from optparse import OptionParser
    parser = OptionParser()
    parser.add_option("-u", "--upload", action="store_true", dest="upload", help="upload to website")
    options, args = parser.parse_args()
    if args:
        parser.error("no arguments are allowed")

    if options.upload is None:
        bloghome = 'file:///home/r/blog/'
    else:
        bloghome = 'http://www.codephrenic.co.uk/blog/'

    categories = find_entries(blogbase)
    print 'Generating blog'
    print 16*'-'
    build_blogfiles(categories)
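
The day-suffix rule buried in process_entry is the sort of thing that is easy to get subtly wrong, so here is a minimal standalone sketch of the same test. It is written for Python 3 rather than the Python 2 of the script, and the name day_with_suffix is made up for the example; it is not part of the blog generator.

```python
def day_with_suffix(iday):
    """Return e.g. '1st', '2nd', '11th', '23rd' for a day of the month (1-31)."""
    if 4 <= iday <= 20 or 24 <= iday <= 30:
        return str(iday) + 'th'
    # the remaining days (1-3, 21-23, 31) end in 1, 2 or 3
    return str(iday) + ['st', 'nd', 'rd'][iday % 10 - 1]


print(', '.join(day_with_suffix(d) for d in (1, 2, 3, 4, 11, 12, 13, 21, 22, 23, 30, 31)))
```

The two ranges catch every 'th' day, including the awkward 11th, 12th and 13th, so the list lookup only ever sees days ending in 1, 2 or 3.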

February 1, 2007

globbing

Filed under: python, unix — rcjp @ 11:23 am

I hate globbing. Whoever thought that it was a good idea to let the shell snaffle the arguments you’ve given to a program needs to be pushed off somewhere high onto some burning cacti.

You can turn it off in the shell, but this makes most unix commands useless since they don’t do the file expansion themselves (like they should).

In ruby…

    for f in Dir.glob("/home/r/tmp/**/*")
        File.open(f) { |file|
            puts "filename #{f}"
        } if File.file?(f)
    end

python…

    import os

    for root, dirs, files in os.walk('/home/r/tmp'):
        for f in files: print 'file -> ', os.path.join(root,f)
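
For comparison with the Ruby `**/*` pattern, newer Pythons can do the recursive match directly. This is only a sketch assuming Python 3 and its pathlib module; list_files and the throwaway directory tree are invented for the example.

```python
import os
import tempfile
from pathlib import Path


def list_files(top):
    """Return every regular file below 'top' as a sorted list of path strings."""
    # rglob('*') yields files and directories at every depth; keep files only
    return sorted(str(p) for p in Path(top).rglob('*') if p.is_file())


# build a small throwaway tree so the example is self-contained
demo = tempfile.mkdtemp()
os.makedirs(os.path.join(demo, 'a', 'b'))
for name in ('top.txt', os.path.join('a', 'mid.txt'), os.path.join('a', 'b', 'deep.txt')):
    open(os.path.join(demo, name), 'w').close()

for f in list_files(demo):
    print('file ->', f)
```

The expansion happens inside the program here, so the shell never gets a chance to touch the pattern.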

The Common Lisp pathname stuff can be a bit confusing, which is a pity because it does have some nice tricks up its sleeve.

You build the directory section of the pathname as a list taking keyword arguments :wild where you want to match at one directory level and :wild-inferiors when you want to recurse.

CL-USER> (make-pathname :name :wild
                         :type :wild
                         :directory (list :absolute "home" "r" "tmp" :wild-inferiors))
#P"/home/r/tmp/**/*.*"

so running (DIRECTORY #P"/home/r/tmp/**/*.*") should give you all the files in the tmp directory and below. Some implementations, though, take directories themselves to be files and others don’t, so you might need to use FILE-NAMESTRING to see if the filename part of the path is nil, i.e. it’s a directory.

January 15, 2007

Swapping Spaces for Punctuation

Filed under: c, lisp, python — rcjp @ 10:17 am

Just a quicky. In Common Lisp…

    (defun spacify-punc (string)
      (substitute-if #\Space #'punc-p string))

    (defun punc-p (char) (find char "*,.:;-()"))

then use it…

    CL-USER> (spacify-punc "this, string. and: more")
    "this  string  and  more"

in python…

    def spacify_punc(char):
        if char not in "*,.:;-()":
            return char
        else:
            return ' '

    print ''.join(spacify_punc(c) for c in "this, string. and: more")

in C possibly something like…

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main()
    {
      char words[] = "this, string. and: more";
      char *answer = strdup(words);   /* copies the terminating NUL too */
      size_t i;

      /* stop before the NUL: strchr() also matches the terminator of
         the punctuation string, which would overwrite answer's NUL */
      for (i = 0; i < strlen(words); i++)
        answer[i] = strchr("*,.:;-()", words[i]) ? ' ': words[i];

      printf("'%s' becomes '%s'\n", words, answer);
      free(answer);
      return 0;
    }

All three languages have functions to detect punctuation, but we are only looking for a subset in this case.

I think I like the CL solution best but the order of args to substitute-if always throws me, thank heavens for SLIME’s argument prompting.
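
For what it’s worth, the python version can also be done in one pass with a translation table, which is closer in spirit to substitute-if. A small sketch assuming Python 3 (str.maketrans is the Python 3 spelling); PUNC and TABLE are just names chosen for the example.

```python
PUNC = "*,.:;-()"

# str.maketrans with two equal-length strings maps each character in the
# first to the character at the same position in the second
TABLE = str.maketrans(PUNC, ' ' * len(PUNC))


def spacify_punc(text):
    """Replace each character of PUNC found in 'text' with a single space."""
    return text.translate(TABLE)


print(spacify_punc("this, string. and: more"))
```

Unlike the character-at-a-time generator, this takes the whole string, so there is no join step.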

January 11, 2007

Quick Graphviz Test

Filed under: python — rcjp @ 10:14 am

I occasionally need something to show a connection graph and graphviz is a handy tool especially using the python pydot interface, just define some node links…

    r@laptop:~/src/python/pydot-0.9.10$ sudo python setup.py install
    running install
    running build
    running build_py
    creating build
    creating build/lib
    copying pydot.py -> build/lib
    copying dot_parser.py -> build/lib
    running install_lib
    copying build/lib/pydot.py -> /usr/lib/python2.4/site-packages
    copying build/lib/dot_parser.py -> /usr/lib/python2.4/site-packages
    byte-compiling /usr/lib/python2.4/site-packages/pydot.py to pydot.pyc
    byte-compiling /usr/lib/python2.4/site-packages/dot_parser.py to dot_parser.pyc


then, running in ipython

    In [35]: import pydot

    In [36]: edges=[(1,2), (1,3), (1,4), (3,4)]

    In [37]: g=pydot.graph_from_edges(edges)

    In [38]: g.write_png('test-graph.png', prog='dot')
    Out[38]: True


test-graph

January 7, 2007

Renaming Groups of Files

Filed under: python, unix — rcjp @ 4:13 pm

I have done it many times in the past, but I always forget the-right-way because different unix shells do loops differently etc, so I now always use ipython to e.g. lowercase all filenames in a directory…

    In [14]: import os

    In [15]: !! ls
    Out[15]:
    ['DEB1.GLE',
     'DEB1.eps',
     'DEBF1.RES',
     'DEBF2.RES',
     'DEBG1.RES',
     'DEBG2.RES',
     'DEBYE1.EXE',
     'DEBYE1.FOR',
     'DEBYE2.EXE',
     'DEBYE2.FOR']

    In [16]: for f in _15: os.rename(f, f.lower())


and for doing that recursively of course just do

    !! find . -type f
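
Outside ipython the recursive case can be done with os.walk alone. The following is a rough sketch assuming Python 3; lowercase_files is a made-up helper and it runs against a temporary directory so nothing real gets renamed.

```python
import os
import tempfile


def lowercase_files(top):
    """Rename every file below 'top' to its lowercase name; return how many changed."""
    renamed = 0
    for root, dirs, files in os.walk(top):
        for name in files:
            lower = name.lower()
            if lower != name:
                # only the filename changes, directory names are left alone
                os.rename(os.path.join(root, name), os.path.join(root, lower))
                renamed += 1
    return renamed


# throwaway directory with a few of the filenames from the session above
demo = tempfile.mkdtemp()
os.mkdir(os.path.join(demo, 'SUB'))
for name in ('DEB1.GLE', 'DEB1.eps', os.path.join('SUB', 'DEBYE1.FOR')):
    open(os.path.join(demo, name), 'w').close()

print(lowercase_files(demo), 'files renamed')
```

Note that os.rename will quietly overwrite an existing lowercase file of the same name on unix, so it is worth checking for clashes first on anything important.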

December 26, 2006

lotto numbers

Filed under: lisp, python — rcjp @ 9:26 am

I can’t remember how I came to be subscribed to a perl feed but I ended up reading http://www.oreillynet.com/onlamp/blog/2006/12/99_problems_in_perl_6.html
which was based on something similar in Lisp I think. The article showed a Scheme-like Common Lisp solution for picking out the lottery numbers, all conses etc.

My first somewhat crummy solution was

    (defun lotto (n m)
      "Draw N different random numbers from the set 1..M"
      (loop for num = (1+ (random m))
         for bag = nil then (adjoin num bag)
         while (< (length bag) n)
         finally (return bag)))

as I first thought of the adjoin function when I thought about adding numbers, but you shouldn’t be adjoining and then setting the result back to itself; pushnew would do that. Maybe it’s clearer just to use member

    (defun lotto1 (n m)
      (loop for num = (1+ (random m))
         while (< (length bag) n)
         unless (member num bag) collect num into bag
         finally (return bag)))

I vaguely remembered a thread on comp.lang.lisp on this subject and found a post from Alan Crowe did it prettier with ‘do’ – slapped wrists for me since I always dismiss ‘do’ as being harder to read than loop

    (defun lotto2 (n m)
      (do ((draw '()))
          ((= (length draw) n) draw)
        (pushnew (+ (random m) 1) draw)))

Of course these don’t work well if you want 49 out of 50, as too many of the numbers generated are repeats. Brian Downing posted a version using a nice idea: pair each number with a random number, then sort on that random number to shuffle the list

    (defun lotto3 (n m)
      (subseq (mapcar #'car (sort (loop for i from 1 upto m
                                     collect (cons i (random 1.0)))
                                  #'< :key #'cdr))
              0 n))

In python, as I was looking up the random function, the docs showed how you can do this sort of thing in one line (remember xrange, like range, generates numbers up to but not including the second arg)

    In [16]: random.sample(xrange(1,50),6)
    Out[16]: [29, 28, 32, 45, 42, 2]
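
The sort-on-a-random-key trick from lotto3 translates directly too. This is just an illustrative sketch in Python 3 (random.sample is still the sensible choice in practice); the function simply reuses the lotto3 name from the Lisp above.

```python
import random


def lotto3(n, m):
    """Draw n different numbers from 1..m by sorting on a random key."""
    # pair every number with a random float, sort on the float, keep the numbers
    keyed = [(random.random(), i) for i in range(1, m + 1)]
    keyed.sort()
    return [i for _, i in keyed[:n]]


print(lotto3(6, 49))
print(lotto3(49, 50))   # no retries needed even when n is close to m
```

Every number appears exactly once in the shuffled list, so asking for 49 out of 50 costs no more than asking for 6.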

December 22, 2006

String Searching/Replacing

Filed under: c, lisp, python — rcjp @ 9:04 am

just some quick notes on how various languages handle replacing/splicing a string…

In C++ chopping, searching and replacing in a string is fairly easy

    #include <iostream>
    #include <string>

    int main()
    {
        std::string s = "some one with more than one that ones.";

        s.erase(0,3);
        s.replace(s.find("one"), 3, "three");

        std::cout << s << std::endl;
    }

in straight C doing the same thing is a bit more fiddly

    #include <stdio.h>
    #include <string.h>

    int main()
    {
        char* s = "some one with more than one that ones.";

        char buf[255];
        strcpy(buf, s+3);  /* chop off the first 3 chars */

        char *p = strstr(buf, "one");
        if (p)
        {
            char tmp[255];
            *p = (char) 0;
            strcpy(tmp, buf);
            strcat(tmp, "three");
            strcat(tmp, p+3);  /* skip over length of "one" */
            strcpy(buf, tmp);
        }

        printf("%s\n", buf);
        return 0;
    }

you have to think about the size of the temporary buffer unless you malloc something based on the size of s – but then you have to know how much bigger your operations will make the string.

Perhaps surprisingly there isn’t any standard function to replace substrings in Common Lisp. I guess it’s one of those things you are supposed to deal with yourself: if you know your replacement word is the same size you can destructively alter the string (using setf on the subseq), otherwise you have to build a new string (since you may be replacing a word with a bigger word). So CL leaves it to you to figure out the best approach.

You can get the position of a string within another with

    * (search "one" "there one is one more than ones")
    6

and then build up the string a la C…

    (let* ((str "some one with more than one that ones.")
           (word "one")
           (p (search word str)))
      (if p
          (concatenate 'string (subseq str 3 p)
                               "three"
                               (subseq str (+ p (length word))))
          p))

In python

    In [3]: "there one is one more than ones".replace("one", "three", 1)
    Out[3]: 'there three is one more than ones'

where we are using a count=1 otherwise it would replace all instances. We can even chop off the first three chars and then replace all in one go

    In [5]: "there one is one more than ones"[3:].replace("one", "three")
    Out[5]: 're three is three more than threes'

Replacing all occurrences in C++ is a bit more work

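    std::string::size_type n = 0;  // or   size_t n = 0;
    std::string word = "one", repl = "three";
    // search from just past the last replacement so the loop still
    // terminates when the replacement itself contains the word
    while((n=s.find(word, n)) != std::string::npos) {
        s.replace(n, word.size(), repl);
        n += repl.size();
    }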

Replacing all strings in C or Common Lisp is quite a lot more work and it’s probably better to write a utility function and keep it somewhere or use a library.
http://en.wikibooks.org/wiki/Programming:Common_Lisp/Strings has

    (defun replace-all (string part replacement &key (test #'char=))
      (with-output-to-string (out)
        (loop with part-length = (length part)
           for old-pos = 0 then (+ pos part-length)
           for pos = (search part string :start2 old-pos :test test)
           do
             (write-string string out :start old-pos
                                      :end (or pos (length string)))
           when pos do
             (write-string replacement out)
           while pos)))

or maybe use cl-ppcre
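
To see what that loop is doing, here is the same search-copy-advance idea spelled out by hand in python. It is only a sketch (Python 3, and of course str.replace already does this); the variable names follow the Lisp version.

```python
def replace_all(string, part, replacement):
    """Return 'string' with every occurrence of 'part' replaced."""
    if not part:
        raise ValueError("part must not be empty")
    out = []
    old_pos = 0
    while True:
        pos = string.find(part, old_pos)
        if pos == -1:
            out.append(string[old_pos:])      # copy the tail and stop
            break
        out.append(string[old_pos:pos])       # text before the match
        out.append(replacement)
        old_pos = pos + len(part)             # resume just after the match
    return ''.join(out)


print(replace_all("there one is one more than ones", "one", "three"))
```

Because the search always resumes after the text just matched, the replacement is never rescanned, so it can safely contain the word being replaced.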

October 21, 2006

Autogenerated Poetry

Filed under: python — rcjp @ 7:23 pm

#
# Sucks in a story like Milton's Paradise Lost and generates text
# using the word probability from the sample document
# translated from Common Lisp Graham p.140
#
import re
import random
import textwrap

words = {}

def read_text(filename):
    """Fill the words dictionary with unique words from 'filename',
    each word entry is itself a dictionary of the words that follow
    that word and their frequency"""

    sampletext = open(filename, 'r').read()
    previous = '.'  #  i.e. the start of a sentence
    for wordpunc in sampletext.lower().split():
        # keep the punctuation as part of the word e.g. 'bye.'
        # now split 'bye.' as separate words 'bye' and '.'  note (\W)
        # returns the splitting characters as well as the fields so we
        # get 'bye' '.' '' and we want to ignore the last separator

        # [Note normally we'd want to avoid punctuation so should do
        #  words=re.compile(r"[\w'-]+") then
        #  for word in words.finditer(line)
        #  do something to word.group(0) ]
        for word in [w for w in re.split(r'(\W)', wordpunc) if w != '']:
            if previous not in words:
                words[previous] = {word : 1}
            else:
                words[previous][word] = words[previous].get(word, 0) + 1
            previous = word
    print 'Using a vocabulary of', len(words), 'words'

def format_word(word, previous):
    if previous == '.':
        word = ' ' + word.capitalize()
    else:
        if word.isalpha(): word = ' ' + word
    return word

def generate_text(nwords, previous = '.'):
    """Returns 'nwords' random words chosen according to statistically
    likely order calculated in read_text"""

    text = ''
    for n in xrange(nwords):
        nextwords = words[previous]
        count = 0
        # pick from 1..total so each word is chosen in proportion to its
        # frequency (starting at 0 would favour the first word)
        pick = random.randint(1, sum(nextwords.values()))
        for word, freq in nextwords.iteritems():
            count += freq
            if count >= pick:
                text += format_word(word, previous)
                break
        previous = word
    return text

if __name__ == '__main__':
    read_text('c:/tmp/testwords')
    print textwrap.fill(generate_text(100), 50)

"""
e.g.
In [210]: read_text('c:/tmp/ParadiseLost.txt')
Using a vocabulary of 8993 words

In [211]: print textwrap.fill(generate_text(100),50)
 Farewell, immutable; i give not: conviction to
torment me, and worse. Satan only disagree of
servants feet. To that i boast what was old night,
metals of belial came; lest the government well
thou for which we seek some glade, or who wrong,
but he more soft and evil ruin. So erroneous there
dwell permits, and inward faculties, and bliss),
or, to know. With design to spend, till then to
the fields, when he ended; of nature him hither
thrust
"""

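The inner loop of generate_text is a weighted random choice over the words that followed the previous one. In newer Pythons that step can be written with random.choices; the sketch below assumes Python 3.6 or later, and pick_next plus the tiny follower table are invented for illustration.

```python
import random


def pick_next(followers):
    """Pick one key of 'followers' with probability proportional to its count."""
    candidates = list(followers)
    weights = [followers[w] for w in candidates]
    # random.choices does the cumulative-weight walk internally
    return random.choices(candidates, weights=weights, k=1)[0]


# hypothetical follower table: 'cat' followed the word 3 times, 'dog' once
followers = {'cat': 3, 'dog': 1}
tally = {'cat': 0, 'dog': 0}
for _ in range(4000):
    tally[pick_next(followers)] += 1
print(tally)
```

Over many draws 'cat' should come up roughly three times as often as 'dog', which is the behaviour the frequency dictionary is meant to give.
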
October 3, 2006

Word Frequency Comparison

Filed under: c, lisp, python — rcjp @ 2:49 pm

Quite a common programming exercise is to write a word frequency counter. For a bit of fun I thought I’d code it in a few languages for comparison. There are essentially two parts to the problem: splitting up text into real words and keeping a tally of the occurrences. Before we look at the python here is an interesting piece of shell script from the Classic Shell Scripting O’Reilly book:

#! /bin/sh
# Read a text stream on standard input, and output a list of
# the n (default: 25) most frequently occurring words and
# their frequency counts, in order of descending counts, on
# standard output.
#
# Usage:
#       wf [n]
tr -cs A-Za-z\' '\n' |        # Replace nonletters with newlines
  tr A-Z a-z |                # Map uppercase to lowercase
    sort |                    # Sort the words in ascending order
      uniq -c |               # Eliminate duplicates, showing their counts
        sort -k1,1nr -k2 |    # Sort by descending count, and then by ascending word
          sed ${1:-25}q       # Print only the first n (default: 25) lines;


You can then use e.g. head -n10 to get the top ten words, or wf 999999 < textfile | wc -l to find the number of unique words etc. It is impressive that it can be done, but I think I’d forget what those args to sort did pretty quickly.

Python

There was an entry in the python shootout (which I can’t seem to find now online?)

import sys

def main():
    i_r = map(chr, range(256))

    trans = [' '] * 256
    o_a, o_z = ord('a'), (ord('z')+1)
    trans[ord('A'):ord('Z')+1] = i_r[o_a:o_z]
    trans[o_a:o_z] = i_r[o_a:o_z]
    trans = ''.join(trans)

    count = {}
    for line in sys.stdin:
        for word in line.translate(trans).split():
            try:
                count[word] += 1
            except KeyError:
                count[word] = 1

    l = sorted(zip(count.itervalues(), count.iterkeys()), reverse=True)

    print '\n'.join(["%7s %s" % (count, word) for count, word in l])

main()


and a sample run would be…

    r@laptop:~$ python py/work/wordfreq.py < testwords
   5621 the
   2739 a
   2585 of
   2242 and
   2155 it
   2086 to
   1938 he
   1750 said
   1495 you
   1387 was
   1221 in
   1132 that


But I don’t think I would have come up with that translate solution. My, probably slower, effort was:

import re

def wordfreq_fromfile(filename):
    wordfreq(file(filename).read())

def wordfreq(str):
    freq={}
    words = re.compile(r"[\w'-]+")
    for word in words.finditer(str):
        w = word.group(0).lower()
        freq[w] = freq.get(w,0) + 1
    # make a list of most common words
    common = sorted(freq, key=freq.get, reverse=True)
    print '\n'.join("%7s %s" % (freq[word], word) for word in common)


and I think it is more obvious what that code is doing.
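
The tallying half of the problem is now covered by the standard library. As a sketch only, assuming a Python with collections.Counter (2.7 or 3.x, so newer than the code above), the same regex approach becomes the following; the sample sentence is made up for the example.

```python
import re
from collections import Counter

WORD = re.compile(r"[\w'-]+")


def wordfreq(text, n=10):
    """Return the n most common lowercased words in 'text' as (word, count) pairs."""
    counts = Counter(m.group(0).lower() for m in WORD.finditer(text))
    return counts.most_common(n)


sample = "The cat sat on the mat. The mat didn't mind the cat."
for word, count in wordfreq(sample, 3):
    print("%7d %s" % (count, word))
```

Returning the pairs instead of printing them inside the function makes it easier to reuse the counts elsewhere.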

C++

The standard library has a map container handy to keep the word/occurrence counts, but you can’t sort a map by its values so we switch to a vector at the end

#include <iostream>
#include <fstream>
#include <map>
#include <string>
#include <vector>
#include <algorithm>
#include <iterator>
#include <iomanip>
#include <cstdlib>   // for exit

typedef std::map<std::string,int>  Dict;
typedef std::pair<std::string,int> DictItem;
typedef std::vector<DictItem>      VDict;

bool wordfreq_greater(const DictItem& a, const DictItem& b)
{
    return a.second > b.second;
}

// cout will only look for << operators in std namespace
// (could've just defined format in print_words rather than <<)

namespace std{
    std::ostream& operator << (std::ostream& out, const DictItem& d)
    {
        return out << std::setw(7) << d.second << " : " << d.first;
    }
}

template <typename T>
void print_words(T iter, std::string title, int maxcount)
{
    std::cout << title << std::endl
              << std::string(title.size(), '-') << std::endl;
    for(int i = 0; i < maxcount; ++iter, ++i)
        std::cout << *iter << std::endl;
    std::cout << std::endl;
}

int main(int argc, char *argv[])
{
    std::ifstream essay(argv[1]);
    if (!essay) {
        std::cerr << "please supply a filename for input" << std::endl;
        exit(1);
    }

    Dict dict;
    std::string word;
    while (essay >> word)    // count up frequency of words
        ++dict[word];

    // print the first 10 words, map sorts alphabetically
    // by default.. map< std::string,int,std::less<std::string> > dict;

    print_words<Dict::const_iterator> (dict.begin(), "Alphabetically", 10);

    // copy the dict to a vector so we can sort it

    VDict vdict(dict.begin(), dict.end());
    sort(vdict.begin(), vdict.end(), wordfreq_greater);

    print_words<VDict::const_iterator>(vdict.begin(), "Frequency", 10);
}


when run with a sample text file prints something like:

Alphabetically
--------------
      3 : !
    230 : "
      1 : ""allo,
      1 : "'E's
      1 : "'Ha'
      1 : "'Yet
      2 : "...
      2 : "/
      2 : "7
     38 : "A

Frequency
---------
   4836 : the
   2529 : a
   2509 : of
   1997 : to
   1843 : and
   1342 : was
   1304 : said
   1156 : he
   1093 : in
    917 : it


so it could probably do with some work on getting punctuation right.

Common Lisp

Common Lisp doesn’t have regular expressions included, but it’s usually not too much trouble to just write a quick bit of parsing code

    (defvar *freq* (make-hash-table))

    (defun word-letter-p (c)
      (and (characterp c) (alpha-char-p c)))

    (defun gobble-punctuation (stream)
      (loop for c = (peek-char t stream nil)
         until (word-letter-p c)
         while c do (read-char stream)))

    (defun getword (stream)
      (gobble-punctuation stream)
      (loop for c = (read-char-no-hang stream nil nil)
         while (word-letter-p c)
         collect c into letters
         finally (return (format nil "~{~C~}" letters))))

    (defun wordfreq (stream)
      (clrhash *freq*)
      (loop for word of-type simple-string = (getword stream)
           while (string-not-equal word "")
           do (incf (the fixnum (gethash (intern (string-downcase word)) *freq* 0))))
      (let ((freqlist nil))
        (maphash #'(lambda (k v) (push (cons v k) freqlist)) *freq*)
        (loop for (k . v) in (sort freqlist  #'> :key #'car)
           repeat 15 ;; just dump out the top 15 words
           do (format t "~4A, ~A~%" k v))))

    (defun wordfreq-string (str)
      (with-input-from-string (s str)
        (wordfreq s)))

    (defun wordfreq-file (filename)
      (with-open-file (s filename :direction :input)
        (wordfreq s)))

    (wordfreq-file "/home/r/py/finished/testwords")


of course there are regexp libraries for Common Lisp like Edi Weitz’s cl-ppcre:

    (eval-when (:compile-toplevel :load-toplevel :execute)
      (require :cl-ppcre))

    (defvar *freq* (make-hash-table))

    (defun wordfreq (stream)
      (clrhash *freq*)
      (loop for line = (read-line stream nil nil)   ; NIL at end of file, so no string declaration
         while line
         do (cl-ppcre:do-matches-as-strings (word "[\\w'-]+" line)
              (incf (the fixnum (gethash (intern (string-downcase word)) *freq* 0)))))
      (let ((freqlist nil))
        (maphash #'(lambda (k v) (push (cons v k) freqlist)) *freq*)
        (loop for (k . v) of-type (fixnum . symbol) in (sort freqlist  #'> :key #'car)
           repeat 15 ;; just dump out the top 15 words
           do (format t "~4A, ~A~%" k v))))

    (defun wordfreq-string (str)
      (with-input-from-string (s str)
        (wordfreq s)))

    (defun wordfreq-file (filename)
      (with-open-file (s filename :direction :input)
        (wordfreq s)))

    (wordfreq-file "/home/r/py/finished/testwords")


None of these programs are really finished… it’d be very fiddly to get them to deal correctly with all kinds of punctuation marks etc. Also I haven’t bothered to profile any of them; computers are so quick these days that speed is almost irrelevant for this kind of stuff.

September 18, 2006

Image of a Logfile

Filed under: python, utils — rcjp @ 10:57 am

Logfile

I needed some code to create an image which roughly gave an impression of the file contents for a log file analyser I was writing. The following code crudely looks for the shape of letters and draws some dots into a .png file.

import Image, ImageDraw
import os

def drawfilethumb(filename, imagex=80, imagey=200, border=5):
    """Create a png image representing the file"""
    log = []
    log = open(filename).readlines()
    loglen = len(log)

    logimage = Image.new('RGB', (imagex+10,imagey+10), (255,255,255))
    draw = ImageDraw.Draw(logimage)

    maxlen = max(len(x) for x in log)
    xscale = maxlen/float(imagex)
    yscale = loglen/float(imagey)

    y = 0
    while y < imagey and y*yscale < loglen:
        line = log[int(y*yscale)]
        linelen = len(line)
        x = 0
        while x < imagex and x*xscale < linelen:
            ch = line[int(x*xscale)]
            if ch.isupper() or ch.isdigit():
                draw.point([(border+x, border+y), (border+x, border+y-1)], fill=0)
            elif ch in ('t', 'd', 'f', 'h', 'k', 'l', 'b'):
                draw.point([(border+x, border+y), (border+x, border+y-1)], fill=128)
            elif ch in ('q', 'y', 'p', 'g', 'j'):
                draw.point([(border+x, border+y), (border+x, border+y+1)], fill=128)
            else:
                draw.point((border+x, border+y), fill=128)
            x+=1
        y += 4

    del draw
    f, ext = os.path.splitext(filename)
    logimage.save(f+'.png', "PNG")

September 8, 2006

Drawing on a bmp

Filed under: python — rcjp @ 4:10 pm

I needed a bit of code to draw on top of some bitmaps of buttons…

import Image, ImageDraw

im = Image.open("button.bmp")

# draw a cross on top

draw = ImageDraw.Draw(im)
draw.line((0, 0) + im.size, fill=128)
draw.line((0, im.size[1], im.size[0], 0), fill=128)
del draw

im.save("button-cross.png", "PNG")
