Rcjp's Weblog

September 2, 2006

Characteristic Language Structures – Dictionaries

Filed under: lisp, python — rcjp @ 2:45 pm

Its funny how some languages get associated with a single storage container; even when they support many others. Its often the first thing that pops into your head when you hear the language mentioned and influences the resulting code you write. For lisp thats undoubtedly a list, for C++ its a class, python I reckon its a dictionary and for C, well I suppose pointers are the first thing most people think of, that and having to write almost all storage structures from scratch!

Almost all python programs seem to end up using a dictionary for something, like this one I used in some code to markup html tags

    tags =  {
        'date' : '<div class="date">%s</div>',
        'list' : '<ul>%s</ul>'
        }

accessing, iteration and checking for values is easy:

    In [8]: tags['date']
    Out[8]: '<div class="date">%s</div>'

    In [10]: tags.has_key('date')
    Out[10]: True

    In [24]: tags.keys()
    Out[24]: ['date', 'list']

as is adding to the dictionary, and removing:

    In [17]: tags['yyy']='blah'
    In [19]: tags.pop('yyy')
    Out[19]: 'blah'

normally a KeyError exception occurs if the key doesn’t exist but we can fail quietly if there isn’t a value, or return a default value:

    In [15]: tags.get('xxx')

    In [16]: tags.get('xxx',0)
    Out[16]: 0


But it python 2.5 it is apparantly faster to do

    from collections import defaultdict
    tags=defaultdict(int)
    tags['xxx'] += 1


this will call int() to supply a value if it is missing and that defaults to 0. If we wanted a different default then

    tags = defaultdict(lamda: 1)

gives 1 by supplying a ‘factory’ function. I’ll finish this quick dictionary overview with an example for swapping out those pesky characters like spaces when generating url’s

    In [48]: [{'@':'%40', ' ':'%50'}.get(c,c) for c in "name=ab@c"]
    Out[48]: ['n', 'a', 'm', 'e', '=', 'a', 'b', '%40', 'c']

    In [49]: ''.join({'@':'%40', ' ':'%50'}.get(c,c) for c in "name=ab@c")
    Out[49]: 'name=ab%40c'

the get(c,c) trick means that characters default to themselves if they are not in the dictionary we are using for swapping. Incidentally, as usual, python already has a built-in mapping of all those codes…

    In [42]: import urllib
    In [43]: urllib.urlencode({'name':'ab@c'})
    Out[43]: 'name=ab%40c'

Common Lisp

There are several choices for pairing data in Common Lisp. There are property lists – using get, getf, get-properties; but they are meant for symbol properties I think, not for general data. Then there are association lists (a-lists):

    CL-USER> (setq *tags* '((date . "<datey>") ("yyy" . "blah")))
    ((DATE . "<datey>") ("yyy" . "blah"))
    CL-USER> (assoc 'date *tags*)
    (DATE . "<datey>")
    CL-USER> (assoc "yyy" *tags*)
    NIL

but note that the ‘whole’ thing is returned and since the default test is eq, it won’t work for string comparison, you have to do

    CL-USER> (assoc "yyy" *tags* :test #'equalp)
    ("yyy" . "blah")

you can alter what the test function will operate on with key. Here we use rassoc to search on the values rather than the keys to find an entry with ‘a’ in position 2 (zero base indexed strings)

    CL-USER> (rassoc #\a *tags* :key #'(lambda(x) (char x 2)))
    (DATE . "<datey>")

you can replace existing values (but they must exist, else you’ll be trying to set a nil)

    CL-USER> (rplacd (assoc 'date *tags*) "yyy")
    (DATE . "yyy")

you can add things on, but not destructively with

    CL-USER> (acons "aaa" 5 *tags*)
    (("aaa" . 5) (DATE . "yyy") ("yyy" . "blah"))
    CL-USER> *tags*
    ((DATE . "yyy") ("yyy" . "blah"))

or destructively with

    CL-USER> (push '("aaa" . 5) *tags*)
    (("aaa" . 5) (DATE . "yyy") ("yyy" . "blah"))
    CL-USER> *tags*
    (("aaa" . 5) (DATE . "yyy") ("yyy" . "blah"))
    CL-USER> (pop *tags*)
    ("aaa" . 5)
    CL-USER> *tags*
    ((DATE . "yyy") ("yyy" . "blah"))

since its just a cons pair you’d probably loop on them by doing

    CL-USER> (loop for item in *tags* do (print (cdr item)))
    5
    "yyy"
    "blah"

You also have the choice of using hash tables: gethash can easily specify default values

    CL-USER> (defvar *tags-hash* (make-hash-table))
    *TAGS-HASH*
    CL-USER> *tags-hash*
    #<HASH-TABLE :TEST EQL :COUNT 0 {AC96F29}>
    CL-USER> (gethash "aaa" *tags-hash*)
    NIL
    NIL
    CL-USER> (gethash "aaa" *tags-hash* 0)
    0
    NIL

and to add a value to the hashtable we setf a query to the value we want

    CL-USER> (setq *tags-hash* (make-hash-table :test #'equalp))
    #<HASH-TABLE :TEST EQUALP :COUNT 0 {AC0DE51}>
    CL-USER> (setf (gethash "aaa" *tags-hash*) 111
                   (gethash "bbb" *tags-hash*) 222
                   (gethash "ccc" *tags-hash*) 333)

    333
    CL-USER> (gethash "aaa" *tags-hash*)
    111
    T
    CL-USER> (remhash "bbb" *tags-hash*)
    T
    CL-USER> *tags-hash*
    #<HASH-TABLE :TEST EQUALP :COUNT 2 {AC0DE51}>

and you can dump out the values of hashtables with

    CL-USER> (maphash #'(lambda (k v) (format t "~A ~A~%" k v)) *tags-hash*)
    aaa 111
    ccc 333
    NIL
    CL-USER> (loop for i being the hash-keys of *tags-hash* do (print i))

    "aaa"
    "ccc"
    NIL
    CL-USER> (loop for i being each hash-value of *tags-hash* do (print i))

    111
    333
    NIL
    CL-USER> (loop for i being the hash-keys of *tags-hash*
                using (hash-value myvalue)
                do (format t "~A->~A~%" i myvalue))
    aaa->111
    ccc->333
    NIL

Note: we have to set the test type when we create the hashtable rather than in the query like with a-lists. CL is really set up for using symbols wherever possible, so consider using intern to create a symbol from the string and store that instead. For small stuff use alists and bigger stuff use hashtables. Finally, the url character substitution from above could be something like

CL-USER> (loop for c across "name=ab@c" collect (or (cdr (assoc c '((#\@ . 40)))) c))
(#\n #\a #\m #\e #\= #\a #\b 40 #\c)
CL-USER> (format nil "~{~A~}" *)
"name=ab40c"

Advertisements

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: