Category Archives: Programming

Big Numbers in Perl

A simple numerical overflow had been causing an error in my code for ages, and I just found it.  The problem arose from using numbers greater than 2^32 in Perl. I’ve become so used to 64-bit systems that I’d forgotten to check for it.

I use powers of 2 to encode values such as file formats into a single numerical field in MySQL (which briefly belonged to Sun and now belongs to Oracle).  So I have code that looks like:
our %cat_formats = (
  2**1  => ['DICOM'],
  2**2  => ['NEMA'],
  2**3  => ['Analyze'],
and so on.  Adding the numbers together gives me a unique value encoding any combination of formats, which I can store in a BIGINT field in MySQL.  This was fine while I was listing 32 file formats or fewer, but I now list 36.  The test I had been using for matching a format-encoding value against the stored MySQL value went something like this, testing for a value of 2:
  $ret = (($readfmt * 1) & 2) ? 1 : 0;
I multiply the value by 1 to be sure it’s in a numerical context, and I pedantically set the return value to 1 or 0 for consistency with cases where I might want to set it to something else. 
Problem was, this test was returning true whenever the stored value was greater than 2^32, and I use the value 2^34 to denote NIFTI format.  So whenever I was testing for a value of 2 (denoting DICOM, common), I was returning true for any program that could read NIFTI format (somewhat rarer).  Which led to the FSL program being listed as the second-highest-ranked DICOM viewing program.  Now FSL is a fine program, but sadly it cannot read DICOM.
All I had to do to fix it was enable big numbers, and all was well.
use bignum;

Sed Cleverness

I got sed to do something clever today, though sadly the cleverness was not mine.  I tried to solve the problem myself and although in the process I learned a great deal about sed, I had to resort to copying the answer.

I want to add Google Analytics code to my sister's website, which she's writing using iWeb.  Analytics is enabled by including in your HTML a JavaScript snippet that Google gives you.  iWeb does a really nice job, but you have to do things their way, and that means no JavaScript.  Fair enough, I guess: Apple wants to ensure that websites produced using its software will always work, and introducing a programming language pretty much guarantees that things frequently won't work.


The website is hosted on a Linux server, not an Apple account, so to publish the site we export its contents to a directory, then FTP the directory contents to the server.  Initially we used Filezilla but it does not have incremental directory synchronization, so we switched to the awesome Cyberduck.
So my first thought was: well, perhaps we can modify the HTML files before they leave her Mac.  There are a couple of approaches available: one is an iWeb add-on, which looked more complex than needed, and you have to buy it.  Another was a downloadable Automator action that will insert the Google JavaScript into the HTML files.  That sounded good, but it was one more action to perform, and I've never used Automator.
So I thought, I'll knock up a little script that the web server can run as an hourly cron job, and edit any HTML files that don't contain the Google code.  Ha!  Little script though it is, it took a while.  I got a lot of help from Bruce Barnett's sed guide, as I don't have a good shell book with me right now.  I should buy O'Reilly's 'Sed & Awk', a classic if ever there was one, and which I believe may even have been the first book they ever published.  I remember it in print in the early 90s, and they even had a T-shirt of the cover, which I dearly wish I'd bought.

The tricky part was, the Google code is supposed to be included immediately before the </body> tag in each HTML page.  Two problems: I couldn’t be sure that the </body> tag would be on a line by itself, and sed file inclusion acts after the matched pattern, not before.
Problem 1 was addressed using a simple substitution with newlines:

sed -e 's|</body>|\
&\
|'
I used pipe-character delimiters.  The substitution is of the string </body>, and the newlines are inserted literally.  So the line-continuation backslashes continue the substitution pattern.  The ampersand is the matched string, so this substitution puts a newline before and after the </body> tag, to ensure it’s on its own line.
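A quick way to see the splitting step in action (test input mine):

```shell
# split </body> onto its own line so later line-based rules can target it
printf '<p>hi</p></body></html>\n' |
sed -e 's|</body>|\
&\
|'
# <p>hi</p>
# </body>
# </html>
```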
Problem 2 was harder.  I didn't know about the file-insertion command until today, though I figured that sed would have one.  It does, but it inserts after the matched pattern.  My initial approach to inserting the file of Google Analytics JavaScript code, ga.js, was:

sed -e '/<\/body>/ {
r ga.js
}'
But this inserted the file after the </body> tag, which wasn’t allowed.  
Next thought was to take a two-way approach.  I’d print every line not matching the </body> pattern, and in a separate rule matching the </body> pattern, delete the pattern, insert the file, then print the pattern.

sed -n -e '/<\/body>/ !{
p
}' -e '/<\/body>/ {
d
r ga.js
p
}'
Not to be.  Processing of the matching line quits at the delete, as explained by Barnett, so the print command is never executed.
At this point after some hours of learning sed, I looked for an answer to inserting a file before a sed pattern, and found one.  At least by this point I knew enough to understand it (sort of).  This post by Tapani Tarvainen gave me a very succinct answer for the second pattern action:

sed -e '/<\/body>/ {
r ga.js
' -e 'N' -e '}'

OK, I wouldn't have thought of that one.  As he explains,

The 'r' command actually outputs the file just before reading a new line to the pattern buffer (or at EOF).  That can be forced in mid-script by 'n' or 'N', though 'n' will also print the pattern buffer before 'r' does its thing.

He goes on to cover more general cases.  There is also another approach described in the same forum, which uses the hold command, h, and the exchange command, x.
Having tested the code that does the file insertion, I put it into a shell script that finds all *.html files, checks to see if they have the GA code in them already (you’re only allowed to put it in once), and if not, performs the insertion.
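My script is not reproduced here, but a sketch of its shape, under my own assumptions (GNU sed; ga.js sits in the working directory and contains the string 'google-analytics', which doubles as the already-done marker), would be:

```shell
# insert_ga: put the contents of ga.js just before </body> in every
# *.html file under the current directory that lacks the GA marker
insert_ga() {
  for f in $(find . -name '*.html'); do
    grep -q 'google-analytics' "$f" && continue  # only allowed once per page
    sed -e 's|</body>|\
&\
|' "$f" |
      sed -e '/<\/body>/ {' -e 'r ga.js' -e 'N' -e '}' > "$f.new" &&
      mv "$f.new" "$f"
  done
}
```

The first sed isolates </body> on its own line; the second queues ga.js with 'r' and uses 'N' to flush it out before the tag itself is printed.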
Next problem…the ISP’s server doesn’t offer cron!  I need this to run the sed script on the HTML files without having to ssh in to run it.  Aargh.  Neither can I execute an arbitrary command on the server using my FTP client.  I might try making a passwordless ssh script (using keys) and then see if I can get Cyberduck (FTP client) to run that after it performs the transfer.
That’s enough for one day, though.  I learned a lot about sed, and a lot about how much I don’t know.

Perl Hashes Ate My Workstation

Perl is not noted for its leanness, but today I finally ran some little tests to see just how much memory it was devouring.  I use some OO Perl code to process image files: there is a base class Image::Med, from which are derived Image::Med::DICOM, Image::Med::Analyze, and a few others.  I store each DICOM element in an object instantiated as a hash; it's of class Image::Med::DICOM::DICOM_element, which is derived from a base class Image::Med::Med_element.  The inheritance works quite well and I'm able to move most of the functionality into the base classes, so adding new subclasses for different file formats is reasonably easy.

Perl hashes are seductive: it's so easy to add elements, and things tend to just work.  So my derived DICOM element class ends up having 13 elements in its hash, of which 10 are in the base class ('name', 'parent', 'length', 'offset' and so on) and three are added in the derived class ('code', 'group', 'element') as being DICOM-specific.
As mentioned, I never claim Perl is svelte (or fast), but today I was sorting about 2,000 DICOM files.  I like to keep them all in memory, for convenience and sorting, before writing or moving to disk.  Heck, we're only talking about a few thousand things here, and computers work in the billions…all too easy to forget about memory usage.
I was unpleasantly surprised to find that each time I read in a DICOM file of just over 32 kB (PET scans are small, 128 x 128 x 2 bytes), I was consuming over 300 kB of memory.  So my full dataset of only 70 MB was using up almost a GB of RAM.  And that was for only 2,100 files, whereas I have one scanner that generates over 6,500 DICOM files per study.  I have the RAM to handle it, but my inner CS grad has a problem with a tenfold usage of memory.
I used the Perl module Devel::Size to measure the size of hashes and the answers aren’t pretty: on my 64-bit Linux workstation each hash element is consuming 64 bytes in overhead.  Crikey!  So 64 bytes, times 13 fields per DICOM element, times 200-odd DICOM elements per object, that’s over 200 kB per DICOM object before I even put any data into it.
On my 64-bit Mac with perl 5.8.8 it’s not much better at 39 bytes per minimal element.  I compared it with an array, which turned out to use 16 bytes per minimal element.
#! /usr/local/bin/perl -w
use Devel::Size 'total_size';
my %h = ();
print "0 hash elements, size = " . total_size(\%h) . "\n";
$h{'a'} = 1;
print "1 hash elements, size = " . total_size(\%h) . "\n";
$h{'b'} = 2;
print "2 hash elements, size = " . total_size(\%h) . "\n";
my @a = ();
print "0 array elements, size = " . total_size(\@a) . "\n";
$a[0] = 1;
print "1 array elements, size = " . total_size(\@a) . "\n";
$a[1] = 2;
print "2 array elements, size = " . total_size(\@a) . "\n";

The output:
0 hash elements, size = 92
1 hash elements, size = 131
2 hash elements, size = 170
0 array elements, size = 56
1 array elements, size = 88
2 array elements, size = 104

I know the answer is: don't use giant hashes in Perl.  Or perhaps it is: don't use Perl when you're manipulating 2,000 x 200 x 13 elements.  But I like Perl, it's so convenient.  Perhaps I'll reimplement the whole thing as an array (ugh), and/or cut down the number of fields per DICOM element (and indexing a 13-element array by bare numbers is not fun).

Javascript and Apache

I’m testing out Walter Zorn’s very cool Javascript Tooltips for possible inclusion.  I want to use tooltips on the dropdown elements of the ‘Search’ page, since I am finally rationalizing the search categories and I’d like to be able to add some descriptive text to each element.

I run everything from Perl, so I made a little test program and included a call to his JavaScript library:
print "<script type='text/javascript' src='wz_tooltip.js'></script>\n";
No go, and I get error messages in the local httpd log files like this:
Permission denied: exec of '/Users/ahc/public_html/cgi-bin/wz_tooltip.js' failed
Hmmm, I think, that's funny: I didn't know JavaScript files needed to be executable.  But it's late and I'm not thinking too clearly, so I make the js file mode 755 (executable).  Now I get a different error message:
Exec format error: exec of '/Users/ahc/public_html/cgi-bin/wz_tooltip.js' failed
OK someone’s trying to tell me something about how I shouldn’t be executing that file.  I resort to reading the manual.  Apache httpd Dynamic Content FAQ number 1, sentence number 1 begins:

“Apache recognizes all files in a directory named as a ScriptAlias as being eligible for execution rather than processing as normal documents.  This applies regardless of the file name…”

A dim light comes on.  httpd has been trying to execute the js file, because I told it to, with the ScriptAlias directive in my httpd.conf file, which says that everything in cgi-bin is a script:

   ScriptAlias /cgi-bin/ "/Users/ahc/public_html/cgi-bin/"

Dummy.  I moved the js file out of my cgi-bin directory and into the httpd document root, and changed the line to point at the file in the root:
print "<script type='text/javascript' src='/wz_tooltip.js'></script>\n";
Probably a well-known trap but at least I found it eventually.
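The lesson being that ScriptAlias marks a directory's entire contents as programs.  Had I wanted to keep the file under public_html without moving it to the root, a plain Alias directive (directory name mine) would have served it as an ordinary document:

```apacheconf
Alias /js/ "/Users/ahc/public_html/js/"
```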

Perl Varargs

I've often wanted to have more than one optional argument to a Perl subroutine.  For instance, I have a utility function printHash() to which I pass a reference to a hash (associative array).  It prints the hash contents in a formatted box, with an optional description at the top.

my %hash = ('one' => 1,
            'two' => 2,
            'three' => 3);
printHash(\%hash, "My Comment");

sub printHash {
  my ($hashptr, $comment) = @_;
  # … Test for existence of comment, and use it if necessary.
  # … Then print the hash in a tidy format.
  # … Default is to print the keys in alphabetical order.
}
The output:
  | My Comment |
  | one   : 1  |
  | three : 3  |
  | two   : 2  |
Which is all very nice.  Frequently, though, the hash has a lot of elements – for example, all the header elements of an image file.  I don't always want all 300 or so lines of a DICOM header.  Also, I'd like to be able to control the order of the elements.  So I changed the function: I can now pass it an array of the keys I want printed, in the order I want them.  I use this array (if it is passed) as the hash keys, instead of a sorted list of all hash keys.
However, I already have one optional argument to the function (the comment).  Just as in C, you can't put anything after an optional argument, because of course you don't know whether it will be there or not.
For years I was too lazy to do anything about this, and I had either an optional comment or an optional key-ordering array, but not both.  Recently I did the obvious thing and changed to passing a hash of variable arguments.  This way I can have as many varargs as I like.
my @keys = (qw(three one));
my %opts = (
  'keyptr'  => \@keys,
  'comment' => 'My Other Hash Comment',
);
printHash(\%hash, \%opts);

  | My Other Hash Comment |
  | three : 3             |
  | one   : 1             |
This does what I want.  But now what to do with the dozens of old-style printHash(\%hash, "comment") calls I've written into my code?  Well, I could be thorough and edit every function call so that I pass a hash every time.  But that is Actual Work, and anyway it's tedious to create a single-element hash whenever I just want to dump a full hash with a comment.  (Usually this is an image file header, with the file name as the comment.)  So I changed printHash() to overload the second argument: if it's a hash reference, use it as the options hash; if it's a scalar, use it as the comment.  This is questionable programming practice, but oh well.
my %opts = (
  'keyptr'  => \@keys,
  'comment' => 'My Other Hash Comment',
);
printHash(\%hash, \%opts);

sub printHash {
  my ($hashptr, $opts) = @_;

  my ($comment, $keyptr) = ('', '');
  if (ref($opts) eq 'HASH') {
    ($comment, $keyptr) = @{$opts}{qw(comment keyptr)};
  } else {
    $comment = $opts;
  }
  my @keys = (ref($keyptr)) ? @$keyptr : sort keys %$hashptr;

  # … Now proceed to use $comment and @keys.
  # … $comment will be my comment, or blank; @keys will be my keys, or all keys.
}
I was gratified to see that Damian Conway, in 'Perl Best Practices', espouses the use of a hash of named arguments for any subroutine that has more than three parameters.  I use a similar idea, but for optional arguments.