Perl Hashes Ate My Workstation

Perl is not noted for its leanness, but today I finally ran some little tests to see just how much memory it was devouring. I use some OO Perl code to process image files: there is a base class Image::Med, from which Image::Med::DICOM, Image::Med::Analyze, and a few others are derived. I store each DICOM element in an object implemented as a hash; it’s of class Image::Med::DICOM::DICOM_element, which is derived from the base class Image::Med::Med_element. The inheritance works quite well and I’m able to move most of the functionality into the base classes, so adding new subclasses for different file formats is reasonably easy.

Perl hashes are seductive: it’s so easy to add elements, and things tend to just work. So my derived DICOM element class ends up with 13 elements in its hash. Ten come from the base class (‘name’, ‘parent’, ‘length’, ‘offset’ and so on) and three are added in the derived class (‘code’, ‘group’, ‘element’) because they are DICOM-specific.

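
To make that layout concrete, here is a minimal sketch of a hash-based base class and subclass along the lines described. It is not the original Image::Med code. The constructor signature and the name accessor are assumptions. Only the four base-class fields named above are shown, and the other six are left out, so each object here carries 7 keys rather than 13.

```perl
use strict;
use warnings;

# Hypothetical sketch of the class layout described above, not the
# original Image::Med code. Only the base fields named in the text are
# included; the remaining six base-class fields are omitted.
package Image::Med::Med_element;

sub new {
    my ($class, %args) = @_;
    # Each field is a separate hash entry, and each entry has its own
    # per-key memory overhead.
    my $self = {
        name   => $args{name},
        parent => $args{parent},
        length => $args{length},
        offset => $args{offset},
    };
    return bless $self, $class;
}

sub name { return $_[0]{name} }

package Image::Med::DICOM::DICOM_element;
our @ISA = ('Image::Med::Med_element');

sub new {
    my ($class, %args) = @_;
    my $self = $class->SUPER::new(%args);
    # The three DICOM-specific fields are added on top of the base ones.
    @{$self}{qw(code group element)} = @args{qw(code group element)};
    return $self;
}

package main;

my $elem = Image::Med::DICOM::DICOM_element->new(
    name    => 'PatientName',
    group   => 0x0010,
    element => 0x0010,
    length  => 8,
    offset  => 132,
);
printf "%s has %d hash keys\n", $elem->name, scalar keys %$elem;
```

Fields that are never passed still get a key with an undef value, so every object pays for all of its keys even when they are empty.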
As mentioned, I’d never claim Perl is svelte (or fast), but today I was sorting about 2,000 DICOM files. I like to keep them all in memory, for convenience and for sorting, before writing them out or moving them on disk. Heck, we’re only talking about a few thousand things here, and computers work in the billions…it’s all too easy to forget about memory usage.

I was unpleasantly surprised to find that each time I read in a DICOM file of just over 32 kB (PET scans are small: 128 x 128 pixels x 2 bytes), I was consuming over 300 kB of memory. So my full dataset of only 70 MB was using up almost a GB of RAM. And that was for only 2,100 files, whereas I have one scanner that generates over 6,500 DICOM files per study. I have the RAM to handle it, but my inner CS grad has a problem with using ten times as much memory as the data needs.

I used the Perl module Devel::Size to measure the size of hashes, and the answers aren’t pretty: on my 64-bit Linux workstation each hash element consumes 64 bytes of overhead. Crikey! So 64 bytes, times 13 fields per DICOM element, times 200-odd DICOM elements per object, comes to roughly 170 kB or more (64 x 13 x 200 = 166,400 bytes) per DICOM object, before I even put any data into it.

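
Here is the same back-of-the-envelope estimate as a quick Perl sketch. It uses only the figures given in the text: 64 bytes per entry, 13 fields, and 2,100 files. The 250-element case is an added assumption, included to show how “200-odd” moves the total. The estimate counts per-entry overhead only. It leaves out the pixel data and the fixed cost of each hash.

```perl
use strict;
use warnings;

# Figures taken from the text; 250 elements is an illustrative assumption.
my $bytes_per_entry = 64;      # per-entry hash overhead measured on 64-bit Linux
my $fields          = 13;      # 10 base-class fields + 3 DICOM-specific fields
my $files           = 2_100;   # files in the dataset

# Per-entry overhead for one DICOM object with the given element count.
sub overhead_per_object {
    my ($elements) = @_;
    return $bytes_per_entry * $fields * $elements;
}

for my $elements (200, 250) {
    my $per_object = overhead_per_object($elements);
    # MB here means 10**6 bytes.
    printf "%d elements: %d bytes/object, %.0f MB for %d files\n",
        $elements, $per_object, $per_object * $files / 1e6, $files;
}
```

For 200 elements this prints 166400 bytes per object and 349 MB across the 2,100 files. For 250 elements it prints 208000 bytes and 437 MB. That is from hash-entry overhead alone.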
On my 64-bit Mac with perl 5.8.8 it’s not much better, at 39 bytes per minimal hash element. I compared it with an array, which grew by 32 bytes for its first element and 16 bytes for the second. Here’s the little test script:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use Devel::Size 'total_size';

# Devel::Size::total_size takes a reference, so pass \%h and \@a.
# Start with an empty hash and add keys one at a time.
my %h = ();
print "0 hash elements, size = " . total_size(\%h) . "\n";
$h{'a'} = 1;
print "1 hash elements, size = " . total_size(\%h) . "\n";
$h{'b'} = 2;
print "2 hash elements, size = " . total_size(\%h) . "\n";

# Same experiment with an array for comparison.
my @a = ();
print "0 array elements, size = " . total_size(\@a) . "\n";
$a[0] = 1;
print "1 array elements, size = " . total_size(\@a) . "\n";
$a[1] = 2;
print "2 array elements, size = " . total_size(\@a) . "\n";
```

Running it on the Mac gives:

```
0 hash elements, size = 92
1 hash elements, size = 131
2 hash elements, size = 170
0 array elements, size = 56
1 array elements, size = 88
2 array elements, size = 104
```

I know the answer is: don’t use giant hashes in Perl. Or perhaps it is: don’t use Perl when you’re manipulating 2,000 x 200 x 13 (about 5.2 million) elements. But I like Perl; it’s so convenient. Perhaps I’ll reimplement the whole thing with arrays (ugh: indexing a 13-element array is no fun), and/or cut down the number of fields per DICOM element.
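
One common Perl idiom makes array indexing less painful: give the indices names with the core constant pragma, so the code reads $self->[GROUP] instead of a bare $self->[5]. Below is a minimal sketch. The class name ArrayElement and the field order are hypothetical, and only the seven field names mentioned above are included. It has not been measured here. Going by the array figures above, though, it avoids the per-key hash overhead.

```perl
use strict;
use warnings;

# Hypothetical array-based element; ArrayElement and the field order are
# illustrative, not from the original code.
package ArrayElement;

# Named indices stand in for hash keys; each constant is a fixed slot.
use constant {
    NAME    => 0,
    PARENT  => 1,
    LENGTH  => 2,
    OFFSET  => 3,
    CODE    => 4,
    GROUP   => 5,
    ELEMENT => 6,
};

sub new {
    my ($class, %args) = @_;
    my @self;
    $self[NAME]    = $args{name};
    $self[PARENT]  = $args{parent};
    $self[LENGTH]  = $args{length};
    $self[OFFSET]  = $args{offset};
    $self[CODE]    = $args{code};
    $self[GROUP]   = $args{group};
    $self[ELEMENT] = $args{element};
    return bless \@self, $class;
}

# Accessors hide the numeric slots from calling code.
sub name    { return $_[0][NAME] }
sub group   { return $_[0][GROUP] }
sub element { return $_[0][ELEMENT] }

package main;

my $e = ArrayElement->new(name => 'PatientName', group => 0x0010, element => 0x0010);
printf "(%04x,%04x) %s, %d slots\n", $e->group, $e->element, $e->name, scalar @$e;
```

The trade-off is bookkeeping. Adding a field means adding a constant and keeping the constructor in step with it, and a subclass has to continue numbering from where the base class stops.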