De-identifying medical image files is usually harder than it first seems. There is patient data to think about: that seems obvious. But if you remove all identifying data (that’s true anonymisation), you have lost your primary means of finding a study. So perhaps what you want is pseudonymisation: replacing the primary patient identifier with a pseudonym, or code. But patient information can be scattered throughout a DICOM file, and some manufacturers even store it in private fields in the header. OK, you say, remove the private fields. But that can break some proprietary software that relies on those fields…sometimes, but not always.
Then there are less-obvious fields. The names of staff, the institution, the scanner serial number, and many more fields, all can be used to identify the scan, and through that, the patient. Dates and times are a particularly good way to identify a scan, but may be essential information for calculating drug or radioisotope decay, or the interval between scans. So these may need to be changed, but the relative interval between them retained.
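To make the date problem concrete, here is a minimal sketch of one common approach: shift every date for a patient by the same fixed offset, so that absolute dates change but the interval between scans survives. The function names, the example dates, and the offset of -1000 days are all made up for illustration; the only thing taken from DICOM is the YYYYMMDD layout of date values.

```python
from datetime import datetime, timedelta

DICOM_DATE_FORMAT = "%Y%m%d"  # DICOM date values look like "20140312"


def shift_date(value: str, offset_days: int) -> str:
    """Shift one DICOM-style date string by a fixed number of days."""
    shifted = datetime.strptime(value, DICOM_DATE_FORMAT) + timedelta(days=offset_days)
    return shifted.strftime(DICOM_DATE_FORMAT)


def interval_days(first: str, second: str) -> int:
    """Number of days from the first date to the second."""
    a = datetime.strptime(first, DICOM_DATE_FORMAT)
    b = datetime.strptime(second, DICOM_DATE_FORMAT)
    return (b - a).days


# Hypothetical baseline and follow-up scan dates for one patient.
baseline, follow_up = "20140312", "20140409"

# Assumed per-patient offset; in practice it would be kept secret.
offset = -1000

shifted_baseline = shift_date(baseline, offset)
shifted_follow_up = shift_date(follow_up, offset)

print(interval_days(baseline, follow_up))                  # 28
print(interval_days(shifted_baseline, shifted_follow_up))  # 28
```

The same offset has to be applied to every date belonging to that patient, otherwise the intervals no longer match. Times of day need similar care if they feed a decay calculation.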
This is a complex issue for which there is no single answer. There are several publications available that discuss the matter in detail (this summary from David Clunie is a good introduction). The amount of editing required is a function of the amount of information initially present, the intended (or unintended) recipient of the data, and the desired level of de-identification. What is needed is a set of tools that are powerful enough to perform the task, yet flexible enough to be configured for the particular application.
Here at I Do Imaging World HQ, we are publishing sample data for public use, so it is heavily anonymised. We use a combination of tools when de-identifying DICOM data:
- A tool to examine the headers, so we know what we are dealing with.
- A dedicated de-identification program for the bulk of the work.
- Something to compare the before and after images, so we know what’s changed.
- An element-by-element editor to modify individual fields, or remove those missed by the anonymiser.
Here is one example of the software available for each of these steps. All three were chosen for their quality and because they are Java-based, so they will run on any desktop OS. We’ll look at DicomBrowser for steps 1 and 4 (viewing the headers and editing individual elements), DicomCleaner for the heavy work of de-identification, and LONI Inspector for the side-by-side comparison.
Viewing and editing: DicomBrowser
DicomBrowser (home page) comes from the Neuroinformatics Research Group at Washington University, the home of XNAT, the powerhouse open-source platform for neuroimaging research and processing. This is a program that does more than it promises: it can edit as well as view the headers of entire DICOM studies, it can send to a PACS node, and it has an advanced scripting language (shared with other software from the same group) for power users. It’s distributed with native installers for the three desktop platforms, and has comprehensive documentation available, including an academic publication that covers its abilities in detail.
The easy way to run DicomBrowser is through its GUI (though it can also be run from the command line, allowing the input files to be specified in advance rather than hunting through the GUI). Point it at a directory containing any number of studies and DICOM files, and it will parse the contents and present them collated in a hierarchical view of patient, study, series, and instance. Header elements that have a single value at a particular level are presented as such: the patient details, for example, are constant throughout all instances and can be edited at the highest (patient) level by clicking on the value field. Several actions are available for the element value: Clear (keeping the element but emptying its value), Delete (removing the element), or Assign (storing a new value). Entirely new elements can also be created.
Where an element has multiple values at a given level (say a multi-series scan viewed at the study level), the number of discrete values is displayed, and clicking on the values field presents a dialog box with a drop-down selector for the value to edit. Viewing the element at a lower level will reduce the field to a single value.
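The difference between Clear, Delete and Assign is easy to blur, so here is a small sketch of what each does to a header. This is not DicomBrowser code: the header is modelled as a plain Python dict keyed by tag, and the function names and sample values are hypothetical.

```python
def clear(header: dict, tag: str) -> None:
    """Clear: keep the element in the header, but empty its value."""
    if tag in header:
        header[tag] = ""


def delete(header: dict, tag: str) -> None:
    """Delete: remove the element from the header altogether."""
    header.pop(tag, None)


def assign(header: dict, tag: str, value: str) -> None:
    """Assign: store a new value, creating the element if it is absent."""
    header[tag] = value


# Made-up header values, for illustration only.
header = {
    "(0010,0010)": "DOE^JANE",                  # Patient's Name
    "(0010,0020)": "MRN-0012345",               # Patient ID
    "(0008,0080)": "Example General Hospital",  # Institution Name
}

clear(header, "(0010,0010)")                # element stays, value is now empty
delete(header, "(0008,0080)")               # element is gone
assign(header, "(0010,0020)", "SUBJ-001")   # pseudonym replaces the real ID
assign(header, "(0012,0062)", "YES")        # new element: Patient Identity Removed

print(header)
```

Clear matters when downstream software expects an element to exist, even if it is empty; Delete is the safer choice when nothing depends on it.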
DicomBrowser can be scripted, for power users. Anonymisation scripts are written in DicomEdit, a custom language shared with XNAT, that has many of the features needed by programmers, including variables, constraints, conditional operators, and string operations. The language also includes the ability to call a web service URL and include the returned value in a constructed element value. Specific to DICOM editing is a UID generator to create DICOM UIDs that are unrelated to the original values, yet consistent across a DICOM series. Taking it a level higher, batch processing is provided through a companion program, DicomRemap, which uses XML files to provide per-element custom editing. This is a nontrivial task, but such is the nature of DICOM editing, and if you need it, the functionality is there, though with one caveat: DicomBrowser does not handle DICOM sequences, an advanced feature of the DICOM standard.
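The idea of UIDs that are "unrelated to the original, yet consistent" can be sketched in a few lines. This is not how DicomEdit implements its generator, which we have not inspected. It is one possible approach, assuming a salted hash of the original UID placed under the "2.25" root that DICOM allows for UUID-derived UIDs. The function name, salt, and sample UIDs are hypothetical.

```python
import hashlib


def remap_uid(original_uid: str, salt: str = "hypothetical-project-salt") -> str:
    """Derive a replacement UID deterministically from the original.

    The same input always gives the same output, so every file in a
    series ends up with the same new Series Instance UID.
    """
    digest = hashlib.sha256((salt + original_uid).encode("ascii")).digest()
    # Use 128 bits of the hash as a decimal number under the 2.25 root.
    # That is at most 39 digits, keeping the UID within DICOM's 64 characters.
    return "2.25." + str(int.from_bytes(digest[:16], "big"))


# Two instances from the same (made-up) series, plus one from another series.
series_a = "1.2.840.99999.1.1"
series_b = "1.2.840.99999.1.2"

new_a_first = remap_uid(series_a)
new_a_second = remap_uid(series_a)
new_b = remap_uid(series_b)

print(new_a_first == new_a_second)  # True: consistent within the series
print(new_a_first == new_b)         # False: different series stay distinct
```

The salt is there so that someone who knows an original UID cannot simply recompute the mapping; it would need to be kept private to the project.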
Bulk anonymisation: DicomCleaner
DicomCleaner (home page) comes from PixelMed’s David Clunie, a major figure in the DICOM world and publisher of the Medical Image Format FAQ. DicomCleaner is a power tool for anonymising or ‘cleaning’ large quantities of DICOM data, and as expected coming from the DICOM authority, is rigorous in its adherence to the standard.
DicomCleaner’s approach to the complex problem of editing hundreds of header elements is to define blocks of elements to be removed or edited, providing multiple levels of anonymisation that reach deeply into the DICOM structure while keeping the user interactions to a manageable level. The program is particularly strong at maintaining the relationships between essential structures such as UIDs, and the temporal relationships between dates and times are preserved. DicomCleaner can also handle DICOM sequences, and can add the standard-specified information block describing the alterations performed on the files during processing.
The depth of editing is apparent upon comparison of the original and cleaned files: at the highest level the great majority of the header elements are removed or modified. Lighter editing can retain some fields, such as series descriptions. For annotations contained within the pixel data, the program provides an editor to specify regions of the image to be blacked out. Image data can come from and be saved to local files, or a network PACS connection. This is a program that does a lot of heavy work with minimal user input, so a thorough reading of the documentation helps to understand the work being done.
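Blacking out a region of pixel data amounts to overwriting a rectangle of pixel values. The sketch below shows the idea on a toy image held as a list of rows; it is not DicomCleaner's implementation, and the function name and image are hypothetical. It also assumes that a value of 0 displays as black, which holds for MONOCHROME2 images but is inverted for MONOCHROME1.

```python
def black_out(pixels, top, left, height, width):
    """Return a copy of the image with one rectangle set to 0.

    The rectangle is clipped to the image bounds, and the original
    image is left untouched.
    """
    cleaned = [row[:] for row in pixels]
    for r in range(top, min(top + height, len(cleaned))):
        for c in range(left, min(left + width, len(cleaned[r]))):
            cleaned[r][c] = 0
    return cleaned


# Toy 4 x 6 image, every pixel at 255.
image = [[255] * 6 for _ in range(4)]

# Suppose a burned-in name occupies the first three pixels of the top row.
cleaned = black_out(image, top=0, left=0, height=1, width=3)

print(cleaned[0])  # [0, 0, 0, 255, 255, 255]
print(image[0])    # [255, 255, 255, 255, 255, 255]
```

Real images add complications this ignores, such as multi-frame data, colour planes, and compressed transfer syntaxes.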
Comparing DICOM images: LONI Inspector
Verifying the anonymisation process is aided by viewing the results. Given that a single DICOM image may contain over 300 fields, checking a complete list of values can be overwhelming. Seeing which fields have been edited can help make the reviewing task a lot easier.
LONI Inspector (home page) is designed for this job. In addition to DICOM, this program can read and compare the headers of AFNI, ANALYZE, ECAT, GE, Interfile, MINC, and NIFTI files. Filters help to locate and display values of interest, and when comparing multiple files, the differences can be highlighted, shown in isolation, or excluded. The header data may be exported as CSV or XML files for inclusion in other programs.
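The comparison itself is conceptually simple, which is why it is so useful. As a rough sketch of the "show only the differences" idea (not LONI Inspector's code), assume each header has been read into a plain dict keyed by tag. The function name and sample values are hypothetical.

```python
def diff_headers(before: dict, after: dict) -> dict:
    """Report elements that were removed, added, or modified.

    Returns {tag: (status, old_value, new_value)}; elements that are
    unchanged are left out.
    """
    changes = {}
    for tag in sorted(set(before) | set(after)):
        if tag not in after:
            changes[tag] = ("removed", before[tag], None)
        elif tag not in before:
            changes[tag] = ("added", None, after[tag])
        elif before[tag] != after[tag]:
            changes[tag] = ("modified", before[tag], after[tag])
    return changes


# Made-up headers before and after cleaning.
before = {
    "(0008,0080)": "Example General Hospital",  # Institution Name
    "(0008,103E)": "T1 AXIAL",                  # Series Description
    "(0010,0010)": "DOE^JANE",                  # Patient's Name
}
after = {
    "(0008,103E)": "T1 AXIAL",
    "(0010,0010)": "ANON",
    "(0012,0062)": "YES",                       # Patient Identity Removed
}

changes = diff_headers(before, after)
for tag, (status, old, new) in changes.items():
    print(tag, status, old, new)
```

The elements to worry about are the ones that do not show up in the differences at all: anything identifying that is still unchanged after cleaning is something the anonymiser missed.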
Editing and anonymising image files, particularly DICOM, is a big job. If you’re performing the same editing, especially on familiar data, a single program may be sufficient. But for complex or new data, or for developing a process, a balance between usability and thoroughness means that more than one program might be needed. To prepare the heavily-anonymised sample data for the I Do Imaging PACS, the three programs above were used in sequence:
- DicomCleaner did the bulk of the anonymisation, but care must be taken not to remove required values. In our case, the series descriptions were retained, but most other editing blocks were applied.
- LONI Inspector was then run on the before-and-after images, to highlight fields with identifying information that had been missed or still needed editing.
- DicomBrowser was used in both its viewing and editing modes. The final editing was performed here, with several comparison iterations using Inspector. If a large amount of similar data were being processed, we could have developed a custom script to perform these operations, but with only a few data sets required, the GUI approach was sufficient.
These three programs are all written in Java. This means they will run on any platform, but there is the additional step of installing Java itself, and keeping up with its numerous (and intrusive, and sometimes problem-causing) updates. You’ll also be reminded when you start the applications that Java is slow, at least when the program is loading. These minor issues are the current price of cross-platform compatibility.
Anonymisation, done right, is not an easy task, and there’s a limit to how simple it can be made. But with these three professional tools, it’s possible for anyone to perform customised standards-compliant editing with the minimum effort. The hard work has already been done for you.