
The Digital Imaging and Communications in Medicine (DICOM) standard is something I left out of a project last year, but I’ve had to revisit it for something else. DICOM can be briefly described as an application layer protocol, plus a file format, that enables medical devices (including workstations) to exchange, process and retrieve images (and sometimes documents) associated with patient records.
The basic idea is that patient information is embedded in machine-readable form within a file (usually an image) as it’s created or converted. That file can be uploaded to a repository on the network, and any machine (workstation, CAT scanner, printer, etc.) with DICOM-compliant software can process it. The information stays with the same patient/entity record throughout, so the images should never be attributed to the wrong records, since the information can’t unintentionally be separated from them. The general concept is also very similar to that of the ‘Semantic Web’, in which a variety of systems could present information from the same file/source in different ways.

Being an application layer thing, DICOM should work regardless of how the data is transported: an ‘Application Entity’ sends a file through the TCP/IP stack, and the file is received and processed by another Application Entity on the other end. A DICOM Application Entity might be a CAT scanner sending images to a central DICOM repository (a PACS server). From there, the images are accessible to other workstations in the facility.
In theory DICOM therefore could be deployed on a conventional LAN, and I’ve tried emulating this on my own network with partial success. So far, I’ve installed and played around with the following:
* Aeskulap DICOM viewer
* dcmtk
* gdcm
* python-dicom

DICOM Directories and Repositories
One of DICOM’s selling points is that the files can be stored in a directory tree indexed by a special file called ‘DICOMDIR’, which sits at the root and enables the application to find relevant images without having to read every file. Unfortunately, the way things were developed, a valid filename is limited to capital letters, numbers and underscores, with no more than eight characters per path component. Also, the dcmmkdir utility cannot work with ‘big endian’ data. I first put this down to the way bytes are arranged in certain filesystems (perhaps okay on FAT32 or FAT16, but not on EXT3/EXT4), but byte order here is a property of the file’s transfer syntax, not of the filesystem it’s stored on.
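The naming restriction is easy to check before running dcmmkdir. A minimal sketch, assuming the media storage rules as I understand them (each path component is 1 to 8 characters from A-Z, 0-9 and underscore, with at most 8 components); the function name is my own:

```python
import re

# One File ID component: 1 to 8 characters from A-Z, 0-9 and underscore.
_COMPONENT = re.compile(r"[A-Z0-9_]{1,8}")

def is_valid_file_id(path):
    """Return True if a relative path looks like a valid DICOM File ID."""
    components = path.split("/")
    if not 1 <= len(components) <= 8:
        return False
    return all(_COMPONENT.fullmatch(c) is not None for c in components)

print(is_valid_file_id("PAT001/STUDY1/IMG00001"))  # True
print(is_valid_file_id("ibm-z10.dcm"))             # False
```

The second name fails because of the lowercase letters, the hyphen and the dot, which is why files usually have to be renamed before they can go into a DICOMDIR.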
On a PACS server, it appears that DICOM files can be stored in a conventional database, and DICOM operations, such as C-STORE, C-FIND and C-GET can be translated by a database management application to SQL operations.
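To illustrate the idea of mapping DICOM operations onto SQL, here is a rough sketch using Python’s built-in sqlite3 module. The table layout, function names and sample values are all hypothetical; a real PACS will use its own schema and return far more attributes.

```python
import sqlite3

# Hypothetical index table; real PACS schemas will differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE images (patient_id TEXT, patient_name TEXT, "
    "study_date TEXT, file_path TEXT)"
)

def c_store(patient_id, patient_name, study_date, file_path):
    """Rough analogue of C-STORE: index a received file."""
    conn.execute(
        "INSERT INTO images VALUES (?, ?, ?, ?)",
        (patient_id, patient_name, study_date, file_path),
    )

def c_find(patient_id):
    """Rough analogue of C-FIND: return matching records, not the files."""
    cur = conn.execute(
        "SELECT patient_name, study_date, file_path FROM images "
        "WHERE patient_id = ? ORDER BY study_date",
        (patient_id,),
    )
    return cur.fetchall()

# Placeholder records, invented for the example.
c_store("20523386", "Michael", "20120101", "/store/0001.dcm")
c_store("00144", "Michael", "20120202", "/store/0002.dcm")
print(c_find("20523386"))  # [('Michael', '20120101', '/store/0001.dcm')]
```

A C-GET would then be a matter of looking up file_path for the matching rows and sending those files back to the requesting Application Entity.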

File Conversion
Let’s see first what happens when a standard JPEG is converted to a DICOM file. I’ve used a random .jpg file and added a comment in the image properties, just to show roughly where stuff is in the data structure. Here’s part of the hex dump:


As expected, there’s the ‘magic number’ sequence that marks it as a JPEG, followed by the metadata, followed by some zero bytes before the main image data.
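The magic number check can be reproduced in a few lines. A minimal sketch that works on a byte string rather than my actual file; the function name is my own:

```python
def looks_like_jpeg(data):
    """Return True if the bytes start with the JPEG start-of-image marker."""
    # A JPEG stream begins with the SOI marker (FF D8), and the next
    # marker also starts with FF.
    return data[:3] == b"\xff\xd8\xff"

print(looks_like_jpeg(b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"))  # True
print(looks_like_jpeg(b"\x89PNG\r\n\x1a\n"))                 # False
```

For a real file, the same function can be given the first few bytes read from it in binary mode.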

What happens after the file is converted to DICOM? I used the img2dcm utility:
$ img2dcm ibm-z10.jpg ibm-z10.dcm


This time there isn’t a ‘magic number’ at the very start of the file. Instead I counted the initial 256 bytes as zero (not shown), although the DICOM file format fixes this preamble at 128 bytes and follows it with the four characters ‘DICM’, which act as the format’s own magic number. After that come the DICOM file’s data elements, including their two-letter Value Representation (VR) codes, and finally the data that makes up the image itself. I guessed that the initial zero bytes were reserved space for the image and patient record data, but the preamble isn’t used for that. There’s no routing data in the file that I can see, so again it appears DICOM is purely an application layer thing that sends data down the TCP/IP stack, and the receiving application will listen for the incoming data on a fixed port at whatever address (Aeskulap uses ports 6000 and 6100 by default).
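For comparison with the JPEG case, the DICOM file format defines a 128-byte preamble followed by the four characters ‘DICM’. A minimal sketch of that check, run against a hand-built byte string rather than a real file; the function name is my own:

```python
def looks_like_dicom(data):
    """Return True if 'DICM' follows a 128-byte preamble."""
    # The preamble content is not checked; only its length and the
    # prefix that follows it matter here.
    return len(data) >= 132 and data[128:132] == b"DICM"

# 128 zero bytes, the prefix, then the start of a made-up data element.
sample = b"\x00" * 128 + b"DICM" + b"\x02\x00\x00\x00UL\x04\x00"
print(looks_like_dicom(sample))           # True
print(looks_like_dicom(b"\xff\xd8\xff"))  # False
```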

The hex dump for a converted PDF is even more interesting, as the PDF file structure, following the zero bytes and DICOM fields, is readily identifiable:


DICOM Data Elements
So what does the file contain, other than the image data?


This is roughly what you’d see on medical workstations, but presented in a different way. The hex codes to the left are the tags of the DICOM fields: each is a (group, element) pair, such as (0010,0010) for the patient’s name, that identifies a data element. My first guess was that they were offsets into the initial zero bytes, counting backwards, but they’re identifiers from the DICOM data dictionary rather than locations in the file.
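To show how a tag, a two-letter VR code and a value sit together in the file, here is a minimal sketch that parses a single data element. It assumes the Explicit VR Little Endian encoding and a hand-built element rather than bytes from my file; it doesn’t handle undefined lengths or nested sequences, and the names are my own.

```python
import struct

# VRs that use two reserved bytes followed by a 4-byte length field.
LONG_VRS = {"OB", "OD", "OF", "OL", "OV", "OW", "SQ",
            "UC", "UN", "UR", "UT", "SV", "UV"}

def parse_element(buf, offset=0):
    """Parse one Explicit VR Little Endian data element.

    Returns ((group, element), vr, value_bytes, next_offset).
    """
    group, element = struct.unpack_from("<HH", buf, offset)
    vr = buf[offset + 4:offset + 6].decode("ascii")
    if vr in LONG_VRS:
        (length,) = struct.unpack_from("<I", buf, offset + 8)
        start = offset + 12
    else:
        (length,) = struct.unpack_from("<H", buf, offset + 6)
        start = offset + 8
    return (group, element), vr, buf[start:start + length], start + length

# Tag (0010,0010) is PatientName, VR 'PN'; the value is space-padded
# to an even length of 8 bytes.
raw = struct.pack("<HH2sH", 0x0010, 0x0010, b"PN", 8) + b"Michael "
tag, vr, value, next_offset = parse_element(raw)
print("(%04x,%04x) %s %r" % (tag[0], tag[1], vr, value))
# (0010,0010) PN b'Michael '
```

Calling parse_element repeatedly with the returned next_offset walks through consecutive elements, which is essentially what dcmdump prints in a friendlier form.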

Accessing and Modifying the DICOM Fields
The following utility will print the ‘metadata’ within the DICOM file:
$ dcmdump ibm-z10.dcm

The fields can also be updated with dcmodify. Date fields expect the YYYYMMDD format, so the day in the birth date below is a placeholder. e.g.
$ dcmodify ibm-z10.dcm -i "PatientName=Michael"
$ dcmodify ibm-z10.dcm -i "PatientID=20523386"
$ dcmodify ibm-z10.dcm -i "PatientBirthDate=19830501"

And when the image is opened in a DICOM image viewer again, we can see the fields have been updated.
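DICOM date fields (the DA value representation) are written as YYYYMMDD, so a birth date has to be converted to that form before it’s passed to dcmodify. A minimal sketch; the helper name is my own and the day is a placeholder, since I only have a month and year:

```python
from datetime import date

def to_da(d):
    """Format a date as a DICOM DA value (YYYYMMDD)."""
    return d.strftime("%Y%m%d")

# The day of the month is a placeholder value.
print(to_da(date(1983, 5, 1)))  # 19830501
```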


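What’s also notable is that the run of blank space at the start of my hex dump appeared to shrink from 256 bytes to 160 bytes. I took this as support for my initial hypothesis, but since the DICOM preamble has a fixed length of 128 bytes, the change more likely comes from the data elements being rewritten with different lengths.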


Python DICOM Module
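We could also manipulate DICOM files in Python using the pydicom library (imported as ‘pydicom’ since version 1.0; older releases used ‘import dicom’):
import pydicom
fields = pydicom.dcmread("hive.dcm")
print(fields)

The fields can also be modified:
fields.PatientID = "00144"
fields.PatientName = "Michael"
fields.PatientBirthDate = "19830501"  # DA values are YYYYMMDD; the day is a placeholder
fields.save_as("hive-modified.dcm")  # hypothetical output name; writes the changes to disk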


But how would I know the field names available on the ‘fields’ object? Fortunately pydicom datasets have a dir() method that lists the names of the fields present, optionally filtered by a search string. e.g.
To find the field names containing ‘bit’ (the match is case-insensitive):
fields.dir("bit")
Gives us:
['BitsAllocated', 'BitsStored', 'HighBit']

So we can pick any of these names and use it in place of PatientID in the earlier lines to change that field.

Without encrypted connections between the Application Entities, anyone on the network could intercept the DICOM files and extract the patient information. DICOM specifies several ‘security profiles’ (in Part 15 of the standard), which cover the use of TLS and public key certificates to secure the connections between Application Entities and a PACS server, as well as protection for the files and for specific information within those files. This might also be integrated with Active Directory or a Kerberos server for identifying users, perhaps using one of the fields within the files as the user/account name to find the relevant key.
Another consideration is the sharing of images with third parties for research. Obviously the data should be anonymised by removing or replacing the information fields within DICOM files. The GDCM toolkit has a utility called ‘gdcmanon’ for this.
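The idea of removing or replacing fields can be sketched with a plain Python dictionary standing in for the DICOM fields. This is only an illustration of the principle, not what gdcmanon does internally; the function name, the salt and the short list of fields are my own assumptions, and a real de-identification profile covers many more attributes.

```python
import hashlib

def pseudonymise(record, salt):
    """Return a copy of record with identifying fields replaced."""
    out = dict(record)
    if "PatientID" in out:
        # A salted hash keeps records for one patient linkable
        # without exposing the original ID.
        digest = hashlib.sha256((salt + out["PatientID"]).encode("utf-8"))
        out["PatientID"] = digest.hexdigest()[:16]
    if "PatientName" in out:
        out["PatientName"] = "ANONYMOUS"
    if "PatientBirthDate" in out:
        out["PatientBirthDate"] = ""
    return out

record = {
    "PatientName": "Michael",
    "PatientID": "20523386",
    "PatientBirthDate": "19830501",
    "BitsAllocated": 8,
}
anon = pseudonymise(record, salt="research-project-1")
print(anon["PatientName"], anon["BitsAllocated"])  # ANONYMOUS 8
```

The image-related fields such as BitsAllocated are left alone, since a viewer still needs them to display the picture.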