November 30, 2020
Radiology Today: Imaging Informatics: Multilayered Response
Deidentifying medical images is more complex than it sounds.
A photo can highlight someone’s best side or show their eye color, but it won’t reveal the individual’s secure personal information. The same can’t be said for medical images, which contain embedded data such as names and Social Security numbers.
With the sharing of clinical information for research purposes comes the need to find efficient and effective methods for deidentifying images. In addition to research needs, the growth of data-driven AI applications has created a heightened interest in sharing medical data and a focus on “scrubbing” images of protected health information (PHI), individually identifiable health information that can be linked to a particular individual.
“There’s a big demand for deidentified imaging today,” says Matthew Michela, CEO of Life Image. “You need real-world evidence to build algorithms for AI, and state-of-the-art technology is just emerging to solve this problem.”
To meet increased demand, companies such as Life Image and Ambra Health are developing solutions that add efficiency and accuracy to the process of removing PHI from patient imaging.
“A key to successful research workflows and compliance with HIPAA is the deidentification of imaging data,” says Sarah Gabelman, director of product management with Ambra Health. “Every type of modality has metadata that includes patient identification, which needs to be removed in order to share that imaging. We have developed automated tools that eliminate the need for human intervention by automating system sorts on the information embedded in the image. This helps decrease the time needed to remove the identifying information and the chance for human error.”
Automating the deidentification process is especially helpful for complex imaging studies that can include 800 to 1,200 slices. Michela says it would take a significant amount of time for one person to review each imaging layer for identifying information. Given the need for research data in time-sensitive clinical trials, a human-only process could preclude the use of images in the research.
“There’s also the concern of HIPAA violations occurring if PHI is discovered in the research data during the peer review process,” Michela says.
In addition, the ACR, RSNA, and Society for Imaging Informatics in Medicine recently urged radiologists and allied medical professionals to take precautions when using images in PowerPoint presentations and PDF files. Search engines can index patient identifiers in files that were believed to have been deidentified. To ensure that no PHI is included, the organizations recommend the use of screen capture software to isolate image pixels for the region of interest only. Alternatively, users can disable patient information overlays or use an anonymization algorithm embedded in a PACS before saving a screen or active window presentation. Neither cropping out PHI with image formatting tools nor using “black bars” to obscure PHI is a safe and compliant practice for deidentification.
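The idea of keeping only region-of-interest pixels can be illustrated with a minimal Python sketch. It assumes the image is a plain 2D list of pixel values; the function name, the toy image, and the coordinates are all hypothetical. The point is that the output is a new array containing nothing outside the region, rather than the original image with part of it hidden.

```python
def extract_roi(pixels, top, left, height, width):
    """Return a new 2D list holding only the region-of-interest pixels."""
    return [row[left:left + width] for row in pixels[top:top + height]]


# Toy 4x6 image: the first row stands in for a burned-in text overlay (value 9).
image = [
    [9, 9, 9, 9, 9, 9],
    [0, 1, 2, 3, 0, 0],
    [0, 4, 5, 6, 0, 0],
    [0, 7, 8, 1, 0, 0],
]

# Copy out a 3x3 block that excludes the overlay row entirely.
roi = extract_roi(image, top=1, left=1, height=3, width=3)
print(roi)
```

Because the overlay row is never copied, it cannot be recovered from the result, which is the property that a display-level crop or a black bar drawn over the original does not guarantee.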
As is the case with many new technologies, deidentification for the sake of deidentification doesn’t necessarily benefit the research community. Michela says deidentification needs to meet a specific demand.
“The process starts with researchers knowing what they need and then deidentifying for that need,” he says. “Conducting widespread deidentification would be costly. You need to work backwards by finding out what the research is about, what the patient group is, and what need is being investigated.”
Michela adds that knowing the purpose of the research can be important in determining which data are important to leave with the image.
“If you wipe out everything, you can lose important clinical data, such as the patient’s diagnosis,” he says. “You don’t want to affect the data to the point where it would alter the image and, therefore, the research.”
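One way to picture deidentifying for a specific need is a retention profile per research purpose. The Python sketch below is illustrative only: the field names, the profile names, and the plain dictionary standing in for image metadata are assumptions, not any vendor's actual format or method.

```python
# Hypothetical field names and retention profiles, for illustration only.
IDENTIFIERS = {"PatientName", "PatientID", "PatientBirthDate"}

# Each research purpose lists the non-identifying fields it needs to keep.
RETENTION_PROFILES = {
    "lung_nodule_study": {"Modality", "StudyDescription", "Diagnosis"},
    "scanner_quality_study": {"Modality", "Manufacturer"},
}


def deidentify_for_purpose(header, purpose):
    """Return a copy of header holding only the fields the purpose needs."""
    # Identifiers are subtracted even if a profile lists them by mistake.
    keep = RETENTION_PROFILES[purpose] - IDENTIFIERS
    return {name: value for name, value in header.items() if name in keep}


header = {
    "PatientName": "DOE^JANE",
    "PatientID": "12345",
    "PatientBirthDate": "19700101",
    "Modality": "CT",
    "Manufacturer": "ExampleVendor",
    "StudyDescription": "CT CHEST",
    "Diagnosis": "pulmonary nodule",
}

research_copy = deidentify_for_purpose(header, "lung_nodule_study")
print(research_copy)
```

Starting from what the study needs, rather than from a list of things to delete, keeps the diagnosis in this example while the name, ID, and birth date never reach the research copy.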
Life Image and Ambra Health’s solutions work in collaboration with the Google Cloud Healthcare API, adding a service that uses a combination of machine learning and human validators to Google’s own deidentification capabilities.
Michela says another “essential” step in successful deidentification for research purposes is to link the imaging data to other data sets, such as pharmaceutical information or EHR data.
“It’s important to know the patient, including their medical history and what medications they’re taking,” he says.
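Linking deidentified images to other data sets requires some shared key that is not itself an identifier. A common general technique, shown here as a hedged Python sketch and not as either company's method, is to replace the patient ID with a keyed hash so that the same patient receives the same pseudonym in every data set. The secret key, record layouts, and field names below are assumptions made for illustration.

```python
import hashlib
import hmac

# Assumption: a secret key held only by the party doing the deidentification.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonym(patient_id):
    """Derive a stable pseudonym from a patient ID with HMAC-SHA256."""
    digest = hmac.new(SECRET_KEY, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def replace_id(record):
    """Return a copy of record with the patient ID swapped for a pseudonym."""
    cleaned = {k: v for k, v in record.items() if k != "patient_id"}
    cleaned["subject"] = pseudonym(record["patient_id"])
    return cleaned


# Hypothetical records from two separate sources for the same patient.
imaging_record = {"patient_id": "12345", "modality": "CT"}
ehr_record = {"patient_id": "12345", "medications": ["metformin"]}

imaging_out = replace_id(imaging_record)
ehr_out = replace_id(ehr_record)

# The two deidentified records can still be joined on the pseudonym.
print(imaging_out["subject"] == ehr_out["subject"])
```

Whoever holds the key can regenerate the pseudonym for a known patient ID, so protecting the key matters as much as removing the original identifier. Whether such a scheme satisfies a given compliance requirement is outside the scope of this sketch.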
As Michela explains, PHI is embedded in the metadata, the DICOM headers, and the DICOM image itself. Removing PHI from three different locations presents a significant challenge.
Another challenge lies in gathering clinical imaging data from a variety of medical facilities and imaging centers. Different locations use equipment manufactured by different vendors, each with slightly different methods of embedding data.
“There are DICOM standards, but some vendors have private tags that need to be located in the image and removed as well,” Gabelman says. “You might easily remove the standard tags but miss the private tags.”
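The distinction Gabelman draws can be sketched in a few lines of Python. In DICOM, each data element is addressed by a (group, element) pair, and private tags are those with an odd group number. The example below uses a plain dictionary as a simplified stand-in for a real header; the private tag and its value are hypothetical, and the list of standard identifying tags is deliberately incomplete.

```python
# Simplified stand-in for a DICOM header: (group, element) -> value.
header = {
    (0x0008, 0x0060): "CT",              # Modality (standard tag)
    (0x0008, 0x0070): "ExampleVendor",   # Manufacturer (standard tag)
    (0x0010, 0x0010): "DOE^JANE",        # Patient's Name (standard tag)
    (0x0010, 0x0020): "12345",           # Patient ID (standard tag)
    (0x0029, 0x1010): "DOE^JANE 12345",  # hypothetical private tag
}

# Standard identifying tags to drop in this sketch (not a complete list).
STANDARD_PHI_TAGS = {(0x0010, 0x0010), (0x0010, 0x0020)}


def is_private(tag):
    """DICOM private tags have an odd group number."""
    group, _element = tag
    return group % 2 == 1


def scrub(header):
    """Drop the listed standard PHI tags and every private tag."""
    return {
        tag: value
        for tag, value in header.items()
        if tag not in STANDARD_PHI_TAGS and not is_private(tag)
    }


cleaned = scrub(header)
print(sorted(cleaned))
```

A scrubber that checked only the standard list would have left the patient's name behind in the private tag. Dropping every private tag is a blunt choice, though, since some of them may hold acquisition details a study needs.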
Michela adds, “The pure mathematical variety makes it difficult to find all the data. And add to that the capability of physicians to add data while they are reviewing images. There are many variables involved to overcome.”
Over time, Gabelman says, users learn the nuances of the different vendors, as well as of each modality, and can incorporate rules-based automation into the deidentifying tool.
“Rules-based automation is like making a key to each vendor’s images that enables you to identify how and where they incorporate PHI,” she says.
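The “key” metaphor maps naturally onto a lookup table. The Python sketch below is a guess at the general shape of such rules, not Ambra Health's implementation: the vendor names and private tag locations are invented, and a dictionary again stands in for a real DICOM header. Only the Manufacturer tag, (0008,0070), is a standard DICOM attribute.

```python
# Hypothetical vendor names and tag locations, for illustration only.
VENDOR_RULES = {
    "VendorA": {(0x0029, 0x1010)},
    "VendorB": {(0x0033, 0x1002), (0x0033, 0x1004)},
}

MANUFACTURER = (0x0008, 0x0070)  # standard DICOM Manufacturer tag


def apply_vendor_rules(header):
    """Remove the tags that the rule set lists for this header's vendor."""
    vendor = header.get(MANUFACTURER)
    if vendor not in VENDOR_RULES:
        # No rule yet: stop so a person can review rather than guess.
        raise LookupError(f"no deidentification rules for vendor {vendor!r}")
    drop = VENDOR_RULES[vendor]
    return {tag: value for tag, value in header.items() if tag not in drop}


header = {
    MANUFACTURER: "VendorB",
    (0x0008, 0x0060): "MR",
    (0x0033, 0x1002): "DOE^JANE",
    (0x0033, 0x1004): "ref. Dr. Smith",
}

cleaned = apply_vendor_rules(header)
print(cleaned)
```

Raising an error for an unknown vendor reflects the learning process described above: each newly encountered vendor or modality prompts a new rule instead of passing through unchecked.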
In the end, however, there are cases where identifying a patient, although not by name, remains a possibility, even after all PHI is removed from the images.
“For example, you can eliminate the patient’s name from an image of that person’s head, but the shape of the head could be so unique that it would still be an identifier of that individual,” Michela says.
There is also the matter of the potential for reidentification of a patient, for certain research purposes, using data extracted from the image. That scenario needs to be taken into consideration when conducting the deidentification, Michela says. In instances such as this, the data become like pieces of a puzzle that may need to be put back together for other research projects. Medical images have many layers of data and uses beyond the nature of their original purpose. What they reveal is truly more than meets the eye.
— Originally published November 30, 2020. Kathy Hardy is a freelance writer based in Pottstown, Pennsylvania. She is a frequent contributor to Radiology Today.