Really glad to hear you're working on this problem. I've done a bit of work grooming medical imaging datasets for AI projects. A big chunk of the time goes into building pipelines to properly de-identify the images: everything from PHI hidden deep in the DICOM headers to a patient name burned into the image by the scanner or by some workstation that opened it. How are you dealing with those challenges?
That is certainly a challenge. Automated approaches to removing PHI often miss things for the reasons you mentioned, and at the end of the day you need a person to verify that the image is free of PHI. Right now we depend on our clients to remove PHI, but we’re also working on a process where we verify some users’ credentials and have those users review cases for PHI before we release a potentially sensitive case to the crowd.
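To make that kind of review gate concrete, here is a minimal sketch, assuming headers are available as a plain dictionary. The field list, the `Case` class, and the function names are all hypothetical and for illustration only; this is not a description of any real pipeline, and the field list is nowhere near complete.

```python
from dataclasses import dataclass, field

# Hypothetical and deliberately incomplete set of header fields that often carry PHI.
PHI_KEYS = {"PatientName", "PatientID", "PatientBirthDate", "InstitutionName"}


@dataclass
class Case:
    header: dict
    # Reviewers who have confirmed the case is free of PHI.
    phi_cleared_by: set = field(default_factory=set)


def scrub_header(header: dict) -> dict:
    """Return a copy of the header without the known PHI fields.

    This only touches header fields; text burned into the pixels is untouched,
    which is why a human review step still follows.
    """
    return {k: v for k, v in header.items() if k not in PHI_KEYS}


def approve(case: Case, reviewer: str, verified_reviewers: set) -> None:
    """Record a PHI sign-off, but only from a reviewer with verified credentials."""
    if reviewer not in verified_reviewers:
        raise PermissionError(f"{reviewer} is not a verified reviewer")
    case.phi_cleared_by.add(reviewer)


def can_release(case: Case) -> bool:
    """A case goes to the crowd only after at least one verified sign-off."""
    return len(case.phi_cleared_by) > 0


if __name__ == "__main__":
    raw = {"PatientName": "DOE^JANE", "Modality": "CT", "PatientID": "12345"}
    case = Case(header=scrub_header(raw))
    print(case.header)          # header with the listed PHI fields dropped
    print(can_release(case))    # not releasable until someone signs off
    approve(case, "reviewer_a", verified_reviewers={"reviewer_a"})
    print(can_release(case))
```

The point of the sketch is the ordering: automated scrubbing runs first, and release is blocked until a credentialed person has looked at the case.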