Searching for new ways to keep DNA data secure


Ever since scientists were able to map the human genome, they have been using people’s genetic information to help us understand our history and to create breakthrough medical treatments. These discoveries, however, carry risks for the people who contribute their DNA, especially data hacks and other privacy breaches.

Richard and Loan Hill Department of Bioengineering alumna Gamze Gursoy received a National Institutes of Health grant to help develop systems that protect the identities of DNA donors and prevent information from leaking out of databases.

To do this, she plans to use an unlikely ally: trash.

Gursoy received a K99/R00 Pathway to Independence Award to better quantify privacy leakage from genomic summary results, or GSRs: the output of analyses that are done on genomics data drawn from a large group of individuals.

A few years ago, the NIH established a new policy that made genomic summary results publicly available instead of keeping them behind a data-security firewall. While there are upsides to this decision, Gursoy said, there are also risks. Thanks to cheaper, portable genome-sequencing devices and electronic health records, for example, a massive amount of information now resides in these databases.

“There are many ways of breaching genomics privacy by compiling all health data together,” Gursoy said. That, she added, creates a need for a clearer understanding of the risk in releasing genomic summary results, and for ways to mitigate that risk.

The grant will allow Gursoy to try to breach GSR databases so she can learn how to prevent these hacks in the future. Doing so will let her quantify the mathematical bounds of the privacy risk associated with releasing the data publicly.

She will use discarded items like coffee cups and used tissues from consenting individuals to experimentally validate and fine-tune these breaches using both traditional and new sequencing techniques. The goal is to see whether she can tell if an individual participated in a phenotype-genotype study using only DNA information she gathered surreptitiously from that person. Based on these quantifications, she will then create new ways to make this data public while protecting participants’ private information.
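The article doesn’t say which statistics Gursoy will use. For illustration only, here is a minimal Python sketch of one classic membership test from the genomics-privacy literature, a simplified form of the distance statistic described by Homer et al. in 2008. For each genetic marker (SNP), it asks whether a person’s DNA sits closer to the allele frequencies published in a study’s summary results or to those of an outside reference population. The function name `membership_score`, its inputs, and the toy numbers are hypothetical, and the original method applies a statistical test across many SNPs rather than the plain average used here.

```python
from statistics import mean


def membership_score(person, study_freqs, reference_freqs):
    """Average Homer-style distance over SNPs (a simplified, hypothetical sketch).

    person: the individual's genotype at each SNP as an allele fraction (0.0, 0.5 or 1.0)
    study_freqs: allele frequencies released in a study's summary results
    reference_freqs: allele frequencies from an outside reference population
    A positive score means the person's DNA looks more like the study pool
    than the reference population, which hints that they took part.
    """
    if not (len(person) == len(study_freqs) == len(reference_freqs)):
        raise ValueError("all inputs must cover the same SNPs")
    # Per SNP: distance to the reference minus distance to the study pool.
    per_snp = [abs(y - ref) - abs(y - mix)
               for y, mix, ref in zip(person, study_freqs, reference_freqs)]
    return mean(per_snp)


# Toy data for four SNPs (made-up values, not real study results).
study = [0.7, 0.2, 0.5, 0.6]
reference = [0.5, 0.4, 0.5, 0.5]
likely_member = membership_score([1.0, 0.0, 0.5, 1.0], study, reference)
likely_outsider = membership_score([0.0, 1.0, 0.5, 0.0], study, reference)
```

With real data, each score is computed over hundreds of thousands of SNPs, which is why even small per-marker differences can add up to a confident guess about who was in a study.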

Lastly, Gursoy plans to create software that data producers can use to quantify the risk of data sharing and better inform study participants before they make the decision to contribute their DNA.

“I hope that quantification and mitigation of the privacy risk will increase participation in large-scale genomics studies, as it will create trust between the scientists and the participants,” Gursoy said. “As we know, when it comes to making statistical inference on human health, more participation is always better.”

Gursoy, who was born and raised in Istanbul, Turkey, is now a postdoctoral scholar at Yale University. She received her PhD in bioinformatics from UIC in 2016, advised by Hill Professor Jie Liang. Before that, she studied chemical engineering at Bogazici University because of her interest in physics, chemistry, and math. It was a biopolymer class that changed her career trajectory and sent her halfway across the world to UIC.

“I was mesmerized by the physics of the proteins and DNA,” she said. “One of my undergrad professors recommended I apply to UIC’s bioinformatics concentration because she knew some of the faculty and felt it would be a perfect fit for me, as they do physics-based computational and mathematical modeling on biopolymers like proteins.”

After completing her postdoc, Gursoy plans to pursue a tenure-track faculty position where she can start her own lab.