Differential Privacy Library Works with Machine Learning

Differential privacy has become an important way for data scientists to learn from the bulk of their data while ensuring that the results do not allow any individual to be re-identified or distinguished.

IBM released an open-source Differential Privacy Library to help more researchers with their work. Naoise Holohan is a research staff member on IBM Research Europe’s privacy and security team.

According to him, the library boasts a suite of tools for data analytics and machine learning tasks. Moreover, it comes with built-in privacy guarantees.

On Friday, Holohan wrote in a blog post that their library differs from others because it gives developers and scientists access to user-friendly, lightweight tools for machine learning and data analytics in a familiar environment. Moreover, most tasks can be run with only a single line of code.

What also sets their library apart is its machine learning functionality. It enables organizations to share and publish data with rigorous guarantees on user privacy like never before.

Holohan explained in an interview that differential privacy has become immensely popular. For the first time in its 230-year history, the United States Census will use differential privacy to keep citizens' responses confidential when the data is made available.

Chris Sciacca is a communications manager at IBM Research. He added that the 2020 Census was an excellent example of how differential privacy can be used for any large data set on which people want to do statistical analysis.

Differential Privacy

Sciacca said that healthcare data would be another interesting area, as would any significant data set where you need to keep the data anonymous. However, you do not want to add so much noise that the data becomes useless. Instead, you add just enough noise that you can still spot statistical anomalies and look at trends in large sets of data.

Differential privacy allows data collectors to anonymize information by adding mathematical noise. IBM's library stands out because it has machine learning functionality, which enables organizations to share and publish their data with rigorous guarantees on user privacy.
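To make the idea of "mathematical noise" concrete, here is a minimal sketch of the Laplace mechanism, a standard textbook technique for differential privacy, applied to a simple counting query. It is not taken from IBM's library and uses only the Python standard library. The function names (`laplace_noise`, `private_count`), the survey data, and the epsilon values are hypothetical, chosen only for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential variables with
    mean `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Return a noisy count of the records matching `predicate`.

    Adding or removing one person changes a count by at most 1,
    so the sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data: ages of 1,000 survey respondents.
rng = random.Random(0)
ages = [rng.randint(18, 90) for _ in range(1000)]
true_over_65 = sum(1 for a in ages if a > 65)

# Smaller epsilon means stronger privacy but a noisier answer.
for epsilon in (0.01, 0.1, 1.0):
    noisy = private_count(ages, lambda a: a > 65, epsilon, rng)
    print(f"epsilon={epsilon}: true={true_over_65}, noisy={noisy:.1f}")
```

The loop illustrates the trade-off Sciacca describes: at a small epsilon the noise can swamp the true count, while at a larger epsilon the answer stays close enough to reveal trends, at the cost of a weaker privacy guarantee.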

He said that they started by looking at the space of differential privacy and open-source software. They noticed a big gap in the market when it came to doing machine learning with differential privacy easily. A lot of work on the topic is available in the literature: researchers have studied many algorithms, made them differentially private, and presented their solutions. Nevertheless, there was no single library or repository for doing machine learning with differential privacy.

So they decided to build this library. It uses existing Python packages, so you can build your own work on top of them and do machine learning with differential privacy guarantees built in. The library is user-friendly, straightforward to use, and easy to integrate into scripts that people already have.

Let us see how successful the project will be.