PRIVACY PRESERVING MACHINE LEARNING

Harnessing AI While Securing Your Data

With AI-based systems in general, and generative AI in particular, becoming ubiquitous, it is important to keep in mind that these systems critically rely on, and are shaped by, the massive amounts of data fueling the underlying Machine Learning models. In many cases this data may be private and/or sensitive. It therefore becomes increasingly important to find ways to unlock the power of AI while still respecting and protecting data privacy.

How is Data Privacy protected today?

The classic approach to data privacy protection rests on two pillars: data anonymization and data encryption. However, both have gaps that need to be addressed. It has been shown in recent years that classic anonymization (e.g. crossing out names or addresses) is vulnerable to reconstruction, allowing the data to be tied back to its original form. Encryption works well while data is at rest or in transit, but that is not sufficient for machine learning models, which need to operate on clear data (hence the data must be decrypted at some point within an ML pipeline). Clearly, additional layers of protection are needed on both pillars.

How Machine Learning disrupts Privacy Preservation

In recent years, a set of emerging techniques collectively known as Privacy-Preserving Machine Learning has been gradually gaining ground in industry. These techniques build on recent breakthroughs in academic research in cryptography and AI, and appear to have some “magical” properties:

Federated Learning

A group of separate entities can collectively train a machine learning model as if they had pooled their data, without ever having to explicitly share that data with each other

Fully Homomorphic Encryption

Machine learning can be performed on data that arrives encrypted, stays encrypted throughout inference, and returns encrypted all the way back to its owner

Differential Privacy

Private datasets can be preprocessed in such a way that the modified dataset, while accurately preserving the statistical qualities of the original, cannot be tied back to the presence or absence of any individual record

Federated Learning

Let’s illustrate the problem and the solution space with fraud detection, an example applicable to a range of verticals (banking, e-commerce, healthcare). Typically, multiple entities (banks, retailers, healthcare institutions, local authorities, etc., depending on the vertical) share the goal of having a powerful, comprehensive machine learning model capable of accurately detecting fraud. To achieve that, the model needs to be trained on as wide and large a pool of data as possible, which becomes feasible only by combining the data of the participating entities. However, the participating entities may be unable, or unwilling, to share their private data.

Here Federated Learning comes to the rescue:

DataFab provides, as a starting point, an initial model trained on publicly available data

This model is shared with the individual parties, each of which independently continues training it on its own private data, yielding a somewhat improved model per party

DataFab then aggregates the resulting models (but not the incoming private data!) into a combined, federated model leveraging the learnings from each individual party’s data

This process can be repeated over further rounds to keep improving the model (a minimal sketch of the aggregation step follows below)
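
As a concrete illustration of the aggregation step, below is a minimal Federated Averaging (FedAvg) sketch in Python. It is illustrative only and does not describe DataFab's actual implementation: the model is reduced to a NumPy weight vector, the per-party training step is a simple gradient update, and all names and parameters are made up for the example.

    import numpy as np

    def local_update(global_weights, local_data, learning_rate=0.1):
        # Hypothetical per-party step: each party refines the shared model
        # on its own private data and returns only the updated weights.
        X, y = local_data                      # private features / labels never leave the party
        grad = X.T @ (X @ global_weights - y) / len(y)   # gradient of a squared-error objective
        return global_weights - learning_rate * grad

    def federated_round(global_weights, parties):
        # Aggregator (e.g. DataFab) combines the parties' weight updates,
        # weighted by dataset size -- it never receives the raw data.
        updates = [local_update(global_weights, p) for p in parties]
        sizes = np.array([len(p[1]) for p in parties], dtype=float)
        return np.average(updates, axis=0, weights=sizes / sizes.sum())

    # Toy run: three parties, each holding its own private dataset.
    rng = np.random.default_rng(0)
    parties = [(rng.normal(size=(50, 4)), rng.normal(size=50)) for _ in range(3)]
    weights = np.zeros(4)                      # initial model (e.g. trained on public data)
    for _ in range(10):                        # repeated rounds keep improving the model
        weights = federated_round(weights, parties)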

Differential Privacy

One concern with the Federated Learning process may still remain. Contemporary models typically have high capacity (measured in billions of parameters) and are thus capable of memorizing certain aspects of the private data they were trained on. This raises the risk that somebody with access to the model could later extract bits of the original private information. To prevent that from happening, DataFab applies Differential Privacy techniques during the training process:

A small, controlled amount of noise is introduced into everybody's data prior to training

The noise prevents the model from memorizing individual data points, making the model resilient to data reconstruction attacks

The way the noise is introduced preserves high accuracy of the learnings from each individual party's data, despite the added noise (a minimal sketch of the mechanism follows below)
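
To give a rough feel for the mechanism, here is a minimal sketch of input perturbation with the Gaussian mechanism in Python: each record's contribution is clipped to a fixed norm and calibrated noise is added, so any single record has only a bounded, deniable influence on what the model sees. The clipping norm and noise multiplier are illustrative values, not DataFab's actual settings, and production systems typically also track a formal privacy budget (epsilon).

    import numpy as np

    def privatize(records, clip_norm=2.0, noise_multiplier=1.1, seed=0):
        # Clip each record to a bounded L2 norm, then add Gaussian noise
        # scaled to that bound (the Gaussian mechanism). A larger
        # noise_multiplier means stronger privacy but lower accuracy.
        rng = np.random.default_rng(seed)
        records = np.asarray(records, dtype=float)
        norms = np.linalg.norm(records, axis=1, keepdims=True)
        clipped = records * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
        noise = rng.normal(scale=noise_multiplier * clip_norm, size=clipped.shape)
        return clipped + noise

    # Aggregate statistics survive, while individual rows become deniable.
    data = np.random.default_rng(1).normal(loc=0.5, scale=0.2, size=(10_000, 4))
    noisy = privatize(data)
    print(data.mean(axis=0))    # per-feature means of the raw data
    print(noisy.mean(axis=0))   # should land close to the raw means despite the noise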

Homomorphic Encryption

Now that we have a high-quality, secure model trained collectively on multiple parties' data, with data privacy well protected, we move on to operating the system. DataFab can host the model, receive incoming transactions (or healthcare cases, etc.) from individual parties, let the model infer a result (fraud or not, plus supplementary information), and send the result back to the requesting party. One concern remains: transaction data, which may be very sensitive (e.g. a credit card number or a health record), becomes available in the clear to DataFab, and possibly to cloud and other IT infrastructure vendors involved in the pipeline. This is where Fully Homomorphic Encryption resolves the concern:

Every transaction is encrypted at the requesting party, so only that party has access to the data in the clear

The encrypted transaction is sent to DataFab and processed by an FHE-enabled model while still encrypted, yielding an encrypted answer

The encrypted answer comes back to the requesting party, and only that party can decrypt it and access the model's inference results (a minimal sketch of this flow follows below)
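
To give a feel for what this looks like in practice, here is a minimal encrypted-inference sketch using the open-source TenSEAL library (CKKS scheme). The toy linear fraud model, its weights, and the transaction features are all made up for illustration; real deployments involve more careful parameter choice and key management, and the server would only receive a public copy of the encryption context (never the secret key).

    import tenseal as ts

    # --- Requesting party: create keys and encrypt the transaction features ---
    context = ts.context(
        ts.SCHEME_TYPE.CKKS,
        poly_modulus_degree=8192,
        coeff_mod_bit_sizes=[60, 40, 40, 60],
    )
    context.global_scale = 2 ** 40
    context.generate_galois_keys()

    transaction = [120.0, 3.0, 0.0, 1.0]             # illustrative transaction features
    enc_transaction = ts.ckks_vector(context, transaction)

    # --- Server side (e.g. DataFab): score the vector while it stays encrypted ---
    weights = [0.002, 0.4, 1.5, -0.3]                # toy linear fraud model, held in clear by the server
    bias = -0.9
    enc_score = enc_transaction.dot(weights) + bias  # computed entirely under encryption

    # --- Requesting party: decrypt the result with its secret key ---
    score = enc_score.decrypt()[0]
    print("fraud score:", round(score, 3), "-> flag for review" if score > 0 else "-> looks ok")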

DATAFAB.AI

© 2025 DATAFAB.AI. All rights reserved.
