PRIVACY PRESERVING MACHINE LEARNING
Harnessing AI While Securing Your Data
With AI-based systems in general, and generative AI in particular, becoming ubiquitous, it is important to keep in mind that these systems critically rely on, and are shaped by, the massive amounts of data fueling the underlying machine learning models. In many cases this data is private and/or sensitive, so it becomes increasingly important to find ways to unlock the power of AI while still respecting and protecting data privacy.
How is Data Privacy protected today?
The classic approach to data privacy protection rests on two pillars: data anonymization and data encryption. However, both have gaps that need to be addressed. It has been repeatedly shown that classic anonymization (e.g. crossing out names or addresses) is vulnerable to reconstruction, allowing the data to be tied back to its original form. Encryption works well while data is at rest or in transit, but that is not sufficient for machine learning models, which need to operate on clear data, so the data has to be decrypted at some point within an ML pipeline. Clearly, additional layers of protection are needed on both pillars.

How Machine Learning disrupts Privacy Preservation
In recent years, several emerging techniques, collectively known as Privacy-Preserving Machine Learning, have gradually been gaining ground in the industry. These techniques build on recent breakthroughs in academic research in cryptography and AI, and have some seemingly “magical” properties:
Federated Learning
It is possible for a group of separate entities to collectively train a machine learning model as if their data were pooled, without ever having to share that data with each other
Fully Homomorphic Encryption
It is possible to do machine learning on data that arrives encrypted, stays encrypted throughout inference, and returns encrypted all the way back to the owner, who alone can decrypt the result
Differential Privacy
It is possible to preprocess private datasets in such a way that the modified dataset accurately preserves the statistical qualities of the original, yet cannot be tied back to the presence or absence of any individual record
Federated Learning
Let’s illustrate the problem and the solution space with an example of fraud detection, which applies to a range of verticals (banking, e-commerce, healthcare). Typically we have multiple entities (banks, retailers, healthcare institutions, local authorities, etc., depending on the vertical) with a shared goal: a powerful, comprehensive machine learning model capable of accurately detecting fraud. To achieve that, the model needs to be trained on as wide and large a pool of data as possible, which becomes possible only by combining the data of the participating entities. However, those entities may not be allowed, or may be unwilling, to share their private data.
Here Federated Learning comes to the rescue:
DataFab provides, as a starting point, an initial model trained on publicly available data
This model is shared with the individual parties, who independently train it further on their own private data, producing a somewhat improved model per party
DataFab then aggregates the resulting models (but not the underlying private data!) into a combined, federated model that leverages the learnings from each individual party’s data
This process can be repeated over multiple rounds to further improve the model, as sketched in the code below
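To make the flow concrete, here is a minimal federated averaging sketch in Python with NumPy. The three simulated parties, the linear model, and all numbers are hypothetical stand-ins for illustration only; they are not DataFab’s actual architecture, but they show how local training and weight aggregation fit together.

import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1, epochs=5):
    # One party trains the shared linear model on its own private data.
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # mean-squared-error gradient
        w -= lr * grad
    return w

# Simulated private datasets -- in a real deployment these never leave each party.
parties = [(rng.normal(size=(100, 5)), rng.normal(size=100)) for _ in range(3)]

global_w = np.zeros(5)                        # e.g. an initial model from public data
for round_ in range(10):                      # federated training rounds
    local_models = [local_update(global_w, X, y) for X, y in parties]
    global_w = np.mean(local_models, axis=0)  # aggregate weights, never raw data

In practice the aggregation is usually weighted by each party’s dataset size, and the model is far richer than a linear regression, but the pattern of “share models, not data” is the same.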

Differential Privacy
One concern with the Federated Learning process may still remain. Contemporary models typically have high capacity (measured in billions of parameters) and are therefore capable of memorizing certain aspects of the private data they were trained on. This raises the risk that somebody with access to the model could later extract bits of the original private information. To prevent that from happening, DataFab applies Differential Privacy techniques during the training process:
A small, controlled amount of noise is introduced into everybody's data prior to training
The noise prevents the model from memorizing individual data points, making the model resilient to data reconstruction attacks
The way the noise is introduced is calibrated so that, despite the noise, the model still learns accurately from each individual party’s data; a minimal sketch of the noise mechanism follows below
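As an illustration only, the sketch below shows one simple way such noise can be added: clamping each numeric value to a known range and perturbing it with the Laplace mechanism before training. The feature values, range, and privacy budget (epsilon) are hypothetical; production systems often add noise during training (e.g. to gradients, as in DP-SGD) rather than to the raw data, but the principle is the same.

import numpy as np

rng = np.random.default_rng(0)

def privatize(values, lower, upper, epsilon):
    # Clamp each value to a known range, then add Laplace noise calibrated
    # to that range (the sensitivity) and the privacy budget epsilon.
    clipped = np.clip(values, lower, upper)
    sensitivity = upper - lower
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=clipped.shape)
    return clipped + noise

# Hypothetical transaction amounts from one party, perturbed before training.
amounts = np.array([12.0, 950.0, 47.5, 3200.0])
noisy_amounts = privatize(amounts, lower=0.0, upper=5000.0, epsilon=1.0)

A smaller epsilon means more noise and stronger privacy; a larger epsilon means less noise and more accurate learning, which is exactly the trade-off the training process tunes.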
Homomorphic Encryption
Now that we have a high-quality, secure model trained collectively on multiple parties’ data, with data privacy well protected, we can move on to operating the system. DataFab can host the model, receive incoming transactions (or healthcare cases, etc.) from individual parties, let the model infer a result (fraud or not, plus supplementary information), and send the result back to the requesting party. One concern with this process is that the transaction data, which may be very sensitive (e.g. a credit card number or a health record), becomes available in the clear to DataFab, and possibly to the cloud and other IT infrastructure vendors involved in the pipeline. This is where Fully Homomorphic Encryption resolves the concern:
Every transaction is encrypted at the requesting party, so only that party has access to the data in the clear
The encrypted transaction is sent to DataFab and processed by an FHE-enabled model while still encrypted, producing an encrypted answer
The encrypted answer goes back to the requesting party, and only that party can decrypt it and access the model’s inference results, as illustrated in the sketch below
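For illustration, the sketch below uses the open-source python-paillier (phe) package, which is only additively homomorphic rather than fully homomorphic, to show the basic pattern: the requesting party encrypts its features, the host evaluates a simple linear fraud score directly on the ciphertexts, and only the key holder can decrypt the result. The model weights and feature values are made-up examples, not DataFab’s model.

from phe import paillier  # python-paillier; assumed to be installed

# Requesting party: generates a key pair and encrypts its transaction features.
public_key, private_key = paillier.generate_paillier_keypair()
features = [120.0, 3.0, 1.0]  # hypothetical: amount, country risk, night-time flag
encrypted_features = [public_key.encrypt(x) for x in features]

# Hosting side (e.g. DataFab): evaluates a linear fraud score on ciphertexts only.
weights = [0.002, 0.5, 1.2]                 # made-up model coefficients
encrypted_score = public_key.encrypt(-1.0)  # bias term as the encrypted starting point
for w, x in zip(weights, encrypted_features):
    encrypted_score = encrypted_score + x * w  # scalar multiply and ciphertext add

# Back at the requesting party: only the private-key holder can decrypt the score.
fraud_score = private_key.decrypt(encrypted_score)

Fully homomorphic schemes extend this idea to the richer operations a full model needs, at a higher computational cost, while preserving the same guarantee: the host never sees the data or the answer in the clear.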

DATAFAB.AI
© 2025 DATAFAB.AI. All rights reserved.