PrivacyProof AI Challenges

Sudipto De
Project Owner

PrivacyProof AI Challenges

Funding Requested

$96,000 USD

Expert Review
0
Community
3.9 (8)

Overview

PrivacyProof AI Challenges - Unleash the wisdom of the crowd on hard data problems without sharing private data. A private Kaggle-style platform that will leak zero data while still enabling algorithms (and the innovation they bring) to thrive, engaging the AI community while ensuring comprehensive audit and security.

Proposal Description

How Our Project Will Contribute To The Growth Of The Decentralized AI Platform

The SynapseChain platform will be similar to Kaggle, hosting datasets and problem statements for domains where data sharing is an issue (e.g., Pharma, Healthcare). It will host competitions with associated prize money, encouraging innovators and scientists to contribute their AI/ML skills to solve problems. It will also encourage data aggregators to submit their real-world data by providing robust data privacy guarantees.

Our Team

Arbiteria’s main strengths would be:
 1) Our expert advisory team
 2) Collaboration with folks from Zero2AI
 3) Technical acumen
 4) Reliability
 5) Successfully completed project in DF3: SingularityNet Marketplace ChatGPT plugin
 6) Innovation: synthetic data generation (time series data only for now), extensible blind testing of algorithms, and an easy-to-use UI

View Team

AI services (New or Existing)

SynapseChain_Generic

Type

New AI service

Purpose

We will finalize the actual AI services in the design and architecture stage. These will include:
• Standard Dataset services
• Generation and management of synthetic databases
• The Model Builder and Evaluation service
• The Challenge management service
• The reporting service

AI inputs

Multiple; please refer to the long description above. The actual AI services will be finalized in the design and architecture stage.

AI outputs

Multiple; please refer to the long description above for details. The actual AI services will be finalized in the design and architecture stage.

Company Name (if applicable)

Arbiteria

The core problem we are aiming to solve

Healthcare, Insurance, and Pharma companies have collected a wealth of data that is locked behind corporate firewalls. Increased sensitivity around personal data (especially health data), scrutiny from government agencies, regulations like GDPR and CCPA, and increased exposure to liability and reputational damage have made these companies extremely risk-averse about sharing data with third parties and startups. Innovation in AI and analytics has grown 10x in recent years, but in the regulated space it is increasingly disconnected from real data because corporations are reluctant to share data for training AI models (or for old-fashioned analytics). This grinds true innovation to a halt, along with its benefits to consumers in terms of new therapies, more targeted clinical studies, and better efficacy in healthcare. Small and large businesses alike can benefit from reduced risk, improved operational efficiency, and better customer interactions with community-powered AI solutions.

Our specific solution to this problem

SynapseChain, in its final form, will be designed as a set of micro-services. Through this platform, the SingularityNET community should be able to:

• Use standard Dataset services
• Generate and manage synthetic databases
• Use the Model Builder and Evaluation service
• Use the Challenge management service
• Use the reporting service

Project details

The platform is designed as a set of micro-services. The microservice architectural style is an approach to developing an application as a suite of smaller services, each running in its own container/process and communicating over an HTTP API. These services are built around platform capabilities and are independently deployable.

Key features:
•    Microservices reinforce modular structure, a key component of maintainable code.
•    Services can be deployed independently, without leading to catastrophic failures.
•    Mix and match languages, development frameworks, and database technologies.
•    Microservices enable hyper-scaling an application, especially when combined with cloud-native infrastructure.
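
As an illustration of this architectural style, here is a minimal sketch of one such independently deployable service exposing an HTTP API, using only Python's standard library. The service name and endpoint are hypothetical placeholders, not the final design.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from threading import Thread

class DatasetServiceHandler(BaseHTTPRequestHandler):
    """One microservice: runs in its own process/container, speaks HTTP."""

    def do_GET(self):
        if self.path == "/health":
            # A health endpoint lets the orchestrator deploy/restart this
            # service independently of the others.
            body = json.dumps({"service": "dataset", "status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # silence default stderr logging

def serve(port: int = 0) -> HTTPServer:
    """Start the service on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), DatasetServiceHandler)
    Thread(target=server.serve_forever, daemon=True).start()
    return server
```

In a production deployment each such service would sit behind its own container image and route, which is what makes independent deployment and mixed technology stacks possible.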

 
Core Services
Dataset service
The Dataset service manages the datasets that are uploaded on the system. It is responsible for the following tasks:
•    Ingestion of a dataset (uploaded by a Dataset Aggregator)
•    Data format validation (schema check)
•    Data content validation (check for malicious code)
•    Dataset persistence on versioned Object storage
•    Listing of available datasets (by Dataset Aggregator)
•    Generate instrumentation data
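
A rough sketch of the first two of these responsibilities, schema validation and versioned persistence. The column names and the in-memory store are illustrative assumptions; the real service would validate against each dataset's registered schema and write to object storage.

```python
import csv
import hashlib
import io

# Illustrative schema for a time-series health dataset.
EXPECTED_COLUMNS = ["patient_id", "timestamp", "value"]

def validate_schema(csv_text: str) -> list:
    """Data format validation: reject uploads whose header does not
    match the expected columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"schema mismatch: {reader.fieldnames}")
    return list(reader)

def persist_versioned(store: dict, name: str, csv_text: str) -> str:
    """Versioned persistence: key each upload by a content hash so that
    earlier versions of the dataset survive re-uploads."""
    version = hashlib.sha256(csv_text.encode()).hexdigest()[:12]
    store[(name, version)] = csv_text
    return version
```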
 
Synthetic data generation service
The Synthetic Data generation service has one primary responsibility – to generate and manage synthetic datasets on the platform. It performs the following tasks:
•    Ingest original dataset
•    Get a list of synthetic data generation algorithms available
•    Generate synthetic dataset (default algorithm)
•    Generate synthetic dataset using a specific algorithm.
•    Benchmark quality of a synthetic dataset and return stats.
•    Save synthetic dataset to a versioned Object storage.
•    Generate instrumentation data
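
To illustrate the idea for time-series data, the sketch below generates a synthetic series by resampling the step-to-step differences of the original, so that the output mimics the original's dynamics without copying its values. The actual GenAI algorithms will be selected during the design stage; this is only a stand-in baseline, and the benchmark stats are deliberately crude.

```python
import random
import statistics

def synthesize_series(original, length=None, seed=0):
    """Generate a synthetic time series by resampling the observed
    step-to-step differences of the original series."""
    rng = random.Random(seed)
    diffs = [b - a for a, b in zip(original, original[1:])]
    out = [original[0]]
    for _ in range((length or len(original)) - 1):
        out.append(out[-1] + rng.choice(diffs))
    return out

def benchmark(original, synthetic):
    """Crude quality stats: how far apart are the mean and stdev of the
    original and synthetic series?"""
    return {
        "mean_gap": abs(statistics.mean(original) - statistics.mean(synthetic)),
        "stdev_gap": abs(statistics.stdev(original) - statistics.stdev(synthetic)),
    }
```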
 
Model builder & evaluation service
The Model builder and evaluation service trains models and evaluates them against available datasets. It is the workhorse of the platform. Some of its roles are:
•    Ingest GIT repo
•    Build container image
•    Deploy & run container
•    Train model
•    Evaluate model
•    Generate instrumentation stats during:
      o    Building
      o    Training
      o    Evaluation
•    Get model performance stats
•    Persist the model container image in the model registry.
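
The pipeline above can be sketched as an ordered plan of commands that a worker process would execute in sequence (e.g., via subprocess.run). The repository URL, image tag, entrypoint arguments, and registry name here are placeholders, not the final conventions.

```python
def build_pipeline_commands(repo_url: str, image_tag: str, registry: str):
    """Return the argv lists for one build/train/evaluate run.
    Keeping the plan as data makes it easy to log and audit each step."""
    return [
        ["git", "clone", repo_url, "workspace"],            # ingest GIT repo
        ["docker", "build", "-t", image_tag, "workspace"],  # build container image
        ["docker", "run", "--rm", image_tag, "train"],      # train model
        ["docker", "run", "--rm", image_tag, "evaluate"],   # evaluate model
        ["docker", "tag", image_tag, f"{registry}/{image_tag}"],
        ["docker", "push", f"{registry}/{image_tag}"],      # persist in model registry
    ]
```

Submissions arrive as Dockerfile specs, so the worker never needs to trust participant code directly: everything runs inside the container it builds.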
Challenge Management service
The challenge service manages challenges on the platform including:
•    Adding challenges
•    Associating challenges with a dataset (or multiple datasets)
•    Removing challenges.
•    Updating challenges, their datasets, rewards, etc.
•    Auto-expiring challenges when the competition window ends.
•    Auto-rewarding participants when the competition window ends.
•    Keeping track of participants in the challenge and their performance scores.
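
A minimal sketch of the challenge lifecycle: submissions while the competition window is open, auto-expiry once it closes, and ranking for auto-rewards. Field names are illustrative assumptions, not the final schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

@dataclass
class Challenge:
    name: str
    dataset_ids: list          # one or more associated datasets
    reward: float
    closes_at: datetime
    scores: dict = field(default_factory=dict)  # participant -> best score

    def is_open(self, now: datetime) -> bool:
        """Auto-expiry: the challenge closes when the window ends."""
        return now < self.closes_at

    def submit(self, participant: str, score: float, now: datetime):
        if not self.is_open(now):
            raise RuntimeError("challenge window has ended")
        # Keep each participant's best score for the leaderboard.
        self.scores[participant] = max(score, self.scores.get(participant, float("-inf")))

    def winners(self, now: datetime) -> list:
        """Auto-reward hook: once closed, rank participants by best score."""
        if self.is_open(now):
            return []
        return sorted(self.scores, key=self.scores.get, reverse=True)
```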
Reporting service
The reporting service generates PDF or HTML reports given model or challenge statistics. It may also accept a destination email and forward the report to that email.
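
A minimal sketch of the HTML path of this service, rendering a statistics dictionary as a standalone report (PDF export and email forwarding are omitted here; the function name is illustrative).

```python
import html

def render_report(title: str, stats: dict) -> str:
    """Render model or challenge statistics as a minimal HTML report.
    Values are escaped so arbitrary stat strings cannot inject markup."""
    rows = "\n".join(
        f"<tr><td>{html.escape(str(k))}</td><td>{html.escape(str(v))}</td></tr>"
        for k, v in stats.items()
    )
    return (
        f"<html><head><title>{html.escape(title)}</title></head>"
        f"<body><h1>{html.escape(title)}</h1><table>{rows}</table></body></html>"
    )
```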

SNET growth
Opening up SNET to a potentially commercial solution space. The first data-focused service! Brings DataUnions into the fold, not just AI services.

Competition and USPs

USP
Synthetic data generation using GenAI; fully verifiable audit trails to ensure trust in the data-handling functions; portable, offline AI algorithm development with submissions as Dockerfile specs, auto-evaluation of submissions, and leaderboard updates; and a serverless micro-services architecture for extreme cost-effectiveness.

Competition
Federated learning via homomorphic encryption is unwieldy, slow, and restrictive. Data Unions are only an artifact/transaction mechanism, offering no data privacy after acquisition. Compute-to-Data from Ocean forces the algorithm and data to come from a single source.

Links and references

https://www.arbiteria.com/

About Us: (Arbiteria)

Arbiteria’s vision is to create a collaborative, continuously learning community of gig contributors to the AI ecosystem.

Our AI mentors/team (https://www.arbiteria.com/our-mentors)

Our project was selected in Deep Funding Round 3 (code DFR3-TLG3): SingularityNet Marketplace ChatGPT plugin. We successfully delivered 3 out of 4 milestones within the given timelines.

We also are pleased to have Robin Lehmann from DataUnion as an advisor.

 

Additional videos

NA

Revenue Sharing Model

Token Allocation

We are planning for a token launch in:

2025-Q4

If awarded by Deep Funding, we will allocate this percentage of the total token supply to SNET / Deep Funding:

5

Token Description (type, value, utility):

To be disclosed later

Proposal Video

Placeholder for Spotlight Day pitch presentations. Videos will be added by the DF team when available.

  • Total Milestones

    9

  • Total Budget

    $96,000 USD

  • Last Updated

    20 May 2024

Milestone 1 - API Calls & Hostings

Description

This milestone represents the required reservation of 25% of your total requested budget for API calls or hosting costs. Because it is required we have prefilled it for you and it cannot be removed or adapted.

Deliverables

You can use this amount for payment of API calls on our platform. Use it to call other services or use it as a marketing instrument to have other parties try out your service. Alternatively you can use it to pay for hosting and computing costs.

Budget

$24,000 USD

Milestone 2 - Design and Architecture

Description

We will first create the design and architecture for this platform. Note: This is for time series data only

Deliverables

Detailed design document with architecture endpoints and schemas etc.

Budget

$9,000 USD

Milestone 3 - Cloud Services Provisioning

Description

Creating, preparing, and activating the underlying infrastructure of a cloud environment.

Deliverables

Setting up the cloud environment for the platform

Budget

$3,000 USD

Milestone 4 - API with dummy implementation

Description

We will be creating a prototype with API calls and dummy implementation

Deliverables

API with dummy implementation

Budget

$5,000 USD

Milestone 5 - Synthetic data generation

Description

We will be creating the micro service for synthetic data generation

Deliverables

Synthetic data generation micro service

Budget

$10,000 USD

Milestone 6 - Challenge creation and admin APIs

Description

In this milestone we will create the micro-services for challenge creation and the admin APIs.

Deliverables

We will add the challenge creation and admin APIs as micro-services on the platform.

Budget

$10,000 USD

Milestone 7 - Evaluation APIs

Description

In this milestone we will be creating the model evaluation API micro service

Deliverables

We will be adding the model evaluation API micro service to the platform

Budget

$13,000 USD

Milestone 8 - Alpha - website and documentation

Description

Alpha release of SynapseChain platform with all micro services active. Complete documentation for all micro services.

Deliverables

SynapseChain platform - Alpha along with documentation

Budget

$13,000 USD

Milestone 9 - Beta release of SynapseChain platform

Description

Beta release of SynapseChain platform

Deliverables

SynapseChain platform - Beta release

Budget

$9,000 USD

Join the Discussion (1)


1 Comment
  • CLEMENT
    Jun 2, 2024 | 10:23 AM

    Great job Sudipto. I just want to ask: how do you plan on mitigating concerns such as data security, privacy protection, and algorithmic bias, which can impact user confidence in your platform's capabilities?

Reviews & Rating


8 ratings
  • TrucTrixie
    Jun 9, 2024 | 2:00 PM

    Overall

    4

    • Feasibility 4
    • Viability 4
    • Desirability 5
    • Usefulness 4
    Ways to attract suppliers

    Engaging the community, attracting new people, and attracting data providers is important. How can we attract them? The answer here is that the team must have enough appeal from the way it works to the performance, even the promotional communications. I hope the team will take note of my opinion.

  • CLEMENT
    Jun 2, 2024 | 10:42 AM

    Overall

    4

    • Feasibility 3
    • Viability 4
    • Desirability 4
    • Usefulness 4
    Addresses need for data privacy preservation

    I appreciate this proposer's innovative approach of leveraging the wisdom of the crowd to solve data problems without compromising privacy. For me, this is a sought-after initiative that will enable collaboration and innovation within the AI community while ensuring comprehensive audit and security measures to protect sensitive data. Put more clearly, because participants can engage in data challenges without sharing private data, the project promotes collaboration and knowledge sharing while safeguarding individual privacy rights.

    I believe that on the SNET AI Marketplace, the integration of services such as standard dataset services, synthetic database generation, model building, evaluation, challenge management, and reporting will enhance the marketplace's offerings, providing users with additional tools and resources for AI development and research.


  • Max1524
    Jun 8, 2024 | 3:58 PM

    Overall

    4

    • Feasibility 4
    • Viability 4
    • Desirability 4
    • Usefulness 4
    Expect to give a deadline for completion

    The lack of time representation at the 9 milestones greatly reduces the feasibility of this proposal (I think). Although I have to admit the analysis at each milestone is very good to commend. Or at least the team should demonstrate their professionalism by stating a specific time to complete this proposal. I look forward to that.

  • Gombilla
    Jun 10, 2024 | 10:44 AM

    Overall

    4

    • Feasibility 4
    • Viability 4
    • Desirability 4
    • Usefulness 4
    Ensures efficient data validation

    I would like to remind the team that handling datasets and ensuring their privacy and integrity involves significant security measures. Any lapses could lead to data breaches or unauthorized access. The team should beef up in this regard. 

    I would comment that the dataset service offered by this platform will ensure efficient data ingestion, validation, and persistence, enhancing the overall data management capabilities of the platform.

  • Joseph Gastoni
    May 22, 2024 | 1:01 PM

    Overall

    4

    • Feasibility 4
    • Viability 3
    • Desirability 3
    • Usefulness 4
    a platform for privacy-preserving AI development.

    This proposal outlines a platform (SynapseChain) for privacy-preserving AI development, allowing competitions on synthetic or real data without data sharing. Here's a breakdown of its strengths and weaknesses:

    Feasibility:

    • Moderate-High: The core functionalities (data management, synthetic data generation, model building/evaluation) leverage existing technologies.
      • Strengths: The concept builds on established microservice architecture and privacy-preserving techniques.
      • Weaknesses: Challenges might arise in ensuring the quality and representativeness of synthetic data, and the efficiency of handling large datasets.

    Viability:

    • Moderate: Success depends on attracting data providers, AI developers, the effectiveness of synthetic data generation, and the platform's ability to provide valuable solutions compared to existing approaches.
      • Strengths: The proposal addresses a need for privacy-preserving AI development in regulated industries like healthcare.
      • Weaknesses: The proposal lacks details on the incentive structure for data providers and the long-term sustainability of the platform.

    Desirability:

    • Moderate-High: For companies seeking to leverage AI on sensitive data while maintaining privacy, this could be desirable.
      • Strengths: The proposal offers a unique approach to collaborative AI development without data sharing.
      • Weaknesses: The proposal needs to demonstrate the accuracy and usefulness of AI models trained on synthetic data compared to real data.

    Usefulness:

    • Moderate-High: The project has the potential to accelerate AI innovation in privacy-restricted domains, but its impact depends on the quality of synthetic data and user adoption.
      • Strengths: The proposal offers functionalities for data privacy, synthetic data generation, and model development in a challenge-driven environment.
      • Weaknesses: The proposal lacks details on how the platform will address potential biases in synthetic data and the evolving nature of AI algorithms.

    Overall, the PrivacyProof AI Challenges project has a promising approach, but focus on:

    • Synthetic Data Quality: Demonstrating the effectiveness of GenAI in generating synthetic data that accurately represents real-world data for AI training.
    • Incentive Structure: Developing a clear plan for attracting data providers and AI developers to the platform, potentially through rewards or revenue sharing.
    • Long-Term Sustainability: Outlining a plan for the platform's long-term operation and financial viability.
    • Comparison with Existing Solutions: Providing a more comprehensive comparison with existing privacy-preserving AI solutions (federated learning, secure enclaves) highlighting SynapseChain's unique advantages.

    Strengths:

    • Focuses on privacy-preserving AI development using synthetic data and micro-services architecture.
    • Offers functionalities for data management, synthetic data generation, model building, and evaluation.
    • Addresses the challenge of data privacy in AI development for regulated industries.

  • BlackCoffee
    Jun 10, 2024 | 12:47 AM

    Overall

    4

    • Feasibility 4
    • Viability 4
    • Desirability 4
    • Usefulness 4
    Perceived risk of bias from the beginning

    Surely during the implementation process there will be data discrepancies and even information leaks. These are risks that the team must be aware of in order to be alert in advance and it is relevant to the long-term viability of the proposal. Can the team analyze so the community understands the solution it offers to minimize data discrepancies? Thank you team.

  • Tu Nguyen
    May 23, 2024 | 2:59 AM

    Overall

    4

    • Feasibility 4
    • Viability 4
    • Desirability 4
    • Usefulness 5
    PrivacyProof AI Challenges

    The problem this proposal will address is the risk of sharing personal data with third parties and startups. Solution: they would design a platform as a collection of microservices. Through this platform, users can use standard dataset services, create and manage synthetic databases, use the model building and evaluation service, use the challenge management service, and use the reporting service. This is a useful solution in practice. I hope they will implement it successfully.
    The project team is people with a lot of experience. They also completed a funded project in DF3.
    Another idea: They should determine the start and end times of milestones.

  • Nicolad2008
    Jun 7, 2024 | 3:03 PM

    Overall

    3

    • Feasibility 4
    • Viability 4
    • Desirability 3
    • Usefulness 3
    share health data

    This is an important solution to security and privacy issues in the digital age, aiming to create AI systems capable of effectively protecting user data. Using advanced encryption and machine learning algorithms, the project has a strong technical foundation for developing data security solutions. With extensive research on the application of AI in protecting privacy, PrivacyProof can provide methods to prevent fraud and protect users' personal information, while helping organizations comply with legal regulations such as GDPR. However, the project also faces significant challenges. Integrating AI into data security requires not only solving complex technical problems but also ensuring that privacy rights are not violated. Furthermore, user acceptance of AI in security is a major psychological obstacle, as many people may fear that AI will invade their privacy. To achieve optimal effectiveness and widespread adoption, PrivacyProof needs to address these challenges through technological innovation and by building user trust.

Summary

Overall Community

3.9

from 8 reviews
  • 5
    0
  • 4
    7
  • 3
    1
  • 2
    0
  • 1
    0

Feasibility

3.9

from 8 reviews

Viability

3.9

from 8 reviews

Desirability

3.9

from 8 reviews

Usefulness

4

from 8 reviews
