Risk-Aware Data Generation for SingularityNet Applications

Completed 👍
Kenric Nelson
Project Owner

Funding Awarded

$115,000 USD

Expert Review
0
Community
0 (0)

Status

  • Overall Status

    🥳 Completed & Paid

  • Funding Transferred

    $115,000 USD

  • Max Funding Amount

    $115,000 USD

Funding Schedule

Milestone Release 1
$5,000 USD Transfer Complete TBD
Milestone Release 2
$500 USD Transfer Complete TBD
Milestone Release 3
$15,000 USD Transfer Complete TBD
Milestone Release 4
$16,000 USD Transfer Complete TBD
Milestone Release 5
$16,000 USD Transfer Complete TBD
Milestone Release 6
$15,000 USD Transfer Complete TBD
Milestone Release 7
$13,000 USD Transfer Complete TBD
Milestone Release 8
$10,000 USD Transfer Complete TBD
Milestone Release 9
$24,500 USD Transfer Complete TBD

Status Reports

Apr. 11, 2023

Status
😀 Excellent
Summary

The problems with executing the Risk Assessment app on the SNET AI Marketplace have been resolved. The Risk-Aware Data Generator has now been launched on the marketplace, but it is currently encountering execution problems. We are working with the SNET team to debug them.

Jan. 19, 2023

Status
🤔 We encountered some issues
Summary

The Data Generator project is on pause until the integration to the SingularityNET Publisher is resolved for the Risk Assessment project.

Dec. 6, 2022

Status
🙂 Pretty good
Summary

During November, our projects stalled on the integration with SingularityNET. Our approach to resolving this problem has been two-fold: we reached back to SNET and obtained a commitment of support from Serguey Shalyapin, and a newly hired data scientist has agreed to review the issues.

Oct. 20, 2022

Status
😀 Excellent
Summary

Terrific. We have submitted our third report describing the performance capabilities and limitations of the Assessment and Data Generator services.

Aug. 20, 2022

Status
🙂 Pretty good
Summary

Our team is on track. We delivered our first report in mid-July and plan to deliver our second report on August 12th.

Video Updates

Photrek – Team meeting 8

15 March 2023

Photrek – Team meeting 7

15 March 2023

Photrek – Team meeting 3

15 March 2023

Photrek – Team meeting 1

15 March 2023

Photrek – Kickoff presentation

15 March 2023

Project AI Services

No Service Available

Overview

Photrek will provide a Coupled Variational Autoencoder as a service for the SingularityNet community. This algorithm will enable the learning of risk-aware models that can generate robust, accurate simulations and forecasts.

Proposal Description

Company Name

Photrek provides services to improve machine intelligence for complex systems. Photrek's current projects include environmental detection systems, decentralized governance, and development of risk-aware machine learning algorithms.

Problem Description

SingularityNet applications, such as financial trading, sustainability, and health and longevity, require training on voluminous, reliable datasets. Often, the relevant datasets that can be collected contain gaps. There is a need to fill these gaps with accurate simulations based on models that are aware of the applications' underlying risks.

Solution Description

Photrek will use a Coupled Variational Autoencoder (C-VAE) machine learning (ML) algorithm to create risk-aware datasets. The algorithm comprises three components: a variational autoencoder capable of learning probabilistic models; a Coupled Evidence Lower Bound (C-ELBO) that generalizes the log-likelihood and divergence metrics with a tunable risk tolerance; and a dynamic time-series model.
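
To make the idea of a tunable risk tolerance concrete, the sketch below applies a coupled generalization of the surprisal -ln(p) to a batch of likelihoods. It is a minimal illustration, not Photrek's C-ELBO: we assume the simplified form ln_κ(1/p) = (p^(-κ) - 1)/κ, which recovers -ln(p) as κ → 0 but may omit terms (such as a dimension dependence) that the NSC library's own definitions include. The function name and the sample likelihoods are hypothetical.

```python
import math
from statistics import fmean

def coupled_surprisal(p: float, kappa: float) -> float:
    """Simplified coupled surprisal ln_kappa(1/p) = (p**(-kappa) - 1) / kappa.

    Assumption: illustrative form only; as kappa -> 0 it recovers -ln(p).
    Larger kappa makes low-likelihood samples cost more.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be a likelihood in (0, 1]")
    if kappa == 0.0:
        return -math.log(p)
    return (p ** (-kappa) - 1.0) / kappa

# Hypothetical per-sample reconstruction likelihoods; the last one is an outlier.
likelihoods = [0.9, 0.8, 0.85, 0.01]

for kappa in (0.0, 0.1, 0.5):
    loss = fmean(coupled_surprisal(p, kappa) for p in likelihoods)
    print(f"kappa={kappa}: mean loss {loss:.3f}")
```

With κ > 0 the outlier likelihood of 0.01 contributes a larger share of the mean loss than it does at κ = 0, which is the sense in which raising the coupling increases the cost of low-likelihood events.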

Project Benefits for SNET AI platform

The Photrek team will host the Coupled Variational Autoencoder (CVAE) algorithm on the SingularityNet marketplace. The CVAE service will give the SNET community the ability to learn risk-aware probabilistic models. These models will have interpretable features, will be able to generate datasets and classify objects, and will be tunable to a desired degree of risk tolerance. The Photrek team held problem-identification meetings with SingularityDAO, SingularityNet, and Rejuve. These discussions identified the ability to fill gaps in real-world data as crucial to improving the machine learning forecasting pipeline. For instance, SingularityDAO continuously collects market data, but the data often contain gaps. The Rejuve team needs to integrate cohort datasets with differing degrees of completeness. And the sustainability team is seeking to ensure that the complexity of climate models does not lead to overfitting and thus over-confident forecasts. The Photrek Coupled VAE algorithm will provide simulated interpolation of these gaps that can be tuned to the degree of risk desired for the model.

 

Competitive Landscape

Within ML/AI, two principal technologies have been used as data generators: the VAE and the Generative Adversarial Network (GAN). While GANs have been highly successful on metrics of generator performance, like many deep learning methods the resulting networks are difficult to interpret. In contrast, the VAE produces a probabilistic model whose dimensions can be used to control semantic features. To enhance the independence of the features, Google DeepMind developed the beta-VAE, which increases the relative weight of the divergence between the probabilistic model and a prior with independent dimensions. While this technique has succeeded in improving “disentanglement,” it comes at the cost of reconstruction performance. The coupled VAE provides a tunable improvement in both the latent probability divergence and the reconstruction. Rejuve’s Deborah Duong explained that she is currently using the beta-VAE technique, so this will serve as a good test case for Photrek to demonstrate its competitive advantage.
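
For reference, the beta-VAE weighting described above can be written as a negative ELBO whose KL term is scaled by beta. The sketch assumes a diagonal Gaussian encoder and a standard normal prior, uses the closed-form Gaussian KL divergence, and takes the decoder's reconstruction negative log-likelihood as a precomputed number; the helper names are our own, not from any library.

```python
import math
from typing import Sequence

def gaussian_kl(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims."""
    return sum(0.5 * (m * m + math.exp(lv) - 1.0 - lv) for m, lv in zip(mu, log_var))

def beta_vae_loss(recon_nll: float, mu: Sequence[float],
                  log_var: Sequence[float], beta: float = 1.0) -> float:
    """Negative ELBO with the KL term weighted by beta (hypothetical helper).

    beta = 1 is the standard VAE; beta > 1 pushes the posterior toward the
    factorized prior (more disentanglement) at the expense of reconstruction.
    """
    return recon_nll + beta * gaussian_kl(mu, log_var)

# Hypothetical encoder outputs for one sample and a precomputed reconstruction NLL.
loss = beta_vae_loss(recon_nll=12.0, mu=[0.5, -0.2], log_var=[-1.0, 0.0], beta=4.0)
print(f"{loss:.3f}")
```

Setting beta = 1 recovers the ordinary VAE objective; raising beta penalizes any departure of the posterior from the factorized prior more heavily, which is the disentanglement versus reconstruction trade-off noted above.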

Marketing & Competition

Robust, accurate models that can be used to generate data and forecasts are crucial for a variety of markets. Through participating in the NSF Cornell I-Corps Customer Discovery program in Summer 2021, we engaged over 30 industry leaders to identify “pain points” in business forecasting. A common theme was the need to address datasets with gaps, a demand this proposal works to satisfy. As one example, Photrek recently completed an analysis of the severe weather forecasting market. The global weather forecasting systems market is anticipated to grow to $4 billion to $10 billion by 2028 at a Compound Annual Growth Rate (CAGR) of between 5% and 8%, according to Emergen of British Columbia and Grand View Research of San Francisco, respectively (Grand View Research, 2020; Emergen, 2021). Both market reports cite big data analytics and the development of Internet-of-Things (IoT) and AI capabilities as drivers of this growth. However, they identify inconsistent and incomplete data, along with the complexity of modeling and forecasting, as limiting factors.

To initiate our marketing efforts within the SingularityNET community, Photrek discussed modeling, data generation, and forecasting needs with three teams: the Climate Sustainability team (Matt Ikle), the SingularityDAO team working on financial forecasting (Nejc Znidar), and the Rejuve team working on longevity (Deborah Duong). A common thread in these discussions was the challenge of training ML algorithms on sporadic datasets. In May, Photrek is scheduled to brief Ben Goertzel on the DC-VAE technology, and we are keen to leverage his guidance on market needs for robust models of complex data.

Needed Resources

The Photrek team expects to collaborate closely with the SingularityNet developers on the integration requirements. Our costs include an additional AI programmer to work with the identified team members. Once the integration is complete, we will work closely with the SingularityNet marketing team to promote the application.

Long Description

Time series data often suffer from missing values, with data either missing completely at random (MCAR) or, more critically, missing not at random (MNAR). We use the generative capabilities of Variational Autoencoders (VAEs) to fill in (impute) these missing data. In particular, we will apply VAE-based techniques developed through our research into coupled systems and implement emerging methods that bring dynamical systems approaches into VAEs.
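
The distinction between MCAR and MNAR matters because, under MNAR, the observed values are themselves a biased sample. The following sketch, using only Python's standard library, simulates both mechanisms on synthetic Gaussian data; the threshold-censoring rule for MNAR (values above 1.0 go unrecorded) and the helper names are hypothetical choices made for illustration.

```python
import random
from statistics import fmean

def mcar_mask(values, rate, rng):
    """MCAR: every point is dropped with the same probability, independent of its value."""
    return [rng.random() < rate for _ in values]

def mnar_mask(values, threshold):
    """MNAR (illustrative): values above a threshold go unrecorded,
    so whether a point is missing depends on the unobserved value itself."""
    return [v > threshold for v in values]

def observed_mean(values, mask):
    """Mean of the points that were actually recorded."""
    return fmean(v for v, missing in zip(values, mask) if not missing)

rng = random.Random(0)
series = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

true_mean = fmean(series)
mcar = observed_mean(series, mcar_mask(series, 0.3, rng))
mnar = observed_mean(series, mnar_mask(series, 1.0))
print(f"true {true_mean:+.3f}  MCAR {mcar:+.3f}  MNAR {mnar:+.3f}")
```

Under MCAR the mean of the observed points stays close to the true mean, while under the censoring rule it is pulled well below it. An imputation model such as a VAE therefore has to learn what the missing values are likely to be rather than simply average the ones it can see.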

 

Figure 1: Basic Autoencoder

These ideas grew out of research originally conducted in commercial and academic settings. Working with Constantino Tsallis, External Faculty Fellow of the Santa Fe Institute, and Thistleton as consultants, Nelson and his team demonstrated the use of a generalized entropy to measure hidden structure in complex signals (Marsh and Nelson, 2005), invented the generalized Box-Muller method to generate q-Gaussians, which are equivalent to the coupled Gaussians described in the current work (Thistleton et al., 2007), and contributed to the understanding of the origins of long-range correlations. Building upon this research, Nelson developed methods that improved probability forecasting algorithms. Nelson continued to develop these ideas while conducting fundamental research as a Research Professor at Boston University: proving the role of nonlinear statistical coupling (NSC) in modeling the statistics of nonlinear systems (Nelson et al., 2017), discovering new statistical estimators for heavy-tailed distributions (Nelson, 2020), and prototyping the coupled VAE algorithm.
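
The generalized Box-Muller method cited above can be sketched as follows, based on our reading of Thistleton et al. (2007): map q to q' = (1 + q)/(3 - q) and replace the natural logarithm in the classic Box-Muller radius with the Tsallis q-logarithm ln_q(x) = (x^(1-q) - 1)/(1 - q). It draws q-Gaussian deviates for q < 3; the function names are hypothetical and this is not code from the NSC library.

```python
import math
import random

def q_log(x: float, q: float) -> float:
    """Tsallis q-logarithm (x**(1-q) - 1) / (1-q); equals ln(x) at q = 1."""
    if q == 1.0:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def q_gaussian_deviate(q: float, rng: random.Random) -> float:
    """One q-Gaussian deviate via the generalized Box-Muller method (requires q < 3)."""
    if q >= 3.0:
        raise ValueError("q must be < 3")
    q_prime = (1.0 + q) / (3.0 - q)
    u1 = 1.0 - rng.random()  # in (0, 1], so the logarithm is always defined
    u2 = rng.random()
    # ln_q'(u1) <= 0 on (0, 1]; max() guards against tiny negative rounding residue.
    radius = math.sqrt(max(0.0, -2.0 * q_log(u1, q_prime)))
    return radius * math.cos(2.0 * math.pi * u2)

rng = random.Random(42)
light_tailed = [q_gaussian_deviate(0.5, rng) for _ in range(5)]
heavy_tailed = [q_gaussian_deviate(2.0, rng) for _ in range(5)]
```

For q = 1 the method reduces to the ordinary Box-Muller transform; q < 1 gives deviates with compact support (bounded by sqrt(2/(1 - q')) in this construction), and 1 < q < 3 gives heavy tails.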

A Python library integrated with Google’s TensorFlow (and extendable to PyTorch) has been developed to support this work. It includes Nonlinear Statistical Coupling (NSC) (Clements et al.), a collection of functions in both Python and Mathematica for modeling the generalization of information theory to nonlinear complex systems, and Coupled-VAE (C-VAE) (Chen et al.), which contains the Python software for learning robust models. The project supported under this solicitation will make these techniques available to the SingularityNet community.

 

Figure 2: A variational autoencoder (image from 

The Dynamic Coupled VAE (DC-VAE) will learn dynamic models of time series events with adjustable levels of risk tolerance. This technology builds on the Coupled VAE algorithm to combine deep learning, probabilistic programming, and complex systems theory to learn complex models (deep learning) while maintaining interpretability (probabilistic programming) and strengthening robustness against rare events (complex systems theory).  The work supported under this solicitation will complete offline prototype design and testing. We will propose a subsequent project which will integrate this capability into the SingularityNet framework.

We are confident that the DC-VAE approach will result in learning robust, accurate forecasts, based upon studies we have completed on image processing tasks with the coupled-VAE algorithm. A VAE uses neural networks to encode and decode a probabilistic layer that stores an interpretable model. We have obtained improvements in the reconstruction of MNIST images as the coupling value of the negative Evidence Lower Bound (ELBO) is increased. Increasing the coupling raises the cost of low-likelihood events and drives training toward a model that reduces the extremes of the reconstructed log-likelihood. The result is an improvement in both accuracy and robustness. The accuracy is measured by the geometric mean of the likelihoods, a translation of the information-theoretic average log-likelihood; the robustness is measured by a generalized mean, a translation of a generalized information-theoretic metric. The decisiveness, measured by the arithmetic mean of the likelihoods, is sensitive to the best-performing likelihoods and is closely related to the classification performance of a decision algorithm.
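
The three metrics named above, and in the Figure 3 caption, are all generalized (power) means of the per-sample likelihoods: the geometric mean for Accuracy, the -2/3 generalized mean for Robustness, and the arithmetic mean for Decisiveness. A minimal sketch follows; the input likelihoods are hypothetical, and the function names are our own rather than the NSC library's.

```python
import math
from typing import Dict, Sequence

def generalized_mean(likelihoods: Sequence[float], r: float) -> float:
    """Power mean (mean(p**r))**(1/r) of positive likelihoods; r = 0 is the geometric mean."""
    n = len(likelihoods)
    if r == 0.0:
        return math.exp(sum(math.log(p) for p in likelihoods) / n)
    return (sum(p ** r for p in likelihoods) / n) ** (1.0 / r)

def likelihood_metrics(likelihoods: Sequence[float]) -> Dict[str, float]:
    """Decisiveness, Accuracy, and Robustness as power means with r = 1, 0, -2/3."""
    return {
        "decisiveness": generalized_mean(likelihoods, 1.0),
        "accuracy": generalized_mean(likelihoods, 0.0),
        "robustness": generalized_mean(likelihoods, -2.0 / 3.0),
    }

# Hypothetical reconstruction likelihoods with one very poor reconstruction.
print(likelihood_metrics([0.9, 0.7, 0.8, 1e-4]))
```

Because the power mean is non-decreasing in its exponent, Robustness ≤ Accuracy ≤ Decisiveness for any set of likelihoods, with equality only when all likelihoods are equal. A single near-zero likelihood, like 1e-4 above, drags Robustness down far more than Decisiveness, which is why Robustness tracks the worst reconstructions.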

The generalized information-theoretic tools used in this project have achieved success by highlighting inadequacies in existing forecasting methods (Nelson and Brooks, 2020). In prior work, Nelson uncovered similar issues of over-confidence (Tgavalekos et al. 2010) in otherwise highly sophisticated detection systems. 

 

Figure 3:  Reconstruction Histogram of Coupled-VAE from (Cao et al.)

Shown are histograms of the likelihood of each reconstructed image. On the left is the standard VAE (κ = 0) with input corrupted by shot noise. On the right is the Coupled VAE (κ = 0.1), which shows improvements of many orders of magnitude in Robustness and Accuracy. The Accuracy (blue line) is the geometric mean of the likelihoods and is a translation of the information-theoretic metric, the average log-likelihood. The Robustness (green, -2/3 generalized mean) and Decisiveness (red, arithmetic mean) are translations of a generalized information-theoretic metric and are sensitive to the worst and best likelihoods, respectively.

AI Services

Proposal Video

Placeholder for Spotlight Day pitch presentations. Videos will be added by the DF team when available.

  • Total Milestones

    9

  • Total Budget

    $115,000 USD

  • Last Updated

    19 Feb 2024

Milestone 1 - Project start

Status
😀 Completed
Description

Signed Contract

Deliverables

Budget

$5,000 USD

Milestone 2 - Pre-launch Prep

Status
😀 Completed
Description

CVAE Software & Presentation

Deliverables

Budget

$500 USD

Milestone 3 - Design Reqs

Status
😀 Completed
Description

Integration & Dynamic Designs

Deliverables

Budget

$15,000 USD

Milestone 4 - Prototypes

Status
😀 Completed
Description

Prototype Integration & Dynamic Code

Deliverables

Budget

$16,000 USD

Milestone 5 - Testing

Status
😀 Completed
Description

Test Prototypes

Deliverables

Budget

$16,000 USD

Milestone 6 - Customer Data

Status
😀 Completed
Description

Processing of Datasets

Deliverables

Budget

$15,000 USD

Milestone 7 - Marketing Service

Status
😀 Completed
Description

Presentation & Whitepaper

Deliverables

Budget

$13,000 USD

Milestone 8 - Final Report

Status
😀 Completed
Description

CVAE and DCVAE Results

Deliverables

Budget

$10,000 USD

Milestone 9 - Hosting Costs

Status
😀 Completed
Description

Collaboration with SNET

Deliverables

Budget

$24,500 USD

Reviews & Rating

New reviews and ratings are disabled for Awarded Projects

0 ratings

Summary

Overall Community: 0 (from 0 reviews)
Feasibility: 0 (from 0 reviews)
Viability: 0 (from 0 reviews)
Desirability: 0 (from 0 reviews)
Usefulness: 0 (from 0 reviews)