Long Description
Company Name
Temporai
Summary
Model-Agnostic Uncertainty Quantification (MAUQ) is a service that automatically adds a probabilistic prediction [interval or distribution] to any point prediction problem, regardless of the underlying model [statistical or ML], problem type [regression or classification], data type [tabular or time-series], or scientific/industry domain. This makes users more aware of the uncertainty, risks, and pitfalls of their respective problem spaces, and lets them better evaluate their confidence in their underlying model's predictions.
The team behind MAUQ has also proposed and is currently implementing the SIBYL and Onboard NeuralProphet services, both of which were awarded in DF2. Although MAUQ is a standalone service, it can also be bundled with SIBYL and NeuralProphet as a suite of core infrastructural AI services on the SNET platform, which together can benefit other AI services and end applications alike.
Funding Amount
$72,000
The Problem to be Solved
Suppose that you are a general manager of a retail store that sells widgets. You ask your financial team to provide a sales forecast of the widgets for the next half year. The team runs their historical sales, transaction, and customer acquisition numbers through their models and comes back to you with their forecast. They project that sales will grow by an average of 12.5% month-over-month, essentially doubling in 6 months from today. You, as a shrewd manager, are naturally skeptical of these glowing forecasts and worry that this might be another case of over-optimistic forecasting with detrimental outcomes.
The scenario above highlights the nefarious nature of single-point forecasts. Yet they are also the most common type of forecast. What is the temperature tomorrow afternoon? 25 degrees Celsius, according to your phone, probably. What will the average price of Bitcoin be by 2030? $221,510.19, according to one online prediction; last time I checked, it assumed an increase of $26,060.03 year-over-year from 2023. Sure. The point is that single-point forecasts paint an illusion of a deterministic future for those who take them at face value without questioning them too much. Often a more optimistic and glowing future, leaving the decision-maker blind to potential downsides and pitfalls. Hence the nefariousness.
Fortunately, there is a second type of forecast to consider: the probabilistic forecast. Rather than conveying just a point value, this type of forecast conveys a range of values at some level of confidence. What is the temperature tomorrow afternoon? A range of 20-30 degrees Celsius with 90% confidence. Why such a large range? Maybe the afternoon is very cloudy and the sun is blocked completely, or maybe there are no clouds at all. What will the average price of Bitcoin be by 2030? A range of $50,000-$500,000 with 75% confidence. Why such a vast range with so little confidence? Maybe it is incredibly hard, if not impossible, to predict such a volatile asset seven years out, which is a couple of lifetimes in this space. For example, who knows what crypto regulations and fiat monetary policies different nations will roll out by then, all of which significantly impact the Bitcoin price. Not to mention the state of cryptography (e.g., quantum computing disruption?), ASIC chips, electricity prices, scams/exploits, institutional adoption like ETF rollouts, and other altcoins. There are many compounding and interacting variables to consider over such a long timeframe.
Because the forecast is framed probabilistically, the decision-maker becomes more uncertainty-aware and critical of the veracity of the forecasts themselves. They will naturally raise the essential follow-up questions and considerations based on the forecasts (as illustrated in the last paragraph), which can lead to more robust and better thought-out decisions.
Our Solution
Our solution is to automatically convert any point forecast (already by far the more common kind) into a probabilistic forecast, thereby removing the statistical and algorithmic legwork from the user.
API Schema
For simplicity's sake, MAUQ has only one service with one primary API function for users to call:
- Quantify Uncertainty: the user inputs a calibration dataset and config parameters, MAUQ fits the uncertainty measures, and the dataset is returned to the user with the uncertainty measures attached, in JSON or another compatible file format.
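As an illustration, a hypothetical Quantify Uncertainty request body might look like the sketch below. The field names here are our own assumptions for demonstration, not the final MAUQ schema (see the attached schema picture for that):

```python
# Hypothetical sketch of a "Quantify Uncertainty" request payload.
# All field names are illustrative assumptions, not the final MAUQ API schema.
import json

request = {
    "config": {
        "problem_type": "regression",      # or "classification"
        "data_type": "time-series",        # or "tabular"
        "uq_type": "prediction_interval",  # or "distributional"
        "confidence_level": 0.90,
    },
    # Calibration rows: model point predictions (yhat) alongside actuals (y)
    "calibration_data": [
        {"ds": "2015-11-01T00:00", "y": 950.0, "yhat": 930.5},
        {"ds": "2015-11-01T01:00", "y": 910.0, "yhat": 925.1},
    ],
}

payload = json.dumps(request)  # serialized body sent to the service
print(payload[:40])
```

The response would then carry the same rows back with the fitted uncertainty measures (e.g., interval bounds) attached to each prediction.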
See the attached MAUQ API Schema.png for an enlarged picture.
Two Types of Uncertainty Quantifications (UQ)
- Prediction Interval (PI): provides a range of predicted values between the upper and lower quantiles, given by the specified confidence level or, inversely, the error rate. The higher the confidence level, the wider the interval (i.e., the more distance between the quantiles).
- Distributional: a continuous set of all predicted values with probabilities > 0%. The predicted values (X-axis) and their probabilities form a probability density function (PDF), as shown below. The PDF allows you to get the model's probability of being over or under any given value, or between two values a and b.
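To make the distributional output concrete, here is a minimal sketch of how such probabilities are read off a predictive distribution, assuming (purely for illustration) a Gaussian predictive distribution and made-up numbers from the temperature example:

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of a Gaussian predictive distribution N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Suppose the model's predictive distribution for tomorrow's temperature
# is N(mu=25, sigma=3) degrees Celsius (illustrative assumption).
mu, sigma = 25.0, 3.0

# Probability the outcome exceeds a given value...
p_over_30 = 1.0 - normal_cdf(30.0, mu, sigma)
# ...or lands between two values a and b.
p_20_to_30 = normal_cdf(30.0, mu, sigma) - normal_cdf(20.0, mu, sigma)

print(round(p_over_30, 3))   # about 0.048
print(round(p_20_to_30, 3))  # about 0.904
```

In MAUQ, the distribution itself would come from the service's UQ step rather than being assumed Gaussian; the read-out of over/under/between probabilities works the same way for any CDF.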
What Problem Cases MAUQ Covers
MAUQ covers problem cases along the following three dimensions (3D):
- X: whether the data type is tabular or time-series
- Y: whether the problem type is regression (predict continuous values) or classification (predict categories)
- Z: whether the probabilistic output (or UQ type) is a PI or a distribution
Evaluation Metrics
Here are metrics that evaluate the quality and effectiveness of the UQ method that is used:
- Prediction Interval (PI) width: the distance between the lower and upper quantiles, used for PI outputs
- Continuous Ranked Probability Score (CRPS): compares a single ground-truth value to a cumulative distribution function (CDF); a generalization of the Mean Absolute Error (MAE) metric, used for distributional outputs
- Miscoverage rate: the percentage of true values that fall outside the predicted range for a given confidence level (also 1 - Coverage Rate), used for both PI and distributional outputs
These evaluation metrics are derived from the final test set, which should neither be used to train the underlying model nor quantify its uncertainty. In addition, for all of these metrics, the lower the value, the better the performance.
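As a rough sketch of how these three metrics are computed, here is a toy illustration on made-up test-set numbers (the CRPS uses the standard sample-based ensemble formula; all data is invented for demonstration):

```python
# Toy illustration of the three UQ evaluation metrics; lower is better.
y_true = [1000.0, 1100.0, 1250.0, 980.0]   # ground-truth values
lower  = [ 950.0, 1020.0, 1150.0, 1000.0]  # PI lower quantiles
upper  = [1050.0, 1120.0, 1250.0, 1100.0]  # PI upper quantiles

# Prediction Interval width: average distance between the quantiles
pi_width = sum(u - l for l, u in zip(lower, upper)) / len(y_true)

# Miscoverage rate: fraction of true values outside their interval
miscoverage = sum(not (l <= y <= u)
                  for y, l, u in zip(y_true, lower, upper)) / len(y_true)

# Sample-based CRPS of one observation y against an ensemble of samples
# drawn from the predictive distribution:
#   CRPS ~= mean|X - y| - 0.5 * mean|X - X'|
def crps_ensemble(samples, y):
    n = len(samples)
    term1 = sum(abs(x - y) for x in samples) / n
    term2 = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return term1 - 0.5 * term2

crps = crps_ensemble([1.0, 2.0, 3.0], 2.0)  # tiny 3-sample ensemble vs y=2
print(pi_width, miscoverage, round(crps, 3))  # 100.0 0.25 0.222
```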
A Concrete Example
As a concrete example, we have an hourly univariate time-series of the electricity consumption of a San Francisco (SF) hospital in 2015. Below is the plot of the electricity consumption in kilowatts (kW) from the beginning of January to the beginning of May. You can see it has a daily seasonality hovering between 800 kW and 1,350 kW, because the hospital is busiest at midday and least busy at midnight.
Next, I partitioned this annual data into train, calibration, and test sets. Here are the approximate time ranges of the 3 datasets:
- Train set: January-October 2015
- Calibration set: November 2015
- Test set: December 2015
I trained a NeuralProphet model with autoregressive (AR) lags on the train set and forecasted on the test set with a single-timestep forecast horizon. Here is the plot of the last week of the test set (the last week of 2015, during the winter holiday). The blue line and blue crosses are the model-predicted values, while the black dots are the actual energy consumption values:
You can see that overall the model prediction hugs the actual values pretty closely, but it underestimates them a bit at peak consumption. Here is the tabular data output of this same data for the first half of Christmas Day:
Here is the plot again, but this time including a red prediction interval (PI) at 90% confidence. In other words, at least 90% of the actual values should fall within the red PI. The uncertainty quantification that produces this interval is a basic split conformal prediction method, using the absolute residuals from the holdout calibration set:
Here, you can see almost all of the actual values falling within the red PI, including the peak values that the NeuralProphet model underestimates. Below is the same tabular data output, but this time including the uncertainty quantification metrics, including yhat1 +- qhat1, which form the red PI bounds:
You can find this full example in the attached uncertainty_conformal_prediction.ipynb Jupyter Notebook file below. The MAUQ service will wrap up this conformal prediction code (and more) to automatically provide these uncertainty quantification metrics (and more) to the existing model point predictions (y).
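The split conformal step used in this example can be sketched in a few lines. This is a simplified stand-in for the notebook's code, with made-up calibration numbers rather than the hospital data:

```python
# Minimal sketch of split conformal prediction with absolute residuals.
# The calibration numbers below are illustrative, not the hospital dataset.
import math

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    return sorted(scores)[min(k, n) - 1]

# Calibration set: actual values y and the model's point predictions yhat
y_cal    = [1000, 1200, 900, 1100, 1050, 950, 1150, 1000, 980, 1020]
yhat_cal = [ 990, 1180, 930, 1120, 1040, 970, 1130, 1010, 990, 1000]

# Nonconformity scores: absolute residuals on the calibration set
scores = [abs(y - yh) for y, yh in zip(y_cal, yhat_cal)]

alpha = 0.10                        # i.e., a 90% confidence PI
qhat = conformal_quantile(scores, alpha)

# Test-time interval around any new point prediction: yhat +- qhat
yhat_new = 1080.0
lo, hi = yhat_new - qhat, yhat_new + qhat
print(qhat, lo, hi)  # qhat = 30 -> interval [1050.0, 1110.0]
```

The width of the resulting interval is constant (2 * qhat) for every test point, which is exactly the behavior of the basic split conformal method; more advanced methods adapt the width per point.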
Note: if the user opts for a distributional output instead of a PI output, then the user will see distributional measures like variance, degrees-of-freedom (DoF), and/or skewness instead of upper/lower PI bounds like yhat1 +- qhat1. This depends on the distributional types available in the MAUQ service, as stated in the Two Types of Uncertainty Quantifications (UQ) section.
Marketing Strategy
Before laying out our current marketing strategy, we would like to first convey our AGI Vision and SNET Mission, as they are the ethos of our marketing approach:
Our AGI Vision
We envision AGI (at least in its early form) to be a swarm (or mesh network) of relatively bite-sized narrow intelligences, just like a network of independent miners/validators forming a decentralized blockchain, or layers of information-encoding neurons forming a deep neural network (DNN). The rationale behind this vision is that the vast datasets from various domains and industries are simply too heterogeneous for any singular AI model to master, as suggested by the No Free Lunch theorem. This holds no matter how many billions or trillions of parameters such a god model contains, or how deep its logic tree or knowledge graph goes.
Instead, these bite-sized narrow intelligences will specialize in their respective data/domain niches, which is how AI is progressing now. Whether today's AI can leap into AGI v1.0 depends on whether these narrow intelligences can then trade their specialized capabilities with each other: a mutually beneficial exchange that can augment each narrow intelligence's own capabilities with the others'. Like trading in the real world, this trade between AIs likely involves a monetary transaction. This is where crypto, due to its decentralized nature, can become the means of transaction between untrusting narrow intelligences, particularly those from different entities.
Here is an excellent article by IntoTheBlock that touches further on Decentralized AI and how blockchain can turn narrow AIs (when encapsulated into applications or services) into economic agents. Another informative piece by Francesco C. breaks down the different subgenres of Distributed Artificial Intelligence (DAI), from swarm intelligence to multi-agent systems (MAS).
Our SNET Mission
Our mission is to build this swarm of core infrastructural AI services for the SNET platform, for the greater benefit of other AI services on the platform as well as other end-user applications by augmenting their capabilities.
Our team is currently developing the following two AI services that were awarded in Deep Funding Round 2 (DF2):
- SIBYL: an automated, general-purpose forecaster that is made up of an ensemble of models. See the project page.
- Onboard NeuralProphet: a PyTorch-based forecaster that also serves as a base model for SIBYL. See the project page.
In addition to NeuralProphet, we are also considering onboarding SNET's existing Time-Series Forecasting service into SIBYL as a base model, depending on its performance. See the service page.
In the same vein, SIBYL will call on this MAUQ service not as another base model, but rather to convert SIBYL's point forecasts into probabilistic forecasts before outputting them to the user. The Onboard NeuralProphet service (by itself) can also call on MAUQ to get probabilistic forecasts, as illustrated in the A Concrete Example section.
Finally, SIBYL can be called by downstream Apps, Agents, and other Services that need its forecasting capabilities. One example is the future Soil Health Ecosystem Modeling Tools; see our separate Soil Health ideation proposal in DF3 - Pool Ideation for more details about the ecosystem modeling tools.
The combination of Onboard NeuralProphet, SIBYL, and MAUQ forms our starting core infrastructural AI services. They are all designed to be domain-agnostic and plug-and-playable so that they can be widely used by the other Apps and Services populating the SNET platform, including each other. Of course, calling these services will involve a monetary transaction in AGIX tokens (e.g., 0.012 AGIX for SNET's TSF as a demo), so these services will become economically incentivized, or at least self-sufficient enough to offset their respective computing costs.
Here is a visual graphic of how these AIs are connected to each other, forming a collective/swarm intelligence on the SNET platform:
Marketing and Adoption Approach for MAUQ
With our high-level vision/mission laid out, here is our marketing and adoption approach for MAUQ, on the following two levels:
- MAUQ as part of the AI core infrastructure bundle: in the SIBYL proposal and related materials, we have mentioned a variety of forecasting applications, from climate and energy to retail and crypto. As we drive adoption of SIBYL for these applications, we can also bundle MAUQ with SIBYL as a value-augmenter, which helps the application user be more uncertainty-aware and not take the point forecast at face value.
- MAUQ as a standalone service: aside from bundling with SIBYL, MAUQ is also a standalone service that can work for non-time-series, non-forecasting applications as well. Therefore, we also aim to promote MAUQ on its own merits. This is where our advisor Valeriy M. comes in, a prominent AI researcher and data scientist in the fields of uncertainty quantification (UQ) and conformal prediction. He maintains a popular GitHub repo containing a curated list of Conformal Prediction materials. As of this writing, the repo has received 2,500+ stars, indicating a rapidly growing interest in the UQ space among fellow AI practitioners and developers, who are also MAUQ's primary target user base. In addition, we have Mahdi T.R. on our team, who is also a YouTube content creator. He has posted multiple introductory videos around Conformal Prediction, with a couple of videos garnering 1,000+ views each. Therefore, we have several outreach channels to promote our MAUQ service outside of the SNET/crypto ecosystem and orient it toward the greater AI/data science community.
Our Project Milestones and Cost Breakdown
Cost breakdown
The proposed budget for MAUQ is $72,000. Here is the overall cost breakdown:
- Modeling and AI R&D ("R&D"): $30,000
- Software Engineering and Integration ("SWE"): $14,000
- Service Product Management ("PM"): $6,000
- Company Operations, Marketing, and Buffer ("OPS"): $4,000
- Service Host and API Calls ("API"): $18,000
The Service Host and API Calls line item meets the 25% reservation rule ($18,000 of $72,000).
Milestones
Here is the current milestones list and details:
Milestone 0 (M0)
- Description: finalize milestone details and contract signing
- Deliverables: finalize milestone details (including allocation of 40% API Calls across development milestones), sign contract with SNET, assemble team, and set up tooling for MAUQ development (including JIRA, GitHub, etc.)
- Expected time to completion: 1-2 weeks
- Budget: OPS $2,000 -> M0 Total $2,000
Milestone 1 (M1)
- Description: MAUQ architectural design report
- Deliverables: MAUQ architectural design report with the API schema, an elaboration of conformal prediction and other UQ techniques, tools and libraries used, models and AI techniques used, evaluation metrics, and applications (comparable to SIBYL's design report)
- Expected time to completion: 2-3 weeks
- Budget: R&D $5,000 -> M1 Total $5,000
Milestone 2 (M2)
- Description: MAUQ v0.0: set up AWS and the API gateway for the MAUQ service
- Deliverables: spin up an AWS cloud environment (i.e., EC2 instance, Lambda, or Elastic Beanstalk), set up the API gateway in accordance with the API schema, and test hardcoded API calls and responses using Postman [Note: this is basically a skeleton version of the MAUQ service without any of the UQ features]
- Expected time to completion: 2-3 weeks
- Budget: SWE $4,000 | PM $1,000 -> M2 Total $5,000
Milestone 3 (M3)
- Description: MAUQ v0.1: Tabular Predictive Interval (PI) with Regression
- Deliverables: research & develop UQ for tabular data inputs with PI probabilistic outputs for regression problems only, include PI width and miscoverage rate as evaluation metrics, and deploy v0.1 to the given AWS environment with API gateways available
- Expected time to completion: 3-4 weeks
- Budget: R&D $5,000 | SWE $1,000 | PM $500 -> M3 Total $6,500
Milestone 4 (M4)
- Description: MAUQ v0.2: Tabular Predictive Interval with Classification
- Deliverables: research & develop UQ for tabular data inputs with PI probabilistic outputs for classification problems, and deploy v0.2 to the given AWS environment with API gateways available
- Expected time to completion: 2-3 weeks
- Budget: R&D $3,000 | SWE $1,000 | PM $500 -> M4 Total $4,500
Milestone 5 (M5)
- Description: MAUQ v0.3: Time-Series Predictive Interval with Regression & Classification
- Deliverables: research & develop UQ for [stationary] time-series data inputs with PI probabilistic outputs for both regression and classification problems, and deploy v0.3 to the given AWS environment with API gateways available
- Expected time to completion: 3-5 weeks
- Budget: R&D $5,000 | SWE $1,000 | PM $500 -> M5 Total $6,500
Milestone 6 (M6)
- Description: MAUQ v0.4: Conformal Predictive Distributions (CPD)
- Deliverables: research & develop distributional probabilistic outputs using CPD for both tabular/time-series data types and both regression/classification problems, include CRPS as an evaluation metric, and deploy v0.4 to the given AWS environment with API gateways available
- Expected time to completion: 2-4 weeks
- Budget: R&D $3,000 | SWE $1,000 | PM $500 -> M6 Total $4,500
Milestone 7 (M7)
- Description: MAUQ v0.5: Improvements in Time-Series Predictive Interval (cutting-edge)
- Deliverables: research & develop novel UQ for [non-stationary, more real-world] time-series data inputs for both PI/CPD probabilistic outputs and both regression/classification problems, develop and apply our own custom and cutting-edge technique, write a whitepaper about this technique and submit it to a top AI conference (e.g., ICML 2024), and deploy v0.5 to the given AWS environment with API gateways available
- Expected time to completion: 4-6 weeks
- Budget: R&D $9,000 | SWE $1,000 | PM $1,000 -> M7 Total $11,000
Milestone 8 (M8)
- Description: deploy MAUQ v1 onto the SNET Platform
- Deliverables: understand SNET's API schema, make final tweaks and release MAUQ to the SNET Platform as v1, integrate MAUQ with SIBYL and other initial services, and provide a front-end for MAUQ on the SNET Platform
- Expected time to completion: 3-5 weeks
- Budget: SWE $5,000 | OPS $1,000 -> M8 Total $6,000
Milestone * (M*)
- Description: community report and market adoption
- Deliverables: test/evaluate v1 outputs with a couple of pilot services (e.g., SIBYL) and/or apps, write a final mini-report and/or whitepaper about the MAUQ architecture and pilots, and shift focus toward marketing/adoption
- Expected time to completion: 3-4 weeks
- Budget: PM $1,000 | OPS $2,000 -> M* Total $3,000
Milestones Total $54,000 | API Total $18,000 -> Grand Total $72,000
The expected total time to complete all milestones is around 7-9 months. Therefore, if this proposal gets approved in the middle of Q4 2023, then the expected completion time is sometime in Q3 2024.
Note: See Risk and Mitigation for further elaboration of the milestone breakdown.
Risk and Mitigation
While we have the utmost confidence in our own capabilities and dedication toward the MAUQ project as well as our swarm AI vision/mission, we cannot fully guarantee completing this project all the way through. Here is a list of project execution risks (in bold) as well as the mitigation steps we have taken (or plan to take).
- Running out of funding between milestones, especially since we are compensated only after the completion of each milestone. Mitigation: this is why we established many milestones, each bite-sized and less than 10% of the overall budget; the only exception is M7, due to its expected heavy R&D load. This better spreads the funding disbursement throughout the project lifecycle. For example, we provided a milestone M0 for the SNET contract signing, which gives us a buffer to begin designing and implementing MAUQ. Additionally, we have created separate milestones for the more engineering-heavy tasks, such as spinning up the MAUQ skeleton API for the first time in M2, or deploying our service onto the SNET platform in M8. In the worst-case scenario, Temporai will use its own money to fill in any remaining cost gaps.
- Scope of MAUQ is too broad and complex. Mitigation: we did extensive research on the feasibility of the UQ and conformal prediction space before putting the MAUQ service out as this proposal. There are ample open-source Python libraries we can draw on, such as AWS's Fortuna. Applying UQ to both regression and classification problems, for PI and distributional output types, as well as for exchangeable tabular problems, is already well-established practice, as explained in excellent conformal prediction introductions. The major tricky part is applying UQ to non-exchangeable time-series problems, which happens to be our area of specialization; see the attached ISF 2023 Presentation for more details. This is why we budgeted additional funding for AI R&D to figure out which UQ methods work best for time-series, including our own custom ones.
- Final MAUQ service implementation diverges from our proposal. Mitigation: it is true that in engineering, it is easy for the final product to end up drastically different from the original specs. Therefore, for MAUQ, we will be transparent in our milestone deliverables, as we have done for SIBYL. This includes publicizing API usage example guides as well as presenting in Town Hall break-out rooms. That way, the community will continually stay up-to-date on our development progress and be aware of any changes or pivots that may occur.
- Wait, aren't you already working on SIBYL? Won't you get distracted by multiple projects? Mitigation: SIBYL is expected to be completed by EOY 2023, or at least its core implementation (milestones M1-M7). If MAUQ gets awarded in October, then we plan to start in November this year, so there will likely be a couple of months of overlap. Our SIBYL team uses task parallelism, where we work on tasks from various milestones simultaneously. For example, while the engineers are finishing up the current milestone, our AI modelers can already start laying the groundwork for the next few milestones. We can also apply this method across different projects/services, as we are doing for SIBYL and Onboard NeuralProphet. In the worst-case scenario, we can push the MAUQ start time back to sometime in Q1 2024 if SIBYL becomes too involved for us to handle multiple services.
- Difficulty in deploying our service to the SNET Platform. Mitigation: deploying onto the SNET platform is a non-trivial task, and it has affected all projects, not just ours. We are in close touch with the SNET team to get the latest requirements and specs for deploying to SNET. Additionally, we are reaching out to teams from earlier DF rounds (DF1) for their guidance.
Voluntary Revenue
We will adhere to the API Calls “user-friendly” template for revenue sharing. If this service crosses the threshold of $1,000 in monthly revenue, then 10% of the additional revenue over $1,000 will be fed back into the SNET and Deep Funding wallets.
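The revenue-sharing rule above can be expressed as a small function. This is a sketch of our reading of the template (the exact terms are defined by the Deep Funding template itself):

```python
def monthly_revenue_share(revenue, threshold=1000.0, rate=0.10):
    """10% of monthly revenue above the $1,000 threshold; zero below it."""
    return max(0.0, revenue - threshold) * rate

print(monthly_revenue_share(800.0))   # 0.0   (below the threshold)
print(monthly_revenue_share(3500.0))  # 250.0 (10% of the $2,500 excess)
```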
Open Source
Yes, eventually. It will be under the GNU General Public License (GPL) v3.0.
Our Team
Kevin R.C. - Service Lead, Modeling and AI R&D
- Senior Data Scientist / AI Researcher
- DF2 2x Awardee (SIBYL and Onboard NeuralProphet)
- Former Lead Core Developer and Maintainer of NeuralProphet
- 8+ years in the finance and crypto domains
- 3+ years of Python PyPI open-source experience
- Speaker & session chair at the 2023 International Symposium on Forecasting (ISF) Conference
- LinkedIn
- GitHub
Mahdi T.R. - Data Scientist, Modeling and AI R&D
- Data/computational scientist, engineer, mentor, and YouTube content creator
- 10+ years of experience in developing mathematical and machine-learning models for modeling complex physical phenomena
- Developed code simulating microgravity solidification experiments for the NASA-ESA sponsored CETSOL project
- Author of papers in top Physics journals and ML conferences
- Ph.D. in Mechanical & Industrial Engineering @ University of Iowa
- LinkedIn
- GitHub
- YouTube
- Bio
Francesco B. - Software Engineer, AI Architecture
- AI algorithmic trader and developer
- Numerai Top 100
- Former Management Consultant specialized in Operations
- MBA @ MIT Sloan School of Management
- LinkedIn
Joseph K. - Software Engineer, Infrastructure and Deployment
- Senior Software Engineer at NASA Goddard Space Flight Center (GSFC)
- 5+ years of experience in AWS cloud development
- 10+ years of experience in web application development
Tianhao G. - Graduate Student Researcher, Modeling and AI R&D
- M.A. in Statistics @ Columbia University
- LinkedIn
Advisor
Valeriy M. - Probabilistic ML and Conformal Prediction Advisor
Related Links
- uncertainty_conformal_prediction.ipynb example Jupyter Notebook (see attached)
- ISF 2023 Presentation PPTX slides (see attached)
- MAUQ API Schema PNG picture (see attached)
- Valeriy's Conformal Prediction tutorials