Long Description
Company Name
Temporai
Summary
Model-Agnostic Uncertainty Quantification (MAUQ) is a service that automatically adds a probabilistic prediction [interval or distribution] to any point prediction problem, regardless of the underlying model [statistical or ML], problem type [regression or classification], data type [tabular or time-series], or scientific/industry domain. This makes users more aware of the uncertainty, risks, and pitfalls of their respective problem spaces, and lets them better evaluate their confidence in their underlying model's predictions.
The team behind MAUQ has also proposed and is currently implementing the SIBYL and Onboard NeuralProphet services, both of which were awarded in DF2. Although MAUQ is a standalone service, it can also be bundled with SIBYL and NeuralProphet as a suite of core infrastructural AI services on the SNET platform, which together can benefit other AI services and end applications alike.
Funding Amount
$72,000
The Problem to be Solved
Suppose that you are a general manager of a retail store that sells widgets. You ask your financial team to provide a sales forecast of the widgets for the next half year. The team runs their historical sales, transaction, and customer acquisition numbers through their models and comes back to you with their forecast. They project that sales will grow by an average of 12.5% month-over-month, essentially doubling in 6 months from today. You, as a shrewd manager, are naturally skeptical of these glowing forecasts and worry that this might be another case of over-optimistic forecasting with detrimental outcomes.
The scenario above highlights the nefarious nature of single-point forecasts. Yet they are also the most common type of forecast. What is the temperature tomorrow afternoon? 25 degrees Celsius, according to your phone, probably. What will the average price of Bitcoin be by 2030? $221,510.19, according to one online prediction; last time I checked, it assumed an increase of $26,060.03 year-over-year from 2023. Sure. The point is that single-point forecasts paint an illusion of a deterministic future for those who take them at face value without questioning them too much. Often a more optimistic and glowing future, leaving the decision-maker blind to potential downsides and pitfalls. Hence the nefariousness.
Fortunately, there is a second type of forecast to consider: the probabilistic forecast. Rather than conveying just a point value, this type of forecast conveys a range of values at some level of confidence. What is the temperature tomorrow afternoon? A range of 20-30 degrees Celsius with 90% confidence. Why such a large range? Maybe the afternoon is very cloudy and the sun is blocked completely, or maybe there are no clouds at all. What will the average price of Bitcoin be by 2030? A range of $50,000-$500,000 with 75% confidence. Why such a vast range with so little confidence? Maybe it is incredibly hard, if not impossible, to predict such a volatile asset seven years out, which is a couple of lifetimes in this space. For example, who knows what crypto regulations and fiat monetary policies different nations will roll out by then, all of which significantly impact the Bitcoin price. Not to mention the state of cryptography (e.g., quantum computing disruption?), ASIC chips, electricity prices, scams/exploits, institutional adoption like ETF rollouts, and other altcoins. There are many compounding and interacting variables to consider over such a long timeframe.
Because the forecast is framed probabilistically, the decision-maker becomes more uncertainty-aware and critical of the veracity of the forecasts themselves. They will naturally raise the essential follow-up questions and considerations based on the forecasts (as illustrated in the last paragraph), which can lead to more robust and better thought-out decisions.
Our Solution
Our solution is to automatically convert any point forecast (already by far the more common kind) into a probabilistic forecast, thereby removing the statistical and algorithmic legwork from the user.
API Schema
For simplicity's sake, MAUQ has only one service with one primary API function for users to call:
- Quantify Uncertainty: the user inputs a calibration dataset and config parameters, MAUQ fits the uncertainty measures, and the dataset is returned to the user with the uncertainty measures attached, in JSON or another compatible file format.
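As an illustration, a hypothetical Quantify Uncertainty request body might look like the sketch below. The field names here are our own assumptions for demonstration, not the final MAUQ schema (see the attached schema picture for that):

```python
# Hypothetical sketch of a "Quantify Uncertainty" request payload.
# All field names are illustrative assumptions, not the final MAUQ API schema.
import json

request = {
    "config": {
        "problem_type": "regression",      # or "classification"
        "data_type": "time-series",        # or "tabular"
        "uq_type": "prediction_interval",  # or "distributional"
        "confidence_level": 0.90,
    },
    # Calibration rows: model point predictions (yhat) alongside actuals (y)
    "calibration_data": [
        {"ds": "2015-11-01T00:00", "y": 950.0, "yhat": 930.5},
        {"ds": "2015-11-01T01:00", "y": 910.0, "yhat": 925.1},
    ],
}

payload = json.dumps(request)  # serialized body sent to the service
print(payload[:40])
```

The response would then carry the same rows back with the fitted uncertainty measures (e.g., interval bounds) attached to each prediction.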
See the attached MAUQ API Schema.png for an enlarged picture.
Two Types of Uncertainty Quantifications (UQ)
- Prediction Interval (PI): provides a range of predicted values between the upper and lower quantiles, given by the specified confidence level or, inversely, the error rate. The higher the confidence level, the wider the interval (i.e., the more distance between the quantiles).
- Distributional: a continuous set of all predicted values with probabilities > 0%. The predicted values (X-axis) and their probabilities form a probability density function (PDF), as shown below. The PDF allows you to get the model's probability of being over or under any given value, or between two values a and b.
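To make the distributional output concrete, here is a minimal sketch of how such probabilities are read off a predictive distribution, assuming (purely for illustration) a Gaussian predictive distribution and made-up numbers from the temperature example:

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of a Gaussian predictive distribution N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Suppose the model's predictive distribution for tomorrow's temperature
# is N(mu=25, sigma=3) degrees Celsius (illustrative assumption).
mu, sigma = 25.0, 3.0

# Probability the outcome exceeds a given value...
p_over_30 = 1.0 - normal_cdf(30.0, mu, sigma)
# ...or lands between two values a and b.
p_20_to_30 = normal_cdf(30.0, mu, sigma) - normal_cdf(20.0, mu, sigma)

print(round(p_over_30, 3))   # about 0.048
print(round(p_20_to_30, 3))  # about 0.904
```

In MAUQ, the distribution itself would come from the service's UQ step rather than being assumed Gaussian; the read-out of over/under/between probabilities works the same way for any CDF.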
What Problem Cases MAUQ Covers
MAUQ covers problem cases along the following three dimensions (3D):
- X: whether the data type is tabular or time-series
- Y: whether the problem type is regression (predict continuous values) or classification (predict categories)
- Z: whether the probabilistic output (or UQ type) is a PI or a distribution
Evaluation Metrics
Here are metrics that evaluate the quality and effectiveness of the UQ method that is used:
- Prediction Interval (PI) width: the distance between the lower and upper quantiles, used for PI outputs
- Continuous Ranked Probability Score (CRPS): compares a single ground-truth value to a cumulative distribution function (CDF); a generalization of the Mean Absolute Error (MAE) metric, used for distributional outputs
- Miscoverage rate: the percentage of true values that fall outside the predicted range for a given confidence level (also 1 - Coverage Rate), used for both PI and distributional outputs
These evaluation metrics are derived from the final test set, which should neither be used to train the underlying model nor quantify its uncertainty. In addition, for all of these metrics, the lower the value, the better the performance.
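As a rough sketch of how these three metrics are computed, here is a toy illustration on made-up test-set numbers (the CRPS uses the standard sample-based ensemble formula; all data is invented for demonstration):

```python
# Toy illustration of the three UQ evaluation metrics; lower is better.
y_true = [1000.0, 1100.0, 1250.0, 980.0]   # ground-truth values
lower  = [ 950.0, 1020.0, 1150.0, 1000.0]  # PI lower quantiles
upper  = [1050.0, 1120.0, 1250.0, 1100.0]  # PI upper quantiles

# Prediction Interval width: average distance between the quantiles
pi_width = sum(u - l for l, u in zip(lower, upper)) / len(y_true)

# Miscoverage rate: fraction of true values outside their interval
miscoverage = sum(not (l <= y <= u)
                  for y, l, u in zip(y_true, lower, upper)) / len(y_true)

# Sample-based CRPS of one observation y against an ensemble of samples
# drawn from the predictive distribution:
#   CRPS ~= mean|X - y| - 0.5 * mean|X - X'|
def crps_ensemble(samples, y):
    n = len(samples)
    term1 = sum(abs(x - y) for x in samples) / n
    term2 = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return term1 - 0.5 * term2

crps = crps_ensemble([1.0, 2.0, 3.0], 2.0)  # tiny 3-sample ensemble vs y=2
print(pi_width, miscoverage, round(crps, 3))  # 100.0 0.25 0.222
```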
A Concrete Example
As a concrete example, we have an hourly univariate time-series of the electricity consumption of a San Francisco (SF) hospital in 2015. Below is the plot of the electricity consumption in kilowatts (kW) from the beginning of January to the beginning of May. You can see it has a daily seasonality hovering between 800 kW and 1,350 kW, because the hospital is busiest at midday and least busy at midnight.
Next, I partitioned this annual data into train, calibration, and test sets. Here are the approximate time ranges of the 3 datasets:
- Train set: January-October 2015
- Calibration set: November 2015
- Test set: December 2015
I trained a NeuralProphet model with autoregressive (AR) lags on the train set and forecasted on the test set with a single-timestep forecast horizon. Here is the plot of the last week of the test set (the last week of 2015, during the winter holiday). The blue line and blue crosses are the model-predicted values, while the black dots are the actual energy consumption values:
You can see that overall the model prediction hugs the actual values pretty closely, but it underestimates them a bit at peak consumption. Here is the tabular data output of this same data for the first half of Christmas Day:
Here is the plot again, but this time including a red prediction interval (PI) at 90% confidence. In other words, at least 90% of the actual values should fall within the red PI. The uncertainty quantification that produces this interval is a basic split conformal prediction method, using the absolute residuals from the holdout calibration set:
Here, you can see almost all of the actual values falling within the red PI, including the peak values that the NeuralProphet model underestimates. Below is the same tabular data output, but this time including the uncertainty quantification metrics, including yhat1 +- qhat1, which form the red PI bounds:
You can find this full example in the attached uncertainty_conformal_prediction.ipynb Jupyter Notebook file below. The MAUQ service will wrap up this conformal prediction code (and more) to automatically provide these uncertainty quantification metrics (and more) to the existing model point predictions (y).
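The split conformal step used in this example can be sketched in a few lines. This is a simplified stand-in for the notebook's code, with made-up calibration numbers rather than the hospital data:

```python
# Minimal sketch of split conformal prediction with absolute residuals.
# The calibration numbers below are illustrative, not the hospital dataset.
import math

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    return sorted(scores)[min(k, n) - 1]

# Calibration set: actual values y and the model's point predictions yhat
y_cal    = [1000, 1200, 900, 1100, 1050, 950, 1150, 1000, 980, 1020]
yhat_cal = [ 990, 1180, 930, 1120, 1040, 970, 1130, 1010, 990, 1000]

# Nonconformity scores: absolute residuals on the calibration set
scores = [abs(y - yh) for y, yh in zip(y_cal, yhat_cal)]

alpha = 0.10                        # i.e., a 90% confidence PI
qhat = conformal_quantile(scores, alpha)

# Test-time interval around any new point prediction: yhat +- qhat
yhat_new = 1080.0
lo, hi = yhat_new - qhat, yhat_new + qhat
print(qhat, lo, hi)  # qhat = 30 -> interval [1050.0, 1110.0]
```

The width of the resulting interval is constant (2 * qhat) for every test point, which is exactly the behavior of the basic split conformal method; more advanced methods adapt the width per point.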
Note: if the user opts for a distributional output instead of a PI output, then the user will see distributional measures like variance, degrees-of-freedom (DoF), and/or skewness instead of upper/lower PI bounds like yhat1 +- qhat1. This depends on the distributional types available in the MAUQ service, as stated in the Two Types of Uncertainty Quantifications (UQ) section.
Marketing Strategy
Before laying out our current marketing strategy, we would like to first convey our AGI Vision and SNET Mission, as they are the ethos of our marketing approach:
Our AGI Vision
We envision AGI (at least in its early form) to be a swarm (or mesh network) of relatively bite-sized narrow intelligences, just like a network of independent miners/validators forming a decentralized blockchain, or layers of information-encoding neurons forming a deep neural network (DNN). The rationale behind this vision is that the vast datasets from various domains and industries are simply too heterogeneous for any singular AI model to master, as suggested by the No Free Lunch theorem. This holds no matter how many billions or trillions of parameters such a god model contains, or how deep its logic tree or knowledge graph goes.
Instead, these bite-sized narrow intelligences will specialize in their respective data/domain niches, which is how AI is progressing now. Whether today's AI can leap into AGI v1.0 depends on whether these narrow intelligences can then trade their specialized capabilities with each other: a mutually beneficial exchange that can augment each narrow intelligence's own capabilities with the others'. Like trading in the real world, this trade between AIs likely involves a monetary transaction. This is where crypto, due to its decentralized nature, can become the means of transaction between untrusting narrow intelligences, particularly those from different entities.
Here is an excellent article by IntoTheBlock that touches further on Decentralized AI and how blockchain can turn narrow AIs (when encapsulated into applications or services) into economic agents. Another informative piece by Francesco C. breaks down the different subgenres of Distributed Artificial Intelligence (DAI), from swarm intelligence to multi-agent systems (MAS).
Our SNET Mission
Our mission is to build this swarm of core infrastructural AI services for the SNET platform, for the greater benefit of other AI services on the platform as well as other end-user applications by augmenting their capabilities.
Our team is currently developing the following two AI services that were awarded in Deep Funding Round 2 (DF2):
- SIBYL: an automated, general-purpose forecaster that is made up of an ensemble of models. See the project page.
- Onboard NeuralProphet: a PyTorch-based forecaster that also serves as a base model for SIBYL. See the project page.
In addition to NeuralProphet, we are also considering onboarding SNET's existing Time-Series Forecasting service into SIBYL as a base model, depending on its performance. See the service page.
In the same vein, SIBYL will call on this MAUQ service not as another base model, but rather to convert SIBYL's point forecasts into probabilistic forecasts before outputting them to the user. The Onboard NeuralProphet service (by itself) can also call on MAUQ to get probabilistic forecasts, as illustrated in the A Concrete Example section.
Finally, SIBYL can be called by downstream Apps, Agents, and other Services that need its forecasting capabilities. One example is the future Soil Health Ecosystem Modeling Tools; see our separate Soil Health ideation proposal in DF3 - Pool Ideation for more details about the ecosystem modeling tools.
The combination of Onboard NeuralProphet, SIBYL, and MAUQ forms our starting core infrastructural AI services. They are all designed to be domain-agnostic and plug-and-playable so that they can be widely used by the other Apps and Services populating the SNET platform, including each other. Of course, calling these services will involve a monetary transaction in AGIX tokens (e.g., 0.012 AGIX for SNET's TSF as a demo), so these services will become economically incentivized, or at least self-sufficient enough to offset their respective computing costs.
Here is a visual graphic of how these AIs are connected to each other, forming a collective/swarm intelligence on the SNET platform:
Marketing and Adoption Approach for MAUQ
With our high-level vision/mission laid out, here is our marketing and adoption approach for MAUQ, on the following two levels:
- MAUQ as part of the AI core infrastructure bundle: in the SIBYL proposal and related materials, we have mentioned a variety of forecasting applications, from climate and energy to retail and crypto. As we drive adoption of SIBYL for these applications, we can also bundle MAUQ with SIBYL as a value-augmenter, which helps the application user be more uncertainty-aware and not take the point forecast at face value.
- MAUQ as a standalone service: aside from bundling with SIBYL, MAUQ is also a standalone service that can work for non-time-series, non-forecasting applications as well. Therefore, we also aim to promote MAUQ on its own merits. This is where our advisor Valeriy M. comes in, a prominent AI researcher and data scientist in the fields of uncertainty quantification (UQ) and conformal prediction. He maintains a popular GitHub repo containing a curated list of Conformal Prediction materials. As of this writing, the repo has received 2,500+ stars, indicating a rapidly growing interest in the UQ space among fellow AI practitioners and developers, who are also MAUQ's primary target user base. In addition, we have Mahdi T.R. on our team, who is also a YouTube content creator. He has posted multiple introductory videos around Conformal Prediction, with a couple of videos garnering 1,000+ views each. Therefore, we have several outreach channels to promote our MAUQ service outside of the SNET/crypto ecosystem and orient it toward the greater AI/data science community.
Our Project Milestones and Cost Breakdown
Cost breakdown
The proposed budget for MAUQ is $72,000. Here is the overall cost breakdown:
- Modeling and AI R&D ("R&D"): $30,000
- Software Engineering and Integration ("SWE"): $14,000
- Service Product Management ("PM"): $6,000
- Company Operations, Marketing, and Buffer ("OPS"): $4,000
- Service Host and API Calls ("API"): $18,000
The Service Host and API Calls line item meets the 25% reservation rule ($18,000 of $72,000).
Milestones
Here is the current milestones list and details:
Milestone 0 (M0)
- Description: finalize milestone details and contract signing
- Deliverables: finalize milestone details (including allocation of 40% API Calls across development milestones), sign contract with SNET, assemble team, and set up tooling for MAUQ development (including JIRA, GitHub, etc.)
- Expected time to completion: 1-2 weeks
- Budget: OPS $2,000 -> M0 Total $2,000
Milestone 1 (M1)
- Description: MAUQ architectural design report
- Deliverables: MAUQ architectural design report with the API schema, an elaboration of conformal prediction and other UQ techniques, tools and libraries used, models and AI techniques used, evaluation metrics, and applications (comparable to SIBYL's design report)
- Expected time to completion: 2-3 weeks
- Budget: R&D $5,000 -> M1 Total $5,000
Milestone 2 (M2)
- Description: MAUQ v0.0: set up AWS and the API gateway for the MAUQ service
- Deliverables: spin up an AWS cloud environment (i.e., EC2 instance, Lambda, or Elastic Beanstalk), set up the API gateway in accordance with the API schema, and test hardcoded API calls and responses using Postman [Note: this is basically a skeleton version of the MAUQ service without any of the UQ features]
- Expected time to completion: 2-3 weeks
- Budget: SWE $4,000 | PM $1,000 -> M2 Total $5,000
Milestone 3 (M3)
- Description: MAUQ v0.1: Tabular Predictive Interval (PI) with Regression
- Deliverables: research & develop UQ for tabular data inputs with PI probabilistic outputs for regression problems only, include PI width and miscoverage rate as evaluation metrics, and deploy v0.1 to the given AWS environment with API gateways available
- Expected time to completion: 3-4 weeks
- Budget: R&D $5,000 | SWE $1,000 | PM $500 -> M3 Total $6,500
Milestone 4 (M4)
- Description: MAUQ v0.2: Tabular Predictive Interval with Classification
- Deliverables: research & develop UQ for tabular data inputs with PI probabilistic outputs for classification problems, and deploy v0.2 to the given AWS environment with API gateways available
- Expected time to completion: 2-3 weeks
- Budget: R&D $3,000 | SWE $1,000 | PM $500 -> M4 Total $4,500
Milestone 5 (M5)
- Description: MAUQ v0.3: Time-Series Predictive Interval with Regression & Classification
- Deliverables: research & develop UQ for [stationary] time-series data inputs with PI probabilistic outputs for both regression and classification problems, and deploy v0.3 to the given AWS environment with API gateways available
- Expected time to completion: 3-5 weeks
- Budget: R&D $5,000 | SWE $1,000 | PM $500 -> M5 Total $6,500
Milestone 6 (M6)
- Description: MAUQ v0.4: Conformal Predictive Distributions (CPD)
- Deliverables: research & develop distributional probabilistic outputs using CPD for both tabular/time-series data types and both regression/classification problems, include CRPS as an evaluation metric, and deploy v0.4 to the given AWS environment with API gateways available
- Expected time to completion: 2-4 weeks
- Budget: R&D $3,000 | SWE $1,000 | PM $500 -> M6 Total $4,500
Milestone 7 (M7)
- Description: MAUQ v0.5: Improvements in Time-Series Predictive Interval (cutting-edge)
- Deliverables: research & develop novel UQ for [non-stationary, more real-world] time-series data inputs for both PI/CPD probabilistic outputs and both regression/classification problems, develop and apply our own custom and cutting-edge technique, write a whitepaper about this technique and submit it to a top AI conference (e.g., ICML 2024), and deploy v0.5 to the given AWS environment with API gateways available
- Expected time to completion: 4-6 weeks
- Budget: R&D $9,000 | SWE $1,000 | PM $1,000 -> M7 Total $11,000
Milestone 8 (M8)
- Description: deploy MAUQ v1 onto the SNET Platform
- Deliverables: understand SNET's API schema, make final tweaks and release MAUQ to the SNET Platform as v1, integrate MAUQ with SIBYL and other initial services, and provide a front-end for MAUQ on the SNET Platform
- Expected time to completion: 3-5 weeks
- Budget: SWE $5,000 | OPS $1,000 -> M8 Total $6,000
Milestone * (M*)
- Description: community report and market adoption
- Deliverables: test/evaluate v1 outputs with a couple of pilot services (e.g., SIBYL) and/or apps, write a final mini-report and/or whitepaper about the MAUQ architecture and pilots, and shift focus toward marketing/adoption
- Expected time to completion: 3-4 weeks
- Budget: PM $1,000 | OPS $2,000 -> M* Total $3,000
Milestones Total $54,000 | API Total $18,000 -> Grand Total $72,000
The expected total time to complete all milestones is around 7-9 months. Therefore, if this proposal gets approved in the middle of Q4 2023, then the expected completion time is sometime in Q3 2024.
Note: See Risk and Mitigation for further elaboration of the milestone breakdown.
Risk and Mitigation
While we have the utmost confidence in our own capabilities and dedication toward the MAUQ project as well as our swarm AI vision/mission, we cannot fully guarantee completing this project all the way through. Here is a list of project execution risks (in bold) as well as the mitigation steps we have taken (or plan to take).
- Running out of funding between milestones, especially since we are compensated only after the completion of each milestone. Mitigation: this is why we established many milestones, each bite-sized and less than 10% of the overall budget; the only exception is M7, due to its expected heavy R&D load. This better spreads the funding disbursement throughout the project lifecycle. For example, we provided a milestone M0 for the SNET contract signing, which gives us a buffer to begin designing and implementing MAUQ. Additionally, we have created separate milestones for the more engineering-heavy tasks, such as spinning up the MAUQ skeleton API for the first time in M2, or deploying our service onto the SNET platform in M8. In the worst-case scenario, Temporai will use its own money to fill in any remaining cost gaps.
- Scope of MAUQ is too broad and complex. Mitigation: we did extensive research on the feasibility of the UQ and conformal prediction space before putting the MAUQ service out as this proposal. There are ample open-source Python libraries we can draw on, such as AWS's Fortuna. Applying UQ to both regression and classification problems, for PI and distributional output types, as well as for exchangeable tabular problems, is already well-established practice, as explained in excellent conformal prediction introductions. The major tricky part is applying UQ to non-exchangeable time-series problems, which happens to be our area of specialization; see the attached ISF 2023 Presentation for more details. This is why we budgeted additional funding for AI R&D to figure out which UQ methods work best for time-series, including our own custom ones.
- Final MAUQ service implementation diverges from our proposal. Mitigation: it is true that in engineering, it is easy for the final product to end up drastically different from the original specs. Therefore, for MAUQ, we will be transparent in our milestone deliverables, as we have done for SIBYL. This includes publicizing API usage example guides as well as presenting in Town Hall break-out rooms. That way, the community will continually stay up-to-date on our development progress and be aware of any changes or pivots that may occur.
- Wait, aren't you already working on SIBYL? Won't you get distracted by multiple projects? Mitigation: SIBYL is expected to be completed by EOY 2023, or at least its core implementation (milestones M1-M7). If MAUQ gets awarded in October, then we plan to start in November this year, so there will likely be a couple of months of overlap. Our SIBYL team uses task parallelism, where we work on tasks from various milestones simultaneously. For example, while the engineers are finishing up the current milestone, our AI modelers can already start laying the groundwork for the next few milestones. We can also apply this method across different projects/services, as we are doing for SIBYL and Onboard NeuralProphet. In the worst-case scenario, we can push the MAUQ start time back to sometime in Q1 2024 if SIBYL becomes too involved for us to handle multiple services.
- Difficulty in deploying our service to the SNET Platform. Mitigation: deploying onto the SNET platform is a non-trivial task, and it has affected all projects, not just ours. We are in close touch with the SNET team to get the latest requirements and specs for deploying to SNET. Additionally, we are reaching out to teams from earlier DF rounds (DF1) for their guidance.
Voluntary Revenue
We will adhere to the API Calls “user-friendly” template for revenue sharing. If this service crosses the threshold of $1,000 in monthly revenue, then 10% of the additional revenue over $1,000 will be fed back into the SNET and Deep Funding wallets.
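The revenue-sharing rule above can be expressed as a small function. This is a sketch of our reading of the template (the exact terms are defined by the Deep Funding template itself):

```python
def monthly_revenue_share(revenue, threshold=1000.0, rate=0.10):
    """10% of monthly revenue above the $1,000 threshold; zero below it."""
    return max(0.0, revenue - threshold) * rate

print(monthly_revenue_share(800.0))   # 0.0   (below the threshold)
print(monthly_revenue_share(3500.0))  # 250.0 (10% of the $2,500 excess)
```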
Open Source
Yes, eventually. It will be under the GNU General Public License (GPL) v3.0.
Our Team
Kevin R.C. - Service Lead, Modeling and AI R&D
- Senior Data Scientist / AI Researcher
- DF2 2x Awardee (SIBYL and Onboard NeuralProphet)
- Former Lead Core Developer and Maintainer of NeuralProphet
- 8+ years in the finance and crypto domains
- 3+ years of Python PyPI open-source experience
- Speaker & session chair at the 2023 International Symposium on Forecasting (ISF) Conference
- LinkedIn
- GitHub
Mahdi T.R. - Data Scientist, Modeling and AI R&D
- Data/computational scientist, engineer, mentor, and YouTube content creator
- 10+ years of experience in developing mathematical and machine-learning models for modeling complex physical phenomena
- Developed code simulating microgravity solidification experiments for the NASA-ESA sponsored CETSOL project
- Author of papers in top Physics journals and ML conferences
- Ph.D. in Mechanical & Industrial Engineering @ University of Iowa
- LinkedIn
- GitHub
- YouTube
- Bio
Francesco B. - Software Engineer, AI Architecture
- AI algorithmic trader and developer
- Numerai Top 100
- Former Management Consultant specialized in Operations
- MBA @ MIT Sloan School of Management
- LinkedIn
Joseph K. - Software Engineer, Infrastructure and Deployment
- Senior Software Engineer at NASA Goddard Space Flight Center (GSFC)
- 5+ years of experience in AWS cloud development
- 10+ years of experience in web application development
Tianhao G. - Graduate Student Researcher, Modeling and AI R&D
- M.A. in Statistics @ Columbia University
- LinkedIn
Advisor
Valeriy M. - Probabilistic ML and Conformal Prediction Advisor
Related Links
- uncertainty_conformal_prediction.ipynb example Jupyter Notebook (see attached)
- ISF 2023 Presentation PPTX slides (see attached)
- MAUQ API Schema PNG picture (see attached)
- Valeriy's Conformal Prediction tutorials