Expert Rating 4.0

photrek

Project Owner

Implement clustering heuristics in MeTTa

Type: SingularityNET RFP
Funding Request: $15,000 USD
RFP Guidelines: Implement clustering heuristics in MeTTa

Overview

The aim of this project is to develop a robust architectural design for implementing clustering heuristics within the MeTTa programming language. We will approach this effort by integrating both traditional and probabilistic clustering algorithms into MeTTa. The clustering heuristics will be evaluated and optimized for scalability, enabling them to be used in more complex scenarios. Our use case demonstration will focus on applying these clustering algorithms to various datasets, with an emphasis on enhancing the AGI capabilities of the Hyperon platform.

RFP Guidelines

Implement clustering heuristics in MeTTa

Complete & Awarded

Type: SingularityNET RFP
Total RFP Funding: $40,000 USD
Proposals: 6
Awarded Projects: 1

SingularityNET

Aug. 12, 2024

The goal is to implement clustering algorithms in MeTTa and demonstrate interesting functionality on simple but meaningful test problems. This will serve as a working prototype that guides the development of scalable tooling with similar functionality, suitable for use as part of a Hyperon-based AGI system following the PRIMUS cognitive architecture.

Proposal Description

Company Name (if applicable)

Photrek

Project details

The Problem We Are Aiming to Solve

Clustering is a fundamental task in machine learning and AI, yet the current implementation of such algorithms within the MeTTa programming language is either non-existent or underdeveloped. MeTTa, being a language designed for AGI systems, lacks the necessary clustering tools to perform meaningful data analysis. Clustering plays a critical role in organizing and understanding data, and without robust clustering capabilities, MeTTa users are limited in their ability to process and analyze large datasets effectively.

Additionally, the existing clustering methods often do not account for the probabilistic nature of real-world data. Traditional clustering algorithms may fail to provide accurate or meaningful clusters when dealing with uncertain data or when probabilities need to be factored into the clustering process.

The project will focus on the following areas:

1. Clustering Algorithms Implementation:

Implement clustering algorithms available in Scikit-learn in MeTTa, focusing on K-Means and Gaussian Mixture Models (GMMs), and ensure that they are optimized for performance and scalability. Additional clustering methods, such as DBSCAN or Hierarchical Clustering, could be explored in a future phase to further enhance the framework's capabilities.

Large-Scale Data Handling:

Improve the performance and efficiency of clustering algorithms when dealing with large datasets. While Scikit-learn provides basic clustering tools, it may be limited for very large datasets, necessitating additional tools or techniques for scaling.

2. Implement evaluation metrics for clustering algorithms, including Rand Index, Mutual Information, and Purity/Homogeneity Measures. These metrics will help in assessing the performance of the implemented algorithms.

3. Data Ingestion and Compatibility:

Ensure the system accepts inputs in popular data formats like CSV and TSV.
Interface seamlessly with Numpy and Pandas Python libraries to integrate with other AI workflows.

4. Visualization and Export Capabilities:

Implement visualization techniques, including t-SNE, to visualize clustering outputs.
Develop a submodule for exporting clustering results to ensure usability in downstream applications.

5. The implemented clustering algorithms will be demonstrated on test datasets, with a focus on evaluating the accuracy and performance of the algorithms.

Use Case Implementation Plan

The primary objective of this project is to develop and integrate a dedicated library within the MeTTa programming language that includes key clustering algorithms, with a particular focus on two core approaches: K-Means and Gaussian Mixture Models (GMM). Each algorithm will be optimized for performance, flexibility, and compatibility with probabilistic analysis, ensuring adaptability across various use cases. With its extensive expertise in machine learning, algorithm development, and scalable data solutions, the Photrek team will enhance MeTTa’s clustering functionalities to address current limitations and enable more sophisticated data analysis capabilities.

Our approach to implementing K-Means will prioritize efficient centroid initialization and iterative refinement, improving computational speed and convergence reliability. The K-Means algorithm will incorporate optimizations to handle high-dimensional datasets and large-scale clustering needs, enhancing scalability and reducing computational demands for users. Additionally, the algorithm will allow for user-defined distance metrics, giving flexibility in how data similarity is measured, which is particularly useful in contexts where standard Euclidean distance may not be appropriate. Supported distance metrics will include the Euclidean distance, which measures straight-line distance; Manhattan distance, ideal for grid-like data; and cosine similarity, which captures the angle between data points, making it particularly suitable for high-dimensional sparse data. These enhancements will empower users to adapt clustering to their specific data characteristics and objectives.
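As an illustration of the pluggable distance metrics described above, the following is a minimal sketch in plain Python rather than MeTTa. The function names (`kmeans`, `euclidean`, `manhattan`, `cosine_distance`) and the `init` parameter are assumptions made for this example and are not part of the proposal. Cosine similarity is turned into a distance as 1 minus the similarity. The centroid update uses the arithmetic mean, which is the exact minimizer only for squared Euclidean distance; with other metrics it is a heuristic.

```python
import math
import random


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity; a zero vector is treated as distance 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def kmeans(points, k, distance=euclidean, init=None, max_iter=100, seed=0):
    """Lloyd-style K-Means with a user-supplied distance function.

    points: list of equal-length numeric sequences.
    init:   optional list of k starting centroids (hypothetical parameter);
            if omitted, k distinct points are sampled at random.
    """
    rng = random.Random(seed)
    start = init if init is not None else rng.sample(points, k)
    centroids = [list(c) for c in start]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centroid under the chosen metric.
        new_labels = [
            min(range(k), key=lambda j: distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: mean of the members (empty clusters keep their centroid).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

A call such as `kmeans(points, 2, distance=manhattan)` swaps the metric without touching the rest of the loop, which is the kind of flexibility the paragraph above describes.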

For Gaussian Mixture Models (GMM), the implementation will leverage a probabilistic framework, using Expectation-Maximization (EM) to iteratively improve cluster fit based on Gaussian distributions. The approach will be enhanced with advanced capabilities that allow flexible adjustment of clustering behavior, enabling more refined probabilistic modeling to capture complex data relationships across various scenarios. In terms of user interaction, this library will enable selection of different clustering models, parameter adjustments for probabilistic constraints, and custom initialization options.
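The Expectation-Maximization loop mentioned above can be sketched for the simplest case, a one-dimensional mixture, in plain Python. This is an illustrative sketch under stated assumptions, not the proposed MeTTa implementation: the function name `fit_gmm_1d`, its parameters, and the variance floor `min_var` are all hypothetical. The caller supplies starting means, which stands in for the custom initialization options the paragraph refers to.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate normal distribution at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, means, variances=None, weights=None, n_iter=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture with EM, starting from the given means.

    Returns (weights, means, variances, responsibilities), where the
    responsibilities come from the final E-step.
    """
    k = len(means)
    means = list(means)
    variances = list(variances) if variances else [1.0] * k
    weights = list(weights) if weights else [1.0 / k] * k
    n = len(data)
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = [w * gaussian_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate parameters from the soft assignments.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0.0:
                continue  # leave an unused component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, min_var)  # floor avoids collapse to zero
    return weights, means, variances, resp
```

Unlike K-Means, each point receives a probability for every cluster, so a hard label is obtained only at the end by taking the component with the largest responsibility.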

To ensure that the clustering algorithms are accurately evaluated and suitable for different use cases, evaluation metrics will be implemented. These metrics will include the Rand Index, Mutual Information, and Purity/Homogeneity Measures, which are essential in assessing the performance of the clustering algorithms.
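For reference, the three metrics named above can be computed directly from two label lists. The sketch below is plain Python with hypothetical function names; it uses the standard definitions (pairwise agreement for the Rand Index, majority-class share for purity, and mutual information in nats), and it assumes at least two labelled points.

```python
import math
from collections import Counter
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of point pairs on which the two labelings agree.

    A pair agrees if both labelings put it in the same cluster, or both
    put it in different clusters. Requires at least two points.
    """
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def purity(labels_true, labels_pred):
    """Share of points that belong to the majority true class of their cluster."""
    clusters = {}
    for t, p in zip(labels_true, labels_pred):
        clusters.setdefault(p, Counter())[t] += 1
    return sum(max(c.values()) for c in clusters.values()) / len(labels_true)


def mutual_information(labels_true, labels_pred):
    """Mutual information (natural log) between the two labelings."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    return sum(
        (nij / n) * math.log(n * nij / (count_true[t] * count_pred[p]))
        for (t, p), nij in joint.items()
    )
```

All three are invariant to how cluster ids are numbered, so a prediction that merely swaps label 0 and label 1 still scores as a perfect match.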

The system will also be designed to handle data ingestion efficiently, supporting common input formats such as CSV and TSV. Additionally, the system will leverage popular Python libraries like Numpy and Pandas, which are widely used for data manipulation, preprocessing, and analysis.
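A minimal sketch of the CSV/TSV ingestion step, using only Python's standard `csv` module, is shown here. The function name `read_numeric_table` and its parameters are assumptions for illustration; the delimiter guess (tab if the first line contains one, comma otherwise) is a simplification. In a Pandas-based workflow, `pandas.read_csv` with its `sep` argument would play the same role.

```python
import csv
import io


def read_numeric_table(text, delimiter=None, has_header=True):
    """Parse CSV or TSV text into (header, points).

    delimiter: "," or "\t"; if None, a tab in the first line selects TSV.
    Returns the header row (or None) and a list of rows of floats.
    """
    if delimiter is None:
        lines = text.splitlines()
        first_line = lines[0] if lines else ""
        delimiter = "\t" if "\t" in first_line else ","
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=delimiter) if r]
    if not rows:
        return None, []
    header = rows[0] if has_header else None
    body = rows[1:] if has_header else rows
    # Every cell is expected to be numeric; float() raises ValueError otherwise.
    points = [[float(cell) for cell in row] for row in body]
    return header, points
```

The returned list of float rows can be passed straight to a clustering routine or wrapped in a NumPy array by the caller.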

For visualization and exporting the clustering results, we will implement techniques such as t-SNE (t-Distributed Stochastic Neighbor Embedding) to help users visualize the structure of their data in a lower-dimensional space. Additionally, a submodule will be developed for exporting the clustering results to widely accepted formats such as CSV or JSON, ensuring that the results can be shared or used in downstream applications.
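The export submodule described above could look roughly like the following sketch, which serializes cluster assignments to CSV and JSON using Python's standard library. The function names, the `x0, x1, ...` column naming, and the JSON layout are assumptions chosen for illustration, not a format defined by the proposal. The t-SNE visualization is not covered here.

```python
import csv
import io
import json


def export_csv(points, labels):
    """Return CSV text with one row per point and a final 'cluster' column."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    dims = len(points[0]) if points else 0
    writer.writerow([f"x{i}" for i in range(dims)] + ["cluster"])
    for point, label in zip(points, labels):
        writer.writerow(list(point) + [label])
    return buf.getvalue()


def export_json(points, labels, centroids=None):
    """Return a JSON string with assignments and, optionally, centroids."""
    payload = {
        "assignments": [
            {"point": list(point), "cluster": label}
            for point, label in zip(points, labels)
        ]
    }
    if centroids is not None:
        payload["centroids"] = [list(c) for c in centroids]
    return json.dumps(payload, indent=2)
```

Both functions return strings rather than writing files, so the caller decides where the output goes.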

Finally, the algorithms will be tested on a variety of test datasets, and their accuracy and performance will be evaluated using the aforementioned metrics. This will ensure that the algorithms perform well across different types of data and provide reliable results in real-world applications.

Our commitment to usability extends to fostering engagement with the MeTTa user community for feedback on the library’s clarity, applicability, and flexibility. Furthermore, by integrating robust distance-based and probabilistic measures, the library will provide enhanced tools for tackling complex clustering tasks, thereby establishing MeTTa as a platform for cutting-edge machine learning applications.

Open Source Licensing

GNU GPL - GNU General Public License

Activity Summary

Milestones: 3 Total
Discussion: 0 Total Comments
Reviews: 4 Total Posted
Project Team: 6 Total People

Total Milestones: 3
Total Budget: $15,000 USD
Last Updated: 3 Dec 2024

Milestone 1 - Requirement Analysis & initial implementation

Description

Lay the groundwork for the project by defining objectives, conducting requirement analysis, and implementing the optimized K-Means algorithm with support for custom distance metrics and initial testing.

Deliverables

- Define the main goals and deliverables of the project.
- Analyze the core requirements for developing clustering algorithms.
- Prepare a detailed work plan with a timeline for the subsequent milestones.
- Implement the K-Means algorithm with performance optimizations.
- Add options for custom distance metrics to improve clustering accuracy.
- Conduct preliminary tests to verify the implementation.

Budget

$6,000 USD

Success Criterion

success_criteria_1

Milestone 2 - GMM Development and Performance Testing

Description

Develop and test the Gaussian Mixture Model using Expectation-Maximization techniques, create clustering evaluation metrics, and optimize performance for large datasets.

Deliverables

- Implement the Gaussian Mixture Model using Expectation-Maximization techniques.
- Perform tests to evaluate model performance.
- Develop evaluation metrics for clustering algorithms such as Rand Index and Mutual Information.
- Improve performance when handling large datasets.
- Conduct tests to assess efficiency and effectiveness.

Budget

$6,000 USD

Success Criterion

success_criteria_1

Milestone 3 - Data Integration and Visualization

Description

Ensure compatibility with common data formats and libraries, implement clustering result visualizations, and develop export functionality for downstream applications.

Deliverables

- Ensure the system accepts inputs in popular data formats like CSV and TSV.
- Ensure compatibility with Numpy and Pandas libraries.
- Implement visualization techniques including t-SNE to visualize clustering results.
- Develop an interface for exporting results to facilitate their use in downstream applications.

Budget

$3,000 USD

Success Criterion

success_criteria_1

Expert Ratings

Reviews & Ratings

Group Expert Rating (Final)

Overall

4.0

Compliance with RFP requirements 4.3
Solution details and team expertise 4.0
Value for money 4.3

While experts originally rated this submission highly and argued in favor, ultimately we selected another proposal for strategic reasons.

Expert Review 1
Overall

3.0
- Compliance with RFP requirements 4.0
- Solution details and team expertise 3.0
- Value for money 0.0
most complete proposal (compared to others)

I have the following three comments about all the clustering proposals and, to be fair, I will mention them for all the proposals. At the end, you can see my comments specifically for this proposal.

First, I was expecting to see more on the difficulties that one may face when a clustering algorithm is implemented in MeTTa, in other words MeTTa-specific challenges, and how the proposing team plans to handle them. I did not see that in any of the proposals.

Second, I was expecting to see their plan for making sure the MeTTa clustering library will work robustly on diverse datasets. For example, they could have listed a few datasets that may cause problems for a clustering algorithm and mentioned how they plan to avoid those problems.

Third, based on my experience with clustering algorithms, most computational gains come from vectorization. None of the proposals even mention it, even though the RFP specifically mentions concurrent processing and the ability to work on large datasets.

Proposal-specific comments:

Positive: The authors have done a good job demonstrating that they understand the problem: clustering plays a critical role in organizing and understanding data, and without robust clustering capabilities, MeTTa users are limited in their ability to process and analyze large datasets effectively.

Negative: They talk about the “probabilistic nature of real-world data” without clarifying what they mean by that term, so the reader is left guessing. Furthermore, the RFP does not mention anything about handling the “probabilistic nature of data”, so it is not clear why, among so many other things that one could do in addition to what the RFP explicitly asks for, they have chosen to focus on it.

With respect to the functional requirements that the RFP asks for: they do not provide enough details, and sometimes none at all, on how the algorithms will be implemented. Do they anticipate that they will face challenges in implementing those algorithms in MeTTa, or do they expect everything to go very smoothly? What is the plan to overcome those challenges? The same issue applies to what they discuss under “Large-Scale Data Handling” and to their mention of “Improve the performance and efficiency of clustering algorithms when dealing with large datasets”. They mention “prioritize efficient centroid initialization and iterative refinement” as a way to increase efficiency. Why have they chosen to focus on this? What other methods can be used to increase efficiency? Based on my experience, the way the custom distance metrics are implemented is critical to performance. They provide no details about this.

Expert Review 2
Overall

5.0
- Compliance with RFP requirements 5.0
- Solution details and team expertise 5.0
- Value for money 0.0
A detailed and competent response that hits all the bases of the RFP

Photrek is a known entity and has responded successfully to prior DF calls. This one seems well within their capability.

Expert Review 3
Overall

4.0
- Compliance with RFP requirements 4.0
- Solution details and team expertise 5.0
- Value for money 0.0
A known team. Covers the requisite bases. Would have liked to have seen discussion of how the clustering algorithms written in MeTTa are important for AGI and how this could impact the implementations, but this is a minor point.

Overall: 4.0
The weighted average of the 4 perspectives.

Value for money: 4.3
Each RFP defines a maximum allowed budget, but teams can differentiate their proposal by offering a solution with a lower budget or a wider scope.

Compliance with RFP requirements: 4.3
This rating indicates compliance to 'Must haves' but also adaptation of 'Nice to haves' and Non-functional requirements defined in the RFP.

Solution details and team expertise: 4.0
RFPs will offer varying degrees of freedom. This rating indicates the quality of the team's specific solution ideas, the provided details, and the reviewer's confidence in the team's ability to execute.

Reviews and Ratings in Deep Funding are structured in 4 categories. This will ensure that the reviewer takes all these perspectives into account in their assessment and it will make it easier to compare different projects on their strengths and weaknesses.

Overall (Primary) This is an average of the 4 perspectives. At the start of this new process, we are assigning an equal weight to all categories, but over time we might change this and make some categories more important than others in the overall score. (This may even be done retroactively).

Feasibility (secondary) This represents the user's assessment of whether the proposed project is theoretically possible and if it is deemed feasible. E.g. A proposal for nuclear fission might be theoretically possible, but it doesn’t look very feasible in the context of Deep Funding.

Viability (secondary) This category is somewhat similar to Feasibility, but it interprets the feasibility against factors such as the size and experience of the team, the budget requested, and the estimated timelines. We could frame this as: “What is your level of confidence that this team will be able to complete this project and its milestones in a reasonable time, and successfully deploy it?” Examples:

A proposal that promises the development of a personal assistant that outperforms existing solutions might be feasible, but if there is no AI expertise in the team the viability rating might be low.
A proposal that promises a new Carbon Emission Compensation scheme might be technically feasible, but the viability could be estimated low due to challenges around market penetration and widespread adoption.

Desirability (secondary) Even if the project team succeeds in creating a product, there is the question of market fit. Is this a project that fulfills an actual need? Is there a lot of competition already? Are the USPs of the project sufficient to make a difference? Example:

Creating a translation service from, say, Spanish to English might be possible, but it's questionable if such a service would be able to get a significant share of the market.

Usefulness (secondary) This is a crucial category that aligns with the main goal of the Deep Funding program. The question to be asked here is: “To what extent will this proposal help to grow the Decentralized AI Platform?” For proposals that develop or utilize an AI service on the platform, the question could be “How many API calls do we expect it to generate” (and how important / high-valued are these calls?). For a marketing proposal, the question could be “How large and well-aligned is the target audience?” Another question is related to how the budget is spent. Are the funds mainly used for value creation for the platform or on other things? Examples:

A metaverse project that spends 95% of its budget on the development of the game and only 5% on the development of an AI service for the platform might expect a low ‘usefulness’ rating here.
A marketing proposal that creates t-shirts for a local high school would get a lower ‘usefulness’ rating than a marketing proposal that has a viable plan for targeting highly esteemed universities in a scalable way.
An AI service that is fully dedicated to a single product, does not take advantage of the purpose of the platform. When the same service would be offered and useful for other parties, this should increase the ‘usefulness’ rating.

photrek

Project Owner

- Amenah: Scientific lead (PI)
- Igor: Principal coder & documentation
- Maciej: Test engineer & functional programmer
- Kenric: Strategic alignment
- Juana: Project manager

Account Pending
Scientific Lead & Principal Investigator

Igor Oliveira
Developer
Principal coder & documentation

MaciejSki
Test Engineer
Functional Programmer

kenricnelson
Strategist
Strategic alignment

Juana Attieh
Project Manager
