Proposal for implementation of Clustering Library

Expert Rating 3.6
Nahom Senay
Project Owner

Overview

Our approach to developing the MeTTa-based clustering library comprises four layers: the frontend, the preprocessing layer, the core layer, and the visualization layer. We will follow an agile-like methodology in which planning is coupled with coding. We expect to complete the project, including all of these components, within three months.

RFP Guidelines

Implement clustering heuristics in MeTTa

Complete & Awarded
  • Type SingularityNET RFP
  • Total RFP Funding $40,000 USD
  • Proposals 6
  • Awarded Projects 1
SingularityNET
Aug. 12, 2024

The goal is to implement clustering algorithms in MeTTa and demonstrate interesting functionality on simple but meaningful test problems. This serves as a working prototype providing guidance for development of scalable tooling providing similar functionality, suitable for serving as part of a Hyperon-based AGI system following the PRIMUS cognitive architecture.

Proposal Description

Project details

Overview

This document is a proposal to build a clustering library based on the MeTTa (Meta Type Talk) language. It describes the proposed library component by component and explains how we plan to implement it. The library will include four clustering algorithms: k-means clustering, Gaussian mixture models (GMMs), spectral clustering, and hierarchical clustering.

 

Approach and Methodology

Our approach to developing the MeTTa-based clustering library comprises four layers: the frontend, the preprocessing layer, the core layer, and the visualization layer. We will follow an agile-like methodology in which planning is coupled with coding.

 

The frontend contains Python functions for reading various data formats. It also provides functions that invoke the clustering algorithms and the evaluation algorithms. This lets the library be used much like scikit-learn, so that people with no MeTTa background can work with it.

 

The preprocessing layer contains adapters that convert pandas DataFrame or Series objects into atoms representable in MeTTa, to be stored in an Atomspace. The number of adapters depends on the number of file formats the RFP requires us to support.
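
As a rough illustration of what such an adapter might do, the sketch below turns tabular rows into S-expression strings. It is not the proposal's design: the `(point index (column value) ...)` layout, the relation name, and both function names are hypothetical. To stay dependency-free it takes plain row dictionaries, such as those produced by pandas `DataFrame.to_dict("records")`.

```python
from numbers import Integral, Real


def value_to_token(value):
    """Render one cell value as text: numbers bare, everything else quoted."""
    if isinstance(value, bool):
        return "True" if value else "False"
    if isinstance(value, Integral):
        return str(int(value))
    if isinstance(value, Real):
        return str(float(value))
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def records_to_atoms(records, relation="point"):
    """Build one S-expression string per row.

    The (relation index (column value) ...) layout and the default relation
    name are hypothetical choices for this sketch. Column names are assumed
    to be valid symbols (no spaces or parentheses).
    """
    atoms = []
    for index, row in enumerate(records):
        fields = " ".join(
            f"({column} {value_to_token(value)})" for column, value in row.items()
        )
        atoms.append(f"({relation} {index} {fields})")
    return atoms


rows = [{"x": 1.0, "y": 2.5, "label": "a"}, {"x": 3, "y": 0.5, "label": "b"}]
for atom in records_to_atoms(rows):
    print(atom)
```

For the first row this prints `(point 0 (x 1.0) (y 2.5) (label "a"))`. Loading such strings into an Atomspace is left out here, since the proposal does not specify how that step would be done.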

 

The core layer contains the algorithms to be built in MeTTa: hierarchical clustering, k-means clustering, spectral clustering, and Gaussian mixture models. It also implements three widely used clustering evaluation metrics: the Rand index, mutual information, and purity.

 

The visualization layer integrates matplotlib.pyplot to display the output of the clustering algorithms.

 

Given this high-level project structure, we will begin by researching how to port these algorithms to MeTTa. Next, we will write function definitions with accompanying documentation to help other developers understand them. We will then implement the core algorithms, which are k-means clustering, spectral clustering, and hierarchical clustering.

We will declare Atomspaces according to the number of clusters given as input and update them according to the clustering logic of each algorithm.
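
The idea of keeping one container per cluster and updating it on every pass can be sketched in Python with k-means (Lloyd's algorithm). This is only an illustrative stand-in: plain lists play the role of the per-cluster Atomspaces, and the function names, the seeding strategy, and the stopping rule are assumptions rather than details taken from the proposal.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def assign(points, centroids):
    """Place each point in the bucket of its nearest centroid.

    One bucket per cluster stands in for a per-cluster Atomspace.
    """
    buckets = [[] for _ in centroids]
    for point in points:
        nearest = min(
            range(len(centroids)),
            key=lambda i: squared_distance(point, centroids[i]),
        )
        buckets[nearest].append(point)
    return buckets


def k_means(points, k, max_iterations=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct data points
    for _ in range(max_iterations):
        buckets = assign(points, centroids)
        # An empty bucket keeps its previous centroid.
        updated = [mean_point(b) if b else c for b, c in zip(buckets, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = k_means(data, k=2)
print(centroids)
print(clusters)
```

The number of clusters `k` is an ordinary argument, which matches the requirement that the implementation work with a variable number of clusters.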

 

Deliverables

  1. The clustering algorithms (k-means clustering, hierarchical clustering, spectral clustering, Gaussian mixture models).

  2. Python functions (as a library) that invoke the clustering algorithms.

  3. Graphical visualization integration for the clustering algorithms.

  4. Clustering metrics written in MeTTa.

 

Project Timeline

 

Algorithmic Research (2 weeks)

  • We will need a week to research how the algorithms work at the implementation level.

  • We need to find ways to leverage non-deterministic approaches to implement the algorithms in MeTTa.

K-means Clustering and Evaluation Metrics (2 weeks) ($3,000)

  • Implement k-means clustering in the MeTTa programming language so that it works with a variable number of clusters.

  • Implement the Rand index evaluation metric.

  • Implement the mutual information metric.

  • Implement the purity measure.

Hierarchical Clustering (2 weeks) ($1,000)

  • Implement hierarchical clustering in the MeTTa programming language.

Spectral Clustering (2 weeks) ($1,000)

  • Implement spectral clustering in the MeTTa programming language.

Gaussian Mixture Models (2 weeks) ($1,000)

  • Implement Gaussian mixture model clustering in the MeTTa programming language.

Implementation of Other Features (2 weeks) ($2,000)

  • Implement functions that provide access to the algorithms:

  1. Hierarchical clustering

  2. Spectral clustering

  3. Gaussian mixture models

  4. K-means clustering

  • Implement an adapter that converts DataFrame or Series objects to MeTTa atoms.

  • Implement a visualizer integrated with matplotlib.

 

Team Competence

Our team currently works with the MeTTa programming language: we are implementing the MOSES (Meta-Optimizing Semantic Evolutionary Search) project in MeTTa. As a result, we do not face a steep learning curve with the DSL. We also have working knowledge of some of the clustering algorithms, such as k-means.

 

 





Open Source Licensing

Apache License

Links and references

https://github.com/icog-labs-dev/metta-moses

Proposal Video

Not Available Yet

Check back later during the Feedback & Selection period for the RFP that this proposal is applied to.

  • Total Milestones

    6

  • Total Budget

    $8,000 USD

  • Last Updated

    8 Dec 2024

Milestone 1 - Algorithmic Research

Description

Conduct an in-depth study of how the algorithms function at the implementation level, focusing on leveraging non-deterministic approaches for integration into MeTTa.

Deliverables

- A study of the core algorithms' functional and implementation aspects.
- Pseudocode for the algorithms.

Budget

$500 USD

Success Criterion

- A clear understanding of the algorithms.

Milestone 2 - Implementation of k-means clustering

Description

Develop and implement a K-Means Clustering algorithm with a variable number of clusters in the MeTTa programming language, along with three evaluation metrics: Rand Index, Mutual Information, and Purity Measures.

Deliverables

- Implementation of the K-Means algorithm in MeTTa with support for a variable number of clusters.
- Rand Index: Implementation of the Rand Index to measure similarity between two clusterings.
- Mutual Information Metrics: Implementation of metrics to quantify the amount of shared information between cluster assignments and ground truth.
- Purity Measures: Implementation to compute purity for evaluating clustering quality against labeled data.
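
To make the three metrics concrete, here is a minimal Python sketch using their standard textbook definitions. The deliverable itself is to be written in MeTTa; the function names are hypothetical, and the mutual information shown is the unnormalized form in nats, which may differ from the variant the team ends up choosing.

```python
from collections import Counter
from itertools import combinations
from math import log


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two clusterings agree.

    A pair agrees when both clusterings put it together or both keep it apart.
    Requires at least two points.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


def purity(clusters, classes):
    """Share of points that belong to the majority class of their cluster."""
    by_cluster = {}
    for cluster, cls in zip(clusters, classes):
        by_cluster.setdefault(cluster, Counter())[cls] += 1
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(clusters)


def mutual_information(labels_a, labels_b):
    """Mutual information (in nats) between two label assignments."""
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    return sum(
        (n_ab / n) * log(n * n_ab / (count_a[a] * count_b[b]))
        for (a, b), n_ab in joint.items()
    )


predicted = [0, 0, 1, 1, 1, 1]
truth = ["a", "a", "a", "b", "b", "b"]
print(rand_index(predicted, truth))
print(purity(predicted, truth))
print(mutual_information(predicted, truth))
```

The Rand index and mutual information are symmetric in their two arguments, while purity is not: it scores the first argument (the clustering) against the second (the labels).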

Budget

$3,000 USD

Success Criterion

- Functional implementation of K-Means in MeTTa, supporting dynamic adjustment of the number of clusters.
- Accurate computation of Rand Index, Mutual Information, and Purity Measures for evaluating clustering results.
- Comprehensive tests passing for both clustering and metric evaluation.
- Clear and actionable documentation for future usage and maintenance.

Milestone 3 - Implementation of hierarchical clustering

Description

Develop and implement Hierarchical Clustering in the MeTTa programming language, enabling robust cluster formation and analysis.

Deliverables

Implement the hierarchical clustering algorithm as a function in the MeTTa DSL.
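
For orientation, a bottom-up (agglomerative) version of hierarchical clustering can be sketched in a few lines of Python. The proposal does not say which variant or linkage criterion will be used, so the agglomerative strategy, single linkage, and the function names below are all assumptions made for this example.

```python
from itertools import combinations
from math import dist


def single_linkage(cluster_a, cluster_b):
    """Distance between two clusters: their closest pair of points."""
    return min(dist(p, q) for p in cluster_a for q in cluster_b)


def agglomerative(points, target_clusters):
    """Repeatedly merge the two closest clusters until target_clusters remain.

    Returns the final clusters and the merge history (left, right, distance),
    which is the information a dendrogram would be drawn from.
    Assumes target_clusters >= 1.
    """
    clusters = [[p] for p in points]  # start with every point on its own
    merges = []
    while len(clusters) > target_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: single_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        distance = single_linkage(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], distance))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters, merges


points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
clusters, merges = agglomerative(points, target_clusters=3)
print(clusters)
print(merges)
```

This naive version recomputes all pairwise cluster distances at every step, so it is meant to show the logic rather than to scale.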

Budget

$1,000 USD

Success Criterion

- A working version of the hierarchical clustering algorithm.

Milestone 4 - Implementation of Spectral Clustering

Description

Develop and implement Spectral Clustering in the MeTTa programming language, enabling robust cluster formation and analysis.

Deliverables

Implement the spectral clustering algorithm as a function in the MeTTa DSL.
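
The proposal does not state which spectral clustering variant is intended. As a reference point, one common formulation (normalized spectral clustering with a Gaussian similarity) is summarized below; the kernel width \sigma and the choice of the symmetric normalized Laplacian are assumptions of this summary.

```latex
% Similarity graph and degree matrix for points x_1, ..., x_n
W_{ij} = \exp\!\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right), \qquad
D_{ii} = \sum_{j=1}^{n} W_{ij}

% Symmetric normalized graph Laplacian
L_{\mathrm{sym}} = I - D^{-1/2} W D^{-1/2}

% Embedding: eigenvectors of the k smallest eigenvalues, rows normalized
L_{\mathrm{sym}} u_m = \lambda_m u_m, \quad \lambda_1 \le \dots \le \lambda_k, \qquad
U = [u_1, \dots, u_k] \in \mathbb{R}^{n \times k}, \qquad
y_i = \frac{U_{i,:}}{\lVert U_{i,:} \rVert}
```

The rows y_1, ..., y_n are then clustered with k-means, and point x_i receives the cluster assigned to y_i. The eigendecomposition is the step most likely to need numerical support beyond plain symbolic code.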

Budget

$1,000 USD

Success Criterion

- A working version of the spectral clustering algorithm.

Milestone 5 - Implementation of Gaussian Mixture Models

Description

Develop and implement Gaussian Mixture Models in the MeTTa programming language, enabling robust cluster formation and analysis.

Deliverables

Implement GMM as a function in the MeTTa DSL.
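
GMMs are usually fitted with the expectation-maximization (EM) algorithm. The proposal does not name the fitting procedure, so the standard EM updates below are given only as the likely shape of the computation, for n points and K components with weights \pi_k, means \mu_k, and covariances \Sigma_k.

```latex
% E-step: responsibility of component k for point x_i
\gamma_{ik} = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}
                   {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}

% M-step: re-estimate the parameters from the responsibilities
N_k = \sum_{i=1}^{n} \gamma_{ik}, \qquad
\pi_k = \frac{N_k}{n}, \qquad
\mu_k = \frac{1}{N_k} \sum_{i=1}^{n} \gamma_{ik} \, x_i, \qquad
\Sigma_k = \frac{1}{N_k} \sum_{i=1}^{n} \gamma_{ik} \, (x_i - \mu_k)(x_i - \mu_k)^{\top}
```

The two steps alternate until the log-likelihood stops improving; each point can then be assigned to the component with the largest \gamma_{ik}.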

Budget

$1,000 USD

Success Criterion

- A working version of GMM in the MeTTa DSL.

Milestone 6 - Implementation of Other features

Description

Enhance the usability and functionality of the clustering and modeling algorithms in the MeTTa programming language by implementing utility functions, data format adapters, and a visualization tool.

Deliverables

Develop intuitive functions for easier access to and execution of the following algorithms:
- Hierarchical Clustering
- Spectral Clustering
- Gaussian Mixture Models (GMM)
- K-Means Clustering

Create an adapter to convert common data structures such as DataFrames or Series into MeTTa atoms. Support seamless conversion for different input formats, ensuring compatibility with MeTTa’s internal representations.

Implement a visualization tool using matplotlib to:
- Plot cluster results for the algorithms (e.g., 2D scatter plots, dendrograms).
- Support customization options (e.g., colors, markers, titles).
- Provide hooks to save visualizations in standard formats (e.g., PNG, SVG).

Budget

$1,500 USD

Success Criterion

All algorithms (Hierarchical Clustering, Spectral Clustering, GMM, and K-Means) are easily accessible through user-friendly functions. Adapter reliably transforms data into meTTa-compatible formats without data loss or inconsistencies.

Expert Ratings

Reviews & Ratings

Group Expert Rating (Final)

Overall

3.6

  • Feasibility 4.3
  • Desirability 3.7
  • Usefulness 3.3

While experts originally rated this submission highly and argued in favor, ultimately we selected another proposal for strategic reasons.

  • Expert Review 1

    Overall

    2.0

    • Compliance with RFP requirements 3.0
    • Solution details and team expertise 1.0
    • Value for money 1.0

    I have the following three comments about all the clustering proposals and, to be fair, I will mention them for all the proposals. At the end, you can see my comments specifically for this proposal. First, I was expecting to see more on the difficulties that one may face when a clustering algorithm is implemented in MeTTa, in other words, MeTTa-specific challenges, and how the proposing team plans to handle them. I did not see that in any of the proposals. Second, I was expecting to see their plan for making sure the MeTTa clustering library will work robustly on diverse datasets. For example, they could have listed a few datasets that may cause problems for a clustering algorithm and mentioned how they plan to avoid those problems. Third, based on my experience with clustering algorithms, most computational gains come from vectorization. None of the proposals even mentions it, even though the RFP specifically mentions concurrent processing and the ability to work on large datasets. Proposal-specific comments: The proposal lacks technical details about the algorithms. For example, how are they going to make sure that their implementation is efficient? I am getting the impression that the proposers have experience in software engineering but no prior experience in clustering.

  • Expert Review 2

    Overall

    5.0

    • Compliance with RFP requirements 5.0
    • Solution details and team expertise 5.0
    • Value for money 5.0
    A very competent, detailed proposal that fulfills all the requirements of the RFP.

    I think this team is already on the SNet Foundation payroll for another project, so I am unsure whether they are eligible. They can certainly do it, but on the other hand, wouldn't they then have to pause the work they are doing now on MOSES in MeTTa?

  • Expert Review 3

    Overall

    4.0

    • Compliance with RFP requirements 5.0
    • Solution details and team expertise 5.0
    • Value for money 4.0

    Ticks the boxes. The team is already somewhat familiar with programming in MeTTa, which is a definite plus. They also propose a frontend containing python functions for data input which makes sense. I also note their inclusion of adapters to convert input data into MeTTa representable Atoms stored in an Atomspace which gets at the heart of why we wish to port the clustering algorithms into MeTTa.
