Milestone & Budget
Generate Datasets:
Description: Create an automated (or semi-automated) scraping pipeline on GitHub for permissively licensed code that represents a set of selected tasks. Scraping will be keyed on programming languages, documentation, dates, keywords, and other criteria relevant to the task. The pipeline will also include a filtering step that screens scraped files against quality measures such as the number of GitHub stars, file length, comments, recency, and other relevant quality standards (an illustrative sketch of such a filter follows this milestone).
Status: Previous work includes pipelines for scraping based only on programming languages, filtering based on the aforementioned standards, and sharing the datasets on HuggingFace.
Cost: $1,000
Estimated Time: 10 working days (2 weeks)
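To make the filtering step concrete, here is a minimal sketch of a quality filter, assuming each scraped file is represented as a dict of metadata; all field names and thresholds are illustrative placeholders rather than our final criteria:

```python
from datetime import datetime, timezone

# Hypothetical thresholds; the real values would be tuned per task.
MIN_STARS = 50                   # repository popularity proxy
MIN_LINES, MAX_LINES = 10, 2000  # drop trivial or enormous files
MIN_COMMENT_RATIO = 0.05         # require at least some documentation
MAX_AGE_DAYS = 5 * 365           # recency cut-off

def passes_quality_filter(record: dict) -> bool:
    """Return True when a scraped file meets the (illustrative) quality bar.

    `record` is assumed to carry 'stars', 'content' (source text),
    'comment_lines', and a timezone-aware ISO-8601 'last_modified' field.
    """
    n_lines = len(record["content"].splitlines())
    if record["stars"] < MIN_STARS:
        return False
    if not MIN_LINES <= n_lines <= MAX_LINES:
        return False
    if record["comment_lines"] / max(n_lines, 1) < MIN_COMMENT_RATIO:
        return False
    age = datetime.now(timezone.utc) - datetime.fromisoformat(record["last_modified"])
    return age.days <= MAX_AGE_DAYS

# Usage: keep only files that pass before sharing the dataset.
# filtered = [r for r in scraped_records if passes_quality_filter(r)]
```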
-------------------------------------
Performance Cost Analysis:
Description: The objective of this milestone is to establish a roadmap for fine-tuning LLMs on the selected tasks. The roadmap will detail the training hyperparameters that best balance training cost against model performance, with recommendations drawn from the current literature and from our own empirical experiments (an illustrative cost comparison follows this milestone).
Status: Previous work has established a roadmap for smaller code LLMs (< 1 billion parameters) in the task of autoregressive code completion in different programming languages.
Cost: $4,000
Estimated Time: 20 working days (4 weeks)
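As an illustration of how the roadmap would weigh cost against performance, the sketch below compares candidate fine-tuning configurations by estimated GPU cost; every rate, configuration, and score here is a hypothetical placeholder, not a measured result:

```python
# Illustrative cost/performance bookkeeping for the fine-tuning roadmap.
GPU_HOURLY_RATE_USD = 1.50  # assumed on-demand price for one training GPU

candidate_configs = [
    {"lr": 5e-5, "batch_size": 8,  "epochs": 3, "est_gpu_hours": 12.0},
    {"lr": 1e-4, "batch_size": 16, "epochs": 2, "est_gpu_hours": 7.0},
    {"lr": 3e-4, "batch_size": 32, "epochs": 1, "est_gpu_hours": 4.0},
]

def cheapest_within_tolerance(results, tolerance=0.01):
    """Pick the cheapest configuration whose validation score is within
    `tolerance` (relative) of the best score observed.

    `results` is a list of dicts carrying 'score' and 'est_cost_usd'.
    """
    best_score = max(r["score"] for r in results)
    good_enough = [r for r in results if r["score"] >= best_score * (1 - tolerance)]
    return min(good_enough, key=lambda r: r["est_cost_usd"])

# After each configuration is trained and evaluated, attach its cost and
# a (made-up) validation score, then let the roadmap recommend one setting.
results = []
for cfg, score in zip(candidate_configs, [0.41, 0.40, 0.36]):
    results.append({**cfg,
                    "score": score,
                    "est_cost_usd": cfg["est_gpu_hours"] * GPU_HOURLY_RATE_USD})

print(cheapest_within_tolerance(results, tolerance=0.05))  # -> the 7 GPU-hour config
```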
-------------------------------------
Fine-tuning and Evaluation:
Description: Fine-tune our efficient, specialised off-the-shelf models and evaluate each model on its respective benchmark (a fine-tuning sketch follows this milestone). In this stage, we also lay out the second part of our API: one-call training of models on sensitive data.
Status: Not Started
Cost: $5,000
Estimated Time: 15 working days (3 weeks)
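A minimal sketch of one plausible fine-tuning setup, using the Hugging Face transformers and peft libraries for parameter-efficient (LoRA) fine-tuning; the base model and hyperparameters are illustrative assumptions, not our final choices:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

BASE_MODEL = "Salesforce/codegen-350M-mono"  # placeholder small code LLM

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# LoRA trains a small set of injected adapter weights instead of the full
# model, which is what keeps fine-tuning cheap. Values are illustrative.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["qkv_proj"],  # attention projection in the CodeGen architecture
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all weights

# Training would then proceed with the usual transformers Trainer on the
# task-specific dataset from the first milestone, followed by evaluation
# on that task's benchmark.
```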
-------------------------------------
Hosting/API Calls:
Description: Create a GPU inference environment where different models can be accessed through either API calls or an interactive dashboard. Hosting will be on Amazon SageMaker to enable efficient loading and invocation of models at minimal cost. The interactive dashboard is intended to showcase the versatility and performance of the models, while the direct API call can be used for integration into code IDEs such as VS Code (an illustrative client sketch follows this milestone). Finally, the one-call API will also be developed to allow customers to train the models on their own datasets and use the resulting models locally.
Status: Previous work has explored options for optimised inference from code LLMs as well as establishing a CPU-hosted interactive code generation interface. We expect existing knowledge and architecture to be beneficial when transferring from HuggingFace to other hosting platforms.
Cost: $15,000
Estimated Time: 30 working days (6 weeks)
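As a sketch of the direct API-call path, the snippet below invokes a deployed SageMaker endpoint with boto3; the endpoint name and the request/response schema are assumptions for illustration, since the real contract depends on how the inference container is written:

```python
import json
import boto3

# Hypothetical endpoint name, created when the model is deployed to SageMaker.
ENDPOINT_NAME = "enigma-code-llm-endpoint"

runtime = boto3.client("sagemaker-runtime")

def complete_code(prompt: str, max_new_tokens: int = 64) -> str:
    """Call the hosted model once and return its completion."""
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps({"inputs": prompt, "max_new_tokens": max_new_tokens}),
    )
    return json.loads(response["Body"].read())["generated_text"]

# An IDE plugin (e.g. for VS Code) would call complete_code() on the text
# before the user's cursor and surface the returned suggestion inline.
```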
-------------------------------------
Onboarding:
Description: Onboard the services onto the Singularity Marketplace.
Status: Not Started
Cost: $5,000
Estimated Time: 15 working days (3 weeks)
-------------------------------------
Total:
Cost: $30,000
Estimated Time: 90 working days (18 weeks)
Long Description
Company Name
Enigma AI
Summary
The aim of this project is to close the accessibility gap and further the democratisation of the use and development of code LLMs. To make LLMs more usable, we must first address their high computational cost. Using smaller models, however, typically limits coverage to a few popular programming languages. We will therefore explore the feasibility of using transfer learning to adapt small pre-trained models to new programming languages and tasks. For more challenging tasks that require larger models, and thus higher costs, we will use efficient training and inference methods to achieve good performance at reasonable cost.
Funding Amount
$30,000
The Problem to be Solved
Access to code LLMs is one of the important goals of AI development. The democratisation of AI is an important factor in the progress of the field, and AI companies such as Meta and Stability AI are showing their commitment to it by publicly sharing their models and releasing their weights. However, this may not be enough, as the barrier to tuning and using LLMs is higher than mere access to their weights. For the practitioner, the choices of model architecture and learning algorithm are not obvious, and exploring these options is expensive due to high computation costs. Users of these tools, meanwhile, often have to choose between the high inference cost of running models locally and the monetary cost of access via hosted APIs (e.g. GitHub Copilot). To realise the benefits of democratised AI use and development, these issues need to be resolved.
Our Solution
Our approach to solving these issues addresses various factors:
Training Data: High-quality training data is essential to lowering the cost of training LLMs. Studies have shown that fine-tuning LLMs on smaller but more task-specific, higher-quality datasets is more beneficial to model performance. Hence, we will focus on creating automated and semi-automated approaches to collect, clean, and filter our data so that we can achieve strong performance with less data and less training time, lowering the overall cost of training.
Model Size: The scaling laws of LLMs show that better generalisation is expected from larger models. However, our previous work has also shown the feasibility of the alternative approach: using relatively small LLMs on more specialised tasks. This approach lets us significantly cut the cost and time of fine-tuning LLMs, ensuring more equitable access to these tools.
Privacy: The use of hosted APIs for code generation and understanding poses significant privacy concerns for companies working with sensitive data, as these APIs require sending code outside the company's own servers. Our proposed API addresses this problem through its open-source and privacy-oriented setup, which is divided into two main paths (sketched schematically after this list). In the first path, customers call the API directly to get code suggestions for a variety of tasks. The second path is for more sensitive cases: users apply our open-sourced efficient fine-tuning and data-filtering methods to fine-tune their own code LLMs, which can then be used locally.
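The split between the two paths can be pictured as follows; the stub classes stand in for the real hosted endpoint and a locally fine-tuned model, and all names are hypothetical:

```python
class HostedAPI:
    """Path 1: thin client for the hosted inference endpoint (stub)."""
    def complete(self, prompt: str) -> str:
        return f"<completion from hosted endpoint for {prompt!r}>"

class LocalModel:
    """Path 2: a model fine-tuned with our open-source tools, run on-premises (stub)."""
    def generate(self, prompt: str) -> str:
        return f"<completion from local model for {prompt!r}>"

def get_completion(prompt: str, sensitive: bool) -> str:
    # Sensitive code never leaves the user's own infrastructure (path 2);
    # everything else takes the one-call hosted path (path 1).
    return LocalModel().generate(prompt) if sensitive else HostedAPI().complete(prompt)

print(get_completion("def parse_config(", sensitive=True))
```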
Marketing Strategy
Our marketing strategy centres on the unique advantages our approach provides:
- Quality Training Data: Our solution focuses on automating the collection, cleaning, and filtering of data. By curating task-specific, superior-quality data, we reduce training time and costs significantly, making LLMs more accessible to all.
- Open Source, Simplicity, and Privacy: Complexity and lack of privacy are common problems with LLM APIs; our solution addresses both by offering two distinct paths. First, users can call our API directly for code suggestions across a range of tasks. Second, for sensitive data, we provide open-source tools for efficient fine-tuning and data filtering, allowing users to fine-tune their own LLMs locally while maintaining data privacy.
- Customization and Versatility: Our solution is designed to adapt to any code or task that users require. It offers a high degree of customization, ensuring that it meets the unique needs of each practitioner or organisation.
- Cost-Effective Pricing: We stand out with the option for a cost-effective pricing model that charges only for training, unlike the traditional pay-per-API-call or subscription models. This significantly reduces the financial burden on users, making LLMs more accessible.
Our Project Milestones and Cost Breakdown
The milestone-by-milestone breakdown, with costs and timelines, is given in the Milestone & Budget section above.
Risk and Mitigation
Privacy Concerns:
One of the primary risks associated with our project is the potential exposure of sensitive information when fine-tuning models on personal datasets. To mitigate this risk, we will follow a comprehensive privacy-safeguard protocol. Before any data is used in the fine-tuning process, we recommend a thorough inspection of personal datasets to identify and remove any secrets or sensitive information. Additionally, we will employ state-of-the-art AI-based code-analysis tools to scan the code for inadvertent disclosures. This proactive approach ensures that user data remains confidential and secure throughout the fine-tuning process.
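To make the inspection step concrete, here is a minimal secret-scanning sketch; the patterns are a small illustrative subset of what a dedicated scanner (e.g. detect-secrets or gitleaks) would cover:

```python
import re

# Illustrative patterns only; a production scanner would use a far larger rule set.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "generic_api_key": re.compile(
        r"(?i)(api[_-]?key|secret|token)\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}

def find_secrets(source: str) -> list[tuple[str, int]]:
    """Return (pattern_name, line_number) pairs for suspected secrets."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((name, lineno))
    return hits

# Files with any hit are excluded (or redacted) before fine-tuning begins.
```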
Intellectual Property (IP) Infringement:
To prevent any inadvertent usage of unlicensed code during the model training phase, we have adopted a stringent policy of exclusively scraping permissively licensed code from reputable sources. This policy is aimed at minimizing the risk of inadvertently including copyrighted or proprietary code in our training datasets. By adhering strictly to this policy, we reduce the likelihood of IP infringement issues arising during the project.
High Computation Costs:
Another potential risk is an unforeseen increase in computation costs, particularly for the GPU resources required for inference and training, which could exceed our estimated budget. To address this risk, we have contingency plans in place: should computational costs exceed expectations, we will limit the size of the baseline LLMs we fine-tune. This measure ensures that we can stay within budget constraints without compromising the quality of our service or burdening users with unexpected costs. Additionally, we will continually monitor and optimise our computational resource usage to maintain cost efficiency throughout the project's lifecycle.
Open Source
This project was inspired by previous work carried out as part of an MSc dissertation at the University of Edinburgh, in collaboration with the Amazon Data Centre Edinburgh.
That work presented a concerted effort to bridge the gap between advanced AI technologies and their practical usability, particularly in the domain of code intelligence. By focusing on accessibility, usability, and empirical understanding, it contributed to the ongoing narrative of democratisation in AI. The empirical insights gained through extensive experimentation shed light on the intricacies of fine-tuning code LLMs, equipping practitioners with valuable knowledge to navigate the complexities of model training, save resources, and ultimately drive innovation more effectively. The shared models and datasets are among the most downloaded for their specific tasks on the popular Hugging Face platform.
In the same spirit, we aim to open-source our off-the-shelf datasets and smaller models.
Our Team
Ammar Khairi: MSc Artificial Intelligence, University of Edinburgh. Machine Learning Engineer / Data Scientist.
Mukhtar Mohammed: MSc Artificial Intelligence, University of Edinburgh. Machine Learning Engineer / Deployment Engineer.
Muhammed Saeed: MSc Artificial Intelligence, University of Saarland. Machine Learning Engineer.
Related Links
Google Colab Notebooks: These notebooks were used to train the models and generate the results; copies of the notebooks are linked below:
Links to Trained Models, Datasets, and Inference Engine: