Kossiso Udodi
Project Owner | Server management and AI/ML training operations
We are developing an AI-driven platform that automates data centre tasks like server provisioning, patching, general maintenance, and monitoring. It uses reinforcement learning to handle standard operations and offer insights. We are currently building datasets for the model that will power the platform, using Qwen 2.5 as the foundation model. Our dataset currently has 1.8M rows, and we hope to reach 5M rows before April. Because data centres have low fault tolerance, we must precisely structure and validate our datasets. We aim to train a beta version of the model within the next two months and start working on the platform's software section in May.
We will use it to extract container and entity data for command execution. For example, if the model needs to configure a Kubernetes container, we will use the service to extract the container's name from logs and adjust the command to be executed.
Completing the creation of the datasets is a key milestone, providing the essential data needed to train our AI model. Our dataset currently has 1.8 million rows and is on track to reach 5 million by April, covering a wide range of server logs and maintenance records. The data is structured to support both supervised and reinforcement learning, helping the AI understand cause-and-effect relationships between commands and system states. Achieving this sets the stage for training the AI model and developing a functional beta version of our platform.
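One way to make a single row serve both training modes described above is to record each command together with the system state before and after it ran. This is a minimal sketch; the field names and reward convention are assumptions, not the proposal's actual schema.

```python
from dataclasses import dataclass

@dataclass
class Transition:
    """One command/state transition, usable for supervised learning
    (invocation -> command) and for RL (state, action, next_state, reward).
    Field names here are illustrative assumptions."""
    invocation: str   # natural-language request
    state: str        # system state before the command
    command: str      # action taken
    next_state: str   # observed state after execution
    reward: float     # feedback signal for reinforcement learning

row = Transition(
    invocation="Restart the web service",
    state="service: nginx, status: failed",
    command="systemctl restart nginx",
    next_state="service: nginx, status: active",
    reward=1.0,
)
print(row.command)
```

Storing the before/after states explicitly is what lets the model learn the cause-and-effect relationships the milestone refers to.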
This deliverable will contain millions of rows of data, i.e. a robust dataset.
$30,000 USD
When we have completed 6 million rows of data, covering categories ranging from Core Infrastructure & Server Management to development and automation IDEs.
Completing the model training is a crucial step in developing an AI-driven data center management platform. This process involves teaching the AI to perform tasks like server provisioning, patching, and maintenance using a large dataset. The dataset, currently at 1.8 million rows and expected to grow to 5 million by April, is structured as Invocation | Command | Context | Ansible Playbook Command, capturing various aspects of command execution. The training process will use reinforcement learning, where the AI interacts with a simulated environment and receives feedback to optimize its actions. Once complete, the AI will efficiently manage data center tasks, leading to cost savings and improved reliability.
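A minimal parser for the pipe-delimited row layout named above (Invocation | Command | Context | Ansible Playbook Command) might look like this. The escaping rules of the real dataset are not specified, so this sketch assumes fields never contain a literal pipe character.

```python
# Field order follows the layout described in the milestone text.
FIELDS = ("invocation", "command", "context", "ansible_playbook_command")

def parse_row(line: str) -> dict[str, str]:
    """Split one dataset row into its four named fields."""
    parts = [p.strip() for p in line.split("|")]
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))

row = parse_row(
    "Check disk usage | df -h | Report free space on all mounts | "
    "ansible all -m command -a 'df -h'"
)
print(row["command"])  # df -h
```

Rejecting rows with the wrong field count at parse time is one cheap layer of the validation the proposal says low-fault-tolerance environments require.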
This deliverable sets the stage for the beta release of the datacenter AI model.
$20,000 USD
First, the model must achieve a high accuracy rate, such as 95%, in executing and recommending optimal data center tasks. This ensures that the platform can reliably perform its functions without frequent human intervention. Second, the model should demonstrate data efficiency by effectively learning from the structured dataset, which is expected to grow to 5 million rows. This ability to generalize from the data is crucial for handling diverse and complex tasks. Third, the model should exhibit scalability, maintaining its performance and accuracy as the complexity and scale of the tasks increase.
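The 95% accuracy gate above could be checked with an exact-match evaluation over a held-out set, as sketched here. The harness, the sample commands, and exact-match as the scoring rule are all assumptions; the proposal does not define its evaluation protocol.

```python
def accuracy(predicted: list[str], reference: list[str]) -> float:
    """Fraction of model-issued commands that exactly match the reference."""
    if len(predicted) != len(reference):
        raise ValueError("prediction/reference length mismatch")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)

# Tiny illustrative eval set; a real one would be a held-out dataset slice.
preds = ["systemctl restart nginx", "df -h", "kubectl get pods"]
refs  = ["systemctl restart nginx", "df -h", "kubectl get nodes"]

score = accuracy(preds, refs)
meets_gate = score >= 0.95
print(f"accuracy={score:.2f}, meets 95% gate: {meets_gate}")
```

Exact string match is a deliberately strict choice; semantically equivalent commands (e.g. reordered flags) would need a more forgiving comparison.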
© 2024 Deep Funding
Simon250
Mar 9, 2025 | 1:17 PM
I suggest we could reallocate some of the budget from Milestone 1 to create two additional milestones. For instance, Milestone 3 could focus on building a user-friendly UI, ensuring that our platform is accessible and intuitive for end users. Then, Milestone 4 could be dedicated to developing comprehensive documentation and planning for further expansion. This revised structure would not only streamline the development process but also enhance the overall scalability and usability of the platform. What are your thoughts on this approach?
Kossiso Udodi
Project Owner Mar 9, 2025 | 1:36 PM
I agree with you. Thank you for the clarity on this. We'll make corrections!
Sky Yap
Mar 9, 2025 | 12:42 PM
Really like this idea! I think changing the milestone title from "Creation of datasets" to "Collection of datasets" is a good idea. "Collection" better emphasizes that you're gathering data from existing sources, like server logs and maintenance records, rather than generating it from scratch. This slight change clarifies the process and aligns well with the continuous growth of your dataset. What do you think?
Kossiso Udodi
Project Owner Mar 9, 2025 | 2:30 PM
We "collect" for sections that are less error-sensitive and format specially for critical sections. In early testing, one of the problems we encountered was that models sometimes sent the wrong initiation commands. There are certain functions where the system can fail and try again, but in some areas, like security and firewalls, we cannot afford even slight margins of error. We are trying to get around this by formatting some sections of data this way:

{
  "Invocation": "Apply Pod Security Standards at Cluster level",
  "Command": "kubectl apply -f podsecurity.yaml",
  "NLP Context": "This is for when we need to apply Pod Security Standards at Cluster level",
  "Ansible Task": {
    "name": "Execute: Apply Pod Security Standards at Cluster level",
    "module": "k8s",
    "args": {
      "definition": "kubectl apply -f podsecurity.yaml"
    }
  }
}

We are also working with a few data centre workers to address specific scenarios with very low error margins. We want to get it right so that managers can trust it well enough to adopt it quickly. Collection is also part of what we are doing, especially for sections with low sensitivity.
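For critical sections like the one above, malformed records should be rejected before any command can be issued. This is a hedged sketch of such a schema check, mirroring the field names from the sample record; the actual validation pipeline is not described in the thread.

```python
# Required top-level and nested fields, taken from the sample record above.
REQUIRED = {"Invocation", "Command", "NLP Context", "Ansible Task"}
TASK_REQUIRED = {"name", "module", "args"}

def validate_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record is valid."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED - record.keys())]
    task = record.get("Ansible Task", {})
    problems += [f"missing task field: {f}"
                 for f in sorted(TASK_REQUIRED - task.keys())]
    return problems

record = {
    "Invocation": "Apply Pod Security Standards at Cluster level",
    "Command": "kubectl apply -f podsecurity.yaml",
    "NLP Context": "This is for when we need to apply Pod Security Standards "
                   "at Cluster level",
    "Ansible Task": {
        "name": "Execute: Apply Pod Security Standards at Cluster level",
        "module": "k8s",
        "args": {"definition": "kubectl apply -f podsecurity.yaml"},
    },
}
print(validate_record(record))  # []
```

A production version would also verify field types and whitelist the allowed modules, since a wrong `module` value is exactly the kind of error a firewall-facing task cannot tolerate.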