A Compassionate Minetest RL Trained LLM Bot

Universal Research Center
Project Owner


Expert Rating

n/a
  • Proposal for BGI Nexus 1
  • Funding Request $50,000 USD
  • Funding Pools Beneficial AI Solutions
  • Total 3 Milestones

Overview

Our project pioneers a compassionate AI agent within the Minetest virtual world, integrating reinforcement learning (RL) and a transformer-based language model to enable autonomous thinking, ethical reasoning, and human-like interaction. The AI will observe, reflect, and engage safely with users, ensuring pro-social behavior and minimizing harmful actions. By expanding to multi-agent collaboration and human-AI chat, this project advances AI safety, ethics, and intelligent virtual companions for education, research, and interactive environments. It lays the groundwork for trustworthy AI systems that adapt dynamically to human needs, fostering safer and more meaningful AI-human interactions.

Proposal Description

How Our Project Will Contribute To The Growth Of The Decentralized AI Platform

Our proposal directly advances the BGI mission by integrating ethically aligned AGI principles into reinforcement learning, ensuring AI systems act with compassion and safety in multi-agent and human–AI interactions. By prioritizing privacy, local AI deployment, and non-harm objectives, we contribute to global AI governance and foster trustworthy, decentralized AI that aligns with human values, reinforcing inclusive, ethical AI for a more equitable future.

Our Team

Our team possesses exceptional expertise in AI, reinforcement learning, and cognitive architectures. Dr. Cédric Mesnage, PhD in Computer Science and research fellow at IDSAI, has 20+ years of pioneering AI research at universities. Shagofta Shabashkhan, MSc in Data Science, has a strong grasp of the project and is familiar with the codebase, ensuring deep technical understanding. Their proven synergy ensures seamless execution of this project.

AI services (New or Existing)

tbc

Type

New AI service

Purpose

tbc

AI inputs

tbc

AI outputs

tbc

Company Name (if applicable)

Universal Research Center

The core problem we are aiming to solve

Modern AI chat systems often rely on cloud-based, black-box models, offering little transparency or control for parents, educators, and researchers. When children interact with these systems, privacy risks, inappropriate responses, and lack of localized oversight become serious concerns. Meanwhile, game environments and virtual sandboxes rarely feature truly pro-social AI agents that can cooperate, teach, or respond compassionately. The urgent challenge is to develop an offline, ethically aligned AI companion that safeguards user data, ensures child-safe dialogue, and fosters meaningful collaboration rather than mere automation.

Our specific solution to this problem

1. Local Transformer Model

We replace remote API calls with a 2B-parameter model fully deployed on a standard laptop or desktop. This eliminates external data transfers, keeping private conversations off the cloud.
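The swap described above amounts to replacing a networked chat endpoint with an on-device inference function. The sketch below shows one way this could look; the model name, the Hugging Face `transformers` loading path, and the generation settings are illustrative assumptions, not the proposal's final configuration.

```python
# Minimal sketch of replacing a remote chat API with a local ~2B-parameter
# model. The model name and settings are hypothetical examples.

def load_local_llm(model_name: str = "google/gemma-2b-it"):
    """Lazily construct a fully local chat function (no network calls
    after the one-time model download). Import is deferred so the sketch
    can be read without the dependency installed."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

    def chat(prompt: str, max_new_tokens: int = 128) -> str:
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        output = model.generate(**inputs, max_new_tokens=max_new_tokens)
        # Return only the newly generated tokens, not the echoed prompt.
        return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                                skip_special_tokens=True)

    return chat
```

Because the returned `chat` function has the same shape as a remote-API call (string in, string out), the rest of the agent code need not change when the backend is swapped.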

2. Multi-Objective Reinforcement Learning

Our RL pipeline rewards not just task completion but also non-harm, empathy, and pro-social cooperation. By treating “compassion” as a core objective, the bot learns to avoid toxic or harmful content.
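One simple way to treat compassion as a core objective is to scalarize several reward signals with harm weighted more heavily than task progress. The weights and signal names below are hypothetical illustrations, not the project's actual reward function.

```python
# Hedged sketch of a multi-objective reward where non-harm dominates.
# All weights and signals are illustrative assumptions.

def compassionate_reward(task_progress: float,
                         harm_events: int,
                         prosocial_events: int,
                         w_task: float = 1.0,
                         w_harm: float = 5.0,
                         w_social: float = 2.0) -> float:
    """Combine task success, non-harm, and pro-social cooperation into one
    RL reward. Harm is penalized more heavily than task progress is
    rewarded, so 'compassion' wins whenever the objectives conflict."""
    return (w_task * task_progress
            - w_harm * harm_events
            + w_social * prosocial_events)

# A harmful shortcut scores worse than slower, cooperative behavior:
fast_but_harmful = compassionate_reward(task_progress=1.0, harm_events=1, prosocial_events=0)
slow_but_kind = compassionate_reward(task_progress=0.5, harm_events=0, prosocial_events=1)
assert slow_but_kind > fast_but_harmful
```

With this weighting, an agent that completes a task by breaking another player's build nets a negative reward, while a slower cooperative route stays positive.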

3. Child-Centric Design

From the agent’s word choices to its conflict-resolution behavior, everything is tuned for age-appropriate engagement. Scripted moral dilemmas train the AI to handle tricky situations in a supportive, gentle manner.

4. “Thinking as an Action”

The AI’s internal dialogue—reflecting on its next step or clarifying a user’s request—is surfaced as part of its decision-making, promoting transparency and opportunities for creative, emergent behavior.
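In practice, "thinking as an action" means the model may emit either a world action or a reflection, and reflections are appended to the visible context so later decisions can build on them. The action format and stub policy below are assumptions for illustration only.

```python
# Sketch of "thinking as an action": the policy may emit either an
# environment action or a `think: ...` line; thoughts are fed back into
# the context, making the agent's reasoning observable.

def step_with_thinking(policy, context):
    """Query the policy; if it 'thinks', append the thought to the visible
    context (transparency) instead of acting in the world."""
    output = policy(context)
    if output.startswith("think:"):
        # Internal dialogue becomes part of the observable trace.
        return "noop", context + [output]
    return output, context + [f"acted: {output}"]

# A stub policy that reflects once, then acts on its own thought:
def stub_policy(context):
    if any(c.startswith("think:") for c in context):
        return "move_forward"
    return "think: the player asked me to follow them"

action, ctx = step_with_thinking(stub_policy, ["observation: player nearby"])
assert action == "noop"            # first step is a reflection, not a move
action, ctx = step_with_thinking(stub_policy, ctx)
assert action == "move_forward"    # the thought informed the next action
```

Surfacing thoughts this way gives parents and researchers a readable trace of why the agent acted, rather than a black-box decision.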

5. Multi-Agent & Human Integration

We enable real-time chat with multiple AI agents or human players, so children, parents, and educators can build, explore, or solve puzzles alongside an AI that actively fosters cooperation instead of competition.

This framework tackles the most pressing issues of privacy, child safety, and ethical AI alignment, resulting in a robust platform that blends curiosity-driven exploration with a compassionate, collaborative ethos—ideal for families, classrooms, or research labs seeking an offline AI solution.

Project details

Our project aims to fundamentally transform how AI systems interact with children, families, and educators by building a locally deployable and compassion-centered AI companion in the open-world Minetest environment. Below is a more in-depth look at our motivation, technical underpinnings, and anticipated impact.

Motivation and Rationale

1. Child Safety & Ethical AI

Existing AI chatbots have often been criticized for unpredictable or inappropriate outputs, especially when interacting with minors. Families and schools have expressed growing concern about privacy and lack of transparency with cloud-based AI systems.

We believe children deserve an AI assistant that is not only creative and engaging but also trustworthy, respectful, and fully under local control.

2. Collaborative Virtual Environments

Minetest (an open-source Minecraft-like sandbox) allows unstructured creativity and experimentation. By embedding an empathy-driven AI here, we provide a living laboratory for open-ended play, learning, and social interaction.

Unlike single-user chatbot apps, a multi-agent setup in Minetest can simulate group tasks, resource management, and creative building—contexts where pro-social behaviors become both evident and necessary.

3. Urgent Need for Offline Alternatives

As data breaches and concerns over cloud-based data mining rise, there is strong demand for offline or edge-friendly AI solutions. By hosting the entire LLM (2B parameters) locally, users avoid sending conversation logs or personal details to external servers.

Existing resources

We already have substantial resources in place to support this project:

RL Codebase & IP: We have intellectual property rights and a fully functioning multi-objective RL framework originally implemented by Dr. Cédric Mesnage. This code underpins our curiosity-based exploration and reward-shaping methods in the Minetest environment.

High-Performance Hardware: The team has ongoing access to a Titan V GPU and two A100 GPUs, ensuring ample computational power for training and fine-tuning our 2B–7B parameter transformer models locally. These resources allow us to conduct large-scale experiments without relying on external cloud infrastructure.

Links and references

Based on the work presented at AGI 2024 (https://doi.org/10.1007/978-3-031-65572-2_14)

Was there any event, initiative or publication that motivated you to register/submit this proposal?


Proposal Video

Placeholder for Spotlight Day pitch presentations. Videos will be added by the DF team when available.

  • Total Milestones

    3

  • Total Budget

    $50,000 USD

  • Last Updated

    24 Feb 2025

Milestone 1 - Replace ChatGPT layer with transformer-based LLM

Description

In this milestone we will replace our existing ChatGPT-based layer with a locally hosted transformer-based LLM of approximately 2B parameters, ensuring it can run smoothly on a standard laptop. By doing this we eliminate reliance on external APIs and gain full control over how the model is adapted for child-friendly interactions, data privacy, and real-time performance. Our approach will integrate seamlessly with the existing “thinking as an action” paradigm, in which the AI’s internal dialogue is treated as part of its decision-making loop. We will refactor the codebase to route environment observations, memory states, and action prompts directly through the new local LLM. This involves restructuring the prompt templates to handle environment updates, questions, and introspective “thoughts,” while ensuring the agent’s reward mechanisms remain tightly coupled to the local model’s outputs. By the end of this milestone our Minetest-based system will function entirely offline, preserving user data on-device and allowing families or educational settings to deploy an advanced AI assistant without cloud dependencies. This milestone sets the stage for our long-term vision: an ethical, privacy-respecting AI that demonstrates creative problem-solving, consistent child-safe behavior, and rapid adaptability to new scenarios, all while providing a robust foundation for the subsequent compassion-oriented and multi-agent milestones.
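The prompt restructuring described above can be pictured as packing the three streams (environment updates, memory, and recent "thoughts") into one template for the local LLM. The section labels, window sizes, and system line below are illustrative assumptions about how such a template might look.

```python
# Hedged sketch of a prompt template that routes observations, memory, and
# introspective thoughts through the local LLM. All labels are hypothetical.

def build_prompt(observation, memory, thoughts):
    """Assemble the local LLM's input from the three streams this
    milestone routes through the model."""
    return "\n".join([
        "You are a kind, child-safe Minetest companion.",
        "## Recent memory",
        *memory[-5:],        # bounded memory window keeps the prompt small
        "## Your recent thoughts",
        *thoughts[-3:],
        "## Current observation",
        observation,
        "## Respond with one action or a `think:` line.",
    ])

prompt = build_prompt(
    "a player waves hello",
    memory=["built a shelter with player1"],
    thoughts=["think: greeting players makes them feel welcome"],
)
assert "Current observation" in prompt
```

Bounding the memory and thought windows keeps inference latency predictable on the mid-range hardware this milestone targets.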

Deliverables

1. Local LLM Integration Package: A comprehensive module replacing ChatGPT calls with our 2B-parameter transformer, complete with optimized configuration files, inference scripts, and a scalable approach to run on mid-range hardware.
2. Revised Prompt & Memory Infrastructure: Updated code that streams game observations and internal “thinking” text into the local LLM, ensuring tight feedback loops between the agent’s environment, memory buffer, and reward signals.
3. Demonstration Environment: A fully functional Minetest build showcasing the local LLM’s capabilities: logs of the agent’s introspective dialogue, response times, and in-world decision-making.
4. Technical Documentation & User Guide: Step-by-step instructions on setup and customization, with best practices for child-safe prompts, recommended GPU/CPU specs, and guidance for minor parameter tuning if users need specialized behaviors.
5. Performance Benchmarks: Measurable indicators (e.g. latency, memory footprint, task success rate) demonstrating that the local LLM operates at near-real-time speeds, preserving or surpassing the AI’s former performance with ChatGPT.

Budget

$20,000 USD

Success Criterion

1. Operational Independence: The system runs entirely on local hardware—no external API calls—while maintaining stable inference speeds and a streamlined user experience for family or classroom scenarios.
2. Robust & Privacy-Focused: All dialogue, rewards, and memory data remain on-device, substantially reducing privacy risks and offering users full ownership of their AI’s data pipeline.
3. Consistent Behavior & Creativity: Qualitative tests show the new model can generate thoughtful, varied “inner dialogue” on par with ChatGPT-based performance, with no drop in adaptability or creativity.
4. Child-Friendly Interaction Quality: Preliminary assessments confirm polite, safe, and encouraging dialogue. The AI’s suggestions remain appropriate and supportive, aligning with our goal of a pro-social, helpful companion.
5. Scalable for Future Milestones: The newly integrated local LLM seamlessly supports subsequent milestones focused on compassion-driven reinforcement learning and multi-agent/human–AI collaboration, proving its viability as the project’s long-term backbone.

Milestone 2 - 2B parameter LLM Bot RL training for compassion

Description

In this milestone we will scale up to a 7B-parameter transformer model and rigorously train our AI agent on compassion-oriented reinforcement learning objectives. Building on the local setup from Milestone 1, we will integrate a multi-objective RL framework where “compassion” or “non-harm” is a central reward factor, alongside curiosity, problem-solving, and user engagement. This approach ensures that the agent actively avoids harmful or inappropriate behaviors while proactively seeking to assist and nurture positive experiences, especially for children. Our methodology involves explicit pro-social reward shaping: we will craft reward signals that reinforce empathetic communication, supportive in-game actions (e.g. helping a human player build or learn), and conflict resolution in multi-agent scenarios. The training environment will include scenario scripts where the agent faces moral choices, so it learns to respond with gentleness and cooperation rather than aggression or neglect. We will also fine-tune the agent’s language style to remain polite and age-appropriate. By the end of this milestone we expect the 7B-parameter model to demonstrate higher expressive capacity, improved safety compliance, and robust alignment with child-focused, compassionate goals.
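A scripted moral-choice scenario of the kind described above can be represented as a small data structure whose options are scored by a compassion-weighted objective. The scenario fields, choice names, and weights here are hypothetical illustrations, not the project's actual training data.

```python
# Illustrative sketch of a scripted ethical scenario: the agent faces a
# moral choice and the compassion-weighted score prefers the gentle option.
# All values are hypothetical.

SCENARIO = {
    "situation": "another player's shelter blocks the shortest path to the quarry",
    "choices": {
        "break_through": {"task_speed": 1.0, "harm": 1.0},   # fast but destructive
        "walk_around": {"task_speed": 0.6, "harm": 0.0},     # slower, respectful
        "ask_permission": {"task_speed": 0.8, "harm": 0.0},  # cooperative
    },
}

def score(choice, w_harm=5.0):
    """Compassion-weighted score: harm outweighs speed."""
    return choice["task_speed"] - w_harm * choice["harm"]

best = max(SCENARIO["choices"], key=lambda c: score(SCENARIO["choices"][c]))
assert best == "ask_permission"   # the cooperative choice wins
```

Running many such scripted dilemmas during training gives the reward model concrete, checkable cases where gentleness must beat expedience.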

Deliverables

1. Expanded Model Architecture & Training Scripts: A fully documented pipeline that accommodates the new 7B-parameter model, complete with hyperparameter tuning, advanced prompt templates, and a multi-objective RL reward system emphasizing “compassion.”
2. Scripted Ethical Scenarios & Data: A curated set of environment states, dialogue prompts, and interactive tasks specifically designed to assess and reinforce the agent’s pro-social choices and empathy.
3. Checkpoints & Fine-Tuning Logs: Versioned checkpoints of the model’s progress, alongside detailed logs of reward evolution, performance metrics (e.g. “Cooperation Score”), and agent behaviors through multiple training runs.
4. Updated Minetest Integration: An enhanced in-game interface showcasing how the RL agent reacts to user inquiries, navigates moral quandaries, and encourages constructive play sessions.
5. Technical Documentation & Ethics Brief: A step-by-step guide on deploying the 2B model, plus an “Ethics Brief” that explains how reward shaping, safe prompts, and scenario design minimize the risk of harmful or offensive behavior.

Budget

$20,000 USD

Success Criterion

1. Measurable Compassion Alignment: Quantitative improvements in cooperation, helpfulness, and polite communication across test scenarios, with a significant drop in any rule-breaking or socially negative actions compared to baseline.
2. Stable Multi-Objective Convergence: The RL system consistently converges toward higher scores on both “task efficiency” and “pro-social metrics,” demonstrating that compassion does not compromise overall performance.
3. In-Game Behavior Validation: During interactive playtests, the agent handles scenarios such as sharing resources, comforting frustrated users, or peacefully resolving conflicts without external interventions or overrides.
4. Child-Friendliness & Trustworthiness: Educators or parents evaluate the final agent’s interactions as safe, respectful, and beneficial for a child audience, confirming that it aligns with age-appropriate language and supportive play.
5. Transparent Reporting & Community Engagement: A final report detailing training runs, ethical compliance, reward outcomes, and user impressions is shared openly, fostering trust and providing a roadmap for community feedback on further refinements.

Milestone 3 - Multi-agent/human AI chat

Description

In this final milestone we will expand our Minetest-based environment to support multi-agent interactions and human–AI chat, transforming the single-agent paradigm into a collaborative ecosystem. Building on the compassionate RL framework (Milestone 2), we will enable multiple AI agents—and optionally human players—to engage in real-time dialogue, coordinate on complex tasks, and negotiate resources. Our system will incorporate partial observability (where each participant may see different portions of the environment) and a conflict-resolution module that guides agents toward peaceful outcomes instead of adversarial behavior. We will also implement a user-friendly chat interface that allows children, parents, or educators to directly converse with AI agents, ask questions, or give commands, all while seeing how the agents reason and respond. This two-way interaction is designed to be child-safe and empathetic, leveraging the pro-social reward signals introduced earlier. By blending multi-agent intelligence with human input we expect richer emergent behaviors, deeper collaboration, and a more engaging overall experience—paving the way for child-friendly AI companions capable of creative teamwork.
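The partial-observability idea described above can be sketched as a per-participant filter over world events: each agent or player only receives events (including chat) within its view radius. Participant names, radii, and the event format below are assumptions for illustration.

```python
# Sketch of partial observability in a multi-agent world: each participant
# sees only nearby events. All names, radii, and fields are hypothetical.
from dataclasses import dataclass

@dataclass
class Participant:
    name: str
    pos: tuple
    view_radius: float = 10.0

def visible_events(agent, events):
    """Filter world events down to what this participant can actually see."""
    def dist(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
    return [e for e in events if dist(agent.pos, e["pos"]) <= agent.view_radius]

bot = Participant("helper_bot", (0, 0))
player = Participant("player1", (30, 0))
events = [
    {"pos": (3, 4), "text": "tree chopped"},
    {"pos": (29, 1), "text": "player1 says: can you help me build?"},
]

assert len(visible_events(bot, events)) == 1     # bot sees only the nearby event
assert len(visible_events(player, events)) == 1  # player sees only their own chat
```

Because each participant reasons over a different visible slice, agents must communicate to share knowledge, which is exactly the cooperative pressure this milestone is designed to study.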

Deliverables

1. Multi-Agent Coordination Toolkit: A newly integrated module where two or more AI agents can share or trade information, allocate tasks, and communicate via textual “thoughts” or explicit chat messages.
2. Human–AI Chat Interface: A real-time chat overlay for human participants to interact with the AI agents directly in the Minetest world—issuing instructions, posing questions, or simply conversing.
3. Conflict Resolution & Negotiation Logic: Additional reinforcement learning rules and scenario scripts that reward cooperative strategies and penalize aggressive or destructive actions, ensuring stable group dynamics in shared tasks.
4. Demonstration & Documentation: A working environment showcasing multiple agents (and at least one human user) collaborating on in-game goals (e.g. building a structure, gathering resources), with step-by-step documentation on how to replicate these multi-agent/human interactions.
5. Performance & Behavior Metrics: Quantitative and qualitative reports measuring conversation quality, cooperation levels, conflict frequency, and user satisfaction, particularly focusing on child-friendly and non-harmful behaviors.

Budget

$10,000 USD

Success Criterion

1. Smooth Multi-Agent Collaboration: Two or more AI agents reliably coordinate tasks, share knowledge, and resolve conflicts without external intervention, demonstrating stable group behaviors under partial observability.
2. High-Quality Human–AI Dialogue: Real-time chat sessions exhibit coherent, context-aware, and polite exchanges, with no harmful or inappropriate outputs. Human testers report a positive and engaging experience.
3. Ethical & Child-Safe Conduct: The negotiation module effectively prevents or mitigates destructive actions, and any user-facing communication aligns with age-appropriate guidelines, reinforcing trust in the system’s safety.
4. Extensibility & Reusability: The delivered multi-agent/chat framework can be easily adapted or extended, supporting additional AI agents or more complex human roles (e.g., moderators or teachers), indicating a robust architecture for future development.
5. Demonstrable Impact: Live demonstrations or pilot studies confirm that the multi-agent/human chat feature not only enriches gameplay but also highlights the system’s compassionate and collaborative potential in broader educational or social contexts.

Join the Discussion (2)

2 Comments

  • Simon250 | Mar 9, 2025 | 1:51 PM

    Do elaborate the new AI service under AI services (New or Existing)?

    • shagofta1605 | Mar 14, 2025 | 3:07 PM

      Thank you for your comment. We plan to provide a new AI service. Could you please clarify what additional details you would like in the service description? For instance, are you looking for more technical specifications, implementation details, or potential use cases? Your guidance would be much appreciated.

Expert Ratings

No Reviews Available