Seb Wiechers
project owner
Large MeTTa corpus for LLM fine-tuning
We propose to curate two datasets of paired natural-language and MeTTa expressions (NL <-> MeTTa): the *silver* dataset, consisting of 20,000 AI-generated, probabilistically verified pairs, and the *gold* dataset, consisting of 10,000 human-labeled, high-quality pairs. The proposed timeline is 4 months. Funding will cover a) compute costs, b) scoring of output pairs, and c) development of algorithms for estimating the probability that a predicted pair is correct. Our knowledge of the MeTTa language, our background in NLP, linguistics, logic, and AI, and our real-world organizational experience place us in an ideal position to have a compounding effect on the SNET ecosystem.