We believe in open science. Access clean, structured audio, speech, and text corpora for training production-grade generative models.
29.33-hour Negative Dataset.
RACON is a comprehensive feature set derived from approximately 11 hours of diverse, real-world audio. It is designed to serve as a high-quality negative dataset for training and evaluating robust wake-word models, particularly within systems like Nanowakeword.
The Audio Embeddings dataset (~29 hours) contains precomputed audio embeddings designed for the Nanowakeword framework. The embeddings are intended to be used as general-purpose negative training data, meaning the source audio does not contain the target wake word or phrase.
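As a rough illustration of what "negative training data" means in practice, the sketch below stacks negative embeddings with positive (wake-word) embeddings and assigns binary labels. It is not Nanowakeword's actual loading code: the helper name `build_training_set`, the embedding width of 96, and the random arrays standing in for real embeddings are all assumptions made for the example.

```python
import numpy as np


def build_training_set(positive, negative):
    """Stack positive and negative embeddings and attach binary labels.

    Both inputs are assumed to be 2-D arrays of shape (num_clips, embedding_dim).
    Label 1 marks a wake-word clip, label 0 marks a negative clip.
    """
    features = np.concatenate([positive, negative], axis=0)
    labels = np.concatenate([np.ones(len(positive)), np.zeros(len(negative))])
    return features, labels


# Random stand-ins for real data; the shapes are hypothetical.
rng = np.random.default_rng(0)
positive = rng.normal(size=(8, 96))    # placeholder wake-word embeddings
negative = rng.normal(size=(32, 96))   # placeholder for the precomputed negatives

features, labels = build_training_set(positive, negative)
```

Because the negatives are precomputed, they can be reused unchanged across different wake words; only the positive set needs to be regenerated for each new target phrase.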
High-quality function-calling demonstrations and clear, well-structured chain-of-thought reasoning.