Arcosoph Datasets | Open-Source Audio & Text Corpora

Noise & Speech Audio 3.38 GB

SonicWeave-v2

A 29.33-hour negative dataset.

🎧 5 sec per sample

⚖️ Apache 2.0

🕒 29.33 hrs

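As a rough sanity check on the SonicWeave-v2 figures, the clip count follows from the total duration and the clip length. The sketch below assumes every clip is exactly 5 seconds, that 29.33 hours is the exact total, and that 1 GB means 10^9 bytes; the function name is illustrative and not part of any Arcosoph tooling.

```python
def estimate_clip_count(total_hours: float, clip_seconds: float) -> int:
    """Estimate how many fixed-length clips fit in a total duration."""
    total_seconds = total_hours * 3600
    return round(total_seconds / clip_seconds)


clips = estimate_clip_count(29.33, 5)

# Average bytes per clip, assuming the listed 3.38 GB is 3.38 * 10**9 bytes.
bytes_per_clip = 3.38e9 / clips

print(clips)                  # 21118
print(round(bytes_per_clip))  # roughly 160 KB per clip
```

Under these assumptions the dataset holds about 21,000 clips of roughly 160 KB each.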

Negative Feature 2.96 GB

RACON_11h_v1

RACON is a comprehensive feature set derived from approximately 11 hours of diverse, real-world audio. It is designed to serve as a high-quality negative dataset for training and evaluating robust wake-word models, particularly within systems like Nanowakeword.

🗣️ 16kHz Mono WAV

⚖️ Apache 2.0

🕒 ~11 hours

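If your own audio needs to match the 16 kHz mono WAV format listed for RACON, a quick format check can catch mismatches before training. This is a minimal sketch using only Python's standard `wave` module; the helper name `is_16k_mono` and the synthetic demo file are illustrative, and the 16-bit sample width is an assumption that the card does not state.

```python
import os
import tempfile
import wave


def is_16k_mono(path: str) -> bool:
    """Return True if the WAV file is single-channel at 16 kHz."""
    with wave.open(path, "rb") as wav:
        return wav.getnchannels() == 1 and wav.getframerate() == 16000


# Synthetic one-second silent clip, used only to exercise the check.
demo_wav = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_wav, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)  # 16-bit PCM (an assumption, not stated on the card)
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 16000)

print(is_16k_mono(demo_wav))  # True
```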

Negative Feature 130 MB

AE29H_float32

AE29H_float32 (Audio Embeddings, ~29 hours) contains precomputed audio embeddings designed for the Nanowakeword framework. The embeddings are intended as general-purpose negative training data, meaning the source audio does not contain the target wake word or phrase.

📄 NumPy Array

⚖️ Apache 2.0

🕒 ~29 hrs

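Because the embeddings ship as a float32 NumPy array, they can be memory-mapped and paired with all-zero labels for use as negatives. The following is a sketch only: the helper `load_negative_embeddings`, the demo file name, and the `(8, 16, 96)` array shape are hypothetical stand-ins, since the actual file name and shape are not documented here, and this is not the Nanowakeword loading API.

```python
import os
import tempfile

import numpy as np


def load_negative_embeddings(path: str) -> np.ndarray:
    """Memory-map a .npy file of embeddings and check its dtype."""
    # mmap_mode="r" avoids reading the whole array into RAM at once.
    features = np.load(path, mmap_mode="r")
    if features.dtype != np.float32:
        raise TypeError(f"expected float32 embeddings, got {features.dtype}")
    return features


# Small synthetic stand-in; the real file name and shape are assumptions.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_embeddings.npy")
np.save(demo_path, np.zeros((8, 16, 96), dtype=np.float32))

negatives = load_negative_embeddings(demo_path)

# Every row is a negative example, so the label vector is all zeros.
labels = np.zeros(negatives.shape[0], dtype=np.int64)
print(negatives.shape, negatives.dtype, int(labels.sum()))
```

Memory-mapping keeps the footprint small when the array is larger than available RAM; for a 130 MB file, loading it fully with plain `np.load(path)` would also be reasonable.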

Speech & TTS 64.6 MB

FC-CoT-Top10k

High-quality function-calling demonstrations with clear, well-structured chain-of-thought reasoning.

📊 10,000 Samples

⚖️ Apache 2.0

🎯 Function Calling
