Arcosoph Research | Pioneering Open Audio & Logic AI

OPEN SOFTWARE

Developer Frameworks

High-performance, lightweight libraries optimized for local inference and edge hardware.

AUDIO ENGINE

Nanowakeword

Install package locally:

    $ pip install nanowakeword

Train custom model:

    $ nanowakeword -c ./config.yaml

An automated toolkit for training production-grade, low-latency custom wake-word models and deploying them directly to edge devices and microcontrollers. Nanowakeword automates audio data synthesis, synthetic background-noise mixing, and neural model quantization.

Production-ready keyword spotting (KWS) models in minutes
Advanced synthetic voice generation for negative data curation
Ultra-lightweight model exports optimized for ARM Cortex processors
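Nanowakeword's own augmentation code is not reproduced here. As a rough illustration of what background-noise mixing usually involves, the sketch below scales a noise clip so that it sits at a chosen signal-to-noise ratio (SNR) before adding it to a speech clip. The function name `mix_at_snr` and the SNR-based scaling are assumptions for illustration, not the library's API.

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Hypothetical helper: add `noise` to `speech` at the requested SNR in dB."""
    # Tile the noise if it is too short, then trim it to the speech length.
    if len(noise) < len(speech):
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0:
        return speech.copy()

    # Choose a gain so that 10 * log10(speech_power / scaled_noise_power) == snr_db.
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    scaled_noise = noise * np.sqrt(target_noise_power / noise_power)
    return speech + scaled_noise

rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s, 440 Hz tone at 16 kHz
noise = rng.normal(size=8000)  # shorter than the speech clip, so it gets tiled
noisy = mix_at_snr(speech, noise, snr_db=10.0)
```

In a training pipeline the SNR would typically be drawn at random per example, so the model sees the wake word under a range of noise conditions.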


NLP & LOGIC

Phonemize

Install package locally:

    $ pip install phonemize

A lightweight, zero-dependency, pure-Python library that converts raw text into phonetic IPA representations. Essential for building high-fidelity Text-to-Speech (TTS) frontends and NLP text-processing pipelines.

Accurate International Phonetic Alphabet (IPA) conversions
No external C binaries or compiled system dependencies required
Pre-trained multilingual phonetic mapping tables
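The core idea behind a dictionary-driven text-to-IPA frontend can be sketched in a few lines of pure Python. This is not the phonemize API: the `LEXICON` entries (British-English transcriptions) and the `text_to_ipa` function are hypothetical, and only illustrate word-level lookup with a placeholder for unknown words.

```python
# Hypothetical toy lexicon; illustrative entries, not phonemize's mapping tables.
LEXICON = {
    "hello": "həˈləʊ",
    "world": "wɜːld",
}

def text_to_ipa(text: str, lexicon: dict[str, str], unknown: str = "?") -> str:
    """Convert text to IPA by lowercasing, splitting on whitespace, and looking up each word."""
    words = text.lower().split()
    # Strip surrounding punctuation before lookup; unknown words map to a placeholder.
    return " ".join(lexicon.get(w.strip(".,!?;:"), unknown) for w in words)

print(text_to_ipa("Hello, world!", LEXICON))  # həˈləʊ wɜːld
```

A production frontend would also need a fallback for out-of-vocabulary words (for example letter-to-sound rules) rather than a placeholder symbol.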


DATASET VIEWER

Research Datasets Hub

A registry of our active machine learning datasets on Hugging Face.

arcosoph/SonicWeave-v2
Size: 12.4 GB
Format: 48 kHz stereo WAV
License: CC-BY-4.0
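The dataset card gives size and format but not total duration. A back-of-the-envelope estimate is possible under two assumptions that the card does not state: the bit depth (16-bit or 24-bit PCM are both common) and that "GB" means decimal gigabytes (10^9 bytes). The helper name `wav_hours` is hypothetical.

```python
def wav_hours(size_bytes: float, sample_rate: int, channels: int, bit_depth: int) -> float:
    """Approximate hours of uncompressed PCM audio, ignoring WAV header overhead."""
    bytes_per_second = sample_rate * channels * (bit_depth // 8)
    return size_bytes / bytes_per_second / 3600

# 12.4 GB taken as 12.4e9 bytes; the bit depth is an assumption, so try both.
for bits in (16, 24):
    print(f"{bits}-bit: {wav_hours(12.4e9, 48_000, 2, bits):.1f} h")
```

This prints roughly 17.9 h for 16-bit and 12.0 h for 24-bit audio. Per-file WAV headers are small, so they barely change the estimate.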


PUBLICATIONS

Academic Preprints

Read the methodology, mathematical formulations, and engineering optimizations behind our open-source tools.

PREPRINT arXiv:2603.01024

Optimizing Neural Keyword Spotting Models for Low-Power ARM Microcontrollers

Muhammad Abid, Sarah Chen, Arcosoph AI Team

We describe our custom training pipeline that compresses and compiles speech wake-word models. By combining synthetic soundscape augmentation with post-training 8-bit integer (INT8) quantization, we achieve robust wake-word recognition within 15 KB of on-device SRAM.
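The abstract does not specify the quantization scheme. One common form of post-training INT8 quantization is symmetric per-tensor scaling, where each float weight w is approximated as scale × q with q an integer in [-127, 127]. The sketch below assumes that scheme and a hypothetical 64×32 layer; it shows why INT8 helps with a tight SRAM budget, since every weight shrinks from 4 bytes to 1.

```python
import numpy as np

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor INT8 quantization: w ≈ scale * q, q in [-127, 127]."""
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map INT8 values back to approximate float32 weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(64, 32)).astype(np.float32)  # hypothetical layer weights
q, scale = quantize_int8(w)
err = float(np.max(np.abs(dequantize(q, scale) - w)))
print(f"float32: {w.nbytes} B, int8: {q.nbytes} B, max abs error: {err:.5f}")
```

Rounding bounds the per-weight error by half a quantization step (scale / 2). Real deployments often use per-channel scales or calibrated activation ranges, which this sketch omits.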

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V$$
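The card does not say where attention sits in the keyword-spotting model, so the following only evaluates the scaled dot-product attention formula with NumPy. The shapes (10 audio frames, d_k = 16, d_v = 8) are hypothetical.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max along the axis for numerical stability before exponentiating.
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

def attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)
    return softmax(scores, axis=-1) @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(10, 16))  # 10 query frames, d_k = 16
K = rng.normal(size=(10, 16))
V = rng.normal(size=(10, 8))   # d_v = 8
out = attention(Q, K, V)
print(out.shape)  # (10, 8)
```

Dividing by √d_k keeps the dot products from growing with the key dimension, which would otherwise push the softmax toward near one-hot weights.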


DOCUMENT arXiv:2602.04910

High-Speed IPA Phonetic Conversions: Pure-Python Frontends for Speech Pipelines

Arcosoph AI Team

This paper presents a zero-dependency phonetic transcription toolkit that converts orthographic text in multiple languages to the International Phonetic Alphabet (IPA). We report a 40% reduction in inference latency from using native dictionary mappings.

$$P(\Phi \mid W) = \prod_i P(\phi_i \mid w_i, \phi_{i-1})$$
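The factorization above treats each phone φ_i as depending on its aligned input token w_i and the previous phone φ_{i-1}. The sketch below scores a phone sequence under that first-order model, summing log-probabilities to avoid underflow on long inputs. Several pieces are assumptions for illustration: the one-to-one alignment of graphemes to phones, the `<s>` start symbol standing in for φ_0, the toy probability table, and the floor value for unseen triples.

```python
import math

START = "<s>"  # hypothetical start symbol standing in for phi_0

# Hypothetical conditional table: P(phone | grapheme, previous phone).
COND_PROB = {
    ("k", "c", START): 0.9,
    ("æ", "a", "k"): 0.7,
    ("t", "t", "æ"): 0.95,
}

def log_prob(phones: list[str], graphemes: list[str], table: dict, floor: float = 1e-6) -> float:
    """log P(Phi | W) = sum_i log P(phi_i | w_i, phi_{i-1}) under a first-order model."""
    if len(phones) != len(graphemes):
        raise ValueError("this sketch assumes a one-to-one grapheme/phone alignment")
    total = 0.0
    prev = START
    for phi, w in zip(phones, graphemes):
        # Unseen (phone, grapheme, previous phone) triples get a small floor probability.
        total += math.log(table.get((phi, w, prev), floor))
        prev = phi
    return total

lp = log_prob(["k", "æ", "t"], list("cat"), COND_PROB)
print(f"P = {math.exp(lp):.4f}")  # P = 0.5985
```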


ARTICLE ARCOSOPH-2026

Active Loss-Weighted Sampling for Reinforcement Learning Alignment from Rationales

Sarah Chen, Muhammad Abid

We present a dynamic sampling strategy for alignment datasets. By grouping prompts by their logic trees into clusters C_k and sampling each input x_i within its cluster with probability proportional to a power of its loss from the previous round (t-1), we achieve improved training convergence on multi-step reasoning tasks.

$$P(x_i \mid x_i \in C_k, t) = \frac{\left(L_i^{(t-1)}\right)^{\alpha} + \epsilon}{\sum_{x_j \in C_k} \left[\left(L_j^{(t-1)}\right)^{\alpha} + \epsilon\right]}$$
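Within a single cluster C_k, the sampling rule reduces to normalizing the weights (L_i^(t-1))^α + ε and drawing from the resulting distribution. The sketch below assumes the previous-round losses for one cluster are already available; the loss values, α, ε, and the function name `cluster_sampling_probs` are hypothetical, and how clusters are formed is not covered.

```python
import random

def cluster_sampling_probs(losses: list[float], alpha: float, eps: float) -> list[float]:
    """P(x_i | x_i in C_k, t) = ((L_i^(t-1))^alpha + eps) / sum_j ((L_j^(t-1))^alpha + eps)."""
    weights = [loss ** alpha + eps for loss in losses]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical previous-round losses for the inputs in one cluster C_k.
prev_losses = [0.1, 0.4, 0.9, 0.0]
probs = cluster_sampling_probs(prev_losses, alpha=1.0, eps=0.05)
print([round(p, 3) for p in probs])

# Draw a batch of 8 indices into the cluster, with replacement, using these probabilities.
rng = random.Random(0)
batch = rng.choices(range(len(prev_losses)), weights=probs, k=8)
```

The exponent α controls how sharply sampling concentrates on high-loss inputs (α = 0 gives uniform sampling), while ε keeps inputs with zero previous loss from being excluded entirely.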
