NanowakeWord Configuration Guide
Complete documentation of all configurable parameters in the Nanowakeword package, including descriptions, default values, meanings, and usage examples.
Table of Contents
- Project & Data Paths
- Model Architecture
- Training & Optimization
- Feature Manifest
- Batch Composition
- Data Generation
- Augmentation Settings
- Feature Generation Manifest
- Advanced Settings
- Pipeline Control
- Intelligent Auto-Configuration
- Inference Parameters
Project & Data Paths
Configuration parameters for project organization and data source locations.
model_name
- Type: string
- Default: Auto-generated based on model type (e.g., XXX_dnn_v1)
- Description: Name of the trained model. Used for creating directories and organizing outputs.
- Example:
model_name: "my_wakeword_A_v1"
output_dir
- Type: string
- Default: "./trained_models"
- Description: Base directory where all trained models and artifacts will be stored.
- Example:
output_dir: "./trained_models" # Creates: ./trained_models/my_wakeword_v1/model/, ./trained_models/my_wakeword_v1/features/
positive_data_path
- Type: string (file path)
- Mandatory: Yes
- Default: None
- Description: Directory containing positive audio samples (actual wake word utterances).
- Requirements:
  - Must contain .wav files at 16 kHz sample rate
  - Mono or stereo audio (will be converted to mono)
  - Can be empty if using only generated synthetic samples
negative_data_path
- Type: string (file path)
- Mandatory: Yes
- Default: None
- Description: Directory containing negative audio samples (non-wake-word utterances).
- Example:
negative_data_path: "./data/common_words"
background_paths
- Type: list of strings
- Default: Optional
- Description: Directories containing background noise audio files for augmentation. Multiple paths supported.
- Example:
background_paths: # You can add multiple paths or only one
  - "./data/office_noise"
  - "./data/street_noise"
  - "./data/home_noise"
rir_paths
- Type: list of strings
- Default: Optional
- Description: Directories containing Room Impulse Response (RIR) files for acoustic augmentation.
- Note: At least one RIR path is required for intelligent configuration.
Model Architecture
Parameters controlling the neural network structure and behavior.
model_type
Type: string
Default: "dnn"
Valid Options: "dnn", "lstm", "gru", "rnn", "cnn", "transformer", "crnn", "tcn", "quartznet", "conformer", "e_branchformer", "custom"
Description: The neural network architecture to use for wake word detection.
Complexity Levels (from simplest to most complex):
- dnn - Dense feedforward network (lightweight, fast)
- cnn - Convolutional Neural Network (good for spectrograms)
- lstm, gru, rnn - Recurrent networks (excellent for sequences)
- crnn - Hybrid CNN-RNN (combines both strengths)
- transformer, conformer, e_branchformer - Advanced attention-based (most powerful, most complex)
Examples by use case:
# Embedded/Edge device (minimal resources)
model_type: "dnn"
# Edge device with more resources
model_type: "lstm"
# Desktop/cloud with ample resources
model_type: "conformer"
layer_size (DNN/RNN-based architectures)
- Type: integer
- Default: 128
- Valid Range: 64 to 512
- Description: Number of neurons in each hidden layer for feedforward and recurrent layers.
- Relationship to model capacity: Larger values = more parameters = longer training, better performance (up to a point)
- Example:
layer_size: 256 # Larger model, slower but potentially better
n_blocks
Type: integer
Default: 3
Valid Range: 1 to 10
Description: Number of stacked blocks/layers in the model.
- For dnn: Number of fully connected layers
- For lstm/gru: Number of recurrent layers
- For transformer: Number of encoder layers
- For crnn: Number of RNN layers (CNN part is fixed)
Example:
n_blocks: 5 # Deeper network
dropout_prob
Type: float
Default: 0.5 (intelligently adjusted)
Valid Range: 0.0 to 0.8
Description: Dropout probability per layer to prevent overfitting.
- Higher values = more regularization = potential underfitting
- Lower values = less regularization = potential overfitting
- Typically 0.2-0.5 for most models
Example:
dropout_prob: 0.3
activation_function (Advanced)
Type: string
Default: "relu"
Valid Options: "relu", "gelu", "silu"
Description: Activation function used in hidden layers.
- relu - Traditional, fast, widely supported
- gelu - Smooth, often better convergence
- silu - Modern alternative (Swish activation)
Example:
activation_function: "gelu"
embedding_dim (Advanced)
- Type: integer
- Default: 64
- Valid Range: 32 to 256
- Description: Dimensionality of the final embedding before classification.
Architecture-Specific Parameters
Transformer Architecture
model_type: "transformer"
transformer_d_model: 128 # Model dimension, default: 128
transformer_n_head: 4 # Number of attention heads, default: 4
CRNN Architecture
model_type: "crnn"
crnn_cnn_channels: [16, 32, 32] # CNN channel progression, default: [16, 32, 32]
crnn_rnn_type: "lstm" # "lstm" or "gru", default: "lstm"
TCN Architecture
model_type: "tcn"
tcn_channels: [64, 64, 128] # Channel progression, default: [64, 64, 128]
tcn_kernel_size: 3 # Convolution kernel size, default: 3
Conformer Architecture
model_type: "conformer"
conformer_d_model: 144 # Model dimension, default: 144
conformer_n_head: 4 # Attention heads, default: 4
E-Branchformer Architecture
model_type: "e_branchformer"
branchformer_d_model: 144 # Model dimension, default: 144
branchformer_n_head: 4 # Attention heads, default: 4
QuartzNet Architecture
model_type: "quartznet"
quartznet_config: # Channel, kernel, repeat config
- [256, 33, 1]
- [256, 33, 1]
- [512, 39, 1]
Custom Architecture
Type: string
Value: "custom"
Description: Load a user-defined torch.nn.Module class from a Python file or installed module.
Required Settings:
- custom_model_config.module_path
- custom_model_config.class_name
Optional Settings:
- custom_model_config.params
Custom model requirements:
- The class must inherit from torch.nn.Module
- It should return an embedding tensor shaped [batch_size, embedding_dim]
- It may accept the following standard constructor arguments: input_shape, embedding_dim, dropout_prob, activation_fn, config
- Additional custom parameters may be provided via params
Example:
You can create a Python file.
import torch
from torch import nn


class MyCustomModel(nn.Module):
    def __init__(self, input_shape, embedding_dim=64, dropout_prob=0.5, activation_fn=None, config=None, hidden_channels=32):
        super().__init__()
        self.input_shape = input_shape
        self.embedding_dim = embedding_dim
        self.activation_fn = activation_fn if activation_fn is not None else nn.ReLU()

        # Build CNN feature extractor (no flatten/linear until we know conv output size)
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(1, hidden_channels, kernel_size=3, padding=1),
            self.activation_fn,
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(hidden_channels, hidden_channels * 2, kernel_size=3, padding=1),
            self.activation_fn,
            nn.AdaptiveAvgPool2d((1, 1)),
        )

        # Determine flattened feature size by running a dummy tensor through the convs
        with torch.no_grad():
            dummy = torch.zeros(1, 1, *input_shape)
            conv_out = self.feature_extractor(dummy)
            flattened_size = int(conv_out.numel() // conv_out.shape[0])

        self.embedding_head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(flattened_size, 128),
            self.activation_fn,
            nn.Dropout(dropout_prob),
            nn.Linear(128, embedding_dim),
        )

    def forward(self, x):
        # Expect input shaped [batch, time, features] or [batch, 1, time, features]
        if x.dim() == 3:
            x = x.unsqueeze(1)
        x = self.feature_extractor(x)
        x = self.embedding_head(x)
        return x
In your config:
model_type: "custom"
custom_model_config:
  module_path: "path/to/your/custom_model_architectures.py"
  class_name: "MyCustomModel"
  params:
    hidden_channels: 32
- Important: module_path may be either a relative path to a Python file or an importable Python module name.
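As a rough illustration of how a module_path that is either a file path or a module name could be resolved, here is a minimal sketch using only the standard library. The helper name load_custom_class and the ".py suffix means file" rule are assumptions made for this example; the package's actual loader may behave differently.

```python
import importlib
import importlib.util
from pathlib import Path


def load_custom_class(module_path, class_name):
    """Return class_name from a .py file path or an importable module name."""
    if module_path.endswith(".py"):
        # Treat the value as a path to a Python source file (assumed rule).
        path = Path(module_path).resolve()
        spec = importlib.util.spec_from_file_location(path.stem, path)
        if spec is None or spec.loader is None:
            raise ImportError(f"Cannot load module from {path}")
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
    else:
        # Otherwise treat it as a dotted module name on sys.path.
        module = importlib.import_module(module_path)
    return getattr(module, class_name)
```

With the config above, a call such as load_custom_class("path/to/your/custom_model_architectures.py", "MyCustomModel") would return the class, which could then be instantiated with the standard constructor arguments plus the entries under params.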
Training & Optimization
Parameters governing the training loop, optimization, and learning rate scheduling.
steps
Type: integer
Default: 20000 (intelligently adjusted based on data volume)
Valid Range: 1000 to 100000
Description: Total number of training iterations/steps.
Calculation Logic:
- base_steps = effective_data_volume * 1000 (about 1000 steps per hour of effective data)
- Adjusted based on data quality and model complexity
- Typically 10,000-40,000 for most scenarios
Example:
steps: 50000 # For very large/complex datasets
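The base calculation above can be sketched in a few lines. This is only an illustration: it assumes effective_data_volume is measured in hours and clamps the result to the documented valid range. The quality and complexity adjustments are not specified in this guide, so they are left out, and the function name estimate_steps is hypothetical.

```python
def estimate_steps(effective_hours, min_steps=1000, max_steps=100000):
    """Base step count: about 1000 steps per hour of data, clamped to the valid range."""
    base_steps = int(effective_hours * 1000)
    return max(min_steps, min(max_steps, base_steps))


# 20 hours of effective data gives the documented default of 20000 steps.
print(estimate_steps(20))
```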
batch_size
Type: integer
Default: 128
Valid Range:
- Minimum: 1 (at least 1 sample per batch required)
- Maximum: Limited by GPU/CPU memory
- CPU training → 16–128+ typical
- single GPU → 32–256+ typical
- multi-GPU → 512+ possible
Description: Number of training samples per batch.
- Larger batches = faster training, more stable gradients, more memory
- Smaller batches = slower training, noisier gradients, less memory
Example:
batch_size: 128
optimizer_type
Type: string
Default: "adamw"
Valid Options: "adamw", "adam", "sgd"
Description: Optimization algorithm.
- adamw - Adaptive Moment Estimation with Weight decay (recommended)
- adam - Original adaptive optimizer
- sgd - Stochastic Gradient Descent (simple, slower convergence)
Example:
optimizer_type: "adamw"
learning_rate_max
Type: float
Default: Auto-calculated
Description: Maximum learning rate during training (used with cycle schedulers).
Intelligently Adjusted Based On:
- Dataset size (larger datasets → higher LR)
- Data noise levels (cleaner data → higher LR)
- Model complexity
Example:
learning_rate_max: 0.001
learning_rate_base
- Type: float
- Default: learning_rate_max / 10
- Description: Minimum/base learning rate during cyclical scheduling.
- Note: Automatically calculated if not specified.
lr_scheduler_type
Type: string
Default: "onecycle"
Valid Options: "onecycle", "cyclic", "cosine"
Description: Learning rate schedule strategy.
- onecycle - One cycle from base to max LR and back (good for fast convergence)
- cyclic - Multiple triangular cycles (good for exploration)
- cosine - Cosine annealing (smooth, gradual decrease)
Example:
lr_scheduler_type: "onecycle"
clr_step_size_up (Cyclic LR)
- Type: integer
- Default: Auto-calculated based on total steps
- Description: Number of steps to increase LR in each cycle.
clr_step_size_down (Cyclic LR)
- Type: integer
- Default: Auto-calculated based on total steps
- Description: Number of steps to decrease LR in each cycle.
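To make the roles of learning_rate_base, learning_rate_max, clr_step_size_up and clr_step_size_down concrete, here is a minimal sketch of a triangular cyclic schedule in plain Python. It illustrates the general technique only; the function name cyclic_lr is hypothetical and the package's scheduler may differ in detail.

```python
def cyclic_lr(step, lr_base, lr_max, step_size_up, step_size_down):
    """Triangular cyclic LR: rise linearly for step_size_up steps, then fall for step_size_down."""
    cycle_len = step_size_up + step_size_down
    pos = step % cycle_len
    if pos < step_size_up:
        frac = pos / step_size_up  # rising part of the cycle
    else:
        frac = 1.0 - (pos - step_size_up) / step_size_down  # falling part
    return lr_base + (lr_max - lr_base) * frac


# Example: learning_rate_max = 0.001 and learning_rate_base = learning_rate_max / 10
for step in (0, 500, 1000, 1500, 2000):
    print(step, cyclic_lr(step, 0.0001, 0.001, 1000, 1000))
```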
weight_decay
- Type: float
- Default: 0.01
- Description: L2 regularization coefficient to prevent overfitting.
momentum (SGD optimizer)
- Type: float
- Default: 0.9
- Valid Range: 0.0 to 1.0
- Description: Momentum factor for SGD optimizer.
num_workers
- Type: integer
- Default: 2
- Valid Range: 0 to CPU_count
- Description: Number of worker threads for data loading.
- 0 = single thread (slower, no multiprocessing)
- 2-4 = typical for most systems
- Increase for large datasets and fast GPUs
Feature Manifest
Defines paths to pre-computed audio feature files (.npy format) used for training.
Structure
feature_manifest: # You can add multiple sources
  targets: # Positive samples (wake word)
    key1: "path/to/features.npy"
    # others..
  negatives: # Negative samples (non-wake-words)
    key1: "path/to/negatives.npy"
    key2: "path/to/noise.npy" # Background noise samples
    # others..
  # Optional: Validation data (if _val key suffix used)
  targets_val:
    key1: "path/to/val_positive.npy"
  negatives_val:
    key1: "path/to/val_negatives.npy"
    key2: "path/to/val_noise.npy"
Key Naming Convention (these keys are referenced by batch_composition)
- Keys within each category can be arbitrary unique identifiers
- Short keys preferred for readability (e.g., t, n, b)
- Multiple feature sources can be specified with different keys (e.g., real_pos, bg2, hard_neg)
Example with Multiple Sources
feature_manifest:
  targets:
    t: "./trained_models/model_v1/features/positive.npy"
    my_voice: "./voice/muhammad_abid/muhammad_abid_data.npy"
  negatives:
    common_words: "./features/common_words.npy"
    hard_negatives: "./features/similar_words.npy"
    external_dataset: "./external/negatives_1m.npy"
    office: "./features/office_noise.npy"
    home: "./features/home_noise.npy"
Batch Composition
batch_composition defines how many feature samples are taken per training batch from the datasets specified in feature_manifest.
Each entry in batch_composition corresponds to a dataset or dataset group defined in feature_manifest.
batch_composition:
  target: 10
  n: 68
  hn: 10
  b: 40
  # others..
This means that each training batch will contain:
- 10 samples from the targets datasets (all datasets inside the targets group)
- 68 samples from the negatives.n dataset
- 10 samples from the negatives.hn dataset
- 40 samples from the negatives.b dataset
Relationship with feature_manifest
batch_composition always uses the datasets defined in feature_manifest.
For example:
feature_manifest:
  targets:
    t: positive_features.npy
  negatives:
    n: negative_features.npy
    hn: hard_negative_features.npy
    b: noise_features.npy
The keys used in batch_composition must match the dataset keys or dataset groups defined in feature_manifest.
How Samples Are Selected
When a group name is used:
batch_composition:
  target: 10
the samples are randomly selected from all datasets inside the targets group.
For example:
targets:
  t1: dataset1.npy
  t2: dataset2.npy
  t3: dataset3.npy
Then:
target: 10
means:
- A total of 10 samples will be taken from the targets group
- Samples are selected randomly across all target datasets
- Not exactly 10 from each dataset
Example distribution:
- t1 → 3 samples
- t2 → 4 samples
- t3 → 3 samples
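The group-level selection described above can be sketched as a random draw over the pooled datasets. This is an illustration under stated assumptions: the guide does not say whether the draw is weighted by dataset size or done with replacement, so the sketch assumes both, and the helper name sample_from_group and the dataset sizes are hypothetical.

```python
import random


def sample_from_group(group_sizes, n, rng):
    """Draw n samples from the pooled group; return how many came from each dataset key."""
    keys = list(group_sizes)
    # Weight each dataset by its size so every pooled sample is equally likely (assumption).
    picks = rng.choices(keys, weights=[group_sizes[k] for k in keys], k=n)
    return {k: picks.count(k) for k in keys}


rng = random.Random(0)
counts = sample_from_group({"t1": 300, "t2": 400, "t3": 300}, 10, rng)
print(counts)  # the per-dataset split varies from batch to batch, but always totals 10
```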
Selecting From a Specific Dataset
To select samples from a specific dataset, use its dataset key:
batch_composition:
  t: 10
This means:
- 10 samples will be taken only from targets.t
because:
targets:
  t: positive_features.npy
Summary
- feature_manifest defines where the datasets are located
- batch_composition defines how many samples are taken from those datasets per batch
- Keys in batch_composition must match keys or groups in feature_manifest
Data Generation
Parameters for synthetic audio generation using Text-to-Speech (TTS).
The data generation stage is the central orchestrator for creating synthetic audio clips. It operates on a list of "generation tasks" defined in the main configuration file under the data_generation_tasks key. This task-based approach gives the user fine-grained control over the entire data generation process, allowing multiple, diverse datasets (e.g., positive, negative, validation) to be created in a single run.
Each task is an independent job that specifies what text to synthesize, how many samples to create, where to save them, and what Text-To-Speech (TTS) settings to use. This modularity empowers users to build complex and robust datasets tailored to their specific needs.
The primary workflow is as follows:
- Loads the list of tasks from the configuration.
- Pre-loads any globally required models (like the phonemizer) for efficiency.
- Iterates through each enabled task.
- For each task, determines the text source and generates the list of phrases to be synthesized.
- Calls the generate_samples utility to create the audio files.
- Clears the GPU cache after heavy tasks to maintain performance.
Configuration Schema (data_generation_tasks):
The data_generation_tasks key in your config file should be a list of
dictionaries, where each dictionary represents a single task.
Task Keys:
name (str): A descriptive name for the task (e.g., "Positive Wake Words").
enabled (bool): If `False`, this task will be skipped. Defaults to `True`.
output_dir (str): The path to the directory where audio clips will be saved.
num_samples (int): The total number of audio clips to generate for this task.
file_prefix (str): A prefix for the generated audio filenames (e.g., "pos_").
tts_settings (dict, optional): Task-specific TTS settings that override
the global `tts_settings`.
text_source (dict): A dictionary defining the source of the text to be
synthesized. This is the core of the task's logic.
The text_source Dictionary:
This dictionary must contain a type key, which determines how the text is generated. Supported types are:
- type: "fixed_phrase" - Generates audio for a single, repeated phrase. Ideal for positive wake word samples.
  - phrase (str, optional): The exact phrase to use. If not provided, it falls back to the global target_phrase.
- type: "from_list" - Generates audio from a user-provided list of phrases. Perfect for curated lists of negative samples.
  - phrases (list[str]): A list of custom text phrases.
  - repeat_each (int, optional): How many times to repeat each phrase in the list. Defaults to 1.
- type: "auto_adversarial" - Generates phonetically similar but common English words/phrases. Excellent for creating a robust set of negative samples that challenge the model with real-world, confusable words.
  - base_phrase (str, optional): The phrase to generate variations from. Falls back to the global target_phrase.
  - Supports other keys like include_partial_phrase, max_multi_word_len, etc.
- type: "phoneme_adversarial" - Generates nonsensical but phonetically very similar text by manipulating the phonemes of a base phrase. This creates extremely challenging negative samples to drastically reduce false activations.
  - base_phrase (str, optional): The phrase to generate variations from. Falls back to the global target_phrase.
  - min_distance (float, optional): Controls how different the generated phoneme strings are from the original. Defaults to 0.35.
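For the two simplest text_source types, the phrase list can be sketched as below. This is an illustrative reading of the descriptions above, not the package's code: the function name build_phrases is hypothetical, and how num_samples interacts with from_list is not documented here, so the sketch ignores it for that type. The adversarial types need a phonemizer and are omitted.

```python
def build_phrases(text_source, num_samples, target_phrase):
    """Expand a text_source dict into the list of phrases to synthesize."""
    kind = text_source["type"]
    if kind == "fixed_phrase":
        # Fall back to the global target_phrase when no phrase is given.
        phrase = text_source.get("phrase", target_phrase)
        return [phrase] * num_samples
    if kind == "from_list":
        repeat = text_source.get("repeat_each", 1)
        return [p for p in text_source["phrases"] for _ in range(repeat)]
    raise NotImplementedError(f"text_source type {kind!r} is not covered by this sketch")


print(build_phrases({"type": "fixed_phrase"}, 3, "hey nano"))
print(build_phrases({"type": "from_list", "phrases": ["hey now", "hay nano"], "repeat_each": 2}, 0, "hey nano"))
```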
Example Usage (in a .yaml config file):
target_phrase: "hey nano"
data_generation_tasks:
  - name: "Positive Wake Words"
    enabled: true
    output_dir: "dataset/positive"
    num_samples: 1000
    text_source:
      type: "fixed_phrase"
      # Uses the global "hey nano" target_phrase
  - name: "Phoneme-Based Hard Negatives"
    enabled: true
    output_dir: "dataset/negative"
    num_samples: 1500
    file_prefix: "neg_phoneme"
    text_source:
      type: "phoneme_adversarial"
      min_distance: 0.4
Augmentation Settings
Audio augmentation parameters for training robustness.
Structure
augmentation_settings:
  gain_prob: 1.0 # Probability of gain adjustment
  min_gain_in_db: -2.0 # Minimum gain in dB
  max_gain_in_db: 2.0 # Maximum gain in dB
  pitch_prob: 0.3 # Probability of pitch shift
  max_pitch_semitones: 1.0 # Maximum pitch shift
  min_pitch_semitones: -1.0 # Minimum pitch shift
  max_snr_in_db: 35.0 # Maximum signal-to-noise ratio
  min_snr_in_db: 15.0 # Minimum signal-to-noise ratio
  rir_prob: 0.0 # Probability of applying RIR
Parameter Descriptions
min_snr_in_db / max_snr_in_db
- Type: float
- Range: Typically -10 to +40 dB
- Description: Signal-to-Noise ratio range when mixing audio with background noise.
- Lower SNR = harder augmentation (more noise, harder training)
- Higher SNR = easier augmentation (less noise, cleaner audio)
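To see what an SNR value means when mixing, the following sketch computes the factor by which background noise would be scaled so that the mix reaches a target SNR, using the standard definition SNR_dB = 20 * log10(signal_rms / noise_rms). The function name and the RMS values are made up for illustration; the augmentation library used by the package handles this internally.

```python
import math


def noise_gain_for_snr(signal_rms, noise_rms, snr_db):
    """Scale factor for the noise so that signal_rms / (gain * noise_rms) matches snr_db."""
    return signal_rms / (noise_rms * 10 ** (snr_db / 20))


# A lower target SNR yields a larger noise gain, i.e. a noisier and harder sample.
for snr_db in (35.0, 15.0, -5.0):
    gain = noise_gain_for_snr(signal_rms=0.1, noise_rms=0.05, snr_db=snr_db)
    achieved = 20 * math.log10(0.1 / (gain * 0.05))
    print(snr_db, round(gain, 4), round(achieved, 2))
```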
rir_prob
- Type: float (0.0-1.0)
- Default: 0.2
- Description: Probability of applying room impulse response convolution.
- Effect: Simulates acoustic room effects for robustness.
pitch_prob / min_pitch_semitones / max_pitch_semitones
- Type: float
- Pitch Range: Typically ±2 to ±5 semitones
- Description: Pitch shifting for voice variation without changing content.
gain_prob / min_gain_in_db / max_gain_in_db
- Type: float
- Gain Range: Typically -6 to +6 dB
- Description: Volume adjustment for robustness to different microphone levels.
ColoredNoise
- Type: float (0.0-1.0)
- Default: 0.30
0.30 - Description: Probability of adding colored noise (pink/brown noise).
Example: Aggressive Augmentation
augmentation_settings:
  min_snr_in_db: -5.0 # Very noisy (challenging)
  max_snr_in_db: 20.0
  rir_prob: 0.5 # Frequent RIR
  pitch_prob: 0.6 # Frequent pitch shift
  min_pitch_semitones: -4.0 # Wider pitch range
  max_pitch_semitones: 4.0
  gain_prob: 1.0
  min_gain_in_db: -12.0 # Wider gain range
  max_gain_in_db: 12.0
Feature Generation Manifest
Defines how to generate and process feature files from raw audio.
Structure
feature_generation_manifest:
  feature_key_name1:
    input_audio_dirs: ["path/to/audio"] # Source audio directories
    output_filename: "output_features.npy" # Output file name
    use_background_noise: true # Mix with background noise
    use_rir: true # Apply RIR augmentation
    augmentation_rounds: 10 # Number of augmentation iterations
    augmentation_settings: # Optional: override global settings
      min_snr_in_db: 5.0
      pitch_prob: 0.5
Parameters
input_audio_dirs
- Type: list of strings
- Description: Directories containing raw audio files to process.
output_filename
- Type: string
- Description: Name of the output .npy feature file (without .npy extension).
use_background_noise
- Type: boolean
- Default: true
- Description: Mix samples with background noise from background_paths.
use_rir
- Type: boolean
- Default: true
- Description: Apply room impulse response convolution.
augmentation_rounds
- Type: integer
- Default: 10
- Valid Range: 1 to 50
- Description: How many times to augment each audio sample.
- Higher rounds = more training data, slower generation
- Examples: 1-3 rounds for large datasets, 10-20 for small datasets
augmentation_settings
- Type: dict (optional)
- Description: Feature-specific augmentation overrides (if not using global settings).
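One plausible way to read the override behaviour is a shallow merge: keys given under a feature entry replace the global values and everything else is inherited. The sketch below shows that reading. Treat it as an assumption, since the guide does not spell out the merge rule; the function name is hypothetical, and a value of false is taken to mean that augmentation is disabled for that entry.

```python
def effective_augmentation(global_settings, override=None):
    """Combine global augmentation settings with a per-feature override (assumed shallow merge)."""
    if override is False:
        return None  # augmentation switched off for this feature entry
    merged = dict(global_settings)
    merged.update(override or {})
    return merged


global_settings = {"min_snr_in_db": 15.0, "max_snr_in_db": 35.0, "pitch_prob": 0.3}
print(effective_augmentation(global_settings, {"min_snr_in_db": 5.0, "pitch_prob": 0.5}))
```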
Example: Multiple Feature Generations
feature_generation_manifest:
  positive_features:
    input_audio_dirs: ["./data/positive"]
    output_filename: "positive_features.npy"
    use_background_noise: true
    use_rir: true
    augmentation_rounds: 15
  hard_negative_features:
    input_audio_dirs: ["./data/negative"]
    output_filename: "hard_negative_features.npy"
    use_background_noise: true
    use_rir: true
    augmentation_rounds: 20
  pure_noise_features:
    input_audio_dirs: ["./data/background_noise"]
    output_filename: "noise_features.npy"
    use_background_noise: false
    use_rir: false
    augmentation_rounds: 5
    augmentation_settings: false # No augmentation is applied.
  others_features:
    # your parameters...
Advanced Settings
Fine-tuning parameters for specialized scenarios.
augmentation_batch_size
- Type: integer
- Default: Auto-calculated (16-128 based on system resources)
- Description: Batch size for audio augmentation (separate from training batch size).
- Note: Intelligently calculated based on available RAM and CPU cores.
feature_gen_cpu_ratio
- Type: float
- Default: 1.0
- Valid Range: 0.0 to 1.0
- Description: CPU utilization ratio for feature generation (0.0=GPU only, 1.0=CPU ratio).
Checkpointing & Early Stopping
checkpoint_averaging_top_k
- Type: integer
- Default: 5
- Description: Number of best checkpoints to average for final model.
checkpointing.enabled
- Type: boolean
- Default: true
- Description: Enable periodic model checkpointing during training.
checkpointing.interval_steps
- Type: integer
- Default: 1000
- Description: Save checkpoint every N training steps.
checkpointing.limit
- Type: integer
- Default: 2
- Description: Maximum checkpoint files to keep (oldest are deleted).
early_stopping_patience
- Type: integer
- Default: 0
- Valid Range: 0 to 100
- Description: Stop training if no improvement for N validation checks.
- 0 = disabled
main_delta
- Type: float
- Default: 0.0001
- Description: Minimum improvement threshold for early stopping.
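The interaction between early_stopping_patience and main_delta can be sketched as a small counter. This is a generic early-stopping pattern rather than the package's implementation: the class name is hypothetical, and it assumes the monitored metric is a validation loss where lower is better.

```python
class EarlyStopper:
    """Stop after `patience` validation checks without an improvement larger than `min_delta`."""

    def __init__(self, patience, min_delta=0.0001):
        self.patience = patience
        self.min_delta = min_delta  # plays the role of main_delta in the config
        self.best = float("inf")
        self.bad_checks = 0

    def should_stop(self, val_loss):
        if self.patience == 0:
            return False  # 0 = disabled, as documented above
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


stopper = EarlyStopper(patience=3)
for loss in (0.50, 0.40, 0.40, 0.40, 0.40):
    if stopper.should_stop(loss):
        print("stopping, best loss:", stopper.best)
        break
```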
Loss & Training Dynamics
stabilization_steps
- Type: integer
- Default: 1500
- Description: Number of gradual warmup steps at training start.
- Effect: Prevents instability in initial iterations.
ema_alpha
- Type: float
- Default: 0.01
- Valid Range: 0.0 to 1.0
- Description: Exponential moving average smoothing factor for loss tracking.
- Higher values: Faster response to recent changes
- Lower values: Smoother, more stable trend
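The effect of ema_alpha is easiest to see in the update rule itself. The sketch below uses the common form ema = alpha * value + (1 - alpha) * previous, which matches the behaviour described above (a higher alpha reacts faster). Whether the package seeds the average with the first value is an assumption, and the function name is hypothetical.

```python
def ema_update(previous, value, alpha=0.01):
    """One step of an exponential moving average; the first value seeds the average."""
    if previous is None:
        return value
    return alpha * value + (1 - alpha) * previous


smoothed = None
for loss in (1.0, 0.8, 0.9, 0.7):
    smoothed = ema_update(smoothed, loss, alpha=0.5)
    print(round(smoothed, 4))
```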
Validation Settings
validation_batch_size
- Type: integer
- Default: 256
- Description: Batch size for validation pass.
Export Settings
onnx_opset_version
- Type: integer
- Default: 17
- Valid Range: 11 to 20
- Description: ONNX opset version for model export compatibility.
- Note: Lower versions = broader compatibility, higher versions = latest features.
Custom Export Model
Nanowakeword supports user-provided export hooks so you can run any custom export code (for example, CoreML, TFLite, or a private converter) automatically after training and after distillation.
How it works:
- Place a Python script anywhere on disk that exposes a callable (default name export_model) which accepts the following arguments (either by keyword or positional):
  - model - the in-memory PyTorch model (or a student model during distillation)
  - input_shape - the detected input shape tuple
  - config - the final merged training configuration (a ConfigProxy-backed dict)
  - model_name - the name chosen for the model (string)
  - output_dir - directory where built-in exporters have written artifacts
- Alternatively, specify a shell command which will be executed; the command supports Python-style str.format() placeholders: {model_path}, {model_name}, {output_dir}.
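Since the command is described as a str.format() template, the placeholder substitution can be pictured as below. This is a sketch only: the function name render_export_command is hypothetical and the paths are made-up examples. Note that with str.format(), any literal braces in the command would need to be doubled.

```python
def render_export_command(template, model_path, model_name, output_dir):
    """Fill the {model_path}, {model_name} and {output_dir} placeholders of a command template."""
    return template.format(model_path=model_path, model_name=model_name, output_dir=output_dir)


cmd = render_export_command(
    "python /scripts/convert_to_coreml.py --onnx {model_path} --out {output_dir}",
    model_path="./trained_models/my_model_v1/model/my_model_v1.onnx",  # example path
    model_name="my_model_v1",
    output_dir="./trained_models/my_model_v1/model",
)
print(cmd)
```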
Configuration (example YAML):
export_model:
  # Option A: Python script
  script: /absolute/path/to/my_coreml_export.py
  function: export_model # optional, defaults to export_model
  # Option B: shell command (alternative)
  # command: "python /scripts/convert_to_coreml.py --onnx {model_path} --out {output_dir}"
Example Python export script (my_coreml_export.py):
def export_model(model, input_shape, config, model_name, output_dir):
    """Example: export a model to CoreML.

    Notes:
    - This example assumes you have `coremltools` installed and available.
    - Many users prefer to export the ONNX produced by the built-in exporter
      and run a converter on that file instead of converting a live PyTorch model.
    """
    import os

    # Option 1: convert the in-memory PyTorch model directly
    try:
        import coremltools as ct
        import torch

        # Example: convert a traced TorchScript model
        # WARNING: conversion requirements depend on your model; this is illustrative.
        model.eval()
        # Create a dummy input matching the expected shape; adapt dtype/device as needed
        example_input = torch.randn(1, *input_shape)
        traced = torch.jit.trace(model, example_input)
        # coremltools needs the input spec for TorchScript models; the "neuralnetwork"
        # target is chosen here so the result can be saved with the .mlmodel extension.
        mlmodel = ct.convert(
            traced,
            inputs=[ct.TensorType(shape=example_input.shape)],
            convert_to="neuralnetwork",
        )
        out_path = os.path.join(output_dir, model_name + ".mlmodel")
        mlmodel.save(out_path)
        print(f"Saved CoreML model to {out_path}")
        return
    except Exception as e:
        # Fallback: convert the already-produced ONNX file with an external tool
        print(f"In-memory CoreML conversion failed: {e}. Trying ONNX fallback.")

    # Option 2: operate on ONNX produced by built-in exporter
    onnx_path = os.path.join(output_dir, model_name + ".onnx")
    if os.path.exists(onnx_path):
        # Call your converter here, e.g. a converter library or a CLI
        print(f"Found ONNX at {onnx_path}. Run your converter here.")
    else:
        raise FileNotFoundError(f"Could not find ONNX at {onnx_path}")
Command example using command (shell):
custom_export:
  command: "python /scripts/onnx_to_coreml.py --onnx {model_path} --out {output_dir}"
This feature is intentionally flexible: your script can use the in-memory torch model, the ONNX file written by the trainer, or call any external tooling your workflow requires.
Pipeline Control
Master switches to enable/disable major processing stages.
generate_clips
- Type: boolean
- Default: false
- Description: Enable/disable the clip generation stage (TTS synthesis).
- Example:
generate_clips: true
transform_clips
- Type: boolean
- Default: false
- Description: Enable/disable feature extraction and augmentation stage.
- ⚠️ Important: Set to false when not actively generating features to avoid infinite loops.
train_model
- Type: boolean
- Default: false
- Description: Enable/disable the training stage.
overwrite
- Type: boolean
- Default: false
- Description: Force regeneration of feature files, overwriting existing files.
- ⚠️ Warning: Use with caution as it will delete existing computed features.
force_verify
- Type: boolean
- Default: false
- Description: Force re-verification of all data directories, ignoring cache.
show_training_summary
- Type: boolean
- Default: true
- Description: Display effective training configuration in tabular format.
debug_mode
- Type: boolean
- Default: false
- Description: Enable debug logging and visualization outputs.
enable_journaling
- Type: boolean
- Default: true
- Description: Log training metrics and model information to journal.
Command-Line Arguments
Running training with configuration overrides:
# Basic training
nanowakeword -c your_config_path.yaml
# Generate + Transform + Train
nanowakeword -c config.yaml -G -t -T
# Force regeneration of features
nanowakeword -c config.yaml --overwrite
# Resume from previous training
nanowakeword -c config.yaml --resume ./trained_models/my_model_v1
# Only transform (no generation, no training)
nanowakeword -c config.yaml -t
# Distill
nanowakeword -c copy_X_config.yaml --distill
Arguments Explanation
- -c, --config_path - Path to YAML config file (required)
- -G, --generate_clips - Enable synthetic data generation stage
- -t, --transform_clips - Enable feature generation and augmentation
- -T, --train_model - Enable model training
- -f, --force-verify - Ignore cache and re-verify all data
- --overwrite - Regenerate all feature files (destructive)
- --resume - Resume training from specific model directory
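For readers who want to see how the flags listed above relate to each other, here is a small argparse mirror of that list. It is a stand-alone illustration, not the package's actual parser: only the flags listed above are included, and their types (boolean switches, one path for --resume) are assumptions.

```python
import argparse


def build_parser():
    """Illustrative parser mirroring the documented flags (not the real CLI implementation)."""
    p = argparse.ArgumentParser(prog="nanowakeword")
    p.add_argument("-c", "--config_path", required=True, help="Path to YAML config file")
    p.add_argument("-G", "--generate_clips", action="store_true")
    p.add_argument("-t", "--transform_clips", action="store_true")
    p.add_argument("-T", "--train_model", action="store_true")
    p.add_argument("-f", "--force-verify", action="store_true")
    p.add_argument("--overwrite", action="store_true")
    p.add_argument("--resume", metavar="MODEL_DIR", default=None)
    return p


args = build_parser().parse_args(["-c", "config.yaml", "-G", "-t", "-T"])
print(args.config_path, args.generate_clips, args.transform_clips, args.train_model)
```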