Read the full configuration guide now, everything is here.
Complete documentation of all configurable parameters in the NanoWakeWord package, including descriptions, default values, meanings, and usage examples.
Project & Data Paths
Configuration parameters for project organization and data source locations.
model_name
Type: string
Default: Auto-generated based on model type (e.g., XXX_dnn_v1)
Description: Name of the trained model. Used for creating directories and organizing outputs.
Example:
model_name: "my_wakeword_A_v1"
output_dir
Type: string
Default: "./trained_models"
Description: Base directory where all trained models and artifacts will be stored.
Example:
output_dir: "./trained_models" # Creates: ./trained_models/my_wakeword_v1/model/, ./trained_models/my_wakeword_v1/features/
positive_data_path
Type: string (file path)
Mandatory: Yes
Default: None
Description: Directory containing positive audio samples (actual wake word utterances).
Requirements:
- Must contain .wav files at 16 kHz sample rate
- Mono or stereo audio (will be converted to mono)
- Can be empty if using only generated synthetic samples
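The 16 kHz requirement above can be checked before training. The following is a minimal sketch using only the Python standard library; the helper names `check_wav` and `find_bad_files` are hypothetical and are not part of NanoWakeWord.

```python
import wave
from pathlib import Path


def check_wav(path, expected_rate=16000):
    """Return (ok, sample_rate, channels) for a single PCM .wav file."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    return rate == expected_rate, rate, channels


def find_bad_files(directory, expected_rate=16000):
    """List (filename, sample_rate) for .wav files that are not at expected_rate."""
    bad = []
    for path in sorted(Path(directory).glob("*.wav")):
        ok, rate, _ = check_wav(path, expected_rate)
        if not ok:
            bad.append((path.name, rate))
    return bad
```

Calling `find_bad_files("./data/positive")` would return an empty list when every file is at 16 kHz. The standard `wave` module reads PCM files only, so other encodings would need a different reader.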
negative_data_path
Type: string (file path)
Mandatory: Yes
Default: None
Description: Directory containing negative audio samples (non-wake-word utterances).
Example:
negative_data_path: "./data/common_words"
background_paths
Type: list of strings
Default: Optional
Description: Directories containing background noise audio files for augmentation. Multiple paths supported.
Example:
background_paths: # You can add multiple paths or only one
  - "./data/office_noise"
  - "./data/street_noise"
  - "./data/home_noise"
rir_paths
Type: list of strings
Default: Optional
Description: Directories containing Room Impulse Response (RIR) files for acoustic augmentation.
Note: At least one RIR path is required for intelligent configuration.
Model Architecture
Parameters controlling the neural network structure and behavior.
model_type
Type: string
Default: "dnn"
Valid Options: "dnn", "lstm", "gru", "rnn", "cnn", "transformer", "crnn", "tcn", "quartznet", "conformer", "e_branchformer", "custom"
Description: The neural network architecture to use for wake word detection.
Complexity Levels (from simplest to most complex):
- dnn - Dense feedforward network (lightweight, fast)
- cnn - Convolutional Neural Network (good for spectrograms)
- lstm, gru, rnn - Recurrent networks (excellent for sequences)
- crnn - Hybrid CNN-RNN (combines both strengths)
- transformer, conformer, e_branchformer - Advanced attention-based (most powerful, most complex)
- custom architectures
Examples by use case:
# Embedded/Edge device (minimal resources)
model_type: "dnn"
# Edge device with more resources
model_type: "lstm"
# Desktop/cloud with ample resources
model_type: "conformer"
layer_size (DNN/RNN-based architectures)
Type: integer
Default: 128
Valid Range: 64 to 512
Description: Number of neurons in each hidden layer for feedforward and recurrent layers.
Relationship to model capacity: Larger values = more parameters = longer training, better performance (up to a point)
Example:
layer_size: 256 # Larger model, slower but potentially better
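To make the link between layer_size and parameter count concrete, here is a small arithmetic sketch for a plain stack of fully connected layers. The input dimension of 96 and the helper name `dense_param_count` are assumptions for illustration; the actual layer layout used by NanoWakeWord may differ.

```python
def dense_param_count(input_dim, layer_size, n_blocks, output_dim):
    """Count weights and biases in a plain stack of fully connected layers."""
    dims = [input_dim] + [layer_size] * n_blocks + [output_dim]
    # Each layer has d_in * d_out weights plus d_out biases.
    return sum(d_in * d_out + d_out for d_in, d_out in zip(dims, dims[1:]))


# Compare a few layer sizes for a hypothetical 96-dimensional input.
for size in (64, 128, 256, 512):
    count = dense_param_count(input_dim=96, layer_size=size, n_blocks=3, output_dim=64)
    print(f"layer_size={size}: {count} parameters")
```

Because the hidden-to-hidden layers grow with the square of layer_size, doubling it roughly triples to quadruples the parameter count in this setup.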
n_blocks
Type: integer
Default: 3
Valid Range: 1 to 10
Description: Number of stacked blocks/layers in the model.
- For dnn: Number of fully connected layers
- For lstm/gru: Number of recurrent layers
- For transformer: Number of encoder layers
- For crnn: Number of RNN layers (CNN part is fixed)
Example:
n_blocks: 5 # Deeper network
dropout_prob
Type: float
Default: 0.5 (intelligently adjusted)
Valid Range: 0.0 to 0.8
Description: Dropout probability per layer to prevent overfitting.
- Higher values = more regularization = potential underfitting
- Lower values = less regularization = potential overfitting
- Typically 0.2-0.5 for most models
Example:
dropout_prob: 0.3
activation_function (Advanced)
Type: string
Default: "relu"
Valid Options: "relu", "gelu", "silu"
Description: Activation function used in hidden layers.
- relu - Traditional, fast, widely supported
- gelu - Smooth, often better convergence
- silu - Modern alternative (Swish activation)
Example:
activation_function: "gelu"
embedding_dim (Advanced)
Type: integer
Default: 64
Valid Range: 32 to 256
Description: Dimensionality of the final embedding before classification.
Architecture-Specific Parameters
Transformer Architecture
model_type: "transformer"
transformer_d_model: 128 # Model dimension, default: 128
transformer_n_head: 4 # Number of attention heads, default: 4
CRNN Architecture
model_type: "crnn"
crnn_cnn_channels: [16, 32, 32] # CNN channel progression, default: [16, 32, 32]
crnn_rnn_type: "lstm" # "lstm" or "gru", default: "lstm"
TCN Architecture
model_type: "tcn"
tcn_channels: [64, 64, 128] # Channel progression, default: [64, 64, 128]
tcn_kernel_size: 3 # Convolution kernel size, default: 3
Conformer Architecture
model_type: "conformer"
conformer_d_model: 144 # Model dimension, default: 144
conformer_n_head: 4 # Attention heads, default: 4
E-Branchformer Architecture
model_type: "e_branchformer"
branchformer_d_model: 144 # Model dimension, default: 144
branchformer_n_head: 4 # Attention heads, default: 4
QuartzNet Architecture
model_type: "quartznet"
quartznet_config: # Channel, kernel, repeat config
- [256, 33, 1]
- [256, 33, 1]
- [512, 39, 1]
Custom Architecture
Type: string
Value: "custom"
Description: Load a user-defined torch.nn.Module class from a Python file or installed module.
Required Settings:
- custom_model_config.module_path
- custom_model_config.class_name
Optional Settings:
- custom_model_config.params
Custom model requirements:
- The class must inherit from torch.nn.Module
- It should return an embedding tensor shaped [batch_size, embedding_dim]
- It may accept the following standard constructor arguments:
  - input_shape
  - embedding_dim
  - dropout_prob
  - activation_fn
  - config
- Additional custom parameters may be provided via params
Example:
You can create a Python file.
import torch
from torch import nn


class MyCustomModel(nn.Module):
    def __init__(
        self,
        input_shape,
        embedding_dim=64,
        dropout_prob=0.5,
        activation_fn=None,
        config=None,
        hidden_channels=32,
    ):
        super().__init__()
        self.input_shape = input_shape
        self.embedding_dim = embedding_dim
        self.activation_fn = activation_fn if activation_fn is not None else nn.ReLU()
        # Build CNN feature extractor (no flatten/linear until we know conv output size)
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(1, hidden_channels, kernel_size=3, padding=1),
            self.activation_fn,
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(hidden_channels, hidden_channels * 2, kernel_size=3, padding=1),
            self.activation_fn,
            nn.AdaptiveAvgPool2d((1, 1)),
        )
        # Determine flattened feature size by running a dummy tensor through the convs
        with torch.no_grad():
            dummy = torch.zeros(1, 1, *input_shape)
            conv_out = self.feature_extractor(dummy)
            flattened_size = int(conv_out.numel() // conv_out.shape[0])
        self.embedding_head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(flattened_size, 128),
            self.activation_fn,
            nn.Dropout(dropout_prob),
            nn.Linear(128, embedding_dim),
        )

    def forward(self, x):
        # Expect input shaped [batch, time, features] or [batch, 1, time, features]
        if x.dim() == 3:
            x = x.unsqueeze(1)
        x = self.feature_extractor(x)
        x = self.embedding_head(x)
        return x
In your config:
model_type: "custom"
custom_model_config:
  module_path: "path/to/your/custom_model_architectures.py"
  class_name: "MyCustomModel"
  params:
    hidden_channels: 32
Important: module_path may be either a relative path to a Python file or an importable Python module name.
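One way a loader could accept both forms of module_path is sketched below with the standard `importlib` machinery. The function name `load_class` and the rule "ends with .py means file path" are assumptions for illustration, not NanoWakeWord's actual resolution logic.

```python
import importlib
import importlib.util
from pathlib import Path


def load_class(module_path, class_name):
    """Load class_name from a .py file path or from an importable module name."""
    if module_path.endswith(".py"):
        # File path: build a module spec from the file and execute it.
        path = Path(module_path)
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
    else:
        # Dotted name: rely on the normal import system.
        module = importlib.import_module(module_path)
    return getattr(module, class_name)
```

With this sketch, `load_class("path/to/your/custom_model_architectures.py", "MyCustomModel")` and `load_class("my_package.models", "MyCustomModel")` would both return the class object, where `my_package.models` is a placeholder module name.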
Training & Optimization
Parameters governing the training loop, optimization, and learning rate scheduling.
steps
Type: integer
Default: 20000 (intelligently adjusted based on data volume)
Valid Range: 1000 to 100000
Description: Total number of training iterations/steps.
Calculation Logic:
- base_steps = effective_data_volume * 1000 (steps per hour)
- Adjusted based on data quality and model complexity
- Typically 10,000-40,000 for most scenarios
Example:
steps: 50000 # For very large/complex datasets
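As a rough worked example of the calculation logic above, the sketch below applies the 1000-steps-per-hour rule and clamps the result to the documented valid range. The quality and complexity adjustments are not specified in this guide and are left out; the function name `estimate_steps` and the clamping behaviour are assumptions.

```python
def estimate_steps(data_hours, steps_per_hour=1000, lower=1000, upper=100000):
    """Base step estimate from data volume, clamped to the documented range."""
    base_steps = int(data_hours * steps_per_hour)
    return max(lower, min(upper, base_steps))


for hours in (0.5, 20, 500):
    print(f"{hours} h of data -> {estimate_steps(hours)} steps")
```

Under these assumptions, 20 hours of effective data gives the documented default of 20000 steps.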
batch_size
Type: integer
Default: 128
Valid Range:
- Minimum: 1 (at least 1 sample per batch required)
- Maximum: Limited by GPU/CPU memory
- CPU training → 16–128+ typical
- single GPU → 32–256+ typical
- multi-GPU → 512+ possible
Description: Number of training samples per batch.
- Larger batches = faster training, more stable gradients, more memory
- Smaller batches = slower training, noisier gradients, less memory
Example:
batch_size: 128
optimizer_type
Type: string
Default: "adamw"
Valid Options: "adamw", "adam", "sgd"
Description: Optimization algorithm.
- adamw - Adaptive Moment Estimation with Weight decay (recommended)
- adam - Original adaptive optimizer
- sgd - Stochastic Gradient Descent (simple, slower convergence)
Example:
optimizer_type: "adamw"
learning_rate_max
Type: float
Default: Auto-calculated
Description: Maximum learning rate during training (used with cycle schedulers).
Intelligently Adjusted Based On:
- Dataset size (larger datasets → higher LR)
- Data noise levels (cleaner data → higher LR)
- Model complexity
Example:
learning_rate_max: 0.001
learning_rate_base
Type: float
Default: learning_rate_max / 10
Description: Minimum/base learning rate during cyclical scheduling.
Note: Automatically calculated if not specified.
lr_scheduler_type
Type: string
Default: "onecycle"
Valid Options: "onecycle", "cyclic", "cosine"
Description: Learning rate schedule strategy.
- onecycle - One cycle from base to max LR and back (good for fast convergence)
- cyclic - Multiple triangular cycles (good for exploration)
- cosine - Cosine annealing (smooth, gradual decrease)
Example:
lr_scheduler_type: "onecycle"
clr_step_size_up (Cyclic LR)
Type: integer
Default: Auto-calculated based on total steps
Description: Number of steps to increase LR in each cycle.
clr_step_size_down (Cyclic LR)
Type: integer
Default: Auto-calculated based on total steps
Description: Number of steps to decrease LR in each cycle.
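The interaction between learning_rate_base, learning_rate_max, clr_step_size_up, and clr_step_size_down can be illustrated with a plain triangular schedule. This is a generic sketch of cyclic learning rates, not the trainer's own scheduler; the function name `triangular_lr` is hypothetical.

```python
def triangular_lr(step, lr_base, lr_max, step_size_up, step_size_down):
    """Learning rate at a given step for a repeating triangular cycle."""
    cycle_len = step_size_up + step_size_down
    pos = step % cycle_len
    if pos < step_size_up:
        # Rising edge: move linearly from lr_base towards lr_max.
        frac = pos / step_size_up
    else:
        # Falling edge: move linearly from lr_max back to lr_base.
        frac = 1.0 - (pos - step_size_up) / step_size_down
    return lr_base + (lr_max - lr_base) * frac


lr_max = 0.001
lr_base = lr_max / 10  # matches the documented default for learning_rate_base
for step in (0, 500, 1000, 1500, 2000):
    print(step, triangular_lr(step, lr_base, lr_max, 1000, 1000))
```

The rate starts at the base value, peaks at learning_rate_max after clr_step_size_up steps, and returns to the base value at the end of each cycle.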
weight_decay
Type: float
Default: 0.01
Description: L2 regularization coefficient to prevent overfitting.
momentum (SGD optimizer)
Type: float
Default: 0.9
Valid Range: 0.0 to 1.0
Description: Momentum factor for SGD optimizer.
num_workers
Type: integer
Default: 2
Valid Range: 0 to CPU_count
Description: Number of worker threads for data loading.
- 0 = single thread (slower, no multiprocessing)
- 2-4 = typical for most systems
- Increase for large datasets and fast GPUs
Feature Manifest
Defines paths to pre-computed audio feature files (.npy format) used for training.
Structure
feature_manifest: # You can add multiple sources
  targets: # Positive samples (wake word)
    key1: "path/to/features.npy"
    # others..
  negatives: # Negative samples (non-wake-words)
    key1: "path/to/negatives.npy"
    key2: "path/to/noise.npy" # Background noise samples
    # others..
  # Optional: Validation data (if _val key suffix used)
  targets_val:
    key1: "path/to/val_positive.npy"
  negatives_val:
    key1: "path/to/val_negatives.npy"
    key2: "path/to/val_noise.npy"
Key Naming Convention (keys are referenced by batch_composition)
- Keys within each category can be arbitrary unique identifiers
- Short keys preferred for readability (e.g., t, n, b)
- Multiple feature sources can be specified with different keys (e.g., real_pos, bg2, hard_neg)
Example with Multiple Sources
feature_manifest:
  targets:
    t: "./trained_models/model_v1/features/positive.npy"
    my_voice: "./voice/muhammad_abid/muhammad_abid_data.npy"
  negatives:
    common_words: "./features/common_words.npy"
    hard_negatives: "./features/similar_words.npy"
    external_dataset: "./external/negatives_1m.npy"
    office: "./features/office_noise.npy"
    home: "./features/home_noise.npy"
Batch Composition
batch_composition defines how many feature samples are taken per training batch from the datasets specified in feature_manifest.
Each entry in batch_composition corresponds to a dataset or dataset group defined in feature_manifest.
batch_composition:
  target: 10
  n: 68
  hn: 10
  b: 40
  # others..
This means that each training batch will contain:
- 10 samples from the targets datasets (all datasets inside the targets group)
- 68 samples from the negatives.n dataset
- 10 samples from the negatives.hn dataset
- 40 samples from the negatives.b dataset
Relationship with feature_manifest
batch_composition always uses the datasets defined in feature_manifest.
For example:
feature_manifest:
  targets:
    t: positive_features.npy
  negatives:
    n: negative_features.npy
    hn: hard_negative_features.npy
    b: noise_features.npy
The keys used in batch_composition must match the dataset keys or dataset groups defined in feature_manifest.
How Samples Are Selected
When a group name is used:
batch_composition:
  target: 10
the samples are randomly selected from all datasets inside the targets group.
For example:
targets:
  t1: dataset1.npy
  t2: dataset2.npy
  t3: dataset3.npy
Then:
target: 10
means:
- A total of 10 samples will be taken from the targets group
- Samples are selected randomly across all target datasets
- Not exactly 10 from each dataset
Example distribution:
t1 → 3 samples
t2 → 4 samples
t3 → 3 samples
Selecting From a Specific Dataset
To select samples from a specific dataset, use its dataset key:
batch_composition:
  t: 10
This means:
10 samples will be taken only from targets.t, because:
targets:
  t: positive_features.npy
Summary
- feature_manifest defines where the datasets are located
- batch_composition defines how many samples are taken from those datasets per batch
- Keys in batch_composition must match keys or groups in feature_manifest
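The group-versus-dataset selection described in this section can be sketched in a few lines of Python. Everything here is illustrative: short string lists stand in for the rows of the .npy files, the helper names `resolve_pool` and `build_batch` are hypothetical, sampling with replacement is an assumption, and the group is addressed by the same name it has in the manifest.

```python
import random

# Toy manifest: each dataset key maps to a list of sample identifiers.
feature_manifest = {
    "targets": {"t1": ["t1_a", "t1_b", "t1_c"], "t2": ["t2_a", "t2_b"]},
    "negatives": {
        "n": [f"n_{i}" for i in range(100)],
        "b": [f"b_{i}" for i in range(50)],
    },
}


def resolve_pool(manifest, key):
    """Return candidate samples for a group name or a single dataset key."""
    if key in manifest:
        # Group name: pool every dataset inside the group.
        return [s for samples in manifest[key].values() for s in samples]
    for group in manifest.values():
        if key in group:
            # Dataset key: use only that dataset.
            return list(group[key])
    raise KeyError(f"{key!r} is not a group or dataset key in the manifest")


def build_batch(manifest, composition, rng):
    """Draw the requested number of samples for every composition entry."""
    batch = []
    for key, count in composition.items():
        pool = resolve_pool(manifest, key)
        batch.extend(rng.choices(pool, k=count))  # sampling with replacement
    return batch


batch = build_batch(feature_manifest, {"targets": 4, "n": 6, "b": 2}, random.Random(0))
print(len(batch), batch[:4])
```

The four target samples are drawn across t1 and t2 together, so the split between the two datasets varies from batch to batch, as in the example distribution above.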
Data Generation
Parameters for synthetic audio generation using Text-to-Speech (TTS).
This function serves as the central orchestrator for creating synthetic audio clips. It operates based on a list of "generation tasks" defined in the main configuration file under the data_generation_tasks key. This task-based approach grants the user fine-grained control over the entire data generation process, allowing for the creation of multiple, diverse datasets (e.g., positive, negative, validation) in a single run.
Each task is an independent job that specifies what text to synthesize, how many samples to create, where to save them, and what Text-To-Speech (TTS) settings to use. This modularity empowers users to build complex and robust datasets tailored to their specific needs.
The primary workflow is as follows:
- Loads the list of tasks from the configuration.
- Pre-loads any globally required models (like the phonemizer) for efficiency.
- Iterates through each enabled task.
- For each task, it determines the text source and generates the list of phrases to be synthesized.
- It then calls the generate_samples utility to create the audio files.
- Clears the GPU cache after heavy tasks to maintain performance.
Configuration Schema (data_generation_tasks):
The data_generation_tasks key in your config file should be a list of dictionaries, where each dictionary represents a single task.
Task Keys:
- name (str): A descriptive name for the task (e.g., "Positive Wake Words").
- enabled (bool): If False, this task will be skipped. Defaults to True.
- output_dir (str): The path to the directory where audio clips will be saved.
- num_samples (int): The total number of audio clips to generate for this task.
- file_prefix (str): A prefix for the generated audio filenames (e.g., "pos_").
- tts_settings (dict, optional): Task-specific TTS settings that override the global tts_settings.
- text_source (dict): A dictionary defining the source of the text to be synthesized. This is the core of the task's logic.
The text_source Dictionary:
This dictionary must contain a type key, which determines how the text is generated. Supported types are:
- type: "fixed_phrase"
  Generates audio for a single, repeated phrase. Ideal for positive wake word samples.
  - phrase (str, optional): The exact phrase to use. If not provided, it falls back to the global target_phrase.
- type: "from_list"
  Generates audio from a user-provided list of phrases. Perfect for curated lists of negative samples.
  - phrases (list[str]): A list of custom text phrases.
  - repeat_each (int, optional): How many times to repeat each phrase in the list. Defaults to 1.
- type: "auto_adversarial"
  Generates phonetically similar but common English words/phrases. Excellent for creating a robust set of negative samples that challenge the model with real-world, confusable words.
  - base_phrase (str, optional): The phrase to generate variations from. Falls back to the global target_phrase.
  - Supports other keys like include_partial_phrase, max_multi_word_len, etc.
- type: "phoneme_adversarial"
  Generates nonsensical but phonetically very similar text by manipulating the phonemes of a base phrase. This creates extremely challenging negative samples to drastically reduce false activations.
  - base_phrase (str, optional): The phrase to generate variations from. Falls back to the global target_phrase.
  - min_distance (float, optional): Controls how different the generated phoneme strings are from the original. Defaults to 0.35.
Example Usage (in a .yaml config file):
target_phrase: "hey nano"
data_generation_tasks:
  - name: "Positive Wake Words"
    enabled: true
    output_dir: "dataset/positive"
    num_samples: 1000
    text_source:
      type: "fixed_phrase"
      # Uses the global "hey nano" target_phrase
  - name: "Phoneme-Based Hard Negatives"
    enabled: true
    output_dir: "dataset/negative"
    num_samples: 1500
    file_prefix: "neg_phoneme"
    text_source:
      type: "phoneme_adversarial"
      min_distance: 0.4
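The task loop and the two simplest text_source types can be sketched as follows. This is not the package's implementation: the helper names `expand_text_source` and `plan_tasks` are hypothetical, the adversarial types are left out, and how num_samples interacts with from_list is not stated in this guide, so the sketch simply ignores it for that type.

```python
def expand_text_source(text_source, num_samples, target_phrase=None):
    """Return the phrases to synthesize for one task (two source types only)."""
    kind = text_source["type"]
    if kind == "fixed_phrase":
        # Fall back to the global target_phrase when no phrase is given.
        phrase = text_source.get("phrase") or target_phrase
        if phrase is None:
            raise ValueError("fixed_phrase needs 'phrase' or a global target_phrase")
        return [phrase] * num_samples
    if kind == "from_list":
        repeat = text_source.get("repeat_each", 1)
        return [p for p in text_source["phrases"] for _ in range(repeat)]
    raise NotImplementedError(f"text_source type {kind!r} is not covered by this sketch")


def plan_tasks(config):
    """Map each enabled task name to its list of phrases."""
    plan = {}
    for task in config.get("data_generation_tasks", []):
        if not task.get("enabled", True):  # enabled defaults to True
            continue
        plan[task["name"]] = expand_text_source(
            task["text_source"], task["num_samples"], config.get("target_phrase")
        )
    return plan


example_config = {
    "target_phrase": "hey nano",
    "data_generation_tasks": [
        {"name": "pos", "num_samples": 3, "text_source": {"type": "fixed_phrase"}},
        {"name": "skipped", "enabled": False, "num_samples": 5,
         "text_source": {"type": "fixed_phrase"}},
        {"name": "neg", "num_samples": 4,
         "text_source": {"type": "from_list", "phrases": ["hey now", "hello"],
                         "repeat_each": 2}},
    ],
}
print(plan_tasks(example_config))
```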
Augmentation Settings
Audio augmentation parameters for training robustness.
Structure
augmentation_settings:
  gain_prob: 1.0 # Probability of gain adjustment
  min_gain_in_db: -2.0 # Minimum gain in dB
  max_gain_in_db: 2.0 # Maximum gain in dB
  pitch_prob: 0.3 # Probability of pitch shift
  max_pitch_semitones: 1.0 # Maximum pitch shift
  min_pitch_semitones: -1.0 # Minimum pitch shift
  max_snr_in_db: 35.0 # Maximum signal-to-noise ratio
  min_snr_in_db: 15.0 # Minimum signal-to-noise ratio
  rir_prob: 0.0 # Probability of applying RIR
Parameter Descriptions
min_snr_in_db / max_snr_in_db
Type: float
Range: Typically -10 to +40 dB
Description: Signal-to-Noise ratio range when mixing audio with background noise.
- Lower SNR = harder augmentation (more noise, harder training)
- Higher SNR = easier augmentation (less noise, cleaner audio)
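To show what an SNR value means when mixing, the sketch below scales a noise clip so that the RMS ratio between signal and noise matches a target SNR in dB. It is a generic pure-Python illustration on short lists of samples; the function names are hypothetical and the package's own mixing code may work differently.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def noise_gain_for_snr(signal, noise, snr_db):
    """Scale factor for noise so that 20*log10(rms(signal)/rms(noise)) == snr_db."""
    noise_rms = rms(noise)
    if noise_rms == 0:
        raise ValueError("noise clip is silent; SNR is undefined")
    return rms(signal) / (noise_rms * 10 ** (snr_db / 20))


def mix_at_snr(signal, noise, snr_db):
    """Add scaled noise to the signal, sample by sample."""
    gain = noise_gain_for_snr(signal, noise, snr_db)
    return [s + gain * n for s, n in zip(signal, noise)]


signal = [1.0, -1.0] * 4
noise = [0.5, -0.5] * 4
for snr_db in (35.0, 15.0, -5.0):
    print(snr_db, noise_gain_for_snr(signal, noise, snr_db))
```

Lower SNR values produce a larger noise gain, which is why they make the augmented audio harder to classify.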
rir_prob
Type: float (0.0-1.0)
Default: 0.2
Description: Probability of applying room impulse response convolution.
Effect: Simulates acoustic room effects for robustness.
pitch_prob / min_pitch_semitones / max_pitch_semitones
Type: float
Pitch Range: Typically ±2 to ±5 semitones
Description: Pitch shifting for voice variation without changing content.
gain_prob / min_gain_in_db / max_gain_in_db
Type: float
Gain Range: Typically -6 to +6 dB
Description: Volume adjustment for robustness to different microphone levels.
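The dB and semitone units used by the gain and pitch parameters convert to plain multipliers with two standard formulas. The helper names below are illustrative only.

```python
def db_to_amplitude(gain_db):
    """Amplitude multiplier for a gain in decibels (20 dB per factor of 10)."""
    return 10 ** (gain_db / 20)


def semitones_to_ratio(semitones):
    """Frequency ratio for a pitch shift in semitones (12 per octave)."""
    return 2 ** (semitones / 12)


for gain_db in (-6.0, -2.0, 2.0, 6.0):
    print(f"{gain_db:+.1f} dB -> x{db_to_amplitude(gain_db):.3f} amplitude")
for semitones in (-4.0, -1.0, 1.0, 4.0):
    print(f"{semitones:+.1f} semitones -> x{semitones_to_ratio(semitones):.3f} frequency")
```

A gain of +6 dB roughly doubles the amplitude, and a shift of 12 semitones doubles the frequency, so the default ranges of ±2 dB and ±1 semitone are fairly gentle perturbations.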
ColoredNoise
Type: float (0.0-1.0)
Default: 0.30
Description: Probability of adding colored noise (pink/brown noise).
Example: Aggressive Augmentation
augmentation_settings:
  min_snr_in_db: -5.0 # Very noisy (challenging)
  max_snr_in_db: 20.0
  rir_prob: 0.5 # Frequent RIR
  pitch_prob: 0.6 # Frequent pitch shift
  min_pitch_semitones: -4.0 # Wider pitch range
  max_pitch_semitones: 4.0
  gain_prob: 1.0
  min_gain_in_db: -12.0 # Wider gain range
  max_gain_in_db: 12.0
Feature Generation Manifest
Defines how to generate and process feature files from raw audio.
Structure
feature_generation_manifest:
  feature_key_name1:
    input_audio_dirs: ["path/to/audio"] # Source audio directories
    output_filename: "output_features.npy" # Output file name
    use_background_noise: true # Mix with background noise
    use_rir: true # Apply RIR augmentation
    augmentation_rounds: 10 # Number of augmentation iterations
    augmentation_settings: # Optional: override global settings
      min_snr_in_db: 5.0
      pitch_prob: 0.5
Parameters
input_audio_dirs
Type: list of strings
Description: Directories containing raw audio files to process.
output_filename
Type: string
Description: Name of the output .npy feature file (without .npy extension).
use_background_noise
Type: boolean
Default: true
Description: Mix samples with background noise from background_paths.
use_rir
Type: boolean
Default: true
Description: Apply room impulse response convolution.
augmentation_rounds
Type: integer
Default: 10
Valid Range: 1 to 50
Description: How many times to augment each audio sample.
- Higher rounds = more training data, slower generation
- Examples: 1-3 rounds for large datasets, 10-20 for small datasets
augmentation_settings
Type: dict (optional)
Description: Feature-specific augmentation overrides (if not using global settings).
Example: Multiple Feature Generations
feature_generation_manifest:
  positive_features:
    input_audio_dirs: ["./data/positive"]
    output_filename: "positive_features.npy"
    use_background_noise: true
    use_rir: true
    augmentation_rounds: 15
  hard_negative_features:
    input_audio_dirs: ["./data/negative"]
    output_filename: "hard_negative_features.npy"
    use_background_noise: true
    use_rir: true
    augmentation_rounds: 20
  pure_noise_features:
    input_audio_dirs: ["./data/background_noise"]
    output_filename: "noise_features.npy"
    use_background_noise: false
    use_rir: false
    augmentation_rounds: 5
    augmentation_settings: false # No augmentation will be applied.
  others_features:
    # your parameters...
Advanced Settings
Fine-tuning parameters for specialized scenarios.
augmentation_batch_size
Type: integer
Default: Auto-calculated (16-128 based on system resources)
Description: Batch size for audio augmentation (separate from training batch size).
Note: Intelligently calculated based on available RAM and CPU cores.
feature_gen_cpu_ratio
Type: float
Default: 1.0
Valid Range: 0.0 to 1.0
Description: CPU utilization ratio for feature generation (0.0=GPU only, 1.0=CPU ratio).
Checkpointing & Early Stopping
checkpoint_averaging_top_k
Type: integer
Default: 5
Description: Number of best checkpoints to average for final model.
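Checkpoint averaging in general means taking the element-wise mean of the weights from the k best checkpoints. The sketch below shows the idea on plain dictionaries of floats; the function name `average_top_k`, the "higher score is better" ranking, and the data layout are assumptions rather than the trainer's actual behaviour.

```python
def average_top_k(checkpoints, k):
    """Average the weights of the k best checkpoints.

    checkpoints: list of (score, state) pairs, where state maps a parameter
    name to a float and a higher score is assumed to be better.
    """
    best = sorted(checkpoints, key=lambda item: item[0], reverse=True)[:k]
    names = best[0][1].keys()
    return {name: sum(state[name] for _, state in best) / len(best) for name in names}


history = [
    (0.91, {"w": 1.0, "b": 0.0}),
    (0.88, {"w": 3.0, "b": 1.0}),
    (0.40, {"w": 100.0, "b": 50.0}),  # poor checkpoint, excluded when k=2
]
print(average_top_k(history, k=2))
```

With real models the same mean would be taken over tensors in each state dict instead of single floats.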
checkpointing.enabled
Type: boolean
Default: true
Description: Enable periodic model checkpointing during training.
checkpointing.interval_steps
Type: integer
Default: 1000
Description: Save checkpoint every N training steps.
checkpointing.limit
Type: integer
Default: 2
Description: Maximum checkpoint files to keep (oldest are deleted).
early_stopping_patience
Type: integer
Default: 0
Valid Range: 0 to 100
Description: Stop training if no improvement for N validation checks.
0 = disabled
main_delta
Type: float
Default: 0.0001
Description: Minimum improvement threshold for early stopping.
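How a patience counter and a minimum-improvement threshold typically work together is sketched below. The class name `EarlyStopper` is hypothetical, and the sketch assumes the monitored metric is a validation loss where lower is better; this guide does not state which metric the trainer monitors.

```python
class EarlyStopper:
    """Stop after `patience` consecutive checks without sufficient improvement."""

    def __init__(self, patience, min_delta=0.0001):
        self.patience = patience
        self.min_delta = min_delta
        self.best = None
        self.bad_checks = 0

    def should_stop(self, val_loss):
        if self.patience == 0:
            return False  # 0 = disabled, as documented above
        if self.best is None or self.best - val_loss > self.min_delta:
            # Improvement larger than the threshold: reset the counter.
            self.best = val_loss
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


stopper = EarlyStopper(patience=2)
decisions = [stopper.should_stop(loss) for loss in (1.0, 0.9, 0.89995, 0.9)]
print(decisions)
```

In this run the third value improves by less than the threshold and the fourth does not improve at all, so the stopper fires on the fourth check.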
Loss & Training Dynamics
stabilization_steps
Type: integer
Default: 1500
Description: Number of gradual warmup steps at training start.
Effect: Prevents instability in initial iterations.
ema_alpha
Type: float
Default: 0.01
Valid Range: 0.0 to 1.0
Description: Exponential moving average smoothing factor for loss tracking.
- Higher values: Faster response to recent changes
- Lower values: Smoother, more stable trend
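The effect of ema_alpha can be seen with the standard exponential moving average update, ema = alpha * value + (1 - alpha) * ema. The sketch below is generic; initialising the average with the first value is an assumption, and `ema_series` is a hypothetical helper name.

```python
def ema_series(values, alpha=0.01):
    """Exponentially smoothed copy of values, seeded with the first value."""
    smoothed = []
    ema = None
    for value in values:
        ema = value if ema is None else alpha * value + (1 - alpha) * ema
        smoothed.append(ema)
    return smoothed


losses = [1.0, 0.2, 0.2, 0.2]
print(ema_series(losses, alpha=0.5))   # reacts quickly to the drop
print(ema_series(losses, alpha=0.01))  # moves slowly, giving a smoother trend
```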
Validation Settings
validation_batch_size
Type: integer
Default: 256
Description: Batch size for validation pass.
Export Settings
onnx_opset_version
Type: integer
Default: 17
Valid Range: 11 to 20
Description: ONNX opset version for model export compatibility.
Note: Lower versions = broader compatibility, higher versions = latest features.
Custom Export Model
NanoWakeWord supports user-provided export hooks so you can run any custom export code (for example, CoreML, TFLite, or a private converter) automatically after training and after distillation.
How it works:
- Place a Python script anywhere on disk that exposes a callable (default name export_model) which accepts the following arguments (either by keyword or positional):
  - model - the in-memory PyTorch model (or a student model during distillation)
  - input_shape - the detected input shape tuple
  - config - the final merged training configuration (a ConfigProxy-backed dict)
  - model_name - the name chosen for the model (string)
  - output_dir - directory where built-in exporters have written artifacts
- Alternatively, specify a shell command which will be executed; the command supports Python-style str.format() placeholders: {model_path}, {model_name}, {output_dir}.
Configuration (example YAML):
export_model:
  # Option A: Python script
  script: /absolute/path/to/my_coreml_export.py
  function: export_model # optional, defaults to export_model
  # Option B: shell command (alternative)
  # command: "python /scripts/convert_to_coreml.py --onnx {model_path} --out {output_dir}"
Example Python export script (my_coreml_export.py):
def export_model(model, input_shape, config, model_name, output_dir):
    """Example: export a model to CoreML.

    Notes:
    - This example assumes you have `coremltools` installed and available.
    - Many users prefer to export the ONNX produced by the built-in exporter
      and run a converter on that file instead of converting a live PyTorch model.
    """
    import os

    # Option 1: convert the in-memory PyTorch model directly
    try:
        import coremltools as ct
        import torch

        # Example: convert a traced TorchScript model
        # WARNING: conversion requirements depend on your model; this is illustrative.
        model.eval()
        # Create a dummy input matching the expected shape; adapt dtype/device as needed
        example_input = torch.randn(1, *input_shape)
        traced = torch.jit.trace(model, example_input)
        # coremltools needs the input shape when converting a traced PyTorch model
        mlmodel = ct.convert(traced, inputs=[ct.TensorType(shape=example_input.shape)])
        # Depending on your coremltools version and target format, the expected
        # extension may be ".mlpackage" instead of ".mlmodel".
        out_path = os.path.join(output_dir, model_name + ".mlmodel")
        mlmodel.save(out_path)
        print(f"Saved CoreML model to {out_path}")
        return
    except Exception as e:
        # Fallback: convert the already-produced ONNX file with an external tool
        print(f"In-memory CoreML conversion failed: {e}. Trying ONNX fallback.")

    # Option 2: operate on ONNX produced by built-in exporter
    onnx_path = os.path.join(output_dir, model_name + ".onnx")
    if os.path.exists(onnx_path):
        # call your converter here, e.g. coremltools.converters.onnx.convert(...) or a CLI
        print(f"Found ONNX at {onnx_path}. Run your converter here.")
    else:
        raise FileNotFoundError(f"Could not find ONNX at {onnx_path}")
Command example using command (shell):
custom_export:
  command: "python /scripts/onnx_to_coreml.py --onnx {model_path} --out {output_dir}"
This feature is intentionally flexible: your script can use the in-memory torch model, the ONNX file written by the trainer, or call any external tooling your workflow requires.
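The placeholder substitution for the shell command option follows ordinary `str.format()` rules, which the short sketch below demonstrates. The function name `build_export_command` and the example paths are hypothetical; only the three placeholder names come from this guide.

```python
import shlex


def build_export_command(template, model_path, model_name, output_dir):
    """Fill the documented placeholders; unused ones are simply ignored."""
    return template.format(
        model_path=model_path, model_name=model_name, output_dir=output_dir
    )


template = "python /scripts/onnx_to_coreml.py --onnx {model_path} --out {output_dir}"
command = build_export_command(
    template,
    model_path="out/my_model_v1.onnx",  # hypothetical path
    model_name="my_model_v1",
    output_dir="out",
)
print(command)
print(shlex.split(command))  # how a shell would split the arguments
```

Since `str.format()` treats braces as placeholders, a literal brace in the command would have to be written as `{{` or `}}`.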
Pipeline Control
Master switches to enable/disable major processing stages.
generate_clips
Type: boolean
Default: false
Description: Enable/disable the clip generation stage (TTS synthesis).
Example:
generate_clips: true
transform_clips
Type: boolean
Default: false
Description: Enable/disable feature extraction and augmentation stage.
⚠️ Important: Set to false when not actively generating features to avoid infinite loops.
train_model
Type: boolean
Default: false
Description: Enable/disable the training stage.
overwrite
Type: boolean
Default: false
Description: Force regeneration of feature files, overwriting existing files.
⚠️ Warning: Use with caution as it will delete existing computed features.
force_verify
Type: boolean
Default: false
Description: Force re-verification of all data directories, ignoring cache.
show_training_summary
Type: boolean
Default: true
Description: Display effective training configuration in tabular format.
debug_mode
Type: boolean
Default: false
Description: Enable debug logging and visualization outputs.
enable_journaling
Type: boolean
Default: true
Description: Log training metrics and model information to journal.
Command-Line Arguments
Running training with configuration overrides:
# Basic training
nanowakeword -c your_config_path.yaml
# Generate + Transform + Train
nanowakeword -c config.yaml -G -t -T
# Force regeneration of features
nanowakeword -c config.yaml --overwrite
# Resume from previous training
nanowakeword -c config.yaml --resume ./trained_models/my_model_v1
# Only transform (no generation, no training)
nanowakeword -c config.yaml -t
# Distill
nanowakeword -c copy_X_config.yaml --distill
Arguments Explanation
- -c, --config_path - Path to YAML config file (required)
- -G, --generate_clips - Enable synthetic data generation stage
- -t, --transform_clips - Enable feature generation and augmentation
- -T, --train_model - Enable model training
- -f, --force-verify - Ignore cache and re-verify all data
- --overwrite - Regenerate all feature files (destructive)
- --resume - Resume training from specific model directory
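For readers who want to see how these flags combine, the sketch below mirrors them with `argparse`. It is an illustration of the documented interface, not the actual nanowakeword entry point; the help strings paraphrase the list above, and `--distill` is included only because it appears in the usage examples.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented command-line flags."""
    parser = argparse.ArgumentParser(prog="nanowakeword")
    parser.add_argument("-c", "--config_path", required=True, help="Path to YAML config file")
    parser.add_argument("-G", "--generate_clips", action="store_true", help="Run TTS generation")
    parser.add_argument("-t", "--transform_clips", action="store_true", help="Generate features")
    parser.add_argument("-T", "--train_model", action="store_true", help="Train the model")
    parser.add_argument("-f", "--force-verify", action="store_true", help="Ignore cache")
    parser.add_argument("--overwrite", action="store_true", help="Regenerate feature files")
    parser.add_argument("--resume", metavar="MODEL_DIR", help="Resume from a model directory")
    parser.add_argument("--distill", action="store_true", help="Run distillation")
    return parser


args = build_parser().parse_args(["-c", "config.yaml", "-G", "-t", "-T"])
print(args.config_path, args.generate_clips, args.transform_clips, args.train_model)
```

Each stage flag is an independent switch, so `-t` alone runs only the transform stage while `-G -t -T` runs the full pipeline.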