NanowakeWord Configuration Guide

Complete documentation of all configurable parameters in the Nanowakeword package, including descriptions, default values, meanings, and usage examples.

Table of Contents

  1. Project & Data Paths
  2. Model Architecture
  3. Training & Optimization
  4. Feature Manifest
  5. Batch Composition
  6. Data Generation
  7. Augmentation Settings
  8. Feature Generation Manifest
  9. Advanced Settings
  10. Pipeline Control
  11. Intelligent Auto-Configuration
  12. Inference Parameters

Project & Data Paths

Configuration parameters for project organization and data source locations.

model_name

  • Type: string
  • Default: Auto-generated based on model type (e.g., XXX_dnn_v1)
  • Description: Name of the trained model. Used for creating directories and organizing outputs.
  • Example:
    model_name: "my_wakeword_A_v1"
    

output_dir

  • Type: string
  • Default: "./trained_models"
  • Description: Base directory where all trained models and artifacts will be stored.
  • Example:
    output_dir: "./trained_models"
    # Creates: ./trained_models/<model_name>/model/, ./trained_models/<model_name>/features/
    

positive_data_path

  • Type: string (file path)
  • Mandatory: Yes
  • Default: None
  • Description: Directory containing positive audio samples (actual wake word utterances).
  • Requirements:
    • Must contain .wav files at 16 kHz sample rate
    • Mono or stereo audio (will be converted to mono)
    • Can be empty if using only generated synthetic samples
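The sample-rate requirement above can be checked before training. The following is a minimal sketch using only the Python standard library; the function name find_invalid_wavs is hypothetical and is not part of Nanowakeword.

```python
import wave
from pathlib import Path

EXPECTED_RATE = 16000  # 16 kHz, as required above


def find_invalid_wavs(directory):
    """Return (path, reason) pairs for .wav files that are unreadable or not 16 kHz."""
    problems = []
    for path in sorted(Path(directory).glob("*.wav")):
        try:
            # The stdlib wave module reads PCM WAV files only
            with wave.open(str(path), "rb") as wav:
                rate = wav.getframerate()
        except (wave.Error, EOFError) as exc:
            problems.append((path, f"unreadable: {exc}"))
            continue
        if rate != EXPECTED_RATE:
            problems.append((path, f"sample rate is {rate} Hz"))
    return problems
```

An empty result means every .wav file in the directory opened cleanly and reported a 16 kHz sample rate.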

negative_data_path

  • Type: string (file path)
  • Mandatory: Yes
  • Default: None
  • Description: Directory containing negative audio samples (non-wake-word utterances).
  • Example:
    negative_data_path: "./data/common_words"
    

background_paths

  • Type: list of strings
  • Default: None (optional)
  • Description: Directories containing background noise audio files for augmentation. Multiple paths supported.
  • Example:
    background_paths:  # One or more paths
      - "./data/office_noise"
      - "./data/street_noise"
      - "./data/home_noise"
    

rir_paths

  • Type: list of strings
  • Default: None (optional)
  • Description: Directories containing Room Impulse Response (RIR) files for acoustic augmentation.
  • Note: At least one RIR path is required for intelligent configuration.

Model Architecture

Parameters controlling the neural network structure and behavior.

model_type

  • Type: string

  • Default: "dnn"

  • Valid Options: "dnn", "lstm", "gru", "rnn", "cnn", "transformer", "crnn", "tcn", "quartznet", "conformer", "e_branchformer", "custom"

  • Description: The neural network architecture to use for wake word detection.

  • Complexity Levels (from simplest to most complex):

    • dnn - Dense feedforward network (lightweight, fast)
    • cnn - Convolutional Neural Network (good for spectrograms)
    • lstm, gru, rnn - Recurrent networks (excellent for sequences)
    • crnn - Hybrid CNN-RNN (combines both strengths)
    • transformer, conformer, e_branchformer - Advanced attention-based (most powerful, most complex)
  • Examples by use case:

    # Embedded/Edge device (minimal resources)
    model_type: "dnn"
    
    # Edge device with more resources
    model_type: "lstm"
    
    # Desktop/cloud with ample resources
    model_type: "conformer"
    

layer_size (DNN/RNN-based architectures)

  • Type: integer
  • Default: 128
  • Valid Range: 64 to 512
  • Description: Number of neurons in each hidden layer for feedforward and recurrent layers.
  • Relationship to model capacity: Larger values = more parameters = longer training, better performance (up to a point)
  • Example:
    layer_size: 256  # Larger model, slower but potentially better
    

n_blocks

  • Type: integer

  • Default: 3

  • Valid Range: 1 to 10

  • Description: Number of stacked blocks/layers in the model.

    • For dnn: Number of fully connected layers
    • For lstm/gru: Number of recurrent layers
    • For transformer: Number of encoder layers
    • For crnn: Number of RNN layers (CNN part is fixed)
  • Example:

    n_blocks: 5  # Deeper network
    

dropout_prob

  • Type: float

  • Default: 0.5 (intelligently adjusted)

  • Valid Range: 0.0 to 0.8

  • Description: Dropout probability per layer to prevent overfitting.

    • Higher values = more regularization = potential underfitting
    • Lower values = less regularization = potential overfitting
    • Typically 0.2-0.5 for most models
  • Example:

    dropout_prob: 0.3
    

activation_function (Advanced)

  • Type: string

  • Default: "relu"

  • Valid Options: "relu", "gelu", "silu"

  • Description: Activation function used in hidden layers.

    • relu - Traditional, fast, widely supported
    • gelu - Smooth, often better convergence
    • silu - Modern alternative (Swish activation)
  • Example:

    activation_function: "gelu"
    

embedding_dim (Advanced)

  • Type: integer
  • Default: 64
  • Valid Range: 32 to 256
  • Description: Dimensionality of the final embedding before classification.

Architecture-Specific Parameters

Transformer Architecture

model_type: "transformer"
transformer_d_model: 128        # Model dimension, default: 128
transformer_n_head: 4           # Number of attention heads, default: 4

CRNN Architecture

model_type: "crnn"
crnn_cnn_channels: [16, 32, 32]  # CNN channel progression, default: [16, 32, 32]
crnn_rnn_type: "lstm"             # "lstm" or "gru", default: "lstm"

TCN Architecture

model_type: "tcn"
tcn_channels: [64, 64, 128]      # Channel progression, default: [64, 64, 128]
tcn_kernel_size: 3                # Convolution kernel size, default: 3

Conformer Architecture

model_type: "conformer"
conformer_d_model: 144            # Model dimension, default: 144
conformer_n_head: 4               # Attention heads, default: 4

E-Branchformer Architecture

model_type: "e_branchformer"
branchformer_d_model: 144         # Model dimension, default: 144
branchformer_n_head: 4            # Attention heads, default: 4

QuartzNet Architecture

model_type: "quartznet"
quartznet_config:                 # Channel, kernel, repeat config
  - [256, 33, 1]
  - [256, 33, 1]
  - [512, 39, 1]

Custom Architecture

  • Type: string

  • Value: "custom"

  • Description: Load a user-defined torch.nn.Module class from a Python file or installed module.

  • Required Settings:

    • custom_model_config.module_path
    • custom_model_config.class_name
  • Optional Settings:

    • custom_model_config.params
  • Custom model requirements:

    • The class must inherit from torch.nn.Module
    • It should return an embedding tensor shaped [batch_size, embedding_dim]
    • It may accept the following standard constructor arguments:
      • input_shape
      • embedding_dim
      • dropout_prob
      • activation_fn
      • config
    • Additional custom parameters may be provided via params
  • Example:

First, create a Python file that defines the model class:

import torch
from torch import nn


class MyCustomModel(nn.Module):

    def __init__(
        self,
        input_shape,
        embedding_dim=64,
        dropout_prob=0.5,
        activation_fn=None,
        config=None,
        hidden_channels=32,
    ):
        super().__init__()
        self.input_shape = input_shape
        self.embedding_dim = embedding_dim
        self.activation_fn = activation_fn if activation_fn is not None else nn.ReLU()
        # Build CNN feature extractor (no flatten/linear until we know conv output size)
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(1, hidden_channels, kernel_size=3, padding=1),
            self.activation_fn,
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(hidden_channels, hidden_channels * 2, kernel_size=3, padding=1),
            self.activation_fn,
            nn.AdaptiveAvgPool2d((1, 1)),
        )
        # Determine flattened feature size by running a dummy tensor through the convs
        with torch.no_grad():
            dummy = torch.zeros(1, 1, *input_shape)
            conv_out = self.feature_extractor(dummy)
            flattened_size = int(conv_out.numel() // conv_out.shape[0])
        self.embedding_head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(flattened_size, 128),
            self.activation_fn,
            nn.Dropout(dropout_prob),
            nn.Linear(128, embedding_dim),
        )

    def forward(self, x):
        # Expect input shaped [batch, time, features] or [batch, 1, time, features]
        if x.dim() == 3:
            x = x.unsqueeze(1)
        x = self.feature_extractor(x)
        x = self.embedding_head(x)
        return x

In your config:

model_type: "custom"
custom_model_config:
  module_path: "path/to/your/custom_model_architectures.py"
  class_name: "MyCustomModel"
  params:
    hidden_channels: 32

  • Important: module_path may be either a relative path to a Python file or an importable Python module name.
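To illustrate the two forms of module_path, here is a minimal sketch of how a loader could resolve the class. It is an assumption about one reasonable approach, not Nanowakeword's actual loader, and load_model_class is a hypothetical name.

```python
import importlib
import importlib.util
from pathlib import Path


def load_model_class(module_path, class_name):
    """Resolve a class from a .py file path or an importable module name."""
    if module_path.endswith(".py"):
        # Treat the value as a path to a Python source file
        path = Path(module_path).resolve()
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
    else:
        # Treat the value as a dotted module name, e.g. "my_package.models"
        module = importlib.import_module(module_path)
    return getattr(module, class_name)
```

With the config above, the call would be load_model_class("path/to/your/custom_model_architectures.py", "MyCustomModel"), and the entries under params would then be passed to the class constructor as keyword arguments.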

Training & Optimization

Parameters governing the training loop, optimization, and learning rate scheduling.

steps

  • Type: integer

  • Default: 20000 (intelligently adjusted based on data volume)

  • Valid Range: 1000 to 100000

  • Description: Total number of training iterations/steps.

  • Calculation Logic:

    • base_steps = effective_data_volume (in hours) × 1000 steps per hour
    • Adjusted based on data quality and model complexity
    • Typically 10,000-40,000 for most scenarios
  • Example:

    steps: 50000  # For very large/complex datasets
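The calculation logic can be sketched roughly as follows. This assumes the data volume is measured in hours and that the result is clamped to the documented valid range; the adjustment for data quality and model complexity is not modeled, and estimate_base_steps is a hypothetical name.

```python
def estimate_base_steps(data_hours, steps_per_hour=1000, min_steps=1000, max_steps=100000):
    """Scale steps with data volume, then clamp to the documented valid range."""
    base_steps = int(data_hours * steps_per_hour)
    return max(min_steps, min(base_steps, max_steps))


print(estimate_base_steps(20))   # 20000
print(estimate_base_steps(0.5))  # 1000 (clamped to the minimum)
```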
    

batch_size

  • Type: integer

  • Default: 128

  • Valid Range:

    • Minimum: 1 (at least 1 sample per batch required)
    • Maximum: Limited by GPU/CPU memory
    • CPU training → 16–128+ typical
    • single GPU → 32–256+ typical
    • multi-GPU → 512+ possible
  • Description: Number of training samples per batch.

    • Larger batches = faster training, more stable gradients, more memory
    • Smaller batches = slower training, noisier gradients, less memory
  • Example:

    batch_size: 128
    

optimizer_type

  • Type: string

  • Default: "adamw"

  • Valid Options: "adamw", "adam", "sgd"

  • Description: Optimization algorithm.

    • adamw - Adam with decoupled weight decay (recommended)
    • adam - Original adaptive optimizer
    • sgd - Stochastic Gradient Descent (simple, slower convergence)
  • Example:

    optimizer_type: "adamw"
    

learning_rate_max

  • Type: float

  • Default: Auto-calculated

  • Description: Maximum learning rate during training (used with cycle schedulers).

  • Intelligently Adjusted Based On:

    • Dataset size (larger datasets → higher LR)
    • Data noise levels (cleaner data → higher LR)
    • Model complexity
  • Example:

    learning_rate_max: 0.001
    

learning_rate_base

  • Type: float
  • Default: learning_rate_max / 10
  • Description: Minimum/base learning rate during cyclical scheduling.
  • Note: Automatically calculated if not specified.

lr_scheduler_type

  • Type: string

  • Default: "onecycle"

  • Valid Options: "onecycle", "cyclic", "cosine"

  • Description: Learning rate schedule strategy.

    • onecycle - One cycle from base to max LR and back (good for fast convergence)
    • cyclic - Multiple triangular cycles (good for exploration)
    • cosine - Cosine annealing (smooth, gradual decrease)
  • Example:

    lr_scheduler_type: "onecycle"
    

clr_step_size_up (Cyclic LR)

  • Type: integer
  • Default: Auto-calculated based on total steps
  • Description: Number of steps to increase LR in each cycle.

clr_step_size_down (Cyclic LR)

  • Type: integer
  • Default: Auto-calculated based on total steps
  • Description: Number of steps to decrease LR in each cycle.
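To show how learning_rate_base, learning_rate_max, clr_step_size_up, and clr_step_size_down interact, here is a minimal sketch of the usual triangular cyclic policy. It is an illustration of the general technique, not the package's scheduler, and triangular_lr is a hypothetical name.

```python
def triangular_lr(step, base_lr, max_lr, step_size_up, step_size_down):
    """Learning rate at a given step for a repeating triangular cycle."""
    cycle_len = step_size_up + step_size_down
    pos = step % cycle_len
    if pos < step_size_up:
        frac = pos / step_size_up                            # rising phase
    else:
        frac = 1.0 - (pos - step_size_up) / step_size_down   # falling phase
    return base_lr + (max_lr - base_lr) * frac


for step in (0, 2000, 3000, 4000):
    print(step, triangular_lr(step, 0.0001, 0.001, 2000, 2000))
```

With these values the rate starts at the base, peaks at the maximum at step 2000, and is back at the base at step 4000, where the next cycle begins.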

weight_decay

  • Type: float
  • Default: 0.01
  • Description: Weight decay coefficient (L2-style regularization) to prevent overfitting.

momentum (SGD optimizer)

  • Type: float
  • Default: 0.9
  • Valid Range: 0.0 to 1.0
  • Description: Momentum factor for SGD optimizer.

num_workers

  • Type: integer
  • Default: 2
  • Valid Range: 0 to CPU_count
  • Description: Number of parallel workers for data loading.
    • 0 = load data in the main process (slower, no multiprocessing)
    • 2-4 = typical for most systems
    • Increase for large datasets and fast GPUs

Feature Manifest

Defines paths to pre-computed audio feature files (.npy format) used for training.

Structure

feature_manifest:  # Multiple sources can be added
  targets:           # Positive samples (wake word)
    key1: "path/to/features.npy"
    # others.. 
  negatives:         # Negative samples (non-wake-words)
    key1: "path/to/negatives.npy"
    key2: "path/to/noise.npy" # Background noise samples
    # others..
  # Optional: Validation data (if _val key suffix used)
  targets_val:
    key1: "path/to/val_positive.npy"
  negatives_val:
    key1: "path/to/val_negatives.npy"
    key2: "path/to/val_noise.npy"

Key Naming Convention (keys are referenced by batch_composition)

  • Keys within each category can be arbitrary unique identifiers
  • Short keys preferred for readability (e.g., t, n, b)
  • Multiple feature sources can be specified with different keys (e.g., real_pos, bg2, hard_neg)

Example with Multiple Sources

feature_manifest:
  targets:
    t: "./trained_models/model_v1/features/positive.npy"
    my_voice: "./voice/muhammad_abid/muhammad_abid_data.npy"
    
  negatives:
    common_words: "./features/common_words.npy"
    hard_negatives: "./features/similar_words.npy"
    external_dataset: "./external/negatives_1m.npy"
    office: "./features/office_noise.npy"
    home: "./features/home_noise.npy"
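Because every entry in the manifest is a file path, a quick pre-flight check can catch typos before training starts. A minimal sketch, assuming the manifest has already been parsed into a nested dict; missing_feature_files is a hypothetical helper, not a Nanowakeword function.

```python
import os


def missing_feature_files(feature_manifest):
    """Return "group.key" labels whose .npy path does not exist on disk."""
    missing = []
    for group, sources in feature_manifest.items():
        for key, path in (sources or {}).items():
            if not os.path.isfile(path):
                missing.append(f"{group}.{key}")
    return missing
```

For the example above, a missing office noise file would be reported as "negatives.office".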

Batch Composition

batch_composition defines how many feature samples are taken per training batch from the datasets specified in feature_manifest.

Each entry in batch_composition corresponds to a dataset or dataset group defined in feature_manifest.

batch_composition:
  target: 10
  n: 68
  hn: 10
  b: 40
  # others..

This means that each training batch will contain:

  • 10 samples from the targets datasets (all datasets inside the targets)
  • 68 samples from the negatives.n dataset
  • 10 samples from the negatives.hn dataset
  • 40 samples from the negatives.b dataset
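The per-source counts add up to the number of samples in each batch. The short sketch below does that arithmetic for the example above; whether the total must equal batch_size is not stated in this guide, so treat that link as an assumption.

```python
batch_composition = {"target": 10, "n": 68, "hn": 10, "b": 40}

samples_per_batch = sum(batch_composition.values())
positive_share = batch_composition["target"] / samples_per_batch

print(samples_per_batch)        # 128
print(f"{positive_share:.1%}")  # 7.8%
```

Here the total happens to equal the default batch_size of 128, with just under 8% of each batch being wake word samples.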

Relationship with feature_manifest

batch_composition always uses the datasets defined in feature_manifest.

For example:

feature_manifest:
  targets:
    t: positive_features.npy

  negatives:
    n: negative_features.npy
    hn: hard_negative_features.npy
    b: noise_features.npy

The keys used in batch_composition must match the dataset keys or dataset groups defined in feature_manifest.


How Samples Are Selected

When a group name is used:

batch_composition:
  target: 10

the samples are randomly selected from all datasets inside the targets group.

For example:

targets:
  t1: dataset1.npy
  t2: dataset2.npy
  t3: dataset3.npy

Then:

target: 10

means:

  • A total of 10 samples will be taken from the targets group
  • Samples are selected randomly across all target datasets
  • Not exactly 10 from each dataset

Example distribution:

  • t1 → 3 samples
  • t2 → 4 samples
  • t3 → 3 samples
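The group-level selection can be pictured with the sketch below. It assumes rows from all datasets in the group are pooled and drawn uniformly without replacement; the package's real sampler may weight datasets differently, and sample_from_group is a hypothetical name.

```python
import random
from collections import Counter


def sample_from_group(dataset_sizes, n_samples, seed=None):
    """Draw n_samples (dataset_key, row_index) pairs from the pooled rows of a group."""
    rng = random.Random(seed)
    pool = [(key, i) for key, size in dataset_sizes.items() for i in range(size)]
    return rng.sample(pool, n_samples)


picked = sample_from_group({"t1": 300, "t2": 400, "t3": 300}, n_samples=10, seed=0)
per_dataset = Counter(key for key, _ in picked)
print(sum(per_dataset.values()))  # 10 in total; the split across t1/t2/t3 varies
```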

Selecting From a Specific Dataset

To select samples from a specific dataset, use its dataset key:

batch_composition:
  t: 10

This means:

  • 10 samples will be taken only from targets.t

because:

targets:
  t: positive_features.npy

Summary

  • feature_manifest defines where the datasets are located
  • batch_composition defines how many samples are taken from those datasets per batch
  • Keys in batch_composition must match keys or groups in feature_manifest

Data Generation

Parameters for synthetic audio generation using Text-to-Speech (TTS).

The data generation stage is the central orchestrator for creating synthetic audio clips. It operates on a list of "generation tasks" defined in the main configuration file under the data_generation_tasks key. This task-based approach gives the user fine-grained control over the entire data generation process, allowing multiple, diverse datasets (e.g., positive, negative, validation) to be created in a single run.

Each task is an independent job that specifies what text to synthesize, how many samples to create, where to save them, and what Text-To-Speech (TTS) settings to use. This modularity empowers users to build complex and robust datasets tailored to their specific needs.

The primary workflow is as follows:

  1. Load the list of tasks from the configuration.
  2. Pre-load any globally required models (such as the phonemizer) for efficiency.
  3. Iterate through each enabled task.
  4. For each task, determine the text source and generate the list of phrases to be synthesized.
  5. Call the generate_samples utility to create the audio files.
  6. Clear the GPU cache after heavy tasks to maintain performance.

Configuration Schema (data_generation_tasks): The data_generation_tasks key in your config file should be a list of dictionaries, where each dictionary represents a single task.

Task Keys:

  • name (str): A descriptive name for the task (e.g., "Positive Wake Words").
  • enabled (bool): If False, this task will be skipped. Defaults to True.
  • output_dir (str): The path to the directory where audio clips will be saved.
  • num_samples (int): The total number of audio clips to generate for this task.
  • file_prefix (str): A prefix for the generated audio filenames (e.g., "pos_").
  • tts_settings (dict, optional): Task-specific TTS settings that override the global tts_settings.
  • text_source (dict): A dictionary defining the source of the text to be synthesized. This is the core of the task's logic.

The text_source Dictionary: This dictionary must contain a type key, which determines how the text is generated. Supported types are:

  1. type: "fixed_phrase" Generates audio for a single, repeated phrase. Ideal for positive wake word samples.

    • phrase (str, optional): The exact phrase to use. If not provided, it falls back to the global target_phrase.
  2. type: "from_list" Generates audio from a user-provided list of phrases. Perfect for curated lists of negative samples.

    • phrases (list[str]): A list of custom text phrases.
    • repeat_each (int, optional): How many times to repeat each phrase in the list. Defaults to 1.
  3. type: "auto_adversarial" Generates phonetically similar but common English words/phrases. Excellent for creating a robust set of negative samples that challenge the model with real-world, confusable words.

    • base_phrase (str, optional): The phrase to generate variations from. Falls back to the global target_phrase.
    • Supports other keys like include_partial_phrase, max_multi_word_len, etc.
  4. type: "phoneme_adversarial" Generates nonsensical but phonetically very similar text by manipulating the phonemes of a base phrase. This creates extremely challenging negative samples to drastically reduce false activations.

    • base_phrase (str, optional): The phrase to generate variations from. Falls back to the global target_phrase.
    • min_distance (float, optional): Controls how different the generated phoneme strings are from the original. Defaults to 0.35.

Example Usage (in a .yaml config file):

target_phrase: "hey nano"

data_generation_tasks:
  - name: "Positive Wake Words"
    enabled: true
    output_dir: "dataset/positive"
    num_samples: 1000
    text_source:
      type: "fixed_phrase"
      # Uses the global "hey nano" target_phrase

  - name: "Phoneme-Based Hard Negatives"
    enabled: true
    output_dir: "dataset/negative"
    num_samples: 1500
    file_prefix: "neg_phoneme"
    text_source:
      type: "phoneme_adversarial"
      min_distance: 0.4

Augmentation Settings

Audio augmentation parameters for training robustness.

Structure

augmentation_settings:
  gain_prob: 1.0               # Probability of gain adjustment
  min_gain_in_db: -2.0         # Minimum gain in dB
  max_gain_in_db: 2.0          # Maximum gain in dB
  pitch_prob: 0.3              # Probability of pitch shift
  max_pitch_semitones: 1.0     # Maximum pitch shift
  min_pitch_semitones: -1.0    # Minimum pitch shift
  max_snr_in_db: 35.0          # Maximum signal-to-noise ratio
  min_snr_in_db: 15.0          # Minimum signal-to-noise ratio
  rir_prob: 0.0                # Probability of applying RIR

Parameter Descriptions

min_snr_in_db / max_snr_in_db

  • Type: float
  • Range: Typically -10 to +40 dB
  • Description: Signal-to-Noise ratio range when mixing audio with background noise.
    • Lower SNR = harder augmentation (more noise, harder training)
    • Higher SNR = easier augmentation (less noise, cleaner audio)
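The effect of the SNR range follows from the definition SNR(dB) = 10 * log10(signal power / noise power). The sketch below computes the factor by which a noise clip would be scaled to reach a target SNR. It is a plain-Python illustration of the formula, not the augmentation code used by the package, and noise_scale_for_snr is a hypothetical name.

```python
import math


def noise_scale_for_snr(signal, noise, snr_db):
    """Factor to multiply `noise` by so the signal/noise power ratio equals snr_db."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = sum(x * x for x in noise) / len(noise)
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    return math.sqrt(target_noise_power / noise_power)


signal = [0.5, -0.5, 0.5, -0.5]
noise = [0.1, -0.2, 0.1, -0.2]
loud = noise_scale_for_snr(signal, noise, snr_db=-5.0)
quiet = noise_scale_for_snr(signal, noise, snr_db=20.0)
print(loud > quiet)  # True: a lower SNR needs louder noise
```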

rir_prob

  • Type: float (0.0-1.0)
  • Default: 0.2
  • Description: Probability of applying room impulse response convolution.
  • Effect: Simulates acoustic room effects for robustness.

pitch_prob / min_pitch_semitones / max_pitch_semitones

  • Type: float
  • Pitch Range: Typically ±2 to ±5 semitones
  • Description: Pitch shifting for voice variation without changing content.

gain_prob / min_gain_in_db / max_gain_in_db

  • Type: float
  • Gain Range: Typically -6 to +6 dB
  • Description: Volume adjustment for robustness to different microphone levels.

ColoredNoise

  • Type: float (0.0-1.0)
  • Default: 0.30
  • Description: Probability of adding colored noise (pink/brown noise).

Example: Aggressive Augmentation

augmentation_settings:
  min_snr_in_db: -5.0          # Very noisy (challenging)
  max_snr_in_db: 20.0
  rir_prob: 0.5                # Frequent RIR
  pitch_prob: 0.6              # Frequent pitch shift
  min_pitch_semitones: -4.0    # Wider pitch range
  max_pitch_semitones: 4.0
  gain_prob: 1.0
  min_gain_in_db: -12.0        # Wider gain range
  max_gain_in_db: 12.0

Feature Generation Manifest

Defines how to generate and process feature files from raw audio.

Structure

feature_generation_manifest:
  feature_key_name1:
    input_audio_dirs: ["path/to/audio"]  # Source audio directories
    output_filename: "output_features.npy" # Output file name
    use_background_noise: true            # Mix with background noise
    use_rir: true                         # Apply RIR augmentation
    augmentation_rounds: 10               # Number of augmentation iterations
    augmentation_settings:                # Optional: override global settings
      min_snr_in_db: 5.0
      pitch_prob: 0.5

Parameters

input_audio_dirs

  • Type: list of strings
  • Description: Directories containing raw audio files to process.

output_filename

  • Type: string
  • Description: Name of the output .npy feature file. The examples in this guide include the .npy extension (e.g., "positive_features.npy").

use_background_noise

  • Type: boolean
  • Default: true
  • Description: Mix samples with background noise from background_paths.

use_rir

  • Type: boolean
  • Default: true
  • Description: Apply room impulse response convolution.

augmentation_rounds

  • Type: integer
  • Default: 10
  • Valid Range: 1 to 50
  • Description: How many times to augment each audio sample.
    • Higher rounds = more training data, slower generation
    • Examples: 1-3 rounds for large datasets, 10-20 for small datasets

augmentation_settings

  • Type: dict (optional)
  • Description: Feature-specific augmentation overrides (if not using global settings).

Example: Multiple Feature Generations

feature_generation_manifest:
  positive_features:
    input_audio_dirs: ["./data/positive"]
    output_filename: "positive_features.npy"
    use_background_noise: true
    use_rir: true
    augmentation_rounds: 15
    
  hard_negative_features:
    input_audio_dirs: ["./data/negative"]
    output_filename: "hard_negative_features.npy"
    use_background_noise: true
    use_rir: true
    augmentation_rounds: 20
    
  pure_noise_features:
    input_audio_dirs: ["./data/background_noise"]
    output_filename: "noise_features.npy"
    use_background_noise: false
    use_rir: false
    augmentation_rounds: 5
    
    augmentation_settings: false  # No augmentation will be applied.

  others_features:
    # your parameters...

Advanced Settings

Fine-tuning parameters for specialized scenarios.

augmentation_batch_size

  • Type: integer
  • Default: Auto-calculated (16-128 based on system resources)
  • Description: Batch size for audio augmentation (separate from training batch size).
  • Note: Intelligently calculated based on available RAM and CPU cores.

feature_gen_cpu_ratio

  • Type: float
  • Default: 1.0
  • Valid Range: 0.0 to 1.0
  • Description: CPU utilization ratio for feature generation (0.0 = GPU only, 1.0 = CPU only).

Checkpointing & Early Stopping

checkpoint_averaging_top_k

  • Type: integer
  • Default: 5
  • Description: Number of best checkpoints to average for final model.

checkpointing.enabled

  • Type: boolean
  • Default: true
  • Description: Enable periodic model checkpointing during training.

checkpointing.interval_steps

  • Type: integer
  • Default: 1000
  • Description: Save checkpoint every N training steps.

checkpointing.limit

  • Type: integer
  • Default: 2
  • Description: Maximum checkpoint files to keep (oldest are deleted).

early_stopping_patience

  • Type: integer
  • Default: 0
  • Valid Range: 0 to 100
  • Description: Stop training if no improvement for N validation checks.
  • 0 = disabled

main_delta

  • Type: float
  • Default: 0.0001
  • Description: Minimum improvement threshold for early stopping.
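The interaction between early_stopping_patience and main_delta can be sketched as below. This assumes the monitored metric is a validation loss (lower is better); the actual metric and implementation in the package may differ, and EarlyStopper is a hypothetical class.

```python
class EarlyStopper:
    """Stop after `patience` consecutive validation checks without enough improvement."""

    def __init__(self, patience, delta=0.0001):
        self.patience = patience
        self.delta = delta            # plays the role of main_delta
        self.best = float("inf")
        self.bad_checks = 0

    def should_stop(self, val_loss):
        if self.patience == 0:        # 0 disables early stopping
            return False
        if val_loss < self.best - self.delta:
            self.best = val_loss      # meaningful improvement: reset the counter
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


stopper = EarlyStopper(patience=2)
print([stopper.should_stop(v) for v in (1.0, 0.9, 0.89995, 0.89999)])
# [False, False, False, True]
```

The last two losses are lower than 0.9 but by less than the threshold, so they count as checks without improvement.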

Loss & Training Dynamics

stabilization_steps

  • Type: integer
  • Default: 1500
  • Description: Number of gradual warmup steps at training start.
  • Effect: Prevents instability in initial iterations.

ema_alpha

  • Type: float
  • Default: 0.01
  • Valid Range: 0.0 to 1.0
  • Description: Exponential moving average smoothing factor for loss tracking.
  • Higher values: Faster response to recent changes
  • Lower values: Smoother, more stable trend
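The effect of ema_alpha can be seen with the standard exponential moving average update, in which alpha weights the newest value. This is a sketch of the general formula, assumed rather than taken from the package source.

```python
def ema_update(previous, value, alpha=0.01):
    """One exponential-moving-average step; alpha weights the newest value."""
    return alpha * value + (1.0 - alpha) * previous


losses = [1.0, 0.5, 0.5, 0.5]
smooth_slow, smooth_fast = losses[0], losses[0]
for loss in losses[1:]:
    smooth_slow = ema_update(smooth_slow, loss, alpha=0.01)
    smooth_fast = ema_update(smooth_fast, loss, alpha=0.5)

print(round(smooth_slow, 4))  # 0.9851
print(round(smooth_fast, 4))  # 0.5625
```

After the loss drops from 1.0 to 0.5, the alpha=0.5 average has nearly caught up, while the alpha=0.01 average has barely moved.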

Validation Settings

validation_batch_size

  • Type: integer
  • Default: 256
  • Description: Batch size for validation pass.

Export Settings

onnx_opset_version

  • Type: integer
  • Default: 17
  • Valid Range: 11 to 20
  • Description: ONNX opset version for model export compatibility.
  • Note: Lower versions = broader compatibility, higher versions = latest features.

Custom Export Model

Nanowakeword supports user-provided export hooks so you can run any custom export code (for example, CoreML, TFLite, or a private converter) automatically after training and after distillation.

How it works:

  • Place a Python script anywhere on disk that exposes a callable (default name export_model) which accepts the following arguments (either by keyword or positional):
    • model - the in-memory PyTorch model (or a student model during distillation)
    • input_shape - the detected input shape tuple
    • config - the final merged training configuration (a ConfigProxy-backed dict)
    • model_name - the name chosen for the model (string)
    • output_dir - directory where built-in exporters have written artifacts

Alternatively, specify a shell command which will be executed; the command supports Python-style str.format() placeholders: {model_path}, {model_name}, {output_dir}.
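Placeholder substitution with str.format() works as in the sketch below. The paths are illustrative values only; the real ones are supplied by the trainer at export time.

```python
import shlex

command_template = "python /scripts/convert_to_coreml.py --onnx {model_path} --out {output_dir}"

# Illustrative values (assumptions), not paths guaranteed by the package
values = {
    "model_path": "./trained_models/my_model_v1/model/my_model_v1.onnx",
    "model_name": "my_model_v1",
    "output_dir": "./trained_models/my_model_v1/model",
}

# Keys that the template does not reference (model_name here) are ignored
command = command_template.format(**values)
print(shlex.split(command))
```

Because the command string goes through str.format(), any literal brace in the command has to be doubled ({{ or }}).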

Configuration (example YAML):

export_model:
  # Option A: Python script
  script: /absolute/path/to/my_coreml_export.py
  function: export_model   # optional, defaults to export_model

  # Option B: shell command (alternative)
  # command: "python /scripts/convert_to_coreml.py --onnx {model_path} --out {output_dir}"

Example Python export script (my_coreml_export.py):

def export_model(model, input_shape, config, model_name, output_dir):
    """Example: export a model to CoreML.

    Notes:
    - This example assumes you have `coremltools` installed and available.
    - Many users prefer to export the ONNX produced by the built-in exporter
      and run a converter on that file instead of converting a live PyTorch model.
    """
    import os
    # Option 1: convert the in-memory PyTorch model directly
    try:
        import coremltools as ct
        import torch
        # Example: convert a traced TorchScript model
        # WARNING: conversion requirements depend on your model; this is illustrative.
        model.eval()
        # Create a dummy input matching the expected shape; adapt dtype/device as needed
        example_input = torch.randn(1, *input_shape)
        traced = torch.jit.trace(model, example_input)
        # coremltools needs an explicit input spec for TorchScript models; the
        # "neuralnetwork" backend is the one that can be saved as a .mlmodel file
        mlmodel = ct.convert(
            traced,
            inputs=[ct.TensorType(shape=example_input.shape)],
            convert_to="neuralnetwork",
        )
        out_path = os.path.join(output_dir, model_name + ".mlmodel")
        mlmodel.save(out_path)
        print(f"Saved CoreML model to {out_path}")
        return
    except Exception as e:
        # Fallback: convert the already-produced ONNX file with an external tool
        print(f"In-memory CoreML conversion failed: {e}. Trying ONNX fallback.")

    # Option 2: operate on ONNX produced by built-in exporter
    onnx_path = os.path.join(output_dir, model_name + ".onnx")
    if os.path.exists(onnx_path):
        # call your converter here (a Python API or a CLI); note that recent
        # coremltools releases no longer include an ONNX converter
        print(f"Found ONNX at {onnx_path}. Run your converter here.")
    else:
        raise FileNotFoundError(f"Could not find ONNX at {onnx_path}")

Shell command example:

custom_export:
  command: "python /scripts/onnx_to_coreml.py --onnx {model_path} --out {output_dir}"

This feature is intentionally flexible: your script can use the in-memory torch model, the ONNX file written by the trainer, or call any external tooling your workflow requires.


Pipeline Control

Master switches to enable/disable major processing stages.

generate_clips

  • Type: boolean
  • Default: false
  • Description: Enable/disable the clip generation stage (TTS synthesis).
  • Example:
    generate_clips: true
    

transform_clips

  • Type: boolean
  • Default: false
  • Description: Enable/disable feature extraction and augmentation stage.
  • ⚠️ Important: Set to false when not actively generating features to avoid infinite loops.

train_model

  • Type: boolean
  • Default: false
  • Description: Enable/disable the training stage.

overwrite

  • Type: boolean
  • Default: false
  • Description: Force regeneration of feature files, overwriting existing files.
  • ⚠️ Warning: Use with caution as it will delete existing computed features.

force_verify

  • Type: boolean
  • Default: false
  • Description: Force re-verification of all data directories, ignoring cache.

show_training_summary

  • Type: boolean
  • Default: true
  • Description: Display effective training configuration in tabular format.

debug_mode

  • Type: boolean
  • Default: false
  • Description: Enable debug logging and visualization outputs.

enable_journaling

  • Type: boolean
  • Default: true
  • Description: Log training metrics and model information to journal.

Command-Line Arguments

Running training with configuration overrides:

# Basic training
nanowakeword -c your_config_path.yaml 

# Generate + Transform + Train
nanowakeword -c config.yaml -G -t -T

# Force regeneration of features
nanowakeword -c config.yaml --overwrite

# Resume from previous training
nanowakeword -c config.yaml --resume ./trained_models/my_model_v1

# Only transform (no generation, no training)
nanowakeword -c config.yaml -t

# Distill
nanowakeword -c copy_X_config.yaml --distill

Arguments Explanation

  • -c, --config_path - Path to YAML config file (required)
  • -G, --generate_clips - Enable synthetic data generation stage
  • -t, --transform_clips - Enable feature generation and augmentation
  • -T, --train_model - Enable model training
  • -f, --force-verify - Ignore cache and re-verify all data
  • --overwrite - Regenerate all feature files (destructive)
  • --resume - Resume training from specific model directory