Arcosoph - Building the Future of Intelligent Systems

🧮 The ISBL Master Equation

The core logic of the sampler is governed by the following dynamic probability distribution:

$\mathcal{P}(x_i \mid x_i \in C_k, t) = \frac{\left(\mathcal{L}_i^{(t-1)}\right)^\alpha + \epsilon}{\sum_{x_j \in C_k} \left[ \left(\mathcal{L}_j^{(t-1)}\right)^\alpha + \epsilon \right]}$
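
As a minimal sketch of the equation above, the snippet below computes the selection probabilities for a single class pool with NumPy. The function name `isbl_probabilities`, the clipping of negative losses to zero, and the example loss values are assumptions made for illustration and are not taken from the original sampler. The defaults for `alpha` and `eps` follow the values stated in this document.

```python
import numpy as np


def isbl_probabilities(losses, alpha=0.75, eps=1e-6):
    """Return the sampling probability of every sample in one class pool.

    `losses` holds the most recently recorded loss of each sample in the pool.
    """
    losses = np.asarray(losses, dtype=np.float64)
    # Assumption: losses are non-negative. Clipping guards against NaN from
    # raising a negative number to a fractional power.
    losses = np.maximum(losses, 0.0)
    scores = np.power(losses, alpha) + eps  # numerator for every sample
    return scores / scores.sum()            # divide by the pool-wide sum


# Equal losses (e.g. the initial state) give a uniform distribution.
print(isbl_probabilities([1.0, 1.0, 1.0, 1.0]))

# Hypothetical losses: harder samples receive a larger share of the draws.
print(isbl_probabilities([0.0, 0.2, 1.0, 4.0]))
```

Because the normalization runs over one pool only, the function would be called once per class pool rather than once over the whole dataset.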

🔍 Nomenclature & Parameter Breakdown

$\mathcal{P}(x_i \mid x_i \in C_k, t)$: The conditional probability of selecting a specific data sample $x_i$ from class pool $C_k$ at training step $t$.
$x_i, x_j$: Individual data samples (e.g., specific audio clips) within the dataset. Here, $x_i$ is the target sample being evaluated, while $x_j$ ranges over every sample in the same pool (including $x_i$ itself) in the denominator's summation.
$C_k$ (Class/Category Pool): A distinct subset or category of data (e.g., targets or negatives), isolated via the dataset's index pools.
$t$ (Training Step / Time): The current iteration or time step of the training loop, defining the temporal state of the sampling probabilities.
$\mathcal{L}_i^{(t-1)}$ (Loss/Hardness Score): The individual loss value computed for sample $x_i$ during its most recent forward pass, i.e. the value stored as of step $t-1$. Higher loss signifies higher "hardness". (Note: before any training occurs, all scores are uniformly initialized to $\mathcal{L}_i^{(0)} = 1.0$, so the first draws within each pool are uniform.)
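$\alpha$ (Smoothing Factor): A hyperparameter set to 0.75. Because $0 < \alpha < 1$, raising the loss to the power $\alpha$ compresses large values: a loss 100 times larger receives only about 31.6 times the weight ($100^{0.75} \approx 31.6$). It thus acts as a contrast control that dampens extreme loss values. This prevents unlearnable, corrupted, or heavily noisy audio clips from dominating the batch gradients and causing model collapse.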
$\epsilon$ (Epsilon / Stability Constant): A tiny positive constant set to 1e-6 that serves a dual purpose:
1. Mathematical Safety: Prevents division by zero when every loss in the pool is zero, and keeps a perfectly learned sample from receiving a probability of exactly zero.
2. Catastrophic Forgetting Prevention: As the model converges and all individual losses drop near zero ($\mathcal{L} \approx 0$), the $\epsilon$ term dominates every score once $\mathcal{L}_j^\alpha \ll \epsilon$ for all samples in the pool. The equation then transitions into a uniform random sampler ($\mathcal{P} \approx \frac{1}{N}$, where $N = |C_k|$ is the pool size), ensuring balanced baseline revision in later training stages.
$\sum_{x_j \in C_k}$ (Summation Over Class): The summation operator (Sigma) that aggregates the computed scores of all individual samples $x_j$ belonging to class $C_k$. Dividing the single sample's score by this total normalizes the output into a valid probability distribution over $C_k$: each value lies between $0$ and $1$, and the values sum to $1$.
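
The parameters above can be tied together in a small stateful sampler. The following is a sketch under stated assumptions, not the original implementation: the class name `ISBLSampler`, its method names, the dictionary-of-index-arrays pool layout, sampling with replacement, and the example losses are all hypothetical. It only illustrates how the initial score of 1.0, the per-pool normalization, and the loss update fit together.

```python
import numpy as np


class ISBLSampler:
    """Hypothetical loss-based sampler over fixed index pools (illustrative)."""

    def __init__(self, pools, alpha=0.75, eps=1e-6, seed=0):
        # pools: dict mapping a class name to the dataset indices in that pool.
        self.pools = {k: np.asarray(v, dtype=np.int64) for k, v in pools.items()}
        self.alpha = alpha
        self.eps = eps
        self.rng = np.random.default_rng(seed)
        # Every sample starts with a score of 1.0, so early draws are uniform.
        n_total = max(int(v.max()) for v in self.pools.values()) + 1
        self.losses = np.ones(n_total, dtype=np.float64)

    def probabilities(self, k):
        # Scores are normalized within pool k only.
        scores = np.power(self.losses[self.pools[k]], self.alpha) + self.eps
        return scores / scores.sum()

    def sample(self, k, batch_size):
        # Assumption: indices are drawn with replacement.
        return self.rng.choice(
            self.pools[k], size=batch_size, replace=True, p=self.probabilities(k)
        )

    def update(self, indices, losses):
        # Store the newest per-sample losses; clipping at zero is a guard
        # assumed here so the fractional power stays well defined.
        new = np.maximum(np.asarray(losses, dtype=np.float64), 0.0)
        self.losses[np.asarray(indices, dtype=np.int64)] = new


sampler = ISBLSampler({"targets": [0, 1, 2], "negatives": [3, 4, 5, 6]})
print(sampler.probabilities("targets"))      # uniform before any update

sampler.update([0, 1, 2], [0.01, 0.5, 8.0])  # hypothetical per-sample losses
print(sampler.probabilities("targets"))      # skewed toward the hardest sample
print(sampler.sample("targets", batch_size=4))
```

In a training loop, `update` would be called after each forward pass with the per-sample losses of the batch just processed, so that the next call to `sample` reflects the latest hardness scores.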

ISBL - Importance Sampling based on Loss
