🧮 The ISBL Master Equation
The core logic of the sampler is governed by the following dynamic probability distribution:
\[\mathcal{P}(x_i \mid x_i \in C_k, t) =
\frac{
\left(\mathcal{L}_i^{(t-1)}\right)^\alpha + \epsilon
}{
\sum_{x_j \in C_k}
\left[
\left(\mathcal{L}_j^{(t-1)}\right)^\alpha + \epsilon
\right]
}\]
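As a minimal sketch of the equation above, the following Python function computes the distribution for a single pool from a list of per-sample losses. The function name `isbl_probabilities` and its argument names are hypothetical and not taken from the sampler's code; the defaults `alpha=0.75` and `eps=1e-6` are the values this document gives for \(\alpha\) and \(\epsilon\).

```python
def isbl_probabilities(losses, alpha=0.75, eps=1e-6):
    """Return selection probabilities for the samples of one class pool C_k.

    `losses` holds the most recent loss L_j^(t-1) of every sample in the pool.
    Hypothetical helper: names and input validation are illustrative only.
    """
    if not losses:
        raise ValueError("the pool must contain at least one sample")
    if any(loss < 0 for loss in losses):
        raise ValueError("losses must be non-negative")

    # Numerator of the master equation for every sample: L_j^alpha + eps.
    scores = [loss ** alpha + eps for loss in losses]

    # Denominator: the same quantity summed over the whole pool.
    total = sum(scores)
    return [score / total for score in scores]


# Three samples with increasing loss get increasing selection probability.
probs = isbl_probabilities([0.1, 1.0, 4.0])

# Equal losses (for example the uniform initialization) give a uniform draw.
uniform = isbl_probabilities([1.0, 1.0, 1.0, 1.0])
```

The probabilities are computed per pool, so each class \(C_k\) would call the function with its own list of losses.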
🔍 Nomenclature & Parameter Breakdown
- \(\mathcal{P}(x_i \mid x_i \in C_k, t)\): The conditional probability of selecting a specific data sample \(x_i\) from class pool \(C_k\) at training step \(t\).
- \(x_i, x_j\): Individual data samples (e.g., specific audio clips) within the dataset. Here, \(x_i\) represents the target sample being evaluated, while \(x_j\) represents all competing samples within the same pool during summation.
- \(C_k\) (Class/Category Pool): A distinct subset or category of data (e.g., `targets` or `negatives`), isolated via the dataset's index pools.
- \(t\) (Training Step / Time): The current iteration or time step of the training loop, defining the temporal state of the sampling probabilities.
- \(\mathcal{L}_i^{(t-1)}\) (Loss/Hardness Score): The individual loss value computed for sample \(x_i\) during its most recent forward pass at step \(t-1\). Higher loss signifies higher "hardness". (Note: At \(t=0\), before any training occurs, all scores are uniformly initialized to \(\mathcal{L}_i^{(0)} = 1.0\)).
- \(\alpha\) (Smoothing Factor): A hyperparameter set to `0.75`. Because \(\alpha < 1\), it acts as a contrast control that compresses the gap between high-loss and low-loss samples. This limits how often unlearnable, corrupted, or heavily noisy audio clips are drawn, so they cannot dominate the batch gradients and cause model collapse.
- \(\epsilon\) (Epsilon / Stability Constant): A tiny positive constant set to `1e-6`, serving a dual purpose:
  - Mathematical Safety: Prevents division-by-zero errors or absolute zero probabilities when a sample is perfectly learned.
  - Catastrophic Forgetting Prevention: As the model converges and every \(\left(\mathcal{L}_j\right)^\alpha\) in the pool becomes small relative to \(\epsilon\), the equation transitions into a uniform random sampler (\(\mathcal{P} \approx \frac{1}{N}\), where \(N = |C_k|\) is the pool size), ensuring balanced baseline revision in later training stages.
- \(\sum_{x_j \in C_k}\) (Summation Over Class): The summation operator (Sigma) that aggregates the computed scores of all individual samples \(x_j\) belonging to class \(C_k\). Dividing the single sample's score by this total normalizes the output into a proper probability distribution: every value lies between \(0\) and \(1\), and the values sum to \(1\) over \(C_k\).
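The parameter roles described above can be checked with a small stateful sketch. The class `PoolSampler` and its methods are hypothetical and stand in for whatever the real sampler does; in particular, drawing with replacement via `random.choices` and clamping negative losses to zero are assumptions, not something this document specifies.

```python
import random


class PoolSampler:
    """Hypothetical loss-weighted sampler for a single class pool C_k."""

    def __init__(self, pool_size, alpha=0.75, eps=1e-6, seed=None):
        self.alpha = alpha
        self.eps = eps
        # All scores start at 1.0, so the first draw is uniform.
        self.losses = [1.0] * pool_size
        self.rng = random.Random(seed)

    def weights(self):
        # Unnormalized numerators L_j^alpha + eps.
        return [loss ** self.alpha + self.eps for loss in self.losses]

    def probabilities(self):
        w = self.weights()
        total = sum(w)
        return [x / total for x in w]

    def draw(self, k):
        # Assumption: samples are drawn with replacement within a batch.
        # random.choices normalizes the weights internally.
        indices = range(len(self.losses))
        return self.rng.choices(indices, weights=self.weights(), k=k)

    def update(self, indices, new_losses):
        # Store the latest loss of every sample that was just trained on.
        for i, loss in zip(indices, new_losses):
            self.losses[i] = max(float(loss), 0.0)


sampler = PoolSampler(pool_size=4, seed=0)
at_start = sampler.probabilities()      # uniform: every loss is 1.0

# One very hard sample, three perfectly learned ones.
sampler.update([0, 1, 2, 3], [0.0, 0.0, 0.0, 100.0])
one_hard = sampler.probabilities()      # eps keeps the learned samples reachable

# Every loss at zero: only eps remains, so the draw is uniform again.
sampler.update([0, 1, 2, 3], [0.0, 0.0, 0.0, 0.0])
converged = sampler.probabilities()

batch = sampler.draw(k=8)
```

In the `one_hard` state the three learned samples keep a small but non-zero probability, which is the safety role of \(\epsilon\); in the `converged` state only \(\epsilon\) contributes and the sampler is uniform.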