🧮 The ISBL Master Equation

The core logic of the sampler is governed by the following dynamic probability distribution:

$$\mathcal{P}(x_i \mid x_i \in C_k, t) = \frac{\left(\mathcal{L}_i^{(t-1)}\right)^\alpha + \epsilon}{\sum_{x_j \in C_k} \left[ \left(\mathcal{L}_j^{(t-1)}\right)^\alpha + \epsilon \right]}$$
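
The equation above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the sampler's actual code: the function name `isbl_probabilities` and its signature are assumptions, and the defaults simply mirror the values quoted for $\alpha$ and $\epsilon$ in this document.

```python
def isbl_probabilities(losses, alpha=0.75, eps=1e-6):
    """Return P(x_i | x_i in C_k, t) for every sample in one class pool.

    `losses` holds the most recent loss of each sample in the pool.
    The name and signature are illustrative, not taken from the source.
    """
    if not losses:
        raise ValueError("class pool is empty")
    if any(loss < 0 for loss in losses):
        raise ValueError("losses must be non-negative")
    # Numerator of the equation for each sample: L_i^alpha + eps
    scores = [loss ** alpha + eps for loss in losses]
    # Denominator: the same score summed over the whole pool
    total = sum(scores)
    return [score / total for score in scores]


# Three samples in one pool: hard, moderate, perfectly learned
pool_losses = [2.0, 0.5, 0.0]
probs = isbl_probabilities(pool_losses)
```

The probabilities sum to 1, preserve the ordering of the losses, and stay strictly positive even for the sample whose loss is exactly zero.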

🔍 Nomenclature & Parameter Breakdown

  • $\mathcal{P}(x_i \mid x_i \in C_k, t)$: The conditional probability of selecting a specific data sample $x_i$ from class pool $C_k$ at training step $t$.
  • $x_i, x_j$: Individual data samples (e.g., specific audio clips) within the dataset. Here, $x_i$ is the target sample being evaluated, while $x_j$ ranges over every sample in the same pool (including $x_i$ itself) during summation.
  • $C_k$ (Class/Category Pool): A distinct subset or category of data (e.g., targets or negatives), isolated via the dataset's index pools.
  • $t$ (Training Step / Time): The current iteration or time step of the training loop, defining the temporal state of the sampling probabilities.
  • $\mathcal{L}_i^{(t-1)}$ (Loss/Hardness Score): The individual loss value computed for sample $x_i$ during its most recent forward pass at step $t-1$. Higher loss signifies higher "hardness". (Note: At $t=0$, before any training occurs, all scores are uniformly initialized to $\mathcal{L}_i^{(0)} = 1.0$.)
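  • $\alpha$ (Smoothing Factor): A hyperparameter set to 0.75. Because $\alpha < 1$, it acts as a contrast control that compresses the gap between large and small loss values: a sample whose loss is 100 times larger receives only about $100^{0.75} \approx 31.6$ times the weight. This reduces the risk that unlearnable, corrupted, or heavily noisy audio clips dominate the batch gradients and cause model collapse.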
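  • $\epsilon$ (Epsilon / Stability Constant): A tiny positive constant set to 1e-6 that serves a dual purpose:
    1. Mathematical Safety: Keeps the denominator strictly positive, so there is no division by zero even if every loss in the pool is exactly zero, and guarantees that a perfectly learned sample still has a non-zero selection probability.
    2. Catastrophic Forgetting Prevention: Once the losses become small enough that $\epsilon$ dominates $\mathcal{L}^\alpha$ (that is, $\mathcal{L} \ll \epsilon^{1/\alpha}$, roughly $10^{-8}$ for the values above), the equation transitions into a uniform random sampler ($\mathcal{P} \approx \frac{1}{N}$, where $N = |C_k|$ is the pool size), ensuring balanced baseline revision in later training stages. Above that threshold, selection remains proportional to $\mathcal{L}^\alpha$.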
  • $\sum_{x_j \in C_k}$ (Summation Over Class): The summation operator (Sigma) that aggregates the computed scores of all individual samples $x_j$ belonging to class $C_k$. Dividing a single sample's score by this total normalizes the output into a valid probability distribution: each value lies between $0$ and $1$, and the values within a pool sum to $1$.
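
The parameter roles described in this list can be checked with a small stateful sketch. Everything here is hypothetical: the class `HardnessSampler`, its method names, the pool names, and the choice to draw with replacement are assumptions made for illustration, not the sampler's real interface.

```python
import random


class HardnessSampler:
    """Hypothetical per-pool sampler keeping one hardness score per sample."""

    def __init__(self, pools, alpha=0.75, eps=1e-6, seed=0):
        # pools: mapping from class name to a list of sample indices
        self.pools = {name: list(indices) for name, indices in pools.items()}
        self.alpha = alpha
        self.eps = eps
        self.rng = random.Random(seed)
        # Every score starts at 1.0, so the first draws are uniform.
        self.scores = {i: 1.0 for indices in self.pools.values() for i in indices}

    def weights(self, pool_name):
        # Unnormalised numerators L_i^alpha + eps for one pool
        return [self.scores[i] ** self.alpha + self.eps for i in self.pools[pool_name]]

    def probabilities(self, pool_name):
        w = self.weights(pool_name)
        total = sum(w)
        return [x / total for x in w]

    def sample(self, pool_name, k):
        # random.choices normalises the weights and draws with replacement
        # (drawing with replacement is an assumption of this sketch)
        return self.rng.choices(self.pools[pool_name], weights=self.weights(pool_name), k=k)

    def update(self, index, loss):
        # Store the loss from the sample's most recent forward pass
        self.scores[index] = float(loss)


sampler = HardnessSampler({"targets": [0, 1, 2, 3], "negatives": [4, 5]})
initial = sampler.probabilities("targets")  # uniform at t = 0

sampler.update(0, 100.0)  # one very hard (or corrupted) clip
for i in (1, 2, 3):
    sampler.update(i, 1.0)
after = sampler.probabilities("targets")
ratio = after[0] / after[1]  # close to 100 ** 0.75, not 100

for i in (0, 1, 2, 3):
    sampler.update(i, 0.0)  # every sample perfectly learned
converged = sampler.probabilities("targets")  # eps alone makes this uniform
batch = sampler.sample("targets", k=8)
```

The three checkpoints correspond to the initialization note, the dampening effect of $\alpha$, and the uniform limit enforced by $\epsilon$. Passing unnormalized weights to `random.choices` avoids computing the denominator explicitly when only a draw is needed.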