Mitigating the saturation gap: Inverse prime-density scaling for high-stakes probabilistic modeling

DOI:

https://doi.org/10.61467/2007.1558.2026.v17i3.1331

Keywords:

Activation Functions, Prime Number Theorem, Neural Networks, Financial Forecasting, Logarithmic Moderation

Abstract

Learning under severe class imbalance exposes a critical limitation of standard probabilistic gates: they saturate at large pre-activations, which causes vanishing gradients and poor minority-class recall. Inspired by the Prime Number Theorem, we introduce PrimeSigmoid, a logarithmically moderated nonlinearity that preserves gradient sensitivity in high-activation regimes while maintaining probabilistic interpretability. This work makes three contributions: (i) mitigating the saturation gap, where logarithmic moderation keeps the model from ignoring rare events; (ii) structural robustness, demonstrated through an exhaustive grid study of 63 architectural combinations in which PrimeSigmoid consistently dominates across heterogeneous hidden representations; and (iii) probabilistic integrity, showing that significant gains in Recall and Matthews Correlation Coefficient (MCC) are achieved without degrading AUC or calibration as measured by LogLoss. These results position logarithmically moderated nonlinearities as a principled mechanism for robust classification in high-stakes domains such as financial risk modeling.
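The saturation effect described above can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: the abstract does not state the PrimeSigmoid formula, so `log_moderated_gate` is a hypothetical stand-in that divides the pre-activation by ln(e + |z|) before applying the logistic function. It only shows the general idea that a logarithmic moderation of the input can keep the gradient away from zero at large pre-activations while the output stays in (0, 1).

```python
import math


def sigmoid(z: float) -> float:
    """Standard logistic gate, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sigmoid_grad(z: float) -> float:
    """Analytic derivative of the logistic gate: s(z) * (1 - s(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


def log_moderated_gate(z: float) -> float:
    """HYPOTHETICAL gate (an assumption, not the paper's definition):
    the pre-activation is divided by ln(e + |z|), so the effective
    input grows like z / ln(z) instead of linearly."""
    return sigmoid(z / math.log(math.e + abs(z)))


def numeric_grad(f, z: float, h: float = 1e-5) -> float:
    """Central-difference estimate of df/dz."""
    return (f(z + h) - f(z - h)) / (2.0 * h)


if __name__ == "__main__":
    # Compare gradients as the pre-activation grows.
    for z in (0.0, 5.0, 10.0, 20.0):
        g_std = sigmoid_grad(z)
        g_mod = numeric_grad(log_moderated_gate, z)
        print(f"z={z:5.1f}  sigmoid grad={g_std:.3e}  moderated grad={g_mod:.3e}")
```

Under these assumptions, the logistic gradient at z = 20 is on the order of e^-20, while the moderated gate still has a gradient several orders of magnitude larger. Both gates return 0.5 at z = 0 and remain strictly between 0 and 1, which is the property the abstract refers to as probabilistic interpretability.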

Spanish-language metadata / Metadatos en español

Título en español:

Reducción de la brecha de saturación: escalado inverso de la densidad de números primos para la modelización probabilística de alto riesgo


Resumen:

El aprendizaje en condiciones de desequilibrio grave entre clases pone de manifiesto una limitación crítica de las puertas probabilísticas estándar: la saturación ante preactivaciones elevadas, lo que conduce a la desaparición de los gradientes y a una baja recuperación de las clases minoritarias. Inspirándonos en el teorema de los números primos, presentamos PrimeSigmoid, una no linealidad moderada logarítmicamente que conserva la sensibilidad de los gradientes en regímenes de alta activación, al tiempo que mantiene la interpretabilidad probabilística. La contribución de este trabajo se centra en tres pilares fundamentales: (i) la mitigación de la brecha de saturación, en la que la moderación logarítmica evita que el modelo ignore los eventos poco frecuentes; (ii) la robustez estructural, demostrada mediante un exhaustivo estudio de cuadrícula de 63 combinaciones arquitectónicas en las que PrimeSigmoid domina de manera consistente en representaciones ocultas heterogéneas; y (iii) la integridad probabilística, que muestra que se logran mejoras significativas en la tasa de recuperación y el coeficiente de correlación de Matthews (MCC) sin que se vea afectada la calibración del AUC o del LogLoss. Estos resultados sitúan a las no linealidades moderadas logarítmicamente como un mecanismo fundamentado para la clasificación robusta en ámbitos de alto riesgo, como la modelización del riesgo financiero.

Palabras clave:

Funciones de activación, Teorema de los números primos, Redes neuronales, Previsiones financieras, Moderación logarítmica.

 



Published

2026-06-12

How to Cite

Aguilar-Ortiz, J., Gutiérrez-Salinas, M. F., & Domínguez-Mayorga, C. R. (2026). Mitigating the saturation gap: Inverse prime-density scaling for high-stakes probabilistic modeling. International Journal of Combinatorial Optimization Problems and Informatics, 17(3), 37–52. https://doi.org/10.61467/2007.1558.2026.v17i3.1331

Section

Articles
