Working Papers
Dimitrova, D. S., Kaishev, V. K., and Sáenz Guillén, E. L. (2025a). GeDS: An R package for Regression, Generalized Additive Models and Functional Gradient Boosting, based on Geometrically Designed (GeD) Splines. Manuscript submitted for publication. Under review at the Journal of Statistical Software. Pre-print.
Abstract
In recent years, geometrically designed variable-knot splines, named GeDS, have emerged as a promising technique in the domain of spline regression, with Kaishev, Dimitrova, Haberman, and Verrall (2016) and Dimitrova, Kaishev, Lattuada, and Verrall (2023) showcasing their potential. In this paper, we introduce the R package GeDS, which implements two significant enhancements of the original GeDS methodology. The first broadens the applicability of GeDS to encompass generalized additive models (GAM) by implementing the local scoring algorithm with GeD splines as function smoothers. This approach stands as a competitive alternative, complementing existing practices suggested by Hastie and Tibshirani (1990) and Wood (2017), and implemented in the R packages gam and mgcv, respectively. The second incorporates functional gradient boosting (FGB) to estimate the number and location of the spline knots, as well as the associated regression coefficients. This novel approach allows the final boosted fit to be expressed as a single spline model, in contrast to typical gradient boosting models, which generally lack a straightforward, interpretable representation. We demonstrate that this technique yields competitive spline fits that compare favorably, in both accuracy and efficiency, with those of existing boosting-with-splines procedures proposed by Bühlmann and Yu (2003) and Schmid and Hothorn (2008a), and implemented in the R package mboost.
The above extensions position GeDS as a versatile tool for additive modeling within the exponential family, suitable for both regression and classification tasks. The GeDS methodology, including GAM-GeDS and FGB-GeDS, is implemented in the R package GeDS, available from https://cran.r-project.org/package=GeDS. We illustrate the capabilities of this package, highlighting the competitiveness of GeDS and its potential for applications in the wider contexts of data science and machine learning.
Dimitrova, D. S., Kaishev, V. K., and Sáenz Guillén, E. L. (2026a). Density and distribution function estimation using variable-knot splines. Manuscript submitted for publication. R package: DDFS.
Abstract
We propose a novel nonparametric framework for density estimation based on B-splines with data-driven knot selection. The method, termed DDFS (Density and Distribution Function Splines), simultaneously estimates the probability density function and the cumulative distribution function using a common spline representation, ensuring internal consistency between the two.
The approach combines constrained maximum likelihood estimation with a sequential, bias-driven knot insertion procedure inspired by Geometrically Designed Splines (GeDS) introduced by Kaishev et al. (2016) and extended by Dimitrova et al. (2023). This yields an adaptive, non-uniform knot sequence with data-driven refinement, where a small number of tuning parameters admit robust default choices but can be adjusted when modelling complex density features.
We develop a comprehensive asymptotic theory for the proposed estimator in both conditional and unconditional settings. In particular, we show that the data-driven knot sequence satisfies suitable growth and quasi-uniformity properties with high probability, enabling a rigorous sieve maximum likelihood analysis. Under standard smoothness assumptions, we establish uniform (sup-norm) convergence rates for the spline coefficients, density, distribution, and quantile estimators. These rates achieve the classical minimax optimal order (up to logarithmic factors) over Hölder classes. Moreover, the estimator is shown to attain these rates adaptively, without prior knowledge of the underlying smoothness.
The spline representation further allows for closed-form expressions of key risk measures, including Value-at-Risk and Tail Value-at-Risk. The corresponding plug-in estimators inherit the optimal convergence rates, supporting accurate and theoretically grounded risk assessment.
Numerical experiments demonstrate the effectiveness of the proposed method across a range of benchmark densities, highlighting its flexibility and strong finite-sample performance.
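For readers unfamiliar with the two risk measures named in this abstract, the following minimal sketch computes them empirically from a loss sample. It is an illustration only and is not the closed-form spline-based estimator of the paper: the function name `var_tvar` is hypothetical, Value-at-Risk is taken as the empirical quantile, and Tail Value-at-Risk is approximated by the mean of the losses at or above that quantile (an approximation that is exact only for continuous loss distributions).

```python
import numpy as np


def var_tvar(losses, alpha):
    """Empirical Value-at-Risk and Tail Value-at-Risk at level alpha.

    Hypothetical helper for illustration; NOT the DDFS estimator.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    x = np.sort(np.asarray(losses, dtype=float))
    if x.size == 0:
        raise ValueError("losses must be non-empty")
    # VaR: empirical alpha-quantile (NumPy's default linear interpolation).
    var = float(np.quantile(x, alpha))
    # TVaR: average of the losses in the tail at or beyond the VaR.
    tvar = float(x[x >= var].mean())
    return var, tvar


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)
    v, t = var_tvar(sample, 0.99)
    print(f"VaR(0.99) = {v:.3f}, TVaR(0.99) = {t:.3f}")
```

By construction the Tail Value-at-Risk is never smaller than the Value-at-Risk at the same level, which gives a quick sanity check on any plug-in estimate.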
Dimitrova, D. S., Kaishev, V. K., and Sáenz Guillén, E. L. (2026b). How Income Perceptions and Living Standards Shape Inequality and New Product Diffusion. Manuscript submitted for publication. A corresponding survey study was funded through an internal grant and implemented via Prolific and ShinyApps.
Abstract
We introduce a novel stochastic framework in which income perceptions are shaped by individuals’ “visible” standard of living, proxied by the value of owned goods (e.g., housing, cars) and consumption expenditures (e.g., groceries, leisure, holidays). Central to the approach is modeling the joint distribution of perceived income and standard of living. We employ a Gaussian copula and marginal distributions specified via free-knot splines. Both marginals and the copula are estimated using UK survey data.
The estimated joint distribution enables the assessment of inequality both marginally (perceived income) and jointly (perceived income and standard of living). We construct univariate/multivariate Lorenz curves and compute the associated Gini coefficients. Results show that perceived income inequality exceeds actual income inequality and is even higher when measured jointly with standard of living. By incorporating this joint distribution, we extend the threshold model of new product diffusion, explore its theoretical properties, and illustrate its application.
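As a point of reference for the univariate case mentioned in this abstract, the sketch below computes an empirical Lorenz curve and the associated Gini coefficient from a sample of non-negative values. It is a generic textbook construction under stated assumptions, not the spline-and-copula estimator used in the paper; the function names `lorenz_curve` and `gini` are hypothetical.

```python
import numpy as np


def lorenz_curve(values):
    """Return (p, L): population shares p and cumulative value shares L.

    Hypothetical helper for illustration; assumes non-negative values
    with a strictly positive total.
    """
    x = np.sort(np.asarray(values, dtype=float))
    if x.size == 0 or np.any(x < 0) or x.sum() <= 0:
        raise ValueError("need non-negative values with a positive total")
    p = np.arange(x.size + 1) / x.size
    L = np.concatenate(([0.0], np.cumsum(x) / x.sum()))
    return p, L


def gini(values):
    """Gini coefficient as 1 minus twice the area under the Lorenz curve."""
    p, L = lorenz_curve(values)
    # Trapezoidal rule over the piecewise-linear empirical Lorenz curve.
    area = np.sum((L[1:] + L[:-1]) * np.diff(p)) / 2.0
    return 1.0 - 2.0 * area


if __name__ == "__main__":
    print(gini([1, 1, 1, 1]))   # perfect equality
    print(gini([0, 0, 0, 1]))   # one unit holds everything
```

With this discrete definition, a sample of n values in which one unit holds everything has a Gini coefficient of (n - 1)/n rather than exactly 1.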
Dimitrova, D. S., Kaishev, V. K., and Sáenz Guillén, E. L. (2026c). Pricing and Diffusion under Perceived Inequality. Manuscript in preparation.
Abstract
This paper develops a framework linking perceived income, inequality, pricing, and new product diffusion in heterogeneous markets. Because firms do not observe consumers’ purchasing power directly, they infer it from observable living-standard signals through an inference technology that generates a distribution of perceived income. Pricing and adoption decisions are therefore governed by perceived rather than latent purchasing power.
We show that diffusion dynamics and optimal pricing are jointly determined by the local geometry of the perceived-income distribution. In particular, the curvature of the Lorenz curve acts as a sufficient statistic for market thickness, governing adoption responsiveness, diffusion speed, takeoff dynamics, and pricing incentives. Markets with thinner local segments around the affordability threshold exhibit weaker responses to price reductions, delayed diffusion, and higher optimal markups.
The paper also develops an empirical implementation of the inference technology using UK survey data on observable living standards and perceived purchasing power. The results show that alternative informational architectures generate systematically different perceived-income distributions and therefore different pricing and diffusion predictions. Inference technologies based on relatively homogeneous respondent groups may substantially overstate or understate market inequality, whereas more diverse informational architectures generate perceived-income distributions that more closely approximate the underlying distribution of purchasing power.
Overall, the paper provides a unified framework connecting inequality, informational frictions, pricing, and diffusion, and highlights the central role of market thickness in shaping market outcomes.
Work in Progress
Bariatric Data Analytics Based on the UK National Bariatric Surgery Registry (NBSR). Joint project with Prof. Vladimir Kaishev, Dr. Dimitrina Dimitrova, and external collaborators Miss Emma Rose McGlone (Imperial College London) and Mr Omar Khan (British Obesity & Metabolic Specialist Society, BOMSS).
Acknowledged Collaborations
Montes-Rojas and Elosegui (2020). Network ANOVA random effects models for node attributes. Journal of Dynamics and Games, 7(3), 239–252. ISSN: 2164-6066. DOI: 10.3934/jdg.2020017
Software
Dimitrova, D. S., Kaishev, V. K., Lattuada, A., Sáenz Guillén, E. L., & Verrall, R. J. (2025). GeDS: Geometrically designed spline regression [R package version 0.3.4]. https://CRAN.R-project.org/package=GeDS
Dimitrova, D. S., Kaishev, V. K., & Sáenz Guillén, E. L. (2025b). DDFS: Density & distribution function estimation using variable-knot splines [R package version 0.1.0]. https://github.com/emilioluissaenzguillen/ddfs
Master’s Thesis
Sáenz Guillén, E. L. (2021). The Chilean Pension System: Actuarial Analysis of a Paradigmatic Social Security Program. https://hdl.handle.net/10171/124147