Credit risk scorecards are logistic regression models, fitted to large and complex data sets, that are employed by the financial industry to model the probability of default of a potential customer. To ensure that a scorecard remains a representative model of the population, one tests the hypothesis of population stability, which states that the distribution of clients' attributes remains constant over time. Simulating realistic data sets for this purpose is nontrivial, as these data sets are multivariate and contain intricate dependencies. The simulation of such data sets is of practical interest to both practitioners and researchers: practitioners may wish to consider the effect that a specified change in the properties of the data has on the scorecard and on its usefulness from a business perspective, while researchers may wish to test a newly developed technique in credit scoring. We propose a simulation technique based on the specification of bad ratios, as explained below. Practitioners generally cannot be expected to provide realistic parameter values for a scorecard; these models are simply too complex and contain too many parameters for such a specification to be viable. However, practitioners can often confidently specify the bad ratio associated with two different levels of a specific attribute. That is, practitioners are often comfortable making statements such as "on average, a new customer is 1.5 times as likely to default as an existing customer with similar attributes". We propose a method that can be used to obtain parameter values for a scorecard based on specified bad ratios. The proposed technique is demonstrated using a realistic example, and we show that the simulated data sets adhere closely to the specified bad ratios. The paper provides a link to a GitHub project containing the R code used to generate the results shown.
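The abstract does not spell out how parameters are obtained from bad ratios. As a minimal sketch of the underlying idea, assuming a single binary attribute and no other covariates, a specified bad ratio together with an assumed default rate for the reference level determines both parameters of a logistic model: the intercept is the log-odds of the reference default rate, and the coefficient is the difference in log-odds between the two levels. This is an illustration under those assumptions, not the authors' method. In a multivariate scorecard the ratio of default probabilities between two levels also depends on the values of the other attributes, which is what makes the general problem harder. The function names (`params_from_bad_ratio`, `simulate`, `empirical_bad_ratio`) and the inputs (a 4% base default rate, a 30% share of new customers) are hypothetical.

```python
import math
import random


def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Inverse of logit: maps a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def params_from_bad_ratio(base_rate: float, bad_ratio: float) -> tuple[float, float]:
    """Return (intercept, beta) for a hypothetical one-attribute logistic model.

    base_rate: assumed default probability of the reference level
               (e.g. existing customers).
    bad_ratio: P(default | other level) / P(default | reference level),
               e.g. 1.5 for "new customers are 1.5 times as likely to default".
    """
    p_other = bad_ratio * base_rate
    if not 0.0 < p_other < 1.0:
        raise ValueError("bad_ratio * base_rate must lie in (0, 1)")
    intercept = logit(base_rate)
    beta = logit(p_other) - intercept  # difference in log-odds between levels
    return intercept, beta


def simulate(n: int, share_other: float, intercept: float, beta: float,
             seed: int = 1) -> list[tuple[int, int]]:
    """Simulate n (attribute, default) pairs from the logistic model."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < share_other else 0  # 1 = other level (e.g. new customer)
        y = 1 if rng.random() < inv_logit(intercept + beta * x) else 0  # 1 = default
        rows.append((x, y))
    return rows


def empirical_bad_ratio(rows: list[tuple[int, int]]) -> float:
    """Observed default rate of level 1 divided by that of level 0."""
    def rate(level: int) -> float:
        ys = [y for x, y in rows if x == level]
        return sum(ys) / len(ys)
    return rate(1) / rate(0)


if __name__ == "__main__":
    # Hypothetical inputs: 4% base default rate, bad ratio 1.5, 30% new customers.
    b0, b1 = params_from_bad_ratio(base_rate=0.04, bad_ratio=1.5)
    data = simulate(100_000, share_other=0.3, intercept=b0, beta=b1)
    print(f"intercept={b0:.4f} beta={b1:.4f}")
    print(f"empirical bad ratio={empirical_bad_ratio(data):.3f}")
```

The simulation step mirrors, in miniature, the check described above: on a large simulated sample, the observed ratio of default rates should lie close to the specified value of 1.5, with the gap shrinking as the sample grows.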