Validating Spreadsheet Models for the EBA
Mark Hayes, 23rd March 2020.
The European Banking Authority issues taxonomies against which banks submit financial reports. Prior to submitting the reports, the client goes through a review process. In order to facilitate review, the data is presented in a way that is logical for review, but is not the format required by the EBA.
In order then to present the data for review, business analysts had created user friendly "Views", of the data, contained in a couple of dozen Excel workbooks. Each workbook contained at least one view, some more than ten. These views were populated by retrieving data directly from SAP, based on criteria contained in hidden columns, and then performing aggregates etc. for review.
The taxonomies presented the reportable items as view specific grids, i.e. they could be referenced by view, row and column references.
In brief, the task was to automate the validation of over ten thousand equations held in views across twenty five or so workbooks. To ensure complete coverage and, amongst other things, that the equations accurately reflected the requirements.
Manual testing would have taken months and the live systems weren't, at the time, producing data in a compatible format that could be used for testing. Also, due to the complexity of the data, producing dummy test data had proven counter-productive. The "test" data required more debugging than the views themselves.
This turned out to be quite an interesting little project. Both the taxonomies and the views were changing frequently. Everything was new and not fully defined either on the client side or the regulators side.
The solution had to be able to be repeatedly refreshed without requiring redevelopment.
The output would be a simple spreadsheet listing the key reference values, i.e. View Ref / Row Ref/ Column Ref, the title of the reporting item and a full description of what exactly was wrong with it.
Given the speed with which this was required, and the fact that the solution had to be portable, it was decided to code everything up inside an Excel workbook with VBA. Basically, the workbook would:
- Import the taxonomies and format them for quick programmatic referencing.
- Allow the user to select a folder containing view files to be imported and validated.
- Perform all the necessary validations, i.e. formula and coverage checks etc.
- Automatically maintain taxonomy and solution versions, dates and update history etc.
- Allow the user to import populated views and from those create suitable outputs.
There was a lot of supporting functionality that wasn't actually doing any validating but was essential none the less. The interesting part would be validating the equations.
As described above, a reporting item consisted numerous hidden reference data criteria columns, used for SAP data retrieval, equations to get the data from SAP and equations to present the data for review. Validating them was a simple matter of:
- Ensuring the data retrieval criteria were covered by the data retrieval equations. I.e. there were no populated columns that weren't covered and no unpopulated columns referenced in any equations.
- Comparing the data retrieval equations to the taxonomy requirements by cross referencing on view, row and column.
- Making sure the data retrieval equations mapped directly to a taxonomy requirement. I.e. that the combination of view, row and column was valid.
- Making sure all the taxonomy requirements were met. I.e. there were no gaps in the views.
- Recording any errors or exceptions.
In reality there was a lot more complexity involved, but essentially the solution had to completely automate the processes of:
- Updating and versioning the solution to suit the latest taxonomies.
- Validating the various view workbooks.
- Producing outputs such as the validation report, formatted and cleansed views, view aggregates etc.
This solution made it possible to validate more than two dozen workbooks containing thousands of formulae in a minute or so.
On its first run, the solution identified more than three thousand validation errors. These would have otherwise taken many weeks of manual testing and tracking to identify and resolve.
What's really important to note here is the fact that, it never took more than an hour or so for the errors to be corrected by the analyst building the views. Errors were generally simple and could be resolved by a quick copy and paste.
The Biggest Benefit
By automating the validation this way, the testing team didn't have to create and track a case for each error. An end-to-end administration process that would have taken at least a couple of hours or more per error.
In fact, by the time the testers got involved there was nothing left for them to find. Many thousands of hours saved!