Fast, large-scale document production can be achieved by spreading the processing across as many machines as are available. The set-up is simple and requires little more than a database, some code and networked computers.


Distributed Processing

Important: For file formats that don't have built-in code functionality, file generation etc. must be handled by the application, i.e. self-population and self-processing are only possible for formats, such as MS Office documents, that support VBA.

Also note, the requirement in this case was not just to produce the calculated results. For various reasons, fully functional Excel models had to be created for each claim and sent, with each batch, to the client. If only results were required, calculator document production would not have been necessary.

Process Summary

  1. Host creates a batch header.
  2. Participants connect and are logged against the batch.
  3. Host starts batch once happy all machines have connected.
  4. Server procedure splits and allocates work to participants.
  5. Participant machines:

    • Copy down the self-populating Excel template.
    • Create local copies of the template and populate each with a case reference.
    • Open the first ten spreadsheets. Each one automatically connects to the backend database, grabs the data for its case reference, runs the population process, saves, and returns the results to the server.
    • The application monitors the number of files being processed and, as soon as that number drops below ten, spawns another until the process is complete.
    • Once the machine's workload is complete, all files are copied back to the network and the machine exits the batch.
  6. Once all machines have finished, the host machine closes the batch, generates a batch report listing any failures etc., and produces a random sample set for quality control.
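
Steps 4 and 6 can be sketched as follows. This is a minimal illustration in Python, not the original server procedure: the round-robin split, the function names `allocate_cases` and `pick_qc_sample`, and the machine and case names are all assumptions made for the example.

```python
import random


def allocate_cases(case_refs, participants):
    """Split case references across participants as evenly as possible.

    Hypothetical helper: a simple round-robin split. The real server
    procedure may have allocated work differently.
    """
    if not participants:
        raise ValueError("at least one participant is required")
    allocation = {name: [] for name in participants}
    for index, case_ref in enumerate(case_refs):
        machine = participants[index % len(participants)]
        allocation[machine].append(case_ref)
    return allocation


def pick_qc_sample(case_refs, sample_size, seed=None):
    """Pick a random sample of cases for quality control (hypothetical helper)."""
    rng = random.Random(seed)
    # Never ask for more cases than exist in the batch.
    return rng.sample(list(case_refs), min(sample_size, len(case_refs)))


if __name__ == "__main__":
    cases = [f"CASE-{n}" for n in range(1, 11)]
    print(allocate_cases(cases, ["PC-A", "PC-B", "PC-C"]))
    print(pick_qc_sample(cases, 3, seed=1))
```

With an even split, no machine receives more than one case above any other, so all participants should finish at roughly the same time if the cases are of similar complexity.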

The Benefits

In this particular example, the process generated over five thousand complex MS Excel spreadsheets per batch, calculating PPI refunds and writing the model results back to the database. Processing time was reduced from around ten hours to minutes, depending on the number of available machines. Due to regulatory deadlines, each batch had to be completed by a specific date to avoid large fines.

Timing wasn't the only benefit brought by distributing the processing and having the spreadsheets self-process. The end-to-end process and calculator template changed frequently. Being able to run multiple test batches in a day made testing and debugging a breeze.

In the original process one machine generated one spreadsheet at a time, populating it and writing back the results to the server. In the new process many machines each generated ten spreadsheets at a time in parallel. VBA in the spreadsheets handled grabbing the data from the server and writing back the results.
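
The "ten at a time" behaviour on each machine can be approximated with a bounded worker pool. The sketch below is in Python and swaps in a thread pool for the original mechanism, which counted open Excel files and spawned more as needed. `run_workload` and `FakeWorkbook` are hypothetical names, and the fake workbook only simulates population with a short sleep.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_OPEN = 10  # the limit of ten spreadsheets in flight per machine


def run_workload(case_refs, process_case, max_open=MAX_OPEN):
    """Process every case, keeping at most max_open in flight at once."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_open) as pool:
        # As soon as one case finishes, the pool starts the next queued one.
        futures = {ref: pool.submit(process_case, ref) for ref in case_refs}
        for ref, future in futures.items():
            try:
                results[ref] = ("ok", future.result())
            except Exception as exc:  # record failures for the batch report
                results[ref] = ("failed", str(exc))
    return results


class FakeWorkbook:
    """Hypothetical stand-in for a self-populating workbook."""

    def __init__(self, fail_on=()):
        self._lock = threading.Lock()
        self._fail_on = set(fail_on)
        self.open_now = 0
        self.peak_open = 0  # highest number of cases in flight at once

    def process(self, case_ref):
        with self._lock:
            self.open_now += 1
            self.peak_open = max(self.peak_open, self.open_now)
        time.sleep(0.01)  # pretend to fetch data, populate and save
        with self._lock:
            self.open_now -= 1
        if case_ref in self._fail_on:
            raise RuntimeError("population failed")
        return f"{case_ref}.xlsx"


if __name__ == "__main__":
    fake = FakeWorkbook(fail_on={"CASE-7"})
    outcome = run_workload([f"CASE-{n}" for n in range(30)], fake.process)
    print(fake.peak_open, outcome["CASE-7"])
```

Capturing failures per case, rather than stopping at the first error, is one way to feed the failure list that the host's batch report needs.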

There was, however, a small problem. The spreadsheets could not be sent back to the client containing the VBA that was used to build them, as the IPR needed protecting. To resolve this I structured the VBA as follows:


The process here describes a particular scenario with very specific requirements. It can, however, be modified to suit other requirements.

In this particular case there was no dedicated hardware or software product. The process was live almost from the moment development started, and all we had to work with was desktop PCs and MS Office.

The point is, if you need to produce thousands of unique, template-based documents quickly and have multiple computers that can be utilised for the time it takes, you can do so without having to invest in additional hardware. For example, a process that might take 3 hours on one machine could be reduced to roughly 30 minutes with 6 computers involved.
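
The arithmetic behind that example is a simple division, assuming the work splits evenly and ignoring overheads such as copying files and database contention. A rough estimate in Python, with `ideal_batch_minutes` as a hypothetical helper name:

```python
def ideal_batch_minutes(single_machine_minutes, machines):
    """Best-case batch duration when work is split evenly across machines.

    Assumes perfect linear scaling; real runs will be somewhat slower
    because of set-up, file copying and shared database load.
    """
    if machines < 1:
        raise ValueError("machines must be at least 1")
    return single_machine_minutes / machines


if __name__ == "__main__":
    # 3 hours on one machine, shared between 6 machines.
    print(ideal_batch_minutes(180, 6))  # 30.0
```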