Wednesday, July 26, 2017

Compression / Optimization in OpenIZ

While deploying OpenIZ in a rural region of Tanzania, some of our users were noticing that initial synchronizations were taking quite a long amount of time. The users were attempting to download over 30,000 records over a 2g connection.

To begin with, the Edmonton CTP version of the OpenIZ IMS (0.9.6) did have some greedy database queries. After optimizing the server down to near 0 response times (using REDIS for example, first request for 5,000 facilities is about 20 seconds, second request for the same dataset is 1.5s), I decided it would be worthwhile to update the compression on the OpenIZ IMS.

After reading on the internet, I found some articles about SharpCompress (MIT licensed) which I could use to extend Edmonton CTP's compression providers. The deflate and gzip (compressors in SharpCompress were apparently faster than the default .NET ones). After some work, there are now four compression schemes supported for retrieving and sending data to the OpenIZ IMS:
  • LZMA - The compression scheme used by 7z, which can heavily compress data however requires quite a bit more resources (decompression does not)
  • BZIP2 - A burrows-wheeler compression algorithm which yields slightly higher compression rates than GZIP but is faster than LZMA
  • GZIP - The GNU Zip format, this is standard HTTP GZIP compression
  • Deflate - Again, standard deflate compression algorithm, OpenIZ uses the optimization for speed rather than compression.
Each one of these compression schemes is now supported in the Disconnected Client 0.9.7.3 (Edmonton CTP3, they will require a server running Edmonton CTP3 as well). The settings are found in the "Network Options" section of the configuration UI.

For those that are interested in the trade-offs in terms of speed and bandwidth usage, I setup a small facility with 500 patients and downloaded the data. Please note that these are very specific to this particular use case, however are a good guide. Below are the results:

Algorithm Download Time Size
Broadband DSL 56k (like 2g)
LZMA 0:18.50 1:14.50 5:30.00 1.8 MB
BZ2 0:18:50 1:18.50 6:01.00 1.94 MB
GZIP 0:19.00 1:20.50 6:30.00 2.1 MB
Deflate 0:10.00 1:40.50 6:45.00 2.2 MB
RAW 0:08.50 5:00.00 24:50.50 8.3 MB

The differences in these results are more marked when dealing with larger clinics with 10,000's of patients and 100,000's of results. This illustrates the need for connection compression. The OpenIZ IMS always supported the submission and querying of data in compressed format, with these new schemes, it makes downloading a lot faster on 2g connections!

No comments:

Post a Comment

OpenIZ 2.0 / SanteDB Announcement

Well, it has been quite some time since the last OpenIZ blog post, and that is because we've been quite busy rolling our lessons learned...