[DRAFT] Conditional expansion and SQL column compression

2022-03-05 05:26:31

Migrating tables is never fun

As part of an ongoing migration from an on-premise solution to a cloud-hosted service layer, I have found myself in the middle of an Agile development cycle where migrating the old to the new means dismantling the temple and finally looking at the idols.  Like many engineering marvels of old, the road to Hell was and remains paved with good intentions and it is time to for a healthy demolition session.  The table in question utilizes five columns for value holders and two columns type discriminators to represent a handful of types and value arrays of no more than three entries.  While this is fine and even reasonable for the backbone of a report or a dashboard view to easily isolate values by type, the problem arises in enforcing business rules within the table's own technology and allowing new business cases for both types and value combinations.  The client presented this case of design paralysis some time ago and through compromises on all sides a proposed new schema was determined to be roughly—modified with some light pseudo-SQL for the sake of hiding the client's details:

RULE_TYPE varchar(64) NOT NULL CHECK (RULE_TYPE IN ('TYPE_1', 'TYPE_2', 'TYPE_3', ... 'TYPE_N', 'INVALID')),
RULE_VALUE_1 varchar(255) NOT NULL DEFAULT '',
RULE_VALUE_2 varchar(255) NOT NULL DEFAULT '',
RULE_VALUE_3 varchar(255) NOT NULL DEFAULT '',

The above CREATE statement hints at some of the business behavior that is being enforced in the new schema.  The rule types RULE_TYPE are enforced within the table's constraints and are far more manageable as a single list.  While some arguments could be made for the value of reducing the types into the two separate factors required to derive a rule record's type, several problems stemmed from invalid combinations of the history two column solution.  Not only were only a small number of combinations of the old column set valid, for example "A" and "B" may have been a valid type column pair but "A" and "D" was invalid, but the users—business analysts and contract managers—did not see these internal values.  Human readable names were derived in the service layer based on the data access layer's response and a vast majority of these relied on assumptions about the data and unfortunately the occasional guess as to what to type to identify a rule.  In the process of trying to fix the data, more problems had arisen historical as the database administration team was not knowledgeable of the business rules obscured in the table schema and rule type pairs, their associated value holders, and the risk of invalid combinations that, as often found in legacy software, were not captured in any form of testing or quality assurance.  The move to capturing types in a single column and ensuring that there can be no more than three values set for a rule which opens up simplified check constraints for these values and rule types seems logical to all parties involved currently and the challenge now stands bare.  While reducing two columns into a single column is a simple multiplexer presented via a CASE expression, reducing five columns into three columns while preventing overlapping values is a challenge to compact into a single query to avoid complex procedures and possible data inconsistencies during tight maintenance windows for a company that is always available across numerous time zones.

Boolean Logic and Case Multiplexing

TODO!!! :D

-- Devin

Tagged with:

draftlong postsqlwork

Edit 🔒