Home / Thoughts, sketches, dev cases

SQL statement, operator WITH

Multi-paradigm SQL query

Sunday, February 5, 2023



I'm not a big fan of "powerful" business logic in stored SQL procedures, first of all because of difficulty with unit test coverage. However leads on my current project stick to different opinion, claiming that the less the API will invoke the SQL server, the more overall productivity the system will have. And benefits of this could be big enough to sacrifice unit testing of such logic.

One day I've got a quite interesting task solving which I've developed the longest SQL-query for all my career. The task was next:

  • web application processes entities called "stakeholder";
  • those stakeholders could be of two types — person or organization;
  • stakeholders could have different relationships, specifically there are two of them:
    • person-to-organization as "employee-employer";
    • organization to person as "employer-employee";
  • every stakeholder has, amongst other, property "sentiment";
  • those sentiments reflect a color - GREEN, AMBER, RED or NOT ENGAGED; the color is determined by some business parameters;
  • sentiment values are stored as char, that is 'G', 'A', 'R', 'N' respectively;
  • person stakehlders get their sentiment value directly from the application user;
  • but the organization stakeholders obtain their sentiment from calculation, relatively to proportional numbers of sentiments from related person stakeholders which does not have NOT ENGAGED value:
    • when 10% or more person stakeholders have RED sentiment, the organization will also have RED;
    • otherwise when 80% or more of person stakehlders have GREEN, the organization will have GREEN;
    • otherwise when there is any person with AMBER, related organization will also have AMBER sentiment;
    • otherwise NOT ENGAGED.

At my first glance, big number of conditions and the need to aggregate on different criteria at once to return single result value, drawing thoughts to implement the algorithm in procedural way on the API side, using bunch of if-then-else's. However, as I mentioned at the beginning, development approach on this project required somewhat else, so to solve this task I mentally left territory of imperative programming and switched to the side of declarative SQL.

Although requirement was to implement the task as a stored procedure, and because of this I was able to implement algorithm in semi-imperative way, like "fetch some data, analyze, make some decision, then again data, analysis, decision and so on", I decided to develop it as SQL-ish as possible, obtaining somewhat new experience of how far declarative approach can go.

Finally yes, I've got what I wanted, with the sngle SQL request, and have it as a personal level up. I like problems solving which the mind pave the way through winding path amongst numerous thoughts and someties unexpected opportunities.

So the SQL request looks like this (MS SQL dialect):

CREATE PROCEDURE [dbo].[UpdateStakeholderOrganisationSentiment] @StakeholderOrganisationId int
    DECLARE @ThresholdRED INT = 10;
    DECLARE @ThresholdGREEN INT = 80;

    DECLARE @SentimentChar as CHAR;

    WITH data(orgId, personId, sentiment)
             AS (SELECT so.StakeholderId as StakeholderOrganisationId,
                        sr.StakeholderId as StakeholderPersonId,
                 FROM StakeholderRelationship sr
                          INNER JOIN StakeholderOrganisation so
                                     ON sr.RelatedStakeholderId = so.StakeholderId
                                         AND sr.StakeholderRelationshipTypeId = 5 -- 'employee' relationship type
                                         AND sr.RelatedStakeholderId = @StakeholderOrganisationId
                          INNER JOIN Stakeholder s
                                     ON S.StakeholderId = sr.StakeholderId
                                         AND s.Sentiment != 'N'),
         percents(S, P) AS (SELECT sentiment, COUNT(sentiment) * 100 / (SELECT COUNT(*) FROM data) as Percentage
                            FROM data
                            GROUP BY sentiment),
         -- according to the current calculation logic the final Organization Sentiment value will have:
         -- R (RED) if number of RED Sentiments in the related Person stakeholders
         --  will be equal or more than @ThresholdRED percent.
         -- otherwise it will be G (GREEN) if number of GREEN Sentiments in the related Person stakeholders
         --  will be equal or more than @ThresholdGREEN percent.
         -- otherwise it will be A (AMBER) if there any AMBER Sentiments in the related Person stakeholders
         -- otherwise it will be N (NOT ENGAGED)
         result (c) AS (SELECT 'R' c
                        FROM percents
                        WHERE S = 'R'
                          AND P >= @ThresholdRED
                        SELECT 'G' c
                        FROM percents
                        WHERE S = 'G'
                          AND P >= @ThresholdGREEN
                        SELECT 'A' c
                        FROM percents
                        WHERE S = 'A'
                        SELECT '0')
    SELECT TOP 1 @SentimentChar = c
    FROM result

    UPDATE dbo.Stakeholder
    SET Sentiment = IIF(@SentimentChar = '0', 'N', @SentimentChar)
    WHERE StakeholderId = @StakeholderOrganisationId

As it can be seen, the procedure has only two statements: calculation for the sentiment value and storing it to the table. And that big query where sentiment value is being calculated became the longest SQL query for me so far :)

From that big WITH operator, apart from been developeing declaratively, I've got feeling of working in functional paradigm, when each part of it, from begin to end, prepares some result which is passed to the next part for further deforming on its way to the final value. This is why the article is named "Multi-paradigm SQL query".

So, first part of the WITH operator selects set of person stakehlders related to the given organization stakeholder with their sentiments. This set is named just 'data'. Second part returns set named 'percents', it contains percentages of different sentiments for their total number from the previous part. And finally last part 'result' returns the sentient value for the organization. To obtain this value I used some trick — I noticed that the reverse order of letters that mark different sentiments matches with priority of their selection. So that if 'R' is there, no need to pull 'G'; if 'G' is there, no need for 'A'. So if sort the resulting sentiment set in reverse order, it will be enough to take the first record to get the proper result. Only 'N' for 'not engaged' outstands this approach, but I worked this around with temporary char '0' which has the least code numberin the character table for letters that are used in this algorithm.

Thanks for reading!


© All Rights Reserved. Designed by HTML Codex