Infinite iterator

Power of iterators

Thursday, November 9, 2023

When a big project is being developed, just as when a big house is being built, it is inevitable that some needs arise that do not relate to the project itself: assembling and disassembling scaffolding around the building, laying rails for the cranes, setting up storage for materials, a place for personnel, and so on. Something like this happened to me in one of my past projects, a fairly big web API. At some point we needed to give the QA team the ability to test the API on a large amount of fake data. To address this requirement we had to create a custom tool to fill the database with thousands of records.

The main problem that pushed us toward a custom tool was the need to maintain proper relationships between different kinds of fake data. It is quite easy to fill a single table with data using tools like Bogus or AutoFixture. However, for the data to be consistent within a particular business domain, generated parent and child entities stored in different tables must be linked by proper ids. An additional requirement from the QA team was the ability to add data incrementally. For example, given parent entities with ids 1, 2 and 3, a test scenario might call for a large number of children for parents 1 and 2 while leaving 3 intact. Later they might want to add children only for parent 3 without touching the existing data for 1 and 2.

The API was built for the land management domain, so it included entities like land titles, land owners, stakeholders, countries, regions and business projects on the land, along with all sorts of communication events between them and tracking of project statuses depending on agreements and permissions obtained from owners, businesses and the management team. Many of the low-level entities were linked to 3-4 parent entities, and this consistency had to be maintained across thousands of fake records whose parent ids, in different combinations, could either be generated on the fly or be taken from entities already stored in the database.

A request could look like this: "I need to generate 1000 communication events of type 1, 2, 3 for any stakeholder that belongs to project 42 and is linked to existing land titles 50..80. Also I want each of them to be from country 10 or 11". Here we presume that the database already contains the necessary event types, projects, land titles and countries. The task is to generate partially fake communication event records that contain real ids, so that they establish valid relations with existing data.

After some initial thinking I came to the idea that this generator tool needed to be built as a two-layer application. Its core would be a small framework that analyzes the input parameters and provides infinite abstract collections of proper ids to the upper layer. The upper layer consumes those collections to generate fake field values for the required entity without caring where the ids actually came from. With this separation of concerns, other team members would later be able to create new generators for data entities that were not even implemented at that time, without digging into the machinery of the core implementation (which I was writing myself).

My thinking was that the consumer of the generator's core API should not be concerned with obtaining the collection, with randomization to pick an id from it, or even with knowing how many elements it holds. The consumer just makes a call like .GetNextId() on the provided collection, with a promise from the API that the returned element is within the required range or list of numbers and actually exists in the database. Whether 10 or 10000 elements need to be generated, the consumer calls .GetNextId() as many times as required, and the returned numbers are randomly distributed across the given range.


for (var i = 0; i < count; i++)
{
    var projectId    = (await _projectIdProvider   .FromIds()).GetNextId();
    var statusId     = (await _statusIdProvider    .FromIds()).GetNextId();
    var salutationId = (await _salutationIdProvider.FromIds()).GetNextId();
    var titleId      = (await _titleIdProvider     .FromIds()).GetNextId();
    var countryId    = (await _countryIdProvider   .FromIds()).GetNextId();

    // ...........

    // Bogus stuff for individual data fields of the generated record
}

So the essence of that infinite abstract collection was the thing called an "iterator", or enumerator in C#. The idea of the custom enumerator was that its .MoveNext() method, consumed by the upper-level generator, never reaches the end as long as there is at least one id to pick from. So it does not matter whether we need to generate 1000 fake entities linked to 10 existing keys or 10 fake entities linked to a range of 1000 existing keys: the writer of the generator just places (await _provider.FromIds()).GetNextId() in a loop with as many iterations as needed.

The FromIds() method builds the initial collection of ids depending on whether ids for that entity type were given or omitted in the parameters of the app call.


    public async Task<IEnumerator<int>> FromIds() =>
        (_input.ToList() switch
        {
            // if not specified in params, get all the entity ids from db
            [] => (await _repository.GetStakeholdersAsync())
                .Data.Value.Select(x => x.StakeholderId).Randomize(),

            // if specified, check provided ids for existence and use them
            [..] ids => (await _repository.GetExistingStakeholderIdsAsync(ids))
                .Data.Select(x => x).Randomize()
        });
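
The two branches are easier to follow without the database in the way. Below is a minimal synchronous sketch of the same decision, assuming an in-memory array in place of the repository. The names IdSourceSketch, ExistingIds and ResolveIds are made up for illustration and are not part of the project.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for the repository; not the project's real API.
public static class IdSourceSketch
{
    // Pretend these ids already exist in the database.
    private static readonly int[] ExistingIds = { 50, 51, 52, 53 };

    // Empty request: use every existing id.
    // Otherwise: keep only the requested ids that actually exist.
    public static List<int> ResolveIds(IEnumerable<int> requested) =>
        requested.ToList() switch
        {
            { Count: 0 } => ExistingIds.ToList(),
            var ids => ids.Intersect(ExistingIds).ToList()
        };
}
```

With this sketch, ResolveIds(new int[0]) returns all four existing ids, while ResolveIds(new[] { 51, 99 }) returns only 51, because 99 does not exist in the pretend database.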

The actual infinite iterator is implemented in the Randomize() method.


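using System;
using System.Collections.Generic;
using System.Linq;

public static class CollectionRandomizer
{
    public static IEnumerator<int> Randomize(this IEnumerable<int> input)
    {
        var random = new Random();
        var list = input.ToList();
        if (list.Count == 0) yield break;                           // Edge case: empty input collection, the iterator ends immediately
        while (true) yield return list[random.Next(0, list.Count)]; // Upper bound is exclusive, so every index is valid
    }

    public static int GetNextId(this IEnumerator<int> input)
    {
        // MoveNext() returns false only for an empty input collection; fail fast in that case
        if (!input.MoveNext())
            throw new InvalidOperationException("No ids available to pick from.");

        return input.Current;
    }

    // When a generator needs to get several ids at once
    public static IEnumerable<int> GetNextIds(this IEnumerator<int> input, int count)
    {
        // Check the counter first so that no extra id is consumed after the last one
        while (count-- > 0 && input.MoveNext()) yield return input.Current;
    }
}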

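This is how the infinity of the abstract collection of ids is provided: as long as the input list is not empty, every MoveNext() call succeeds and yields another randomly picked id.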
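
To see the effect in isolation, here is a small self-contained sketch that does not touch the database. It repeats the randomizing iterator in simplified form and adds a helper that draws a fixed number of ids. The names InfiniteIdsSketch, Draw and Run, as well as the sample ids, are assumptions made for this illustration and are not taken from the project.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-alone sketch; names are illustrative, not the project's code.
public static class InfiniteIdsSketch
{
    // Endless random picks from a fixed set of ids; ends at once if the set is empty.
    public static IEnumerator<int> Randomize(this IEnumerable<int> input)
    {
        var random = new Random();
        var list = input.ToList();
        if (list.Count == 0) yield break;
        while (true) yield return list[random.Next(list.Count)];
    }

    // Draws exactly `count` ids, or fewer if the source ends (empty input).
    public static List<int> Draw(this IEnumerator<int> source, int count)
    {
        var result = new List<int>(count);
        while (count-- > 0 && source.MoveNext()) result.Add(source.Current);
        return result;
    }

    public static void Run()
    {
        // 1000 draws from only three existing ids: the iterator never runs out.
        var manyDraws = new[] { 10, 11, 12 }.Randomize().Draw(1000);
        Console.WriteLine(manyDraws.Count);                              // 1000
        Console.WriteLine(manyDraws.All(id => id >= 10 && id <= 12));    // True

        // 10 draws from a range of 1000 ids work the same way.
        var fewDraws = Enumerable.Range(1, 1000).Randomize().Draw(10);
        Console.WriteLine(fewDraws.Count);                               // 10
    }
}
```

The consumer code is identical in both cases. Only the size of the source list differs, and that is hidden behind the iterator.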

© theyur.dev. All Rights Reserved.