Nexus multi-scrum teams mono-repo trunk-based release management and misconceptions.

by Fady S. Ghatas | Jun 5, 2022 | Management

Note: This post is highly opinionated and based on our experience in multiple Nexus projects.

A lot of teams struggle with releases! The usual argument is “the more complex the project gets, the more complex the releases are”. Once you see this symptom, something is almost certainly wrong, and the project will suffer as it scales. Going from a single Scrum team to a Nexus with dozens of developers is an even harder case.

Many processes exist for this problem, but since it covers a very large landscape, from feature ideation and planning to shipping and deploying, we want to give our take on it and on how most release problems can be streamlined. The goal is to keep release complexity linear, so it does not get overly complicated when new teams or products are added to the project.

Our choice has always been Nexus for managing the teams, monorepos for code structure, and trunk-based development as the source control process.

Definitions

First, we will go through the meaning of Nexus, monorepos, and trunk-based development. We encourage you to read their guides for more information, but we will briefly define each.

Nexus

When you have fewer than 9 people working on a single product, Scrum is the way to go, even if they are working in 2 Scrum teams. Once you have more than 9 people working on a single product or on multiple products, Nexus is the way to go (you can use any other Scrum scaling process). Nexus is a framework for managing multiple projects/products. It includes the Scrum events plus cross-team refinement, so that you can detect, eliminate and resolve any cross-team dependency.

Monorepos

A monorepo is simply a single code repository that contains all your code. It can be as simple as a single product’s frontend, backend and everything else belonging to that product, or as complex as your whole company’s code in a single repository.

Trunk-based development

Trunk-based development is one of the more recent and widely adopted source control processes. It governs feature branches, the trunk and releases, as well as all the processes around them.

Misconceptions

The first misconception we usually see in teams concerns the differences between increments, release candidates and releases. We also cover the difference between shippable user stories and releasable features.

Increments

The increment is the set of features currently in your trunk. Whenever you merge more user stories, you end up with a new increment. An increment should always be shippable; being releasable is a different story that requires business involvement. So an increment is the sum of all user stories merged into the trunk so far, ready to become a release candidate.

Release candidates

A release candidate (RC) is the plan, code and user stories of everything that should get into your next release. From the moment business gives the green light that an increment is ready, a new RC is born. It usually gets hosted on a staging (pre-production) environment, where all kinds of testing should be done to ensure you can release it (fully automated testing, manual testing, user acceptance testing on a subset of users, etc.). From a trunk-based and source control standpoint, your increment is always a release candidate, especially if you use tag-based releasing in a high-throughput team, but you can still have a virtual separation of increments and RCs if you follow a branch-based releasing flow.

Releases

Releases are what your main user base has access to. You might have versioned releases or bleeding-edge releases; those are still releases. At any point in time, everyone in the company should understand which releases are currently accessible and which features each release supports. If that is not transparent, the whole thing might fall apart, as the code will represent something totally different from what customer support or account managers are responding to in tickets. Everyone in the company should prepare for a release (even a single small feature release), because from the outside a tech company looks like a black box of releases. Everything else is internal and isn’t exposed to the customers.

Shippable user-story

Any Scrum user story must be shippable on its own. This doesn’t mean releasable, as the functionality can be lacking, but shippable in the sense that it has (if needed) the feature flagging, backward compatibility and migrations that would allow it to be part of an IMMEDIATE release if required. You can follow a lot of different processes to make sure your user stories are always shippable; more on that later. “Shippable user stories” is a technical/product concept that should be totally transparent between the tech team, product owner and business, so that business can later decide on a releasable feature. One or more user stories make a releasable feature.

Releasable feature

A releasable feature is something that an end user can use. It usually consists of a single user story, multiple user stories under a single epic/product, or multiple user stories under multiple epics. Releasing it is a business decision that shouldn’t concern technical teams, because at any point in time a feature can be released without an actual code delivery by using feature flags. Technical teams and product owners only need to be concerned with making sure they work on shippable user stories.

Common problems

All the common problems assume that you have multiple Scrum teams working on multiple products in the same monorepo and following a trunk-based flow.

Why do I need processes at all? Can’t I use common sense?

Common sense does not scale. Every person’s common sense is different, and with a complex structure you need a process followed by everyone, from C-level to developers, so everyone can do their job.

If everything goes to the trunk, how can I select what can go live in the next release?

Feature flags.
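
As a minimal sketch of what this looks like in code, assuming a hypothetical in-memory flag store (the names FLAGS, is_enabled and checkout_total are illustrative and not taken from any specific library), unfinished behaviour can live in the trunk while staying switched off:

```python
# Hypothetical in-memory flag store; a real project would more likely load
# flags from configuration or a feature-flag service.
FLAGS = {"new_discount_engine": False}


def is_enabled(flag_name: str) -> bool:
    """Return True when the flag is switched on; unknown flags default to off."""
    return FLAGS.get(flag_name, False)


def checkout_total(prices: list[float]) -> float:
    """Compute the order total; the new logic is merged but hidden behind a flag."""
    total = sum(prices)
    if is_enabled("new_discount_engine"):
        # New, not yet released behaviour: 10% off orders above 100.
        if total > 100:
            total *= 0.9
    return round(total, 2)
```

With the flag off, the trunk behaves exactly like the current release. Turning the flag on is then the business decision to release, with no new code delivery.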

I get a ton of conflicts when merging to the trunk, how can I fix that?

Make sure your code is modular and structured correctly, with well-defined scopes for common libraries, product libraries, app libraries and apps.
Make sure your developers follow best practices. This keeps changes to code shared between multiple teams minimal, which means fewer conflicts.
Finally, make sure no feature branch lives for more than 2 or 3 days; this way you won’t end up with as many conflicts.

The feature branch doesn’t have all the required features, how can I merge it?

Who said you should? You must wait until it covers all acceptance criteria before you merge it. However, you need to analyse what happened and why it has been alive for so long without merging:

The branch has been alive for more than a week and is not done yet: This means the feature wasn’t properly split into smaller units. In that case, keep the branch open until the feature is ready, and afterwards put an immediate mitigation plan in place to make sure features are split into smaller units (user stories), each of them shippable and small enough to fit into a couple of days with very specific tasks.
The feature changed mid-sprint and now requires more time: This happens a lot. A sprint is protected from major changes, so a Scrum master or team lead should always protect the developers from such mid-sprint changes. If the user stories are shippable and feature flagged, just merge the branch and keep the flag off until a new user story with the changes is introduced to complete a releasable feature approved by business.

If a user story is not done yet, but it ended up in the trunk, what should I do?

This is usually a development team problem, and the ownership of the problem is on the team as a whole. Splitting it into QA/development within the team goes against the nature of Scrum, as the team should be fully self-organized and cross-functional so that it can deliver done stories with complete ownership.

With that being said, this is usually caused by the following:

Developers don’t feel real ownership, so they don’t pay much attention to the acceptance criteria and the definition of done: Make sure you understand why they lack a real sense of ownership. It’s your job as a leader to give them a sense of accomplishment and ownership so they start caring more about bugs. It’s important to find a bug before it hits production. That said, always blame the process, not the team.
Business and developers aren’t on the same page about whether it’s done: Revisit your DoD immediately, make sure everyone is on the same page, and make sure the acceptance criteria are well explained.
Developers rely on business or an external QA workforce to test because they don’t love testing: In that case, test-driven development (TDD) might prove to be better. Developers love coding, so testing through code in unit tests is much more entertaining if done properly. However, a team is not done with a task if it’s not tested correctly according to the DoD.

In any of those cases, make sure to revisit your process and change it accordingly.

If a user story is done but not shippable yet, and it ended up in the trunk, what should I do?

Note: The previous question is about a story that is not done but ended up in the trunk. This question is about a story that is done (from a DoD and acceptance criteria standpoint) but for some reason didn’t pass UAT, or the business didn’t like it, or it doesn’t represent a shippable user story.

Done means the development team did nothing wrong in shipping the user story, as it complied with the definition of done. However, it points to a severe problem in how user stories are written. Every single user story should be shippable on its own, meaning that if it requires feature flags, they must be part of the user story so that it can be released even if the whole feature isn’t done yet. Make sure the product owners are aware of the problem and that they provide user stories with all the needed functionality up front.

If a change is huge and so hard to feature flag, what should I do?

There are a lot of ways to manage this problem, depending on what you want to achieve. Keep in mind that we need the branch to be merged fast without breaking anything, so we will go through some cases and how to handle them without breaking other dependants or rushing into a big change that would break a module that was already working:

Database schemas

If you are introducing a new column or table, this is straightforward and doesn’t need a feature flag, as old code won’t access it and only new features can. If you are changing the data in an existing column, you need to keep both the old and the new logic for handling the data, including any difference in what the column represents, or you can introduce a new column for future versions and mark the old column for deprecation. The best approach we have seen here is being open to introducing views as a way to add versioning to specific tables in the DB, where a view does any required transformation on the fly. But this does not fit every single case.
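
The view-based versioning idea can be sketched with SQLite from the Python standard library. The table, column and view names (products, price_cents, products_v1) are made up for illustration, and the scenario assumes a price that used to be stored as a decimal and is now stored in cents:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- New schema: the price is now stored in cents in a new column.
    CREATE TABLE products (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        price_cents INTEGER NOT NULL
    );
    INSERT INTO products (id, name, price_cents) VALUES (1, 'Widget', 1999);

    -- Versioned view: old code keeps reading a decimal price column,
    -- computed on the fly from the new column.
    CREATE VIEW products_v1 AS
        SELECT id, name, price_cents / 100.0 AS price FROM products;
    """
)

# Old code path reads through the view, new code path reads the table.
old_row = conn.execute("SELECT name, price FROM products_v1 WHERE id = 1").fetchone()
new_row = conn.execute("SELECT name, price_cents FROM products WHERE id = 1").fetchone()
```

Old readers keep working against products_v1 while new code reads the table directly, so both can be deployed from the same trunk.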

Entity Meaning

If there is a change in entity meaning, this might mean the monorepo structure is not done correctly. The meaning of an entity should stay the same from the first day of the project: a car is always a car. It can be extended and inherited from, but its meaning is the same.

If a developer at some point decided to use the same entity code to represent multiple meanings, this is a recipe for disaster. Most probably a better understanding of entity planning is needed, as well as of advanced concepts like OOP and DB polymorphic associations. A shared document describing the meaning of every entity and concept, available to everyone in the company, is always needed.

In any case, the meaning of an entity should never change, but a single field can.

Entity field change

This depends on how you handle entities, but you can usually introduce new virtual entity fields with setters and getters when you change what a field means or represents, so you always keep backward compatibility.
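
A minimal sketch of such a virtual field, assuming a hypothetical Customer entity whose single name field was split into first_name and last_name:

```python
class Customer:
    """Entity whose stored field changed from a single name to two parts."""

    def __init__(self, first_name: str, last_name: str) -> None:
        self.first_name = first_name
        self.last_name = last_name

    @property
    def name(self) -> str:
        """Virtual field kept for older callers that still read name."""
        return f"{self.first_name} {self.last_name}"

    @name.setter
    def name(self, value: str) -> None:
        """Older callers may still write name; split it into the new fields."""
        first, _, last = value.partition(" ")
        self.first_name = first
        self.last_name = last
```

Code written against the old field keeps working unchanged, while new code uses the two new fields directly.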

API/GraphQL parameter in request/response, query/mutation

This depends on a number of things: whether you use REST or GraphQL, as well as your versioning scheme, whether it’s API-level, endpoint-level or field-level versioning.

We usually recommend adding new fields and marking old ones for deprecation, so old clients can still operate and new clients can use the new fields. If you must replace the same field, you have to do some higher-level versioning.

Either way, you don’t usually need a feature flag here, but you can still feature flag a whole endpoint/resolver or a field if needed.
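
As a rough illustration of adding new fields next to a deprecated one, here is a sketch of a response serializer. The field names and the deprecated_fields key are hypothetical and only one possible way to signal deprecation to API consumers:

```python
def serialize_user(user: dict) -> dict:
    """Build an API response that serves old and new clients at once."""
    return {
        "id": user["id"],
        # Deprecated: kept so existing clients do not break.
        "fullname": f'{user["first_name"]} {user["last_name"]}',
        # New fields that newer clients are expected to read instead.
        "first_name": user["first_name"],
        "last_name": user["last_name"],
        # Hypothetical way to advertise the deprecation to API consumers.
        "deprecated_fields": ["fullname"],
    }
```

Once no client reads the deprecated field anymore, it can be removed in a later release.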

Environment variable changes

If it’s a change in a value, do it in place. If it’s a change in the meaning, introduce a new environment variable, leave the old one in place, and support moving between the two via a feature flag that defaults to the old variable so old systems can still operate. Always put a configuration layer on top of environment variables so you have more control and can re-use values in a number of ways without introducing new environment variables unnecessarily.
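
A small sketch of such a configuration layer, assuming a hypothetical timeout whose meaning changed from seconds (REQUEST_TIMEOUT) to milliseconds (REQUEST_TIMEOUT_MS). The variable names and the use_new_timeout flag are illustrative:

```python
import os


def load_config(env=None, use_new_timeout: bool = False) -> dict:
    """Configuration layer that hides raw environment variables from the app."""
    env = os.environ if env is None else env
    # Old meaning: timeout in seconds. New meaning: timeout in milliseconds.
    old_seconds = float(env.get("REQUEST_TIMEOUT", "30"))
    new_millis = env.get("REQUEST_TIMEOUT_MS")
    if use_new_timeout and new_millis is not None:
        timeout_seconds = float(new_millis) / 1000.0
    else:
        # The flag defaults to the old variable so existing deployments keep working.
        timeout_seconds = old_seconds
    return {"timeout_seconds": timeout_seconds}
```

The rest of the application only reads timeout_seconds from the returned configuration, so it never needs to know which environment variable the value came from.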

GUI changes

That’s the easiest thing to feature flag: just introduce your new GUI behind a feature flag and that’s it.

Vendor/Provider/Library replacement

For example, you want to move from Amazon SNS to Twilio, or from a specific 3rd-party API to another that provides the same functionality.

The best way to replace a library or a vendor in a trunk-based flow is branch by abstraction. You don’t have to wait ages before merging, as the new implementation sits behind a feature flag until it’s done, so you can operate with the old or the new vendor whenever needed.
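
Branch by abstraction can be sketched as follows. The class names are hypothetical and the vendor calls are replaced by stand-in strings, since the real SDK calls depend on your vendors:

```python
from abc import ABC, abstractmethod


class Notifier(ABC):
    """Abstraction that the rest of the code depends on."""

    @abstractmethod
    def send(self, recipient: str, message: str) -> str:
        ...


class OldVendorNotifier(Notifier):
    def send(self, recipient: str, message: str) -> str:
        # Stand-in for the call to the current vendor's SDK.
        return f"old-vendor:{recipient}:{message}"


class NewVendorNotifier(Notifier):
    def send(self, recipient: str, message: str) -> str:
        # Stand-in for the new vendor; can be merged while still incomplete.
        return f"new-vendor:{recipient}:{message}"


def get_notifier(use_new_vendor: bool = False) -> Notifier:
    """Pick the implementation behind a feature flag (off by default)."""
    return NewVendorNotifier() if use_new_vendor else OldVendorNotifier()
```

Callers only ever depend on Notifier, so the new vendor can be merged into the trunk in small pieces and switched on when it is ready.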

There is an unplanned release required, what should I do now?

Why is it unplanned? If it’s a security patch or a broken flow, you can hotfix it. If it’s a new feature that was missing from the last release, just do a usual release, either by tagging or by creating a release branch. The idea here is that the team must always ship done stories, so that an unplanned release doesn’t mess everything up and leave you with incomplete features in the release. Again, make sure you use feature flags so you never end up with an increment that can’t be released (blocked) until some other stories are done. That is a huge hit to the team’s throughput.

There is a hotfix required, what should I do now?

You have three options; the first two are recommended by the trunk-based development guide:

Fix on the trunk and tag a new version (if you release by tagging).
Fix on the trunk and cherry-pick to the release branch (if you use release branches).
Fix directly in the release branch if you need the fix in that release only and not in all upcoming releases.

There is a problem in the regression testing of a release candidate, what should I do now?

That’s normal. If the problem is a bug here or there, you can fix it technically in one of two ways:

Fix in the trunk and create a new release branch with a minor version change.
Fix in the trunk and cherry-pick to the same release branch.

However, if the problem is that the business doesn’t like the new features, then there is a much bigger and deeper problem. How did the team get a user story that business doesn’t want? This means you need a big change in your ideation, discovery and grooming process.

The problem might also be a lack of automated testing, meaning that the new features work as expected but break other parts once integrated. Try to introduce more unit and integration testing to protect the trunk from breaking PRs.

There is a problem in an old version, how should I fix and release?

This depends on how you set up the CI/CD for your versioned clients (like apps), but with both tag-based and branch-based releases, just go back to the tag/branch created for the old release, make the changes and release again. It should be straightforward.

Some teams use feature flags to mark versions, as in a specific feature being available only for specific versions. That is not how it works, and feature flags should only be used to

I’m following trunk-based, how can I have beta users?

You must do feature-flag segmentation, where you can do A/B testing based on specific criteria for a subset of users or based on the environment deployed to. This doesn’t require changes to any flows, as everyone accesses the same product but some people see things that others don’t.
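
One possible way to segment users, sketched under the assumption that each user has an id and an optional group, is to always enable the flag for a beta group and to roll it out to a stable percentage of everyone else. The function and field names are illustrative:

```python
import hashlib


def bucket(user_id: str, flag_name: str) -> int:
    """Map a user to a stable bucket from 0 to 99 for a given flag."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled_for(user: dict, flag_name: str, rollout_percent: int,
                   beta_groups: frozenset = frozenset({"beta"})) -> bool:
    """Enable the flag for beta users, plus a percentage of everyone else."""
    if user.get("group") in beta_groups:
        return True
    return bucket(user["id"], flag_name) < rollout_percent
```

Hashing the flag name together with the user id keeps each user in the same bucket across requests, so a given user does not flip between the old and the new experience.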

A team worked on a feature that broke another team’s product, now what?

This depends on why that happened:

If this happened because the team didn’t know the feature needed changes on the other team’s side from a product standpoint, this is a product owner problem, as they should have done a proper dependency analysis.
If this happened because of tightly coupled code, the code is not modular, not properly structured or badly architected. In that case, tech leads need to figure out a better architecture following SOLID design principles.

My frontend scrum team and mobile team are waiting for backend scrum team to finish their work.

This is not Scrum. All of those teams should be a single cross-functional team able to create a shippable user story across all technologies, whatever activities its members do.

Merge all those teams and split them by product or by end-to-end sections of the product, not by knowledge or activity (FE, BE, DevOps, Mobile, QA, etc.). A single developer being part of multiple Scrum teams is fine, but splitting by activity and experience is not Scrum at all, as a single team can’t produce a done story end to end.

I’m trying my best to manage cross team dependencies efficiently, but it just doesn’t work!

Maybe that’s the problem! It’s always recommended to ELIMINATE dependencies instead of managing them. Make elimination your 1st option. This would allow a single team to work on items end to end without waiting for any other team.

By elimination we mean either splitting the work into two separate shippable features with the help of feature flags, where you only turn the flag on once both parts are done so that neither development team is blocked, or giving ownership of the whole feature to a single team.

However, if you still can’t eliminate a dependency, make it a start-to-start dependency (the dependant team starts its task after the other task starts) or a finish-to-finish dependency (the dependant team finishes its task after the other task is finished) instead of a finish-to-start dependency. A finish-to-start dependency is the number one killer of a critical path, because the dependant team needs to wait until the other team finishes its work before it can start.

So, in short:

Eliminate by giving a single team the whole feature
Eliminate by converting the feature into two smaller independent features
Manage the dependency by converting it to a start-to-start or finish-to-finish dependency.
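
To see why the dependency type matters for the critical path, here is a simplified calculation. It assumes two tasks with no lag between them and labels the blocking case, where the dependant team waits for the other task to finish before starting, as finish-to-start, its usual project-management name:

```python
def total_duration(days_a: int, days_b: int, dependency: str) -> int:
    """Days until both tasks are done; team B depends on team A, no lag."""
    if dependency == "finish-to-start":
        # B cannot start until A has finished, so the work is sequential.
        return days_a + days_b
    if dependency in ("start-to-start", "finish-to-finish"):
        # Both tasks can overlap; the longer one determines the total.
        return max(days_a, days_b)
    raise ValueError(f"unknown dependency type: {dependency}")
```

With a 5-day task and a 3-day task, the blocking case takes 8 days, while either of the other two types lets the work overlap and finish in 5.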

With all of that being said, it might be a process problem related to how you do cross-team refinement, not a dependency management problem. In that case, make sure you read this paper thoroughly.

Conclusion

Don’t let scaling become a reason for a miserable life for everyone. Scaling should be a great thing for every startup, and it should not turn the complexity of your releases into exponential pain; complexity should always stay linear.

Nexus teams working in a monorepo with trunk-based development and feature-flag-powered releases get a simplified release process, if done right.