Hi, I was assigned to a new team as a lead two weeks ago, and I'm not yet familiar with the team's existing system. There are many bits and pieces that I find less than ideal.
For example, the existing monitoring and alerting are lacking and noisy at the same time, integration tests are non-existent, and unit test coverage is low, with many important code paths untested.
How should I tackle this tech debt and still deliver business features? I'd be happy to hear your thoughts.
The best advice I can give is to eliminate toil [1]. Something being non-ideal doesn't necessarily mean you need to change it.
Beyond that, the standard exercise of evaluating where you are, where you're going, and how to evolve the system to get there is always necessary.
For monitoring and alerting, aim for actionable alerts first (page on-call when an event needs action, whether it's a known or an unknown one), then warnings second (Slack channel messages, tied to SLOs rather than SLAs). At first it's better to be over-aware than unaware, since that will help your team quantify the situation, so experiment with the thresholds.
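A minimal sketch of that page/warn split in Python might look like this. It assumes you can count errors and requests per evaluation window. The `Route` names and the threshold numbers are hypothetical and don't come from any particular monitoring tool:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Route(Enum):
    PAGE = "page"  # wake up on-call: someone needs to act now
    WARN = "warn"  # post to a chat channel: an SLO may be at risk
    NONE = "none"  # healthy, stay quiet


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical starting values; expect to tune them as you learn the system.
    page_error_rate: float = 0.05  # 5% of requests failing in the window
    warn_error_rate: float = 0.01  # 1% failing: worth a look, not a page


def route_alert(errors: int, requests: int, t: Optional[Thresholds] = None) -> Route:
    """Decide where an error-rate signal for one evaluation window should go."""
    if t is None:
        t = Thresholds()
    if requests == 0:
        # Zero traffic can be an incident too, but that deserves its own
        # check; this sketch only looks at the error rate.
        return Route.NONE
    rate = errors / requests
    if rate >= t.page_error_rate:
        return Route.PAGE
    if rate >= t.warn_error_rate:
        return Route.WARN
    return Route.NONE
```

With these defaults, 60 errors out of 1,000 requests (6%) would page, 20 out of 1,000 (2%) would go to the channel, and 5 out of 1,000 would stay quiet. Keeping the thresholds in one small object makes them cheap to adjust as you learn what normal looks like.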
Testing is always a balance. I'd suggest making sure the critical paths of your system are covered first, and only then striving to do better from there.
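One way to make "cover the critical paths first" concrete is a small check that looks only at the files you've agreed are critical. This is a Python sketch. The file names, the 80% bar, and the shape of the coverage mapping are assumptions; in practice you'd build the mapping from whatever your coverage tool reports:

```python
from __future__ import annotations


def uncovered_critical_paths(
    coverage: dict[str, float],
    critical: list[str],
    minimum: float = 80.0,
) -> list[str]:
    """Return the critical files whose line coverage is below `minimum` percent.

    `coverage` maps file path -> percent covered. A critical file missing
    from the report counts as 0%, since "no data" usually means "no tests".
    Non-critical files are ignored on purpose.
    """
    return [path for path in critical if coverage.get(path, 0.0) < minimum]


# Hypothetical report and critical list, for illustration only.
report = {"billing/charge.py": 42.0, "auth/login.py": 91.5, "utils/format.py": 12.0}
critical = ["billing/charge.py", "auth/login.py", "orders/checkout.py"]
print(uncovered_critical_paths(report, critical))
# ['billing/charge.py', 'orders/checkout.py']
```

In CI, a non-empty result could fail the build. That targets the gaps that matter without demanding high coverage everywhere at once.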
As far as scheduling the tech debt goes, work toward including it in the requirements for future work: you must address X to build Y, etc. Addressing technical debt as a project that leads to no new end result (eliminating toil counts as one) has little, but not zero, value.
> As far as scheduling the tech debt goes, work toward including it in the requirements for future work: you must address X to build Y, etc. Addressing technical debt as a project that leads to no new end result (eliminating toil counts as one) has little, but not zero, value.
This advice is very actionable, I'll try it out, thanks for the tip!
Pick a level of effort to dedicate to the debt, and start with the smaller items before working up to the bigger ones. You often get more bang for the buck from the smaller things, and quick wins can help energize the team.
Try to figure out why the debt is accumulating, and fix the habits that are increasing it. For alerts, keep a checklist for new features so their alerts get reviewed, and treat a feature as incomplete until its alerts are confirmed necessary and proper. What are the criteria for an alert to start and end? Does the alert deactivate automatically?
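The start and end questions can be answered explicitly in code. Below is a Python sketch of an alert with hysteresis. It uses separate thresholds and consecutive-sample counts for firing and for resolving, so the alert deactivates automatically once things recover. The class and parameter names are hypothetical, and your monitoring system may offer its own way to express the same rule:

```python
from dataclasses import dataclass, field


@dataclass
class HysteresisAlert:
    """An alert with explicit, separate start and end criteria.

    Starts after `fire_after` consecutive samples at or above
    `fire_threshold`; resolves on its own after `resolve_after`
    consecutive samples below `resolve_threshold`. A resolve threshold
    below the fire threshold keeps the alert from flapping.
    """

    fire_threshold: float
    resolve_threshold: float
    fire_after: int = 3
    resolve_after: int = 5
    active: bool = field(default=False, init=False)
    _streak: int = field(default=0, init=False, repr=False)

    def __post_init__(self) -> None:
        if self.resolve_threshold > self.fire_threshold:
            raise ValueError("resolve_threshold must not exceed fire_threshold")

    def observe(self, value: float) -> bool:
        """Feed one sample; return whether the alert is active afterwards."""
        if not self.active:
            # Count consecutive breaching samples; any healthy sample resets.
            self._streak = self._streak + 1 if value >= self.fire_threshold else 0
            if self._streak >= self.fire_after:
                self.active, self._streak = True, 0
        else:
            # Count consecutive recovered samples; any sample at or above
            # the resolve threshold resets the count.
            self._streak = self._streak + 1 if value < self.resolve_threshold else 0
            if self._streak >= self.resolve_after:
                self.active, self._streak = False, 0
        return self.active
```

For example, with `fire_threshold=0.05`, `resolve_threshold=0.02`, and both counts set to 2, the samples 0.06, 0.01, 0.06, 0.07, 0.03, 0.01, 0.01 fire on the fourth sample and resolve on the seventh. The 0.03 reading keeps the alert active because it hasn't dropped below the resolve threshold.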
For tests: how is a feature being merged without adequate tests?
You may need to reset your leadership's expectations about how long things take, since you're no longer cutting corners and are also making up for corners that were cut previously.
If you're not familiar with the system, why are you insistent on changing it? And since you're new to the team, what's the rush to tell everyone their work is poor? What would be your reaction if you were in their place? How do you think your approach will play out in the long term?
Sorry if I came across as critical of the team. I'm not, and I'm impressed with how much they achieved given the time frame. In fact, the whole team agrees on most of the issues, but I'm not entirely sure what to prioritize or how to balance it with regular feature work.