2019/09 - How to (begin) debug(ging) a back-end problem

Hey folks!


Today we'll be going through how to debug a back-end bug. Depending on the health of your codebase and your company's processes, a lot of this might not be necessary. Indeed, you might have caused the bug yourself with a new feature and know how to fix it straight away (or know the culprit and be preparing your condemnation already). This guide isn't for those kinds of bugs. It's for bugs where you're in a tougher situation: maybe you don't have experienced team members who know the codebase well, or you can't reach them, or your own access is limited. If that's the case and you're dealing with lots of legacy code, then this guide should prove more useful.


Pre-step: Be SURE to pay attention to the data that the reporter was using, as they could be using nonsensical data that would never be used in the real world. I know it's not ideal, but sometimes projects become so huge that reported "bugs" are really just features that should have been implemented, but never were. Weird data (like testing a feature with dates for records that could not possibly exist) can bring these bugs up, though it's not always obvious why someone would test that way if the (bad) health of the codebase is common knowledge. Just remember that everyone makes mistakes, and as an old manager of mine used to say, "Never attribute to malice what can be attributed to stupidity", so don't feel bad when it's your turn to do something silly too. The reporter might not have done their due diligence with regard to exactly why they're using that particular data to replicate the bug, which is why the second pre-step should be to set up a call with the reporter and ask them to replicate it right in front of you. I actually had someone who knew the codebase much better than me (several years' experience to my meagre few months at the time) slow his talk-through of a bug to a crawl as he realised that he hadn't actually found a bug and was just misusing valid functionality (it can happen to anyone!).


1. Check the logs ASAP! If you're getting an exception, that's great! You can search your project in your IDE for the exception/error message (many frameworks these days have dedicated files for error codes), or use Agent Ransack to search for the message across multiple projects if you think the bug might not originate from your current project. The logs will hopefully give you the exact classes where the error is thrown, which is much faster than setting up your local machine for testing, debugging remotely, etc. Even if the logs seem to point at unrelated classes and don't look useful, keep them in mind. They could be useful in the next step. Time is a factor when fixing bugs: they can take weeks to fix, or substantially less, depending on how quickly you can get to the crucial information you need (the exact cause and the scenario that allows the defect to occur). Logs are often wiped from servers quite regularly too, which is why I stress the speed with which you check them. The longest I've seen them kept on a server that's actually hosting an application is probably a bit longer than a month or two. The shortest was one week, though that involved a kind of archiving process for said logs. If you can't replicate the bug, you'll have to rely on the logs from whenever the original reporter experienced it. And if they're already gone, and you can't replicate it from the reporter's Jira steps... That's more of your time eaten up!
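
If you don't have a search tool to hand, a few lines of Python can do a rough version of the "search for the message across multiple projects" trick. This is only a minimal sketch: the function name find_message, the list of file suffixes and the example message are all my own assumptions, so adjust them to whatever your codebase actually contains.

```python
from pathlib import Path


def find_message(roots, message, suffixes=(".java", ".py", ".properties", ".log")):
    """Return (path, line_number, line) for every line containing `message`.

    `roots` is a list of project directories to search. The suffix list is
    an assumption; change it to match the file types in your own projects.
    """
    hits = []
    for root in roots:
        for path in Path(root).rglob("*"):
            if not path.is_file() or path.suffix not in suffixes:
                continue
            try:
                text = path.read_text(encoding="utf-8", errors="replace")
            except OSError:
                continue  # unreadable file: skip it rather than abort the search
            for number, line in enumerate(text.splitlines(), start=1):
                if message in line:
                    hits.append((str(path), number, line.strip()))
    return hits
```

Calling something like find_message(["projects/billing", "projects/orders"], "Order total must not be negative") (hypothetical paths and message) gives you every file and line number where that text appears, which is usually enough to find the class that throws it.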


2. Find the environment that the bug was reported on and replicate it. If your company doesn't have test servers, this could be more difficult, as whoever reported the bug may have a different configuration to you. Keep an eye on the classes seen in the previous step too, as they can lead you to the 'problem area' of the codebase if you can't immediately reach the people who can help you (due to timezone differences, illness, etc.). Depending on your company, you might have access to replication steps, but if it's not possible to replicate the bug from them, contact whoever raised it and have them replicate it in front of you (screenshare on Skype, Teams, even a WhatsApp video call if you have to. I've used Facebook to call people across the Atlantic when firewall rules got in the way, so don't feel weird about using personal communication software to get to the info you need). If they can't replicate it either, it could be a non-issue extreme edge case, or you may need to figure out what changed between them finding/reporting the bug and you working on the replication. Sometimes there are much better ways to spend your time than fixing a minor visual IE11 bug that only appears at exactly 00:54:03 every seventh Saturday of the quarter. Your manager will likely have much more important work for you, so let them know! Though it's worth mentioning that if a client reports such a bug and definitely wants it fixed for the next work package, it will be given more priority than you think it deserves (and you'll have to treat it accordingly).


3. Debug. Hopefully you can remotely debug on whatever environment the bug was reported on. If not, do everything you can to reproduce the conditions, configuration, data setup and anything else you can think of on your local machine, then try to replicate the bug there. If you can't replicate it despite having the exact same configuration as the reporter, then you're either going to have to leave the bug in the backlog (terrible idea, but you could if you were desperate and the bug wasn't a show stopper) or use the reporter's laptop to debug the issue. I would honestly recommend exchanging laptops temporarily if you choose the latter. Sometimes a bug really just HAS to be fixed! If you still can't replicate it after that, then it could have just been a one-time glitch (a "ghost in the machine").
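
When you're trying to match the reporter's configuration, it helps to compare the two setups mechanically rather than by eye. Here's a rough sketch, assuming you can get both configurations into plain dictionaries with string keys (from properties files, environment variables, whatever you have). The name diff_config and the "<missing>" marker are made up for this example.

```python
def diff_config(reporter, local):
    """Return {key: (reporter_value, local_value)} for every key that differs.

    Keys present on only one side are reported with the placeholder
    "<missing>" on the other side. Both arguments are assumed to be flat
    dictionaries with string keys.
    """
    missing = object()  # sentinel, so a real value of None is still compared
    differences = {}
    for key in sorted(set(reporter) | set(local)):
        theirs = reporter.get(key, missing)
        mine = local.get(key, missing)
        if theirs != mine:
            differences[key] = (
                "<missing>" if theirs is missing else theirs,
                "<missing>" if mine is missing else mine,
            )
    return differences
```

For example, diff_config({"timezone": "UTC", "locale": "en_GB"}, {"timezone": "Europe/Dublin", "locale": "en_GB", "debug": True}) returns {"debug": ("<missing>", True), "timezone": ("UTC", "Europe/Dublin")}, which tells you exactly which settings to line up before you try to replicate again.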


4. (Sometimes optional) Write your test(s) to make sure this doesn't happen again. You can't really test visual bugs most of the time (unless your employer loves Selenium tests written by random software engineers and not QA... or your company is small enough and the bug is problematic enough to allow you to do so). A test is also the fastest way I've found to repeatedly check your bug fix, and sometimes the fastest way to replicate a bug if you know the codebase well enough. Always replicate it the same way that the reporter did at least a few times though!
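
To make that last step concrete, here's roughly what such a regression test might look like using Python's built-in unittest module. Everything about the bug here is invented for illustration: average_order_value is a hypothetical function, and the "bug" is that it used to crash when given no orders. The idea is simply to pin down the exact input the reporter used, plus one normal case so the fix doesn't break existing behaviour.

```python
import unittest


def average_order_value(orders):
    """Hypothetical function under test; the fix is the empty-list guard."""
    if not orders:  # the (hypothetical) bug: this guard used to be missing
        return 0.0
    return sum(orders) / len(orders)


class AverageOrderValueRegressionTest(unittest.TestCase):
    def test_no_orders_returns_zero(self):
        # Same input the reporter used when the defect appeared.
        self.assertEqual(average_order_value([]), 0.0)

    def test_normal_orders_still_average(self):
        # Guard against the fix changing the everyday behaviour.
        self.assertEqual(average_order_value([10.0, 20.0]), 15.0)
```

Save it in a test file and run it with python -m unittest. Write the test before you apply the fix if you can: watching it fail first is good evidence that it really reproduces the reported bug.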



I've followed these steps pretty rigorously in the past and they've served me well. Time is truly of the essence when fixing bugs, because it's very easy to get stuck in the mud when you're dealing with a lot of complexity and a problem that potentially spans several domains. If you just remember these steps and do them FAST, you'll find your debugging life much simpler and more easygoing.


Good luck!