What are the best methods to debug and fix a bug in production?

Software Engineer at Government
7 months ago

These are tremendously tricky to deal with, so what are your best strategies to navigate these pesky bugs?

Assume the bug only occurs in the highest-level production environment.

Discussion

(6 comments)
  • 3
    Tech Lead/Manager at Meta, Pinterest, Kosei
    7 months ago

    How long does it take to deploy code, and what's the cost of the bug in production?

    If the deployment speed is very fast and the bug is not very severe, one option is to just make (educated) guesses with blind fixes, e.g. guard different method calls or add checks.
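
A minimal sketch of what such a blind fix could look like, assuming a Python codebase. The names `guarded` and `display_name` are hypothetical and only illustrate the two ideas mentioned: wrapping a suspect call so it cannot crash the flow, and adding an explicit check on suspicious input.

```python
from typing import Any, Callable, Optional


def guarded(
    fn: Callable[[], Any],
    default: Any = None,
    on_error: Optional[Callable[[Exception], None]] = None,
) -> Any:
    """Run fn; on any exception, report it and return default instead of crashing."""
    try:
        return fn()
    except Exception as exc:  # deliberately broad: this is a stopgap, not a root-cause fix
        if on_error is not None:
            on_error(exc)
        return default


def display_name(profile: Optional[dict]) -> str:
    """Added check: production data may have a missing profile or an empty name."""
    if not profile or not profile.get("name"):
        return "Unknown user"
    return profile["name"].strip()
```

Passing an `on_error` callback (for example `errors.append` or a logger call) keeps the guess from silently hiding the failure, so the guard still tells you whether it ever fires.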

    The more methodical fix is to add logging and instrumentation so you can see where the bug is coming from. Use something like PostHog or a log collection tool to figure out what the errors are.
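
As a rough illustration of the instrumentation idea, here is a sketch using only Python's standard `logging` module. The `instrumented` helper and the field names in the log record are assumptions made for this example, not part of PostHog or any specific log collection tool.

```python
import json
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("app.instrumentation")


@contextmanager
def instrumented(step: str, **context):
    """Log the outcome and duration of one step as a JSON line; failures are re-raised."""
    started = time.monotonic()
    record = {"step": step, **context}
    try:
        yield record
    except Exception as exc:
        record.update(status="error", error_type=type(exc).__name__, error=str(exc))
        raise
    else:
        record["status"] = "ok"
    finally:
        record["duration_ms"] = round((time.monotonic() - started) * 1000, 1)
        level = logging.ERROR if record.get("status") == "error" else logging.INFO
        logger.log(level, json.dumps(record, default=str))
```

Wrapping each suspicious call as `with instrumented("sign_in", provider="google"): ...` produces one searchable line per attempt, which makes it easier to see which step fails and under what context. Call `logging.basicConfig(level=logging.INFO)` once at startup to see the successful steps as well as the errors.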

    See this case study from Meta: https://www.jointaro.com/lesson/xAVw3j6fAB1GR9LUnq8n/meta-case-study-debugging-a-massive-production-issue/

  • 2
    Tech Lead @ Robinhood, Meta, Course Hero
    7 months ago

    Here's my overall process to fix any bug:

    1. Understand the end-to-end flow (break it down into steps)
    2. Go through each step and see if it works/breaks
    3. Once you find the breaking step, analyze the code and find the fix

    If you can't find the exact breaking step, you probably need to decompose the flow further into sub-steps.
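
The process above can be sketched in code, assuming the flow can be expressed as a chain of functions. Everything here (`first_failing_step`, the three toy steps, and `FLOW`) is hypothetical and only illustrates the idea of walking the steps in order until one breaks.

```python
from typing import Any, Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[Any], Any]]


def first_failing_step(steps: List[Step], initial: Any) -> Optional[Tuple[str, Exception]]:
    """Feed each step's output into the next; return the first step that raises, or None."""
    value = initial
    for name, fn in steps:
        try:
            value = fn(value)
        except Exception as exc:
            return name, exc
    return None


def parse_request(raw: str) -> dict:
    return dict(pair.split("=", 1) for pair in raw.split("&"))


def read_token(params: dict) -> str:
    return params["token"]  # raises KeyError if the token is missing


def validate_token(token: str) -> str:
    if len(token) < 8:
        raise ValueError("token too short")
    return token


FLOW: List[Step] = [
    ("parse_request", parse_request),
    ("read_token", read_token),
    ("validate_token", validate_token),
]
```

Calling `first_failing_step(FLOW, "user=ada")` points at `read_token`. If the reported step is still too coarse to explain the bug, split that step into smaller functions and run the same check again.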

    In other words, follow the advice from here: [Masterclass] How To Become A Debugging Master And Fix Issues Faster

    The very tricky part is when you have limited observability, so it's hard to figure out if a particular step is breaking or even what the exact steps are. In that case, I recommend sharing more exact context when you ask for support in Taro, and we can creatively jam on some ideas 😊

    • 3
      Friendly Tarodactyl
      Taro Community
      7 months ago

      I can vouch for the limited observability part. Mobile app problems are sometimes much harder to fix than backend ones. We can't SSH into a user's phone, whereas we can easily get root access to a backend server. A mobile app also runs against different libraries on different phones, whereas we have 100% control over the backend.

    • 1
      Tech Lead @ Robinhood, Meta, Course Hero
      7 months ago

      Mobile issues can indeed be very gnarly, especially on Android. If you have a more global app (like Instagram where I spent the biggest chunk of my career), there are going to be a lot of users on janky old phones that are on an ancient version of the Android OS and have a weird screen size. Pain.

    • 0
      Software Engineer [OP]
      Government
      7 months ago

      This particular issue is:

      • Intermittent
      • On Mobile
      • It can only be tested when built to the phone (no F5 debugging)
      • Installing from the Google Play Store vs. building directly to the phone results in different failure rates

      So a very tricky issue.

      I might have fixed it now, but along the way I ended up implementing:

      • A logging service that sends errors and stack traces to a local API and displays toasts on the device
      • Code push to deploy the app to the Play Store more quickly
      • PostHog
      • Firebase Crashlytics (this was a game changer, as the error was happening in an abstraction layer inside a Firebase auth package and couldn't easily be observed)
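
A language-neutral sketch of the first item (sending errors and stack traces to a local API), written in Python for brevity. The endpoint URL, the payload fields, and both function names are assumptions for illustration; the actual service in the app is not shown in this thread.

```python
import json
import traceback
import urllib.request
from datetime import datetime, timezone

LOCAL_API_URL = "http://localhost:8000/errors"  # hypothetical endpoint


def build_error_report(exc: BaseException, context: dict) -> dict:
    """Collect the error type, message, full stack trace and caller-supplied context."""
    return {
        "type": type(exc).__name__,
        "message": str(exc),
        "stack": "".join(traceback.format_exception(type(exc), exc, exc.__traceback__)),
        "context": context,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def send_error_report(report: dict, url: str = LOCAL_API_URL, timeout: float = 3.0) -> bool:
    """POST the report as JSON; return False instead of raising if delivery fails."""
    request = urllib.request.Request(
        url,
        data=json.dumps(report, default=str).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:  # covers connection errors, timeouts and HTTP error responses
        return False
```

The reporter deliberately never raises, so a failed delivery cannot mask the original error it was trying to report.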
  • 1
    Software Engineer [OP]
    Government
    6 months ago

    Follow up: FIXED

    The steps I followed:

    Google sign-in wasn't working, and the package would only return a vague error.

    • After implementing custom error logging and forking the package, I got an error associated with incorrect keystores.
    • I implemented code push and tested in different environments.
    • I then migrated everything from Firebase to Supabase.
    • I tried recreating it as a single project.
    • It always worked when built directly to the phone but never when downloaded through the store.
    • I tried every AI tool I could find.
    • I tried implementing Facebook login; in prod, Facebook sign-in gave me an error about key hashes not matching.
    • One random Stack Overflow post mentioned Google Play signing.
    • I was using Google Play automatic signing, so the app on Google Play was signed with a different keystore.
    • I added that keystore's SHA-1 and it worked.
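
For readers hitting the same mismatch, here is a small sketch of how the two values in this story relate, assuming the usual conventions: a SHA-1 certificate fingerprint is the colon-separated hex digest of the DER-encoded signing certificate, and a Facebook key hash is the Base64 encoding of those same 20 raw bytes. The function names are hypothetical, and obtaining the certificate itself is out of scope here.

```python
import base64
import hashlib


def sha1_fingerprint(cert_der: bytes) -> str:
    """Colon-separated, upper-case SHA-1 fingerprint of a DER-encoded certificate."""
    digest = hashlib.sha1(cert_der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def key_hash_from_fingerprint(fingerprint: str) -> str:
    """Convert a hex SHA-1 fingerprint into the Base64 form of the same 20 bytes."""
    raw = bytes.fromhex(fingerprint.replace(":", ""))
    if len(raw) != 20:
        raise ValueError("a SHA-1 fingerprint must be exactly 20 bytes")
    return base64.b64encode(raw).decode("ascii")
```

Because both values derive from the signing certificate, a build signed locally and a build re-signed by the store produce different fingerprints and key hashes, which is consistent with sign-in working on direct builds and failing on store downloads.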