How to debug an API faster?

Machine Learning Engineer at Taro Community
10 months ago

I am currently working on an ETL (extract, transform, load) script.

The transform step takes a minute or two to run, and I am having issues with the load step, where I need to update a DB with the transformed data.

Pain point: because I have to wait a minute or two for the transform output on every run, most of my time is spent waiting for code to run, which really slows down my coding velocity and feedback loop.


Discussion

(2 comments)
  • 1
    Tech Lead @ Robinhood, Meta, Course Hero
    9 months ago

    What kind of mocking infrastructure do you have? If you know that the "Load" part is broken, just isolate that. Instead of waiting for "Extract" and "Transform" every time, try to plug some hard-coded data into the "Load" part and see how it breaks.
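
    As a rough sketch of feeding hard-coded data straight into the load step: everything below is assumed for illustration, including the `load` function, the `users` table, and the sample rows. It uses an in-memory SQLite database as a stand-in for the real DB, so the real driver and schema will differ.

```python
import sqlite3

# Hypothetical stand-in for the real "load" step: insert or overwrite rows by id.
def load(conn, rows):
    conn.executemany(
        "INSERT OR REPLACE INTO users (id, name) VALUES (:id, :name)",
        rows,
    )
    conn.commit()

# Hard-coded rows shaped like the transform output (made up for illustration).
FAKE_TRANSFORMED = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Grace"},
    {"id": 1, "name": "Ada L."},  # repeated key exercises the update path
]

def run_load_in_isolation():
    # Throwaway in-memory DB, so extract and transform never have to run.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    load(conn, FAKE_TRANSFORMED)
    return conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()

if __name__ == "__main__":
    print(run_load_in_isolation())
```

    Each run of this takes milliseconds rather than minutes, which is the whole point of isolating the load step.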

    If you can't mock it, then try to shrink the input. If the "Load" part is really buggy, it will probably break both when you have 1 million rows of data and when you have just 1,000 rows of data. Slash your data set down by 90% or 99% and see what happens.
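
    One possible way to shrink the input is a seeded random sample, sketched below. The `shrink` helper and its parameters are hypothetical, and it assumes the rows fit in memory as a list. The fixed seed keeps the sample identical between runs, so a failure can be reproduced.

```python
import random

def shrink(rows, fraction=0.01, seed=0):
    """Return a reproducible random sample holding roughly `fraction` of the rows."""
    rows = list(rows)
    if not rows:
        return []
    # Always keep at least one row so the load step has something to process.
    k = max(1, int(len(rows) * fraction))
    return random.Random(seed).sample(rows, k)

if __name__ == "__main__":
    data = list(range(100_000))
    print(len(shrink(data, fraction=0.01)))
```

    If the bug only shows up for a few unusual rows, a small sample may miss them, so it can be worth trying a couple of different seeds or fractions.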

    At a high-level, the process for debugging is like this:

    1. Figure out the end-to-end flow (in your case, this is easy as you're doing ETL)
    2. Test every step of the end-to-end flow until you figure out which step is broken (for you, it's L)
    3. Isolate that step so it's ideally the only thing that runs
    4. Add a ton of print statements and debugger hooks to that isolated step until you figure out the exact piece that is broken
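
    Step 4 might look something like the sketch below, which loads one row at a time, logs each attempt, and records which rows fail. The `events` table, the `load_row_by_row` function, and the `debug` flag are assumptions made up for this example, and SQLite stands in for the real DB.

```python
import logging
import sqlite3

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("load_debug")

def load_row_by_row(conn, rows, debug=False):
    """Insert rows one at a time and collect (index, row, error) for failures."""
    failures = []
    for i, row in enumerate(rows):
        log.debug("loading row %d: %r", i, row)
        try:
            conn.execute("INSERT INTO events (id, payload) VALUES (?, ?)", row)
        except sqlite3.Error as exc:
            log.error("row %d failed: %s", i, exc)
            failures.append((i, row, exc))
            if debug:
                breakpoint()  # drop into pdb with the failing row in scope
    conn.commit()
    return failures

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT NOT NULL)")
    bad = load_row_by_row(conn, [(1, "a"), (1, "dup"), (2, None), (3, "c")])
    print([index for index, _row, _exc in bad])
```

    Row-by-row loading is slower than a bulk insert, so this is meant for debugging only; once the failing rows are known, the normal bulk path can go back in.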

    Our debugging masterclass covers this far more in-depth. Check it out: [Masterclass] How To Become A Debugging Master And Fix Issues Faster

  • 1
    Eng @ Taro
    9 months ago

    I agree with breaking down the process into granular pieces and focusing on fixing the relevant, breaking portion.

    Here are some things you can try:

    • Save the transformed data to an output file and use that file as the input to the load step
    • Split the data from that file into small pages and run the load step on one page at a time (similar to what Alex is saying)
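
    Both suggestions can be sketched roughly as follows. The cache file name and the `cached_transform` and `pages` helpers are hypothetical, and the sketch assumes the transformed rows are JSON-serializable (another format such as pickle or Parquet may suit the real data better).

```python
import json
from pathlib import Path

CACHE = Path("transformed_cache.json")  # hypothetical cache location

def cached_transform(transform, cache=CACHE):
    """Run the slow transform once, then reuse its saved output on later runs."""
    if cache.exists():
        return json.loads(cache.read_text())
    rows = transform()
    cache.write_text(json.dumps(rows))
    return rows

def pages(rows, size):
    """Yield successive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

if __name__ == "__main__":
    # Stand-in for the real transform, which would take a minute or two.
    rows = cached_transform(lambda: [{"id": i} for i in range(10)])
    for page_number, page in enumerate(pages(rows, size=4)):
        print(page_number, len(page))  # call the real load step here instead
```

    Delete the cache file whenever the transform logic changes, otherwise the load step will keep running against stale output.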