
How to Improve Broken/Flaky End to End Tests for my team

Entry-Level Software Engineer [E3] at Meta · 8 days ago

Hi everyone, I recently started as an Engineer at Meta, and my team has a lot of frequently broken, flaky tests. Almost every single week, a test breaks. The End to End Test Wiki is not enough for this. How can I improve the testing system? Are there resources you recommend? What suggestions do you have for me?

Thank you!


Discussion

(3 comments)
  • Engineer @ Robinhood · 8 days ago

    Can you try fixing 1 test? Set a hard stop 1-2 weeks in the future, nudge whoever can help fix it, and then document what you tried. This will complement what Rahul is suggesting: what you're looking to define is yield (value/effort). If the yield is low (the value is very low or the effort is too high), that will explain why no one fixes the tests. In that case, I'd just delete them and measure whether that decreases build times.
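A concrete place to start when fixing that one test: flaky end-to-end tests are very often caused by a fixed sleep racing against variable latency. A minimal sketch (in Python, using a hypothetical `wait_until` helper rather than any specific framework's API) of swapping a sleep for a polling wait:

```python
import time

def wait_until(condition, timeout=10.0, interval=0.1):
    """Poll `condition` until it returns a truthy value or `timeout` elapses.

    Replaces patterns like `time.sleep(2); assert thing_happened()`,
    which fail whenever the system is slower than the hardcoded sleep.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = condition()
        if result:
            return result
        time.sleep(interval)
    raise TimeoutError(f"condition not met within {timeout}s")

# Before (flaky):   time.sleep(2); assert order_status() == "SHIPPED"
# After (robust):   wait_until(lambda: order_status() == "SHIPPED")
```

The test now tolerates slow runs up to the timeout instead of failing on any run slower than the sleep, and it finishes early on fast runs.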

  • Tech Lead/Manager at Meta, Pinterest, Kosei · 8 days ago

    IIRC, Meta had a system to automatically disable flaky tests, no? Why is this not kicking in here? I'm sure this depends on the codebase you're dealing with.

    The first question to ask is: "why does this matter?" When a test breaks, how much time/energy goes into fixing it? Do other people on the team view this as a problem?

    This sounds like a reasonable thing to spend time on, but it could also be a rabbit hole that doesn't actually yield anything fruitful for you. The worst outcome is if you spend a bunch of time on this and no one cares.

    My recommendation is to write a very thorough Workplace post documenting:

    • The problem: what's happening and how long it's been going on
    • Research you've done about why this is happening
    • The negative impact stemming from the problem
    • A request for feedback or suggestions on next steps (along with a few proposed ideas)

    At a minimum, you will learn a lot from making this post. Tag relevant people, and you may get valuable feedback to decide if you want to invest further in fixing it.

    I talk more about my strategy around comms here: [Case Study] Effective Communication: Leading A Multi-Org Re-architecture At Meta
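For context on what an auto-disable system like the one mentioned above might look like, here is a minimal sketch (hypothetical, in Python; not Meta's actual system) that tracks each test's recent pass/fail history and quarantines tests whose failure rate crosses a threshold:

```python
from collections import defaultdict, deque

class FlakyQuarantine:
    """Track recent results per test; auto-skip tests that fail too often.

    A test is quarantined once it has `window` recorded runs and more
    than `threshold` of them failed. Quarantined tests stay skipped
    until someone fixes or deletes them.
    """

    def __init__(self, window=20, threshold=0.3):
        self.window = window
        self.threshold = threshold
        # Each test keeps only its last `window` results.
        self.history = defaultdict(lambda: deque(maxlen=window))

    def record(self, test_name, passed):
        self.history[test_name].append(passed)

    def is_quarantined(self, test_name):
        runs = self.history[test_name]
        if len(runs) < self.window:
            return False  # not enough data to judge flakiness yet
        failure_rate = runs.count(False) / len(runs)
        return failure_rate > self.threshold
```

In practice the CI system would persist this history and report quarantined tests to their owners, which is exactly the signal you'd want for the "why does this matter?" post above.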

  • Tech Lead @ Robinhood, Meta, Course Hero · 7 days ago

    Every Big Tech company is filled to the brim with flaky tests. With flaky tests, you have 3 options:

    1. Fix them
    2. Ignore them and keep suffering
    3. Delete them

    At a company like Meta, where you're always heads down with roadmap work, it's easy to do #2. I saw this all the time with E3s and E4s. However, this goes against the spirit of Meta and of top engineers overall, as you aren't taking any action. It feels painful, but it's way better to do #1 or #3 as a "1 step backward, 2 steps forward" type thing.

    In general, I'm a fan of trying to save things, especially with code quality. As Jonathan mentioned, set aside some time to just fix ONE test. This is literally the perfect time of the year to do this Better Engineering work as you are in code freeze right now. From there, you can make an informed call on whether to do #1 (create a playbook and repeat) or #3.

    If you can pull this off (either getting buy-in to do #1 or #3), this will be a shining gem on your E3 packet. This is more like advanced E4 behavior.

    Side note: End-to-end tests suck and are tremendously overrated. We had a similar problem at Instagram. We made the call to effectively delete end-to-end tests and break them down into unit tests and snapshot tests.
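To illustrate the snapshot-test half of that split, here is a minimal sketch (a hypothetical `assert_matches_snapshot` helper, in Python; real projects typically use a library for this) that compares a component's serialized output against a stored golden file, recording the snapshot on the first run:

```python
import json
import pathlib

def assert_matches_snapshot(name, value, snapshot_dir="__snapshots__"):
    """Compare `value` against a stored golden snapshot.

    On the first run, the snapshot file doesn't exist yet, so we write
    it. On later runs, any difference from the golden copy fails the
    test. To intentionally change behavior, delete the snapshot and
    re-run to record a new golden copy.
    """
    path = pathlib.Path(snapshot_dir) / f"{name}.json"
    serialized = json.dumps(value, indent=2, sort_keys=True)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(serialized)  # first run records the golden copy
        return
    assert path.read_text() == serialized, f"snapshot mismatch for {name}"
```

Unlike an end-to-end test, this exercises one component's output deterministically, so there is no network, no shared environment, and far less room for flakiness.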

    Here's some additional nice reading material for you: "What do mobile testing strategies look like at top tech companies?"