Suggestions for good open-source AI projects I can contribute to?

Entry-Level Software Engineer at Other • 10 months ago

Junior Engineer

Open Source

I'm trying to break into AI as a Machine Learning Engineer. I want to demonstrate expertise in AI/ML topics and capture a "Wow" factor by contributing to a well-known open source AI project.

I'm looking for suggestions spanning CV, NLP, tabular data and collaborative filtering -- not excluding but not limited to the latest GenAI stuff.

Some context: I have an MS in CS, did CV research, and completed Jeremy Howard's "Practical Deep Learning with fastai and PyTorch".

The fastai library itself is a popular target for a first contribution. See:

- How to Make Open Source Contributions to fastai (Hamel Husain)
- Pull requests made easy
- Contributing to fastai
- fast.ai Discord server (if the link doesn't work, it's also discoverable: search "fast.ai"). #fastai-dev is the contributors' channel.

One concern is identifying something that's non-trivial but tractable for a relative newbie. I realize there's tension between this and achieving that "Wow" factor at least in the beginning. I'm not sure a contribution to, say, PyTorch or scikit-learn is achievable from where I stand, but I could be wrong. Perspective on how to spot/scope opportunities would be appreciated too.

Thanks very much!

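Potentially relevant questions on Taro:
- Side projects vs open source
- How to start contributing to open source
- Determining whether something is a major PR for Big Tech open source
- Getting users for your open source projects
- Promoting your open source projects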

3.6K Views

9 Comments

Discussion

(9 comments)

25
Elliot Kang
•Entry-Level Software Engineer at Seed Startup
10 months ago
I was in your exact position a year ago, so this hits really close to home. It was so hard to get interviews that even with spam applying, I only got 2 callbacks in a month. Then I started contributing to open source, and things started looking up. Here are some pointers:

Note: For those who are later in their career, I think these principles still apply. IMO, doing this early in your career has more of a 0-1 effect, versus the 1-5 effect someone later in their career would see.

Pick A Niche

There's a deluge of MSCS graduates who want to work in AI, so you need to focus on a single direction. There are several routes you can take:

- Model serving
- Classic ML model development (collaborative filtering, etc.)
- Large ML models (LLMs, Stable Diffusion, etc.)
- Fine-tuning
- Generative AI tools
- ML infra tools like Flyte/Airflow

To be in the top 1% in classic ML model development or large-model work, you typically need top conference papers.

Before you dive into any route, make sure you do some due diligence so that you know roughly what it takes to be the top 1% in anything. Then decide if you're willing to put in the work.

Pick The Project Size

Then there's the size of the open source project. Small libraries/frameworks are too unknown to get you any real attention from recruiters. With large libraries/frameworks, it will take 1-2 years to get on the PMC (project management committee).

Small projects are things like util libraries. Large projects are established frameworks like PyTorch or Kubernetes.

The "goldilocks zone" IMO is medium-size projects that are rapidly gaining traction. They have enough things on their roadmap that'll add immediate value and they should be the most welcoming of new contributors.
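One rough way to make the "medium-size and rapidly gaining traction" filter concrete is to bucket candidate repos by star count and then sort the medium ones by recent growth. This is only a sketch: the star thresholds, the 90-day window, the 10% growth cutoff, and the `RepoSnapshot` type are all assumptions for illustration, not numbers from this thread.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds; nothing in the thread defines these, so tune to taste.
SMALL_MAX_STARS = 2_000
LARGE_MIN_STARS = 50_000


@dataclass
class RepoSnapshot:
    """Star counts you looked up by hand for one candidate repo."""
    name: str
    stars_now: int
    stars_90_days_ago: int


def size_bucket(repo: RepoSnapshot) -> str:
    """Bucket a repo as small, medium, or large by current star count."""
    if repo.stars_now < SMALL_MAX_STARS:
        return "small"
    if repo.stars_now >= LARGE_MIN_STARS:
        return "large"
    return "medium"


def growth_rate(repo: RepoSnapshot) -> float:
    """Fractional star growth over the last 90 days (0.25 means +25%)."""
    if repo.stars_90_days_ago <= 0:
        return 0.0
    return (repo.stars_now - repo.stars_90_days_ago) / repo.stars_90_days_ago


def shortlist(repos: List[RepoSnapshot], min_growth: float = 0.10) -> List[str]:
    """Names of medium-size repos growing at least min_growth, fastest first."""
    picks = [
        r for r in repos
        if size_bucket(r) == "medium" and growth_rate(r) >= min_growth
    ]
    picks.sort(key=growth_rate, reverse=True)
    return [r.name for r in picks]
```

Stars are a crude proxy for traction, so treat the shortlist as a starting point and still check the roadmap and how quickly maintainers respond to new contributors.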

Making Core Contributions

Start with something small, like a cherry-pick commit or an integration. No sane maintainer would let a stranger make changes to their core codebase right away.

Then it's time to align your contributions to the actual product roadmap. Attend contributor meetings. Talk to other contributors on Slack/Discord. Get involved.

As you contribute, you'll see yourself go up the "top contributors" list. You'll see yourself writing better PRs, writing cleaner code, and working on more impactful features.

Spinning Open-source Into Opportunities

Getting to the top 10 or top 5 contributors for a medium-size library doesn't take that long. Even in the span of 1-2 months, you can get to the top 5 if you put in full-time effort.

Now you have some awesome résumé bullet points:

- Built core features: X, Y, Z, etc.
- Top 5 contributor for GitHub project with 20K stars and 500K downloads/month
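If you want to check where you stand before writing that bullet, one rough approach is to rank authors by commit count from the output of `git shortlog -sn HEAD`. The sketch below assumes commit count is an acceptable proxy for the "top contributors" list; the helper names and the sample authors are hypothetical.

```python
from typing import List, Optional, Tuple


def parse_shortlog(text: str) -> List[Tuple[str, int]]:
    """Parse `git shortlog -sn` output into (author, commit_count) pairs."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line looks like "<count><whitespace><author name>".
        count, author = line.split(None, 1)
        rows.append((author.strip(), int(count)))
    # git already sorts with -n; sorting again keeps this safe on any input.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


def contributor_rank(text: str, author: str) -> Optional[int]:
    """1-based rank of author by commit count, or None if not listed."""
    for position, (name, _) in enumerate(parse_shortlog(text), start=1):
        if name == author:
            return position
    return None


if __name__ == "__main__":
    # Made-up sample in the same shape as real shortlog output.
    sample = "   412\tMaintainer One\n    57\tYour Name\n    40\tContributor B\n"
    print(contributor_rank(sample, "Your Name"))
```

Commit count rewards many small commits over a few large features, so it is worth pairing the rank with the concrete features you built.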

Now go! Find a project and contribute. You'll learn a lot and help out the community. I think you can only rewrite your résumé so many times before you need to make a fundamental change. Making open-source contributions can help you make that fundamental change to your profile.
- 1
  Alex Chiou
  •Tech Lead @ Robinhood, Meta, Course Hero
  10 months ago
  Wow, this is literally one of the best things I have ever read - Thank you so much Elliot for sharing your wisdom!
- 2
  Entry-Level Software Engineer [OP]
  •Other
  10 months ago
  Thanks for sharing, Elliot. It's encouraging to hear from someone who's been in my shoes and made it to the other side. I'll put your suggestions to work. I appreciate you.
- 0
  Thoughtful Tarodactyl
  •Taro Community
  10 months ago
  This is so helpful! Elliot, would you be open to sharing more details on how this helped you get more interviews? For example, how many more interviews did you get? Did recruiters notice it or seem impressed by it? Did you get reachouts?
  
  Also, any other thoughts on how open source helped with getting or passing interviews would be great.
- 1
  Thoughtful Tarodactyl
  •Taro Community
  10 months ago
  Related reading for anyone interested: https://huyenchip.com/2024/03/14/ai-oss.html
  
  Chip talks about the current state of open source AI repos, which ones are good, and how to think about it
- 1
  Thoughtful Tarodactyl
  •Taro Community
  10 months ago
  Here's a list of repos she analyzed and categorized: https://huyenchip.com/llama-police
- 0
  Thoughtful Tarodactyl
  •Taro Community
  10 months ago
  OK, update: after digging through Chip's list, I found an awesome repo: https://github.com/unslothai/unsloth
  
  It's from a YC-backed startup (https://www.ycombinator.com/companies/unsloth-ai) that's building an open source platform to make fine-tuning LLMs faster.
  
  It's really cool because these models are small enough that you can run the training yourself on Colab. You get the experience of training/fine-tuning LLMs, which is not something you find in a lot of open source repos.
  
  It's a super active repo with a very active maintainer. Lots of opportunity to learn how these models work under the hood.
  
  Not too crowded, so there's still time to make good contributions.
  
  Pretty medium-sized.
- 0
  Elliot Kang
  •Entry-Level Software Engineer at Seed Startup
  10 months ago
  Hm, not a bad choice
12
Alex Chiou
•Tech Lead @ Robinhood, Meta, Course Hero
10 months ago
TensorFlow is open-source and widely used: https://github.com/tensorflow/tensorflow

The problem is that the repo is huge (2.8k pull requests with 185k GitHub stars), so getting something merged in is probably super hard 😥

The "How to start contributing to open source?" discussion is particularly helpful for high-level advice. In particular, I recommend working on a repo you use yourself.

We also recently shipped an open-source contribution course with the former Director of Open Source Engineering at Facebook. You can watch it here: [Course] Become An Open Source Master