This post is part AMA and part request for your insights.
I’m currently developing a Taro course for Q1 of 2025, focused on navigating the transition to a career in machine learning. If this topic resonates with you, I’d love to hear your thoughts:
1. What are the most pressing questions or concerns you have about making this transition?
2. What is your current role or background (e.g., Software Engineer, Data Engineer, SRE/PE, Data Scientist, etc.)?
About Me:
I’ve transitioned from Software Engineer to Data Scientist to Machine Learning Engineer over the course of my 14-year career. I’ve been an MLE for 10 years, with experience at Adobe, Twitter, Meta, and as Head of MLOps at a Series B startup.
Thank you!
Update: Thank you for all the questions; keep them coming! I put together a YouTube video with a quick outline of the process as I generally recommend it. It's a high-level overview, and more practical material is coming in a Taro course early next year.
Just echoing some things I've heard across the Taro community and from engineers overall:
Excellent questions!
What’s the difference in technical interviews between ML and SWE roles?
Is the interview process standardized and uniform, like in big tech (DSA/LeetCode-style questions, system design), or is it decided team by team?
Excellent question; I will definitely address it in the course.
Short and to the point answer:
The actual process varies, but it will be some mixture of the above.
Allow me to be a bit self-promotional: I published a YouTube video with a basic walkthrough of what I expect to hear from a staff-level MLE in the ML system design round.
Is there a way to be a part of the ML world without having to deep dive into the math part of ML?
The math in ML intimidates me.
I would like to be an ML Infra engineer. I work as a distributed systems engineer and extending my domain to ML would really help me in my career.
What is the path that I'd have to follow to become an ML infra engineer?
I'm not talking about MLOps here, but about actually building good, scalable systems for ML-based infrastructure. I don't want to target ML-focused DevOps roles.
Check out this job at PayPal, which covers what I'm talking about: https://www.linkedin.com/jobs/view/4083052214
If it's not possible to avoid the math part, how can I become really good at it? I don't want to watch a MOOC from Udemy/Deep learning that covers bits and pieces.
I would prefer a comprehensive learning resource that can take me from 0 to 100.
I'm a software engineer at a web3-based distributed systems company.
This space sometimes scares me because of how shady some projects can be. They don't really focus on the tech and are just hell-bent on the tokenomics. I think ML infra might be a good switch after I get some solid experience.
Yes, I would say that if infra is your primary concern (and there is a lot of need for good ML infra people) then fundamental computer science is much more important than math. LLVMs are a must. If you understand derivatives and matrix multiplication you have all the math you need for that role.
I am working on a comprehensive resource, but it will likely not be as simple as a course or several. In coaching people one-on-one I am seeing a lot of variability so I think the only way for me to do this well is to guide this pretty carefully with regular check-ins and a lot of interaction. Everyone comes from a different place and is going toward a slightly different destination.
Awesome, that sounds nice. Do you offer 1-1 mentorship services for people looking to transition?
@Friendly Tarodactyl
I do, but I'm currently working on systematizing it. It should be out shortly; I'll put a link here when it's available.
Thanks for doing this :) I have many questions. Some have been asked already.
I am looking forward to watching the course when it's out!
- What's a good MLE project to include on resume? And what's a bad one?
- What indicators are hiring managers searching for when looking at a side project?
I'm not an MLE, but I imagine that the project having actual users is the most important thing, just like it is for every other domain of engineering. It should be as I describe here: https://www.jointaro.com/course/build-side-projects-with-500k-users-coming-up-with-an-idea/what-makes-a-project-valuable/
@Alex the issue with MLE projects having actual users is that it's pretty dang hard to get there without getting mired in tons of skills that are irrelevant to MLE, such as frontend and marketing.
What makes MLE valuable in industry is the data. Without data you can't build anything of value, which is why 90% of MLE projects are just things like taking a public dataset such as MNIST, throwing some models at it, and writing an article, which quite frankly is not impressive.
The thing about ML is that it's almost always a feature, not a product. This is why many companies that try to make AI the product fail (e.g., the Rabbit R1, the Humane pin).
Let me take these in order:
Role
There are too many "core skills" to cover, honestly, so I will take a cop-out answer here and say: hunger for learning and never getting stuck. I'll cover the technical skills in the course, but those two are foundational. MLEs constantly operate in situations where we don't have enough (data, specifications, time...) and need creative solutions. On top of that, the field is evolving all the time, and you need to know the difference between "hype" and "innovation", which often look similar. You are continuously learning the fundamentals.
MLE career progression is basically the same as SWE. Technical or management route are open to you.
Interviews
I covered interview types in another answer.
Preparation depends on where you are coming from and where you are applying. I will say that DSA is pretty fundamental, and the bar is the same as for SWE. Honestly, preparing for DSA is a really high-leverage activity. I am working on a bunch of resources for ML system design (system, example), and I will have more resources for other rounds soon.
FAANG vs. startups vs. other large companies is a great question, and the roles change a lot depending on where you are. The primary differences are breadth versus depth (at FAANG you tend to specialize a lot more) and learning opportunities (honestly, you have to be a self-starter to stay up to date at FAANG, whereas at small companies it is almost annoying how many new things pop up each day).
Side Projects
I will echo Alex on the projects. I understand that it seems like marketing and UX are irrelevant, but I assure you they are not. When you work at any company, your VP is essentially your VC, and you must convince them to "invest" in your project over a hundred others your org could go after. I am not saying that most of your effort should go to those, but "selling" something useful for free should not be that hard.
Plus you can always get creative: email can be your UX where you summarize (for example) US congressional votes and send that to your subscribers every week.
The only thing that matters about the project is external validation. I have talked about this above, but most HMs ignore "toy" projects. As to where to find data... that problem doesn't go away even when you work for Google :) Every project starts with "if only we had...", showing ingenuity here is an important signal. Not every ML model is a transformer model. I have trained Bayesian models in production with 30 highly curated examples and it performed well. Many projects can rely on generated data.
This is what I mean by "not getting stuck". Many of these issues are not magically resolved just because you have a job, but the best MLEs always have three or four ways to "go from here" and still accomplish what they are after.
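To make the small-data point a bit more concrete, here is a minimal, hypothetical sketch (not the production model mentioned above): a conjugate Beta-Binomial estimate of a success rate from 30 labeled examples. The prior pseudo-counts (alpha = beta = 2) and the 21/9 label split are made-up assumptions for illustration. The idea is that a prior lets a tiny, carefully curated dataset produce a usable estimate along with an explicit measure of uncertainty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBinomial:
    """Beta prior over a success rate; hypothetical example, not a real library API."""
    alpha: float = 2.0  # assumed prior pseudo-count of successes
    beta: float = 2.0   # assumed prior pseudo-count of failures

    def update(self, labels):
        """Return the posterior after observing 0/1 labels (conjugate update)."""
        successes = sum(labels)
        failures = len(labels) - successes
        return BetaBinomial(self.alpha + successes, self.beta + failures)

    def mean(self):
        """Posterior mean of the success rate."""
        return self.alpha / (self.alpha + self.beta)

    def variance(self):
        """Posterior variance; it shrinks as more labels arrive."""
        a, b = self.alpha, self.beta
        return a * b / ((a + b) ** 2 * (a + b + 1))


# 30 hypothetical curated labels: 21 positives, 9 negatives
labels = [1] * 21 + [0] * 9
posterior = BetaBinomial().update(labels)
print(round(posterior.mean(), 3), round(posterior.variance() ** 0.5, 3))
```

With these assumed numbers, the posterior is Beta(23, 11), giving a mean of about 0.676 and a standard deviation of about 0.079. Real problems need a prior you can actually justify, which is where the careful curation comes in.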
@Ilya Reznik I appreciate all these insights, this is very helpful! Thanks!
Makes sense, thank you so much. This is very helpful!
Hey Ilya,
For those passionate about working in pioneering technology and considering a transition to MLE, but with 10 years of deep experience in another software engineering discipline (e.g., mobile engineering at FAANG companies), how do the economics/compensation of an MLE career compare to SWE?
Would making the switch be financially attractive, or are the comps similar? I’d love to hear your perspective—both short-term and long-term.
Thanks!
MLEs typically make a bit more, I think. The figure I've heard is about 15% more than full-stack engineers at the same level; I'm not sure about mobile.
But... (there's always a but) you won't start at the same level as an MLE that you've reached in the field you have been in for ten years. By the time you hit these levels, the most certain financial return comes from going all in on what you already do. There are many other reasons to go into ML, but money is probably not one of them.
Longer term the ceiling in ML is currently higher. Will it be when you get there? I think so, but nothing is guaranteed.
How high/low signal are winning Kaggle competitions? I heard online that they're not high signal. From your experience, is this true? Why or why not?
Low. Kaggle ranking is just a number. Open-source and building in public generates far more inbound than Kaggle. You can generate dozens of inbound interviews this way. Just do cool stuff, then write a little about it. Rinse and repeat.
The reason for this is people want to see cool shit. A ranking number is boring.
If you are a data scientist, Kaggle helps. For MLE it is better than nothing at entry level, but not by much. Kaggle is pretty far from what makes a great MLE, much like competitive programming doesn't transfer well to being a distributed systems specialist. Yes, there is overlap, but there are higher-ROI ways to spend your time (like building a project people will use).
I’ve worked as an ML Engineer for 2 years at Microsoft, where I trained tree-based models, managed inference pipelines, and conducted experimentation at Microsoft scale. Later, I transitioned to a project focused on building auto-featurization pipelines, utilizing AutoML and explainability techniques, where I worked for another 2 years.
After a re-org, I was moved to a non-ML-related backend engineering role, where I’ve been for the past 2 years. Prior to joining Microsoft, I also had 2 years of full-stack development experience. I hold a Master’s degree where I studied machine learning, and I’m passionate about returning to the ML field.
I’m familiar with fundamental ML concepts such as classification (tree-based models, logistic regression, SVMs), clustering (K-means, hierarchical clustering), and Data Mining (bag of words, TF-IDF), but I’ve never learnt neural networks or deep learning, either professionally or through MOOCs.
Recently, a recruiter from Meta reached out about Software Engineer, Machine Learning roles.
If someone recently had a two-year career break, with eight years of mobile development experience before it, and now wants to restart their career as an ML engineer, will companies consider them, and what will they look for?
A long career break and no ML expertise is going to be tough. I'm not sure if you have a master's degree; if not, this may be a great time to get one that is ML-focused. Start networking right away and try to get an internship in the summer between the two years.
Like I said earlier, if you can show a project that is validated by others (users, open-source committers, conference reviewers, etc.), that is much stronger.
Are publications the best way to differentiate yourself from the crowd?
For MLEs publishing ML papers, how much does the publication venue matter? For example, if you publish at a tier-1 conference (ICLR, NeurIPS, ICML) vs. something less prestigious (AAAI, TMLR, etc.), how much would that weigh against you?
For generalist MLEs, are papers submitted to specialized conferences (SIGGRAPH, COLT, AISTATS, etc.) still valued? What about publishing ML-related papers in non-ML journals, for example applied ML papers at MechE, Bio, or ChemE conferences? Will managers (or recruiters) be able to tell the difference between workshop papers and main conference track papers?
To answer all your immediate questions: papers are a good way, but they take a lot of time and your work is unlikely to be rewarded. There is no certainty that you'll publish a decent paper, if at all. Generally, ML people don't care as much about applied ML papers. Worthwhile top-tier papers take 12-18 months at a minimum from idea inception to acceptance and publication. Second-tier conferences like AAAI and KDD are still good, but less prestigious. Workshop papers are held in much lower regard than main-track papers.
Some general points:
Elliot Kang gave a good answer! Here is some more:
So I am actually at NeurIPS right now and polled a few recruiters for you (sample size 6 across a range of companies):
Can you comment on the difference in skills needed across different verticals?
Is it possible to get Data Science jobs (not data analytics) without a PhD?
What even is an ML Engineer? As an ML Engineer will I be training models?
Titles vary a lot, but generally a data analyst uses what a data scientist developed to produce a single report, whereas many users rely on what an MLE developed to accomplish their end goals.
MLEs do train models, but we also own ML systems, and some of us specialize in Ops or Infra. We train models, write code, maintain models in production, set strategy...
You can become a data scientist without an advanced degree.