A lot of ML was cleaning up and preparing datasets, which wasn’t that much fun. I had an exec pushing for using Amazon Personalize, which I gave a good try but ultimately it didn’t deliver. Was that because of data problems or underlying models? That’s the crux of the problem when using black-box ML services: you can’t analyze what’s going on. And Amazon Personalize makes changing the data layer a pain, so you never know if you’re getting closer to a better solution. Who knows, Personalize is probably great in the hands of experienced ML folks.
So, if you can swing it, definitely recommend doing an ML trial project to see if you like it before committing your career to it.
I transitioned to this in 2018. It was called MLOps. I was a mobile developer before that. Transitioning was pretty easy at that time (it might be more competitive now). What I did: worked on an intensive ML project for myself and realized I enjoyed working across the ML stack. I wrote a blog post on the project here: https://www.nicksypteras.com/blog/aisu.html. Then I applied to an MLOps team and leveraged the project to demonstrate skills/experience. As for avoiding the math, you could probably get away with it but learning the basics will make everything way easier. I feel like without some basic ML math under my belt I would have been flying very blindly.
Edit: just realized you might also be thinking of "MLOps". See end of comment.
It's what I have been doing for the last two years but I refer to it as "software engineer with a recent focus on generative AI".
I also think AI Integration Engineer is good but I have only really seen AI Engineer.
The thing is, up until a few years ago, doing useful things with AI generally did require something more like machine learning knowledge.
But now that we have general purpose models like gpt-4o, Claude 3.5, LLaVA, etc., you can just make an API call and, in a day or two, have a more functional system than what a machine learning engineer might previously have spent months training a custom model to achieve.
So that somewhat explains the confusion. I think it's best to just be honest. Most applications actually don't need "real" ML knowledge or custom neural network architectures or training a model from scratch.
I do think that ML is a good field to be in if you have the patience to learn the math and about neural networks etc. But I think that is not what you are talking about. And the architectures and models are very general purpose so as I said you can work on many applications of ML now without having that background.
Go to the Anthropic or OpenAI documentation and copy paste their examples and try inserting some customization into the system message using an f-string.
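To make that concrete, here's a minimal sketch of the pattern (the model name, helper names, and prompt text are my own illustrations, not from the vendor docs; it assumes `pip install openai` and an `OPENAI_API_KEY` in the environment):

```python
# Hypothetical sketch: customizing the system message with an f-string,
# following the pattern in the OpenAI quickstart examples.

def build_messages(product_name: str, question: str) -> list[dict]:
    """Inject app-specific context into the system prompt via an f-string."""
    system = f"You are a support assistant for {product_name}. Answer briefly."
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

def ask(product_name: str, question: str) -> str:
    # Requires the `openai` package and OPENAI_API_KEY set in the environment.
    from openai import OpenAI
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o",  # any chat-capable model works here
        messages=build_messages(product_name, question),
    )
    return resp.choices[0].message.content
```

Swapping the f-string contents is where most of the day-one "AI engineering" customization happens.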
I think MLOps is also a thing. Go to HuggingFace and RunPod and practice deploying models with Python. Also find some tutorials on LLM pre-training, fine-tuning, and evaluations. Check out Predibase.
A big thing right now I believe is Diffusion Transformers. If you can find some article explaining how to run a training job for that, you may be able to help people.
If you want to "cheat", check out replicate.com. cog could be useful for self-hosting ML models outside of replicate.com as well.
I didn't pivot into it, that was part of the project (object recognition in a video stream).
It helps if you work with small organizations that don't box you into a role but just give you stuff to do.
2) Not me
3) With difficulty, but through courses like fast.ai
If you’re not into maths and problem solving this is probably the wrong path. The main value add you bring is being able to transform a business problem into solvable maths.
Read Introduction to statistical learning and look at the fast.ai courses. I also recommend the paper attention is all you need. If you find all of those things interesting and not too mathsy then I’d say go for the switch. Else, maybe look at options in data engineering or as a software engineer in a ML team.
And I would look at applying existing models to business problems. That I think has a lot of demand now. Solve a few problems for yourself, make a few blog posts and apply.
Edit:
>i.e. can I avoid learning all the maths underneath?
Why would you want to avoid that? Math makes everything easier. It doesn't even matter which field you work in.
I write about various engineers who now work at Meta, Google, Amazon, and OpenAI who made the switch. You can see what strategies and tactics they used to do it.
1) It's "wise" if you find yourself enjoying hacking on it during your personal hours. Before I made the switch, I spent a year studying the material on nights and weekends, so that was my first data point that perhaps this was something I wanted to do full time.
2) Yes, I have! And I've been an ML engineer for 7 years now after I made the switch. For context, I'm an ML tech lead at FAANG. Prior to that, I worked in infrastructure and product.
3) One piece of advice I got on this years ago is to join a team adjacent to ML work so that you can get familiar with what production ML looks like. You can also start practicing ML thinking on Kaggle.com.
P.S. You can check out other posts in my blog for resources to learn AI/ML and the math needed for this career, such as my Linear Algebra 101 for AI/ML series: https://www.trybackprop.com/blog/linalg101/part_1_vectors_ma... (includes interactive quizzes, fundamentals of vectors/matrices, and a quick intro to PyTorch, an open source ML framework widely used in industry)
Many who have done the fast.ai course have pivoted their careers into not only ML engineers but also research scientists.
It's not an easy course, so to speak, so you have to work through it in your spare time.
Since you are interested in deploying/scaling feel free to jump straight ahead to lesson 2 of part 1. Jeremy is an awesome teacher. I don't like or come from academia so I find his style of teaching very wholesome.
1) It depends on your own goals. It's wise if you believe that, in the future, the market will value ML engineers more highly than software engineers (at the same years of experience).
2) Yes. I did (> 3 years now)
3) Start applying for ML jobs.
Is there an ML engineering practice that isn't focused on building models but more on managing/deploying/scaling models?
Yes. Companies that adopt ML at large scale usually need this.
i.e. can I avoid learning all the maths underneath?
Yes, it's possible as long as you focus on infrastructure, e.g. creating and monitoring pipelines, models, etc.
Note: while you're looking for a job, try taking a professional ML engineering certification/program (Google, AWS, Azure, etc.) - they provide hands-on lab experience with case studies. The cost is super cheap.
For me, mathematics and statistics represent (a) unemployment insurance, since they give me a lot of mobility across roles in the space, (b) a good toolbox for talking as an equal with data scientists, and (c) a way to chime in on implementations by Research Engineers/Data Scientists, offer insights, and avoid wasted time and resources.
One example: When BERT was released (ca. 2018) I was working in a place where several Research Engineers and DSs wanted to use it in production for text classification.
The issue was that architecturally BERT was suboptimal due to a process called masking [1], which significantly increases training time, and inference time was not great either. The alternative I offered at the time was a mechanism called "Bag of Tricks" [2], which is a very efficient modification of Bag of Words. Knowing the math (and being on top of the literature) saved me from implementing something that would be inherently inefficient. Without that background, it's hard to push back on DS/ResEng.
[1] - https://datascience.stackexchange.com/questions/97310/what-i... [2] - https://arxiv.org/abs/1607.01759
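The core idea of "Bag of Tricks" (the fastText paper in [2]) - average word embeddings, then a linear classifier - can be sketched in a few lines. This is a toy illustration with random weights, not the real fastText implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"good": 0, "bad": 1, "movie": 2, "plot": 3}
emb = rng.normal(size=(len(vocab), 8))  # word embedding table
W = rng.normal(size=(8, 2))             # linear classifier, 2 classes

def predict(text: str) -> int:
    """Classify a document by averaging its word vectors (Bag of Tricks style)."""
    ids = [vocab[w] for w in text.split() if w in vocab]
    doc = emb[ids].mean(axis=0)         # average the word vectors
    logits = doc @ W                    # single linear layer
    return int(logits.argmax())
```

In real fastText, `emb` and `W` are trained jointly, and bucketed n-gram features are added; but the inference cost is essentially this - one embedding lookup per token plus one matrix-vector product, versus a full transformer forward pass for BERT.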
The maths in ANN ML aren't that hard, and you don't need to understand them very deeply even to come up with new models. A lot of the new model development is just stacking pre-built layers with torch.nn.Module.
The difficulty comes from getting the beasts to actually work. But it's largely trial and error for everybody.
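To illustrate the "stacking pre-built layers" point, here is a minimal sketch of a new model defined purely by composing `torch.nn` building blocks (the dimensions and names are arbitrary):

```python
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    """A 'new' model that is just stock torch.nn layers stacked together."""

    def __init__(self, in_dim: int = 16, hidden: int = 32, classes: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = TinyClassifier()
out = model(torch.randn(4, 16))  # a batch of 4 examples
```

None of this requires deriving backpropagation by hand; autograd and the pre-built layers handle the math, and the trial-and-error is in the training loop and hyperparameters.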
AI/ML is already becoming rapidly commoditised and the level you’re talking about is very infrastructure/platform orientated whereas all the real action (and potentially higher value stuff) is going to be happening an abstraction level or two above that.
It’s kind of like when electricity was invented. Do you want to become an engineer working on building the electricity grid (rapidly commoditised) or do you want to be the inventor/builder working on new things powered by this fancy new electricity?
The analogy isn’t perfect but I always think it’s worth carefully thinking about what level of abstraction you want to work at. Working in the infrastructure layer is definitely fun and rewarding. I would happily do that kind of work too. I love Devops. But like a lot of infrastructure level stuff it may not be where the “action” really is eventually.
But it depends what your motivations are too. “Do what you enjoy” is always a good way to set a general direction I think.
Just my two cents. I’m just an internet moron. I could be wrong.
There are different types of roles in an ML project.
1. Data scientists - These are the people who analyze the data and prepare the models. They basically deliver a Jupyter notebook to us.
2. ML engineers - We take the notebooks from the data scientists and productionize them.
3. MLOps - These are the people who take care of the required infra, basically the equivalent of DevOps.
Personally, I worked as a hybrid of data scientist/ML engineer. I liked being an ML engineer better.
There are other aspects to the job (like dispatching experiments) but the point is that I was able to bring value in all of this as a programmer without requiring any new skills apart from the natural learning experience that one has to go through in any discipline. I think you can surely transition as long as your job requires it.
You're describing MLOps.
Why would you want to make this transition if you're not keen on the underlying math? Why not include DevOps or SRE if you're interested in code-based pipelines, deployments, and scaling? How are you going to get into things like drift detection without understanding the math?
(...hoping for an answer that isn't shouted by Rod Tidwell at Jerry Maguire -- but genuinely curious why I'm seeing this so much and trying not to make a cynical assumption :) )
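To the drift-detection point: even "just infrastructure" monitoring leans on statistics. A common metric is the Population Stability Index (PSI) between training-time and live feature distributions; here is a stdlib-only sketch (bin counts and the smoothing constant are illustrative choices):

```python
import math

def psi(reference: list, current: list, bins: int = 10) -> float:
    """Population Stability Index between two samples of a numeric feature.

    Rule of thumb often quoted in practice: < 0.1 stable, > 0.25 major drift.
    """
    lo, hi = min(reference), max(reference)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def bin_fractions(sample):
        counts = [0] * bins
        for x in sample:
            counts[sum(x > e for e in edges)] += 1
        # Small smoothing term so log() never sees a zero fraction.
        return [(c + 1e-4) / (len(sample) + bins * 1e-4) for c in counts]

    p, q = bin_fractions(reference), bin_fractions(current)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))
```

Wiring this into a pipeline is pure engineering, but choosing the metric, the bins, and the alert threshold is exactly where the math sneaks back in.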
I think I’ve got a grasp of the fundamentals and the ability to learn fast on the job, but every MLE job listing I see wants “4+ years of experience training and deploying models in a production environment” or something (even non-Senior roles!). I’m not sure how to break into it, to acquire a MLE job to get the experience to acquire an MLE job. Does anyone have any advice?
The AI/ML hype will die out, or you’ll be replaced by some third party service, and you’ll be left with no job, having to relearn how to be a regular software engineer.
It is much more focused on the engineering than the maths of deep learning itself.
I feel like the market is saturated with machine learning. Everyone seems to be doing machine learning these days.
EDIT: For downvoters, that's how I did it. I was a very successful SWEng (some of my work was among top posts on HN under different nicks) but saw the ball rolling towards ML in 2012 so I reskilled.
It helped that I got a PhD in Physics so I knew a lot of math already, also I had been interested in ML for text classification circa 2005 or so. From 2001 to 2010 or so I was busy with a wide range of web applications such as: a social network for a secret society, a blog for a local political party with an integrated telephone response system, an application tracking system for a nanotechnology internship program, etc. There was plenty of brownfield work in there.
I had worked on a series of side projects that got attention and landed me a job as a "relevance architect" (recommender systems and such for a new social media site that softlaunched but didn't get big) and then an ML software engineer where I completed a search engine for patents based on a neural network.
After that I went through a phase of doing random consulting projects but also trying to start my own startup around data engineering problems that didn't get the support I needed. I learned Python because there was huge amounts of Python work in this space. I did a project involving LSTM networks for text around this time.
When I threw in the towel I joined up with a company where I sat between the engineering and data science teams. Over the course of a year I figured out most of why our Python systems were not entirely reliable, but the company was pivoting a lot and towards the end I was writing more TypeScript + Scala. Even though I had the physics education, I felt my long experience as an applications developer was more important to this role: we had data scientists who were better at creating models than me, but most of them worked in Jupyter notebooks and didn't have a clear idea of how to take the code they wrote and make it reusable: how to make the "monthly sales report" as opposed to the "April sales report". It was my role to bring that discipline, in terms of collaboration with the data scientists, the process we used, and as embodied in the software I developed.
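The "monthly sales report vs. April sales report" distinction mostly comes down to parameterizing what a notebook hard-codes. A trivial sketch of what that refactor looks like (the field names and data are hypothetical):

```python
from datetime import date

# Notebook style hard-codes APRIL = 4 and only ever produces the April report.
# Parameterizing the month/year makes the same code the "monthly" report.
def sales_report(rows: list[dict], month: int, year: int) -> float:
    """Total sales for the given month; rows carry 'date' and 'amount' keys."""
    return sum(
        r["amount"]
        for r in rows
        if r["date"].year == year and r["date"].month == month
    )

rows = [
    {"date": date(2024, 4, 3), "amount": 100.0},
    {"date": date(2024, 4, 20), "amount": 50.0},
    {"date": date(2024, 5, 1), "amount": 75.0},
]
```

The same function now runs unchanged in a scheduler for any month, which is the reusability step most notebook code skips.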
It was a great experience but also a bit disorganized, as we were trying to develop a product while also having to change tack every week to accommodate project work for A-list companies we had as customers. We had a system called "Themis" that was used to build training sets; it worked, but I disagreed with it architecturally (the #1 requirement is not sweating it when your A-list customer needs something really different). I wrote up a description of a system called "Nemesis" which the company didn't go with, but I developed a number of (image|document|comment|job advertisement)-sorters on my own account afterwards that were all called "Nemesis", until I had a vision which led to YOShInOn and the newer FraXinus, which is meant to be an everything-sorter. We were working on CNN-text models at this time; BERT came out and we thought it was a big advance, but we had no idea how big it would be.
I was burned out from working remote, so I went looking for a local job, which means doing more ordinary stuff like React programming, but sometimes I get to do more systems-oriented work such as writing parsers and code generators. At the end of my ML phase I felt that (1) it was more important to have the appropriate labeled data rather than the best models, and (2) UI was the bottleneck for (1), so it was worth "getting gud" at UI, so I have. I still have my side projects.
This book is a very good introduction to designing Machine Learning Systems for production: https://www.amazon.com/Designing-Machine-Learning-Systems-Pr...
This blog by the same author is highly recommended as an intro into building production grade AI and ML systems: https://huyenchip.com/2023/04/11/llm-engineering.html
To summarize answers to your questions:
(1) Yes, it is wise to make this transition, especially at an inflection point in the zeitgeist like now.
(2) Yes
(3) See above for resources on how to get started and reach mastery in the craft of ML Engineering.