It depends on what type of role you want. If you'd be happy building the application layer and doing prompt engineering, just build applications that call LLM APIs.
If you want a research position at the top labs, the interviews really are passable by people without PhDs. They are focused on strong fundamentals. I've seen people make this leap, but it can take years of preparation: actually reading textbooks, implementing low-level details like backprop, re-implementing papers, and doing non-trivial personal projects. Essentially, you're self-studying a Master's degree. Blog about it. Post about it here. I've found that the people who make this transition generally just love learning.
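As one example of "implementing low-level details like backprop", here is a minimal sketch of reverse-mode automatic differentiation over scalars, similar in spirit to small teaching projects. The `Value` class and its methods are hypothetical names chosen for illustration, not any library's API; frameworks like PyTorch do the same bookkeeping over tensors.

```python
import math


class Value:
    """A scalar that remembers how it was computed, so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # pushes self.grad into the parents

    @staticmethod
    def _wrap(x):
        return x if isinstance(x, Value) else Value(x)

    def __add__(self, other):
        other = Value._wrap(other)
        out = Value(self.data + other.data, (self, other))

        def backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = backward
        return out

    def __mul__(self, other):
        other = Value._wrap(other)
        out = Value(self.data * other.data, (self, other))

        def backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))

        def backward():
            # d tanh(x)/dx = 1 - tanh(x)^2
            self.grad += (1 - t * t) * out.grad

        out._backward = backward
        return out

    def backward(self):
        """Run reverse-mode autodiff from this node (treated as the output)."""
        order, seen = [], set()

        def visit(node):
            # Depth-first topological sort: inputs are appended before outputs.
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0  # d(output)/d(output)
        for node in reversed(order):
            node._backward()  # chain rule, applied from the output backwards


# One neuron: y = tanh(w * x + b)
w, x, b = Value(0.5), Value(2.0), Value(-0.3)
y = (w * x + b).tanh()
y.backward()
print(y.data, w.grad, x.grad, b.grad)
```

Gradients accumulate with `+=` so that a value used twice (e.g. `a * a`) receives both contributions. Comparing these gradients against finite differences is a good sanity check when writing something like this yourself.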
Feel free to reach out if you’re in the EU (email in profile); we’re hiring. I’m also happy to give some pointers on how to approach these conversations.
If you want to apply AI, there are lots of really useful projects that just call the Anthropic or OpenAI API (or image models on replicate.com, etc.) for the AI part. That wasn't the case a few years ago, before we had general-purpose models. I have been doing a lot of those types of projects, and I don't have a machine learning background.
There are MLOps jobs that don't require a lot of machine learning knowledge.
There are ML researcher jobs that mostly involve training LLMs, which are more practical than theoretical.
Doing novel machine learning research, or at least significant variations on popular neural network architectures, is the only thing I think really requires years of study. But there is a very large gap between that type of work and web development, which is why I was very happy to see the progress in general-purpose models.
Because I'm still in academia, I had the opportunity to take a couple of classes on the subject. Three books I would recommend working through to make sure your foundation in ML and mathematics is solid:
- Pattern Recognition and Machine Learning by Christopher Bishop
- Mathematics for Machine Learning by Marc Peter Deisenroth et al.
- Deep Learning by Goodfellow, Bengio and Courville
All three are legally available online in some form. I can't say I have any experience in finding a job related to ML though.
Beyond that, there are hundreds of open source projects you can fork to start building intuition about what the inner loop of an AI dev project cycle looks like. You'll be surprised at how much of your web dev skill set remains relevant in these projects, particularly in UI-related tasks. Data scientists and folks of similar ilk default to notebooks, Gradio, Streamlit et al. to ship interfaces for their experiments. You, though, can build those interfaces yourself if you choose to (sometimes a notebook is enough), which can be a valuable differentiator for you as a candidate if you also have all the other skills needed to be productive in this space.
My own background is in distributed systems, with some full-stack and embedded work mixed in over the years. I started tinkering with ML projects back in 2012, when I first discovered AlexNet and resources were far more limited. I was still able to get productive relatively quickly, even though most of what I built wasn't practically applicable to my work on day one. Where my background became relevant was when I needed something approximating an MLOps pipeline for data processing, training, and eval. Most of the code you write for that isn't really specific to ML; it's nearly identical to CI/CD systems, but with the obvious infrastructure caveats native to ML workloads.
Nowadays though, especially if you're intrepid/resourceful, there is so much more learning material by comparison and much of what you work on can likely augment your day-to-day web dev tasks as well if you're creative enough.
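To illustrate the point about MLOps pipelines resembling CI/CD, here is a toy sketch: stages run in order, the run stops at the first failure, and the eval stage acts as a quality gate the way a test suite gates a deploy. The stage names, the context dict, and the "model" (just a mean) are hypothetical placeholders; a real pipeline would usually run on an orchestrator or CI system rather than hand-rolled code.

```python
from typing import Callable

# A stage takes the shared context dict and returns it (possibly updated),
# or raises an exception to fail the run, much like a CI job step.
Stage = Callable[[dict], dict]


def run_pipeline(stages: list[tuple[str, Stage]], context: dict) -> dict:
    for name, stage in stages:
        print(f"[pipeline] running {name}")
        try:
            context = stage(context)
        except Exception as exc:
            # Fail fast: later stages never run, as in a CI pipeline.
            raise RuntimeError(f"stage {name!r} failed: {exc}") from exc
    return context


def process_data(ctx: dict) -> dict:
    ctx["rows"] = [r for r in ctx["raw"] if r is not None]  # drop missing records
    return ctx


def train(ctx: dict) -> dict:
    # Hypothetical stand-in for training: the "model" predicts the mean.
    ctx["model"] = sum(ctx["rows"]) / len(ctx["rows"])
    return ctx


def evaluate(ctx: dict) -> dict:
    error = abs(ctx["model"] - ctx["expected"])
    if error > ctx["max_error"]:
        # The eval stage acts as a gate, like a failing test suite.
        raise ValueError(f"error {error:.3f} exceeds {ctx['max_error']}")
    ctx["eval_error"] = error
    return ctx


STAGES = [("process", process_data), ("train", train), ("eval", evaluate)]
result = run_pipeline(
    STAGES, {"raw": [1.0, None, 2.0, 3.0], "expected": 2.0, "max_error": 0.1}
)
print(result["eval_error"])
```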
I.e., verification that the application/model does the job correctly, and validation that the application/model is appropriate for the business case.
This is not exciting but it pays well and people with the skill set are extremely needed.
From where I stand Computer Vision seems like a really good area to start in machine learning. Good luck!
Then learn by symbiosis, while having AI on your resume :)
Feel free to get in touch if you want to chat more about this.
I think I would probably fit more into the AI Engineer category, since I would have to do a lot more study for AI research, but I do enjoy using existing models and libraries to accomplish tasks. I can also create toy models myself (I actually built a ConvNet in PyTorch to detect popup dialogs on my screen and alert me), but I'm nowhere near good enough to create genuinely novel approaches or architectures.
Working on AI can mean many different things.
If you're looking to pivot into a more research-y position in DS and/or AI research, I'd suggest getting an advanced degree in these fields.
If you're talking more about ML engineering, there are full-stack software engineer positions at some AI startups that require an assortment of skill sets such as web development, MLOps, and sometimes a bit of data engineering. You could look into these roles. Alternatively, since ML engineer is still an emerging position at a lot of organizations, some do not require prior experience and instead focus more on the candidate's portfolio. If you create some projects and build a strong portfolio, you might have a good chance.
Specifically with AI/ML, the urge might be to start from scratch, but I think there may be enough tooling to start where you are with web: build solutions using existing AI tech, then start customizing them and going deeper and deeper.
That makes for a natural story and journey too. It’s completely choose-your-own-adventure, so you can direct the vector of your path wherever you want and learn and build in that direction.
Contrast this with an open market. For example, I make a poster for a grocery store in my town; that poster is well liked, and I have the rights to reproduce it or make a similar new one. In every town there could be such activity, and in larger towns many people could do it. That naturally scales for participation, with transactions of payment and consumption in an open-market environment.
The AI space seems much more like building large projects in tight teams with serious resource requirements. The end products are more varied than most people realize, but there is a common thread of replacing skilled humans in jobs with some kind of automation, or extracting value from humans through monitoring and some kind of enforcement. In other words, it is in stark contrast to the open-market idea.
Honestly, I cannot be enthusiastic about putting the words "job" and "AI dev" in the same sentence. The real-world dynamics appear to be coalescing into high-powered, competing silos, with the side effect of replacing jobs in some cases.
I got my PhD in 1998, did a postdoc in Germany for a year, came back to the states, started doing remote work and consulting projects for web sites, worked on the arXiv preprint server for a few years, then worked on a pretty wide range of projects for pay and for side projects until I got interested in using automation to make large image collections on my own account circa 2008 or so.
I had a conversation with my supervisor that called into question whether I could ever be treated fairly where I was working, and two days later I got a call from a recruiter who was looking for a "relevance architect", which led to about a year and a half of work for a very disorganized startup. Then I got called by another recruiter who needed somebody to finish a neural network search engine for patents built on C++, Java and SIMD assembly.
After that I tried to start a business developing a next-generation data integration tool, did consulting projects, and learned Python because customers were asking for it. When I gave up on my own business, I went to work full-time as a "machine learning engineer" for a startup that was building something similar to the product I had in mind. That company was using CNNs for text (I had previously worked for one using RNNs), and that summer BERT came out and we realized it was important but not quite so important.
After that I wound up getting a more ordinary webdev job where I can actually go to an office, I still do ML and NLP-based side projects though.
Funnily enough, I am now working on text analysis projects that I first conceived of 20 years ago. I think technologically some of them could have worked back then, but they work so much better now with newer models.
---
My take is that the average 'data scientist' is oriented towards making the July sales report, not making a script that will make the monthly sales report. If you want repeatable results with ML, it really helps to apply the same kind of organizational thinking and discipline that we're used to in application development. I also believe getting training data is the bottleneck for most projects. If you have 5000 labeled examples and a 20-year-old classification model, you might get a useful classifier; you can get a much better classifier with a two-year-old model and a little more work; or you can try a model from last week's arXiv paper, spend 10-100x the effort, risk complete failure, and probably add 0.03 points to your ROC AUC.
If you don't have those 5000 examples, on the other hand, all you can do is download some model from Hugging Face and hope it is close enough to your problem to be useful.
My spurt of doing front-end-heavy work built up my UI skills, so I have done a lot of side project work building systems that let people label data.
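For readers coming from web development, the "ROC" figure mentioned above is usually the area under the ROC curve (ROC AUC), a standard metric for binary classifiers. A minimal sketch of computing it from scratch uses the rank-based (Mann-Whitney) formulation: the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counting half. The function name `roc_auc` is illustrative; in practice you would likely use a library function such as scikit-learn's `roc_auc_score`.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """ROC AUC via pairwise comparison of positive vs. negative scores.

    labels: 1 for positive, 0 for negative; scores: higher means "more positive".
    O(n_pos * n_neg), which is fine for illustration but slow for big datasets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# 3 of the 4 (positive, negative) pairs are ranked correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 is what random scoring gets and 1.0 is a perfect ranking, so "adding 0.03 points" means something like moving from 0.90 to 0.93.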
- "Real AI Scientist/Applied Scientist/Researcher" aka you do actual training/fine tuning of bleeding edge models. Very hot right now but competition is incredibly intense. Probably you need a PhD or some serious experience to compete. Get ready to do a bunch of independent learning if you're serious about this.
- "Fake AI Scientist/Applied Scientist/Researcher" - You work in a big corporation or maybe a confused startup who wants to staff out some internal AI teams but doesn't really have true expertise in the area. Maybe if you really knock it out of the park something you build will provide real customer value...one day.
- "Real AI-ML Engineer" Scientist work under a different name, or deployment/infra for custom models. More approachable than Real Scientist work but probably more focused on engineering chops, C++, CUDA, etc. Similar to Real Scientist in that you need to have some actual legit skills.
- "Fake AI-ML Engineer" Calling the OpenAI API and massaging the output into something that is possibly valuable but more likely is just an "AI Feature" on top of an existing application that provides real customer value.
- "Non-AI ML Engineer" You work with traditional ML like xgboost, probably in the financial world, and don't really interact with any of this stuff, unless your boss asks you to create a new AI Feature. You can now put this on your resume and hope to get a Fake AI-ML Engineer job, if you want to.
- "Real Data Science" this role is trending toward inhabiting more of a BI/Analytics space. IMO in Big Tech they are getting more serious about the stats/probability background here. In some ways I think this would be more difficult to upskill on than Deep Learning math, if you're starting without a math background.
- "Fake Data Science" Kind of a dying role, this is like "I just learned how to do pandas and scikit learn and I'm creating linear regressions for boomers". Honestly still some alpha here if you are a product-focused person in the right org. But maybe the title here should be more like Data Analyst++
Hope this helps. As for me, I'm a Non-AI ML Engineer who is pretty screwed, because if you search for ML Engineer jobs now, everyone wants you to know PyTorch.
If the former, I suggest digging into things like the excellent fast.ai course: https://course.fast.ai/
If the latter, the (relatively new) keyword you are looking for is likely "AI Engineer" - https://www.latent.space/p/ai-engineer
There's an argument that deep knowledge of how to train models isn't actually that useful when working with generative AI (LLMs etc.): knowing how to train or fine-tune a new model is less useful than developing knowledge of the other weird things you have to figure out about prompting, evals and using these models to build production-quality apps.
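To make "evals" a bit more concrete, here is a minimal sketch of an evaluation harness: it runs a fixed set of test cases through a model function and reports the pass rate. The `fake_model` stand-in, the `EvalCase` type and the check functions are all hypothetical; in a real app, the model function would wrap a call to whichever LLM API you use, and checks are often fuzzier than string matching (sometimes another model does the grading).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # returns True if the output is acceptable


def run_evals(model: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Run every case through the model and return the fraction that pass."""
    passed = 0
    for case in cases:
        output = model(case.prompt)
        ok = case.check(output)
        passed += ok  # bool counts as 0 or 1
        print(f"{'PASS' if ok else 'FAIL'}: {case.prompt!r} -> {output!r}")
    return passed / len(cases)


# Hypothetical stand-in for a real LLM call, so the harness runs offline.
def fake_model(prompt: str) -> str:
    canned = {"What is 2 + 2?": "4", "Capital of France?": "Paris."}
    return canned.get(prompt, "I don't know.")


cases = [
    EvalCase("What is 2 + 2?", lambda out: out.strip() == "4"),
    EvalCase("Capital of France?", lambda out: "paris" in out.lower()),
    EvalCase("Largest planet?", lambda out: "jupiter" in out.lower()),
]

score = run_evals(fake_model, cases)
print(f"pass rate: {score:.2f}")
```

Keeping the harness independent of any particular API means the same cases can be rerun whenever you change a prompt or swap models, which is much of what working on evals looks like day to day.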