Companies that create data for FM (foundation model) companies have been hiring people with degrees for years
> Invisible Tech employs 5,000 specialized trainers globally
Some of those companies have almost a million freelancers on their platforms, so 5k is honestly kinda medium-sized.
> It takes smart humans to avoid hallucinations in AI
Many smart humans fail at critical thinking. I've seen people with master's degrees fail to spot hallucinations in elementary-level word problems.
After using LLMs daily for some time, I've developed a feel for how to phrase my requests so as to get better-quality answers.
For example, I make sure it can process the information linearly: when asking it to classify items in a list, I have it put the label directly after each item so the original order is preserved, instead of letting it split the output into multiple lists (which it tends to do by default).
So, at least for me, the prompts are getting smarter.
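The inline-labeling pattern described above can be sketched in a few lines. This is purely illustrative (no real model API is called, and all function names here are made up): it builds a prompt that asks for `item -> label` lines in place, then parses a well-behaved reply back into ordered pairs.

```python
# Sketch of the "label directly after the item" prompting pattern.
# No LLM call is made; build_prompt and parse_response are hypothetical
# helpers, not part of any real API.

def build_prompt(items, labels):
    """Ask the model to repeat each item and append its label inline."""
    lines = [
        "Classify each item below. Repeat the item, then append",
        f"' -> <label>' where <label> is one of: {', '.join(labels)}.",
        "Keep the items in their original order, one per line.",
        "",
    ]
    lines += items
    return "\n".join(lines)

def parse_response(response):
    """Parse 'item -> label' lines back into ordered (item, label) pairs."""
    pairs = []
    for line in response.splitlines():
        if " -> " in line:
            item, label = line.rsplit(" -> ", 1)
            pairs.append((item.strip(), label.strip()))
    return pairs

prompt = build_prompt(["apple", "carrot", "salmon"],
                      ["fruit", "vegetable", "fish"])
# A well-behaved reply keeps the input order, so parsing preserves it:
reply = "apple -> fruit\ncarrot -> vegetable\nsalmon -> fish"
print(parse_response(reply))
# -> [('apple', 'fruit'), ('carrot', 'vegetable'), ('salmon', 'fish')]
```

Because the label rides along with each item, a dropped or reordered line is immediately visible, which is exactly what a separate per-label list would hide.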
someone should write a story about that!
First and foremost, a chatbot generates plenty of new data (plus feedback!), but you can also commission new high-quality content.
Karpathy recently commented that GPT-3 needs so many parameters because most of the training set is garbage, and that he expects a GPT-2-sized model could eventually reach GPT-3 level if trained exclusively on high-quality textbooks.
This is one of the ways you get textbooks to push the frontier capabilities.
The role played by humans on the training side is of little interest when considering the technology from a user's perspective.
I feel as though it 'extracts' some sort of "smartness" out of me (if any), and whatever intelligence I contribute becomes part of Google Gemini.
This is why I would never want to pay to use these tools: anything good that comes from me in the chat becomes Google's through AI training, which is OK as long as the tool is free to use.
i.e. I won't pay to make their stuff better through my own work.
It was a real challenge. I managed to come up with a handful of questions that tripped up the models, but it was clear they stumbled for pretty mundane reasons—outdated info or faulty string parsing due to tokenization. A common gripe among the trainers was the project's insistence on questions with clear-cut right/wrong answers. Many of us worked in fields where good research tends to be more nuanced and open to interpretation. I saw plenty of questions from other trainers that only had definitive answers if you bought into specific (and often contentious) theoretical frameworks in psychology, sociology, linguistics, history, and so on.
The AI company people running the projects seemed a bit out of their depth, too. Their detailed guidelines for us actually contained some fundamental contradictions that they had missed. (Ironically, when I ran those guidelines by Claude, ChatGPT, and Gemini, they all spotted the issues straight away.)
After finishing the project, I came away even more impressed by how smart the current models can be.