Show HN: Semantic clusters and embeddings for 500k Hacker News comments

costco

Some users comment frequently and uniquely enough that they get their own cluster. I also like how there's "Deno vs Node" lol. It'd be cool if you could click on a comment and get the most similar ones by whatever distance metric you used.
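
For what it's worth, a minimal sketch of that "most similar comments" lookup, assuming cosine similarity as the metric (the post doesn't say which one was actually used). The names `cosine_similarity`, `most_similar`, and the toy three-dimensional vectors are made up for illustration; real embeddings would have hundreds of dimensions, and a brute-force scan like this would likely be swapped for an approximate nearest-neighbor index at 500k rows.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def most_similar(query_id, embeddings, k=3):
    """Return the ids of the k comments closest to query_id (brute force)."""
    query = embeddings[query_id]
    scored = [
        (cosine_similarity(query, vec), comment_id)
        for comment_id, vec in embeddings.items()
        if comment_id != query_id  # skip the clicked comment itself
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [comment_id for _, comment_id in scored[:k]]

# Hypothetical toy data: comment id -> embedding vector.
toy_embeddings = {
    "deno_vs_node": [0.9, 0.1, 0.0],
    "bun_runtime": [0.8, 0.2, 0.1],
    "llm_labels": [0.0, 0.1, 0.9],
}

nearest = most_similar("deno_vs_node", toy_embeddings, k=2)
```

Clicking a comment would then just call `most_similar` with that comment's id and render the returned ids.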

calebheinzman

Interesting breakdown of the dataset. Are you guys manually assigning labels to the clusters after the fact or is this using some kind of LLM to create a cluster name?

politelemon

I'm not sure how to view the embedding. I clicked on the graph and narrowed down to a comment, but it only shows the row and not the raw array? (Or I've misunderstood.)

LordGrey

I'm not sure about others here, but I occasionally spend time typing out a long reply to someone and then simply deleting the reply without posting. Most of the time I conclude -- a little too late -- that the effort was not worth it.

How nice it would be to have an LLM trained on all of my previous writings and simply be able to click a button to indicate "reply to this person, please." I know I don't have enough training data from HN, and maybe not even from all of the sites I contribute to, combined. It is still a nice thought, though.

But: Let's say I do acquire enough training data to have a local LLM do exactly what I describe. My volume of "replies" would certainly increase. Is that a good thing, on average? If the tool became ubiquitous, would it be a good thing for the average social media user? Or more pointedly, would it be a good thing for consumers of that social media? The cynic in me thinks "no" -- the effort required today surely weeds out _some_ idiots....

(Full disclosure, I nearly closed this window without clicking the "add comment" button.)

giancarlostoro

Rough view on mobile: the navigation isn't auto-hiding and gets in the way.

GeoAtreides

I did not consent to my HN data being used by entities other than HN[1]. Please remove all my comments and data from this dataset.

[1] As per HN license: https://www.ycombinator.com/legal/