Dr Drang's script counts the number of _characters_, not the number of _glyphs_. This matters because there's more than one way to represent é: either as the single Unicode character \x{e9} ("NFC") or as a combination of "e" and the combining character that adds the accent ("NFD").
For example, for "léon" this prints out "l3n" for me rather than "l2n", because the accent is counted as a separate character.
What you need to do is normalize to NFC:
> /usr/bin/perl -C -MUnicode::Normalize -pe '$_=NFC($_);s/(.)(.+)(.)/$1 . length($2) . $3/e'
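For anyone who wants to see the difference outside Perl, here is a rough Python sketch of the same idea. The function name `abbreviate` and its `normalize` flag are made up for illustration; only `unicodedata.normalize` and `re.sub` are standard library calls.

```python
import re
import unicodedata


def abbreviate(text: str, normalize: bool = True) -> str:
    """Replace everything between the first and last character with its length.

    Hypothetical helper: mirrors the (.)(.+)(.) substitution, optionally
    normalizing to NFC first so that "e" + combining accent counts once.
    """
    if normalize:
        text = unicodedata.normalize("NFC", text)
    return re.sub(
        r"(.)(.+)(.)",
        lambda m: f"{m.group(1)}{len(m.group(2))}{m.group(3)}",
        text,
        count=1,
    )


decomposed = "le\u0301on"  # NFD: "e" followed by U+0301 COMBINING ACUTE ACCENT

print(abbreviate(decomposed, normalize=False))  # l3n
print(abbreviate(decomposed))                   # l2n
```

NFC only helps when a precomposed character exists; counting true glyphs in general would need grapheme cluster segmentation, which this sketch does not attempt.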
> perl -C -pe 's/(\w)(\w+)(\w)/$1 . length($2) . $3/ge'
Or for the less o4e among us, this v5n will only n10e words with l4h six and up:
> perl -C -pe 's/(\w)(\w\w\w\w+)(\w)/$1 . length($2) . $3/ge'
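A Python approximation of the word-by-word variant, in case it helps to see the length threshold spelled out. `numeronymize` and `min_length` are names I made up; note that `\w` also matches digits and underscores, same as in the Perl.

```python
import re


def numeronymize(text: str, min_length: int = 6) -> str:
    """Abbreviate every word of at least `min_length` characters.

    Hypothetical helper: a word of length n becomes first + (n - 2) + last,
    so the middle group must hold at least min_length - 2 characters.
    """
    middle_min = max(min_length - 2, 1)
    pattern = re.compile(rf"(\w)(\w{{{middle_min},}})(\w)")
    return pattern.sub(
        lambda m: f"{m.group(1)}{len(m.group(2))}{m.group(3)}", text
    )


print(numeronymize("this version will only numeronymize words"))
# this v5n will only n10e words
```

Passing `min_length=3` gives the behaviour of the plain `\w+` one-liner, where even "this" turns into "t2s".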
F3l v5n:
> perl -C -pe 's/(\p{L})(\p{L}*)(\p{L})/$1@{[length($2)]}$3/g'
N12g w5t i18n w3d n1t b0e c6e, t2s t2s a u1f-8 c8e v5n. I c2l i0t I16r-v1.0
새0로 오0신 모0든 분1께 인3고 싶2다.
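A sketch of the letters-only version in Python, with one caveat: the standard `re` module has no `\p{L}`, so this assumes `[^\W\d_]` is a close enough stand-in for "any Unicode letter". The names `LETTER` and `numeronymize_letters` are illustrative only.

```python
import re

# Assumption: "word character that is not a digit or underscore" approximates
# \p{L}; the standard re module does not support Unicode property classes.
LETTER = r"[^\W\d_]"
PATTERN = re.compile(rf"({LETTER})({LETTER}*)({LETTER})")


def numeronymize_letters(text: str) -> str:
    """Abbreviate every run of two or more letters, allowing an empty middle."""
    return PATTERN.sub(
        lambda m: f"{m.group(1)}{len(m.group(2))}{m.group(3)}", text
    )


print(numeronymize_letters("I call it"))  # I c2l i0t
print(numeronymize_letters("새로 오신"))    # 새0로 오0신
```

Because the middle group uses `*`, two-letter words get a 0 in the middle, which is where "i0t" and "새0로" come from.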
Tangent:
I worked at a large financial news site for a number of years.
One of our best engineers spun up an "a11y" sub-team. As it was quite involved and they went team to team doing things, I assumed it was some sort of dev tool initiative.
It was only after I left, when I was describing it as the "ally" team, that I was told what it meant.
It's like "banal": it's only when you say it out loud among (hopefully) friends that you realise you've not got it quite right...
edit: I do realize that I might be missing the joke entirely
https://mastodon.hccp.org/@igb/112734767519719978
> e14n -> "Andreessen Horowitz" is not a typo, it is a bit of an easter egg/joke (Sorry, I can't help myself.):
> "e14n" has recently shown up in social media as shorthand for @pluralistic's "enshittification" coinage. Andreessen Horowitz often refers to themselves using a numeronym: "a16z".
Ex: "accessibility localization internationalization multilingualization globalization" becomes "a11y l10n i18n m17n g11n", which could just as well be expanded to "applicability locomutation intercrystallization metaphenylenediamin gastrocnemian"
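To make the ambiguity concrete, here is a small Python sketch that groups words by their numeronym and reports the collisions. `numeronym` and `collisions` are hypothetical helpers, and the word list is just the one from the example.

```python
from collections import defaultdict


def numeronym(word: str) -> str:
    """First letter + count of letters in between + last letter."""
    if len(word) < 3:
        return word
    return f"{word[0]}{len(word) - 2}{word[-1]}"


def collisions(words: list[str]) -> dict[str, list[str]]:
    """Map each numeronym to its words, keeping only ambiguous ones."""
    groups: dict[str, list[str]] = defaultdict(list)
    for word in words:
        groups[numeronym(word)].append(word)
    return {key: group for key, group in groups.items() if len(group) > 1}


words = [
    "accessibility", "applicability",
    "localization", "locomutation",
    "internationalization", "intercrystallization",
]
print(collisions(words))
```

Run against a full dictionary instead of six words, the same grouping should show how little information a numeronym keeps.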