The rest of my blog: https://www.catmonad.xyz/blog/
Published: September 24th, 2024. Last updated: October 15th, 2024.
Hey everyone.
I have spent a lot of time explaining Unicode to programmers who just wanted to move on with their projects. While I am considering writing a blog post covering common misconceptions, and simple approaches to getting text processing right, I am short on time.
Instead, take a look at these articles, which are all wonderful!
This is something of a historical piece, from just about 21 years ago now. It correctly conveys the notion that you must know the particular encoding of a piece of text data in order to do anything at all with it. It also serves as a decent introduction to how wild text encodings got, especially back when it was far from settled which one we should use.
Nowadays, UTF-8 should be used for new data, and old systems should be migrated to use it, at the hazard of otherwise facing legal problems.
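To make the "you must know the encoding" point concrete, here is a minimal Python sketch. It is not taken from the linked article; the sample string and variable names are arbitrary choices for illustration. It shows the same bytes producing different text depending on which encoding you assume, and a strict decoder rejecting bytes that were written in a different encoding.

```python
# Illustrative sketch: the sample text is an arbitrary assumption.
text = "caf\u00e9"  # "café", written with an escape so the source file's own encoding is irrelevant

# Encode once as UTF-8: the "é" becomes the two bytes 0xC3 0xA9.
raw = text.encode("utf-8")

# Decoding with the right encoding round-trips cleanly.
as_utf8 = raw.decode("utf-8")

# Decoding the same bytes as Latin-1 "succeeds" but yields mojibake,
# because every byte is a valid Latin-1 character.
as_latin1 = raw.decode("latin-1")

# Going the other way: Latin-1 bytes are not necessarily valid UTF-8,
# so a strict UTF-8 decoder raises instead of guessing.
latin1_bytes = text.encode("latin-1")
try:
    latin1_bytes.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(len(raw), as_utf8, as_latin1, strict_failed)
```

Note that the Latin-1 decode never fails, which is exactly why a wrong guess can go unnoticed until someone sees garbled text.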
I also feel compelled to note after this one that you can use
<meta charset="utf-8" />
in the head of your HTML documents; since HTML5, it is equivalent to
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
(Note that HTML5 specifies UTF-8 as the only valid encoding.)
Moving on…
lord.io (October 28th, 2019)
There are other great posts on the subject I have read, and I wish I’d started keeping this list sooner.
Whenever I encounter, or rediscover, a post that I think should be on here, I’ll add it.
Added Henri Sivonen’s post, https://hsivonen.fi/string-length/, which I located again while reading ThePhD’s post “5 Years Later: The First Win” and happening to click through to Henri Sivonen’s blog, which was linked there.