[Alpine-info] UTF-16 and header view of UTF-16 emails

Robert Wolf-conf via Alpine-info alpine-info at u.washington.edu
Sun Jun 1 14:24:42 PDT 2025


Dear Eduardo,

I am sending this email to the Alpine mailing list, with you in Bcc.
Maybe someone else is interested too. And if the mailing list does not
accept my email with attachments, you will still get it directly.

==================================================

In our company we use Exchange Online and back it up with Veeam. We
get the backup reports by email.

The email contains the report as an HTML table. The email body is
either an HTML message (if the table is short), or the overall status
as an HTML message with the full report as an HTML attachment.

There are multiple problems with viewing this email in Alpine.

==================================================

The first problem is that the HTML message is UTF-16 text without a
BOM. According to RFC 2781, section 4.3, UTF-16 text without a BOM
should be interpreted as big-endian (UTF-16BE). But the email does not
come from the normal world; it comes from the Microsoft world.
Microsoft uses little-endian, so the text is in fact UTF-16LE. If
software follows RFC 2781 and tries to display the UTF-16LE message as
UTF-16BE, the result is garbage. But this is not an Alpine problem:
Thunderbird cannot display these emails correctly either. Only
Microsoft Outlook or the Exchange web client can, of course.
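
To illustrate the byte-order mismatch, here is a minimal Python sketch.
The sample text is made up and is not taken from the Veeam report. It
decodes the same BOM-less bytes once as big-endian (the RFC 2781
default) and once as little-endian (what the sender actually wrote).

```python
# Hypothetical sample text; the real report is not reproduced here.
text = "Backup OK"

# What the sender effectively emits: UTF-16LE bytes without a BOM.
raw = text.encode("utf-16-le")

# RFC 2781 section 4.3 reading: no BOM, so assume big-endian.
as_be = raw.decode("utf-16-be")

# What the sender meant: little-endian.
as_le = raw.decode("utf-16-le")

print(repr(as_be))  # unrelated characters, one per byte-swapped code unit
print(repr(as_le))  # the original text
```

Every code unit is byte-swapped in the big-endian reading, so plain
ASCII letters turn into unrelated (mostly CJK) characters.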

--------------------------------------------------

But I have tried to find out whether someone has already tried to
solve this problem. In fact, the solution is probably simple: just
change charset="utf-16" to charset="utf-16le" and the message is
corrected. But not in Alpine. And this is the problem that I want to
discuss with you.
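
As a rough sketch of that relabeling, using Python's standard email
package: the function name relabel_bomless_utf16 and the sample message
are hypothetical, and the rule "no BOM means little-endian" is my
assumption for mails from this sender (it is the opposite of the
RFC 2781 default). It only handles single-part messages.

```python
import base64
import email


def relabel_bomless_utf16(raw_message: bytes) -> bytes:
    """Relabel charset="utf-16" as utf-16le when the body has no BOM.

    Hypothetical helper; assumes BOM-less UTF-16 from this sender is LE.
    """
    msg = email.message_from_bytes(raw_message)
    # Undo the Content-Transfer-Encoding (e.g. base64) to see the bytes.
    payload = msg.get_payload(decode=True) or b""
    has_bom = payload[:2] in (b"\xff\xfe", b"\xfe\xff")
    if msg.get_content_charset() == "utf-16" and not has_bom:
        msg.set_param("charset", "utf-16le")
    return msg.as_bytes()


# Made-up test message: BOM-less UTF-16LE body, labeled only "utf-16".
body = base64.b64encode("Backup OK".encode("utf-16-le")).decode("ascii")
sample = (
    "Subject: test\r\n"
    "MIME-Version: 1.0\r\n"
    'Content-Type: text/html; charset="utf-16"\r\n'
    "Content-Transfer-Encoding: base64\r\n"
    "\r\n" + body + "\r\n"
).encode("ascii")

fixed = email.message_from_bytes(relabel_bomless_utf16(sample))
decoded = fixed.get_payload(decode=True).decode(fixed.get_content_charset())
print(fixed.get_content_charset(), repr(decoded))
```

A client that honours the utf-16le label can then decode the body
without guessing the byte order.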

I was lucky to find an old bug report,
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=726739 with a patch
proposal from Thorsten Glaser. You discussed it with him and asked him
for some examples, but he has not sent any.

I have now played with his patch for the c-client library and have
created example emails (attached in an MBOX file).

In Alpine 2.26 (without Thorsten's UTF-16 c-client library patch):

- message "[2] test mail message only without attachment; message
utf-16 with BOM utf-16le" is displayed as invalid UTF-16 characters

- message "[4] test mail message only without attachment; message
utf-16 with BOM utf-16be" is displayed correctly

- message "[6] test mail message only without attachment; message
utf-16le without BOM" is displayed incorrectly, as UTF-8 with a NUL
character after each correct character (little-endian)

- message "[8] test mail message only without attachment; message
utf-16be without BOM" is displayed incorrectly, as UTF-8 with a NUL
character before each correct character (big-endian)

- message "[10] test mail message only without attachment; message
utf-16le with BOM" is displayed incorrectly, as UTF-8 with a NUL
character after each correct character (little-endian) and question
marks at the beginning (BOM LE)

- message "[12] test mail message only without attachment; message
utf-16be with BOM" is displayed incorrectly, as UTF-8 with a NUL
character before each correct character (big-endian) and question
marks at the beginning (BOM BE)
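
The byte patterns behind these symptoms can be reproduced with a small
Python sketch. The sample text and the naive_view helper are
hypothetical. naive_view only stands in for "treat the body as UTF-8"
and is not a claim about what Alpine does internally.

```python
import codecs

text = "Hi"  # hypothetical sample text

le = text.encode("utf-16-le")      # NUL after each character
be = text.encode("utf-16-be")      # NUL before each character
le_bom = codecs.BOM_UTF16_LE + le  # prefixed with FF FE
be_bom = codecs.BOM_UTF16_BE + be  # prefixed with FE FF


def naive_view(data: bytes) -> str:
    """Hypothetical stand-in for a viewer that treats the body as UTF-8."""
    return data.decode("utf-8", errors="replace")


cases = [("LE", le), ("BE", be), ("LE+BOM", le_bom), ("BE+BOM", be_bom)]
for label, data in cases:
    print(label, data.hex(" "), repr(naive_view(data)))
```

The two BOM bytes are not valid UTF-8, so each becomes a replacement
character, which a terminal typically shows as a question mark.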

If I apply Thorsten's patch for the c-client library, all the messages
are displayed correctly. Could you verify this with the MBOX file and
the patch? Thank you.

==================================================

Then I have a second problem. It concerns the second patch that
Thorsten sent, which updates mailview.c for the "header" view.

If you look at the example messages in header view in the original
Alpine (without Thorsten's patches), then

- messages "[2] test mail message only without attachment; message
utf-16 with BOM utf-16le" and "[4] test mail message only without
attachment; message utf-16 with BOM utf-16be", i.e. the messages with
charset="utf-16" and a BOM (whether BE or LE), are both displayed
incorrectly in header view, as invalid UTF-16 characters, although
the body is base64 encoded. It looks like something tries to decode
the base64-encoded content into UTF-16 based on the charset, instead
of displaying the base64 as plain ASCII.

- the other messages, i.e. charset utf-16be or utf-16le with or
without BOM, are displayed correctly in header view as base64 text
(these are the messages that are not correctly converted from UTF-16
to UTF-8 in the normal message view)

--------------------------------------------------

But if I apply Thorsten's UTF-16 patch to the c-client library, then
no message has its content displayed correctly in header view. It
looks to me as if mailview, once it knows the charset of the message,
tries to decode even the "source code" of the message from the known
charset to (probably) UTF-8, although it should not convert between
charsets and should display the plain ASCII base64 text. This is what
Thorsten tried to explain to you.
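
A small Python sketch of what I think happens (the sample text is made
up, and this is only my reading of the symptom, not the actual mailview
code): the raw body is ASCII base64, and running a UTF-16 conversion
over it pairs up the ASCII bytes into unrelated characters.

```python
import base64

# Hypothetical body: UTF-16LE text, base64 encoded as in the raw message.
body = "Backup OK".encode("utf-16-le")
source = base64.b64encode(body)  # what the raw message source contains

# What header view should show: the base64 text as plain ASCII.
shown_correct = source.decode("ascii")

# What a charset conversion of the source produces (assumed behaviour):
# every two ASCII bytes are read as one UTF-16 code unit.
shown_wrong = source.decode("utf-16-le", errors="replace")

print(shown_correct)
print(repr(shown_wrong))
```

The wrongly converted string has half as many characters as the base64
text, and none of them are ASCII.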

If I apply Thorsten's header-view patch in addition to the UTF-16
patch, then Alpine correctly displays the base64 text of all messages
in header view.

--------------------------------------------------

Now comes Thorsten's main problem. As he wrote, "... this is
a separate item which popped up during my tests with UTF-16 eMails
...". If I apply Thorsten's UTF-16 patch, then every test message is
displayed incorrectly in header view. I presume that mailview gets the
correct charset from the c-client library, i.e. the charset is known,
and therefore tries to convert the "source code" (base64 text) from
the known charset (any UTF-16) to UTF-8, which generates incorrect
output with invalid UTF-16 characters. If I apply both of Thorsten's
patches, UTF-16 and header-view, then every message text in normal
view and the base64 text in header view are displayed correctly.

==================================================

Could you please take a look at this problem and at both patches, and
see whether you could include them in Alpine? I know UTF-16 is
probably not used much; these emails from Veeam + Office 365 are the
first UTF-16 emails I have seen. But it would be great if the patches
could be included to improve and extend Alpine's features.

The problem is not critical; I can apply the patches myself for now,
so there is no need to hurry. Just take a look whenever you find some
time.

Thank you very much.

Regards,

Robert.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: alpine.patches.zip
Type: application/zip
Size: 3372 bytes
Desc: not available
URL: <http://mailman23.u.washington.edu/pipermail/alpine-info/attachments/20250601/214d036f/attachment.zip>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: utf-16-mbox-test.zip
Type: application/zip
Size: 1153 bytes
Desc: not available
URL: <http://mailman23.u.washington.edu/pipermail/alpine-info/attachments/20250601/214d036f/attachment-0001.zip>

