Translated by AI

The content below is an AI-generated translation. This is an experimental feature, and may contain errors.

The Mystery of Latin-1 and 0x85


Overview

Proton Calendar is a privacy-first calendar service provided by Proton. By using Proton Calendar, you can protect your schedule with end-to-end encryption while also being able to invite others to your calendar or share it via a URL.

When you share a calendar from Proton Calendar via a URL, the schedule can be downloaded in iCalendar format. iCalendar is a plain-text format that describes a schedule as a sequence of lines, and every line must end with CRLF. Any other line break characters must therefore be converted to CRLF.

In this article, I first explain the characteristics of iCalendar and of character encodings from the perspective of line break characters. I then describe a bug that corrupted the iCalendar files downloadable through Proton Calendar's URL sharing feature. This bug has already been fixed and can no longer be reproduced. By comparing the iCalendar files from that time, attached at the end of the article, you can see for yourself how the bug manifested.

What is Proton Calendar?

Proton is a privacy-first integrated service that provides email, calendar, cloud storage, and VPN connections.

Originally, it was a service providing only email under the name ProtonMail, and until quite recently, it was operated under the protonmail.com domain. Subsequently, as new services like calendar and cloud storage were added, it shifted its direction towards becoming a privacy protection ecosystem. In May 2022, the entire service was unified under the Proton brand, with a major overhaul of the site design and logo[1]. At the same time, the main domain was replaced with proton.me, which does not include "mail."

Proton Calendar is the calendar service provided by Proton. Unlike general calendar services such as Google Calendar, its major feature is that registered events are protected by end-to-end encryption.

If you simply want to protect your schedule from a malicious operator, using a calendar app that stores data only locally might be sufficient. However, it is inconvenient if you want to check a schedule registered on your home desktop PC while you are out, and an unexpected SSD crash could result in the loss of important plans. With Proton Calendar, you can sync and store your schedule without exposing your privacy to the operator.

Furthermore, Proton Calendar has a feature to share calendars while maintaining security[2]. The sharing level can be selected from "Full (all events)" or "Limited (availability only)," and you can share the iCalendar format file via a dedicated URL. A dedicated URL looks like this:

https://calendar.proton.me/api/calendar/v1/url/E1KR01K_mSkB4xNfWKcWxPdBI-4XgM4_9L1lor6u_M4b3W3SnYTbDOER4DxkIoqdNC-XyXS90bN2i5LQ_8si-Q==/calendar.ics?CacheKey=azCo4bm52XVbcRaKJshNNQ%3D%3D&PassphraseKey=lVBEVmWy0xKQ8c6rXPkjMNnQkeCuVJ2bgTl-M9ny9DI%3D
  • What appears to be the calendar ID: E1KR01K_mSkB4xNfWKcWxPdBI-4XgM4_9L1lor6u_M4b3W3SnYTbDOER4DxkIoqdNC-XyXS90bN2i5LQ_8si-Q==
  • CacheKey: azCo4bm52XVbcRaKJshNNQ==
  • PassphraseKey: lVBEVmWy0xKQ8c6rXPkjMNnQkeCuVJ2bgTl-M9ny9DI=
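The three parts above can be pulled out of the share URL with Python's standard `urllib.parse`. This is only a sketch. The labels `calendar_id`, `cache_key` and `passphrase_key` are my own, not names Proton uses. Note that `parse_qs` percent-decodes the query values, which is why `%3D%3D` in the URL appears as `==` in the list.

```python
from urllib.parse import urlsplit, parse_qs

SHARE_URL = (
    "https://calendar.proton.me/api/calendar/v1/url/"
    "E1KR01K_mSkB4xNfWKcWxPdBI-4XgM4_9L1lor6u_M4b3W3SnYTbDOER4DxkIoqdNC-XyXS90bN2i5LQ_8si-Q=="
    "/calendar.ics?CacheKey=azCo4bm52XVbcRaKJshNNQ%3D%3D"
    "&PassphraseKey=lVBEVmWy0xKQ8c6rXPkjMNnQkeCuVJ2bgTl-M9ny9DI%3D"
)


def split_share_url(url: str) -> dict[str, str]:
    """Split a share URL into its parts (the dictionary keys are hypothetical labels)."""
    parts = urlsplit(url)
    # The path looks like /api/calendar/v1/url/<calendar id>/calendar.ics
    calendar_id = parts.path.split("/")[-2]
    # parse_qs percent-decodes each value, so "%3D" becomes "="
    query = parse_qs(parts.query)
    return {
        "calendar_id": calendar_id,
        "cache_key": query["CacheKey"][0],
        "passphrase_key": query["PassphraseKey"][0],
    }


for name, value in split_share_url(SHARE_URL).items():
    print(f"{name}: {value}")
```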

From this URL, you can view one of the calendars I have registered in Proton Calendar. As I mentioned in my earlier article Passing fragments and keys in URLs, keys passed through a URL should be placed in the fragment (the part after #), because browsers never send the fragment to the server. That is not an option here, however, because the file must be downloadable directly. The server needs the key to produce the file. If the key were passed in a fragment, decryption would have to be done in the browser with JavaScript.

Of course, as long as safeguards such as Zero-access encryption work as intended, this should not be a major issue.
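The fragment point can be made concrete with a small Python sketch of how an HTTP client derives the request target from a URL. `request_target` is a hypothetical helper, and the example URL, with a key placed in the fragment, is made up for illustration. `urlsplit` separates the fragment, and the fragment never becomes part of what is sent to the server.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Return the path and query an HTTP client puts on the request line.

    The fragment (everything after '#') is left out, so a key stored there
    never reaches the server.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


# Hypothetical URL: the key sits in the fragment and is not transmitted.
print(request_target("https://example.com/calendar.ics?CacheKey=abc#PassphraseKey=secret"))
# prints "/calendar.ics?CacheKey=abc"
```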

What is iCalendar?

Now, by downloading the iCalendar format file from the URL shown earlier, you will obtain text like the following:

BEGIN:VCALENDAR
PRODID:-//Proton AG//ProtonCalendar 1.0.0//EN
VERSION:2.0
BEGIN:VTIMEZONE
TZID:UTC
...
END:VTIMEZONE
BEGIN:VEVENT
UID:fPDE6TvVtlU-ZKW0agLtxHMTudmJ@proton.me
DTSTAMP:20210819T234541Z
SUMMARY:ホシノ
DTSTART;VALUE=DATE:20210102
DTEND;VALUE=DATE:20210103
SEQUENCE:0
RRULE:FREQ=YEARLY
STATUS:CONFIRMED
END:VEVENT
BEGIN:VEVENT
...
BEGIN:VEVENT
UID:lVrwGDF9KYqoY2kUbVFtfYjQMKJ7@proton.me
DTSTAMP:20211201T152307Z
SUMMARY:ナツ
DTSTART;VALUE=DATE:20201204
DTEND;VALUE=DATE:20201205
SEQUENCE:1
RRULE:FREQ=YEARLY
STATUS:CONFIRMED
END:VEVENT
END:VCALENDAR

iCalendar is the standard format for schedules defined in RFC 5545. It is plain text made up of content lines of the form NAME:VALUE, optionally with parameters as in DTSTART;VALUE=DATE:20210102. Each line is terminated by a line break (CRLF: \x0d\x0a), and UTF-8 is the default encoding. Lines should be no longer than 75 octets, excluding the line break. A longer line is folded by inserting a line break followed by a single space or tab, which tells the parser to continue the previous line.
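Folding can be sketched in Python as follows. This is a minimal illustration, not a complete iCalendar writer. `fold_line` and `unfold` are hypothetical helpers. The sketch splits only between characters so that no UTF-8 multi-byte sequence is cut in half.

```python
import re


def fold_line(line: str, limit: int = 75) -> bytes:
    """Fold one content line into CRLF-terminated chunks of at most `limit` octets.

    Continuation lines start with a single space, which counts toward the limit.
    Splits happen only between characters, never inside a UTF-8 sequence.
    """
    chunks: list[bytes] = []
    current = b""
    for ch in line:
        encoded = ch.encode("utf-8")
        if len(current) + len(encoded) > limit:
            chunks.append(current)
            current = b" "  # a continuation line starts with one space
        current += encoded
    chunks.append(current)
    return b"\r\n".join(chunks) + b"\r\n"


def unfold(data: bytes) -> bytes:
    """Remove every CRLF immediately followed by one space or tab (and that whitespace)."""
    return re.sub(rb"\r\n[ \t]", b"", data)


folded = fold_line("SUMMARY:" + "⅐" * 40)
for physical_line in folded.split(b"\r\n")[:-1]:
    print(len(physical_line), physical_line.decode("utf-8"))
```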

Now, the fact that CRLF specifically is required as the line break is quite troublesome. On systems where LF or CR alone is the usual line break, non-CRLF line breaks can easily slip in if you are not careful. Moreover, the mistake is hard to notice unless you use an editor that visualizes each type of line break or inspect the raw bytes.
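One way to catch stray line breaks without a special editor is to scan the raw bytes. The sketch below reports every CR or LF that is not part of a CRLF pair; `find_bare_line_breaks` is a hypothetical helper. It deliberately checks only CR and LF, not the other line break characters discussed next.

```python
import re

# A CR not followed by LF, or an LF not preceded by CR.
BARE_BREAK = re.compile(rb"\r(?!\n)|(?<!\r)\n")


def find_bare_line_breaks(data: bytes) -> list[tuple[int, bytes]]:
    """Return (offset, byte) for every CR or LF in `data` that is not part of CRLF."""
    return [(m.start(), m.group()) for m in BARE_BREAK.finditer(data)]


print(find_bare_line_breaks(b"BEGIN:VCALENDAR\r\nVERSION:2.0\nEND:VCALENDAR\r\n"))
```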

Furthermore, LF (\x0a) and CR (\x0d) are not the only characters treated as line breaks. Unicode also defines the Line Separator (LS: U+2028) and the Paragraph Separator (PS: U+2029)[3]. These characters are used far less often than LF or CR. Even so, when writing iCalendar they must be replaced with CRLF, or with the escape sequence \n (a backslash followed by n) if the line break is inside a value.

Additionally, in the character set ISO-8859-1, commonly called Latin-1, \x85 is mapped to a line break character called Next Line (NEL). This carries over into Unicode, where U+0085 likewise represents NEL.
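Python's `str.splitlines()` also treats U+0085, U+2028 and U+2029 as line boundaries, which is a quick way to confirm that these are real line breaks. The sketch below escapes a TEXT property value, turning each of these line breaks into the two-character sequence \n. `escape_text_value` is a hypothetical helper. It also escapes backslash, semicolon and comma, as RFC 5545 requires for TEXT values.

```python
import re

# Line breaks discussed above: CRLF, LF, CR, NEL, LS, PS.
# CRLF comes first so that it is consumed as a single break.
UNICODE_BREAK = re.compile("\r\n|[\n\r\u0085\u2028\u2029]")


def escape_text_value(value: str) -> str:
    """Escape a TEXT value: backslash first, then ';' and ',', then every line break."""
    value = value.replace("\\", "\\\\").replace(";", "\\;").replace(",", "\\,")
    return UNICODE_BREAK.sub(r"\\n", value)  # replacement is a literal backslash + "n"


print("a\u2028b\u0085c".splitlines())        # Python sees three lines
print(escape_text_value("a\u2028b\u0085c"))
```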

What is ISO-8859-1?

ISO/IEC 8859-1 is an 8-bit character set that ISO developed in response to the proliferation of proprietary schemes (so-called extended ASCII) that extended 7-bit ASCII to 8 bits or more. ISO/IEC 8859-1 defines no characters in the C0 range (0x00 to 0x1f, plus 0x7f) or the C1 range (0x80 to 0x9f). That leaves a total of 191 printable characters: letters, digits and symbols.

ISO-8859-1 (with a hyphen after "ISO") is the variant that assigns control characters to all 65 of these unused positions. On Windows, a code page called Windows-1252 was defined instead. It maps most of the C1 range to printable characters; for example, 0x85 is the horizontal ellipsis (…). It seems that in the past, text in one of these code pages was often misread as the other, so symbols in the C1 range ended up being treated as control characters.
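The difference is easy to see with Python's built-in codecs. `latin-1` maps each byte 0xNN to U+00NN, control characters included, while `cp1252` maps most of 0x80 to 0x9f to printable symbols. `decode_single_byte` is a hypothetical helper. Python's `cp1252` codec raises an error for the five bytes Windows-1252 leaves undefined (0x81, 0x8d, 0x8f, 0x90, 0x9d).

```python
def decode_single_byte(value: int) -> tuple[str, str]:
    """Decode one byte as ISO-8859-1 and as Windows-1252."""
    raw = bytes([value])
    return raw.decode("latin-1"), raw.decode("cp1252")


latin1, cp1252 = decode_single_byte(0x85)
print(f"latin-1: U+{ord(latin1):04X}, printable={latin1.isprintable()}")
# prints "latin-1: U+0085, printable=False"  (NEL, a control character)
print(f"cp1252:  U+{ord(cp1252):04X} {cp1252}, printable={cp1252.isprintable()}")
# prints "cp1252:  U+2026 …, printable=True"  (horizontal ellipsis)
```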

As mentioned earlier, \x85 in ISO-8859-1 is the control character NEL, which is treated as a line break. When writing such text in iCalendar format, it must therefore be converted to CRLF (which is also \x0d\x0a in ISO-8859-1) or to the escape sequence \n. ISO-8859-1 is a single-byte character set in which every byte is exactly one character, so simply replacing the byte \x85 with \x0d\x0a is sufficient.
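Because of this single-byte property, a byte-level replacement is safe for ISO-8859-1 data. Below is a minimal sketch that also normalizes bare CR and LF to CRLF; `normalize_latin1_breaks` is a hypothetical name.

```python
def normalize_latin1_breaks(data: bytes) -> bytes:
    """Normalize line breaks in ISO-8859-1 bytes to CRLF.

    Safe only because ISO-8859-1 is single-byte: 0x85 is always NEL here,
    never part of another character.
    """
    data = data.replace(b"\r\n", b"\n")  # collapse CRLF first so it is not doubled
    data = data.replace(b"\r", b"\n").replace(b"\x85", b"\n")
    return data.replace(b"\n", b"\r\n")


print(normalize_latin1_breaks(b"caf\xe9\x85na\xefve").decode("latin-1"))
```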

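On the other hand, NEL (U+0085) in Unicode is encoded as \xc2\x85 in UTF-8. The byte \x85 also appears as a continuation byte inside many other multi-byte characters. Replacing every \x85 byte with CRLF therefore breaks the text; even a genuine NEL would leave a stray \xc2 behind. The reason is that a notation such as U+0085 denotes a Unicode code point. A code point is different from the actual byte sequence obtained by encoding it into UTF-8 or UTF-16.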
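For UTF-8, the safe order is decode, replace, re-encode, so that the replacement works on code points rather than bytes. Below is a minimal sketch, assuming the input is valid UTF-8; `normalize_utf8_breaks` is a hypothetical helper.

```python
import re

# CRLF first, then any single line break character (including NEL, LS, PS).
BREAKS = re.compile("\r\n|[\n\r\u0085\u2028\u2029]")


def normalize_utf8_breaks(data: bytes) -> bytes:
    """Normalize line breaks in UTF-8 data to CRLF by working on code points."""
    text = data.decode("utf-8")       # bytes -> code points
    text = BREAKS.sub("\r\n", text)   # only the real U+0085 matches, not stray 0x85 bytes
    return text.encode("utf-8")       # code points -> bytes


source = "光速感情\u0085end".encode("utf-8")
print(source.count(b"\x85"))          # 0x85 bytes inside 光, 情 and NEL itself
print(normalize_utf8_breaks(source).decode("utf-8").encode("unicode_escape"))
```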

What was happening with Proton Calendar

Proton Calendar had a bug where the iCalendar file would become corrupted if it contained characters whose UTF-8 encoding includes the byte \x85. I use the past tense because the bug had already been fixed by June 10th, after I contacted Proton support on June 1st.

The details of the bug that I was able to identify are as follows:

  • When accessing the shared calendar URL for the first time (after a sufficiently long interval), a complete iCalendar format file with the schedule recorded as-is is returned.
  • When accessing the shared calendar URL multiple times (without a sufficiently long interval), a corrupted calendar is returned where \x85 has been converted to CRLF, ignoring the UTF-8 sequence.

Here, "converting \x85 to CRLF while ignoring the UTF-8 sequence" refers to cases like the following:

  • Encode 光速感情 in UTF-8.
    • \xe5\x85\x89 \xe9\x80\x9f \xe6\x84\x9f \xe6\x83\x85
  • Since 光 and 情 contain \x85, those bytes are converted to CRLF.
    • \xe5\x0d\x0a\x89 \xe9\x80\x9f \xe6\x84\x9f \xe6\x83\x0d\x0a
  • When this byte sequence is decoded as UTF-8, extra line break characters are inserted, and the sequence is corrupted.
    • �\r\n�速感�\r\n
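The corruption in the steps above can be reproduced byte for byte in Python. `naive_replace` is a hypothetical stand-in for the faulty step, not Proton's actual code.

```python
def naive_replace(data: bytes) -> bytes:
    """The buggy approach: treat UTF-8 bytes as if they were ISO-8859-1."""
    return data.replace(b"\x85", b"\r\n")


original = "光速感情".encode("utf-8")
corrupted = naive_replace(original)

print(original.hex(" "))   # prints "e5 85 89 e9 80 9f e6 84 9f e6 83 85"
print(corrupted.hex(" "))  # prints "e5 0d 0a 89 e9 80 9f e6 84 9f e6 83 0d 0a"
# Invalid sequences become U+FFFD; 光 and 情 are lost, CRLFs are inserted.
print(repr(corrupted.decode("utf-8", errors="replace")))
```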

Presumably, an inappropriate replacement step ran at the point where calendar data was written to or read from the cache. From the second access onward, the calendar corrupted by that replacement was returned. When I asked for details after the bug was fixed, I received the following response:

The issue was connected with a regex we execute on our backend side to force the line ending to be \r\n when returning the ICS.
From Proton Support
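Proton has not published the actual regex, so the following is only a hypothetical reconstruction. It shows that the same alternation behaves very differently depending on whether it runs over raw bytes or over decoded text. Shorthands can hide the problem too. In PCRE, which PHP's preg_* functions use, \R outside UTF mode is equivalent to (?>\r\n|\n|\x0b|\f|\r|\x85), so it also matches the single byte 0x85.

```python
import re

# Hypothetical patterns for illustration only.
BYTES_PATTERN = re.compile(rb"\r\n|\r|\n|\x85")  # runs on raw bytes
TEXT_PATTERN = re.compile("\r\n|\r|\n|\x85")     # runs on decoded code points


def force_crlf_bytes(data: bytes) -> bytes:
    # Matches the byte 0x85 even inside multi-byte UTF-8 sequences (the bug).
    return BYTES_PATTERN.sub(b"\r\n", data)


def force_crlf_text(data: bytes) -> bytes:
    # Matches only the character U+0085, then re-encodes as UTF-8.
    return TEXT_PATTERN.sub("\r\n", data.decode("utf-8")).encode("utf-8")


ics = "SUMMARY:⅐⅑⅒\nEND:VEVENT".encode("utf-8")
print(force_crlf_bytes(ics))
print(force_crlf_text(ics).decode("utf-8").encode("unicode_escape"))
```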

Before this, I had imagined a simple process in which the server decrypts the encrypted calendar received from the calendar owner's client using the PassphraseKey. Judging from Proton's reply, however, the decrypted data is actually cached or otherwise processed. When I asked further, they explained that they convert the data in Proton Calendar to iCalendar format and then keep it in the cache in encrypted form. Presumably, the line breaks are replaced either before the data is written to the cache or after it is read back.

The iCal data is generated from the ProtonCalendar events by decrypting all events using the query parameters from the link. Those query parameters are not stored backend side and only used during the API request processing. Once the data is generated, we do cache some encrypted components of the iCal data (using the CacheKey).
From Proton Support

An overview of the Proton Calendar security model is given in the article The Proton Calendar security model. To meet the characteristics and requirements of a shared calendar, Proton appears to have adopted an architecture that preserves privacy while improving convenience.

Additionally, in the test calendar I used for the inquiry, I registered an event containing the string ⅐⅑⅒⅓⅔⅕⅖⅗⅘⅙⅚⅛⅜⅝⅞, since I thought the issue would be hard to convey with Japanese text. Every one of these characters contains \x85 in its UTF-8 encoding, and they are easy to recognize even in English-speaking regions.
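This is easy to verify. The characters are U+2150 through U+215E, and each one encodes to E2 85 xx in UTF-8. A short check:

```python
# The fractions from the test event: U+2150 (⅐) through U+215E (⅞).
FRACTIONS = "".join(chr(cp) for cp in range(0x2150, 0x215F))

for ch in FRACTIONS:
    encoded = ch.encode("utf-8")
    # The middle byte of every encoding is 0x85.
    print(f"U+{ord(ch):04X} {ch} -> {encoded.hex(' ')}")
```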

Since the bug has already been fixed, if you want to reproduce the situation at that time, please use the following files:

Summary

  • By using Proton Calendar, a privacy-first calendar service, you can share schedules in iCalendar format through a URL appended with key information.
  • iCalendar is a format for describing schedules with records separated by line breaks, and line break characters must be unified to CRLF.
  • There was a bug in the process of unifying line break characters when Proton Calendar shared calendars via URL, where inappropriate line breaks were inserted ignoring UTF-8 sequences.

The original article "Latin-1と0x85のなぞ" is licensed under CC-BY 4.0 ( https://creativecommons.org/licenses/by/4.0/ ), so the same license applies to this article as well.

Footnotes
  1. Updated Proton, unified protection

  2. Requires a Mail Plus plan or higher.

  3. In UTF-8, LS is encoded as \xe2\x80\xa8 and PS as \xe2\x80\xa9.
