// jiff/shared/mod.rs
/*!
Defines data types shared between `jiff` and `jiff-static`.

While this module exposes types that can be imported outside of `jiff` itself,
there are *no* semver guarantees provided. That is, this module is _not_ part
of Jiff's public API. The only guarantee of compatibility that is provided
is that `jiff-static x.y.z` works with one and only one version of Jiff,
corresponding to `jiff x.y.z` (i.e., the same version number).

# Design

This module is really accomplishing two different things at the same time.

Firstly, it is a way to provide types that can be used to construct a static
`TimeZone`. The proc macros in `jiff-static` generate code using these
types (and a few routines).

Secondly, it provides a way to parse TZif data without `jiff-static`
depending on `jiff` via a Cargo dependency. This actually requires copying
the code in this module (which is why it is kinda sectioned off from the rest
of jiff) into the `jiff-static` crate. This can be done automatically with
`jiff-cli`:

```text
jiff-cli generate shared
```

The copying of code is pretty unfortunate, because it means both crates have to
compile it. However, the alternatives aren't great either.

One alternative is to have `jiff-static` explicitly depend on `jiff` in its
`Cargo.toml`. Then Jiff could expose the parsing routines, as it does here,
and `jiff-static` could use them directly. Unfortunately, this means that
`jiff` cannot depend on `jiff-static`. And that in turn means that `jiff`
cannot re-export the macros. Users will need to explicitly depend on and use
`jiff-static`. Moreover, this could result in some potential surprises
since `jiff-static` will need to have an `=x.y.z` dependency on Jiff for
compatibility reasons. That in turn means that the version of Jiff actually
used is not determined by the user's `jiff = "x.y.z"` line, but rather by the
user's `jiff-static = "x'.y'.z'"` line. This is overall annoying and not a
good user experience. Plus, it inverts the typical relationship between crates
and their proc macros (e.g., `serde` and `serde_derive`) and thus could result
in other unanticipated surprises.

Another obvious alternative is to split this code out into a separate crate
that both `jiff` and `jiff-static` depend on. However, the API exposed in
this module does not provide a coherent user experience. It would either need a
ton of work to turn it into a coherent user experience or it would need to be
published as a `jiff-internal-use-only` crate that I find to be very annoying
and confusing. Moreover, a separate crate introduces a new semver boundary
beneath Jiff. I've found these sorts of things to overall increase maintenance
burden (see ripgrep and regex for cases where I did this).

I overall decided that the least bad choice was to copy a little code (under
2,000 source lines of code at present, I believe). Since the copy is managed
automatically via `jiff-cli generate shared`, we remove the downside of the
code getting out of sync. The only downside is extra compile time. Since I
generally only expect `jiff-static` to be used in niche circumstances, I
prefer this trade-off over the other choices.

More context on how I arrived at this design can be found here:
<https://github.com/BurntSushi/jiff/issues/256>

# Particulars

When this code is copied to `jiff-static`, the following transformations are
done:

* A header is added to indicate that the copied file is auto-generated.
* All `#[cfg(feature = "alloc")]` annotations are removed. The `jiff-static`
  proc macro always runs in a context where the standard library is available.
* Any code between `// only-jiff-start` and `// only-jiff-end` comments is
  removed. Nesting isn't supported.

Otherwise, this module is specifically organized in a way that doesn't rely on
any other part of Jiff. The one exception is the set of routines that convert
these exposed types to other internal types inside of Jiff. This is necessary
for building a static `TimeZone`. But these conversion routines are removed
when this module is copied to `jiff-static`.
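
The three transformations listed above can be sketched roughly as follows.
This is only an illustration and not the actual `jiff-cli` implementation:
the function name `transform_for_jiff_static` and the header text are
assumptions, and the sketch only recognizes annotations and markers that
occupy a whole line.

```rust
/// Hypothetical sketch of the copy step. Not the real `jiff-cli` code.
fn transform_for_jiff_static(src: &str) -> String {
    const ALLOC_CFG: &str = "#[cfg(feature = \"alloc\")]";
    // 1. Add a header. The exact wording here is made up.
    let mut out = String::from("// auto-generated: do not edit by hand\n");
    let mut skipping = false;
    for line in src.lines() {
        let trimmed = line.trim();
        // 3. Drop everything between the markers. Nesting isn't supported,
        // so a single boolean is enough.
        if trimmed == "// only-jiff-start" {
            skipping = true;
            continue;
        }
        if trimmed == "// only-jiff-end" {
            skipping = false;
            continue;
        }
        // 2. Drop `alloc` feature gates, keeping the item they guarded.
        if skipping || trimmed == ALLOC_CFG {
            continue;
        }
        out.push_str(line);
        out.push('\n');
    }
    out
}

fn main() {
    let src = [
        "#[cfg(feature = \"alloc\")]",
        "pub(crate) mod tzif;",
        "// only-jiff-start",
        "impl Foo { fn into_jiff(self) {} }",
        "// only-jiff-end",
        "pub(crate) mod util;",
    ]
    .join("\n");
    let out = transform_for_jiff_static(&src);
    assert!(out.starts_with("// auto-generated"));
    assert!(out.contains("pub(crate) mod tzif;"));
    assert!(!out.contains("alloc"));
    assert!(!out.contains("into_jiff"));
    println!("{out}");
}
```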
*/

/// An alias for TZif data whose backing storage has a `'static` lifetime.
// only-jiff-start
pub type TzifStatic = Tzif<
    &'static str,
    &'static str,
    &'static [TzifLocalTimeType],
    &'static [i64],
    &'static [TzifDateTime],
    &'static [TzifDateTime],
    &'static [TzifTransitionInfo],
>;
// only-jiff-end

/// An alias for TZif data whose backing storage is on the heap.
#[cfg(feature = "alloc")]
pub type TzifOwned = Tzif<
    alloc::string::String,
    self::util::array_str::Abbreviation,
    alloc::vec::Vec<TzifLocalTimeType>,
    alloc::vec::Vec<i64>,
    alloc::vec::Vec<TzifDateTime>,
    alloc::vec::Vec<TzifDateTime>,
    alloc::vec::Vec<TzifTransitionInfo>,
>;

/// An alias for TZif transition data whose backing storage is on the heap.
#[cfg(feature = "alloc")]
pub type TzifTransitionsOwned = TzifTransitions<
    alloc::vec::Vec<i64>,
    alloc::vec::Vec<TzifDateTime>,
    alloc::vec::Vec<TzifDateTime>,
    alloc::vec::Vec<TzifTransitionInfo>,
>;

#[derive(Clone, Debug)]
pub struct Tzif<STR, ABBREV, TYPES, TIMESTAMPS, STARTS, ENDS, INFOS> {
    pub fixed: TzifFixed<STR, ABBREV>,
    pub types: TYPES,
    pub transitions: TzifTransitions<TIMESTAMPS, STARTS, ENDS, INFOS>,
}

#[derive(Clone, Debug)]
pub struct TzifFixed<STR, ABBREV> {
    pub name: Option<STR>,
    /// An ASCII byte corresponding to the version number. So, 0x32 is '2'.
    ///
    /// This is otherwise unused. It's only read in `test` compilation for
    /// emitting diagnostic data about TZif files. If we really need to use
    /// this, we should probably just convert it to an actual integer.
    pub version: u8,
    pub checksum: u32,
    pub designations: STR,
    pub posix_tz: Option<PosixTimeZone<ABBREV>>,
}

#[derive(Clone, Copy, Debug)]
pub struct TzifLocalTimeType {
    pub offset: i32,
    pub is_dst: bool,
    pub designation: (u8, u8), // inclusive..exclusive
    pub indicator: TzifIndicator,
}

/// This enum corresponds to the possible indicator values for standard/wall
/// and UT/local.
///
/// Note that UT+Wall is not allowed.
///
/// I honestly have no earthly clue what they mean. I've read the section about
/// them in RFC 8536 several times and I can't make sense of it. I've even
/// looked at data files that have these set and still can't make sense of
/// them. I've even looked at what other datetime libraries do with these, and
/// they all seem to just ignore them. Like, WTF. I've spent the last couple
/// months of my life steeped in time, and I just cannot figure this out. Am I
/// just dumb?
///
/// Anyway, we parse them, but otherwise ignore them because that's what all
/// the cool kids do.
///
/// The default is `LocalWall`, which also occurs when no indicators are
/// present.
///
/// I tried again and still don't get it. Here's a dump for `Pacific/Honolulu`:
///
/// ```text
/// $ ./scripts/jiff-debug tzif /usr/share/zoneinfo/Pacific/Honolulu
/// TIME ZONE NAME
/// /usr/share/zoneinfo/Pacific/Honolulu
/// LOCAL TIME TYPES
/// 000: offset=-10:31:26, is_dst=false, designation=LMT, indicator=local/wall
/// 001: offset=-10:30, is_dst=false, designation=HST, indicator=local/wall
/// 002: offset=-09:30, is_dst=true, designation=HDT, indicator=local/wall
/// 003: offset=-09:30, is_dst=true, designation=HWT, indicator=local/wall
/// 004: offset=-09:30, is_dst=true, designation=HPT, indicator=ut/std
/// 005: offset=-10, is_dst=false, designation=HST, indicator=local/wall
/// TRANSITIONS
/// 0000: -9999-01-02T01:59:59 :: -377705023201 :: type=0, -10:31:26, is_dst=false, LMT, local/wall
/// 0001: 1896-01-13T22:31:26 :: -2334101314 :: type=1, -10:30, is_dst=false, HST, local/wall
/// 0002: 1933-04-30T12:30:00 :: -1157283000 :: type=2, -09:30, is_dst=true, HDT, local/wall
/// 0003: 1933-05-21T21:30:00 :: -1155436200 :: type=1, -10:30, is_dst=false, HST, local/wall
/// 0004: 1942-02-09T12:30:00 :: -880198200 :: type=3, -09:30, is_dst=true, HWT, local/wall
/// 0005: 1945-08-14T23:00:00 :: -769395600 :: type=4, -09:30, is_dst=true, HPT, ut/std
/// 0006: 1945-09-30T11:30:00 :: -765376200 :: type=1, -10:30, is_dst=false, HST, local/wall
/// 0007: 1947-06-08T12:30:00 :: -712150200 :: type=5, -10, is_dst=false, HST, local/wall
/// POSIX TIME ZONE STRING
/// HST10
/// ```
///
/// See how type 004 has a ut/std indicator? What the fuck does that mean?
/// All transitions are defined in terms of UTC. I confirmed this with `zdump`:
///
/// ```text
/// $ zdump -v Pacific/Honolulu | rg 1945
/// Pacific/Honolulu Tue Aug 14 22:59:59 1945 UT = Tue Aug 14 13:29:59 1945 HWT isdst=1 gmtoff=-34200
/// Pacific/Honolulu Tue Aug 14 23:00:00 1945 UT = Tue Aug 14 13:30:00 1945 HPT isdst=1 gmtoff=-34200
/// Pacific/Honolulu Sun Sep 30 11:29:59 1945 UT = Sun Sep 30 01:59:59 1945 HPT isdst=1 gmtoff=-34200
/// Pacific/Honolulu Sun Sep 30 11:30:00 1945 UT = Sun Sep 30 01:00:00 1945 HST isdst=0 gmtoff=-37800
/// ```
///
/// The times match up. All of them. The indicators don't seem to make a
/// difference. I'm clearly missing something.
#[derive(Clone, Copy, Debug)]
pub enum TzifIndicator {
    LocalWall,
    LocalStandard,
    UTStandard,
}

/// The set of transitions in TZif data, laid out in column orientation.
///
/// The column orientation is used to make TZ lookups faster. Specifically,
/// for finding an offset for a timestamp, we do a binary search on
/// `timestamps`. For finding an offset for a local datetime, we do a binary
/// search on `civil_starts`. By making these two distinct sequences with
/// nothing else in them, we make them as small as possible and thus improve
/// cache locality.
///
/// All sequences in this type are in correspondence with one another. They
/// are all guaranteed to have the same length.
#[derive(Clone, Debug)]
pub struct TzifTransitions<TIMESTAMPS, STARTS, ENDS, INFOS> {
    /// The timestamp at which this transition begins.
    pub timestamps: TIMESTAMPS,
    /// The wall clock time for when a transition begins.
    pub civil_starts: STARTS,
    /// The wall clock time for when a transition ends.
    ///
    /// This is only non-zero when the transition kind is a gap or a fold.
    pub civil_ends: ENDS,
    /// Any other relevant data about a transition, such as its local type
    /// index and the transition kind.
    pub infos: INFOS,
}
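
/*
A minimal sketch of the timestamp lookup that the column orientation is meant
to speed up. It is not Jiff's actual lookup routine: `Columns` and
`type_index_for_timestamp` are hypothetical names, plain slices stand in for
the generic column types, and only the type index is kept from `infos`.

```rust
/// Hypothetical stand-in for two of the columns in `TzifTransitions`.
struct Columns<'a> {
    timestamps: &'a [i64],
    type_indices: &'a [u8],
}

/// Returns the local time type index active at `ts`, or `None` when `ts`
/// comes before the first transition.
fn type_index_for_timestamp(cols: &Columns<'_>, ts: i64) -> Option<u8> {
    // Binary search: count the transitions that begin at or before `ts`.
    // Only the small `timestamps` column is touched during the search.
    let n = cols.timestamps.partition_point(|&t| t <= ts);
    // The columns are in correspondence, so the same index works for both.
    n.checked_sub(1).map(|i| cols.type_indices[i])
}

fn main() {
    // Transitions 0001 through 0003 from the Pacific/Honolulu dump above.
    let cols = Columns {
        timestamps: &[-2334101314, -1157283000, -1155436200],
        type_indices: &[1, 2, 1],
    };
    assert_eq!(type_index_for_timestamp(&cols, -1157283000), Some(2));
    assert_eq!(type_index_for_timestamp(&cols, -1157283001), Some(1));
    assert_eq!(type_index_for_timestamp(&cols, i64::MIN), None);
}
```

A lookup by local datetime would have the same shape, but search
`civil_starts` instead.
*/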

/// TZif transition info beyond the timestamp and civil datetime.
///
/// For example, this contains a transition's "local type index," which in
/// turn gives access to the offset (among other metadata) for that transition.
#[derive(Clone, Copy, Debug)]
pub struct TzifTransitionInfo {
    /// The index into the sequence of local time type records. This is what
    /// provides the correct offset (from UTC) that is active beginning at
    /// this transition.
    pub type_index: u8,
    /// The boundary condition for quickly determining if a given wall clock
    /// time is ambiguous (i.e., falls in a gap or a fold).
    pub kind: TzifTransitionKind,
}

/// The kind of a transition.
///
/// This is used when trying to determine the offset for a local datetime. It
/// indicates how the corresponding civil datetimes in `civil_starts` and
/// `civil_ends` should be interpreted. That is, there are three possible
/// cases:
///
/// 1. The offset of this transition is equivalent to the offset of the
///    previous transition. That means there are no ambiguous civil datetimes
///    between the transitions. This can occur, e.g., when the time zone
///    abbreviation changes.
/// 2. The offset of the transition is greater than the offset of the previous
///    transition. That means there is a "gap" in local time between the
///    transitions. This typically corresponds to entering daylight saving
///    time. It is usually, but not always, 1 hour.
/// 3. The offset of the transition is less than the offset of the previous
///    transition. That means there is a "fold" in local time where time is
///    repeated. This typically corresponds to leaving daylight saving time.
///    It is usually, but not always, 1 hour.
///
/// # More explanation
///
/// This, when combined with `civil_starts` and `civil_ends` in
/// `TzifTransitions`, explicitly represents ambiguous wall clock times that
/// occur at the boundaries of transitions.
///
/// The start of the wall clock time is always the earlier possible wall clock
/// time that could occur with this transition's corresponding offset. For a
/// gap, it's the previous transition's offset. For a fold, it's the current
/// transition's offset.
///
/// For example, DST for `America/New_York` began on `2024-03-10T07:00:00+00`.
/// The offset prior to this instant in time is `-05`, corresponding
/// to standard time (EST). Thus, in wall clock time, DST began at
/// `2024-03-10T02:00:00`. And since this is a DST transition that jumps ahead
/// an hour, the start of DST also corresponds to the start of a gap. That is,
/// the times `02:00:00` through `02:59:59` never appear on a clock for this
/// hour. The question is thus: which offset should we apply to `02:00:00`?
/// We could apply the offset from the earlier transition `-05` and get
/// `2024-03-10T02:00:00-05` (that's `2024-03-10T07:00:00+00`, which a clock
/// shows as `03:00:00-04`), or we could apply the offset from the later
/// transition `-04` and get `2024-03-10T02:00:00-04` (that's
/// `2024-03-10T06:00:00+00`, which a clock shows as `01:00:00-05`).
///
/// So in the above, we would have a `Gap` variant where `start` (inclusive) is
/// `2024-03-10T02:00:00` and `end` (exclusive) is `2024-03-10T03:00:00`.
///
/// The fold case is the same idea, but where the same time is repeated.
/// For example, in `America/New_York`, standard time began on
/// `2024-11-03T06:00:00+00`. The offset prior to this instant in time
/// is `-04`, corresponding to DST (EDT). Thus, in wall clock time, DST
/// ended at `2024-11-03T02:00:00`. However, since this is a fold, the
/// actual set of ambiguous times begins at `2024-11-03T01:00:00` and
/// ends at `2024-11-03T01:59:59.999999999`. That is, the wall clock time
/// `2024-11-03T02:00:00` is unambiguous.
///
/// So in the fold case above, we would have a `Fold` variant where
/// `start` (inclusive) is `2024-11-03T01:00:00` and `end` (exclusive) is
/// `2024-11-03T02:00:00`.
///
/// Since this gets bundled in with the sorted sequence of transitions, we'll
/// use the "start" time in all three cases as our target of binary search.
/// Once we land on a transition, we'll know our given wall clock time is
/// greater than or equal to its start wall clock time. At that point, to
/// determine if there is ambiguity, we merely need to determine if the given
/// wall clock time is less than the corresponding `end` time. If it is, then
/// it falls in a gap or fold. Otherwise, it's unambiguous.
///
/// Note that we could compute these datetime values while searching for the
/// correct transition, but there's a fair bit of math involved in going
/// between timestamps (which is what TZif gives us) and calendar datetimes
/// (which is what we're given as input). It is also necessary that we offset
/// the timestamp given in TZif at some point, since it is in UTC and the
/// datetime given is in wall clock time. So I decided it would be worth
/// pre-computing what we need in terms of what the input is. This way, we
/// don't need to do any conversions, or indeed, any arithmetic at all, for
/// time zone lookups. We *could* store these as timestamps, but then the
/// input datetime would need to be converted to a timestamp before searching
/// the transitions.
#[derive(Clone, Copy, Debug)]
pub enum TzifTransitionKind {
    /// This transition cannot possibly lead to an ambiguous offset because
    /// its offset is equivalent to the offset of the previous transition.
    ///
    /// Has an entry in `civil_starts`, but corresponding entry in `civil_ends`
    /// is always zeroes (i.e., meaningless).
    Unambiguous,
    /// This occurs when this transition's offset is strictly greater than the
    /// previous transition's offset. This effectively results in a "gap" of
    /// time equal to the difference in the offsets between the two
    /// transitions.
    ///
    /// Has an entry in `civil_starts` for when the gap starts (inclusive) in
    /// local time. Also has an entry in `civil_ends` for when the gap ends
    /// (exclusive) in local time.
    Gap,
    /// This occurs when this transition's offset is strictly less than the
    /// previous transition's offset. This results in a "fold" of time where
    /// the two transitions have an overlap where it is ambiguous which one
    /// applies given a wall clock time. In effect, a span of time equal to the
    /// difference in the offsets is repeated.
    ///
    /// Has an entry in `civil_starts` for when the fold starts (inclusive) in
    /// local time. Also has an entry in `civil_ends` for when the fold ends
    /// (exclusive) in local time.
    Fold,
}
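
/*
The search-then-compare procedure described above can be sketched as follows.
This is an illustration under simplifying assumptions, not Jiff's lookup code:
`Civil`, `Kind` and `classify` are hypothetical names, and a plain tuple
(which compares lexicographically) stands in for `TzifDateTime`.

```rust
/// Hypothetical civil datetime: (year, month, day, hour, minute, second).
type Civil = (i16, i8, i8, i8, i8, i8);

#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind {
    Unambiguous,
    Gap,
    Fold,
}

/// Classifies the wall clock time `dt` against pre-computed boundaries.
fn classify(
    starts: &[Civil],
    ends: &[Civil],
    kinds: &[Kind],
    dt: Civil,
) -> Kind {
    // Find the last transition whose start is at or before `dt`.
    let i = match starts.partition_point(|&s| s <= dt).checked_sub(1) {
        Some(i) => i,
        None => return Kind::Unambiguous,
    };
    match kinds[i] {
        Kind::Unambiguous => Kind::Unambiguous,
        // We already know `start <= dt`, so only `end` needs checking.
        kind if dt < ends[i] => kind,
        _ => Kind::Unambiguous,
    }
}

fn main() {
    // The two 2024 `America/New_York` transitions discussed above.
    let starts: [Civil; 2] = [(2024, 3, 10, 2, 0, 0), (2024, 11, 3, 1, 0, 0)];
    let ends: [Civil; 2] = [(2024, 3, 10, 3, 0, 0), (2024, 11, 3, 2, 0, 0)];
    let kinds = [Kind::Gap, Kind::Fold];
    let check = |dt| classify(&starts, &ends, &kinds, dt);
    assert_eq!(check((2024, 3, 10, 2, 30, 0)), Kind::Gap);
    assert_eq!(check((2024, 3, 10, 3, 0, 0)), Kind::Unambiguous);
    assert_eq!(check((2024, 11, 3, 1, 59, 59)), Kind::Fold);
    assert_eq!(check((2024, 11, 3, 2, 0, 0)), Kind::Unambiguous);
    assert_eq!(check((2024, 1, 1, 0, 0, 0)), Kind::Unambiguous);
}
```

No timestamp conversion or arithmetic happens during the lookup, only
comparisons, which is the point of pre-computing the civil boundaries.
*/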

/// The representation we use to represent a civil datetime.
///
/// We don't use `shared::util::itime::IDateTime` here because we specifically
/// do not need to represent fractional seconds. This lets us easily represent
/// what we need in 8 bytes instead of the 12 bytes used by `IDateTime`.
///
/// Moreover, we pack the fields into a single `i64` to make comparisons
/// extremely cheap. This is especially useful since we do a binary search on
/// `&[TzifDateTime]` when doing a TZ lookup for a civil datetime.
#[derive(Clone, Copy, Debug, Eq, Hash, PartialEq, PartialOrd, Ord)]
pub struct TzifDateTime {
    bits: i64,
}

impl TzifDateTime {
    pub const ZERO: TzifDateTime = TzifDateTime::new(0, 0, 0, 0, 0, 0);

    pub const fn new(
        year: i16,
        month: i8,
        day: i8,
        hour: i8,
        minute: i8,
        second: i8,
    ) -> TzifDateTime {
        // TzifDateTime { year, month, day, hour, minute, second }
        let mut bits = (year as u64) << 48;
        bits |= (month as u64) << 40;
        bits |= (day as u64) << 32;
        bits |= (hour as u64) << 24;
        bits |= (minute as u64) << 16;
        bits |= (second as u64) << 8;
        // The least significant 8 bits remain 0.
        TzifDateTime { bits: bits as i64 }
    }

    pub const fn year(self) -> i16 {
        (self.bits as u64 >> 48) as u16 as i16
    }

    pub const fn month(self) -> i8 {
        (self.bits as u64 >> 40) as u8 as i8
    }

    pub const fn day(self) -> i8 {
        (self.bits as u64 >> 32) as u8 as i8
    }

    pub const fn hour(self) -> i8 {
        (self.bits as u64 >> 24) as u8 as i8
    }

    pub const fn minute(self) -> i8 {
        (self.bits as u64 >> 16) as u8 as i8
    }

    pub const fn second(self) -> i8 {
        (self.bits as u64 >> 8) as u8 as i8
    }
}
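
/*
A standalone sketch of why the packed layout makes comparisons cheap: with the
year in the most significant 16 bits, comparing the `i64` values orders
datetimes chronologically, including negative years. The free functions
`pack` and `year` are hypothetical mirrors of `TzifDateTime::new` and
`TzifDateTime::year`, and the sketch assumes every field other than the year
is non-negative, as it is for valid dates.

```rust
/// Mirrors the bit layout of `TzifDateTime::new` above.
fn pack(
    year: i16,
    month: i8,
    day: i8,
    hour: i8,
    minute: i8,
    second: i8,
) -> i64 {
    let mut bits = (year as u64) << 48;
    bits |= (month as u64) << 40;
    bits |= (day as u64) << 32;
    bits |= (hour as u64) << 24;
    bits |= (minute as u64) << 16;
    bits |= (second as u64) << 8;
    bits as i64
}

/// Mirrors `TzifDateTime::year`: the top 16 bits, reinterpreted as signed.
fn year(bits: i64) -> i16 {
    (bits as u64 >> 48) as u16 as i16
}

fn main() {
    let gap_start = pack(2024, 3, 10, 2, 0, 0);
    let gap_end = pack(2024, 3, 10, 3, 0, 0);
    let ancient = pack(-9999, 1, 2, 1, 59, 59);
    // A single integer comparison orders the datetimes.
    assert!(gap_start < gap_end);
    // A negative year sets the sign bit, so it sorts before positive years.
    assert!(ancient < gap_start);
    // The year round-trips through the packed form.
    assert_eq!(year(ancient), -9999);
    assert_eq!(year(gap_start), 2024);
}
```
*/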

#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub struct PosixTimeZone<ABBREV> {
    pub std_abbrev: ABBREV,
    pub std_offset: PosixOffset,
    pub dst: Option<PosixDst<ABBREV>>,
}

#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub struct PosixDst<ABBREV> {
    pub abbrev: ABBREV,
    pub offset: PosixOffset,
    pub rule: PosixRule,
}

#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub struct PosixRule {
    pub start: PosixDayTime,
    pub end: PosixDayTime,
}

#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub struct PosixDayTime {
    pub date: PosixDay,
    pub time: PosixTime,
}

#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub enum PosixDay {
    /// Julian day in a year, not counting leap days.
    ///
    /// Valid range is `1..=365`.
    JulianOne(i16),
    /// Julian day in a year, counting leap days.
    ///
    /// Valid range is `0..=365`.
    JulianZero(i16),
    /// The nth weekday of a month.
    WeekdayOfMonth {
        /// The month.
        ///
        /// Valid range is: `1..=12`.
        month: i8,
        /// The week.
        ///
        /// Valid range is `1..=5`.
        ///
        /// One interesting thing to note here (or my interpretation anyway),
        /// is that a week of `4` means the "4th weekday in a month" whereas
        /// a week of `5` means the "last weekday in a month, even if it's the
        /// 4th weekday."
        week: i8,
        /// The weekday.
        ///
        /// Valid range is `0..=6`, with `0` corresponding to Sunday.
        weekday: i8,
    },
}
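
/*
A sketch of how a `WeekdayOfMonth` rule could be resolved to a day of the
month under the interpretation described above, where a week of `5` means
"last". This is not how Jiff implements it: `nth_weekday` and its helpers are
hypothetical, and the weekday computation uses Sakamoto's algorithm for the
proleptic Gregorian calendar.

```rust
fn is_leap(year: i32) -> bool {
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
}

fn days_in_month(year: i32, month: i32) -> i32 {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        _ if is_leap(year) => 29,
        _ => 28,
    }
}

/// Day of the week via Sakamoto's algorithm, with `0` meaning Sunday.
fn weekday_of(year: i32, month: i32, day: i32) -> i32 {
    const T: [i32; 12] = [0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4];
    let y = if month < 3 { year - 1 } else { year };
    (y + y / 4 - y / 100 + y / 400 + T[(month - 1) as usize] + day)
        .rem_euclid(7)
}

/// Resolves "the `week`th `weekday` of `month`" to a day of the month.
fn nth_weekday(year: i32, month: i32, week: i32, weekday: i32) -> i32 {
    let first = weekday_of(year, month, 1);
    // Day of the month of the first occurrence of `weekday`.
    let first_match = 1 + (weekday - first).rem_euclid(7);
    let mut day = first_match + (week - 1) * 7;
    // A week of 5 means "last": back off if we ran past the month's end.
    while day > days_in_month(year, month) {
        day -= 7;
    }
    day
}

fn main() {
    // `M3.2.0`: second Sunday of March 2024.
    assert_eq!(nth_weekday(2024, 3, 2, 0), 10);
    // `M11.1.0`: first Sunday of November 2024.
    assert_eq!(nth_weekday(2024, 11, 1, 0), 3);
    // `M10.5.0`: last Sunday of October 2024 (there are only four).
    assert_eq!(nth_weekday(2024, 10, 5, 0), 27);
    // Last Sunday of February 2023 is also its 4th Sunday.
    assert_eq!(nth_weekday(2023, 2, 5, 0), nth_weekday(2023, 2, 4, 0));
}
```
*/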

#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub struct PosixTime {
    pub second: i32,
}

#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub struct PosixOffset {
    pub second: i32,
}

// only-jiff-start
impl TzifStatic {
    pub const fn into_jiff(self) -> crate::tz::tzif::TzifStatic {
        crate::tz::tzif::TzifStatic::from_shared_const(self)
    }
}
// only-jiff-end

// only-jiff-start
impl PosixTimeZone<&'static str> {
    pub const fn into_jiff(self) -> crate::tz::posix::PosixTimeZoneStatic {
        crate::tz::posix::PosixTimeZone::from_shared_const(self)
    }
}
// only-jiff-end

// Does not require `alloc`, but is only used when `alloc` is enabled.
#[cfg(feature = "alloc")]
pub(crate) mod crc32;
pub(crate) mod posix;
#[cfg(feature = "alloc")]
pub(crate) mod tzif;
pub(crate) mod util;