iTranslated by AI

The content below is an AI-generated translation. This is an experimental feature, and may contain errors. View original article
🔍

GHC Debugging Log

に公開

Overview

While testing my library with GHC 9.0.1-alpha1, I found a bug in a library bundled with GHC. It was difficult to debug because it wasn't reproducible, but after several days, I finally tracked down the cause.

The Beginning

Since GHC 9.0.1-alpha1 was released, I tested my own library. However, some tests related to fusedMultiplyAdd failed. Even when fixing the random seed used by QuickCheck, it failed with different values every time.

Haskell is a pure language, and the same calculation should return the same result. Therefore, a non-reproducible bug that fails with different values every time implies that something quite nasty (like memory safety corruption) is happening.

The problematic code can be roughly extracted like this:

-- Reference implementation of FMA
fusedMultiplyAdd_viaRational :: (RealFloat a) => a -> a -> a -> a
fusedMultiplyAdd_viaRational x y z
  | isFinite x && isFinite y && isFinite z =
      case toRational x * toRational y + toRational z of
        0 -> x * y + z
        r -> fromRational r
  | isFinite x && isFinite y = z + z -- x * y is finite, but z is Infinity or NaN
  | otherwise = x * y + z -- either x or y is Infinity or NaN

checkFMA :: (RealFloat a, Show a, Arbitrary a, Random a) => String -> (a -> a -> a -> a) -> [(a, a, a, a)] -> Spec
checkFMA name f cases = do
  prop name $ forAllFloats3 $ \a b c -> do
    -- sameFloatP is a comparison function that distinguishes signs of 0 and treats NaNs as equal
    f a b c `sameFloatP` fusedMultiplyAdd_viaRational a b c
  testSpecialValues name f cases -- Test explicitly given examples

spec :: Spec
spec = modifyMaxSuccess (* 1000) $ do
  describe "Double" $ do
    checkFMA "fusedMultiplyAdd (default)"      fusedMultiplyAdd             casesForDouble
    checkFMA "fusedMultiplyAdd (via Rational)" fusedMultiplyAdd_viaRational casesForDouble
  describe "Float" $ do
    checkFMA "fusedMultiplyAdd (default)"      fusedMultiplyAdd                casesForFloat
    checkFMA "fusedMultiplyAdd (via Rational)" fusedMultiplyAdd_viaRational    casesForFloat

The failure details were something like this:

  test/FMASpec.hs:87:3: 
  1) FMA.Float fusedMultiplyAdd (default)
       Falsified (after 31878 tests):
         0x1.189192p6
         0x1.6e6c4p-5
         0x1.6d6de2p5
         0x1.86874ep5 =/= 0x1.8p5

  To rerun use: --match "/FMA/Float/fusedMultiplyAdd (default)/"

Randomized with seed 324858585
  test/FMASpec.hs:87:3: 
  1) FMA.Float fusedMultiplyAdd (via Rational)
       Falsified (after 61282 tests):
         0x1.17fb68p126
         -0x1.797e3p-126
         -0x1.00a96cp5
         -0x1.08p5 =/= -0x1.0d9046p5

  To rerun use: --match "/FMA/Float/fusedMultiplyAdd (via Rational)/"

Randomized with seed 783038111

If extracted as a standalone code snippet, it would look like this:

#!/usr/bin/env cabal
{- cabal:
build-depends: base, QuickCheck
-}
{-# LANGUAGE TypeApplications #-}
import           Numeric
import           Test.QuickCheck

isFinite :: RealFloat a => a -> Bool
isFinite x = not (isNaN x) && not (isInfinite x)

-- newtype wrapper for printing in hexadecimal
newtype HexFloat a = HexFloat { getHexFloat :: a }
  deriving (Eq, Ord)

instance RealFloat a => Show (HexFloat a) where
  showsPrec _ (HexFloat x) = showHFloat x

-- Simple FMA that ignores the sign of 0, infinity, and NaN handling
fusedMultiplyAdd_simple :: (RealFloat a) => a -> a -> a -> a
fusedMultiplyAdd_simple x y z
  = fromRational (toRational x * toRational y + toRational z)

-- Test whether the given function matches fusedMultiplyAdd_simple
{-# NOINLINE prop_fma_simple #-}
prop_fma_simple :: (RealFloat a, Show a) => (a -> a -> a -> a) -> HexFloat a -> HexFloat a -> HexFloat a -> Property
prop_fma_simple f (HexFloat x) (HexFloat y) (HexFloat z)
  = HexFloat (f x y z) === HexFloat (fusedMultiplyAdd_simple x y z)

main :: IO ()
main = do
  quickCheck $ withMaxSuccess 1000000 $
    forAll (HexFloat <$> (arbitrary `suchThat` isFinite)) $ \x ->
    forAll (HexFloat <$> (arbitrary `suchThat` isFinite)) $ \y ->
    forAll (HexFloat <$> (arbitrary `suchThat` isFinite)) $ \z ->
    prop_fma_simple @Float fusedMultiplyAdd_simple x y z

Even when applying the same function (fusedMultiplyAdd_simple) to the same arguments, this issue occurred.

You can run this code like this:

$ cabal v2-run --with-compiler=/opt/ghc-9.0.1-alpha1/bin/ghc Test.hs

And the output would look like this:

*** Failed! Falsified (after 215548 tests):  
0x1.3fa1c2p4
0x1.32a8cep3
0x1.36cf48p4
0x1.a5bc06p7 /= 0x1.a4p7
*** Failed! Falsified (after 54278 tests):  
0x1.dd4246p5
0x1.49e178p5
-0x1.785d22p5
0x1.2d9dccp11 /= 0x1.2d8p11

Observations

First, I tried to observe the situation under different conditions. I wanted to identify the reproduction conditions as much as possible. If successful, I might be able to guess the cause at this stage.

This kind of observation and inference is fun, like being a detective.

The results were as follows:

  • It occurs with FMA. I prepared several implementations of FMA, but it didn't seem to matter. Even an implementation that "performs intermediate calculations with Rational" without relying on floating-point arithmetic sometimes returned incorrect answers.
  • It does not reproduce with binary addition or multiplication. Perhaps it requires the complexity of something like FMA.
  • It is non-reproducible. Even with the same random seed, different errors occur.
  • It occurs with Float, but not with Double.
  • The behavior suggests that the lower bits of the Float are being truncated.
  • It does not occur when optimization is turned off.
  • It seems easier to reproduce without specialization or inlining.
  • It seems not to occur with the LLVM backend.
  • It does not reproduce with older GHC (8.10.2).
  • I originally tested on macOS, but it also reproduced on Ubuntu inside a virtual machine running on the same physical machine. I gave up testing on Windows because the installation of GHC 9.0.1-alpha1 via stack failed.
  • It seemed not to occur on Raspberry Pi (AArch64). → This turned out to be a mistake.

This included some guesses that turned out to be wrong. Since it was a probabilistically occurring event, that's somewhat inevitable.

Inference

There were several broad areas where the cause could lie:

  • My own library
  • GHC's bundled libraries (base or ghc-bignum)
  • GHC's code generator, specifically the Native Code Generator (NCG)

I had already discovered one bug in my own library during testing with GHC 9.0.1-alpha1. However, in this case, the issue reproduced even when the code was extracted into a standalone program (as shown earlier), so the possibility of a bug in my own code was low.

This meant the likelihood of a bug on the GHC side was high. While the introduction of linear types in GHC 9.0 was a hot topic, it's unlikely that a bug causing this kind of memory corruption would arise from frontend changes. It was more likely a backend-related problem.

In the backend of GHC 9.0, the biggest change was ghc-bignum, the new implementation of Integer. Bugs are common in new libraries.

Regarding floating-point numbers in the code generator (NCG), there was a recent change to "always use SSE2 on x86 targets (sweeping out the x87 FPU-specific code from inside the compiler)", but that was already implemented in GHC 8.10, so it was unlikely to be causing new problems in GHC 9.0.

While it might seem natural to think "ghc-bignum seems buggy," at this point, I couldn't rule out a bug in the code generator and was terrified that "if the NCG is buggy, it'll be a nightmare (I don't want to debug it)."

Debugging

First, I wanted to verify the possibility of a code generator bug. However, looking at the intermediate code and assembly generated by -ddump-*** options is exhausting. As a small comfort, I tried adding GHC's own debugging options like -dcore-lint -dstg-lint -dcmm-lint -fllvm-fill-undef-with-garbage, but they had no particular effect. I didn't want to run git bisect on GHC itself because building GHC takes too long.

In the end, I gave up pursuing only the code generator bug and decided to proceed with debugging while considering both the possibility of a code generator bug and a bug in the bundled libraries (base or ghc-bignum).

At this point, I went back to the basics (observation). I had mentioned that "the lower bits of the Float are being truncated," but when I counted the number of truncated bits, it seemed somewhat arbitrary (like 13 bits). If it were memory corruption, it would naturally happen in byte units (multiples of 8 bits), and if it were a register misuse, even the exponent part would likely be completely wrong. This suggested that a code generation bug related to floating-point numbers was unlikely.

Next, I considered the possibility of a bug in functions related to the Float type. The Float functions used in the test code were basically just toRational and fromRational, so the suspicious ones were fromRational and encodeFloat, which is used internally.

So, I created a custom type that wraps the Float type to hook into fromRational and encodeFloat and inserted a sanity check. Specifically, it looked like this:

newtype FloatX = FloatX Float
  deriving (Eq, Ord, Show, Arbitrary, Random, Num, Floating, Real, RealFrac)

instance Fractional FloatX where
  {-# NOINLINE fromRational #-}
  fromRational x = let FloatX a = fromRat x
                       b = fromRational x
                   in if a == b then
                        FloatX a
                      else
                        error $ "fromRational/FloatX: mismatch " ++ showHFloat a (" " ++ showHFloat b "")
  (/) = coerce ((/) @Float)
  recip = coerce (recip @Float)

-- Implemented in C
foreign import ccall unsafe my_encodeFloat :: Int32 -> Int32 -> Float
foreign import ccall unsafe my_encodeFloatU :: Int32 -> Int32 -> Word32

instance RealFloat FloatX where
  floatRadix = coerce (floatRadix @Float)
  floatDigits = coerce (floatDigits @Float)
  floatRange = coerce (floatRange @Float)
  decodeFloat = coerce (decodeFloat @Float)
  {-# NOINLINE encodeFloat #-}
  encodeFloat m e = if abs m > 2^24 then
                      error "encodeFloat/FloatX: Mantissa too large"
                    else if integerCheck m then
                           let r = my_encodeFloat (fromInteger m) (fromIntegral e)
                               w = my_encodeFloatU (fromInteger m) (fromIntegral e)
                               r' = castWord32ToFloat w
                           in if r == r' then
                                if toRational m * 2^^e == toRational r then
                                  FloatX r
                                else
                                  error $ "encodeFloat/FloatX: Wrong value: " ++ showHFloat r " " ++ show (m,e)
                              else
                                  error $ "encodeFloat/FloatX: FFI mismatch: " ++ showHFloat r " vs " ++ showHFloat r' ""
                         else
                           error "encodeFloat/FloatX: Invalid integer"
  isNaN = coerce (isNaN @Float)
  isInfinite = coerce (isInfinite @Float)
  isDenormalized = coerce (isDenormalized @Float)
  isNegativeZero = coerce (isNegativeZero @Float)
  isIEEE = coerce (isIEEE @Float)
  atan2 = coerce (atan2 @Float)

However, the embedded checks were never triggered.

Still, this reduced the likelihood of it being a problem with floating-point handling itself. If there was a problem, it was highly probable that the values were already corrupted by arithmetic operations on rational numbers or arbitrary-precision integers before they reached encodeFloat or fromRational. (After all, it seemed related to ghc-bignum!)

When I modified the code to display the values of the rational numbers passed to fromRational, I hit the jackpot. The numerators and denominators were clearly smaller than they should be:

-- Display arguments to fromRational upon test failure
prop_fma_simple f (HexFloat x) (HexFloat y) (HexFloat z)
  = let r = toRational x * toRational y + toRational z
    in counterexample (show r) $
       HexFloat (f x y z) === HexFloat (fromRational r)

{- Execution example:
*** Failed! Falsified (after 238018 tests):  
0x1.9c55dp3
0x1.3fdddap3
0x1.d63dccp3
143 % 1  ← The result of this FMA calculation cannot be exactly 143!
0x1.1efdfep7 /= 0x1.1ep7
-}

Further investigation revealed that while toRational x * toRational y and toRational z had plausible values, the calculation of x * y + z was clearly wrong.

prop_fma_simple f (HexFloat x) (HexFloat y) (HexFloat z)
  = let xy' = toRational x * toRational y
        z' = toRational z
        r = xy' + z'
    in counterexample ("xy' = " ++ show xy') $
       counterexample ("z' = " ++ show z') $
       counterexample ("r = " ++ show r) $
       HexFloat (f x y z) === HexFloat (fromRational r)

{- Execution example:
*** Failed! Falsified (after 68474 tests):  
0x1.680692p5
-0x1.12d7aap2
-0x1.0f6604p5
xy' = (-106247109426877) % 549755813888   ← Plausible value
z' = (-4446593) % 131072   ← Plausible value
r = (-227) % 1   ← The result cannot be such a simple value!
-0x1.c65fd6p7 /= -0x1.c6p7
-}

In other words, the (+) operator for Rational was bugged.

In GHC, the Rational type is defined in the GHC.Real module of the base package. After copying the implementation of rational arithmetic from GHC.Real for further verification, I found that addition and multiplication of arbitrary-precision integers seemed fine, but the reduce function, which divides the numerator and denominator by their GCD, was behaving incorrectly.

prop_fma_simple f (HexFloat x) (HexFloat y) (HexFloat z)
  = let xy'@(n :% d) = toRational x * toRational y
        z'@(n' :% d') = toRational z
        n'' = n * d' + n' * d -- Numerator before reducing xy' + z'
        d'' = d * d' -- Denominator before reducing xy' + z'
        r = reduce n'' d'' -- xy' + z'
    in counterexample ("xy' = " ++ show xy') $
       counterexample ("z' = " ++ show z') $
       counterexample ("n'' = " ++ show n'') $
       counterexample ("d'' = " ++ show d'') $
       counterexample ("r = " ++ show r) $
       HexFloat (f x y z) === HexFloat (fromRational r)

{- Execution example:
*** Failed! Falsified (after 33464 tests):  
-0x1.691a2cp2
-0x1.e6ff24p4
0x1.1ed1b8p3
xy' = 47205871654947 % 274877906944
z' = 2349623 % 262144
n'' = 13020595471461908480   ← Plausible value
d'' = 72057594037927936   ← Plausible value (this is 2^56)
r = 180 % 1   ← Incorrect (this is the value of n'' divided by d'' with the decimal part truncated)
0x1.6964e6p7 /= 0x1.68p7

Incidentally, the value of gcd 13020595471461908480 72057594037927936 is 262144, and
the correct value for r is 49669629941795 % 274877906944
-}

If the GCD was being calculated as an incorrectly large value, it would explain the phenomenon where "it looks like the lower bits of the floating-point number are being truncated." (In this case, the denominator is a power of 2.)

Supplement: Internal Representation of Integer and Rational

Let's check the internal representation of Integer and Rational. We'll assume one word is 64 bits.

The Integer type is actually defined in ghc-bignum, a library bundled with GHC. Until GHC 8.10, it was defined in integer-gmp or integer-simple, but in GHC 9.0, they are integrated into ghc-bignum.

The definition of arbitrary-precision integer types like Integer is as follows. While the names of the data constructors have changed slightly with the introduction of ghc-bignum, they are basically equivalent to those in integer-gmp.

-- GHC.Num.Integer
data Integer = IS !Int#    -- When the value fits in an Int, i.e., between -2^63 and 2^63 - 1
             | IP !BigNat# -- When the value is 2^63 or greater
             | IN !BigNat# -- When the value is less than -2^63

-- GHC.Num.Natural
data Natural = NS !Word#   -- When the value fits in a Word, i.e., less than 2^64
             | NS !BigNat# -- When the value is 2^64 or greater

-- GHC.Num.BigNat
type BigNat# = WordArray#

-- GHC.Num.WordArray
type WordArray# = ByteArray# -- ByteArray# is a primitive type in GHC representing a byte sequence. See GHC.Exts

BigNat# is exactly what you'd expect for arbitrary-precision integers, holding the integer represented in binary as an array of 64-bit units on the heap. It is compatible with GMP's internal representation, allowing GMP functions to be applied directly.

Integer and Natural are like a hybrid of arbitrary-precision and fixed-precision; if the value fits in one word, it is embedded directly in the data constructor, and if not, it is represented using BigNat#.

The Rational type for representing rational numbers is an alias for Ratio Integer, as stated in the Language Report. While the Language Report defines Ratio in the Data.Ratio module, the actual definition in GHC resides in GHC.Real.

-- GHC.Real
data Ratio a = !a :% !a -- Invariant: stored as a reduced fraction. Denominator is positive
type Rational = Ratio Integer

Data.Ratio does not expose the Ratio data constructor, requiring the use of functions like (%), numerator, or denominator to access the numerator and denominator. However, GHC.Real exposes the data constructor (:%) normally.

GHC.Real contains an internal function reduce :: Integral a => a -> a -> Ratio a. Given a numerator and a (positive) denominator, it returns a reduced fraction (divided by the GCD). It is almost the same as (%), but differs in that (%) allows the denominator to be negative.

reduce :: Integral a => a -> a -> Ratio a
reduce _ 0 = error "divide by zero"
reduce x y = (x `quot` d) :% (y `quot` d)
             where d = gcd x y

Rational addition (+) is defined using reduce like this:

instance Integral a => Num (Ratio a) where
  (x :% y) + (x' :% y') = reduce (x * y' + x' * y) (y * y')

Debugging integerGcd

In reduce, the standard gcd function is used to calculate the GCD, but when a = Integer, the gcd :: Integral a => a -> a -> a function is designed to be replaced by a specialized function called integerGcd :: Integer -> Integer -> Integer via rewrite rules (see GHC.Real).

If integerGcd were bugged, it would explain why "it doesn't reproduce when optimization is turned off." This is because rewrite rules do not trigger when optimization is disabled.

So, I pursued integerGcd.

#!/usr/bin/env cabal
{- cabal:
build-depends: base, ghc-bignum, QuickCheck
-}
import           GHC.Num.Integer (integerGcd)
import           Test.QuickCheck

naiveGcd :: Integer -> Integer -> Integer
naiveGcd x y = gcd' (abs x) (abs y)
  where
    gcd' x 0 = x
    gcd' x y = gcd' y (x `rem` y)

main = do
  quickCheck $ withMaxSuccess 1000000 $
    forAll (choose (2^62, 2^65)) $ \x ->
    let y = 16 -- The value of y doesn't seem important, so I fixed it arbitrarily
    in integerGcd x y === naiveGcd x y

{- Execution example:
*** Failed! Falsified (after 647 tests):  
15296509433926188867   ← This is a 63-bit integer (between 2^63 and 2^64-1)
16 /= 1
-}

It seems that when one value is a 63-bit integer and the other is a small number less than 63 bits, it sometimes returns the smaller value directly without calculating the GCD.

Since the condition depends on the integer range, it explains the phenomenon where "it occurs with Float but not with Double" and "it occurs with FMA but not with addition or multiplication." This is because the denominator in Double exceeds 100 bits, whereas in Float addition and multiplication, both the numerator and denominator are around 48 bits.

Let's check the source code of the integerGcd function.

After excluding several trivial cases, integerGcd delegates the processing to:

  • When both can be represented as Int (between -2^{63} and 2^{63}-1): gcdWord#
  • When neither can be represented as Int: bigNatGcd
  • When one can be represented as Int but the other cannot: bigNatGcdWord#

In this case, bigNatGcdWord# seems suspicious. So, I looked at the source code for bigNatGcdWord#.

In bigNatGcdWord#, after removing trivial cases, it calls the backend function bignat_gcd_word. In the case of the GMP backend, bignat_gcd_word calls integer_gmp_mpn_gcd_1, which is implemented in C. The actual implementation of integer_gmp_mpn_gcd_1 is in gmp_wrappers.c, which calls GMP's mpn_gcd_1 function.

I suspected a bug in GMP's mpn_gcd_1, but it seemed fine. The integer_gmp_mpn_gcd_1 function on the caller side in gmp_wrappers.c hasn't changed for several years, so it's likely innocent.

I followed the function calls in the source code down to the C code, but it seems I went too deep. The problem likely lies on the Haskell side, which calls the C code.

To isolate the problem, I tested integerGcd, naturalGcd, gcdWord, and bigNatGcdWord# individually. integerGcd and bigNatGcdWord# failed.

#!/usr/bin/env cabal
{- cabal:
build-depends: base, ghc-bignum, QuickCheck
-}
{-# LANGUAGE MagicHash #-}
import           GHC.Exts
import           GHC.Num.BigNat  (bigNatFromWord, bigNatGcdWord#, gcdWord)
import           GHC.Num.Integer (integerGcd)
import           GHC.Num.Natural (Natural, naturalGcd)
import           Test.QuickCheck

naiveGcd :: Integer -> Integer -> Integer
naiveGcd x y = gcd' (abs x) (abs y)
  where
    gcd' x 0 = x
    gcd' x y = gcd' y (x `rem` y)

main = do
  -- Testing integerGcd
  quickCheck $ withMaxSuccess 1000000 $
    forAll (choose (2^63, 2^64-1)) $ \x ->
    \y' ->
    let y = fromIntegral (y' :: Word)
    in integerGcd x y === naiveGcd x y

  -- Testing naturalGcd
  quickCheck $ withMaxSuccess 1000000 $
    forAll (choose (2^63, 2^64-1)) $ \x ->
    \y' ->
    let x' = fromInteger x :: Natural
        y = fromIntegral (y' :: Word)
    in toInteger (naturalGcd x' (fromInteger y)) === naiveGcd x y

  -- Testing gcdWord
  quickCheck $ withMaxSuccess 1000000 $
    forAll (choose (2^63, 2^64-1)) $ \x ->
    \y' -> y' >= 2 ==> 
    let x' = fromInteger x :: Word
        y = fromIntegral y'
    in toInteger (gcdWord x' y') === naiveGcd x y

  -- Testing bigNatGcdWord#
  quickCheck $ withMaxSuccess 1000000 $
    forAll (choose (2^63, 2^64-1)) $ \x ->
    \y'@(W# y#) -> y' >= 2 ==> 
    let x' = bigNatFromWord (fromInteger x)
        y = fromIntegral y'
    in toInteger (W# (bigNatGcdWord# x' y#)) === naiveGcd x y

{- Execution example:
*** Failed! Falsified (after 1049 tests):                  
14286959948854507901
40
40 /= 1
+++ OK, passed 1000000 tests.
+++ OK, passed 1000000 tests; 442425 discarded.
*** Failed! Falsified (after 318297 tests):                  
11289483210199994043
11
11 /= 1
-}

It is certain that bigNatGcdWord# is bugged. Looking at the source code again, I noticed it calls a function named bigNatCompareWord#. If this function returns EQ, it returns the argument b :: Word# directly without performing the GCD calculation.

-- Excerpt from GHC 9.0.1-alpha1's src/GHC/Num/BigNat.hs:
bigNatGcdWord# :: BigNat# -> Word# -> Word#
bigNatGcdWord# a b
   | ...omitted...
   | True           = case bigNatCompareWord# a b of
      EQ -> b
      _  -> bignat_gcd_word a b

If, for some reason, bigNatCompareWord# was incorrectly returning EQ, it would explain the bug in bigNatGcdWord#. So, let's look at bigNatCompareWord#.

-- Excerpt from GHC 9.0.1-alpha1's src/GHC/Num/BigNat.hs:
bigNatCompareWord# :: BigNat# -> Word# -> Ordering
bigNatCompareWord# a b
   | bigNatIsZero a                   = cmpW# 0## b
   | isTrue# (wordArraySize# a ># 1#) = GT
   | True
   = cmpW# (indexWordArray# a 1#) b  -- Look at the index!!!

Yes.

When wordArraySize# a is 1, it is accessing the 1st element (0-indexed) of the array.

This is clearly a bug. It's an out-of-bounds memory access.

It took several days to track down the cause, but in the end, the fix was just "one character." This kind of thing often happens in programming. Source code is read much more often than it is written.

Reporting

I concocted a simple reproduction code and reported it to GHC's Issues. Technically, only bigNatCompareWord# would have sufficed, but considering the background, I also included a test for integerGcd. Since this bug is characterized by its non-reproducibility, I couldn't remove the dependency on QuickCheck.

{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE MagicHash    #-}
import           GHC.Exts
import           GHC.Num.BigNat  (bigNatCompareWord#, bigNatFromWord#)
import           GHC.Num.Integer (integerGcd)
import           Test.QuickCheck

main :: IO ()
main = do
  quickCheck $ withMaxSuccess 1000000 $
    forAll (choose (2^63, 2^64-1)) $ \x -0
    \y -0
      integerGcd x (toInteger y) === toInteger (gcd (fromInteger x) y :: Word)

  quickCheck $ withMaxSuccess 1000000 $
    \x@(W# x#) -0 let !x' = bigNatFromWord# x# in
      \y@(W# y#) -0
        bigNatCompareWord# x' y# === compare x y

Verification

Once you find what you believe to be the cause of a bug, you must also verify that fixing it actually works.

After checking out GHC:

$ ./boot
$ ./configure LLC=llc-mp-9.0 OPT=opt-mp-9.0 --with-intree-gmp
$ hadrian/build --flavour=quick -j3

I performed the build and tested using _build/stage1/bin/ghc. When I tried to do this without creating a project via cabal v2-run, it was annoying because it said:

    Could not find module ‘Prelude’
    Perhaps you haven't installed the profiling libraries for package ‘base-4.15.0.0’?
    Use -v (or `:set -v` in ghci) to see a list of the files searched for.

But that's a different story.

In the fixed GHC, I successfully confirmed that the issue no longer reproduced. (I haven't verified it with the original library yet, but I expect to be able to test it with alpha2, beta, or rc of GHC 9.0.1 which will be released soon.)

Closing Thoughts

As long as you use a stable version of GHC for normal Haskell programming, you wouldn't typically struggle with debugging this much. Array accesses are normally bounds-checked, and for your own code, it's easy to insert assertions. This time, I was unlucky because it was a bug in a library bundled with GHC.

Speaking of bugs occurring with a specific range of Integer, I once discovered a bug in the reflection package where it manifested when "the value was 2^{63} or greater and less than 2^{64}." That time, it wasn't an out-of-bounds access, but rather an instance where type invariants were broken by intentionally ignoring types with unsafeCoerce.

This time, I was lucky enough to identify the cause of the bug by myself, leading to a bug report complete with the cause (effectively, the fix). When reporting bugs to OSS, I make it a point to identify the cause as specifically as possible before reporting.

However, those who are not confident in their ESP abilities should prioritize reporting to the developers with a reproduction code once they encounter such a nasty bug, rather than trying too hard to identify the cause on their own. The developers usually know the source code best.

Lessons Learned (Addendum)

If I were to force some lessons from this incident:

  • Testing is important. This issue was discovered thanks to the fact that I usually write tests.
  • New things always come with bugs, so test alpha and beta versions actively.

That's about it.

Discussion