Translated by AI
What's New in GHC 9.12
GHC 9.12.1 was released on December 16, 2024.
In this article, I will review the new features of GHC 9.12 based on my personal selection. Past similar articles are:
- New Features in GHC 9.2 and Trends in GHC 2021
- New Features in GHC 8.10 and GHC 9.0
- New Features in GHC 9.4
- New Features in GHC 9.6
- New Features in GHC 9.8
- New Features in GHC 9.10
This article is not an exhaustive introduction. In particular, I have not covered areas like the RTS or Template Haskell, which I am not very familiar with. Please also refer to the official release notes:
- 2.1. Version 9.12.1 — Glasgow Haskell Compiler 9.12.1 User's Guide
- Changelog for base-4.21.0.0 | Hackage
- 9.12 · Wiki · Glasgow Haskell Compiler / GHC · GitLab
Features in GHC 9.12
MultilineStrings Extension
Multiline string literals are implemented as the MultilineStrings extension.
In traditional Haskell, there were methods such as using the unlines function or string gaps (a feature where whitespace surrounded by backslashes within a string literal is ignored) to write string literals across multiple lines.
str1 = unlines
  [ "aaa"
  , "bbb"
  , "ccc"
  ]
-- -> "aaa\nbbb\nccc\n"
str2 = "aaa\n\
       \bbb\n\
       \ccc\n"
-- -> "aaa\nbbb\nccc\n"
On the other hand, with the MultilineStrings extension, you can write multiline string literals using triple double quotes.
{-# LANGUAGE MultilineStrings #-}
str3 = """
aaa
bbb
ccc
"""
-- -> "aaa\nbbb\nccc"
Many languages implement multiline string literals, but the syntax varies slightly by language. Here are some characteristics of the implementation in GHC:
- Common indentation is stripped.
- If the literal starts with a newline, one \n is removed.
- If the literal ends with a newline, one \n is removed.
- Even if the input file uses CRLF line endings, the line endings embedded in the string are treated as LF.
Here is an example illustrating this:
{-# LANGUAGE MultilineStrings #-}
str4 = """
  aaa
   bbb
  ccc

  """
-- -> "aaa\n bbb\nccc\n"
str5 = """
  aaa
  bbb
   ccc\n
  """
-- -> "aaa\nbbb\n ccc\n"
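The rules listed above can be modeled as a small pure function over the raw text between the triple quotes. This is only a sketch: processMultiline and its helpers are hypothetical names, not GHC's actual lexer code, and the model ignores tabs and escape sequences. It also assumes that the line directly following the opening quotes does not take part in the common-indentation calculation.

```haskell
import Data.Char (isSpace)
import Data.List (intercalate)

-- Hypothetical model of the MultilineStrings post-processing (not GHC's lexer).
processMultiline :: String -> String
processMultiline raw =
  case splitLines (normalizeCRLF raw) of
    [] -> "" -- unreachable: splitLines never returns an empty list
    firstLine : rest ->
      let -- Indentation of every non-blank line after the first one.
          indents = [length (takeWhile (== ' ') l) | l <- rest, not (all isSpace l)]
          common = if null indents then 0 else minimum indents
          -- Strip the common indentation, then empty out whitespace-only lines.
          cleaned = map blankToEmpty (firstLine : map (drop common) rest)
       in dropTrailingNewline (dropLeadingNewline (intercalate "\n" cleaned))
  where
    blankToEmpty l = if all isSpace l then "" else l

-- Treat CRLF line endings as LF.
normalizeCRLF :: String -> String
normalizeCRLF ('\r' : '\n' : rest) = '\n' : normalizeCRLF rest
normalizeCRLF (c : rest) = c : normalizeCRLF rest
normalizeCRLF [] = []

-- Like 'lines', but keeps a trailing empty line.
splitLines :: String -> [String]
splitLines s = case break (== '\n') s of
  (l, []) -> [l]
  (l, _ : rest) -> l : splitLines rest

-- Remove at most one newline at the start.
dropLeadingNewline :: String -> String
dropLeadingNewline ('\n' : s) = s
dropLeadingNewline s = s

-- Remove at most one newline at the end.
dropTrailingNewline :: String -> String
dropTrailingNewline s
  | not (null s), last s == '\n' = init s
  | otherwise = s

main :: IO ()
main = do
  print (processMultiline "\naaa\nbbb\nccc\n")
  print (processMultiline "\n  aaa\n   bbb\n  ccc\n\n  ")
  print (processMultiline "\r\n  a\r\n  b\r\n")
```

The first two inputs correspond to the raw contents of literals shaped like str3 and str4, written here as ordinary strings so that the sketch also compiles on older GHC versions.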
OrPatterns Extension
In pattern matching, there are cases where you want to perform the same processing in multiple branches. For example, consider the following code:
data T = Foo | Bar | Baz
f :: T -> IO ()
f Foo = putStrLn "A"
f Bar = putStrLn "B"
f Baz = putStrLn "B" -- Same as f Bar!
Suppose we want to perform the same processing for f Bar and f Baz. Here, we've written the same code twice.
In this example, an alternative to "writing it twice" is "using the wildcard pattern _."
data T = Foo | Bar | Baz
f :: T -> IO ()
f Foo = putStrLn "A"
f _ = putStrLn "B"
However, using a wildcard pattern increases the likelihood of forgetting to update the code when new data constructors are added to the pattern match target. In other words, if the definition of T changes to Foo | Bar | Baz | Bazz, the approach of "fixing where warnings or errors occur" will no longer be reliable.
This is where the OrPatterns extension comes in. It allows you to write multiple patterns separated by semicolons:
{-# LANGUAGE OrPatterns #-}
data T = Foo | Bar | Baz
f :: T -> IO ()
f Foo = putStrLn "A"
f (Bar; Baz) = putStrLn "B"
When there is no ambiguity, it is also possible to write them without parentheses:
{-# LANGUAGE OrPatterns #-}
g :: T -> IO ()
g x = case x of
  Foo -> putStrLn "A"
  Bar; Baz -> putStrLn "B"
Semicolon insertion based on layout is also valid:
{-# LANGUAGE OrPatterns #-}
h :: T -> IO ()
h x = case x of
  Foo -> putStrLn "A"
  Bar
  Baz -> putStrLn "B"
On the other hand, parentheses cannot be omitted in the pattern matching of function definitions:
{-# LANGUAGE OrPatterns #-}
f :: T -> IO ()
f Foo = putStrLn "A"
f Bar; Baz = putStrLn "B" -- Not allowed
-- In terms of layout rules, this is equivalent to writing:
-- f Foo = putStrLn "A"
-- f Bar
-- Baz = putStrLn "B"
-- (which results in an error)
NamedDefaults Extension: Generalizing default Declarations
- ghc-proposals/proposals/0409-exportable-named-default.rst at master · ghc-proposals/ghc-proposals
- 6.11.3. Named default declarations — Glasgow Haskell Compiler 9.12.1 User's Guide
In Haskell, type ambiguity can sometimes occur. For example:
main = print ((777 :: Integer) ^ 3)
What should the type of the exponent 3 be? As another example:
main = print (read "123")
In this code, what type should be used for read?
Haskell 2010 allows resolving this through default declarations when an ambiguous type variable has Num-related constraints. Specifically, when:
- Constraints on the type variable v are limited to the form C v.
- At least one of the constrained classes is numeric (Num or its subclasses).
- All constrained classes are from the Prelude or standard libraries.
When these conditions are met, the types described in a default declaration of the following form are tried in order (defaulting):
default (t1, ..., tn)
If nothing is specified, the following default declaration is in effect:
default (Integer, Double)
Therefore, the 3 in the exponent in the previous example is resolved to Integer. On the other hand, print (read ...) results in an error because no numeric classes are involved.
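As a small illustration of "tried in order", here is a sketch with a non-standard default declaration. The candidate list (Double, Integer) and the name result are chosen only for this example. The base of (^) is constrained by Num and Show, so the first candidate, Double, is accepted. The exponent is constrained by Integral, which Double does not satisfy, so defaulting moves on to Integer.

```haskell
-- Candidates are tried in order: Double first, then Integer.
default (Double, Integer)

-- The base 7 has constraints (Num a, Show a)  -> defaults to Double.
-- The exponent 2 has constraints (Num b, Integral b) -> Double is rejected, Integer is used.
result :: String
result = show (7 ^ 2)

main :: IO ()
main = putStrLn result -- prints 49.0
```

With the standard default (Integer, Double), the same expression would print 49 instead.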
GHC's rules regarding this defaulting have been expanded as the compiler has evolved. For example, in GHCi, the ExtendedDefaultRules extension is enabled, allowing the print (read ...) example to pass. When using the OverloadedStrings extension, defaulting also works for the IsString class, and the String type is included as a candidate for defaulting. On the other hand, defaulting does not work for the OverloadedLists extension.
With the NamedDefaults extension, you can specify a class in the default declaration like this:
default C (t1, ..., tn)
The condition for triggering defaulting is then relaxed to:
- There is at least one constraint of the form C v among the constraints for the type variable v.
Candidates are then searched for within default C. If there are multiple applicable classes, they must resolve to the same candidate.
Additionally, default declarations can now be exported from a module.
For more details, please refer to the GHC Proposal and documentation.
As a note, contrary to the examples in the GHC Proposal, it seems that IsList is practically unusable for this purpose. A list type without an element specified, [], is not an instance of IsList (instances are types with specified elements, like [Int]), so a declaration like:
default IsList ([])
is not possible. Even if you try specifying the element type:
{-# LANGUAGE OverloadedLists #-}
{-# LANGUAGE NamedDefaults #-}
import GHC.IsList
default IsList ([Char])
main = print ['a']
This code does not work, possibly due to type inference behavior or some other reason.
Wildcards _ in Type Declarations
In Haskell, when a term-level function doesn't use an argument, you can use a wildcard _ instead of a variable name.
const :: a -> b -> a
const x _ = x
On the other hand, in previous versions of GHC, it was necessary to give names to all arguments of type-level functions.
type Const x y = x -- OK
-- type Const x _ = x -- Not allowed
In GHC 9.12, as part of the TypeAbstractions extension, you can use the wildcard _ for arguments of type-level functions.
{-# LANGUAGE TypeAbstractions #-}
type Const x _ = x -- OK
The HasField Class and Representation Polymorphism
GHC has a mechanism to access record fields using the HasField class. For example, the OverloadedRecordDot extension added in GHC 9.2 desugars dot notation using the HasField class.
The HasField class was previously defined as follows:
module GHC.Records where
class HasField (x :: k) r a | x r -> a where
  getField :: r -> a
x is the field name, typically a type of the Symbol kind. r is the record type, and a is the field type.
An example of using HasField and OverloadedRecordDot is as follows:
{-# LANGUAGE OverloadedRecordDot #-}
{-# LANGUAGE DataKinds #-}
import GHC.Records
instance HasField "successor" Int Int where
  getField x = x + 1
main :: IO ()
main = do
  print $ (37 :: Int).successor -- The next integer after 37 (38)
Previously, the kind of the HasField class was k -> Type -> Type -> Constraint:
GHCi, version 9.10.1: https://www.haskell.org/ghc/ :? for help
ghci> :m + GHC.Records
ghci> :set -fprint-explicit-runtime-reps -fprint-explicit-kinds -XNoStarIsType
ghci> :k HasField
HasField :: k -> Type -> Type -> Constraint
This meant that unboxed or unlifted types could not be used as record types or field types. In fact, the following code could not be compiled in GHC 9.10:
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE OverloadedRecordDot #-}
{-# LANGUAGE DataKinds #-}
import GHC.Exts
import GHC.Records
instance HasField "successor" Int# Int# where
  getField x = x +# 1#
main :: IO ()
main = do
  print $ I# (37# :: Int#).successor
This restriction is relaxed in GHC 9.12. In GHC 9.12, the kind of HasField is as follows:
ghci> :m + GHC.Records
ghci> :set -fprint-explicit-runtime-reps -fprint-explicit-kinds -XNoStarIsType
ghci> :k HasField
HasField :: k -> TYPE r_rep -> TYPE a_rep -> Constraint
And the Int# example will now pass.
Incidentally, the mechanism to treat unboxed types uniformly using TYPE was initially called levity polymorphism, but this is now referred to as representation polymorphism. This follows the introduction of "true (?)" levity polymorphism (BoxedRep) in GHC 9.2, which uniformly treats only lifted boxed and unlifted boxed types.
Strengthening the RequiredTypeArguments Extension (Allowing -> and => in Terms)
The RequiredTypeArguments extension was introduced in GHC 9.10 (see: Playing with visible forall implemented in GHC 9.10). At that time, types built with arrows, such as function types, could not be written directly in term position; they required the explicit type keyword. This restriction has been relaxed, and -> or => can now be treated as types even when written without the type keyword.
{-# LANGUAGE RequiredTypeArguments #-}
id' :: forall a -> a -> a
id' _ x = x
main = do
  let f = id' (Int -> Int) (+ 5)
  -- In GHC 9.10, using the ExplicitNamespaces extension, it was necessary to write:
  -- let f = id' (type (Int -> Int)) (+ 5)
  print $ f 37
HexFloatLiterals for Unboxed Float#/Double#
By using the HexFloatLiterals extension, you can use hexadecimal notation for floating-point numbers (e.g., 0x1.cafep100).
This can now be used with unboxed Float# and Double# types. Example:
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE HexFloatLiterals #-}
import GHC.Exts
main :: IO ()
main = do
  print (F# 0x1.cafep0#)
  print (D# 0x1.cafep0##)
Relaxing Restrictions on the UnliftedFFITypes Extension
With the UnliftedFFITypes extension, you can pass unlifted types through the FFI. Some types, such as ByteArray# or SIMD types, cannot be passed without using UnliftedFFITypes.
Now, empty unboxed tuples (# #) can be handled as arguments. Example:
foreign import ccall unsafe foo :: (# #) -> Int32#
RISC-V (64-bit) Support in the NCG Backend
RISC-V is an emerging instruction set architecture that is gaining momentum in the embedded space. While it's unclear whether it will displace the incumbent architectures in the smartphone or PC markets, various single-board computers in the vein of the Raspberry Pi have appeared.
Accordingly, GHC has been advancing its support for RISC-V. In GHC 9.2, the LLVM backend added support for 64-bit RISC-V.
Now, the NCG (Native Code Generator) supports 64-bit RISC-V, allowing for builds without requiring LLVM.
As of now, official pre-built versions of GHC for RISC-V are not distributed, so if you want to try generating code for RISC-V, you will likely need to build it yourself. The procedure for building and installing GHC as a cross-compiler for RISC-V is as follows:
$ # Install dependencies (for Ubuntu)
$ sudo apt install build-essential curl autoconf gcc-riscv64-linux-gnu g++-riscv64-linux-gnu
$ sudo apt install qemu-user
$ # Install GHC (9.6 or later) using ghcup
$ ghcup install ghc 9.6.6 --set
$ cabal install alex happy
$ GHC_VERSION=9.12.1
$ curl -LO https://downloads.haskell.org/~ghc/$GHC_VERSION/ghc-$GHC_VERSION-src.tar.xz && tar -xJf ghc-$GHC_VERSION-src.tar.xz
$ cd ghc-$GHC_VERSION
$ ./configure --target=riscv64-linux-gnu
$ # Build (takes time)
$ hadrian/build --bignum=native -j binary-dist-dir
$ # Install the generated binary
$ cd _build/bindist/ghc-$GHC_VERSION-riscv64-linux-gnu
$ ./configure --target=riscv64-linux-gnu --prefix=$HOME/ghc-rv64 CC=riscv64-linux-gnu-gcc CXX=riscv64-linux-gnu-g++
$ make install
If this procedure doesn't work for you, please adjust it accordingly. A key point is that currently, you need to set various options even during the configure step of the pre-built binary.
An execution example is as follows:
$ echo 'main = putStrLn "Hello world!"' > hello.hs
$ ~/ghc-rv64/bin/riscv64-linux-gnu-ghc hello.hs
$ file hello
hello: ELF 64-bit LSB executable, UCB RISC-V, RVC, double-float ABI, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-riscv64-lp64d.so.1, BuildID[sha1]=250f432c120ef3948b7936b16a26b4add734ae69, for GNU/Linux 4.15.0, not stripped
$ qemu-riscv64 -L /usr/riscv64-linux-gnu/ ./hello
Hello world!
As GHC moves toward full support, it makes one really want actual RISC-V hardware.
SIMD Support in the x86 NCG
- Related article: Thinking about SIMD in Haskell/GHC (August 2023)
SIMD stands for single instruction, multiple data, a CPU feature that can process multiple data points with a single instruction.
Because it uses dedicated instructions, compiler-side support is necessary to utilize it. Specifically, this means either the compiler rewrites ordinary loops to utilize SIMD instructions (auto-vectorization), or it provides dedicated data types and primitives for programmers to utilize SIMD instructions themselves.
The current approach in GHC is the latter, providing data types such as FloatX4# and primitives such as plusFloatX4#. However, until now, these were only supported by the LLVM backend, which made it difficult to utilize them in general libraries.
In this release, the x86 NCG has gained support for some SIMD data types and primitives. Specifically, 128-bit wide floating-point vectors, namely FloatX4# and DoubleX2#. Support for integers or widths of 256 bits and above is not yet implemented. Additionally, it may require SSE 4.1 even for code that could be compiled for SSE2 when using LLVM.
That being said, since the tedious parts of implementation (such as saving registers to the stack) have been handled, the support status should improve as motivated contributors step in. I also intend to contribute if I have the time.
Here is a sample of SIMD code in GHC:
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnboxedTuples #-}
import GHC.Exts
main :: IO ()
main = do
  let v = packFloatX4# (# 1.1#, 2.2#, 3.3#, 4.4# #)
      w = packFloatX4# (# 0.1#, 0.2#, 0.3#, 0.4# #)
      x = minusFloatX4# v w
      (# a, b, c, d #) = unpackFloatX4# x
  print (F# a, F# b, F# c, F# d)
To compile, you can use the newly supported x86 NCG with:
$ ghc simdtest.hs
or use the traditionally supported LLVM backend with:
$ ghc -fllvm simdtest.hs
For serious SIMD usage, a wrapper library would be desirable, but the ones on Hackage (simd, primitive-simd) have old last-updated dates, and it's unclear if they are still usable. Someone might need to create a new one.
Addition of SIMD Primitives
In connection with the implementation of SIMD in the x86 NCG, several primitives have been added. Examples include:
module GHC.Prim where
fmaddFloatX4# :: FloatX4# -> FloatX4# -> FloatX4# -> FloatX4# -- x * y + z
fmsubFloatX4# :: FloatX4# -> FloatX4# -> FloatX4# -> FloatX4# -- x * y - z
fnmaddFloatX4# :: FloatX4# -> FloatX4# -> FloatX4# -> FloatX4# -- - x * y + z
fnmsubFloatX4# :: FloatX4# -> FloatX4# -> FloatX4# -> FloatX4# -- - x * y - z
shuffleFloatX4# :: FloatX4# -> FloatX4# -> (# Int#, Int#, Int#, Int# #) -> FloatX4#
minFloatX4# :: FloatX4# -> FloatX4# -> FloatX4#
maxFloatX4# :: FloatX4# -> FloatX4# -> FloatX4#
Scalar floating-point min/max primitives have also been added:
module GHC.Prim where
minFloat# :: Float# -> Float# -> Float#
maxFloat# :: Float# -> Float# -> Float#
However, since the behavior differs depending on the environment, the specification might be changed in the future (#25350: Floating-point min/max primops should have consistent behavior across platforms · Issues · Glasgow Haskell Compiler / GHC · GitLab).
Note that the newly added primitives are not exported from GHC.Exts. Without wrappers, they may be out of reach for ordinary Haskell users.
Use the LLVM Backend on Windows Without Any Extra Setup
In Haskell Setup 2023, I mentioned that "setting up LLVM tools on Windows is a hassle." At the time, opt.exe and llc.exe were not included in the official binary distributions (they seem to be included now). Furthermore, even if you managed to set them up, you would encounter linker errors when using floating-point numbers.
This release resolves these issues, allowing the LLVM backend to be used on Windows without any manual setup. Specifically, the versions of opt.exe and llc.exe bundled with GHC are now used (actually, GHC for Windows had already switched to Clang recently, so LLVM was already being bundled), and the linker errors related to floating-point numbers have also been fixed.
Template Haskell Support for the WebAssembly Backend
Support has apparently been added. (Apologies for not being able to provide an in-depth explanation.)
Libraries
Data.List{.NonEmpty}.compareLength
Data.List.compareLength :: [a] -> Int -> Ordering
Data.List.NonEmpty.compareLength :: NonEmpty a -> Int -> Ordering
This is a safe and fast alternative to compare (length xs) n. It means you don't have to count all the elements of xs, and it works even with infinite lists.
ghci> compareLength ['A','B','C'] 3
EQ
ghci> compareLength [0..] 3
GT
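To see why this can stop early, here is a minimal sketch of how such a function could be written. compareLength' is a hypothetical re-implementation for illustration only, not the code in base; it walks at most n + 1 cells of the list.

```haskell
-- Hypothetical re-implementation for illustration; the real function is
-- Data.List.compareLength in base-4.21.
compareLength' :: [a] -> Int -> Ordering
compareLength' xs n
  | n < 0 = GT -- every list has length >= 0
  | otherwise = go xs n
  where
    go [] 0 = EQ
    go [] _ = LT
    go (_ : _) 0 = GT -- the list is longer than n; stop without looking further
    go (_ : ys) k = go ys (k - 1)

main :: IO ()
main = do
  print (compareLength' "ABC" 3) -- EQ
  print (compareLength' [0 :: Int ..] 3) -- GT, terminates on an infinite list
  print (compareLength' "AB" 3) -- LT
```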
flip becomes representation polymorphic
flip is now representation polymorphic: its result type c may have any runtime representation.
ghci> :set -fprint-explicit-runtime-reps
ghci> :type flip
flip
  :: forall (repc :: GHC.Types.RuntimeRep) a b (c :: TYPE repc).
     (a -> b -> c) -> b -> a -> c
Note that the behavior when applying types to flip has changed as a result of the increased number of type arguments.
read supports binary integer notation
read now accepts binary notation for integers.
ghci> read "0b1011" :: Integer
11
ghci> read "0b1011" :: Int
11
Data.List.{inits1,tails1}
module Data.List where
inits1 :: [a] -> [NonEmpty a]
tails1 :: [a] -> [NonEmpty a]
inits1 returns a list of "subsequences constructed by taking the first n elements." Unlike inits, n is 1 or greater, so the empty prefix is excluded.
tails1 returns a list of "subsequences constructed by removing the first n elements." Unlike tails, n stops short of the full length, so the empty suffix is excluded.
ghci> inits1 ["A","B","C","D"]
["A" :| [],"A" :| ["B"],"A" :| ["B","C"],"A" :| ["B","C","D"]]
ghci> tails1 ["A","B","C","D"]
["A" :| ["B","C","D"],"B" :| ["C","D"],"C" :| ["D"],"D" :| []]
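One way to think about these functions is as inits and tails with the empty results filtered out. The following is a sketch under that reading; inits1' and tails1' are hypothetical names, and base may well implement the real functions differently.

```haskell
import Data.List (inits, tails)
import Data.List.NonEmpty (NonEmpty (..), nonEmpty)
import Data.Maybe (mapMaybe)

-- Hypothetical definitions for illustration only.
-- nonEmpty returns Nothing for [], so mapMaybe drops the empty prefix/suffix.
inits1' :: [a] -> [NonEmpty a]
inits1' = mapMaybe nonEmpty . inits

tails1' :: [a] -> [NonEmpty a]
tails1' = mapMaybe nonEmpty . tails

main :: IO ()
main = do
  print (inits1' "ABC")
  print (tails1' "ABC")
  print (null (inits1' ""), null (tails1' "")) -- an empty input yields no results
```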
Data.Bitraversable.{firstA,secondA}
module Data.Bitraversable where
firstA :: (Bitraversable t, Applicative f) => (a -> f c) -> t a b -> f (t c b)
secondA :: (Bitraversable t, Applicative f) => (b -> f c) -> t a b -> f (t a c)
Bitraversable is something like Traversable but with two element types (presumably). Within the standard library, Either and the tuple (,) are instances.
Bitraversable has the following method:
bitraverse :: (Bitraversable t, Applicative f) => (a -> f c) -> (b -> f d) -> t a b -> f (t c d)
The newly added firstA and secondA can be considered specializations of this.
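The specialization can be sketched by passing pure for the side that should stay untouched. firstA' and secondA' below are hypothetical local definitions written for this illustration, so the sketch also compiles with older versions of base; they are not copied from the library source.

```haskell
import Data.Bitraversable (Bitraversable, bitraverse)

-- Hypothetical local definitions; base-4.21 exports firstA and secondA.
-- Only the first component goes through the effectful function.
firstA' :: (Bitraversable t, Applicative f) => (a -> f c) -> t a b -> f (t c b)
firstA' f = bitraverse f pure

-- Only the second component goes through the effectful function.
secondA' :: (Bitraversable t, Applicative f) => (b -> f c) -> t a b -> f (t a c)
secondA' = bitraverse pure

-- A small effectful function in Maybe: succeeds only for positive numbers.
doublePositive :: Int -> Maybe Int
doublePositive x = if x > 0 then Just (x * 2) else Nothing

main :: IO ()
main = do
  print (firstA' doublePositive (3, "kept")) -- Just (6,"kept")
  print (firstA' doublePositive (0, "kept")) -- Nothing
  print (secondA' (\s -> [s, reverse s]) (1 :: Int, "ab")) -- [(1,"ab"),(1,"ba")]
```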
Bonus: My Contributions
I will list the contributions I (@mod_poppo) made (such as bug reports and fixes) that are included in GHC 9.12, partly as a personal memo. Some of these activities were inspired by the implementation of SIMD in the x86 NCG.
- Aligning include paths for preprocessed assembly sources (.S) with those of other sources (May–June) !12692: Set package include paths when assembling .S files · Merge requests · Glasgow Haskell Compiler / GHC · GitLab
  - This allows using something like #include <ghcconfig.h> in .S files, benefiting those who do things like what I described in my previous article "【低レベルHaskell】Haskell (GHC) でもインラインアセンブリに肉薄したい!" (I want to get close to inline assembly even in Haskell (GHC)!).
- Bug report regarding LLVM detection not working correctly on macOS (June 17) #24999: LLVM version detection logic in configure doesn't work on macOS · Issues · Glasgow Haskell Compiler / GHC · GitLab
- Comment on the implementation of negate for x86 NCG SIMD (June 28)
  - Had them handle the sign of zero correctly.
- LLVM backend on Windows (August–September) !13183: Fix fltused errors on Windows with LLVM · Merge requests · Glasgow Haskell Compiler / GHC · GitLab
  - I conducted the investigation and proposed a solution.
- Documenting primitive string literals (September) !13220: Document primitive string literals and desugaring of string literals · Merge requests · Glasgow Haskell Compiler / GHC · GitLab
  - I wrote the findings and explanations from "Haskellの文字列リテラルはGHCでどのようにコンパイルされるか" (How Haskell string literals are compiled in GHC) into the official GHC documentation. I had hoped a native English speaker would write it, but since no one did for four years...
- -msse4.2 not working correctly with the LLVM backend (October)
  - Investigated the situation.
- Regarding the MultilineStrings extension and CRLF (October)
  - Reported this because the proposal was unclear regarding CRLF behavior, and the implementation seemed to behave unintentionally.
- Making pack/insert/broadcast for FloatX4# usable without specifying SSE4.1 (November) !13542: x86 NCG SIMD: Lower packFloatX4#, insertFloatX4# and broadcastFloatX4# to SSE1 instructions · Merge requests · Glasgow Haskell Compiler / GHC · GitLab
These contributions are done as a hobby and for free. For those who wish to support me, you can send a badge on Zenn, buy fanzines from "Damepo Lab," or support me via GitHub Sponsors.
If you are interested in contributing to GHC yourself, please refer to what I wrote in "My Contributions to GHC 2023." It might be a good idea to start by browsing GitLab to get a feel for the atmosphere. Creating an account is a bit tricky as it requires manual approval due to spam prevention.