Preamble
This is a literate haskell document. It was developed using GHC version 8.10.4
with the following extensions:
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE ExplicitForAll #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE UndecidableInstances #-}
{-# LANGUAGE NoStarIsType #-} -- `*` is not type
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE ScopedTypeVariables #-}
And these are the imports:
module Main where
import Data.Kind
import Data.Proxy
The Problem
It all started with a text book problem asking to implement:
splitInHalves :: [a] -> ([a], [a])
One possible implementation could be:
= splitAt (length xs `div` 2) xs splitInHalves xs
Of course, this has one big problem:
badSplit :: ([Int], [Int])
= splitInHalves $ repeat 69 badSplit
I.e., when the list is infinite, this code compiles without raising any warnings or errors.
Can I write a safe version of this function?
As the problem arises when lists are infinite, it seems natural that the solution would try to forbid them.
Let's first try a simple solution that fails controllably when the list is too long at runtime.
safeSplitInHalves :: Int -> [a] -> Maybe ([a], [a])
= splitAt <$> fmap (`div` 2) (safeLength 0 xs) <*> pure xs
safeSplitInHalves limit xs where
= Just acc
safeLength acc [] : ys) =
safeLength acc (_ if acc > limit
then Nothing
else safeLength (acc + 1) ys
Now, nothing prevents someone to still write:
safeBadSplit :: Int -> Maybe ([Int], [Int])
= (`safeSplitInHalves` repeat 69) safeBadSplit
It still type checks fine, but at the end is just a fancy way of writing const Nothing
that uses more memory.
Can I write a function that fails to type check for the badSplit
?
Perhaps I shouldn't, but I can certainly try.
The cause of problems with the previous version is that I don't know if the list is finite at compile time. This is why the next solution uses vectors, which could be thought as list with known length at compile time. The way to implement vectors in Haskell is well established.
I start by defining Peano naturals' type:
-- | Peano naturals
data Nat where
Z :: Nat
S :: Nat -> Nat
-- some constants
one :: Nat
= S Z
one
two :: Nat
= S one
two
three :: Nat
= S two
three
four :: Nat
= S three
four
-- some Kind constants
type Zero = 'Z
type One = 'S Zero
type Two = 'S One
type Three = 'S Two
type Four = 'S Three
type Five = 'S Four
Now I can define vectors of known length:
-- | A Vector has known length at compile time
data Vect :: Nat -> Type -> Type where
VNil :: Vect 'Z a
(:#) :: a -> Vect n a -> Vect ('S n) a
infixr 5 :#
-- | Example vector
example :: Vect Four Char
= '1' :# '2' :# '3' :# '4' :# VNil example
In this case example
is known to be of length 4 at compile time. Trying to add more elements in its definition would result in a type error.
One example of using vector is safely computing the head of a vector known to contain at least one element:
-- | Safe head of a vector
vHead :: Vect ('S n) a -> a
:# _) = x vHead (x
Writing increasingly complex functions
Still with my goal in mind, I decided to try to write some more functions over Vect
.
Take One, Drop One
One function, over lists, that plays nicer with Haskell laziness could be:
-- | Return half of the elements taking one, dropping the next, etc...
skipHalfTheElements :: [a] -> [a]
= []
skipHalfTheElements [] = []
skipHalfTheElements [_] : _ : xs) = x : skipHalfTheElements xs skipHalfTheElements (x
I can write one that works on Vect
like:
-- | Compute the half of a natural number
type family NatHalf (n :: Nat) :: Nat where
NatHalf 'Z = 'Z
NatHalf ('S 'Z) = 'Z
NatHalf ('S ('S n)) = 'S (NatHalf n)
-- Grab every other value
takeOneDropOne :: Vect n a -> Vect (NatHalf n) a
VNil = VNil
takeOneDropOne :# VNil) = VNil
takeOneDropOne (_ :# (_ :# xs)) = x :# takeOneDropOne xs
takeOneDropOne (x
-- I know the result has length Two
skippedExample :: Vect Two Char
= takeOneDropOne example skippedExample
As its signature suggests, takeOneDropOne
takes a Vect
with n
element and returns one with half the input length. For this function to work, I need to provide a way to compute the half, and that is what NatHalf
does.
NatHalf
is a type level function that takes types of kind Nat
and returns their half. I could have made a mistake writing it, but if it's correct then, at compile time, I know that the output of takeOneDropOne
has half the length of the input.
Droping elements from a vector
Next I implemented vDrop
that works like drop :: Int -> [a] -> [a]
but for Vect
. To know the length of the result at compile time, the type checker also needs to know the amount that will be dropped and that the original vector has enough elements.
For the later, I define, the usual type level addition of naturals:
type family (:+:) (n :: Nat) (m :: Nat) :: Nat where
:+:) 'Z m = m
(:+:) ('S n) m = 'S (n :+: m) (
The amount to drop can be passed using a singleton type. A singleton type is a type that only has one constructor. That means that given the type, there is only one way to construct the value.
SNat n
is a type indexed by a natural that is also a singleton. This is also the familiar way of implementing it.
-- | Singleton type indexed by naturals.
data SNat :: Nat -> Type where
SZ :: SNat 'Z
SS :: SNat n -> SNat ('S n)
-- | Some synonyms
type SZero = SNat 'Z
type SOne = SNat ( 'S 'Z )
type STwo = SNat ( 'S ('S 'Z ))
type SThree = SNat ( 'S ( 'S ('S 'Z )))
type SFour = SNat ( 'S ('S ( 'S ('S 'Z ))))
type SFive = SNat ( 'S ('S ('S ( 'S ('S 'Z )))))
Because knowing that a value has some type SNat n
means knowing how it's built. The translation from type to value can be "automated" with a type class.
class Singleton a where
reify :: Proxy a -> a
instance Singleton (SNat 'Z) where
Proxy = SZ
reify
instance Singleton (SNat n) => Singleton (SNat ('S n)) where
Proxy = SS (reify Proxy) reify
The data Proxy a = Proxy
is a technique to bring a type to the context without bringing any particular value. With this instance we can define some constants easily:
szero :: SZero
= reify Proxy
szero
sone :: SOne
= reify Proxy
sone
stwo :: STwo
= reify Proxy
stwo
sthree :: SThree
= reify Proxy
sthree
sfour :: SFour
= reify Proxy
sfour
sfive :: SFive
= reify Proxy sfive
Now I've all the pieces to build the vDrop
function that follows:
-- | Type safe drop function
vDrop :: SNat n -> Vect (n :+: m) a -> Vect m a
SZ xs = xs
vDrop SS n) (_ :# xs) = vDrop n xs
vDrop (
-- | Example using vDrop
droppedExample :: Vect One Char
= vDrop sthree example droppedExample
Taking elements from a vector
Writing vTake
turned out to be harder than vDrop
and required an ad hoc construction to turn it type safe:
-- | LTE n m, proof that n <= m
data LTE :: Nat -> Nat -> Type where
LTEZero :: LTE 'Z m
LTESucc :: LTE n m -> LTE ('S n) ('S m)
The LTE n m
type, indexed by two naturals, can only be constructed if n <= m
. This is a known way to represent it. It could be read as:
- Use
LTEZero
to prove that0 <= m
- To construct a proof that
S n_ <= S m
, useLTESucc
and provide a proof thatn <= m
.
With this representation there is only one way to proof that n <= m
. This means that LTE n m
is a singleton hence:
instance Singleton (LTE 'Z m) where
= LTEZero
reify _
instance Singleton (LTE n m) => Singleton (LTE ('S n) ('S m)) where
= LTESucc (reify Proxy) reify _
-- | Take n elements from a vector of length t with n <= t.
vTake :: LTE n t -> Vect t a -> Vect n a
LTEZero _ = VNil
vTake LTESucc p) (x :# xs) = x :# vTake p xs
vTake (
-- Example using `vTake`
takedExample :: Vect Three Char
= vTake (reify Proxy) example takedExample
I could use reify Proxy
in this case instead of sthree
, which is nice. Also, the amount to take, can be deduced from the proof that we have enough elements to take, so I don't need to pass a SNat n
.
Can I slice vectors type-safely?
Yes, and I figured this one out before vTake
.
Before approaching the halving of a Vect
, I took a look at the, perhaps, more general problem of slicing vectors. Slicing is a term python or R programmers use to describe the action of extracting a sub-collection from a collection (usually lists or arrays).
My solution also uses an ad hoc type to represent, at the type level, the values that I want to slice out.
-- | Indices for slices
data Slice (k :: Nat) :: Nat -> Vect k Bool -> Type where
-- | Extract nothing from an empty Vect
SliceNil :: Slice 'Z 'Z 'VNil
-- | Include the next value
SliceInc :: Slice t i is -> Slice ('S t) ('S i) ('True ':# is)
-- | Remove the next value
SliceRem :: Slice t i is -> Slice ('S t) i ('False ':# is)
A type of Slice Two One '['False, 'True]
means extract a total of One
value from a Vect
of length Two
where the first value is removed and the second included.
I don't think I need Vect k Bool
to index this type, and that [Bool]
is enough if I'm careful writing the constructors. But using a Vect k Bool
makes breaking the invariant a type error.
This type also happens to be a singleton.
instance Singleton (Slice 'Z 'Z 'VNil) where
Proxy = SliceNil
reify
instance Singleton (Slice t i is) => Singleton (Slice ( 'S t ) i ( 'False ':# is) ) where
Proxy = SliceRem (reify Proxy)
reify
instance Singleton (Slice t i is) => Singleton (Slice ( 'S t ) ('S i) ( 'True ':# is) ) where
Proxy = SliceInc (reify Proxy) reify
With these definitions vSlice
is easy to write. It takes a Slice t i is
and a Vect t a
as inputs and returns a Vect i a
where the k-th element of Vect t a
is present in Vect i a
if the k-th element of is
is True
.
-- | Slices the i indices from a vector
vSlice :: Slice t i is -> Vect t a -> Vect i a
SliceNil _ = VNil
vSlice SliceInc is) (x :# xs) = x :# vSlice is xs
vSlice (SliceRem is) (_ :# xs) = vSlice is xs
vSlice (
-- | Example using slice
slicedExample1 :: Vect Two Char
= vSlice (SliceInc (SliceRem (SliceRem (SliceInc SliceNil)))) example slicedExample1
Nice. The Slice
value has to be carefully crafted, but the type checker ensures that exactly two values are selected.
Special Slices
But perhaps there are special slices that don't require us to construct the full value by hand.
The null slice
One of them is the slice that selects nothing. Because it's always possible to select nothing, Vect 'Z a
is a terminal object.
The type of a slice that selects nothing can be computed from the length of the input vector. The following type level function computes a Vect n Bool
where every element is False
.
type family DropAll (n :: Nat) :: Vect n Bool where
DropAll 'Z = 'VNil
DropAll ('S n) = 'False ':# DropAll n
With this function we can compute, at compile time, the type of a slice that selects nothing, like the one used in the following function:
-- | The null slice
vEliminate :: Slice n 'Z (DropAll n) -> Vect n a -> Vect 'Z a
= vSlice
vEliminate
-- | Select nothing example
elimintadExample :: Vect 'Z Char
= vEliminate (reify Proxy) example elimintadExample
The identity slice
Conversely, another slice that we could surely take is the slice that selects everything.
type family TakeAll (n :: Nat) :: Vect n Bool where
TakeAll 'Z = 'VNil
TakeAll ('S n) = 'True ':# TakeAll n
-- | The identity slice
vDuplicate :: Slice n n (TakeAll n) -> Vect n a -> Vect n a
= vSlice
vDuplicate
duplicatedExample :: Vect Four Char
= vDuplicate (reify Proxy) example duplicatedExample
This is very similar to the previous case.
More Special Slices
In the previous two functions, we could say the we already know the result before applying the function. The output of the next slices depends less trivially on on their input.
First Half Slice
One thing, that I can do with any vector is look at its first half. Can I write a function that does this and doesn't require to explicitly construct the Slice
?
Yes, and it turns out to be more involved that the previous functions. Concretely, it involved writing five new type level functions; the last one actually computing the type of the slice.
That is, given for example 2 :: Nat
it returns Slice 2 1 [True, False]
, which is to select the first half.
-- | Less than equal over nats
type family LTEB (n :: Nat) (m :: Nat) :: Bool where
LTEB 'Z m = 'True
LTEB ( 'S n ) 'Z = 'False
LTEB ( 'S n ) ( 'S m ) = LTEB n m
-- | Add an element to the end of a vector
type family Snoc (xs :: Vect k a) (y :: a) :: Vect ('S k) a where
Snoc 'VNil y = y ':# 'VNil
Snoc (x ':# xs) y = x ':# Snoc xs y
-- | Reverse a vector
type family Reverse (xs :: Vect nx a) :: Vect nx a where
Reverse 'VNil = 'VNil
Reverse (x ':# xs) = Snoc (Reverse xs) x
-- | Auxiliar Recursive function that selects the first half of a vector of
-- length m
type family FirstHalfIdxAux (n :: Nat) (m :: Nat) :: Vect m Bool where
FirstHalfIdxAux n 'Z = 'VNil
FirstHalfIdxAux n ('S m) = LTEB ('S m) (NatHalf n) ':# FirstHalfIdxAux n m
type family FirstHalfIdx (n :: Nat) :: Vect n Bool where
FirstHalfIdx n = Reverse (FirstHalfIdxAux n n)
It important to note that the correctness of the following code dependes on the correctness of the previous functions. In other words, if there is an error above then the code below could be wrong despite type checking.
Anyway, with these functions I can finally select the first half:
-- | The first half of a vector
vFirstHalf :: Slice t (NatHalf t) (FirstHalfIdx t) -> Vect t a -> Vect (NatHalf t) a
= vSlice
vFirstHalf
firstHalfExample :: Vect Two Char
= vFirstHalf (reify Proxy) example firstHalfExample
Second Half Slice
After the first half, it comes the second half. And the law of halves is that the two have to recombine into the original object when combined in the proper order.
This translates into five new type level functions, that accomplish the goal of computing the type of the slice that selects the second half.
-- forall (n :: Nat) . NatHalf n :+: NatHalfC n = n
type family NatHalfC (n :: Nat) :: Nat where
NatHalfC 'Z = 'Z
NatHalfC ('S 'Z) = 'S 'Z
NatHalfC ('S ('S n)) = 'S (NatHalfC n)
-- | Type level not
type family NOTB (n :: Bool) :: Bool where
NOTB 'True = 'False
NOTB 'False = 'True
-- | Greater than over Nat
type family GTB (n :: Nat) (m :: Nat) :: Bool where
GTB n m = NOTB (LTEB n m)
type family SecondHalfIdxAux (n :: Nat) (m :: Nat) :: Vect m Bool where
SecondHalfIdxAux n 'Z = 'VNil
SecondHalfIdxAux n ('S m) = GTB ('S m) (NatHalf n) ':# SecondHalfIdxAux n m
type family SecondHalfIdx (n :: Nat) :: Vect n Bool where
SecondHalfIdx n = Reverse (SecondHalfIdxAux n n)
With all this help, I can write the function that extracts the second half:
-- | The second half of a vector
vSecondHalf :: Slice t (NatHalfC t) (SecondHalfIdx t) -> Vect t a -> Vect (NatHalfC t) a
= vSlice
vSecondHalf
secondHalfExample :: Vect Two Char
= vSecondHalf (reify Proxy) example secondHalfExample
What about the original problem?
With all these tools it is now easy to write a safe solution to the original problem:
-- | Split a vector in two halves
vHalve :: Slice t (NatHalf t) (FirstHalfIdx t)
-> Slice t (NatHalfC t) (SecondHalfIdx t)
-> Vect t a -> (Vect (NatHalf t) a, Vect (NatHalfC t) a)
= (vFirstHalf fh xs, vSecondHalf sh xs)
vHalve fh sh xs
halvedExample :: (Vect Two Char, Vect Two Char)
= vHalve (reify Proxy) (reify Proxy) example halvedExample
Finally a type safe solution. I still need those (reify Proxy)
arguments, which is not ideal.
How does this work in the "real" world?
So far all examples have been based on the example
vector which is known at compile time because it is defined in the code. In that regard, the original splitInHalves
function can also split any list you can type. The type safe functions, to this point, have only helped the type checker to find errors on the values I've provided at compile time.
To interact with "the real world", I would need to read vectors from it. The length of these vectors are unknown at compile time. How does it work? How do these functions help?
A case study
To explore the answers to these questions, I'll consider the following program:
- Ask for a
n :: Nat
- Ask for
n
Integers and store them in av :: Vect n Integer
- Ask for a second
m :: Nat
- Ensure that
m <= n
- Take
m
elements fromv
to form new vector.
I started with the simple case of converting an Integer
to a Nat
since Integers can be easly read from stdin
:
intToNat :: Integer -> Maybe Nat
intToNat n| n < 0 = Nothing
| n == 0 = Just Z
| otherwise = S <$> intToNat (n - 1)
The conversion might fail, but the failure can be catched by pattern matching the Nothing
. If it doesn't, then I have obtained a Nat
from the user.
The function vTake
, necessary for the last step, requires a SNat n
, and I have a Nat
. To promote that Nat
into a SNat n
I used the next function that takes as its first argument a function that needs to handle all possible SNat n
.
-- | Construct un SNat n from the nat n
promoteNat :: forall k . (forall (n :: Nat) . SNat n -> k) -> Nat -> k
Z = f SZ
promoteNat f S n) = promoteNat (f . SS) n promoteNat f (
I can use that SNat n
, for any value of n
, to ask the user for the vector:
-- | Read Vect of a from stdin
readVec :: Read a => SNat n -> IO (Vect n a)
SZ = pure VNil
readVec SS n) = do
readVec (<- read <$> getLine
vecElem :#) <$> readVec n (vecElem
So far I have the vector, then I can ask for m :: Nat
in a similar way as before. Once I have both n :: Nat
and m :: Nat
, I need to verify that m <= n
by constructing LTE m n
:
-- | Run the function if we can prove that n <= m
promoteLTE :: (LTE n m -> k) -> SNat n -> SNat m -> Maybe k
SZ _ = Just $ f LTEZero
promoteLTE f SS _) SZ = Nothing
promoteLTE _ (SS n) (SS m) = promoteLTE (f . LTESucc) n m promoteLTE f (
All I need to do now is put these pieces in the right order:
Obtain a SNat n
from the user
step0 :: IO ()
= do
step0 putStr "Vector length: \n ?"
<- intToNat . (read :: String -> Integer) <$> getLine
mVecLen
case promoteNat step1 <$> mVecLen of
Nothing -> putStrLn "Failed to obtain Nat"
Just action -> action
Obtain a Vect n Integer
and SNat m
-- Having obtained a valid natural n, read n integers from `Stdin`
-- Then ask for how many elements to take
step1 :: forall (n :: Nat) . SNat n -> IO ()
= do
step1 n putStrLn "Enter elemnts"
<- readVec n :: IO (Vect n Integer)
vec
putStrLn $ "Input: " ++ show vec
putStr "Number to Take: \n ?"
<- intToNat . (read :: String -> Integer) <$> getLine
mTake
case promoteNat (step2 vec n) <$> mTake of
Nothing -> putStrLn "Failed to obtain Nat"
Just action -> action
Check if m <= n
-- If the elements to take n are a valid natural then
-- Make sure it is less than then length of the vector
step2 :: Vect k Integer -> SNat k -> SNat n -> IO ()
= case promoteLTE (step3 vec) n k of
step2 vec k n Nothing -> putStrLn "Not LTE"
Just action -> action
Take the m
elements
-- If we take less than or equal elements, proceed to take.
-- No checks needed.
step3 :: Vect m Integer -> LTE n m -> IO ()
= putStrLn $ "Taken: " ++ show (vTake lte vec) step3 vec lte
And that's it.
Conclusion
Using the type safe function vTake
required to carefully check that all preconditions were met without using undefined
. So, I could say the code is safer.
On the other hand, the resulting code is more "complex" and I've transferred some of the risk to the type level functions. Those could be wrong in some cases and would require testing.
Finally, this code forbids taking more than what you have; and that could be a very important constraint to enforce in some applications.
References
Some of the techniques used in this document come from "Dependent Type Programming with Singletons" by R. A. Eisenberg and S. Weirich.
Idris' base library source code was also used for reference.
Haskell's singleton libraray's source code was also consulted.
Appendix
This is the rest of the code that drives the previous post, together with some functions that I wrote and ened up not using.
main :: IO ()
= do
main putStrLn $ "Example: " ++ show example
putStrLn $ "Take one, drop one: " ++ show skippedExample
putStrLn $ "Dropped: " ++ show droppedExample
putStrLn $ "Sliced: " ++ show slicedExample1
putStrLn $ "Eliminated: " ++ show elimintadExample
putStrLn $ "Duplicated: " ++ show duplicatedExample
putStrLn $ "First Half: " ++ show firstHalfExample
putStrLn $ "Second Half: " ++ show secondHalfExample
putStrLn $ "Halves: " ++ show halvedExample
putStrLn $ "Taked: " ++ show takedExample
step0
instance Show a => Show (Vect n a) where
show VNil = "VNil"
show (x :# xs) = show x ++ " :# " ++ show xs
printVec :: Show a => Vect n a -> String
= "{" ++ printRec v ++ "}"
printVec v
printRec :: Show a => Vect n a -> String
VNil = ""
printRec :# VNil) = show x
printRec (x :# xs@(_ :# _)) = show x ++ " ; " ++ printRec xs
printRec (x
instance Show (Slice 'Z 'Z 'VNil) where
show SliceNil = "SliceNil"
instance Show (Slice t i is) => Show (Slice ( 'S t ) i ( 'False ':# is) ) where
show (SliceRem is) = "SliceRem " ++ show is
instance Show (Slice t i is) => Show (Slice ( 'S t ) ('S i) ( 'True ':# is) ) where
show (SliceInc is) = "SliceInc " ++ show is
-- | Less than over Nat
type family LTB (n :: Nat) (m :: Nat) :: Bool where
LTB n m = LTEB ('S n) m
-- | Greater than equal over Nat
type family GTEB (n :: Nat) (m :: Nat) :: Bool where
GTEB 'Z m = 'False
GTEB ( 'S n ) 'Z = 'True
GTEB ( 'S n ) ( 'S m ) = GTEB n m
promoteVec :: SNat n -> [a] -> Maybe (Vect n a)
SZ [] = Just VNil
promoteVec SZ (_ : _) = Nothing
promoteVec SS _) [] = Nothing
promoteVec (SS n) (x : xs) = (x :#) <$> promoteVec n xs
promoteVec (
demoteSNat :: SNat n -> Nat
SZ = Z
demoteSNat SS n) = S (demoteSNat n)
demoteSNat (
natToInt :: Nat -> Integer
Z = 0
natToInt S n) = 1 + natToInt n
natToInt (
promoteLTE1 :: forall (n :: Nat) k . (forall (m :: Nat) . LTE n m -> k) -> SNat n -> Integer -> Maybe k
promoteLTE1 f n m| m < 0 = Nothing
| m == 0 = case n of
SZ -> Just $ f LTEZero
-> Nothing
_ | otherwise = case n of
SZ -> Just $ f LTEZero
SS n') -> promoteLTE1 (f . LTESucc) n' (m - 1)
(
promoteLTE2 :: forall k . (forall (n :: Nat) (m :: Nat) . LTE n m -> k) -> Integer -> Integer -> Maybe k
promoteLTE2 f n m| n < 0 || m < 0 = Nothing
| n > m = Nothing
| n == 0 = Just $ f LTEZero
| otherwise = promoteLTE2 (f . LTESucc) (n - 1) m