External Content not shown. Further Information.
Arrays in Haskell can be a pain. Monads are a nice thing, but for some things, I would prefer linear types. There are so-called DiffArrays in Haskell, which are, for no apparent reason, ridiculously slow.
As a quick reminder: A DiffArray maintains a history of changes, which should be only linear overhead and when using it linearly only, most of the memory overhead should be quite short-lived. But it isn't.
My idea now is different: Using an array only linearly should be trivial to check when having an AST (and I wonder why nobody has written such a thing yet which just replaces linearily used DiffArrays with unsafe operations). But other than overwriting the old array with a difference, I just mark it as invalid:
import System.IO.Unsafe
import Data.IORef
import Data.Array.IO
data LinearArray_ = Invalid | Valid (IOUArray Int Int)
type LinearArray = IORef LinearArray_
Creating new Arrays is straightforward:
newArray :: (Int, Int) -> Int -> LinearArray
newArray (start, end) init =
unsafePerformIO $
do
t <- Data.Array.IO.newArray (start, end) init
r <- newIORef (Valid t)
return $ r
Overwriting an array does the following: It takes the array from the
old ref and overwrites it accordingly. Then the old ref is overwritten
and marked Invalid
, and a new array is returned. Of course, if the
old ref was Invalid
already, an error is rised:
overwriteArray :: LinearArray -> Int -> Int -> LinearArray
overwriteArray old index value =
unsafePerformIO $
do ior <- readIORef old
case ior of
Invalid -> error "nonlinear use of linear array!"
Valid a -> do
writeArray a index value
writeIORef old Invalid
new <- newIORef (Valid a)
return new
Trying to read from an invalid array will do the same:
readArray :: LinearArray -> Int -> Int
readArray arr index =
unsafePerformIO $
do ior <- readIORef arr
case ior of
Invalid -> error "nonlinear use of linear array!"
Valid a -> do
r <- Data.Array.IO.readArray a index
return r
This approach has several caveeats. For example, in most functional languages, one can just overload a name. In Haskell, a code like
let {a = []} in let {a = 1 : a} in a
produces an infinite list of ones: By default, a refers to itself, and all definitions are recursive. But name overloading is something that comes in handy when using linear arrays. Another caveeat is lazy evaluation. The following code "surprisingly" works:
a :: Int
a = let arr1 = Main.newArray (0, 10) 0
arr2 = overwriteArray arr1 9 1
in Main.readArray arr1 9
That is because arr2 is never evaluated. However, activating {-#
LANGUAGE BangPatterns #-}
and using
a :: Int
a = let arr1 = Main.newArray (0, 10) 0
!arr2 = overwriteArray arr1 9 1
in Main.readArray arr1 9
will throw an error. Notice that both are wrong, in the sense that
the specification of a linear array requires linear occurences only,
but arr1
occurs twice. It is only that there is no error in the one
case.
My benchmarks show that this kind of array is pretty fast. Now, while feeling a lynchmob of Haskell-fanboys forming, let me add that I would not recommend to actually use this. I would just wish that Haskell had support for uniqueness types – or at least automatic optimization of linearly used immutable arrays or DiffArrays in that manner.
Because it should be quite easy to find out whether an array occurs only once in a term, and all its decendants are also only used once. And the benefits would be really huge, in my opinion.