Information is not knowledge, knowledge is not wisdom, wisdom is not truth. (Frank Zappa)

External Content not shown. Further Information.

Arrays in Haskell can be a pain. Monads are a nice thing, but for some things, I would prefer linear types. There are so-called DiffArrays in Haskell, which are, for no apparent reason, ridiculously slow.

As a quick reminder: A DiffArray maintains a history of changes, which should be only linear overhead and when using it linearly only, most of the memory overhead should be quite short-lived. But it isn't.

My idea now is different: Using an array only linearly should be trivial to check when having an AST (and I wonder why nobody has written such a thing yet which just replaces linearily used DiffArrays with unsafe operations). But other than overwriting the old array with a difference, I just mark it as invalid:

import System.IO.Unsafe
import Data.IORef
import Data.Array.IO

data LinearArray_ = Invalid | Valid (IOUArray Int Int)
type LinearArray = IORef LinearArray_

Creating new Arrays is straightforward:

newArray :: (Int, Int) -> Int -> LinearArray
newArray (start, end) init =
    unsafePerformIO $
    do
      t <- Data.Array.IO.newArray (start, end) init
      r <- newIORef (Valid t)
      return $ r

Overwriting an array does the following: It takes the array from the old ref and overwrites it accordingly. Then the old ref is overwritten and marked Invalid, and a new array is returned. Of course, if the old ref was Invalid already, an error is rised:

overwriteArray :: LinearArray -> Int -> Int -> LinearArray
overwriteArray old index value =
    unsafePerformIO $
    do ior <- readIORef old
       case ior of
         Invalid -> error "nonlinear use of linear array!"
         Valid a -> do
                  writeArray a index value
                  writeIORef old Invalid
                  new <- newIORef (Valid a)
                  return new

Trying to read from an invalid array will do the same:

readArray :: LinearArray -> Int -> Int
readArray arr index =
    unsafePerformIO $
    do ior <- readIORef arr
       case ior of
         Invalid -> error "nonlinear use of linear array!"
         Valid a -> do
                  r <- Data.Array.IO.readArray a index
                  return r

This approach has several caveeats. For example, in most functional languages, one can just overload a name. In Haskell, a code like

let {a = []} in let {a = 1 : a} in a

produces an infinite list of ones: By default, a refers to itself, and all definitions are recursive. But name overloading is something that comes in handy when using linear arrays. Another caveeat is lazy evaluation. The following code "surprisingly" works:

a :: Int
a = let arr1 = Main.newArray (0, 10) 0
        arr2 = overwriteArray arr1 9 1
    in Main.readArray arr1 9

That is because arr2 is never evaluated. However, activating {-# LANGUAGE BangPatterns #-} and using

a :: Int
a = let arr1 = Main.newArray (0, 10) 0
        !arr2 = overwriteArray arr1 9 1
    in Main.readArray arr1 9

will throw an error. Notice that both are wrong, in the sense that the specification of a linear array requires linear occurences only, but arr1 occurs twice. It is only that there is no error in the one case.

My benchmarks show that this kind of array is pretty fast. Now, while feeling a lynchmob of Haskell-fanboys forming, let me add that I would not recommend to actually use this. I would just wish that Haskell had support for uniqueness types – or at least automatic optimization of linearly used immutable arrays or DiffArrays in that manner.

Because it should be quite easy to find out whether an array occurs only once in a term, and all its decendants are also only used once. And the benefits would be really huge, in my opinion.