tehgeekmeister’s blog

December 22, 2008

fast string appending/concatenation in haskell

Filed under: programming — tehgeekmeister @ 9:58 pm

working on my graded reader project yesterday, i was really frustrated by how slow it was going.  after profiling, i realized i was spending over 95% of my time appending strings; uh oh.  i did something stupid.

the code in question was something like this:

appendToContent str page = page {pageContent = newContent}
where newContent = (pageContent page) ++ str

which looks innocent enough to the unwary — after all, you’re just using the normal haskell append operator, right?

to see why this is so bad, let’s take a look at what it’s doing:

[]    ++ ys = ys
(x:xs) ++ ys = x : (xs ++ ys)

this means that to concatenate two lists, we have to recurse thru EVERY value of the first list.  this means that you’re essentially calling (++) once for every element in the first list.  why does haskell do this?  let’s take a look at the definition of a list in haskell

data List a = [] | a : List a — (this isn’t how it’s really defined; it’s built into GHC, but this is how it’d look.)

this means that a List is either the empty list, [], or a pair of any value of type a and a list of values of type a.  this means there’s no direct way to access any element in the list but the very first one, so to concatenate two lists we have to traverse every element in the first list, deconstruct it, and reconstruct it.  that’s not so bad if you are always adding onto the front of a list, because each time you only have to traverse the new elements; but if you’re adding to the end of a list (what i was doing), you have to traverse all the old elements of the list.  and each time you append, there are now more elements to traverse the next time.

there are two ways around this:

  1. don’t append to the end of a list multiple times.
  2. if you have to, use DList or ShowS

since i was working with a String (just a list of characters in haskell), i used ShowS.  so the original code became

appendToContent str page = page {pageContent = newContent}
where newContent = (pageContent page) . showString str

the difference being that now str is of type ShowS, which is to say that for the initial value of pageContent we started with showString “” (showString is a function that takes two strings, and prepends the first to the second).  since pageContent starts as a value of type ShowS (or String -> String), we can compose it with more partial applications of showString.  when we finally are done appending, we simply apply pageContent to “” (the empty string), and it does all those appends once only, so we only traverse all the elements of each string once.

so.  if you find your haskell code involving a lot of appends/concatenations is slow, make sure you’re not doing what i did.  if you are, use the showString trick, and it’ll be much faster.  or if you’re working with lists of other types, the DList library on hackage should solve your problem in the same way.  just remember to make sure you’re composing calls, and then applying them to “”, because if you do each call one at a time it’ll be just as slow as regular appends.

Blog at WordPress.com.