tehgeekmeister’s blog

December 22, 2008

fast string appending/concatenation in haskell

Filed under: programming — tehgeekmeister @ 9:58 pm

working on my graded reader project yesterday, i was really frustrated by how slow it was going.  after profiling, i realized i was spending over 95% of my time appending strings; uh oh.  i did something stupid.

the code in question was something like this:

appendToContent str page = page {pageContent = newContent}
where newContent = (pageContent page) ++ str

which looks innocent enough to the unwary — after all, you’re just using the normal haskell append operator, right?

to see why this is so bad, let’s take a look at what it’s doing:

[]    ++ ys = ys
(x:xs) ++ ys = x : (xs ++ ys)

this means that to concatenate two lists, we have to recurse thru EVERY value of the first list.  this means that you’re essentially calling (++) once for every element in the first list.  why does haskell do this?  let’s take a look at the definition of a list in haskell

data List a = [] | a : List a — (this isn’t how it’s really defined; it’s built into GHC, but this is how it’d look.)

this means that a List is either the empty list, [], or a pair of any value of type a and a list of values of type a.  this means there’s no direct way to access any element in the list but the very first one, so to concatenate two lists we have to traverse every element in the first list, deconstruct it, and reconstruct it.  that’s not so bad if you are always adding onto the front of a list, because each time you only have to traverse the new elements; but if you’re adding to the end of a list (what i was doing), you have to traverse all the old elements of the list.  and each time you append, there are now more elements to traverse the next time.

there are two ways around this:

  1. don’t append to the end of a list multiple times.
  2. if you have to, use DList or ShowS

since i was working with a String (just a list of characters in haskell), i used ShowS.  so the original code became

appendToContent str page = page {pageContent = newContent}
where newContent = (pageContent page) . showString str

the difference being that now str is of type ShowS, which is to say that for the initial value of pageContent we started with showString “” (showString is a function that takes two strings, and prepends the first to the second).  since pageContent starts as a value of type ShowS (or String -> String), we can compose it with more partial applications of showString.  when we finally are done appending, we simply apply pageContent to “” (the empty string), and it does all those appends once only, so we only traverse all the elements of each string once.

so.  if you find your haskell code involving a lot of appends/concatenations is slow, make sure you’re not doing what i did.  if you are, use the showString trick, and it’ll be much faster.  or if you’re working with lists of other types, the DList library on hackage should solve your problem in the same way.  just remember to make sure you’re composing calls, and then applying them to “”, because if you do each call one at a time it’ll be just as slow as regular appends.

About these ads

15 Comments »

  1. This seems like a fairly nasty gotcha to me. The kind that could impress a naive or simply impatient user with a negative view of haskell’s performance. =\

    Comment by James William Pye — December 22, 2008 @ 11:22 pm

  2. James: yeah, it is kinda nasty. but we get lots of other cool stuff from the nature of our list type, and i think it’s a good trade. this is really why i made the post, so others could see how to solve the problem if they run into it; when i was trying to solve it i had to talk to people on #haskell, which, while a great resource, shouldn’t be necessary for something that’s so common as this.

    Comment by tehgeekmeister — December 22, 2008 @ 11:40 pm

  3. Nasty? No. All languages have the same problem, c’mon. Never saw

    “”.join(strings)

    in Python code, for example?

    Comment by Felipe — December 23, 2008 @ 4:10 am

  4. Another option would be to use Data.Sequence. But maybe that wasn’t an option for you if you used libraries which used lists?

    Comment by Josef Svenningsson — December 23, 2008 @ 4:12 am

  5. You should avoid intensive string concatenation where possible in *all* languages. Read the article http://en.wikipedia.org/wiki/Schlemiel_the_painter%27s_Algorithm .

    By the way, in Haskell string concatenation is less of a performance hog because of its laziness: for example, completely evaluating s1 ++ s2 will only take as much time and memory as needed to scan s1 and then scan s2.

    Consider the implementation of (++):

    [] ++ bs = bs
    (a:as) ++ bs = a:(as++bs)

    So, to know the next element of (as++bs), you have to check whether ‘as’ is empty.

    However, for ((s1 ++ s2) ++ s3) ++ s4, to know its very first element, you have to check whether (s1++s2)++s3 is empty, for that you have to check whether (s1++s2) is empty, and for that you have to check whether s1 is empty. And these 4 operations continue to happen until s1 ends; after that you have 3 operations until s2 ends etc.

    So, fully evaluating the contatenation of N strings of lengths L1,L2,..,LN in a *LEFT*-fold fashion takes N*L1 + (N-1)*L2 + … steps.

    Note that if you concatenate from the *RIGHT*, you will have absolutely no problems: completely evaluating s1 ++ (s2 ++ (s3 ++ s4)) takes only as much time as needed to scan s1, then scan s2, then s3, then s4.

    Comment by Eugene Kirpichov — December 23, 2008 @ 11:37 am

  6. Note that showString *is* ++. http://haskell.org/onlinereport/standard-prelude.html#$vshowString

    Your second definition is “newContent = (pageContent page) . (str ++)”, which is better for the reasons that Eugene Kirpichov notes.

    Comment by josh — December 23, 2008 @ 3:26 pm

  7. Have you ever considered about including a little bit more than
    just your articles? I mean, what you say

    is important and all. Nevertheless think about if you
    added some great visuals or video clips to give your posts
    more,

    “pop”! Your content is excellent but with pics
    and videos, this website could

    undeniably be one of the very best in its niche. Wonderful
    blog!

    Comment by http://forum.bloodmoon.dk/member.php?u=34566 — February 10, 2013 @ 5:02 pm

  8. As I web site possessor I believe the content material here is rattling great , appreciate it for your efforts.

    You should keep it up forever! Good Luck.

    Comment by Roseann — February 20, 2013 @ 6:27 pm

  9. Wow, superb weblog format! How long have you been running a blog for?

    you made blogging

    look easy. The overall look of your web site is magnificent, let alone the content!

    Comment by spain football kit jjb — April 21, 2013 @ 1:27 pm

  10. My spouse and I absolutely love your blog and find most of your post’s to be just what I’m

    looking for. Would you offer guest writers to write content in your case?
    I wouldn’t mind composing

    a post or elaborating on most of the subjects you write

    concerning here. Again, awesome site!

    Comment by Chang — June 28, 2013 @ 3:01 pm

  11. I wish to convey my passion for your generosity for folks who need
    guidance on this one question. Your real commitment to getting the solution up and down was wonderfully

    practical and have without

    exception allowed regular people much like me to arrive at their objectives.
    Your own valuable guideline implies this much a person like me and especially to my colleagues.
    Warm regards; from everyone of us.

    Comment by Wilford — July 8, 2013 @ 9:10 am

  12. I must show my love for your kind-heartedness in support of those who actually need assistance with that
    field. Your real commitment to getting the solution all through
    became exceedingly interesting and have consistently allowed girls just like me to arrive at their goals.

    The valuable key points signifies a lot to me and somewhat more to my office colleagues.

    Regards; from everyone of us.

    Comment by sex w rodzinie opowiadania — July 17, 2013 @ 3:26 pm

  13. I not to mention my buddies were actually reading through the good
    thoughts found on your site while all of a sudden I had a horrible feeling I had not thanked the blog owner for them.
    My young men are actually consequently joyful to read through all
    of them and have now surely been tapping into those things.

    Appreciation for truly being very considerate as well as for finding some incredibly good subjects millions of individuals are really eager to be informed on.

    Our honest regret for not saying thanks to you earlier.

    Comment by http://verena-wartmann.at/blog/?p=316 — August 5, 2013 @ 7:38 pm

  14. That is really interesting, You’re an overly skilled blogger.
    I have joined your feed and look ahead to looking
    for extra of your

    wonderful post. Also, I have shared your site in

    my social networks!

    Comment by loreanwilliams89 — September 4, 2013 @ 7:09 pm

  15. Greetings from Florida! I’m bored to death at

    work so I decided to browse your site on my iphone during lunch break.
    I

    love the knowledge you present here and can’t
    wait to take a look when I get home. I’m

    surprised at how quick your blog loaded on my cell phone ..

    I’m not even using WIFI,

    just 3G .. Anyhow, good blog!

    Comment by Kam — January 4, 2014 @ 9:54 am


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Silver is the New Black Theme. Create a free website or blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: