tehgeekmeister’s blog

May 8, 2009


Filed under: Uncategorized — tehgeekmeister @ 8:57 pm

i plan to make a few changes to how i run this place.

  • i’m going to update more often.  i’m still debating a minimal schedule, but somewhere around two to three times a week at least.  more when i’ve got more to share.
  • i’m going to have a new focus.  or rather, a focus.  before it was just whatever i happened to be interested in; now i will write about how to efficiently and effectively teach yourself anything you want to know.  also, i’ll update about what i’m teaching myself, what problems i’m having, and how i’m solving them.

there will be no more focus to the site than that, aside from my interests.  i’m interested in all sorts of things, from buddhism to foreign languages to evolution to physics to computer science.  there will be bits about whatever i’m studying, but themes will come and go as i study one subject or another in more or less depth.

another goal is to dispel the myth that seems prevalent about autodidacticism, which is that an autodidact can’t learn something with the same thoroughness as those in school.  well, at least people seem to believe this about most people they know; we all accept that there are extreme cases of autodidacticism that have worked.  i admit right off the bat, most autodidacts don’t learn as thoroughly as those who do well in a school.  but i contend that this is a result of a poverty of resources and techniques available to the autodidact, of not sharing our approaches and developing better skills.  this is because, i suspect, autodidacts don’t tend to have a community of others whom they learn with or from.

expect the first such post by monday.  =D


February 6, 2009

google code project, google group, and darcs repository for WikimediaParser

Filed under: programming — Tags: , — tehgeekmeister @ 10:38 am

So if anyone wants to contribute, they can now, easily!

The google code project is here; I intend to use it only for the wiki and bug tracker.  The darcs repository is on patch-tag.  The google group, if anyone has any suggestions or anything of the sort, is here.

In any case, expect at least a few more releases and a reasonably general, clean version from me soon; but help from others is more than welcome!

February 5, 2009

releasing WikimediaParser 0.1

Filed under: programming — Tags: , , — tehgeekmeister @ 8:17 pm

It’s a cabalized version of some very rough, but functional, tools I’m using for parsing wikimedia markup.  Currently it has some french wikipedia specific code, but by release 0.2 (which should come soon) I intend to have it general enough to be used for at least any language of wikipedia (and in later releases any wikimedia markup at all).  Anyone who wants to contribute some patches and get it there quicker, feel free to submit patches using darcs send (for now, that is.  I’ll have a darcs repository up on code.haskell.org soon, but for right now I’m locked out of my account).

hackage page

January 17, 2009

what to do when you can’t solve a problem with a hackage library you need?

Filed under: programming — Tags: — tehgeekmeister @ 8:12 pm

in trying to work on my graded reader project recently I encountered two problems that were beyond my ability to solve. the first was that HDBC’s quickQuery was loading all results into memory, while fetchAllRows was not loading any rows (in fact the query was never reaching postgresql). this made it so I couldn’t use HDBC with postgresql for my project. then I decided to try takusen, but couldn’t figure out the problems with the current cabal file for it, despite quite a lot of googling. I asked around on #haskell about both of these problems, and got some help, but neither got resolved; so my question for planet haskell is: what should I do when I reach this sort of an impasse? i’ll gladly fix any problems I can on my own and share the wealth, but I hate being stuck when it’s absolutely beyond me.

December 22, 2008

fast string appending/concatenation in haskell

Filed under: programming — tehgeekmeister @ 9:58 pm

working on my graded reader project yesterday, i was really frustrated by how slow it was going.  after profiling, i realized i was spending over 95% of my time appending strings; uh oh.  i did something stupid.

the code in question was something like this:

appendToContent str page = page {pageContent = newContent}
  where newContent = (pageContent page) ++ str

which looks innocent enough to the unwary — after all, you’re just using the normal haskell append operator, right?

to see why this is so bad, let’s take a look at what it’s doing:

[]    ++ ys = ys
(x:xs) ++ ys = x : (xs ++ ys)

this means that to concatenate two lists, we have to recurse thru EVERY value of the first list.  this means that you’re essentially calling (++) once for every element in the first list.  why does haskell do this?  let’s take a look at the definition of a list in haskell

data List a = [] | a : List a  -- (this isn’t how it’s really defined; it’s built into GHC, but this is how it’d look.)

this means that a List is either the empty list, [], or a pair of any value of type a and a list of values of type a.  this means there’s no direct way to access any element in the list but the very first one, so to concatenate two lists we have to traverse every element in the first list, deconstruct it, and reconstruct it.  that’s not so bad if you are always adding onto the front of a list, because each time you only have to traverse the new elements; but if you’re adding to the end of a list (what i was doing), you have to traverse all the old elements of the list.  and each time you append, there are now more elements to traverse the next time, so the total work grows quadratically with the number of appends.
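to make the difference concrete, here is a rough sketch of the two nesting orders and a simple cost model for each.  the function names are made up for illustration, and the cost functions only count rebuilt list cells under the assumption that every chunk has the same length; they are not measurements.

```haskell
-- cost model: (++) rebuilds one cons cell per element of its LEFT argument.

-- appending each chunk to the end: ((("" ++ a) ++ b) ++ c)
appendAtEnd :: [String] -> String
appendAtEnd = foldl (++) ""

-- nesting the other way: a ++ (b ++ (c ++ ""))
appendNestedRight :: [String] -> String
appendNestedRight = foldr (++) ""

-- cells rebuilt when n chunks of k characters are appended at the end:
-- the i-th append re-traverses the i*k characters already accumulated.
costAtEnd :: Int -> Int -> Int
costAtEnd n k = sum [i * k | i <- [0 .. n - 1]]

-- cells rebuilt when nested to the right: each chunk is traversed once.
costNestedRight :: Int -> Int -> Int
costNestedRight n k = n * k
```

both functions build the same string, but under this model 1000 chunks of 10 characters cost 4,995,000 rebuilt cells when appended at the end, against 10,000 when nested to the right.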

there are two ways around this:

  1. don’t append to the end of a list multiple times.
  2. if you have to, use DList or ShowS
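the idea behind the second option can be sketched with a hand-rolled difference list.  this is only an illustration of the technique, not the DList library’s actual code; the names DiffList, dlFromList, dlAppend, dlSnoc and dlToList are invented here.

```haskell
-- a list represented as "the function that prepends it to whatever comes next".
newtype DiffList a = DiffList ([a] -> [a])

-- wrap an ordinary list.
dlFromList :: [a] -> DiffList a
dlFromList xs = DiffList (xs ++)

-- appending is just function composition, so nothing is traversed yet.
dlAppend :: DiffList a -> DiffList a -> DiffList a
dlAppend (DiffList f) (DiffList g) = DiffList (f . g)

-- add a single element at the end, again without traversing anything.
dlSnoc :: DiffList a -> a -> DiffList a
dlSnoc (DiffList f) x = DiffList (f . (x :))

-- apply the composed function to [] once, at the very end.
dlToList :: DiffList a -> [a]
dlToList (DiffList f) = f []
```

because the real appends only happen inside dlToList, they end up nested to the right no matter what order the pieces were added in.  ShowS is the same trick specialised to String.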

since i was working with a String (just a list of characters in haskell), i used ShowS.  so the original code became

appendToContent str page = page {pageContent = newContent}
  where newContent = (pageContent page) . showString str

the difference being that pageContent is now of type ShowS rather than String (str is still a plain String), which is to say that for the initial value of pageContent we started with showString "" (showString is a function that takes two strings, and prepends the first to the second).  since pageContent starts as a value of type ShowS (or String -> String), we can compose it with more partial applications of showString.  when we finally are done appending, we simply apply pageContent to "" (the empty string), and all the appends then happen nested to the right, in one pass, so we only traverse the elements of each string once.
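put together, the whole thing might look something like the sketch below.  the Page record, its pageTitle field, emptyPage and renderContent are assumptions made for the example; only pageContent and appendToContent come from the code above.

```haskell
-- hypothetical stand-in for the page type; only pageContent is taken
-- from the snippets above.
data Page = Page
  { pageTitle   :: String
  , pageContent :: ShowS   -- String -> String, not String
  }

-- a fresh page whose content is the "prepend nothing" function.
emptyPage :: String -> Page
emptyPage title = Page { pageTitle = title, pageContent = showString "" }

-- composing is cheap: no string is traversed here.
appendToContent :: String -> Page -> Page
appendToContent str page = page { pageContent = newContent }
  where newContent = pageContent page . showString str

-- apply the composed function to "" once, when the content is needed.
renderContent :: Page -> String
renderContent page = pageContent page ""
```

a page built with a fold such as foldl (flip appendToContent) (emptyPage "t") chunks can then be turned back into a String with a single call to renderContent.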

so.  if you find your haskell code involving a lot of appends/concatenations is slow, make sure you’re not doing what i did.  if you are, use the showString trick, and it’ll be much faster.  or if you’re working with lists of other types, the DList library on hackage should solve your problem in the same way.  just remember to make sure you’re composing calls, and then applying them to “”, because if you do each call one at a time it’ll be just as slow as regular appends.
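that last warning is easy to get wrong, so here is a small sketch of both versions.  buildOnce and buildEachTime are hypothetical names; they return the same string, and only the first one avoids the repeated traversals.

```haskell
-- compose every showString first, then apply the result to "" once.
buildOnce :: [String] -> String
buildOnce chunks = foldl (\acc s -> acc . showString s) id chunks ""

-- anti-pattern: flatten to a String after every step.  each step is
-- just (acc ++ s) again, so the accumulated text is re-traversed.
buildEachTime :: [String] -> String
buildEachTime = foldl (\acc s -> (showString acc . showString s) "") ""
```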

May 26, 2008

how juries are fooled by statistics

Filed under: Uncategorized — tehgeekmeister @ 4:06 pm

many of you have likely seen this, a ted talk by peter donnelly on common misconceptions about statistics.  nonetheless i think this video is too important for any member of any modern society for me to pass up sharing it.  certainly the best explanation of how statistics can be misused, and to horrible ends, i’ve seen yet.  there are also a few nice explanations of bayes’ theorem (though he doesn’t say that’s what they are), which is helpful since that’s typically such a point of frustration for people learning statistics/probability.
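for anyone who wants to play with bayes’ theorem directly, here is a minimal sketch.  the function name and the numbers in the note below are made up for illustration and are not taken from the talk.

```haskell
-- P(H | E) = P(E | H) P(H) / (P(E | H) P(H) + P(E | not H) P(not H))
posterior
  :: Double  -- prior probability of the hypothesis, P(H)
  -> Double  -- chance of the evidence if the hypothesis is true, P(E | H)
  -> Double  -- chance of the evidence if it is false, P(E | not H)
  -> Double  -- probability of the hypothesis given the evidence, P(H | E)
posterior prior ifTrue ifFalse =
  (ifTrue * prior) / (ifTrue * prior + ifFalse * (1 - prior))
```

with a hypothetical condition affecting 1 person in 1000 and a test that is right 99% of the time either way, posterior 0.001 0.99 0.01 comes out at roughly 0.09: a positive result still leaves only about a 9% chance of actually having the condition.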

May 16, 2008

car modding

Filed under: Uncategorized — tehgeekmeister @ 6:12 pm

i’ve had a strong urge over the past while to combine my interest in all things mechanical, electronic/electrical, and programming by getting an oldish (late eighties/early nineties) car and modding it to have all sorts of goodies and gadgets like newer cars do, as well as ones that newer cars in general don’t have yet.  unfortunately, i have neither the time nor the money to get into such involved and generally pointless endeavors at the moment, but this post is my commitment to myself that i WILL do it if i can manage it in the future.

that is all.

April 4, 2008

out of town

Filed under: Uncategorized — tehgeekmeister @ 9:16 pm

for about the next month, working on a nuclear power plant.  i might update while i’m away, but more likely i won’t.  all projects are on hold as a result also.

have fun while i’m gone!

March 23, 2008

cocoa, mnemosyne, graded reader group

Filed under: Uncategorized — Tags: , , , , , — tehgeekmeister @ 1:49 pm

i’ve been getting myself familiarized with cocoa in order to, assuming i have the free time, do a proper osx port of mnemosyne, as it currently only works via x11, which i’d like to avoid using when possible (at least in osx).  i like xcode and interface builder so far.  i’ve found that, even if they’re a bit daunting for the beginner (a whole lot of windows pop up for no apparent reason, with no obvious purpose), they cut out a lot of the busy work which would be associated with coding a gui by hand.  this is a nice introduction, and should make it a bit easier when i have to use another gui framework in the future.

i’m porting mnemosyne because i want a simple way to use spaced repetition in my various studies without having to open up vmware just to run mnemosyne.  i’ve wanted a flashcard app that uses spaced repetition for years now, but being that programming is only something i do in my free time, i’ve never quite gotten to the point where i could feasibly do it before.  anyway, it’ll be useful along with the graded reader (more news about that in the next paragraph) for learning languages, and for remembering important formulas and theorems for maths.

james tauber has started a google group and google code project for the graded reader i posted about previously.  this is exciting to me, as it’ll also be very useful to me for language learning, something i’ve long neglected and intend to pick up again shortly.  once his source is in the google code project, i’ll be contributing there as i’m able; however, i’ll likely be making a haskell clone (potentially with some different approaches of my own?  we’ll see.) of both this and mnemosyne, with the aim of integrating them into some sort of an app to aid my language learning.  i’ve wanted to do these things for a long time, but time to program and study is hard to come by for me, and so it’s slow going getting there.

also, thanks to everyone for the suggestions on how to get involved in projects!  i’ll slowly be working on grokking the various projects, looking for things i can fix, and adding documentation as i’m able.  i’ll also be taking a look at lambdabot, since someone mentioned that to me.  if i can help clean it up some, since apparently it’s been neglected, that’d be great.  getting to learn more about programming and haskell and help out the community at the same time is great.

March 18, 2008

how to grok a multi-file project?

Filed under: programming — Tags: , , — tehgeekmeister @ 12:38 am

i’ve decided that my next step in learning haskell and programming will be to get familiarized with and start contributing to a few open source projects.  the ones i’m most interested in and feel the most capable to use/contribute to in any way are yi and happs, and to get started i’ve built and looked around the source of both of them a bit: but i’ve found it very difficult to get oriented in projects of this size.  being that the most complicated things i’ve ever coded are somewhere along the lines of maybench or trivial exercises, i’m not used to figuring out where to start or how to go about understanding larger projects.  so, if anyone out there (hi planet haskell!) has any suggestions for how to get acquainted with a larger project, or can help in any other way, it’d be muchly appreciated.  even better if you’re involved in the development of either yi or happs and would be willing to answer some questions/point me in the right direction until i’m up to speed and able to contribute on my own!

p.s.: one idea i’d had was to attempt to document both projects, seeing as they both are in need of more documentation, and i’d absolutely have to become intricately familiar with the source in order to do that — but the problem remains of how to manage the complexity of a multi-file project, where one starts in order to figure out the whole contraption.
