Josef “Jeff” Sipek

Post Formats in blahgd

After reading What creates a good wikitext dialect depends on how it’s going to be used, I decided to write a short post about how I handled the changing needs that my blahg experienced.

One of the things that I implemented in my blogging software a long time ago was the support for different flavors of markup. This allowed me to switch to a “saner” markup without revisiting all the previous entries. Since every post already has metadata (title, publication time, etc.) it was easy enough to add another field (“fmt”) which in my case is just a simple integer identifying how the post contents should be parsed.

Over the years, there have been four formats:

fmt 0 (removed)
Wordpress compat
fmt 1
“Improved” Wordpress format
fmt 2
raw html
fmt 3
LaTeX-like (my current format of choice)

The formats follow a pretty natural progression.

It all started in January 2009 as an import of about 250 posts from Wordpress. Wordpress stores the posts in a html-esque format. At the very least, it inserts line and paragraph breaks. If I remember correctly, one newline is a line break and two newlines are a paragraph break, but otherwise is passes along HTML more or less unchanged. Suffice to say, I was not in the mood to rewrite all the posts in a different format right away so I implemented something that resembled the Wordpress behavior. I did eventually end up converting these posts to a more modern format and then I removed support for this one.

The next format (fmt 1) was a natural evolution of fmt 0. One thing that drove me nuts about fmt 0 was the handling of line breaks. Since the whole point of blahgd was to allow me to compose entries in a text editor (vim if you must know) and I like my text files to be word wrapped, the transformation of every newline to <br/> was quite annoying. (It would result in jagged lines in the browser!) So, about a month later (February 2009), I made a trivial change to the parsing code to treat a single newline as just whitespace. I called this changed up parser fmt 1. (There are currently 24 posts using this format.)

A couple of months after I added fmt 1, I came to the conclusion that in some cases I just wanted to write raw HTML. And so fmt 2 was born (April 2009). (There are currently 5 posts using this format.)

After using fmt 2 for about a year and a half, I concluded that writing raw HTML is not the way to go. Any time I wanted to change the rendering of a certain thing (e.g., switching between <strong> and <b>), I had to revisit every post. Since I am a big fan of LaTeX, I thought it would be cool to have a LaTeX-like markup. It took a couple of false starts spanning about six months but eventually (February 2011) I got the lex and yacc files working well enough. (There are currently 422 posts using this format.)

While I am reasonably happy with fmt 3, but I do see some limitations that I’d like to address. This will inevitably lead to fmt 4. I am hoping to make a separate post about the fmt 3 limitations and how fmt 4 will address them sometime in the (hopefully) near future.

Powered by blahgd