Text formatting, an experiment with functional languages

Ever since I started writing Bitter, I've been wondering how to apply functional programming concepts to creating a better text formatter.

Before continuing, make sure you have read my introduction to the Bitter syntax highlighter as it contains some important ideas on which I will be elaborating in this article.

For now I'm going to call this idea for a language 'Terror', as it's a text formatter and probably going to be full of errors.

What would it look like?

Obviously the syntax cannot be identical to Bitter, as all Bitter needs to do is wrap pieces of text with 'tags'. Here, we actually need a way to define HTML. So I've introduced three new functions:

element(name): Define a new element.
attribute(name, value): Define a new attribute.
text(value): Define a new text node.

Each name or value accepts a string, which may contain references to matched data. For example, attribute('class', '$1') uses the first capture of the match as the class name.
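To make the reference idea a bit more concrete, here's a rough sketch of how a string like '$1' could be expanded in plain PHP. None of this exists yet: expand_references() is a name I've made up for illustration, and turning an unknown reference into an empty string is just an assumption on my part.

```php
<?php

// Hypothetical helper, not part of Bitter or any existing library:
// replace $0, $1, ... in a template with the matching regex captures.
function expand_references(string $template, array $captures): string
{
	return preg_replace_callback(
		'/\$(\d+)/',
		function (array $m) use ($captures): string {
			$index = (int) $m[1];
			// Assumption: a reference to a missing capture becomes ''.
			return $captures[$index] ?? '';
		},
		$template
	);
}

// $captures[1] holds the header name, $captures[2] the header text.
preg_match('/^(h[1-6])\.\s*(.+)$/', 'h2. Heading two', $captures);

echo expand_references('$1', $captures), "\n";       // h2
echo expand_references('title-$1', $captures), "\n"; // title-h2
```

The same helper would serve element(), attribute() and text() alike, since all three just take strings.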

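<?php
	
	// Top-level rule: the 'text' formatter is made up of other rules, referenced by id.
	Terror::rule(
		Terror::id('text'),
		
		Terror::id('text-headers')
	);
	
	// Match a Textile-style header line and create an element named after it (h1 to h6).
	Terror::rule(
		Terror::id('text-headers'),
		Terror::match('^(h[1-6])\.\s*[^\r\n]+'),
		Terror::element('$1'),
		
		// Child rule: everything after the 'hN.' prefix becomes the element's text.
		Terror::rule(
			Terror::id('text-header-text'),
			Terror::match('[^\.]+\.\s*(.*)'),
			Terror::text('$1')
		)
	);
	
?>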

So, as you can see, this example looks for any Textile-style headers and creates elements to represent them.

If you were to pass it this text:

h1. Heading one

h2. Heading two

Then you'd end up with this output:

<h1>Heading one</h1>

<h2>Heading two</h2>

Pretty straightforward, huh? All you do is tell it which 'syntax' to match and what to output it as; everything else is automatic.
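Under the hood, the header rules would boil down to something like the following plain PHP. This is only a sketch of the behaviour, not how Terror would actually be implemented: render_headers() is a made-up name, and escaping the header text with htmlspecialchars() and only skipping spaces and tabs after the dot are my own assumptions.

```php
<?php

// Hypothetical stand-in for the header rules: find Textile-style headers
// at the start of a line and wrap their text in the matching element.
// HTML only defines h1 to h6, so only those levels are matched.
function render_headers(string $input): string
{
	return preg_replace_callback(
		'/^(h[1-6])\.[ \t]*([^\r\n]+)/m',
		function (array $m): string {
			$name = $m[1];                   // what element('$1') would produce
			$text = htmlspecialchars($m[2]); // what text('$1') would produce
			return "<$name>$text</$name>";
		},
		$input
	);
}

echo render_headers("h1. Heading one\n\nh2. Heading two\n");
```

Anything that doesn't look like a header is passed through untouched, which is what I mean by 'automatic'.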

It's not all easy

There's one obvious issue with this method: not enough logic. The example above assumes you want your headers exactly as they were written in the text; there's no shortcut for offsetting the header level by one (h1 + 1 = h2).

Instead you'd have to write an individual rule for each possible header level, which is far from ideal:

<?php
	
	Terror::rule(
		Terror::id('textile-headers'),
		Terror::match('^h[1-6]\.\s*[^\r\n]+'),
		
		// One nested rule like this per header level: here h1 is output as h2.
		Terror::rule(
			Terror::id('textile-header-one'),
			Terror::match('^h1\.\s*[^\r\n]+'),
			Terror::element('h2'),
			
			// Reuse the header text rule by id.
			Terror::id('text-header-text')
		)
	);
	
?>

Alternatively, it may be possible to use XPath expressions to refer to captures and do calculations on them:

<?php
	
	Terror::rule(
		Terror::id('text-headers'),
		Terror::match('^h([1-6])\.\s*[^\r\n]+'),
		// Offset the header level by one: h1 becomes h2, and so on.
		Terror::element('h{capture-1 + 1}'),
		
		Terror::rule(
			Terror::id('text-header-text'),
			Terror::match('[^\.]+\.\s*(.*)'),
			Terror::text('{capture-1}')
		)
	);
	
?>

This poses its own problems, however: a considerable amount of processing would be needed to convert an array of matches into XML against which XPath expressions can then be executed.

But then again, it'd be better than introducing yet another non-standard language.
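To get a feel for how much work that conversion would be, here's a minimal sketch using PHP's DOM extension. Only DOMDocument, DOMXPath and the preg_* functions are real; evaluate_template(), the <match> wrapper element and the {expression} syntax are all hypothetical.

```php
<?php

// Hypothetical helper: expose regex captures as <capture-N> elements so
// that XPath expressions such as "capture-1 + 1" can be evaluated.
function evaluate_template(string $template, array $captures): string
{
	$doc  = new DOMDocument('1.0', 'UTF-8');
	$root = $doc->appendChild($doc->createElement('match'));

	foreach ($captures as $index => $value) {
		$node = $doc->createElement('capture-' . $index);
		$node->appendChild($doc->createTextNode($value));
		$root->appendChild($node);
	}

	$xpath = new DOMXPath($doc);

	// Replace every {expression} with its string value, evaluated
	// relative to the <match> element.
	return preg_replace_callback(
		'/\{([^}]+)\}/',
		function (array $m) use ($xpath, $root): string {
			return (string) $xpath->evaluate('string(' . $m[1] . ')', $root);
		},
		$template
	);
}

preg_match('/^h([1-6])\.\s*(.+)$/', 'h1. Heading one', $captures);

echo evaluate_template('h{capture-1 + 1}', $captures), "\n"; // h2
echo evaluate_template('{capture-2}', $captures), "\n";      // Heading one
```

Note that a small document gets built for every single match, which is exactly the overhead I'm worried about. Nothing here stops an offset from producing something like h7 either, so a real version would need to clamp the result.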

Bite me

So, that's what I'd like to create, when I have the time. My ideas are not perfectly brilliant, nor really perfect or brilliant, so please, share your thoughts.

Share your thoughts...

Nils Hörrmann wrote:

Rowan, are you still working on this? Is this something that could be turned into a Symphony master text formatter offering a new, customizable parser for Markdown or Textile?