Beaten to death by lightweight markup
Markdown, Textile and other lightweight markup languages are in use everywhere, but are they really the best we can do?
Over the past few years lightweight markup languages like Markdown and Textile have started to appear everywhere, often replacing some archaic terror from the blue, like BBCode. I have, for the most part, seen this in a positive light, but recently I've started to question the sense of using these languages:
- They are often ambiguous,
- and limited compared to HTML
In this article I'll be picking on Textile, not because I think it's worse than Markdown, but because it is in my opinion better. I'm also only going to give one example of each problem, not because I don't have more, but because more examples wouldn't add to or detract from the point I'm trying to make.
Ambiguous?
Perhaps the best example of how Textile can be ambiguous is when trying to add emphasis to a link. Here's the correct way to do it:
*"Example":http://example.com/* OR: "*Example*":http://example.com/
And here's what I often find myself fixing:
*"Example"*:http://example.com/
Who's going to say they were wrong? It seems perfectly reasonable that this should work, except that it doesn't.
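One way to see why the third form fails is a toy link matcher. The sketch below is in Python and is not how any real Textile processor works: the `LINK` pattern and the `find_link` helper are made up for illustration, and they assume only that a link is a quoted label followed immediately by a colon and a URL.

```python
import re

# Toy rule (an assumption, not Textile's real grammar): a link is a
# double-quoted label followed immediately by ":" and a URL.
LINK = re.compile(r'"([^"]+)":(\S+)')


def find_link(text):
    """Return (label, url) for the first link-like span, or None."""
    match = LINK.search(text)
    return match.groups() if match else None


for sample in (
    '*"Example":http://example.com/*',
    '"*Example*":http://example.com/',
    '*"Example"*:http://example.com/',
):
    print(sample, "->", find_link(sample))
```

Under this toy rule the asterisk sitting between the closing quote and the colon is enough to stop the label and the URL from being joined, so the third sample is never recognised as a link at all.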
Limited?
Indeed. A lot of lightweight markup languages make sense for basic things like paragraphs, images, links and emphasis. You can even create a table:
|a|table|row|
|a|table|row|
That's pretty simple, and easy, but what if you need a table header? Well, you can't.
Solution?
There hasn't really been an ideal solution to the problem of providing power and ease of use at the same time. You could keep the lightweight markup features and add support for HTML, but then you're faced with another problem: you need to pay greater attention to sanitising user input, or you'll end up with people killing your layout with a well-placed </div>. Avoiding that is part of the reason why lightweight markup languages exist in the first place.
In the process of building this site I created an HTML Formatter extension for Symphony CMS that does just that: it keeps the barest essentials that you get with a lightweight markup language, but allows you to use whatever HTML you need. You can give it basically anything for input, and it'll give you back valid XML as output.
Essentially, it's a two-step process:
- Run HTML Tidy over the user input,
- Apply those lightweight markup features
Of course, this is grossly simplified. Since we're being careful to generate valid XML output, the entire thing must be done with DOM manipulation in PHP.
Anyhow, I'd really like to hear other people's thoughts on this matter. Is this a good solution? Too hard, or not enough?
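To make the two steps concrete, here is a rough Python sketch of the same idea rather than the extension's actual PHP code. It assumes the standard library's `html.parser` as a stand-in for HTML Tidy, and paragraph wrapping on blank lines as the only "lightweight" feature. The names `TreeBuilder`, `add_paragraphs` and `format_input`, the `data` root element and the `BLOCK` and `VOID` tag lists are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

VOID = {"br", "hr", "img", "input", "meta", "link"}    # never pushed on the stack
BLOCK = {"p", "div", "ul", "ol", "table", "pre", "blockquote"}


class TreeBuilder(HTMLParser):
    """Step 1 stand-in: turn sloppy HTML into a well-formed tree."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("data")                 # hypothetical wrapper element
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open tag; ignore stray closers.
        for depth in range(len(self.stack) - 1, 0, -1):
            if self.stack[depth].tag == tag:
                del self.stack[depth:]
                return

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):                                # text that follows a child
            parent[-1].tail = (parent[-1].tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def add_paragraphs(root):
    """Step 2: wrap top-level text and inline elements in <p>, split on blank lines."""
    items = [root.text or ""]
    for child in list(root):
        items += [child, child.tail or ""]
        child.tail = None
        root.remove(child)
    root.text = None
    para = None
    for item in items:
        if isinstance(item, str):
            for i, chunk in enumerate(re.split(r"\n\s*\n", item)):
                if i > 0:
                    para = None                        # a blank line ends the paragraph
                if para is None:
                    if not chunk.strip():
                        continue
                    para = ET.SubElement(root, "p")
                    chunk = chunk.lstrip()
                if len(para):
                    para[-1].tail = (para[-1].tail or "") + chunk
                else:
                    para.text = (para.text or "") + chunk
        elif item.tag in BLOCK:
            para = None                                # block elements stand alone
            root.append(item)
        else:
            if para is None:
                para = ET.SubElement(root, "p")
            para.append(item)


def format_input(source):
    builder = TreeBuilder()
    builder.feed(source)
    builder.close()
    add_paragraphs(builder.root)
    return ET.tostring(builder.root, encoding="unicode")


print(format_input("First *para*.\n\nSecond with <b>bold</div> text</b>"))
```

The stray </div> in the sample input is simply dropped in step 1, because nothing on the stack matches it, so it never reaches the output. Step 2 only ever touches the tree, never the raw string, which is what keeps the result well-formed.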
Share your thoughts...
Max Wheeler wrote on :
Always hard to decide the best way to deal with user input. Beyond creating a "perfect" WYSIWYG editor, this seems like a good solution.
Perhaps the problem of ambiguity can be mitigated by allowing people to easily preview the output of their input. JavaScript live preview perhaps?
Rowan Lewis wrote on :
Max, the editor used on the Symphony site really does help, but only for the standard formats; if you want to do something special you really need to use HTML. If you add a live preview to that, it'd probably work pretty well, and it would also help encourage people to type valid HTML, because the preview wouldn't work if it was very broken.
I also meant to say that I don't like the idea of mixing Markdown or Textile with HTML, because to do it correctly you have to do a lot of crazy processing. Consider the following user input:
When you have something written like that, you can't just use a simple regular expression to add emphasis between the two asterisks. You don't want to stop right in the middle of the @href, because that'd cause invalid HTML to be created. So instead you have to treat the document as an XML tree from the very beginning, which makes tracking the asterisks a whole lot trickier.
It wouldn't be impossible, but it'd probably make the terrible Textile code look like a wonderful oasis.
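A small Python sketch of the difference, for illustration only. The `SOURCE` string is a made-up stand-in, not the input from the original comment, and the `EMPHASIS` pattern and the helpers `pieces`, `emphasise` and `is_well_formed` are hypothetical names. The naive version runs the pattern over the raw markup; the tree version only ever looks inside text nodes.

```python
import re
import xml.etree.ElementTree as ET

EMPHASIS = re.compile(r"\*([^*]+)\*")

# Made-up input: one real *pair*, plus one asterisk inside an href and
# another inside the link text.
SOURCE = '<p>A *real* pair, then <a href="http://example.com/*docs">the docs*</a>.</p>'


def is_well_formed(markup):
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def pieces(text):
    """Split a text node into plain strings and new <em> elements."""
    text = text or ""
    out, pos = [], 0
    for match in EMPHASIS.finditer(text):
        out.append(text[pos:match.start()])
        em = ET.Element("em")
        em.text = match.group(1)
        out.append(em)
        pos = match.end()
    out.append(text[pos:])
    return out


def emphasise(node):
    """Apply emphasis within each text node of the subtree, never across tags."""
    children = list(node)
    for child in children:
        emphasise(child)
    seq = pieces(node.text)
    for child in children:
        tail, child.tail = child.tail, None
        seq.append(child)
        seq.extend(pieces(tail))
        node.remove(child)
    node.text, last = None, None
    for item in seq:                       # stitch strings and elements back together
        if isinstance(item, str):
            if last is None:
                node.text = (node.text or "") + item
            else:
                last.tail = (last.tail or "") + item
        else:
            node.append(item)
            last = item


naive = EMPHASIS.sub(r"<em>\1</em>", SOURCE)
tree = ET.fromstring(SOURCE)
emphasise(tree)
careful = ET.tostring(tree, encoding="unicode")

print(is_well_formed(naive), naive)
print(is_well_formed(careful), careful)
```

The naive pass pairs the asterisk in the href with the one in the link text and drops an <em> tag into the middle of the attribute. The tree pass leaves those two alone because they live in different nodes, which also shows the cost: an emphasis span that legitimately crosses a tag boundary would need extra bookkeeping that this sketch does not attempt.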
ZalmaNN wrote on :
Sorry about offtopic, but Symphony CMS is very impressive! I'm going to grab my own copy :)
Dan Brendstrup wrote on :
From the Markdown documentation:
"For any markup that is not covered by Markdown’s syntax, you simply use HTML itself. There’s no need to preface it or delimit it to indicate that you’re switching from Markdown to HTML; you just use the tags."
And from _why's Textile reference:
"You can certainly use HTML tags inside your Textile documents. HTML will only be escaped if it’s found in a pre or code block."
Isn't that "an ideal solution to the problem of providing power and ease of use at the same time"? Both markup languages explicitly aim to support a subset of HTML, with full support (in any decent implementation) for breaking out into actual HTML when needed.
Rowan Lewis wrote on :
Dan, yeah, it's supposed to work, but I kept running into issues with the PHP implementations of Textile and Markdown.
Since Markdown seems to be the least buggy, we've started using that instead of Textile at work.
This solution is for people like myself, who do technical writing, and are actually quite fond of HTML.
You can take this article with a grain of salt; I wrote it after dealing with a particularly annoying issue in the Textile processor.