Syntactic Trees for Markdown - Part 1 (identifying the problem)

As a linguistics student with a special interest in syntactic theory, I have to draw a lot of syntactic trees, and often I have to draw them while taking notes on my laptop. I have to say it’s not easy. If I am taking notes in Markdown, a lightweight markup language that is quick to type, I have no drawing functionality available. When I keep notes in the Swiss-army knife that is LaTeX, I have access to tikz-qtree, a powerful package for drawing syntax trees. Powerful, and arcane. Its syntax is hard enough to read and write even without time pressure, let alone when you are trying to follow a lecture.

To cut a long rant short, I decided that I need a solution in the spirit of Markdown: a kind of markup that is quick to write and that looks natural and human-readable even before it is converted into a document format such as HTML+CSS, OpenDocument Text, or PDF. That’s the goal. Next, the tools at my disposal:

  • Pandoc Markdown
  • Python Markdown with standard extensions
  • Python’s Natural Language Toolkit (NLTK)
  • LaTeX with tikz-qtree
  • phpSyntaxTree

So the solution I see is an extension to either Pandoc’s or Python’s Markdown flavour (both implementations support extensions) that takes some easy-to-write, easy-to-understand syntax and unambiguously converts it to the syntax used by NLTK, tikz-qtree, or phpSyntaxTree to draw trees. At “compile” time (e.g. Markdown to HTML), the converted syntax is passed to the relevant program (e.g. NLTK), and the generated tree is saved as an image (preferably an SVG file) and embedded in the resulting document.

So my first task is to think of a Markdown-like syntax for syntactic tree generation. I have already ruled out anything that uses recursive brackets, like tikz-qtree and the slightly more readable phpSyntaxTree:

[XP [ZP][X' [X [tree]][YP]]]

I am thinking of something more like rewrite rules, perhaps fenced in a special kind of code-block:

~~~xbar
XP > ZP; X'
X' > X; YP
X > 'tree'
~~~

This would result in SVG code like:

<svg width="107" height="164" version="1.1" xmlns="http://www.w3.org/2000/svg">
<text style="fill: blue; font-size: 12px;" x="48" y="22">XP</text>
<text style="fill: red; font-size: 12px;" x="15" y="66">ZP</text>
<line style="stroke:rgb(0,0,0);stroke-width:1;" x1="20" y1="49" x2="53" y2="27" />
<text style="fill: blue; font-size: 12px;" x="64" y="66">X'</text>
<line style="stroke:rgb(0,0,0);stroke-width:1;" x1="68" y1="49" x2="53" y2="27" />
<text style="fill: blue; font-size: 12px;" x="49" y="110">X</text>
<line style="stroke:rgb(0,0,0);stroke-width:1;" x1="53" y1="93" x2="68" y2="71" />
<text style="fill: red; font-size: 12px;" x="80" y="110">YP</text>
<line style="stroke:rgb(0,0,0);stroke-width:1;" x1="86" y1="93" x2="68" y2="71" />
<text style="fill: red; font-size: 12px;" x="46" y="154">tree</text>
<line style="stroke:rgb(0,0,0);stroke-width:1;" x1="53" y1="137" x2="53" y2="115" />
</svg>

This SVG code was generated by phpSyntaxTree.
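
To check that the rewrite rules really can be converted unambiguously, here is a minimal Python sketch that turns the three rules above into the bracket notation shown earlier, which could then be handed to a tree drawer. The function names (`parse_rules`, `is_word`, `to_brackets`) are hypothetical, and the sketch assumes that the first rule names the root, that children are separated by `;`, and that quoted items are words. It also assumes each label is rewritten only once, so repeated labels and recursive rules are not handled.

```python
def parse_rules(text):
    """Parse lines such as "XP > ZP; X'" into (root, {parent: [children]})."""
    rules = {}
    root = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        parent, _, rhs = line.partition(">")
        parent = parent.strip()
        # Children are separated by semicolons; empty items are ignored.
        rules[parent] = [c.strip() for c in rhs.split(";") if c.strip()]
        if root is None:
            root = parent  # assumption: the first rule names the root
    return root, rules


def is_word(symbol):
    """A quoted item such as 'tree' is a word; X' (bar level) is not."""
    return len(symbol) >= 2 and symbol[0] == "'" and symbol[-1] == "'"


def to_brackets(symbol, rules):
    """Recursively build bracket notation for the subtree under `symbol`."""
    if is_word(symbol):
        return "[" + symbol[1:-1] + "]"
    children = rules.get(symbol, [])
    if not children:
        return "[" + symbol + "]"  # a label with no rule stays unexpanded
    inner = "".join(to_brackets(child, rules) for child in children)
    return "[" + symbol + " " + inner + "]"


source = "XP > ZP; X'\nX' > X; YP\nX > 'tree'"
root, rules = parse_rules(source)
print(to_brackets(root, rules))  # prints [XP [ZP][X' [X [tree]][YP]]]
```

Keying the rules by label keeps the sketch short, but it is also its main weakness: a real sentence usually contains the same label more than once, so the final syntax will need some way to tell two NPs apart.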

In any case, that’s literally the easiest tree to both draw and read; it’s just the X-bar schema. I should try it with real sentences and see how readable it stays. And then I have to worry about movement. There’s a lot of rewriting ahead.

To be continued.