Time to build a markdown parser and processor (MDL Log #1)

edA‑qa mort‑ora‑y - Mar 3 '19 - Dev Community

I need to write a markdown parser and processor. My writing projects have exceeded the abilities of the tools I currently have. There's also a dearth of quality writing tools -- something I discovered while working on my book. I've finally decided I have to fix the situation for myself, and hopefully, somebody else can use it as well.

I figured the best way to start this was with a post on dev.to. Perhaps I'm procrastinating slightly, but there's a reason for this post. I want to provide the opportunity to follow the development from the start. So far I have a repository, but it's empty.

I encourage you to ask questions and to question anything you see in the project.

Requirements

The correct place to start would be a user story. However, it's a bit awkward to talk about myself in a user story. My needs are also fairly clear, so the requirements are quite strict. Nonetheless, I will get back to a proper user story as I move along, since the motivation for some of the features is still missing.

Here are some of the key things I need:

  • All the things I do in my technical blog articles. This covers standard formatting, code, images, and LaTeX equations. (Replacing LaTeX with something better is a long-term feature.)
  • Multiple targets. I post on my own site, here on dev.to, on Medium, on my cooking site, and some writing sites. These all have different formatting and encoding requirements.
  • eBook and print ready. I was aghast at the tools available in the writing sector. It should have been possible to use basic markdown and publish a lovely looking book.

I'll expand these with user stories as time permits. They'll help you understand why I need the features, and how they should work. I'm also working on a Skillshare course about writing user stories; I'll get back to you on that.

Architecture

I've written more parsers and tree processors than I can count. I will not be evaluating any pre-existing solutions -- I figure I've done this for over 20 years now and am still unhappy with what's there. My most recent work on Leaf had an exemplary parser structure, and I think I will mimic that.

This document system will work much like a compiler, and has these phases:

  • Tree Parser: Parse the raw document into a tree of nodes. This takes care of low-level source details and partially processes syntax. By sticking with a generic tree, we keep a lot of language details out of the parser; thus it's simpler.
  • Parse Tree Converter: From here, the parse tree is scanned and converted into an abstract syntax tree (AST). This lifts the low-level constructs into high-level syntax.
  • AST Processing: This is where a lot of the tools will be built. The input tree is annotated with more information, things like user templates are resolved, and pieces are pulled together. While the export mode can influence this stage, it remains an abstract tree.
  • Lowering/Export: The AST is exported to the final document format. This includes processing features like syntax highlighting, LaTeX graphic creation, and uploading source code to a gist.
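To make the four phases concrete, here is a minimal sketch in Python. It is not the project's code: every name in it (`Node`, `Block`, `parse_tree`, `convert`, `process`, `export_html`, `build`) is an assumption made up for illustration, and the "markdown" it understands is limited to blank-line-separated blocks and `#` headings.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Node:
    """Generic parse-tree node (hypothetical): a kind tag, raw text, children."""
    kind: str
    text: str = ""
    children: list = field(default_factory=list)


@dataclass
class Block:
    """AST element (hypothetical) with room for annotations."""
    type: str                      # "heading" or "paragraph"
    text: str
    level: int = 0                 # heading depth, 0 for paragraphs
    notes: dict = field(default_factory=dict)


def parse_tree(source: str) -> Node:
    """Phase 1: split raw text into a generic tree, one child per block."""
    root = Node("document")
    for chunk in source.split("\n\n"):
        chunk = chunk.strip()
        if chunk:
            root.children.append(Node("block", chunk))
    return root


def convert(tree: Node) -> list:
    """Phase 2: lift generic blocks into high-level AST elements."""
    ast = []
    for node in tree.children:
        # Count leading '#' characters; a heading needs a space after them.
        marks = len(node.text) - len(node.text.lstrip("#"))
        if marks and node.text[marks:marks + 1] == " ":
            ast.append(Block("heading", node.text[marks:].strip(), level=marks))
        else:
            ast.append(Block("paragraph", node.text))
    return ast


def process(ast: list) -> list:
    """Phase 3: annotate the AST; here, only a word count per block."""
    for block in ast:
        block.notes["words"] = len(block.text.split())
    return ast


def export_html(ast: list) -> str:
    """Phase 4: lower the AST to one target format (HTML in this sketch)."""
    out = []
    for block in ast:
        tag = f"h{block.level}" if block.type == "heading" else "p"
        out.append(f"<{tag}>{escape(block.text)}</{tag}>")
    return "\n".join(out)


def build(source: str) -> str:
    """Run all four phases in order."""
    return export_html(process(convert(parse_tree(source))))
```

Calling `build("# Title\n\nHello <world>")` runs the text through all four phases. Supporting another target would mean writing a second export function next to `export_html`, leaving the earlier phases untouched.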

I'll provide more information on each phase as I work on it. This structure provides several distinct layers where extensions can be added.

While I'll be coding in Python, the tree parser will ultimately end up in C++. It's the most costly part of the processing -- scanning through characters one at a time is hard on interpreted or dynamic languages. My initial needs, however, are for individual docs, so, for now, the speed is of little concern.

First goal

My first goal is to get a rudimentary tree parser. This is a testable component. It's also the first in the chain and the entry point to all other components.
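As a rough idea of what "rudimentary" and "testable" could mean here, the sketch below scans a line one character at a time and builds a nested tree. It is only an illustration under simplifying assumptions: the names `Node`, `parse_inline`, and `dump` are hypothetical, and the only syntax handled is `*` for emphasis and backticks for inline code.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                      # "root", "text", "emphasis", or "code"
    text: str = ""
    children: list = field(default_factory=list)


def parse_inline(source: str) -> Node:
    """Scan one character at a time and build a nested tree.

    '*' opens or closes an emphasis node; '`' captures raw text up to the
    next '`'. An unclosed '*' simply leaves its node open until the end.
    """
    root = Node("root")
    stack = [root]                 # innermost open node is stack[-1]
    buffer = []                    # pending plain-text characters

    def flush():
        # Emit buffered characters as a text node under the open node.
        if buffer:
            stack[-1].children.append(Node("text", "".join(buffer)))
            buffer.clear()

    i = 0
    while i < len(source):
        ch = source[i]
        if ch == "`":
            end = source.find("`", i + 1)
            if end == -1:          # unclosed: keep the backtick as plain text
                buffer.append(ch)
            else:
                flush()
                stack[-1].children.append(Node("code", source[i + 1:end]))
                i = end            # skip past the code span
        elif ch == "*":
            flush()
            if stack[-1].kind == "emphasis":
                stack.pop()        # closing marker
            else:
                node = Node("emphasis")
                stack[-1].children.append(node)
                stack.append(node)  # opening marker
        else:
            buffer.append(ch)
        i += 1
    flush()
    return root


def dump(node: Node) -> str:
    """Compact one-line rendering of the tree, handy in assertions."""
    if node.kind in ("text", "code"):
        return f"{node.kind}({node.text!r})"
    return f"{node.kind}[{', '.join(dump(c) for c in node.children)}]"
```

Because the parser returns a plain tree, a test only needs to compare `dump(parse_inline(...))` against an expected string, with no later phase involved.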

I'll give an update when I've completed that. In the meantime, feel free to ask questions. You can look at the repository, but it'll be empty for a while, then chaotic. I believe refactoring is more important than design.
