Getting Started

A parser reads text to discover structure and meaning. For example, a C language parser can read a C program and understand in a real sense everything that the program has to say.

Contrast this with a pattern matcher, such as a regular-expression matcher, which can find fragments of a program useful in editing but can't keep track of enough context to make sense of a whole program.
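A small illustration of that difference, as a sketch rather than anything taken from our tools: the function names below are hypothetical, and nested parentheses stand in for the structure of a real program. The regular expression finds a fragment, while the stack-based parser keeps enough context to recover the whole nesting.

```python
import re

def regex_fragments(text):
    """Pattern matching: find innermost parenthesised fragments.
    The pattern has no memory of how deeply a fragment is nested."""
    return re.findall(r"\([^()]*\)", text)

def parse_nested(text):
    """A tiny parser: use a stack to remember the enclosing context,
    and return the input as nested lists of characters."""
    stack = [[]]
    for ch in text:
        if ch == "(":
            stack.append([])           # enter a new nesting level
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unmatched ')'")
            inner = stack.pop()        # leave the level just finished
            stack[-1].append(inner)
        else:
            stack[-1].append(ch)
    if len(stack) != 1:
        raise ValueError("unmatched '('")
    return stack[0]

print(regex_fragments("f(a(b)c)"))   # ['(b)']
print(parse_nested("f(a(b)c)"))      # ['f', ['a', ['b'], 'c']]
```

The regular expression reports only the innermost fragment. The parser reports how every piece sits inside the others, and it can also say when the input is malformed.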

Exploratory parsing lets you divide string matches into those you expect and all the others. See this example developed step by step.

We often use the unix grep utility to look through large files. By applying a regular-expression match to each line, grep is able to report just the lines of interest.
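The core of what grep does can be sketched in a few lines of Python. This is a simplified stand-in, not grep itself: the function name is ours, and Python's regular-expression dialect differs in details from grep's.

```python
import re

def grep(pattern, lines):
    """Return only the lines in which the regular expression matches."""
    rx = re.compile(pattern)
    return [line for line in lines if rx.search(line)]

log = [
    "GET /index.html 200",
    "GET /missing.html 404",
    "POST /login 200",
]
print(grep(r" 404$", log))   # ['GET /missing.html 404']
```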

When we allow ourselves to grep repeatedly, driven by our curiosity, responding to each answer grep provides with another question, we are exploring. The internet is full of text that simple pattern matching cannot make sense of. In response, AboutUs built an environment for exploring the internet interactively, using parsers constructed on a whim and returning matches within the context described by the explorer.
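The loop of asking, looking, and asking again can be sketched as follows. This is only an illustration under loud assumptions: it uses plain regular expressions rather than a generated parser, and the function name, labels, and sample lines are all hypothetical. The point is the shape of the exploration: name what you expect, then look at what is left over.

```python
import re
from collections import Counter

def explore(lines, expected):
    """Classify each line under the first expected pattern that matches.
    `expected` is a list of (label, regex) pairs tried in order.
    Lines that match nothing are counted and collected as 'other'."""
    compiled = [(label, re.compile(rx)) for label, rx in expected]
    counts = Counter()
    others = []
    for line in lines:
        for label, rx in compiled:
            if rx.search(line):
                counts[label] += 1
                break
        else:
            # No expected pattern matched: this is the surprise to study next.
            counts["other"] += 1
            others.append(line)
    return counts, others

lines = ["call 555-1234", "mail bob@example.com", "hello world", "555-9999"]

# First question: how much of this is phone numbers and email addresses?
expected = [("phone", r"\d{3}-\d{4}"), ("email", r"\S+@\S+")]
counts, others = explore(lines, expected)
print(dict(counts), others)

# The leftovers suggest the next question, so we add a pattern and run again.
expected.append(("greeting", r"^hello"))
counts, others = explore(lines, expected)
print(dict(counts), others)
```

Each run shrinks the pile of unexplained text and leaves a smaller, more interesting remainder to ask about.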

Get the Exploratory Parser

We've been using a tool made out of two parts, both of them available to other programmers under AboutUs on GitHub. One is our fork of Ian Piumarta's peg/leg parser generator. The other is our parsing experiment management system.

The parser generator is written in C and could be rough going for programmers who haven't studied compilers at some point in their lives. We've modified peg/leg only where we found that Ian had not anticipated our unusual approach to parsing. Ian provides documentation on his website.

The experiment manager is a web application written in Ruby to run under Mac or Unix. We run it on our laptops and in Amazon's EC2 cloud. We've described how we install it in our GitHub ReadMe. Your Mileage May Vary.

Thinking Different

We think we've opened up a whole new way to use technology. This can happen when one takes some assumed requirement and reverses it.

Parser generators have traditionally been used to describe exactly what should be written, with anything else treated as a "syntax error". Wiki, by contrast, allows writers to write what they think makes sense.

With exploratory parsing we now have a way for the parser writer to discover what has been written after the fact. This inversion of control mirrors the original thinking behind wiki.

Let those who know write as they see fit. Trust people to be regular enough to create lasting value. Use the power of our modern computers and networks to organize that value.