I'm just keeping this here, to keep me honest about what I'm reading.
-
A Play on Regular Expressions: A Functional Pearl. Sebastian Fischer, Frank Huch, Thomas Wilke.
- Solid introduction to parsing with semirings in Haskell
- Haskell is straightforward and accessible
- Readable, breezy style
- Semirings covered: Recognition, Counting, Viterbi-Best (Maxlong, Firstlong)
-
Algebraic Foundations of Semiring Parsing. Yudong Liu
- Discusses semiring parsing as unified framework
- Unites semiring parsing and finite state transducers
- Describes a mechanical C++ framework for the pieces/parts
- Looks a lot like Barge/Barre
- Good overview, fairly accessible.
-
An Algorithm for RELAX NG Validation James Clark.
- Describes the Interleaf operator in some detail.
-
Bit Coded Regular Expression Parsing. Lasse Nielsen.
- Presentation, so short on details
- Describes bit coding as a way to do comparisons on regexs with less than machine-width alternatives
-
Bitsets Match Regular Expressions, Compactly. Paul Khuong
- Describes the Bitap algorithm for bitmap encoding
- Describes graph minimization as assuming that all states are singular, and breaking them up as we prove the assumption wrong.
-
Constructing Human Grade Parsers. Joe Karter.
- Wish-list of parser capabilities outside the actual parsing engine
- Parse errors should be recoverable
- Parser should be able to produce partial output
- Requires tooling outside the formalities of regex and semiring parsing
-
Design and Analysis of String Representation. R. Sekar.
- Describes Owens, et. al.'s DFA generator as a realization of McNaughton-Yamada
- Describes alternative search algorithms
-
Efficient and Flexible Incremental Parsing. S. L. Graham
- Describes incremental parsing, in which a document tree and parse tree are maintained and updated together.
-
Error-Correcting Parser Combinators S. Doaitse Swierstra, Pablo R. Azero Alcocer
- Haskell-based parsing language
- Describes error-handling in parsing processes
- Describes recovering from errors to continue parsing
- Uses combinators, though
-
Total Parser Combinators Guillaume Allais
- Describes a combinator-based parser with guaranteed termination
- Uses Brzozowski as its basis
- Proves that the combinator recognizer is a semiring
- Written in Agda (eek!)
-
Kleene Meets Church. Fritz Henglein
- Has the "theory vs practice" description
- Ideal: parsing + catamorphic processing
- Actual: FAs + ad-hoc instrumentation
-
On the Complexity and Performance of Parsing with Derivatives Michael D. Adams, Celeste Hollenbeck, Matthew Might
- Provides both theoretical and mechanical optimizations
- Replaces the recursion nullability with data-flow analysis
- Provides basis for Darais's implementation
-
Recognising and Generating Terms using Derivatives of Parsing Expression Grammars. Tony Garnock-Jones, Mahdi Eslamimehr, Alessandro Warth.
- Describes the derivatives of PEGs
- Describes the δPEG algorithm without backtracking for negation
- Describes the derivation of prioritized choice
-
Regular Expression Derivatives Reexamined Scott Owens, John Reppy, Aaron Turon.
- The paper that started the current project.
- Section 3.3: Using derivatives for DFA construction.
- Describes equivalence sets in the face of symbol class operators
- Describes regular vectors for multithreaded matching.
- Describes integration of complement and intersection operations
-
Regular Expressions as Types. Fritz Henglein
- Introduced the "ad-hoc" slander about PCRE. 😊
- Describes regex as a set membership problem
- Describes types systems as "regular languages"
- Clarifies emitter location for stream-processing
- On the Kleene star operation
- May require different Kleene stars, or a visitor with some context awareness.
-
Semiring Parsing. Joshua Goodman.
- Mentioned by Liu
- Covers the initial history of semiring parsing
- Describes derivations forests (see Might & Adams) in a concise way (sec 2.5.1)
- Describes Viterbi n-best as a probablistic disambiguation strategy
-
Stream Processing Using Grammars and Regular Expressions Urlik Turk Rasmussen
-
Has a great explanation of the "Regular Expressions as Types" approach.
-
Leads into regex equivalence sets
-
Discusses disambiguation strategies with equivalence sets
-
Discusses bit coding of unions (no μ-regex or e-regex)
-
(Con) Uses bit-coding as a parse tree extraction tool
-
Discusses alternative needs
- Recognition
- Matching (capture groups)
- Parsing (parse trees)
-
Discusses two-pass analysis (Might) vs one-pass (Adams)
-
Discusses difficulty of regex composition (see Conway 2018)
-
Discusses CFGs as an alternative DSL for regex composition (PEG, EBNF)
-
Is actually 5 papers; the other 4 cover algorithm's evolution
-
Fantastic overview overall
-
-
Taxonomies and Toolkits of Regular Expression Algorithms Bruce William Watson
- Solid overview
- Fairly long
- Excellent coverage of DFA minimization
- Discusses prefix/suffix scanning via Aho-Corasik, et.al.
- Cover Brzozowski's algorithm as a performance-based alternative DFA constructor
-
Yacc is Dead Matthew Might, David Darais
- Started this whole mess
- Still a pretty good introduction.