I'm just keeping this here, to keep me honest about what I'm reading.

  • A Play on Regular Expressions: A Functional Pearl. Sebastian Fischer, Frank Huch, Thomas Wilke.

    • Solid introduction to parsing with semirings in Haskell
    • Haskell is straightforward and accessible
    • Readable, breezy style
    • Semirings covered: Recognition, Counting, Viterbi-Best (Maxlong, Firstlong)
  • Algebraic Foundations of Semiring Parsing. Yudong Liu

    • Discusses semiring parsing as unified framework
    • Unites semiring parsing and finite state transducers
    • Describes a mechanical C++ framework for the pieces/parts
    • Looks a lot like Barge/Barre
    • Good overview, fairly accessible.
  • An Algorithm for RELAX NG Validation James Clark.

    • Describes the Interleaf operator in some detail.
  • Bit Coded Regular Expression Parsing. Lasse Nielsen.

    • Presentation, so short on details
    • Describes bit coding as a way to do comparisons on regexs with less than machine-width alternatives
  • Bitsets Match Regular Expressions, Compactly. Paul Khuong

    • Describes the Bitap algorithm for bitmap encoding
    • Describes graph minimization as assuming that all states are singular, and breaking them up as we prove the assumption wrong.
  • Constructing Human Grade Parsers. Joe Karter.

    • Wish-list of parser capabilities outside the actual parsing engine
    • Parse errors should be recoverable
    • Parser should be able to produce partial output
    • Requires tooling outside the formalities of regex and semiring parsing
  • Design and Analysis of String Representation. R. Sekar.

    • Describes Owens, et. al.'s DFA generator as a realization of McNaughton-Yamada
    • Describes alternative search algorithms
  • Efficient and Flexible Incremental Parsing. S. L. Graham

    • Describes incremental parsing, in which a document tree and parse tree are maintained and updated together.
  • Error-Correcting Parser Combinators S. Doaitse Swierstra, Pablo R. Azero Alcocer

    • Haskell-based parsing language
    • Describes error-handling in parsing processes
    • Describes recovering from errors to continue parsing
    • Uses combinators, though
  • Total Parser Combinators Guillaume Allais

    • Describes a combinator-based parser with guaranteed termination
    • Uses Brzozowski as its basis
    • Proves that the combinator recognizer is a semiring
    • Written in Agda (eek!)
  • Kleene Meets Church. Fritz Henglein

    • Has the "theory vs practice" description
    • Ideal: parsing + catamorphic processing
    • Actual: FAs + ad-hoc instrumentation
  • On the Complexity and Performance of Parsing with Derivatives Michael D. Adams, Celeste Hollenbeck, Matthew Might

    • Provides both theoretical and mechanical optimizations
    • Replaces the recursion nullability with data-flow analysis
    • Provides basis for Darais's implementation
  • Recognising and Generating Terms using Derivatives of Parsing Expression Grammars. Tony Garnock-Jones, Mahdi Eslamimehr, Alessandro Warth.

    • Describes the derivatives of PEGs
    • Describes the δPEG algorithm without backtracking for negation
    • Describes the derivation of prioritized choice
  • Regular Expression Derivatives Reexamined Scott Owens, John Reppy, Aaron Turon.

    • The paper that started the current project.
    • Section 3.3: Using derivatives for DFA construction.
    • Describes equivalence sets in the face of symbol class operators
    • Describes regular vectors for multithreaded matching.
    • Describes integration of complement and intersection operations
  • Regular Expressions as Types. Fritz Henglein

    • Introduced the "ad-hoc" slander about PCRE. 😊
    • Describes regex as a set membership problem
    • Describes types systems as "regular languages"
    • Clarifies emitter location for stream-processing
    • On the Kleene star operation
    • May require different Kleene stars, or a visitor with some context awareness.
  • Semiring Parsing. Joshua Goodman.

    • Mentioned by Liu
    • Covers the initial history of semiring parsing
    • Describes derivations forests (see Might & Adams) in a concise way (sec 2.5.1)
    • Describes Viterbi n-best as a probablistic disambiguation strategy
  • Stream Processing Using Grammars and Regular Expressions Urlik Turk Rasmussen

    • Has a great explanation of the "Regular Expressions as Types" approach.

    • Leads into regex equivalence sets

    • Discusses disambiguation strategies with equivalence sets

    • Discusses bit coding of unions (no μ-regex or e-regex)

    • (Con) Uses bit-coding as a parse tree extraction tool

    • Discusses alternative needs

      • Recognition
      • Matching (capture groups)
      • Parsing (parse trees)
    • Discusses two-pass analysis (Might) vs one-pass (Adams)

    • Discusses difficulty of regex composition (see Conway 2018)

    • Discusses CFGs as an alternative DSL for regex composition (PEG, EBNF)

    • Is actually 5 papers; the other 4 cover algorithm's evolution

    • Fantastic overview overall

  • Taxonomies and Toolkits of Regular Expression Algorithms Bruce William Watson

    • Solid overview
    • Fairly long
    • Excellent coverage of DFA minimization
    • Discusses prefix/suffix scanning via Aho-Corasik, et.al.
    • Cover Brzozowski's algorithm as a performance-based alternative DFA constructor
  • Yacc is Dead Matthew Might, David Darais

    • Started this whole mess
    • Still a pretty good introduction.