> With attributes and intervals, IPGs allow the specification of data dependence as well as the dependence between control and data.
> Moreover, parser termination checking becomes possible.
> To further utilize the idea of intervals, an interval-based, monadic parser combinator library is proposed.
This sounds like a well-behaved variant. Adding local attribute references simplifies the grammar and remains tractable to implement.
This might support classifying and implementing formats by severability and composability, i.e., whether you can parse one part at the same time as another, or at least find and prioritize precursor structures such as indexes.
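To make the interval idea concrete, here is a minimal sketch in Python. It is not the paper's library: `parse_u16`, `parse_record`, and the length-prefixed record layout are all made up for illustration. Each parser is confined to an interval `[lo, hi)`, and the payload's interval is computed from an attribute (the length) parsed earlier.

```python
import struct

def parse_u16(buf, lo, hi):
    """Parse a big-endian u16 confined to the interval [lo, hi)."""
    if hi - lo < 2:
        raise ValueError("interval too short for u16")
    return struct.unpack_from(">H", buf, lo)[0], lo + 2

def parse_record(buf, lo, hi):
    """Hypothetical record: u16 length, then `length` payload bytes, all inside [lo, hi)."""
    length, pos = parse_u16(buf, lo, hi)
    end = pos + length  # the payload interval depends on the parsed attribute
    if end > hi:
        raise ValueError("payload escapes the enclosing interval")
    return {"length": length, "payload": bytes(buf[pos:end])}, end

data = b"\x00\x03abc" + b"\x00\x02hi"
first, pos = parse_record(data, 0, len(data))
second, pos = parse_record(data, pos, len(data))
```

Because every sub-parser gets an explicit upper bound, a bad length field fails at the interval check instead of silently reading into the next structure.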
The yet-unaddressed streaming case is most interesting:
> We can first have an analysis that determines if it is possible to generate a stream parser from an IPG: within each production rule, it checks if the attribute dependency is only from left to right. After this analysis, a stream parser can be generated to parse in a bottom-up way
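A rough sketch of what that left-to-right check could look like, in Python. The rule representation (an ordered list of `(term, uses)` pairs) and the name `is_left_to_right` are my own assumptions, not the paper's; the real analysis presumably works on actual IPG rules and attribute expressions.

```python
def is_left_to_right(rule):
    """rule: ordered list of (term_name, uses), where `uses` is the set of
    term names whose attributes this term reads (hypothetical encoding)."""
    seen = set()
    for name, uses in rule:
        if not uses <= seen:  # reads something not yet parsed to its left
            return False
        seen.add(name)
    return True

# Length-prefixed body: the body only needs the length parsed before it.
header_first = [("len", set()), ("body", {"len"})]

# Trailer index (ZIP/PDF style): the body depends on something to its right.
trailer_index = [("body", {"index"}), ("index", set())]

streamable = is_left_to_right(header_first)
not_streamable = is_left_to_right(trailer_index)
```

Under this encoding the length-prefixed rule passes and the trailer-index rule fails, which matches the intuition that formats with an index at the end resist streaming.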
For parallel composition, you'd want to distinguish the attributes required by the consuming/combining (whole-assembly) operation from those only used in the part-parsing operation to plan the interfaces.
Aside from their mid-level parser combinators, you might want some binary-specific lowering operations (as they did with Int) tailored to your target architecture and binary encodings.
For the overall architecture, it seems wise that FlatBuffers et al. expressly avoid unbounded hierarchy. Perhaps three phases (prelude+split, process, merge+finish) would be more manageable than the fully general dependency stages that arbitrary attribute dependencies make possible.
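Roughly what I have in mind for the three phases, as a Python sketch. The toy archive layout (a one-byte count, then `(offset, length)` pairs, then the parts) and all function names are invented for illustration; threads are used only to show that the parts are independent, not as a performance claim.

```python
import struct
from concurrent.futures import ThreadPoolExecutor

def prelude_and_split(buf):
    """Phase 1: read the index up front and turn it into part intervals."""
    (count,) = struct.unpack_from(">B", buf, 0)
    spans = [struct.unpack_from(">HH", buf, 1 + 4 * i) for i in range(count)]
    return [(off, off + length) for off, length in spans]

def process(buf, span):
    """Phase 2: parse one part using only its own interval."""
    lo, hi = span
    return buf[lo:hi].decode("ascii").upper()

def merge_and_finish(parts):
    """Phase 3: combine per-part results into the whole."""
    return " ".join(parts)

def parse(buf):
    spans = prelude_and_split(buf)
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(lambda s: process(buf, s), spans))  # order preserved
    return merge_and_finish(parts)

# Index is 1 + 2*4 = 9 bytes, so the parts start at offset 9.
index = struct.pack(">BHHHH", 2, 9, 5, 14, 5)
archive = index + b"hello" + b"world"
result = parse(archive)
```

The only attributes crossing a phase boundary are the intervals (phase 1 to 2) and the per-part results (phase 2 to 3), which is the kind of narrow interface that fully general attribute dependencies would not guarantee.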
I would hate to see a parser technology discounted because it doesn't handle the crap of PDF or even MS XML. I'd be very interested in a language that could constrain or direct us toward more performant data formats, particularly for data archives such as genomics or semantic data, where an archive-resident index can avoid full-archive scans in most use cases.
For PDF, that's fair. The video "Types of PDF - Computerphile" covers this: https://www.youtube.com/watch?v=K7oxZCgO1dY
To be fair, the ability to stick a ZIP file at the end of any other kind of file enables all sorts of neat tricks (like the old self-extracting zips).
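A quick Python illustration of why this works: a ZIP reader locates the end-of-central-directory record (signature `PK\x05\x06`) by searching from the end of the file, so whatever comes before the archive is ignored. The shell-script prefix here is just a stand-in for a real self-extractor stub.

```python
import io
import struct
import zipfile

# Build a small ZIP in memory.
zbuf = io.BytesIO()
with zipfile.ZipFile(zbuf, "w") as zf:
    zf.writestr("hello.txt", "hi")

# Stick it on the end of some unrelated bytes (a stand-in stub).
combined = b"#!/bin/sh\necho stub\n" + zbuf.getvalue()

# Find the end-of-central-directory record by scanning from the end.
EOCD_SIG = b"PK\x05\x06"
pos = combined.rfind(EOCD_SIG)
(sig, disk, cd_disk, n_here, n_total,
 cd_size, cd_offset, comment_len) = struct.unpack_from("<4s4H2LH", combined, pos)

# The standard library reader copes with the prepended data as well.
with zipfile.ZipFile(io.BytesIO(combined)) as zf:
    names = zf.namelist()
    content = zf.read("hello.txt")
```

This is also why such files are awkward for left-to-right parsing: the structure that tells you where everything is lives at the very end.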
[0] https://github.com/sealmove/binarylang
[1] https://github.com/khaledh/elfdump/blob/master/elfparse.nim
I guess that's good for preventing off-by-one-based parsing errors, but surely there's prior art from long ago.
I once asked a related question on Computer Science Stack Exchange:
https://cs.stackexchange.com/q/60718
Would someone like to add this as an answer?