
these are some quick notes on the peglib README, for my own reference

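  • peglib's grammar syntax uses both / and |
  • A / B is a prioritized choice: A is tried first, and B is only tried if A fails
  • | can only separate literal strings (e.g. 'Jan' | 'January'); peglib calls this a dictionary, and the order of the words doesn't matter because the longest match wins
  • ", ' and ` are used to enclose literals
  • the ?, *, + operators exist and are greedy: they consume as much as possible and never give anything back
  • . is a wildcard that matches any single character
  • character classes are denoted by [], e.g. [a-z0-9]
  • &, ! are syntactic predicates and never consume input: &A succeeds if pattern A matches at the current position, !A succeeds if it doesn't
  • () groups sub-expressions, e.g. (A / B) C matches A C or B C
  • < > marks a token boundary: the text matched between them is captured as one token (vs.token() in a semantic action), which is handy when a token is built from several sub-patterns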
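To check my understanding of the operators above, here is a tiny hand-rolled PEG matcher. It is not peglib and not how peglib is implemented; the names (lit, any_char, seq, choice, star, and_pred, not_pred, dict, full_match) are my own. It shows that a prioritized choice commits to the first alternative that succeeds, that * never gives characters back, and that predicates consume nothing. The dict function models | as "longest literal wins", which is my reading of the README.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// A parser takes the input and a position and returns the position after
// the match, or std::nullopt if it fails.
using Parser = std::function<std::optional<std::size_t>(std::string_view, std::size_t)>;

// 'abc' : a literal string
Parser lit(std::string s) {
  return [s](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
    if (in.substr(pos, s.size()) == s) return pos + s.size();
    return std::nullopt;
  };
}

// . : any single character
Parser any_char() {
  return [](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
    if (pos < in.size()) return pos + 1;
    return std::nullopt;
  };
}

// A B : sequence, both must match one after the other
Parser seq(Parser a, Parser b) {
  return [a, b](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
    if (auto p = a(in, pos)) return b(in, *p);
    return std::nullopt;
  };
}

// A / B : B is tried only if A fails; once A succeeds the choice is final
Parser choice(Parser a, Parser b) {
  return [a, b](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
    if (auto p = a(in, pos)) return p;
    return b(in, pos);
  };
}

// A* : greedy repetition that never backtracks
Parser star(Parser a) {
  return [a](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
    while (auto p = a(in, pos)) {
      if (*p == pos) break;  // stop if A matched without consuming anything
      pos = *p;
    }
    return pos;
  };
}

// &A : succeeds if A matches, consumes nothing
Parser and_pred(Parser a) {
  return [a](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
    if (a(in, pos)) return pos;
    return std::nullopt;
  };
}

// !A : succeeds if A does not match, consumes nothing
Parser not_pred(Parser a) {
  return [a](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
    if (a(in, pos)) return std::nullopt;
    return pos;
  };
}

// 'Jan' | 'January' : modeled as "try every literal, keep the longest match"
Parser dict(std::vector<std::string> words) {
  return [words](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
    std::optional<std::size_t> best;
    for (const auto& w : words)
      if (in.substr(pos, w.size()) == w && (!best || pos + w.size() > *best))
        best = pos + w.size();
    return best;
  };
}

// True if the parser consumes the whole input.
bool full_match(const Parser& p, std::string_view in) {
  auto end = p(in, 0);
  return end && *end == in.size();
}
```

For example, ('a' / 'ab') 'c' fails on "abc" because the choice commits to 'a', while ('ab' / 'a') 'c' matches it; likewise 'a'* 'a' can never match, since the star has already eaten every 'a'.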

using the library

there is a parser type (peg::parser) whose constructor takes the grammar as a string, usually written as a raw string literal R"(...)". Every rule has the form foo <- bar: a rule name, an arrow, and an expression.
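Just to make the foo <- bar layout concrete, here is a toy that splits a one-rule-per-line grammar into (name, expression) pairs. split_rules and kGrammar are hypothetical; peglib parses its own grammar, and real grammars can span several lines or contain comments, which this sketch ignores. It also shows why a raw string literal is convenient: quotes and backslashes inside the grammar need no escaping.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Split "Name <- expression" lines into (name, expression) pairs.
// Toy only: assumes one rule per line and skips lines without "<-".
std::vector<std::pair<std::string, std::string>> split_rules(const std::string& grammar) {
  auto trim = [](const std::string& s) {
    const char* ws = " \t";
    auto b = s.find_first_not_of(ws);
    if (b == std::string::npos) return std::string();
    auto e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
  };
  std::vector<std::pair<std::string, std::string>> rules;
  std::istringstream lines(grammar);
  std::string line;
  while (std::getline(lines, line)) {
    auto arrow = line.find("<-");
    if (arrow == std::string::npos) continue;  // blank or non-rule line
    rules.emplace_back(trim(line.substr(0, arrow)), trim(line.substr(arrow + 2)));
  }
  return rules;
}

// A small grammar written as a raw string literal.
const std::string kGrammar = R"(
  Sum    <- Number ('+' Number)*
  Number <- < [0-9]+ >
)";
```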

then, for every rule in the grammar, you can attach a lambda (a semantic action) that runs when the rule matches; vs.choice() tells you which alternative matched

// variable <- A / B
parser["variable"] = [](const peg::SemanticValues& vs) {
    switch (vs.choice()) {
        case 0: // the first alternative, A
            break;
        default: // the second alternative, B
            break;
    }
};
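The choice() switch mirrors ordered choice: alternative 0 is tried first, then alternative 1. Below is a hand-written recursive-descent sketch for the hypothetical rule Additive <- Number '+' Additive / Number. The Values struct is my own stand-in, not peglib's SemanticValues; it carries the choice index and the child values into an action with the same switch shape as above.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical stand-in: which alternative matched, plus the sub-rule values.
struct Values {
  std::size_t choice = 0;
  std::vector<int> vals;
};

// Action for:  Additive <- Number '+' Additive / Number
int additive_action(const Values& vs) {
  switch (vs.choice) {
    case 0:  // Number '+' Additive
      return vs.vals[0] + vs.vals[1];
    default:  // Number
      return vs.vals[0];
  }
}

// Number <- [0-9]+ ; returns its value and advances pos
std::optional<int> number(std::string_view in, std::size_t& pos) {
  std::size_t start = pos;
  int value = 0;
  while (pos < in.size() && std::isdigit(static_cast<unsigned char>(in[pos])))
    value = value * 10 + (in[pos++] - '0');
  if (pos == start) return std::nullopt;
  return value;
}

// Try alternative 0 first, fall back to alternative 1, then run the action.
std::optional<int> additive(std::string_view in, std::size_t& pos) {
  std::size_t start = pos;
  if (auto lhs = number(in, pos); lhs && pos < in.size() && in[pos] == '+') {
    ++pos;
    if (auto rhs = additive(in, pos)) return additive_action({0, {*lhs, *rhs}});
  }
  pos = start;  // backtrack to where the rule started
  if (auto n = number(in, pos)) return additive_action({1, {*n}});
  return std::nullopt;
}
```

additive("1+2+3", pos) yields 6: each level takes alternative 0 until the last number, which falls back to alternative 1.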

you can set up an error handler with

parser.set_logger([](size_t line, size_t col, const std::string& msg, const std::string& rule) { /* ... */ });

it will then be called when parsing fails. You can set a custom error message by adding { error_message "foo" } after a rule definition. Error messages support the %t and %c placeholders, which are replaced by the unexpected token and the unexpected character where the parser failed.
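To see what the logger ends up receiving, here is a sketch built from my own helpers (line_col, expand, report), not peglib code. It turns a byte offset into 1-based line/column numbers, expands %t and %c in a message template, and passes the result to a callback with the same parameter list as the set_logger lambda. Here the caller supplies the token; in peglib the parser fills these values in itself.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>
#include <utility>

// Same parameter list as the lambda given to parser.set_logger.
using Logger = std::function<void(std::size_t line, std::size_t col,
                                  const std::string& msg, const std::string& rule)>;

// Turn a byte offset into 1-based (line, column).
std::pair<std::size_t, std::size_t> line_col(std::string_view text, std::size_t offset) {
  std::size_t line = 1, col = 1;
  for (std::size_t i = 0; i < offset && i < text.size(); ++i) {
    if (text[i] == '\n') {
      ++line;
      col = 1;
    } else {
      ++col;
    }
  }
  return {line, col};
}

// Replace %t with the unexpected token and %c with the unexpected character.
std::string expand(std::string_view tmpl, std::string_view token, std::string_view ch) {
  std::string out;
  for (std::size_t i = 0; i < tmpl.size(); ++i) {
    if (tmpl[i] == '%' && i + 1 < tmpl.size() && (tmpl[i + 1] == 't' || tmpl[i + 1] == 'c')) {
      out += (tmpl[i + 1] == 't') ? token : ch;
      ++i;  // skip the placeholder letter
    } else {
      out += tmpl[i];
    }
  }
  return out;
}

// Report a failure at `offset` in `text` through `log`.
void report(const Logger& log, std::string_view text, std::size_t offset,
            std::string_view tmpl, std::string_view token, const std::string& rule) {
  auto [line, col] = line_col(text, offset);
  std::string_view ch = offset < text.size() ? text.substr(offset, 1) : std::string_view("EOF");
  log(line, col, expand(tmpl, token, ch), rule);
}
```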

to make an ast

use

parser.enable_ast();
shared_ptr<peg::Ast> ast;
parser.parse(text, ast);

There is also a peg::ast_to_s(ast) function that turns the tree into a printable string. By default the AST contains a node for every rule the parser matched while parsing the input. To make the tree more minimal, use ast = parser.optimize_ast(ast) (it returns the optimized tree), and flag rules that should not be optimized out by adding { no_ast_opt } after them in the grammar.
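A sketch of what optimize_ast does to the tree, under the assumption (my reading of peglib) that a node with exactly one child is replaced by that child unless its rule is flagged { no_ast_opt }. Node, optimize, and to_s are hypothetical, and the printed format only loosely imitates ast_to_s.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified AST node (not peg::Ast).
struct Node {
  std::string name;
  std::string token;  // text of a leaf node
  std::vector<std::shared_ptr<Node>> nodes;
};

// Replace single-child nodes by their child, except rules listed in `keep`
// (playing the role of { no_ast_opt }). Returns a new tree.
std::shared_ptr<Node> optimize(const std::shared_ptr<Node>& n, const std::set<std::string>& keep) {
  if (n->nodes.size() == 1 && !keep.count(n->name))
    return optimize(n->nodes[0], keep);
  auto copy = std::make_shared<Node>(Node{n->name, n->token, {}});
  for (const auto& child : n->nodes) copy->nodes.push_back(optimize(child, keep));
  return copy;
}

// One node per line, indented by depth: "+" for inner nodes, "-" for leaves.
std::string to_s(const std::shared_ptr<Node>& n, int depth = 0) {
  std::string s(depth * 2, ' ');
  if (n->nodes.empty()) {
    s += "- " + n->name + " (" + n->token + ")\n";
  } else {
    s += "+ " + n->name + "\n";
    for (const auto& c : n->nodes) s += to_s(c, depth + 1);
  }
  return s;
}
```

With a chain EXPR → TERM → NUMBER(1), optimizing with nothing kept leaves just the NUMBER leaf, while keeping TERM leaves TERM with the NUMBER child.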

using the AST

each ast node has the following members:

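// data members of peg::AstBase<Annotation> (peg::Ast is an alias for one of its instantiations)

// where the node came from in the input
const std::string path;
const size_t line = 1;
const size_t column = 1;

// rule name, offset and length in the input, and which alternative matched
const std::string name;
size_t position;
size_t length;
const size_t choice_count;
const size_t choice;
// values of the node that optimize_ast collapsed into this one (same as above otherwise)
const std::string original_name;
const size_t original_choice_count;
const size_t original_choice;
// numeric ids derived from name / original_name (usable in a switch)
const unsigned int tag;
const unsigned int original_tag;

// token nodes: is_token is set and token views the matched text
const bool is_token;
const std::string_view token;

// children (owned) and parent (non-owning back link)
std::vector<std::shared_ptr<AstBase<Annotation>>> nodes;
std::weak_ptr<AstBase<Annotation>> parent;

// token text as a std::string (only valid for token nodes)
std::string token_to_string() const {
    assert(is_token);
    return std::string(token);
}

// token text converted to a number of type T
template <typename T> T token_to_number() const {
    return token_to_number_<T>(token);
}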
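The nodes/parent pair follows a common ownership pattern: parents own their children through shared_ptr, and children point back through weak_ptr so the tree has no reference cycle and is freed normally. The sketch below uses a simplified, hypothetical Node with that layout, walks it to sum NUMBER tokens, and imitates token_to_number with std::from_chars; peglib's own token_to_number_ may be implemented differently.

```cpp
#include <cassert>
#include <charconv>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node with the same ownership layout as the AST members above.
struct Node {
  std::string name;
  bool is_token = false;
  std::string token;
  std::vector<std::shared_ptr<Node>> nodes;  // owning links to children
  std::weak_ptr<Node> parent;                // non-owning link to the parent

  // Rough analogue of token_to_number<T>(): parse the token text as a T.
  template <typename T> T token_to_number() const {
    T value{};
    std::from_chars(token.data(), token.data() + token.size(), value);
    return value;
  }
};

std::shared_ptr<Node> make_node(std::string name, std::string token = "") {
  auto n = std::make_shared<Node>();
  n->name = std::move(name);
  n->is_token = !token.empty();
  n->token = std::move(token);
  return n;
}

// Attach `child` under `parent`, setting the weak back link.
void add_child(const std::shared_ptr<Node>& parent, std::shared_ptr<Node> child) {
  child->parent = parent;
  parent->nodes.push_back(std::move(child));
}

// Depth-first walk that adds up every NUMBER token.
long sum_numbers(const Node& n) {
  if (n.is_token && n.name == "NUMBER") return n.token_to_number<long>();
  long total = 0;
  for (const auto& c : n.nodes) total += sum_numbers(*c);
  return total;
}
```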