Syntax redux

The other thread was getting crowded, so here's a new attempt, taking comments into account:

  • From another thread: I believe that only one statement per line is allowed.
    No, we've confirmed that this is allowed:
    var x = 1; var y = 2;
  • I think you can get away with this for NumericConstant (depending on whether a zero is required before the decimal point or not):
    NumericConstant = ([0-9]*'.')?[0-9]+
    No, this does not match "1."
  • I've given all operators equal precedence [...] This makes "1 + 2 + 3" illegal, but not "1 + (2 + 3)".
    From Rama: I think it would be better to evaluate expressions from right to left. Hope this clarifies.
  • Can identifiers be keywords? I hope not, because that requires extra lookahead:
    var sqrt = 1;
    var x = sqrt + 1;
    From Gaston: Keywords and function names not allowed as variables.
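The difference between the two NumericConstant patterns can be checked directly. This is a minimal Python sketch; the translation of the grammar notation into `re` syntax (and the helper name `matches`) is my own assumption, not part of the spec.

```python
import re

# Pattern proposed in the earlier thread: ([0-9]*'.')?[0-9]+
proposed = re.compile(r"(?:[0-9]*\.)?[0-9]+")

# Rule adopted in the revised syntax:
# [0-9]* '.' [0-9]+ | [0-9]+ ('.' [0-9]*)?
revised = re.compile(r"[0-9]*\.[0-9]+|[0-9]+(?:\.[0-9]*)?")


def matches(pattern: "re.Pattern[str]", text: str) -> bool:
    """True if the whole of `text` is a NumericConstant under `pattern`."""
    return pattern.fullmatch(text) is not None


for sample in ["1", ".5", "1.5", "1."]:
    # Only "1." differs: rejected by the proposed pattern, accepted by the revised one.
    print(sample, matches(proposed, sample), matches(revised, sample))
```

A lone "." is rejected by both patterns, since each alternative requires at least one digit.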

Other thoughts:

  • Because expressions can consist solely of variables (e.g. "A + B") I've made one root expression rule.
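With a single root rule, an expression arrives as a flat sequence `AtomicExpr (Op AtomicExpr)*`. Assuming, as in the exchange above, that all operators share one precedence level and Rama's right-to-left rule applies, evaluation is a fold from the right. The function name and the list-of-atoms representation below are hypothetical, and the atoms are assumed to be already evaluated to numbers.

```python
from typing import List


def eval_right_to_left(atoms: List[float], ops: List[str]) -> float:
    """Evaluate atoms[0] ops[0] atoms[1] ops[1] ... with every operator at
    equal precedence, grouping from the right: a op1 (b op2 (c ...))."""
    if len(atoms) != len(ops) + 1:
        raise ValueError("need exactly one more atom than operators")
    result = atoms[-1]
    # Walk the operators from the rightmost one back to the first.
    for i in range(len(ops) - 1, -1, -1):
        left, op = atoms[i], ops[i]
        if op == "+":
            result = left + result
        elif op == "-":
            result = left - result
        elif op == "*":
            result = left * result
        elif op == "/":
            result = left / result
        else:
            raise ValueError(f"unknown operator {op!r}")
    return result


print(eval_right_to_left([1, 2, 3], ["-", "-"]))  # 1 - (2 - 3) -> 2
print(eval_right_to_left([8, 4, 2], ["/", "/"]))  # 8 / (4 / 2) -> 4.0
print(eval_right_to_left([2, 3, 1], ["*", "+"]))  # 2 * (3 + 1) -> 8
```

Note that the last case gives 8, not the 7 that conventional precedence would produce.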

Unanswered Questions:

  • Can () be used to group string expressions? It would never have any effect, but are they allowed syntactically, e.g.:
    A + ("foo" + "bar") + string(X)

Here's the actual revised syntax:

Program         := SourceLine*
SourceLine      := Statement* (Newline | Eof)
Statement       := (Variable | Free | Assignment | Output) ';'
Newline         := '\r' '\n'? | '\n'
Eof             := (end of input)
Variable        := "var" Identifier '=' Expression
Free            := "free" '(' Identifier ')'
Assignment      := Identifier '=' Expression
Output          := "output" '(' Expression ')'
Identifier      := [a-zA-Z][a-zA-Z0-9]*
Expression      := AtomicExpr (Op AtomicExpr)*
Op              := [-+*/]
AtomicExpr      := (ParenExpr | FunctionExpr | NumericConstant | StringConstant | Identifier)
ParenExpr       := '(' Expression ')'
FunctionExpr    := PowFunction | SqrtFunction | StringFunction
PowFunction     := "pow" '(' Expression ',' Expression ')'
SqrtFunction    := "sqrt" '(' Expression ')'
StringFunction  := "string" '(' Expression ')'
NumericConstant := [0-9]* '.' [0-9]+ | [0-9]+ ('.' [0-9]*)?
StringConstant  := '"' [a-zA-Z0-9 \t]* '"'
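Because `Expression` and `ParenExpr` never mention operand types, the grammar as written already accepts parentheses around string operands. A minimal recursive-descent recognizer for the expression rules, with one method per rule, shows this on the example from the unanswered question. It assumes the input has already been split into tokens; the token-list input and the `ExprRecognizer`/`is_expression` names are hypothetical, and keyword filtering for `var`/`free`/`output` is left out.

```python
import re
from typing import List

# Lexical rules from the grammar, as Python regular expressions.
IDENT = re.compile(r"[a-zA-Z][a-zA-Z0-9]*")
NUMBER = re.compile(r"[0-9]*\.[0-9]+|[0-9]+(?:\.[0-9]*)?")
STRING = re.compile(r'"[a-zA-Z0-9 \t]*"')
# FunctionExpr names and how many Expression arguments each takes.
FUNCS = {"pow": 2, "sqrt": 1, "string": 1}
OPS = ("+", "-", "*", "/")


class ParseError(Exception):
    pass


class ExprRecognizer:
    """Accepts or rejects a token list against the Expression rules."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> str:
        # An empty string stands for "no more tokens".
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ""

    def expect(self, tok: str) -> None:
        if self.peek() != tok:
            raise ParseError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def expression(self) -> None:
        # Expression := AtomicExpr (Op AtomicExpr)*
        self.atomic_expr()
        while self.peek() in OPS:
            self.pos += 1
            self.atomic_expr()

    def atomic_expr(self) -> None:
        tok = self.peek()
        if tok == "(":
            # ParenExpr := '(' Expression ')', whatever the operand types
            self.expect("(")
            self.expression()
            self.expect(")")
        elif tok in FUNCS:
            # PowFunction | SqrtFunction | StringFunction
            self.pos += 1
            self.expect("(")
            self.expression()
            for _ in range(FUNCS[tok] - 1):
                self.expect(",")
                self.expression()
            self.expect(")")
        elif NUMBER.fullmatch(tok) or STRING.fullmatch(tok) or IDENT.fullmatch(tok):
            # NumericConstant | StringConstant | Identifier
            self.pos += 1
        else:
            raise ParseError(f"unexpected token {tok!r}")


def is_expression(tokens: List[str]) -> bool:
    """True if the whole token list is exactly one Expression."""
    r = ExprRecognizer(tokens)
    try:
        r.expression()
    except ParseError:
        return False
    return r.pos == len(tokens)


tokens = ['A', '+', '(', '"foo"', '+', '"bar"', ')', '+', 'string', '(', 'X', ')']
print(is_expression(tokens))                  # True
print(is_expression(['pow', '(', '2', ')']))  # False: pow needs two arguments
```

Allowing parentheses everywhere keeps `atomic_expr` free of any type checks, which is the "easier parser" argument made later in the thread.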

Looking pretty good.
The StringConstant needs to be limited '{0,256}'.
Does \t (tab) need to be added to the StringConstant characters?
Identifier should exclude the keywords 'var', 'free', 'string', 'output', 'pow' and 'sqrt' (not sure how to express that).
What happened to the Unary operators?
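Neither the keyword exclusion nor the length limit has to be written into the grammar; a tokenizer can enforce both. A minimal Python sketch, assuming the 256-character limit counts the characters between the quotes and that whitespace between tokens is spaces and tabs (the grammar leaves both open). The token kind names and `tokenize` are hypothetical.

```python
import re
from typing import Iterator, Tuple

KEYWORDS = {"var", "free", "output", "string", "pow", "sqrt"}
MAX_STRING_LEN = 256  # assumed: counts the characters between the quotes

# One alternative per lexical rule in the revised syntax.
TOKEN_RE = re.compile(r"""
    (?P<NUMBER>[0-9]*\.[0-9]+|[0-9]+(?:\.[0-9]*)?)
  | (?P<STRING>"[a-zA-Z0-9 \t]*")
  | (?P<NAME>[a-zA-Z][a-zA-Z0-9]*)
  | (?P<OP>[-+*/])
  | (?P<PUNCT>[()=,;])
  | (?P<NEWLINE>\r\n?|\n)
  | (?P<SKIP>[ \t]+)
""", re.VERBOSE)


def tokenize(source: str) -> Iterator[Tuple[str, str]]:
    """Yield (kind, text) pairs. Keywords come out as KEYWORD, so an
    IDENT token can never be a keyword."""
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, text = m.lastgroup, m.group()
        pos = m.end()
        if kind == "SKIP":
            continue
        if kind == "NAME":
            kind = "KEYWORD" if text in KEYWORDS else "IDENT"
        elif kind == "STRING" and len(text) - 2 > MAX_STRING_LEN:
            raise ValueError(f"string constant longer than {MAX_STRING_LEN} characters")
        yield kind, text


print(list(tokenize("var sqrt = 1;")))
```

Because `sqrt` comes back as a KEYWORD token rather than IDENT, a parser expecting `"var" Identifier` rejects `var sqrt = 1;` with one token of lookahead and no special casing.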

Thanks for trying to provide a more formal syntax def; hopefully Intel will bless it.

Unary ops were excluded by Rama. Yes, strings should include tab. I'll leave it up to the reader to exclude the keywords and limit the strings; that doesn't necessarily need to be part of the syntax spec.
Cheers!
john

Variables have to be initialized at declaration, so the ('=' Expression) part shouldn't be optional. (Rama retracted uninitialized variables shortly after introducing them.) Otherwise, this looks good! Do we know the maximum length for identifiers?

I made assignment mandatory in declaration (edited original).
I'd still like clarification on whether () grouping is allowed in string expressions. IMHO it should be allowed in the syntax, even if we never get any input examples like that, since it makes for an easier parser.
Any other comments or does this look OK?

While Rama claims that only letters, digits and whitespace appear in quotes, the sample also contains ':'. Maybe we should simplify this and assume that strings can contain anything except ["\r\n]. It's also tempting to add semicolon to that excluded set, because if ';' can't appear in strings, then searching for statement boundaries is very fast.
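A minimal sketch of the boundary search under the looser proposal, where a string may contain anything except '"', CR and LF, so a ';' can sit inside quotes. Since strings cannot span lines, a per-line scan that tracks whether it is inside quotes is enough. The function name is hypothetical, and the sketch assumes there are no escape sequences inside strings.

```python
from typing import List


def statement_ends(line: str) -> List[int]:
    """Return the offset of every ';' that terminates a statement.
    A ';' between quotes is part of a string, not a terminator."""
    ends = []
    in_string = False
    for i, ch in enumerate(line):
        if ch == '"':
            in_string = not in_string  # no escapes, so every '"' toggles
        elif ch == ";" and not in_string:
            ends.append(i)
    return ends


line = 'var x = 1; output("a;b"); var y = 2;'
print(statement_ends(line))  # [9, 24, 35]
```

If ';' were also excluded from strings, the quote tracking could be dropped entirely: every ';' would be a statement terminator, and a plain search for ';' would do.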

If I were to define the rules for very fast parsing, I would also use two different quotation marks: one for the left quote, say " (two dit marker), and one for the right quote, ' (one dit marker). US keyboards do not have the open and close quotation marks.

The reason is that file reads are likely going to have to be done into multiple buffers. If these are to be processed in parallel, then with matched quotation marks you have no way to tell whether a quote mark is left or right. Sure, finding a CR/LF could give you some information, but there is no limit on line length. Therefore, it is possible to have a buffer with no line termination character that contains text " more text " ... without being able to determine what is in or out of quotes until you communicate with the prior buffer's parser (and it with its prior buffer (and so on ...)). IOW, back to serial parsing (at least for quote marks).

Jim Dempsey
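Jim's point can be sketched in Python, assuming his hypothetical scheme in which `"` only opens and `'` only closes a string constant, and neither can appear inside one. With distinct marks, the first marker in a buffer tells you whether the buffer began inside a string without consulting earlier buffers; only a buffer containing no marker at all depends on its neighbor. The names `OPEN`, `CLOSE` and `starts_inside_string` are hypothetical.

```python
from typing import Optional

OPEN, CLOSE = '"', "'"  # hypothetical left and right quote marks


def starts_inside_string(chunk: str) -> Optional[bool]:
    """Decide, from this buffer alone, whether its first character lies
    inside a string constant. Returns None when the buffer holds no quote
    mark at all; only then is the previous buffer's state needed."""
    for ch in chunk:
        if ch == OPEN:
            return False  # an opener can only appear outside a string
        if ch == CLOSE:
            return True   # a closer can only appear inside a string
    return None


chunks = ['var s = "ab', "c def'; out", "put(s);"]
print([starts_inside_string(c) for c in chunks])  # [False, True, None]
```

The end state of a buffer is settled the same way by its last marker. With a single `"` used for both ends, no such local test exists: a buffer holding `text " more text "` is consistent with starting either inside or outside a string.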
