Basic C++ lexer

Posted

I want to implement a basic lexer in C++, i.e. something that reads input (strings) from either stdin or a text file, takes it apart according to specific grammar rules, and identifies syntax errors.

lex (http://www.opengroup.org/onlinepubs/007908799/xcu/lex.html) would've been ideal for this task, but unfortunately I'm forbidden from using it.

What I'm basically asking for is either tips on how to go about implementing it, a general guide concerning this topic (better), or basic lexer source code (even better).

Posted

Hmm, from what I remember, before 'lexing' you need to tokenize your text:

parse the strings and identify the tokens in them (numbers, quoted strings, keywords, names, ...).

After that phase you should have a list of tokens, each with its data (if it's a number, its value; if it's a keyword, its text or an enumeration value for the type of keyword) and optionally its location in the original code (so you can tell where the error is).
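A token like that could be sketched roughly as below. The names `TokenKind`, `Token` and `makeToken` are made up for illustration, and the set of kinds is just the examples listed above plus a catch-all `Symbol`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical token kinds: the examples from the post plus a catch-all.
enum class TokenKind { Number, QuotedString, Keyword, Name, Symbol };

struct Token {
    TokenKind kind = TokenKind::Name;
    std::string text;        // the characters exactly as they appeared
    double value = 0.0;      // only meaningful when kind == Number
    std::size_t line = 1;    // 1-based location in the original text,
    std::size_t column = 1;  // kept so an error can point at the source
};

// Builds a token and fills in the numeric value when the kind calls for it.
Token makeToken(TokenKind kind, const std::string& text,
                std::size_t line, std::size_t column) {
    assert(!text.empty());   // a token always covers at least one character
    Token t;
    t.kind = kind;
    t.text = text;
    t.line = line;
    t.column = column;
    if (kind == TokenKind::Number) t.value = std::stod(text);
    return t;
}
```

The tokenizer would then hand over something like a `std::vector<Token>`, and an error message can be built from `line` and `column` of the offending token.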

After that, you can write a function for every syntactic structure that takes a position in the list and returns a syntax-tree node and the last position in the list (so you know what should be lexed next).

There are many shortcuts you can take, depending on how complex the language you're parsing is (e.g. it's easier if you can always tell the syntactic structure by reading the first token).
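To make the "one function per syntactic structure" idea concrete, here is a minimal sketch for a toy grammar of my own choosing (`product := factor ('*' factor)*`, `factor := Number | Name`). The `Token`, `Node` and `ParseResult` types are simplified assumptions, not anything from a real library. It returns the position of the first unconsumed token rather than the last consumed one, which is the same information shifted by one.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Simplified token for this sketch: just a kind and the matched text.
enum class Kind { Number, Name, Star };
struct Token { Kind kind; std::string text; };

// A syntax-tree node: a leaf (number or name) or a '*' with two children.
struct Node {
    std::string label;
    std::unique_ptr<Node> left, right;
};

// What every parse function returns: the node it built and the position
// of the first token it did not consume.
struct ParseResult {
    std::unique_ptr<Node> node;
    std::size_t next;
};

// factor := Number | Name
ParseResult parseFactor(const std::vector<Token>& toks, std::size_t pos) {
    if (pos >= toks.size() || toks[pos].kind == Kind::Star)
        throw std::runtime_error("expected a number or a name at token " +
                                 std::to_string(pos));
    auto leaf = std::make_unique<Node>();
    leaf->label = toks[pos].text;
    return {std::move(leaf), pos + 1};
}

// product := factor ('*' factor)*
ParseResult parseProduct(const std::vector<Token>& toks, std::size_t pos) {
    ParseResult lhs = parseFactor(toks, pos);
    while (lhs.next < toks.size() && toks[lhs.next].kind == Kind::Star) {
        ParseResult rhs = parseFactor(toks, lhs.next + 1);
        assert(rhs.next > lhs.next);  // every step must consume input
        auto mul = std::make_unique<Node>();
        mul->label = "*";
        mul->left = std::move(lhs.node);
        mul->right = std::move(rhs.node);
        lhs.node = std::move(mul);
        lhs.next = rhs.next;
    }
    return lhs;
}
```

Here the first token of a factor is enough to decide what to build, which is the kind of shortcut mentioned above.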

Posted
  • Author

OK, our terms are a bit mixed up, because I was referring to "lexing" as the tokenize stage you mentioned, namely taking the input string and chopping it up into small tokens. Let's get a bit more specific: the project I'm working on should be able to read functions (such as f(x) = 5*x) and manipulate them in all kinds of ways. The lexing module I'm trying to implement should get that "f(x) = 5*x" string and cut it into "f(x)", "=", "5*x". It then passes the result to the Parser module, which is a bit more intelligent and actually understands that we're trying to define a function here. That parser will not be implemented by me, so it's not really my concern right now.

Posted

Funny, I did that once (with Delphi, maybe I even still have the code).

From what I understand, even in your example you need to break the code up into even smaller elements: "f", "(", "x", ")", "=", "5", "*", "x".

You can do the parsing using a simple state machine, with states like "number", "name" and "quoted string", and process the input char by char.

Whenever you move from one state to another, you output the token along with its type.
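A minimal sketch of such a state machine in C++, under a few assumptions of my own: names are letters, digits and underscores; numbers are runs of digits and dots (so a malformed "1.2.3" is not rejected here); strings use double quotes with no escapes; any other non-space character becomes a one-character `Symbol` token. `tokenize`, `Token` and `TokenType` are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

enum class TokenType { Number, Name, QuotedString, Symbol };

struct Token {
    TokenType type;
    std::string text;
};

// Splits `input` into tokens with a hand-written state machine.
// Throws std::runtime_error on an unterminated quoted string.
std::vector<Token> tokenize(const std::string& input) {
    enum class State { Start, Number, Name, Quoted };
    State state = State::Start;
    std::string current;  // characters of the token being built
    std::vector<Token> out;

    // Called whenever the machine leaves a token state.
    auto emit = [&](TokenType type) {
        assert(state != State::Start);  // only emitted from inside a token
        out.push_back(Token{type, current});
        current.clear();
        state = State::Start;
    };

    for (std::size_t i = 0; i <= input.size(); ++i) {
        // One extra step past the end flushes the last token.
        const char c = i < input.size() ? input[i] : '\0';
        const unsigned char uc = static_cast<unsigned char>(c);

        if (state == State::Number) {
            if (std::isdigit(uc) || c == '.') { current += c; continue; }
            emit(TokenType::Number);  // then re-examine c from Start
        } else if (state == State::Name) {
            if (std::isalnum(uc) || c == '_') { current += c; continue; }
            emit(TokenType::Name);    // then re-examine c from Start
        } else if (state == State::Quoted) {
            if (i == input.size())
                throw std::runtime_error("unterminated quoted string");
            if (c == '"') emit(TokenType::QuotedString);
            else current += c;
            continue;
        }

        // State::Start: decide which state the current character opens.
        if (i == input.size() || std::isspace(uc)) continue;
        if (std::isdigit(uc)) { state = State::Number; current += c; }
        else if (std::isalpha(uc) || c == '_') { state = State::Name; current += c; }
        else if (c == '"') { state = State::Quoted; }
        else out.push_back(Token{TokenType::Symbol, std::string(1, c)});
    }
    return out;
}
```

Feeding it "f(x) = 5*x" gives the eight pieces listed above, each tagged as a name, a number or a symbol.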

Posted

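You need to build an automaton (a graph) that matches the tokens you're looking for.

Just define a structure that stores a list of neighbors and the character that takes you to each of those neighbors. Then, as you read the file, advance along the automaton (graph) until you get stuck, and return whatever token matches that point.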
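One way to sketch that in C++, assuming the tokens are fixed strings such as operators (general patterns like "any run of digits" would need edges that loop back, which this sketch leaves out). `State`, `Automaton`, `addLexeme` and `longestMatch` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One state of the automaton: its outgoing edges (character -> index of
// the neighbour) and, if the state is accepting, the token it accepts.
struct State {
    std::map<char, std::size_t> next;
    std::string accepts;  // empty means "not an accepting state"
};

class Automaton {
public:
    struct Match {
        std::size_t length;     // 0 means nothing matched
        std::string tokenName;
    };

    Automaton() : states_(1) {}  // state 0 is the start state

    // Adds a path through the graph for one fixed lexeme.
    void addLexeme(const std::string& lexeme, const std::string& tokenName) {
        std::size_t s = 0;
        for (char c : lexeme) {
            auto it = states_[s].next.find(c);
            if (it == states_[s].next.end()) {
                states_.emplace_back();
                states_[s].next[c] = states_.size() - 1;
                s = states_.size() - 1;
            } else {
                s = it->second;
            }
        }
        states_[s].accepts = tokenName;
    }

    // Walks the graph from `pos` until it gets stuck and reports the
    // longest accepted prefix seen on the way.
    Match longestMatch(const std::string& input, std::size_t pos) const {
        assert(pos <= input.size());
        Match best{0, ""};
        std::size_t s = 0;
        for (std::size_t i = pos; i < input.size(); ++i) {
            auto it = states_[s].next.find(input[i]);
            if (it == states_[s].next.end()) break;  // stuck: no such edge
            s = it->second;
            if (!states_[s].accepts.empty())
                best = Match{i - pos + 1, states_[s].accepts};
        }
        return best;
    }

private:
    std::vector<State> states_;
};
```

Remembering the last accepting state passed, instead of only the state where the walk got stuck, is what lets "==" win over "=" while a lone "=" still matches.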
