Hacker Read

lispm | karma 12407 | avg karma 2.21 · 2019-09-25 16:18:02+00:00

The only restriction is that it is a s-expression: a possibly nested list of data. There is no particular programming language syntax it is parsed against - it's parsed as data.

For example this is a valid Lisp macro form

    (loop for i below 10 and j downfrom 20
          when (= (+ i j) 9) sum i into isum of-type integer
          finally (return (+ j isum)))

    It returns 10.

The LOOP macro parses it on its own - it just sees a list of data. Lisp has other than that no idea what the tokens mean and what syntax the LOOP macro accepts.

    CL-USER 142 > (defmacro my-macro (&rest stuff)
                     t)
    MY-MACRO

    CL-USER 143 > (my-macro we are going to Europe and have a good time)
    T

Works.

kazinator | karma 30751 | avg karma 1.78 · 2019-09-25 16:27:38+00:00

The data has an AST. The AST is made of conses, and atoms. The atoms have a large variety of types. They are not "tokens", but objects. Tokens only exist briefly inside the Lisp reader. 123 and |123| are tokens; one converts to an integer node, one to a symbol node. (1 (2 3)) has an AST which is

    CONS
   /      \
  FIXNUM   CONS
  |        /   \
  1      CONS   SYMBOL
        /     \     \ 
      FIXNUM   CONS  NIL
        |      /   \
        2   FIXNUM  SYMBOL
              |        |
              3        NIL

lispm | karma 12407 | avg karma 2.21 · 2019-09-25 18:41:51+00:00

> The data has an AST.

(not (eq 'has 'is))

But for an Abstract Syntax Tree for code we have more categories: function, operator, call, control structure, variable, class, ...

reply

thsealienbstrds | karma 152 | avg karma 1.63 · 2019-09-25 20:31:33+00:00

Fair enough I guess. If the Lisp macro system only sees parsed s-exps, in principle you still need to figure out for yourself what kind of construct you are dealing with on a higher level of abstraction. Nim's macro system seems to be operating on a higher level of abstraction in that regard.

kazinator | karma 30751 | avg karma 1.78 · 2019-09-25 23:22:42+00:00

That kind of tree is stipulated a particular form of compiler (well, parser) writing dogma revolving around C++/Java style OOP. You set up classes with inheritance representing various node types; everything has methods, and "visitors" walk the tree, and various nonsense like that.

If our code walker dispatches on pattern matches on the nested list structure of conses and atoms, we don't need that sort of encapsulated data structuring. The shapes of the patterns are de facto the higher level AST nodes. The code walking pattern case that recognizes (if test [then [else]]) is in fact working with an "if node". That AST node isn't defined in a rigid structure in the underlying data, but it's defined in the pattern that is applied to it which imposes a schema on the data.

If that's not an AST node, that's like saying that (1 95000) isn't an employee record; only #S(employee id 1 salary 95000) is an employee record because it has a proper type with a name, and named fields, whereas (1 95000) "could be anything".

reply

lispm | karma 12407 | avg karma 2.21 · 2019-09-26 05:32:30+00:00

Syntax trees are no nonsense and go way back before Java and visitors were hip.

The code walker is just another parser. It needs to know the Lisp syntax. It needs to know which parts of a LET form is a binding list, what a binding list looks like, it needs to know where declarations are and where the code body ist. It can then recognize variables, declarations, calls, literal data, etc, It needs to know the scope of the variables etc. Nothing of that is encoded in the LET form (since it is no AST), and needs to be determined by the code walker. Actually that's one of the most important uses: finding out what things are in the source code. Lisp does not offer us that information. That's why we need an additional tool. A code walker may or may not construct an AST.

No, (1 950000) is not an employee record. Only your interpretation makes it one. Other than that our default Lisp interpretation based on s-expressions: it's a list of two numbers. In terms of the machine it's cons cells, numbers, nil. Without further context, it has no further meaning.

reply

kazinator | karma 30751 | avg karma 1.78 · 2019-09-26 00:45:42

Note that deciphering type tag info in a Lisp value can be called "parsing". Cases occur: for some values we have to chase a pointer into a heap to get more type info.

A Lisp function call does "parsing". (foo 1 2 3) has to figure out dynamically whether a (lambda (&rest args)) is being called or (lambda (a b c)) or (lambda (a b &optional c (d 42)) or whatever.

The #S(employee id 1 salary 95000) object also isn't an employee record without context and interpretation.

> it needs to know where declarations are and where the code body ist

The syntax can be subject to a fairly trivial canonicalizing pass, after which all these things are at fixed positions:

  (let ((a 3) b) (foo a)) ---canon-->  (let ((a 3) (b)) (declare) (foo a))

Now the variables are all pairs to which we can blindly apply car and cadr, the declarations are at caddr and the body forms at cdddr.

lispm | karma 12407 | avg karma 2.21 · 2019-09-26 07:10:35+00:00

> Note that deciphering type tag info in a Lisp value can be called "parsing".

No. You are still operating on the level of s-expressions, a data format. The type-tag of LET is SYMBOL.

Here we have some Lisp code in the form of an s-expression:

   (let ((let 'let))
     ((lambda (let)
        (let ((let let))
          let))
      let))

All above LET have the same type tag, but in terms of syntax they have a different purpose in the form above: we have special operators, variable declarations, variable usage, data objects. I can't just car/cdr down the lists and call TYPE-OF. This always returns SYMBOL for LET.

On the level of a syntax tree we would want to know what it is in terms of syntactic categories: variable, operator, data object, function, macro, etc. Lisp source code has no representation for that and we need to determine that by parsing the code.

reply

kazinator | karma 30751 | avg karma 1.78 · 2019-09-26 16:03:31+00:00

The type tag is to be "parsed out" in some implementation specific way. For instance, we first look at the LSB of the Lisp value. If we find a 0 there, then it's a pointer; we dereference the pointer into some heap record, where we retrieve a second-level type tag. Except that, perhaps, if the whole thing is 0 bits, then it's nil; we must not chase that as a pointer. And we find a 1 in in the LSB, then perhaps we look at other surrounding bits for the type. All of this examination of bits with irregularities and cases looks like parsing.