aboutsummaryrefslogtreecommitdiff
path: root/src/cmd/compile/internal/syntax/scanner.go
AgeCommit message (Collapse)Author
2021-04-07cmd/compile/internal/syntax: add "~" operatorRobert Griesemer
Change-Id: I7991103d97b97260d9615b7f5baf7ec75ad87d1f Reviewed-on: https://go-review.googlesource.com/c/go/+/307370 Trust: Robert Griesemer <gri@golang.org> Reviewed-by: Robert Findley <rfindley@google.com>
2020-04-21cmd/compile: detect and diagnose invalid //go: directive placementRuss Cox
Thie CL changes cmd/compile/internal/syntax to give the gc half of the compiler more control over pragma handling, so that it can prepare better errors, diagnose misuse, and so on. Before, the API between the two was hard-coded as a uint16. Now it is an interface{}. This should set us up better for future directives. In addition to the split, this CL emits a "misplaced compiler directive" error for any directive that is in a place where it has no effect. I've certainly been confused in the past by adding comments that were doing nothing and not realizing it. This should help avoid that kind of confusion. The rule, now applied consistently, is that a //go: directive must appear on a line by itself immediately before the declaration specifier it means to apply to. See cmd/compile/doc.go for precise text and test/directive.go for examples. This may cause some code to stop compiling, but that code was broken. For example, this code formerly applied the //go:noinline to f (not c) but now will fail to compile: //go:noinline const c = 1 func f() {} Change-Id: Ieba9b8d90a27cfab25de79d2790a895cefe5296f Reviewed-on: https://go-review.googlesource.com/c/go/+/228578 Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
2020-03-11cmd/compile/internal/syntax: various cleanups following CL 221603Robert Griesemer
1) Introduced setLit method to uniformly set the scanner state for literals instead of directly manipulating the scanner fields. 2) Use a local variable 'ok' to track validity of literals instead of relying on the side-effect of error reporters setting s.bad. More code but clearer because it is local and explicit. 3) s/litname/baseName/ and use this function uniformly, also for escapes. Consequently we now report always "hexadecimal" and not "hex" (in the case of invalid escapes). 4) Added TestDirectives verifying that we get the correct directive string (even if that string contains '%'). Verified that lines/s parsing performance is unchanged by comparing go test -run StdLib -fast -skip "syntax/(scanner|scanner_test)\.go" before and after (no relevant difference). Change-Id: I143e4648fdaa31d1c365fb794a1cae4bc1c3f5ba Reviewed-on: https://go-review.googlesource.com/c/go/+/222258 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-03-05cmd/compile/internal/scanner: report correct directive string (fix build)Robert Griesemer
Change-Id: I01b244e97e4140545a46b3d494489a30126c2139 Reviewed-on: https://go-review.googlesource.com/c/go/+/222257 Run-TryBot: Robert Griesemer <gri@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-03-05cmd/compile/internal/syntax: faster and simpler source readerRobert Griesemer
This is one of several changes that were part of a larger rewrite which I made in early 2019 after switching to the new number literal syntax implementation. The purpose of the rewrite was to simplify reading of source code (Unicode character by character) and speed up the scanner but was never submitted for review due to other priorities. Part 3 of 3: This change contains a complete rewrite of source.go, the file that implements reading individual Unicode characters from the source. The new implementation is easier to use and has simpler literal buffer management, resulting in faster scanner and thus parser performance. Thew new source.go (internal) API is centered around nextch() which advances the scanner by one character. The scanner has been adjusted around nextch() and now consistently does one character look-ahead (there's no need for complicated ungetr-ing anymore). Only in one case backtrack is needed (when finding '..' rather than '...') and that case is now more cleanly solved with the new reset() function. Measuring line/s parsing peformance by running go test -run StdLib -fast -skip "syntax/(scanner|source)\.go" (best of 5 runs on "quiet" MacBook Pro, 3.3GHz Dual-Core i7, 16GB RAM, OS X 10.15.3) before and after shows consistently 3-5% improvement of line parsing speed: old: parsed 1788155 lines (3969 files) in 1.255520307s (1424234 lines/s) new: parsed 1788155 lines (3969 files) in 1.213197037s (1473919 lines/s) (scanner.go and parser.go are skipped because this CL changed those files.) Change-Id: Ida947f4b538d42eb2d2349062c69edb6c9e5ca66 Reviewed-on: https://go-review.googlesource.com/c/go/+/221603 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-03-05cmd/compile/internal/syntax: better scanner error messagesRobert Griesemer
This is one of several changes that were part of a larger rewrite which I made in early 2019 after switching to the new number literal syntax implementation. The purpose of the rewrite was to simplify reading of source code (Unicode character by character) and speed up the scanner but was never submitted for review due to other priorities. Part 2 of 3: This change contains improvements to the scanner error messages: - Use "rune literal" rather than "character literal" to match the spec nomenclature. - Shorter, more to the point error messages. (For instance, "more than one character in rune literal" rather than "invalid character literal (more than one character)", etc.) Change-Id: I1aaf79003374a68dbb05926437ed305cf2a8ec96 Reviewed-on: https://go-review.googlesource.com/c/go/+/221602 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2019-09-10cmd/compile/internal/scanner: report at most one lexical error per number ↵Robert Griesemer
literal Leave reporting of multiple errors for strings alone for now; we probably want to see all incorrect escape sequences in runes/strings independent of other errors. Fixes #33961. Change-Id: Id722e95f802687963eec647d1d1841bd6ed17d35 Reviewed-on: https://go-review.googlesource.com/c/go/+/192499 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2019-08-29cmd/compile/internal/syntax: add BasicLit.Bad field for lexical errorsRobert Griesemer
The new (internal) field scanner.bad indicates whether a syntax error occurred while scanning a literal; the corresponding scanner.lit string may be syntactically incorrect in that case. Store the value of scanner.bad together with the scanner.lit in BasicLit. Clean up error handling so that all syntactic errors use one of the scanner's error reporting methods which also set scanner.bad. Make use of the new field in a few places where we used to track a prior error separately. Preliminary step towards fixing #32133 in a comprehensive manner. Change-Id: I4d79ad6e3b50632dd5fb3fc32ca3df0598ee77b4 Reviewed-on: https://go-review.googlesource.com/c/go/+/192278 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2019-02-19cmd/compile: accept 'i' suffix orthogonally on all numbersRobert Griesemer
This change accepts the 'i' suffix on binary and octal integer literals as well as hexadecimal floats. The suffix was already accepted on decimal integers and floats. Note that 0123i == 123i for backward-compatibility (and 09i is valid). See also the respective language in the spec change: https://golang.org/cl/161098 Change-Id: I9d2d755cba36a3fa7b9e24308c73754d4568daaf Reviewed-on: https://go-review.googlesource.com/c/162878 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-02-11cmd/compile: accept new Go2 number literalsRobert Griesemer
This CL introduces compiler support for the new binary and octal integer literals, hexadecimal floats, and digit separators for all number literals. The new Go 2 number literal scanner accepts the following liberal format: number = [ prefix ] digits [ "." digits ] [ exponent ] [ "i" ] . prefix = "0" [ "b" |"B" | "o" | "O" | "x" | "X" ] . digits = { digit | "_" } . exponent = ( "e" | "E" | "p" | "P" ) [ "+" | "-" ] digits . If the number starts with "0x" or "0X", digit is any hexadecimal digit; otherwise, digit is any decimal digit. If the accepted number is not valid, errors are reported accordingly. See the new test cases in scanner_test.go for a selection of valid and invalid numbers and the respective error messages. R=Go1.13 Updates #12711. Updates #19308. Updates #28493. Updates #29008. Change-Id: Ic8febc7bd4dc5186b16a8c8897691e81125cf0ca Reviewed-on: https://go-review.googlesource.com/c/157677 Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>
2019-02-11cmd/compile/internal/syntax: allow more than one rune "unread"Robert Griesemer
Make it possible to "unread" more than one byte before the most recently read rune. Use a better name than ungetr2 and make it slightly more efficient. R=Go1.13 Change-Id: I45d5dfa11e508259a972ca6560d1f78d7a51fe15 Reviewed-on: https://go-review.googlesource.com/c/158957 Reviewed-by: Russ Cox <rsc@golang.org>
2018-02-24cmd/compile/internal/syntax: use stringer for operators and tokensDaniel Martí
With its new -linecomment flag, it is now possible to use stringer on values whose strings aren't valid identifiers. This is the case with tokens and operators in Go. Operator alredy had inline comments with each operator's string representation; only minor modifications were needed. The inline comments were added to each of the token names, using the same strategy. Comments that were previously inline or part of the string arrays were moved to the line immediately before the name they correspond to. Finally, declare tokStrFast as a function that uses the generated arrays directly. Avoiding the branch and strconv call means that we avoid a performance regression in the scanner, perhaps due to the lack of mid-stack inlining. Performance is not affected. Measured with 'go test -run StdLib -fast' on an X1 Carbon Gen2 (i5-4300U @ 1.90GHz, 8GB RAM, SSD), the best of 5 runs before and after the changes are: parsed 1709399 lines (3763 files) in 1.707402159s (1001169 lines/s) allocated 449.282Mb (263.137Mb/s) parsed 1709329 lines (3765 files) in 1.706663154s (1001562 lines/s) allocated 449.290Mb (263.256Mb/s) Change-Id: Idcc4f83393fcadd6579700e3602c09496ea2625b Reviewed-on: https://go-review.googlesource.com/95357 Reviewed-by: Robert Griesemer <gri@golang.org>
2018-02-15cmd/compile/internal/syntax: don't assume (operator) ~ means operator ^Robert Griesemer
The scanner assumed that ~ really meant ^, which may be helpful when coming from C. But ~ is not a valid Go token, and pretending that it should be ^ can lead to confusing error messages. Better to be upfront about it and complain about the invalid character in the first place. This was code "inherited" from the original yacc parser which was derived from a C compiler. It's 10 years later and we can probably assume that people are less confused about C and Go. Fixes #23587. Change-Id: I8d8f9b55b0dff009b75c1530d729bf9092c5aea6 Reviewed-on: https://go-review.googlesource.com/94160 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-02-12cmd/compile/internal/syntax: implement comment reporting in scannerRobert Griesemer
R=go1.11 In order to collect comments in the AST and for error testing purposes, the scanner needs to not only recognize and skip comments, but also be able to report them if so desired. This change adds a mode flag to the scanner's init function which controls the scanner behavior around comments. In the common case where comments are not needed, there must be no significant overhead. Thus, comments are reported via a handler upcall rather than being returned as a _Comment token (which the parser would have to filter out with every scanner.next() call). Because the handlers for error messages, directives, and comments all look the same (they take a position and text), and because directives look like comments, and errors never start with a '/', this change simplifies the scanner's init call to only take one (error) handler instead of 2 or 3 different handlers with identical signature. It is trivial in the handler to determine if we have an error, directive, or general comment. Finally, because directives are comments, when reporting directives the full comment text is returned now rather than just the directive text. This simplifies the implementation and makes the scanner API more regular. Furthermore, it provides important information about the comment style used by a directive, which may matter eventually when we fully implement /*line file:line:col*/ directives. Change-Id: I2adbfcebecd615e4237ed3a832b6ceb9518bf09c Reviewed-on: https://go-review.googlesource.com/88215 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-02-12cmd/compile/internal/syntax: permit /*line file:line:col*/ directivesRobert Griesemer
R=go1.11 This implements parsing of /*line file:line*/ and /*line file:line:col*/ directives and also extends the optional column format to regular //line directives, per #22662. For a line directive to be recognized, its comment text must start with the prefix "line " which is followed by one of the following: :line :line:col filename:line filename:line:col with at least one : present. The line and col values must be unsigned decimal integers; everything before is considered part of the filename. Valid line directives are: //line :123 //line :123:8 //line foo.go:123 //line C:foo.go:123 (filename is "C:foo.go") //line C:foo.go:123:8 (filename is "C:foo.go") /*line ::123*/ (filename is ":") No matter the comment format, at the moment all directives act as if they were in //line comments, and column information is ignored. To be addressed in subsequent CLs. For #22662. Change-Id: I1a2dc54bacc94bc6cdedc5229ee13278971f314e Reviewed-on: https://go-review.googlesource.com/86037 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-09cmd/compile/internal/syntax: start line offset (column) numbers at 1Robert Griesemer
We could leave it alone and fix line offset (column) numbers when reporting errors, but that is likely to cause confusion (internal numbers don't match reported numbers). Instead, switch to default numbering starting at 1. For package syntax-internal use only, introduced constants defining the line and column bases, and use them throughout the code and its tests. It is possible to change these constants and package syntax will continue to work. But changing them is going to break any client that makes explicit assumptions about line and column numbers (which is "all of them"). Change-Id: Ia3d136a8ec8d9372ed9c05ca47d3dff222cf030e Reviewed-on: https://go-review.googlesource.com/37996 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-02-16cmd/compile/internal/syntax: better errors and recovery for invalid ↵Robert Griesemer
character literals Fixes #15611. Change-Id: I352b145026466cafef8cf87addafbd30716bda24 Reviewed-on: https://go-review.googlesource.com/37138 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-02-15cmd/compile/internal/syntax: compiler directives must start at beginning of lineRobert Griesemer
- ignore them, if they don't. - added tests Fixes #18393. Change-Id: I13f87b81ac6b9138ab5031bb3dd6bebc4c548156 Reviewed-on: https://go-review.googlesource.com/37020 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-02-09cmd/compile/internal/syntax: differentiate between ';' and '\n' in syntax errorsRobert Griesemer
Towards better syntax error messages: With this change, the parser knows whether a semicolon was an actual ';' in the source, or whether it was an automatically inserted semicolon as result of a '\n' or EOF. Using this information in error messages makes them more understandable. For #17328. Change-Id: I8cd9accee8681b62569d0ecef922d38682b401eb Reviewed-on: https://go-review.googlesource.com/36636 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-12-09[dev.inline] cmd/compile/internal/syntax: remove gcCompat uses in scannerRobert Griesemer
- make the scanner unconditionally gc compatible - consistently use "invalid" instead "illegal" in errors Reviewed in and cherry-picked from https://go-review.googlesource.com/#/c/33896/. Change-Id: I4c4253e7392f3311b0d838bbe503576c9469b203 Reviewed-on: https://go-review.googlesource.com/34237 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-12-09[dev.inline] cmd/compile/internal/syntax: use syntax.Pos for all external ↵Robert Griesemer
positions - use syntax.Pos in syntax.Error (rather than line, col) - use syntax.Pos in syntax.PragmaHandler (rather than just line) - update uses - better documentation in various places Also: - make Pos methods use Pos receiver (rather than *Pos) Reviewed in and cherry-picked from https://go-review.googlesource.com/#/c/33891/. With minor adjustments to noder.go to make merge compile. Change-Id: I5507cea6c2be46a7677087c1aeb69382d31033eb Reviewed-on: https://go-review.googlesource.com/34236 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-12-09[dev.inline] cmd/compile/internal/syntax: clean up error and pragma handlingRobert Griesemer
Reviewed in and cherry-picked from https://go-review.googlesource.com/#/c/33873/. - simplify error handling in source.go (move handling of first error into parser, where it belongs) - clean up error handling in scanner.go - move pragma and position base handling from scanner to parser where it belongs - have separate error methods in parser to avoid confusion with handlers from scanner.go and source.go - (source.go) and (scanner.go, source.go, tokens.go) may be stand-alone packages if so desired, which means these files are now less entangled and easier to maintain Change-Id: I81510fc7ef943b78eaa49092c0eab2075a05878c Reviewed-on: https://go-review.googlesource.com/34235 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-12-09[dev.inline] cmd/compile/internal/syntax: simplified position codeRobert Griesemer
Reviewed in and cherry-picked from https://go-review.googlesource.com/#/c/33805/. Change-Id: I859d9bd5f2256ca78f7b24b330290f7ae600854d Reviewed-on: https://go-review.googlesource.com/34234 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-12-09[dev.inline] cmd/compile/internal/syntax: process //line pragmas in scannerRobert Griesemer
Reviewed in and cherry-picked from https://go-review.googlesource.com/#/c/33764/. Minor adjustment in noder.go to make merge compile again. Change-Id: Ib5029b52b59944f207b0f2438c8a5aa576eb25b8 Reviewed-on: https://go-review.googlesource.com/34233 Run-TryBot: Robert Griesemer <gri@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-12-09[dev.inline] cmd/compile/internal/syntax: introduce general position info ↵Robert Griesemer
for nodes Reviewed in and cherry-picked from https://go-review.googlesource.com/#/c/33758/. Minor adjustments in noder.go to fix merge. Change-Id: Ibe429e327c7f8554f8ac205c61ce3738013aed98 Reviewed-on: https://go-review.googlesource.com/34231 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-11-05Revert "cmd/compile/internal/syntax: support for alias declarations"Robert Griesemer
This reverts commit 32db3f2756324616b7c856ac9501deccc2491239. Reason: Decision to back out current alias implementation. For #16339. Change-Id: Ib05e3d96041d8347e49cae292f66bec791a1fdc8 Reviewed-on: https://go-review.googlesource.com/32825 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-09-16cmd/compile/internal/syntax: support for alias declarationsRobert Griesemer
Permits parsing of alias declarations with -newparser const/type/var/func T => p.T but the compiler will reject it with an error. For now this also accepts type T = p.T so we can experiment with a type-alias only scenario. - renamed _Arrow token to _Larrow (<-) - introduced _Rarrow token (=>) - introduced AliasDecl node - extended scanner to accept _Rarrow - extended parser and printer to handle alias declarations Change-Id: I0170d10a87df8255db9186d466b6fd405228c38e Reviewed-on: https://go-review.googlesource.com/29355 Run-TryBot: Robert Griesemer <gri@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-09-09cmd/compile/internal/syntax: remove strbyteseqlMatthew Dempsky
cmd/compile already optimizes "string(b) == s" to avoid allocating a temporary string. Change-Id: I4244fbeae8d350261494135c357f9a6e2ab7ace3 Reviewed-on: https://go-review.googlesource.com/28931 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-08-31cmd/compile: handle pragmas immediately with -newparser=1Matthew Dempsky
Instead of saving all pragmas and processing them after parsing is finished, process them immediately during scanning like the current lexer does. This is a bit unfortunate because it means we can't use syntax.ParseFile to concurrently parse files yet, but it fixes how we report syntax errors in the presence of //line pragmas. While here, add a bunch more gcCompat entries to syntax/parser.go to get "go build -toolexec='toolstash -cmp' std cmd" passing. There are still a few remaining cases only triggered building unit tests, but this seems like a nice checkpoint. Change-Id: Iaf3bbcf2849857a460496f31eea228e0c585ce13 Reviewed-on: https://go-review.googlesource.com/28226 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
2016-08-19cmd/compile/internal/syntax: match old parser errors and line numbersMatthew Dempsky
This makes a bunch of changes to package syntax to tweak line numbers for AST nodes. For example, short variable declaration statements are now associated with the location of the ":=" token, and function calls are associated with the location of the final ")" token. These help satisfy many unit tests that assume the old parser's behavior. Because many of these changes are questionable, they're guarded behind a new "gcCompat" const to make them easy to identify and revisit in the future. A handful of remaining tests are too difficult to make behave identically. These have been updated to execute with -newparser=0 and comments explaining why they need to be fixed. all.bash now passes with both the old and new parsers. Change-Id: Iab834b71ca8698d39269f261eb5c92a0d55a3bf4 Reviewed-on: https://go-review.googlesource.com/27199 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
2016-08-18cmd/compile/internal/syntax: fast Go syntax trees, initial commit.Robert Griesemer
Syntax tree nodes, scanner, parser, basic printers. Builds syntax trees for entire Go std lib at a rate of ~1.8M lines/s in warmed up state (MacMini, 2.3 GHz Intel Core i7, 8GB RAM): $ go test -run StdLib -fast parsed 1074617 lines (2832 files) in 579.66364ms (1853863 lines/s) allocated 282.212Mb (486.854Mb/s) PASS Change-Id: Ie26d9a7bf4e5ff07457aedfcc9b89f0eba72ae3f Reviewed-on: https://go-review.googlesource.com/27195 Run-TryBot: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Robert Griesemer <gri@golang.org>