path: root/src/cmd/compile/internal/syntax/source.go
Age  Commit message  Author
2020-03-05  cmd/compile/internal/syntax: faster and simpler source reader  Robert Griesemer
This is one of several changes that were part of a larger rewrite which I made in early 2019 after switching to the new number literal syntax implementation. The purpose of the rewrite was to simplify reading of source code (Unicode character by character) and speed up the scanner, but it was never submitted for review due to other priorities.

Part 3 of 3: This change contains a complete rewrite of source.go, the file that implements reading individual Unicode characters from the source. The new implementation is easier to use and has simpler literal buffer management, resulting in faster scanner and thus parser performance.

The new source.go (internal) API is centered around nextch(), which advances the scanner by one character. The scanner has been adjusted around nextch() and now consistently does one character look-ahead (there's no need for complicated ungetr-ing anymore). Backtracking is needed in only one case (when finding '..' rather than '...'), and that case is now solved more cleanly with the new reset() function.

Measuring lines/s parsing performance by running

    go test -run StdLib -fast -skip "syntax/(scanner|source)\.go"

(best of 5 runs on "quiet" MacBook Pro, 3.3GHz Dual-Core i7, 16GB RAM, OS X 10.15.3) before and after shows a consistent 3-5% improvement in line parsing speed:

    old: parsed 1788155 lines (3969 files) in 1.255520307s (1424234 lines/s)
    new: parsed 1788155 lines (3969 files) in 1.213197037s (1473919 lines/s)

(scanner.go and parser.go are skipped because this CL changed those files.)

Change-Id: Ida947f4b538d42eb2d2349062c69edb6c9e5ca66
Reviewed-on: https://go-review.googlesource.com/c/go/+/221603
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
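The one-character look-ahead and the single backtracking case described in the 2020-03-05 entry can be sketched roughly as follows. This is a minimal illustration, not the code in source.go: the `reader` type, its fields, and the `mark` and `dots` helpers are hypothetical names chosen for the example, and the whole input is assumed to be in memory.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// reader is a hypothetical, simplified stand-in for a source reader.
// It holds the whole input in memory, which the real reader need not do.
type reader struct {
	buf []byte
	pos int  // offset of the byte following ch
	ch  rune // current character (the look-ahead), -1 at end of input
	chw int  // width of ch in bytes, 0 at end of input
}

func newReader(s string) *reader {
	r := &reader{buf: []byte(s)}
	r.nextch() // establish the one-character look-ahead
	return r
}

// nextch advances by one character and leaves it in r.ch.
func (r *reader) nextch() {
	if r.pos >= len(r.buf) {
		r.ch, r.chw = -1, 0
		return
	}
	r.ch, r.chw = utf8.DecodeRune(r.buf[r.pos:])
	r.pos += r.chw
}

// mark returns the byte offset of the current character.
func (r *reader) mark() int { return r.pos - r.chw }

// reset makes the character at byte offset offs current again.
func (r *reader) reset(offs int) {
	r.pos = offs
	r.nextch()
}

// dots scans "." or "..." starting at the current character, which must be '.'.
// Seeing exactly ".." is the one case that needs backtracking.
func (r *reader) dots() string {
	r.nextch() // consume the first '.'
	if r.ch == '.' {
		m := r.mark() // remember where the second '.' starts
		r.nextch()
		if r.ch == '.' {
			r.nextch() // consume the third '.'
			return "..."
		}
		r.reset(m) // only "..": make the second '.' current again
	}
	return "."
}

func main() {
	for _, src := range []string{"...x", "..x", ".x"} {
		r := newReader(src)
		tok := r.dots()
		fmt.Printf("%q: token %q, next character %q\n", src, tok, r.ch)
	}
}
```

Keeping the current character in a field means callers never have to push anything back; only the ".." case re-reads from a remembered offset.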
2019-02-11  cmd/compile/internal/syntax: allow more than one rune "unread"  Robert Griesemer
Make it possible to "unread" more than one byte before the most recently read rune. Use a better name than ungetr2 and make it slightly more efficient.

R=Go1.13

Change-Id: I45d5dfa11e508259a972ca6560d1f78d7a51fe15
Reviewed-on: https://go-review.googlesource.com/c/158957
Reviewed-by: Russ Cox <rsc@golang.org>
2018-12-05  cmd/compile/internal/syntax: remove unused field in (scanner) source  Robert Griesemer
The source.offs field was intended for computing line offsets which may allow a tiny optimization (see TODO in source.go). We haven't done the optimization, so for now just remove the field to avoid confusion. It's trivially added if needed.

While at it, also:
- Fix comment for ungetr2.
- Make sure sentinel is present even if reading from the io.Reader failed.

Change-Id: Ib056c6478030b3fe5fec29045362c8161ff3d19e
Reviewed-on: https://go-review.googlesource.com/c/152763
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-02-12  cmd/compile/internal/syntax: implement comment reporting in scanner  Robert Griesemer
R=go1.11

In order to collect comments in the AST and for error testing purposes, the scanner needs to not only recognize and skip comments, but also be able to report them if so desired. This change adds a mode flag to the scanner's init function which controls the scanner behavior around comments.

In the common case where comments are not needed, there must be no significant overhead. Thus, comments are reported via a handler upcall rather than being returned as a _Comment token (which the parser would have to filter out with every scanner.next() call).

Because the handlers for error messages, directives, and comments all look the same (they take a position and text), and because directives look like comments, and errors never start with a '/', this change simplifies the scanner's init call to only take one (error) handler instead of 2 or 3 different handlers with identical signature. It is trivial in the handler to determine if we have an error, directive, or general comment.

Finally, because directives are comments, when reporting directives the full comment text is returned now rather than just the directive text. This simplifies the implementation and makes the scanner API more regular. Furthermore, it provides important information about the comment style used by a directive, which may matter eventually when we fully implement /*line file:line:col*/ directives.

Change-Id: I2adbfcebecd615e4237ed3a832b6ceb9518bf09c
Reviewed-on: https://go-review.googlesource.com/88215
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
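The single-handler idea in the 2018-02-12 entry relies on the text alone being enough to tell the three cases apart. A minimal sketch of such a classification is shown below. The `classify` function and the `kind` type are hypothetical, and the directive prefixes checked here ("line " and "go:") are an assumption made for illustration; the entry itself only states that errors never start with '/' and that directives look like comments.

```go
package main

import (
	"fmt"
	"strings"
)

// kind is a hypothetical label for what a handler call reports.
type kind int

const (
	isError kind = iota
	isDirective
	isComment
)

var kindNames = [...]string{"error", "directive", "comment"}

// classify decides what a handler was given, based only on the text.
// Errors never start with '/'; everything else is a comment, and a
// comment is treated as a directive if its body has an assumed prefix.
func classify(text string) kind {
	if !strings.HasPrefix(text, "/") {
		return isError
	}
	if len(text) >= 2 {
		body := text[2:] // skip the "//" or "/*" comment opener
		// Assumption for this sketch: these prefixes mark directives.
		if strings.HasPrefix(body, "line ") || strings.HasPrefix(body, "go:") {
			return isDirective
		}
	}
	return isComment
}

func main() {
	for _, text := range []string{
		"unexpected newline",
		"//line foo.go:10",
		"/*line foo.go:10:2*/",
		"// an ordinary comment",
	} {
		fmt.Printf("%-26q %s\n", text, kindNames[classify(text)])
	}
}
```

Because the full comment text is passed through, the handler can also see whether a directive used the `//` or the `/* */` style.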
2017-09-20  cmd/compile/internal/syntax: fix source buffer refilling  Matthew Dempsky
The previous code seems to have an off-by-1 in it somewhere, the consequence being that we didn't properly preserve all of the old buffer contents that we intended to.

After spending a while looking at the existing window-shifting logic, I wasn't able to understand exactly how it was supposed to work or where the issue was, so I rewrote it to be (at least IMO) more obviously correct.

Fixes #21938.

Change-Id: I1ed7bbc1e1751a52ab5f7cf0411ae289586dc345
Reviewed-on: https://go-review.googlesource.com/64830
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2017-03-09  cmd/compile/internal/syntax: start line offset (column) numbers at 1  Robert Griesemer
We could leave it alone and fix line offset (column) numbers when reporting errors, but that is likely to cause confusion (internal numbers don't match reported numbers). Instead, switch to default numbering starting at 1.

For package syntax-internal use only, introduce constants defining the line and column bases, and use them throughout the code and its tests. It is possible to change these constants and package syntax will continue to work. But changing them is going to break any client that makes explicit assumptions about line and column numbers (which is "all of them").

Change-Id: Ia3d136a8ec8d9372ed9c05ca47d3dff222cf030e
Reviewed-on: https://go-review.googlesource.com/37996
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
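The line and column base constants mentioned in the 2017-03-09 entry can be illustrated with a small position computation. The constant names `lineBase` and `colBase` and the `position` function are hypothetical here; the sketch only shows how routing every computation through the two constants keeps internal and reported numbers in agreement.

```go
package main

import "fmt"

// Hypothetical base constants: the first line and the first column
// of a file are numbered lineBase and colBase respectively.
const (
	lineBase = 1
	colBase  = 1
)

// position returns the line and column of byte offset offs in src.
// Columns count bytes, and every number is derived from the bases,
// so changing a base changes all results consistently.
func position(src string, offs int) (line, col int) {
	line, col = lineBase, colBase
	for i := 0; i < offs && i < len(src); i++ {
		if src[i] == '\n' {
			line++
			col = colBase
		} else {
			col++
		}
	}
	return line, col
}

func main() {
	src := "ab\ncd"
	for _, offs := range []int{0, 1, 3, 4} {
		line, col := position(src, offs)
		fmt.Printf("offset %d -> %d:%d\n", offs, line, col)
	}
}
```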
2016-12-09  [dev.inline] cmd/compile/internal/syntax: report byte offset rather than rune count for column value  Robert Griesemer
This will only become user-visible if error messages show column information. Per the discussion in #10324.

For #10324.

Change-Id: I5959c1655aba74bb1a22fdc261cd728ffcfa6912
Reviewed-on: https://go-review.googlesource.com/34244
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-12-09  [dev.inline] cmd/compile/internal/syntax: use syntax.Pos for all external positions  Robert Griesemer
- use syntax.Pos in syntax.Error (rather than line, col)
- use syntax.Pos in syntax.PragmaHandler (rather than just line)
- update uses
- better documentation in various places

Also:
- make Pos methods use Pos receiver (rather than *Pos)

Reviewed in and cherry-picked from https://go-review.googlesource.com/#/c/33891/. With minor adjustments to noder.go to make merge compile.

Change-Id: I5507cea6c2be46a7677087c1aeb69382d31033eb
Reviewed-on: https://go-review.googlesource.com/34236
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-12-09  [dev.inline] cmd/compile/internal/syntax: clean up error and pragma handling  Robert Griesemer
Reviewed in and cherry-picked from https://go-review.googlesource.com/#/c/33873/.

- simplify error handling in source.go (move handling of first error into parser, where it belongs)
- clean up error handling in scanner.go
- move pragma and position base handling from scanner to parser where it belongs
- have separate error methods in parser to avoid confusion with handlers from scanner.go and source.go
- (source.go) and (scanner.go, source.go, tokens.go) may be stand-alone packages if so desired, which means these files are now less entangled and easier to maintain

Change-Id: I81510fc7ef943b78eaa49092c0eab2075a05878c
Reviewed-on: https://go-review.googlesource.com/34235
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-12-09  [dev.inline] cmd/compile/internal/syntax: introduce general position info for nodes  Robert Griesemer
Reviewed in and cherry-picked from https://go-review.googlesource.com/#/c/33758/. Minor adjustments in noder.go to fix merge.

Change-Id: Ibe429e327c7f8554f8ac205c61ce3738013aed98
Reviewed-on: https://go-review.googlesource.com/34231
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-11-09  cmd/compile/internal/syntax: fix error handling for Read/Parse calls  Robert Griesemer
- define syntax.Error for cleaner error reporting
- abort parsing after first error if no error handler is installed
- make sure to always report the first error, if any
- document behavior of API calls
- while at it: rename ReadXXX -> ParseXXX (clearer)
- adjust cmd/compile noder.go accordingly

Fixes #17774.

Change-Id: I7893eedea454a64acd753e32f7a8bf811ddbb03c
Reviewed-on: https://go-review.googlesource.com/32950
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-08-18  cmd/compile/internal/syntax: fast Go syntax trees, initial commit.  Robert Griesemer
Syntax tree nodes, scanner, parser, basic printers.

Builds syntax trees for entire Go std lib at a rate of ~1.8M lines/s in warmed up state (MacMini, 2.3 GHz Intel Core i7, 8GB RAM):

    $ go test -run StdLib -fast
    parsed 1074617 lines (2832 files) in 579.66364ms (1853863 lines/s)
    allocated 282.212Mb (486.854Mb/s)
    PASS

Change-Id: Ie26d9a7bf4e5ff07457aedfcc9b89f0eba72ae3f
Reviewed-on: https://go-review.googlesource.com/27195
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Robert Griesemer <gri@golang.org>