SemanticDB Guide
SemanticDB is a data model for semantic information such as symbols and types about programs in Scala and other languages. SemanticDB decouples production and consumption of semantic information, establishing documented means for communication between tools.
In this document, we introduce practical aspects of working with SemanticDB. We describe the tools that can be used to produce SemanticDB payloads, the tools that can be used to consume SemanticDB payloads, and useful tips & tricks for working with SemanticDB. If you're looking for a comprehensive reference of SemanticDB features, check out the specification.
Installation
This guide covers two non-standard command-line tools: metac and metap.
First, install the coursier command-line tool by following the instructions here. Next, use coursier to install metac and metap.
cs install metac metap
Example
Let's generate SemanticDB for a simple Scala program. (At the moment, our SemanticDB producers provide full Scala support and partial Java support. Theoretically, the SemanticDB protobuf schema can accommodate other languages as well, but we haven't attempted to do that yet).
object Test {
  def main(args: Array[String]): Unit = {
    println("hello world")
  }
}
In order to obtain a SemanticDB corresponding to this program, let's use the Metac command-line tool. For more information on other tools that can produce SemanticDB, see below.
metac Test.scala
metac is a thin wrapper over the Scala compiler. It supports the same command-line arguments as scalac, but instead of generating .class files it generates .semanticdb files.
$ tree
.
├── META-INF
│ └── semanticdb
│ └── Test.scala.semanticdb
└── Test.scala
If we take a look inside Test.scala.semanticdb, we'll see a weird mix of legible-looking text and special characters. That's because .semanticdb files store protobuf payloads.
$ xxd META-INF/semanticdb/Test.scala.semanticdb
00000000: 0aaa 0408 0412 0a54 6573 742e 7363 616c .......Test.scal
00000010: 612a 580a 1a5f 656d 7074 795f 2f54 6573 a*X.._empty_/Tes
00000020: 742e 6d61 696e 2829 2e28 6172 6773 2918 t.main().(args).
00000030: 082a 0461 7267 7380 0101 8a01 2e22 2c0a .*.args......",.
00000040: 2a12 2812 0c73 6361 6c61 2f41 7272 6179 *.(..scala/Array
00000050: 231a 1812 1612 1473 6361 6c61 2f50 7265 #......scala/Pre
00000060: 6465 662e 5374 7269 6e67 232a 530a 0d5f def.String#*S.._
00000070: 656d 7074 795f 2f54 6573 742e 180a 2008 empty_/Test... .
00000080: 2a04 5465 7374 8001 018a 012f 0a2d 0a00 *.Test...../.-..
00000090: 1211 120f 120d 7363 616c 612f 416e 7952 ......scala/AnyR
000000a0: 6566 2322 160a 145f 656d 7074 795f 2f54 ef#"..._empty_/T
000000b0: 6573 742e 6d61 696e 2829 2e92 0102 3a00 est.main()....:.
000000c0: 2a5c 0a14 5f65 6d70 7479 5f2f 5465 7374 *\.._empty_/Test
...
In order to make sense of .semanticdb files, we can use the Metap command-line tool. For more information on other tools that can consume SemanticDB, see below.
$ metap .
Test.scala
----------
Summary:
Schema => SemanticDB v4
Uri => Test.scala
Text => empty
Language => Scala
Symbols => 3 entries
Occurrences => 7 entries
Symbols:
_empty_/Test. => final object Test extends AnyRef { +1 decls }
_empty_/Test.main(). => method main(args: Array[String]): Unit
_empty_/Test.main().(args) => param args: Array[String]
Occurrences:
[0:7..0:11) <= _empty_/Test.
[1:6..1:10) <= _empty_/Test.main().
[1:11..1:15) <= _empty_/Test.main().(args)
[1:17..1:22) => scala/Array#
[1:23..1:29) => scala/Predef.String#
[1:33..1:37) => scala/Unit#
[2:4..2:11) => scala/Predef.println(+1).
Metap prettyprints various parts of the SemanticDB payload in correspondence with the SemanticDB specification. Here are the most important parts:
- Uri stores the URI of the source file relative to the directory where the SemanticDB producer was invoked.
- Symbols contains information about definitions in the source file, including modifiers, signatures, etc. For example, _empty_/Test.main(). => method main(args: Array[String]): Unit says that main is a method with one parameter of type Array[String].
- Occurrences contains a list of identifiers from the source file with their line/column-based positions and unique identifiers pointing to the corresponding definitions resolved by the compiler. For example, [2:4..2:11) => scala/Predef.println(+1). says that the identifier println on line 3 (positions use a zero-based numbering scheme) refers to the second overload of println from scala/Predef.
What is SemanticDB good for?
SemanticDB decouples producers and consumers of semantic information about programs and establishes a rigorous specification of the interchange format.
Thanks to that, SemanticDB-based tools like Scalafix, Metadoc and Metals don't need to know about compiler internals and can work with any compiler that supports SemanticDB. This demonstrably improves developer experience, portability and scalability. Next-generation semantic tools at Twitter are based on SemanticDB.
For more information about the SemanticDB vision, check out our talks:
- Semantic Tooling at Twitter (June 2017) by Eugene Burmako & Stu Hood.
- SemanticDB for Scala developer tools (April 2018) by Ólafur Páll Geirsson.
- How We Built Tools That Scale to Millions of Lines of Code (June 2018) by Eugene Burmako.
Producing SemanticDB
Scalac compiler plugin
The semanticdb-scalac compiler plugin injects itself immediately after the typer phase of the Scala compiler and then harvests and dumps semantic information from Scalac in SemanticDB format.
scalac -Xplugin:path/to.jar -Yrangepos [<pluginOption> ...] [<scalacOption> ...] [<sourceFile> ...]
The compiler plugin supports the following options that can be passed through Scalac in the form of -P:semanticdb:<option>:<value> (in the table below, the "Option" column refers only to the <option> part, and the "Value" column describes the <value> component):
| Option | Value | Explanation | Default |
|--------|-------|-------------|---------|
| failures | error, warning, info, ignore | The level at which the Scala compiler should report crashes that may happen during SemanticDB generation. | warning |
| profiling | on, off | Controls basic profiling functionality that computes the overhead of SemanticDB generation relative to regular compilation time (on for dumping profiling information to console, off for disabling profiling). | off |
| include | Java regex | Which source files to include in SemanticDB generation. | .* |
| exclude | Java regex | Which source files to exclude from SemanticDB generation. | ^$ |
| sourceroot | Absolute or relative path | Used to relativize source file paths into TextDocument.uri. If the value starts with targetroot:, the remaining part is interpreted as relative to targetroot (see below; must be specified first) instead of the current working directory. | Current working directory |
| targetroot | Absolute or relative path | The output directory in which to produce META-INF/semanticdb/**/*.semanticdb files. | The compiler output directory; matches the sbt setting key classDirectory and the scalac command-line option -d. |
| text | on, off | Specifies whether to save source code in TextDocument.text (on for yes, off for no). | off |
| md5 | on, off | Specifies whether to save a hexadecimal-formatted MD5 fingerprint of the source file contents in TextDocument.md5 (on for yes, off for no). | on |
| symbols | all, local-only, none | Specifies which symbol information to save in TextDocument.symbols (all for both local and global symbols, local-only for only local symbols, none for no symbols). | all |
| diagnostics | on, off | Specifies whether to save compiler messages in TextDocument.diagnostics (on for yes, off for no). | on |
| synthetics | on, off | Specifies whether to save some of the compiler-synthesized code in the currently unspecified TextDocument.synthetics section (on for yes, off for no). | off |
semanticdb-scalac can be hooked into Scala builds in a number of ways. Read below for more information on command-line tools as well as integration into Scala build tools.
Metac
Metac is a command-line tool that serves as a drop-in replacement for scalac and produces *.semanticdb files instead of *.class files. It supports the same command-line arguments as scalac, including the compiler plugin options described above.
# downloads CLI tool based on org.scalameta.metac, version 4.11.2
cs install metac # if necessary
metac [<pluginOption> ...] [<scalacOption> ...] [<sourceFile> ...]
With metac, it is not necessary to provide the flags -Xplugin:/path/to.jar and -Yrangepos, which makes it ideal for quick experiments with SemanticDB. For an example of using Metac, check out Example.
sbt
In order to enable semanticdb-scalac for your sbt project, add the following to your build. Note that the compiler plugin requires the -Yrangepos compiler option to be enabled.
addCompilerPlugin("org.scalameta" % "semanticdb-scalac" % "4.11.2" cross CrossVersion.full)
scalacOptions += "-Yrangepos"
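If you want to pass additional plugin options from the table above, for example to also store source text in the payloads, they can be appended to scalacOptions in the -P:semanticdb:<option>:<value> form. Here is a minimal sketch; the exclude pattern below is purely illustrative:
// build.sbt (sketch): extra semanticdb-scalac options on top of the setup above
scalacOptions ++= Seq(
  "-P:semanticdb:text:on",                     // also store source code in TextDocument.text
  "-P:semanticdb:exclude:.*Generated\\.scala"  // hypothetical pattern that skips generated sources
)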
Consuming SemanticDB
Scala bindings
The semanticdb library contains ScalaPB bindings to the SemanticDB protobuf schema. Using this library, one can model SemanticDB entities as Scala case classes and serialize/deserialize them into bytes and streams.
libraryDependencies += "org.scalameta" %% "semanticdb-shared" % "4.11.2"
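As a minimal sketch of how these bindings can be used, the snippet below reads the payload produced in the Example section above (the file path is taken from that example) and prints its documents, symbols and occurrences:
import java.nio.file.{Files, Paths}
import scala.meta.internal.semanticdb.TextDocuments

object ReadSemanticdb extends App {
  // Payload produced by metac in the Example section above
  val path = Paths.get("META-INF", "semanticdb", "Test.scala.semanticdb")

  // Deserialize the protobuf bytes into ScalaPB-generated case classes;
  // TextDocuments is the root message of the SemanticDB schema
  val docs = TextDocuments.parseFrom(Files.readAllBytes(path))

  for (doc <- docs.documents) {
    println(s"uri: ${doc.uri}")
    doc.symbols.foreach(info => println(s"  symbol: ${info.symbol}"))
    doc.occurrences.foreach(occ => println(s"  occurrence: ${occ.range} ${occ.symbol}"))
  }
}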
Caveats:
- At the moment, there are no compatibility guarantees for Scala bindings to the SemanticDB schema. The current package of the schema (scala.meta.internal.semanticdb) is considered internal, so we do not provide any guarantees about compatibility across different versions of the semanticdb library. We are planning to improve the situation in the future.
- At the moment, SemanticDB-based tools are responsible for implementing discovery of SemanticDB payloads on their own. For example, the non-trivial logic that Metap uses to traverse its inputs and detect SemanticDB files must be reproduced by Scalafix, Metadoc and others; a naive sketch of such discovery is shown below. We are planning to improve the situation in the future.
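For illustration, here is a naive sketch of the simplest case of such discovery: recursively walking a directory and collecting files with the .semanticdb extension. This is not the actual traversal logic used by Metap, which also has to deal with jars and other kinds of classpath entries.
import java.nio.file.{Files, Path, Paths}

object DiscoverSemanticdb extends App {
  // Naive discovery: walk a directory tree and print every *.semanticdb file in it
  def printPayloads(targetroot: Path): Unit = {
    val stream = Files.walk(targetroot)
    try stream.filter(p => p.toString.endsWith(".semanticdb")).forEach(p => println(p))
    finally stream.close()
  }

  printPayloads(Paths.get("."))
}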
Metap
Metap is a command-line tool that takes a list of paths and then prettyprints all .semanticdb files that it finds in these paths. Advanced options control prettyprinting format.
# downloads CLI tool based on org.scalameta.semanticdb-metap, version 4.11.2
cs install metap # if necessary
metap [options] <classpath>
| Option | Value | Explanation | Default |
|--------|-------|-------------|---------|
| <classpath> | Pseudo classpath | Supported classpath entries: paths that contain .semanticdb files, which are traversed and prettyprinted as described above. | |
| -compact, -detailed, -proto | | Specifies the prettyprinting format: -compact prints the most important parts of the payload in a condensed fashion, -detailed is more detailed than -compact but still pretty condensed, and -proto prints the same output as protoc would print (see below). | -compact |
For an example of using Metap, check out Example.
Protoc
The Protocol Compiler tool (protoc) can inspect protobuf payloads in --decode (takes a schema) and --decode_raw (doesn't need a schema) modes. For reference, here's the SemanticDB protobuf schema.
$ tree
.
├── META-INF
│ └── semanticdb
│ └── Test.scala.semanticdb
└── Test.scala
$ protoc --proto_path <directory with the .proto file> \
    --decode scala.meta.internal.semanticdb.TextDocuments \
    semanticdb.proto < META-INF/semanticdb/Test.scala.semanticdb
documents {
  schema: SEMANTICDB4
  uri: "Test.scala"
  symbols {
    symbol: "_empty_/Test.main().(args)"
    kind: PARAMETER
    name: "args"
    access {
      publicAccess {
      }
    }
    language: SCALA
    signature {
      valueSignature {
        tpe {
          typeRef {
            symbol: "scala/Array#"
            type_arguments {
              typeRef {
                symbol: "scala/Predef.String#"
              }
            }
          }
        }
      }
    }
  }
  symbols {
    symbol: "_empty_/Test."
    kind: OBJECT
    properties: 8
    ...
protoc was useful for getting things done in the early days of SemanticDB, but nowadays it's a bit too low-level. It is recommended to use metap instead of protoc.
SemanticDB-based tools
Scalafix
Scalafix is a rewrite and linting tool for Scala developed at the Scala Center with the goal of helping automate migration between different Scala compiler and library versions.
Scalafix provides syntactic and semantic APIs that tool developers can use to write custom rewrite and linting rules. Syntactic information is obtained from the Scalameta parser, and semantic information is loaded from SemanticDB files produced by the Scalac compiler plugin.
Thanks to SemanticDB, Scalafix is:
- Accessible: Scalafix enables novices to implement advanced rules without learning compiler internals.
- Portable: Scalafix is not tied to compiler internals, which means that it can seamlessly work with any compiler / compiler version that supports the SemanticDB compiler plugin.
- Scalable: Scalafix does not need a running Scala compiler, so it can perform rewrites and lints in parallel. (Unlike compiler plugin-based linters that are limited by the single-threaded architecture of Scalac).
Metadoc
Metadoc is an experiment with SemanticDB to build an online code browser with IDE-like features. Check out the demo for more information.
Metadoc takes Scala sources and the corresponding SemanticDB files generated by the Scalac compiler plugin. It then generates a static site that can be served via GitHub Pages, supporting jump to definition, find usages and search by symbol.
Thanks to SemanticDB, Metadoc is:
- Cross-platform: Scala bindings to SemanticDB are cross-compiled to JVM and Scala.js, which means that the site generator and the online code browser can reuse the same logic to work with SemanticDB payloads.
- Portable: Just like Scalafix, Metadoc is not tied to compiler internals, which means that it can seamlessly work with any compiler / compiler version that supports the SemanticDB compiler plugin.
Metals
Metals is an experiment to implement a language server for Scala using Scalameta projects such as Scalafmt, Scalafix and SemanticDB. Check out the presentation for more information.
Metals uses SemanticDB to:
- Index project dependencies for intelligent jump to definition and quick lookups of project symbols.
- Communicate with the Scala compiler regarding semantic information about the opened project.
- Feed semantic information into Scalafix-based refactorings.
Thanks to SemanticDB, Metals is:
- Mostly portable: Unlike Scalafix and Metadoc, Metals has modules that interface directly with compiler internals, but the majority of its functionality is based on SemanticDB, so it can work with any compiler / compiler version that supports the SemanticDB compiler plugin.
- Surprisingly fast: A well-defined schema for semantic information that can come from multiple locations (dependency classpath, uncompiled files, compiled files, etc.) allows for a robust implementation of indexing, which reliably speeds up operations like jump to definition and find usages. We have even experimented with a relational index for SemanticDB data, which further improves performance characteristics.
- Resilient: Reification of semantic information makes it possible to consult results of previous typechecks and accommodate certain edits by simply shifting offsets in old SemanticDB snapshots. This technique is surprisingly effective for supporting minor edits that result in temporarily invalid code.