SemanticDB Specification
SemanticDB is a data model for semantic information such as symbols and types about programs in Scala and other languages. SemanticDB decouples production and consumption of semantic information, establishing documented means for communication between tools.
Motivation
Nowadays, there is a clear trend towards standards for communication between developer tools. Language Server Protocol (LSP) [2], a protocol that connects programming language implementations and text editors, has gained strong industrial support and at the time of writing has implementations for many programming languages and editors. Build Server Protocol (BSP) [3] follows in LSP's tracks with an ambition to define a protocol for communication between language servers and build tools.
While a lot of work in the open-source community has been invested in unifying the user experience (by codifying commonly used operations like go to definition or find all references), comparatively little work has gone into unifying the implementor experience. For example, at the moment, there exist five different LSP implementations for Scala [4, 5, 6, 7, 8]. They all implement the same protocol that works with code, but they all use different data structures to represent that code.
Without a standard way to share information between tools, implementors have two unpleasant choices. First, they can use compiler internals, which are often underdocumented and lack compatibility guarantees. Otherwise, they have to reimplement compiler internals, which usually leads to duplication of effort and inconsistent user experience. For example, Scala IDE [9] uses Scala compiler internals, which have known stability problems in interactive mode. In contrast, IntelliJ [10] has its own Scala typechecker, which is more stable but is known for spurious red squiggles.
This demonstrates the necessity for portable metaprogramming APIs - something that we have been working on within Scalameta [11]. In previous years, we shipped portable syntactic APIs for Scala, including abstract syntax trees, parsing and prettyprinting [12]. SemanticDB is our take on portable semantic APIs.
Data Model
In this section, we describe the SemanticDB data model by going through the individual sections of the associated Protocol Buffers [13] schema. In the future, we may also support other kinds of schemas, including JSON [14] and SQL [15]. See Data Schemas for more information.
The data structures in the SemanticDB data model are language-agnostic, but the entities corresponding to these data structures may be language-dependent. For example, Range and Location mean the same thing in all languages, while Symbol format relies on scoping rules that generally vary across languages. See Languages for more information about language-dependent aspects of the specification.
TextDocument
message TextDocuments {
repeated TextDocument documents = 1;
}
message TextDocument {
reserved 4, 8, 9;
Schema schema = 1;
string uri = 2;
string text = 3;
string md5 = 11;
Language language = 10;
repeated SymbolInformation symbols = 5;
repeated SymbolOccurrence occurrences = 6;
repeated Diagnostic diagnostics = 7;
repeated Synthetic synthetics = 12;
}
`TextDocument` provides semantic information about a code snippet. It is the central data structure of the SemanticDB model, and its entities are also called "SemanticDB payloads".
SemanticDB payloads must include the version of the SemanticDB model in the `schema` field. The following versions of the model are supported:
| Version | Explanation | Data model |
| --- | --- | --- |
| `LEGACY` | Legacy SemanticDB payloads | semanticdb2.proto |
| `SEMANTICDB3` | SemanticDB v3 payloads | semanticdb3.proto |
| `SEMANTICDB4` | SemanticDB v4 payloads | semanticdb.proto (described in this document) |
`uri` defines the relative URI-encoded path to the text document. The path is relativized by the project sourceroot, which by convention is the root directory of the project's workspace.
`text` optionally defines the full string contents of the text document. When `text` is empty, the combination of the `uri` field and a sourceroot enables tools to retrieve the text document contents from disk.
`md5` defines the hexadecimal-formatted MD5 fingerprint of the source file contents. When `text` is empty, the MD5 fingerprint can be used to ensure a SemanticDB payload is up-to-date with the file contents on disk.
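The freshness check described above can be sketched as follows. This is a minimal illustration, not part of the specification: the helper name `is_up_to_date` is hypothetical, and the case-insensitive comparison is an assumption, since the text does not say whether the hexadecimal digits are upper or lower case.

```python
import hashlib
from pathlib import Path
from urllib.parse import unquote


def is_up_to_date(sourceroot: Path, uri: str, md5: str) -> bool:
    """Return True if the file behind `uri` still matches the recorded MD5."""
    # `uri` is a URI-encoded path relative to the sourceroot, so decode it first.
    source_path = sourceroot / unquote(uri)
    actual = hashlib.md5(source_path.read_bytes()).hexdigest()
    # Assumption: ignore hex case, because the spec text does not fix it.
    return actual.lower() == md5.lower()
```

A consumer would call this with the workspace root and the `uri` and `md5` fields of a payload before trusting the payload's ranges.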
`language` defines the Language in which the code snippet is written. See Languages for the list of supported programming languages.
Semantic information about code snippets is stored in so-called sections - repeated fields within `TextDocument` - as described below. These sections are optional, which means that documents providing only part of the semantic information for the corresponding snippet (or no semantic information at all) are completely legal.
Language
enum Language {
UNKNOWN_LANGUAGE = 0;
SCALA = 1;
JAVA = 2;
}
`Language` represents a programming language that defines certain SemanticDB entities, e.g. TextDocument or Symbol. See Languages for the details of how features of supported programming languages map onto SemanticDB.
At the moment, SemanticDB does not have official support for modelling languages that are not included in the list above. Moreover, `Language` does not have capabilities to specify the associated language or compiler version. We may improve on this in the future.
URI
URIs are uniform resource identifiers as defined in [16]. In the SemanticDB model, URIs are represented as strings.
Range
message Range {
int32 start_line = 1;
int32 start_character = 2;
int32 end_line = 3;
int32 end_character = 4;
}
`Range` in SemanticDB directly corresponds to `Range` in LSP [2]. It represents a range between start and end points in a document. Both points are represented by zero-based line and zero-based character offsets. The start point is inclusive, and the end point is exclusive.
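The half-open convention is easiest to see in a containment check. The following is a small sketch that mirrors the `Range` message as a Python dataclass; the `contains` method is a hypothetical helper, not part of the schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    start_line: int
    start_character: int
    end_line: int
    end_character: int

    def contains(self, line: int, character: int) -> bool:
        """True if the zero-based position lies in [start, end)."""
        start = (self.start_line, self.start_character)
        end = (self.end_line, self.end_character)
        # Tuples compare lexicographically: first by line, then by character.
        return start <= (line, character) < end
```

For a range covering characters 4 to 7 on line 0, position (0, 4) is inside and position (0, 7) is outside.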
Location
message Location {
string uri = 1;
Range range = 2;
}
`Location` in SemanticDB directly corresponds to `Location` in LSP [2]. It represents a location inside a document, such as a line inside a text file.
Symbol
Symbols are tokens that are used to correlate references and definitions. In the SemanticDB model, symbols are represented as strings.
At the moment, symbols are not guaranteed to be globally unique, which means that there may be limitations for using symbols as keys in maps or databases. Read below for more information about uniqueness guarantees for different symbol categories.
Global symbols. Correspond to a definition that can be referenced outside the document where the definition is located.
Global symbol format accommodates scoping rules of the underlying language, and is therefore language-dependent. For example, the `Int` class in the Scala standard library is modelled as `scala/Int#`, where the pound sign (`#`) at the end of the symbol means that the symbol corresponds to a class as opposed to an object that would end with a dot (`.`). See Languages for more information.
Global symbols must be unique across the universe of documents that a SemanticDB-based tool is working with at any given time. For example, if in such a universe, there exist multiple definitions of `Int` - e.g. coming from multiple different versions of Scala - then all references to those definitions will have `SymbolOccurrence.symbol` equal to the same `scala/Int#` symbol, and SemanticDB will not be able to provide information to distinguish these references from each other.
In the future, we may extend SemanticDB to allow for multiple definitions that under current rules would correspond to the same global symbol. In the meantime, when global uniqueness is required, tool authors are advised to accompany global symbols with out-of-band metadata.
Local symbols. Correspond to a definition that isn't global (see above).
Local symbol format is language-agnostic and is a concatenation of `local`, a decimal number and an optional suffix that consists of a plus (`+`) and another decimal number. For example, `x` in a Scala method `def identity[T](x: T): T` may be modelled by local symbols `local0`, `local1`, `local2+1`, etc. The same logic applies to the type parameter `T`, which is also a local definition.
Local symbols must be unique within the underlying document, but they don't have to be unique across multiple documents. For example, at the time of writing the Scalac-based SemanticDB producer generates local symbols named `local0`, `local1`, etc, with the counter resetting to zero for every new document.
In the future, we may extend SemanticDB to make local symbols unique across documents, but we haven't yet found a way to do that without sacrificing performance and payload size. In the meantime, when global uniqueness is required, tool authors are advised to accompany local symbols with `TextDocument.uri`.
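One way to follow this advice is to qualify local symbols with the document URI before using them as map keys. The sketch below assumes the language-agnostic local symbol format described above; `is_local` and `unique_key` are hypothetical helper names.

```python
import re

# "local", a decimal number, and an optional "+<decimal number>" suffix.
LOCAL_SYMBOL = re.compile(r"local\d+(\+\d+)?")


def is_local(symbol: str) -> bool:
    return LOCAL_SYMBOL.fullmatch(symbol) is not None


def unique_key(uri: str, symbol: str) -> tuple:
    """Key usable in maps that span several documents."""
    # Local symbols are only unique per document, so qualify them with the URI.
    # Global symbols are used as-is (subject to the caveats discussed above).
    return (uri, symbol) if is_local(symbol) else ("", symbol)
```

With this key, `local0` from two different documents no longer collides, while `scala/Int#` maps to the same key regardless of the document it was seen in.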
Scope
message Scope {
repeated string symlinks = 1;
repeated SymbolInformation hardlinks = 2;
}
`Scope` represents a container for definitions such as type parameters, parameters or class declarations. Depending on the Language and the implementation, scopes specify their members as follows:
- Via symbolic links to members, i.e. using Symbol.
- Or via direct embedding of member metadata, i.e. using SymbolInformation.
Hardlinking comes in handy in situations when an advanced type such as `StructuralType`, `ExistentialType` or `UniversalType` ends up being part of the type signature of a global symbol. For example, in the following Scala program, method `m` has a structural type `AnyRef { def x: Int }`:
class C {
def m = new { def x: Int = ??? }
}
At the time of writing, we haven't found a way to model the method `x`, which constitutes a logical part of this structural type, as a global symbol. Therefore, this method has to be modelled as a local symbol.
However, turning `x` into a local symbol that is part of the "Symbols" section presents certain difficulties. In that case, in order to analyze public signatures of the containing TextDocument, SemanticDB consumers have to load the entire "Symbols" section, which has adverse performance characteristics.
Hardlinking solves this conundrum by storing symbol metadata related to advanced types directly inside the payloads representing these types. Thanks to hardlinking, we don't need to invent global symbols to remain convenient to SemanticDB consumers.
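The difference between the two kinds of links shows up when a consumer resolves the members of a scope. The following sketch assumes that payloads have already been decoded into plain dictionaries keyed by field name, and that `symbol_table` maps symbols to their `SymbolInformation`; both are illustrative assumptions rather than a prescribed API.

```python
def scope_members(scope: dict, symbol_table: dict) -> list:
    """Resolve a Scope (as a plain dict) to the metadata of its members."""
    # Symlinks name members by symbol and need a lookup in the "Symbols" section.
    linked = [symbol_table[s] for s in scope.get("symlinks", [])]
    # Hardlinks embed the member metadata directly, so no lookup is needed.
    return linked + list(scope.get("hardlinks", []))
```

A hardlinked scope can therefore be resolved with an empty symbol table, which is what makes it convenient for structural types.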
Constant
message Constant {
oneof sealed_value {
UnitConstant unit_constant = 1;
BooleanConstant boolean_constant = 2;
ByteConstant byte_constant = 3;
ShortConstant short_constant = 4;
CharConstant char_constant = 5;
IntConstant int_constant = 6;
LongConstant long_constant = 7;
FloatConstant float_constant = 8;
DoubleConstant double_constant = 9;
StringConstant string_constant = 10;
NullConstant null_constant = 11;
}
}
message UnitConstant {
}
message BooleanConstant {
bool value = 1;
}
message ByteConstant {
int32 value = 1;
}
message ShortConstant {
int32 value = 1;
}
message CharConstant {
int32 value = 1;
}
message IntConstant {
int32 value = 1;
}
message LongConstant {
int64 value = 1;
}
message FloatConstant {
float value = 1;
}
message DoubleConstant {
double value = 1;
}
message StringConstant {
string value = 1;
}
message NullConstant {
}
`Constant` represents compile-time constants. Compile-time constants include values of the nine primitive types on the JVM, as well as strings and `null`.
Type
message Type {
reserved 1, 3, 4, 5, 6, 11, 12, 15, 16;
oneof sealed_value {
TypeRef type_ref = 2;
SingleType single_type = 20;
ThisType this_type = 21;
SuperType super_type = 22;
ConstantType constant_type = 23;
IntersectionType intersection_type = 17;
UnionType union_type = 18;
WithType with_type = 19;
StructuralType structural_type = 7;
AnnotatedType annotated_type = 8;
ExistentialType existential_type = 9;
UniversalType universal_type = 10;
ByNameType by_name_type = 13;
RepeatedType repeated_type = 14;
}
}
`Type` represents expression types. Definition signatures are modelled with Signature.
The SemanticDB type system is a superset of the type systems of supported languages - currently modelled after the Scala type system [18]. This section describes the model, while Languages elaborates on how language types map onto this model.
message TypeRef {
Type prefix = 1;
string symbol = 2;
repeated Type type_arguments = 3;
}
`TypeRef` is the bread-and-butter type of SemanticDB. It represents a reference to a Symbol, possibly parameterized by `type_arguments`. To model references to nested type definitions, e.g. path-dependent types in Scala, type refs include `prefix`.
message SingleType {
Type prefix = 1;
string symbol = 2;
}
`SingleType` is a singleton type that is inhabited only by the value of the definition represented by `symbol` and the accompanying `prefix`.
message ThisType {
string symbol = 1;
}
`ThisType` is a singleton type that is inhabited only by the value of the `this` reference to the definition represented by `symbol`, only visible inside the lexical extent of the enclosing definition.
message SuperType {
Type prefix = 1;
string symbol = 2;
}
`SuperType` is a singleton type that is inhabited only by the value of the `super` reference to the definition represented by `symbol`, prefixed by an optional `prefix`, only visible inside the lexical extent of the enclosing definition.
message ConstantType {
Constant constant = 1;
}
`ConstantType` is a singleton type that is inhabited only by the value of a Constant.
message IntersectionType {
repeated Type types = 1;
}
`IntersectionType` represents an intersection of `types`.
message UnionType {
repeated Type types = 1;
}
`UnionType` represents a union of `types`.
message WithType {
repeated Type types = 1;
}
`WithType` represents a Scala-like compound type [31] based on `types`. Unlike intersection types, compound types are not commutative.
message StructuralType {
reserved 1, 2, 3;
Type tpe = 4;
Scope declarations = 5;
}
`StructuralType` represents a structural type specified by its base type `tpe` and `declarations`. Declarations are modelled by a Scope.
message AnnotatedType {
reserved 2;
repeated Annotation annotations = 3;
Type tpe = 1;
}
`AnnotatedType` represents a type `tpe` annotated by one or more Annotations.
message ExistentialType {
reserved 2;
Type tpe = 1;
Scope declarations = 3;
}
`ExistentialType` represents a type `tpe` existentially quantified over `declarations`. Declarations are modelled by a Scope.
message UniversalType {
reserved 1;
Scope type_parameters = 3;
Type tpe = 2;
}
`UniversalType` represents a type `tpe` universally quantified over `type_parameters`. Type parameters are modelled by a Scope.
message ByNameType {
Type tpe = 1;
}
`ByNameType` represents a signature of a by-name parameter.
message RepeatedType {
Type tpe = 1;
}
`RepeatedType` represents a signature of a repeated parameter.
Signature
message Signature {
oneof sealed_value {
ClassSignature class_signature = 1;
MethodSignature method_signature = 2;
TypeSignature type_signature = 3;
ValueSignature value_signature = 4;
}
}
`Signature` represents definition signatures. Expression types are modelled with Type.
The SemanticDB type system is a superset of the type systems of supported languages - currently modelled after the Scala type system [18]. This section describes the model, while Languages elaborates on how language types map onto this model.
message ClassSignature {
Scope type_parameters = 1;
repeated Type parents = 2;
Type self = 3;
Scope declarations = 4;
}
`ClassSignature` represents signatures of objects, package objects, classes, traits and interfaces. Both type parameters and declarations are modelled by a Scope. `self` represents an optional self-type [99].
message MethodSignature {
Scope type_parameters = 1;
repeated Scope parameter_lists = 2;
Type return_type = 3;
}
`MethodSignature` represents signatures of methods (including getters and setters), constructors and macros. It features `type_parameters`, `parameter_lists` and a `return_type`. Both type parameters and parameters are modelled by Scopes. Moreover, in order to support multiple parameter lists in Scala methods, `parameter_lists` is a list of lists.
message TypeSignature {
Scope type_parameters = 1;
Type lower_bound = 2;
Type upper_bound = 3;
}
`TypeSignature` represents signatures of type parameters or type members. It features `type_parameters` as well as `lower_bound` and `upper_bound`. Type parameters are modelled by a Scope.
message ValueSignature {
Type tpe = 1;
}
`ValueSignature` represents signatures of locals, fields and self parameters. It encapsulates an underlying type of the definition.
SymbolInformation
message SymbolInformation {
reserved 2, 6, 7, 8, 9, 10, 11, 12, 14, 15;
string symbol = 1;
Language language = 16;
Kind kind = 3;
int32 properties = 4;
string display_name = 5;
Signature signature = 17;
repeated Annotation annotations = 13;
Access access = 18;
repeated string overridden_symbols = 19;
Documentation documentation = 20;
}
"Symbols" is a section of a TextDocument that stores information about Symbols that are defined in the underlying document. In a sense, this section is analogous to a symbol table [19] in a compiler.
`language`. Language that defines the corresponding definition.
`kind`. Enumeration that defines the kind of the corresponding definition. See Languages for information on how definitions in supported languages map onto these kinds.
| Value | Name | Explanation |
| --- | --- | --- |
| 19 | `LOCAL` | Local value or variable, e.g. `val x = 42` or `var x = 42` inside a method. |
| 20 | `FIELD` | Field, e.g. `int x = 42`. |
| 3 | `METHOD` | Method, e.g. `def x = 42`. |
| 21 | `CONSTRUCTOR` | Constructor, e.g. `(x: Int)` or `def this() = this(42)` in `class C(x: Int)`. |
| 6 | `MACRO` | Macro, e.g. `def m = macro impl`. |
| 7 | `TYPE` | Type member, e.g. `type T <: Int` or `type T = Int`. |
| 8 | `PARAMETER` | Parameter, e.g. `x` in `class C(x: Int)`. |
| 17 | `SELF_PARAMETER` | Self parameter, e.g. `self` in `class C { self => ... }`. |
| 9 | `TYPE_PARAMETER` | Type parameter, e.g. `T` in `class C[T](x: T)`. |
| 10 | `OBJECT` | Object, e.g. `object M`. |
| 11 | `PACKAGE` | Package, e.g. `package p`. |
| 12 | `PACKAGE_OBJECT` | Package object, e.g. `package object p`. |
| 13 | `CLASS` | Class, e.g. `class C`. |
| 14 | `TRAIT` | Trait, e.g. `trait T`. |
| 18 | `INTERFACE` | Interface, e.g. `interface I`. |
`properties`. Bitmask of miscellaneous bits of metadata. See Languages for information on how definitions in supported languages map onto these properties.
| Value | Name | Explanation |
| --- | --- | --- |
| `0x1` | `RESERVED` | Reserved for backward compatibility. |
| `0x2` | `RESERVED` | Reserved for backward compatibility. |
| `0x4` | `ABSTRACT` | Has an `abstract` modifier, or is effectively abstract, i.e. is an abstract value, variable, method or type? |
| `0x8` | `FINAL` | Has a `final` modifier, or is effectively final, i.e. is an object or a package object? |
| `0x10` | `SEALED` | Has a `sealed` modifier? |
| `0x20` | `IMPLICIT` | Has an `implicit` modifier? |
| `0x40` | `LAZY` | Has a `lazy` modifier? |
| `0x80` | `CASE` | Has a `case` modifier? |
| `0x100` | `COVARIANT` | Has a covariant (`+`) modifier? |
| `0x200` | `CONTRAVARIANT` | Has a contravariant (`-`) modifier? |
| `0x400` | `VAL` | Is a `val` (local value, member value or `val` parameter of a constructor)? |
| `0x800` | `VAR` | Is a `var` (local variable, member variable or `var` parameter of a constructor)? |
| `0x1000` | `STATIC` | Is a `static` field, method or class? |
| `0x2000` | `PRIMARY` | Is a primary constructor? |
| `0x4000` | `ENUM` | Is an `enum` field or class? |
| `0x8000` | `DEFAULT` | Is a `default` parameter or a `default` method? |
| `0x10000` | `GIVEN` | Is a given instance (e.g. `given x: Int = ...` or `using x: Int`)? |
| `0x20000` | `INLINE` | Has an `inline` modifier? |
| `0x40000` | `OPEN` | Has an `open` modifier? |
| `0x80000` | `TRANSPARENT` | Has a `transparent` modifier? |
| `0x100000` | `INFIX` | Has an `infix` modifier? |
| `0x200000` | `OPAQUE` | Has an `opaque` modifier? |
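Because `properties` is a bitmask, a consumer typically decodes it by testing each bit. The sketch below builds the bit values from the table above; the function name `decode_properties` is hypothetical, and the two reserved bits are deliberately ignored.

```python
# Property names in ascending bit order, starting at 0x4 (0x1 and 0x2 are reserved).
NAMES = [
    "ABSTRACT", "FINAL", "SEALED", "IMPLICIT", "LAZY", "CASE",
    "COVARIANT", "CONTRAVARIANT", "VAL", "VAR", "STATIC", "PRIMARY",
    "ENUM", "DEFAULT", "GIVEN", "INLINE", "OPEN", "TRANSPARENT",
    "INFIX", "OPAQUE",
]
PROPERTIES = {1 << (i + 2): name for i, name in enumerate(NAMES)}


def decode_properties(bitmask: int) -> list:
    """Return the names of all properties set in `bitmask`, in bit order."""
    return [name for bit, name in PROPERTIES.items() if bitmask & bit]
```

For instance, a bitmask of `0x88` decodes to `FINAL` and `CASE`.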
`display_name`. Display name of the definition, i.e. how this definition should be presented to humans, e.g. in code browsers or integrated development environments.
This is not necessarily the same as the symbol name of the corresponding Symbol. See Languages for more information on which definitions have which display and symbol names in supported languages.
`signature`. Signature that represents the definition signature. See Languages for more information on which definitions have which signatures in supported languages.
`annotations`. Annotations of the corresponding definition.
`access`. Access modifier of the corresponding definition.
`overridden_symbols`. List of symbols that this symbol overrides.
`documentation`. Documentation associated with the symbol, together with its format.
Documentation
message Documentation {
enum Format {
HTML = 0;
MARKDOWN = 1;
JAVADOC = 2;
SCALADOC = 3;
KDOC = 4;
}
string message = 1;
Format format = 2;
}
`Documentation` represents the documentation associated with a Symbol. The `format` field specifies the format of the text stored in `message`.
Annotation
message Annotation {
Type tpe = 1;
}
`Annotation` represents annotations. See Languages for information on how annotations in supported languages map onto this data structure.
Access
message Access {
oneof sealed_value {
PrivateAccess private_access = 1;
PrivateThisAccess private_this_access = 2;
PrivateWithinAccess private_within_access = 3;
ProtectedAccess protected_access = 4;
ProtectedThisAccess protected_this_access = 5;
ProtectedWithinAccess protected_within_access = 6;
PublicAccess public_access = 7;
}
}
message PrivateAccess {
}
message PrivateThisAccess {
}
message PrivateWithinAccess {
string symbol = 1;
}
message ProtectedAccess {
}
message ProtectedThisAccess {
}
message ProtectedWithinAccess {
string symbol = 1;
}
message PublicAccess {
}
`Access` represents access modifiers of definitions, including `private` and `protected`, as well as variants: 1) limited to the current object instance, and 2) limited to the given `symbol`. See Languages for information on how access modifiers in supported languages map onto this data structure.
SymbolOccurrence
message SymbolOccurrence {
Range range = 1;
string symbol = 2;
Role role = 3;
}
"Occurrences" is a section of a TextDocument that represents the results of name resolution for identifiers in the underlying code snippet.
`SymbolOccurrence` refers to a Range in the TextDocument and has a symbol as explained in Symbol. `role` is an enumeration that describes the semantic role that the identifier performs in the code.
| Value | Name | Explanation |
| --- | --- | --- |
| 1 | `REFERENCE` | Reference, e.g. `y` in `val x = y`. |
| 2 | `DEFINITION` | Definition, e.g. `x` in `val x = y`. |
See Languages for information on how language features in supported languages map onto this data structure.
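Occurrences are enough to sketch a single-document "go to definition". The example below assumes occurrences decoded into plain dictionaries, with ranges as `(start_line, start_character, end_line, end_character)` tuples; the helper names are hypothetical, and definitions located in other documents are out of scope for this sketch.

```python
REFERENCE, DEFINITION = 1, 2


def symbol_at(occurrences: list, line: int, character: int):
    """Symbol of the occurrence whose range contains the position, if any."""
    for occ in occurrences:
        sl, sc, el, ec = occ["range"]
        # Start is inclusive, end is exclusive.
        if (sl, sc) <= (line, character) < (el, ec):
            return occ["symbol"]
    return None


def go_to_definition(occurrences: list, line: int, character: int):
    """Range of the DEFINITION occurrence for the symbol under the position."""
    symbol = symbol_at(occurrences, line, character)
    if symbol is None:
        return None
    for occ in occurrences:
        if occ["symbol"] == symbol and occ["role"] == DEFINITION:
            return occ["range"]
    return None
```

For two local values declared as `val y = 1` on line 0 and `val x = y` on line 1, placing the cursor on the `y` reference resolves to the range of `y` on line 0.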
Diagnostic
message Diagnostic {
Range range = 1;
Severity severity = 2;
string message = 3;
}
"Diagnostics" is a section of a TextDocument that stores diagnostic messages produced by compilers, linters and other developer tools.
`Diagnostic` in SemanticDB directly corresponds to `Diagnostic` in LSP [2]. It has a Range, a severity and an associated message. If the severity is unknown, it is up to the consumer to interpret diagnostics as error, warning, information or hint.
| Value | Name | Explanation |
| --- | --- | --- |
| 1 | `ERROR` | Error. |
| 2 | `WARNING` | Warning. |
| 3 | `INFORMATION` | Information. |
| 4 | `HINT` | Hint. |
Synthetic
message Synthetic {
Range range = 1;
Tree tree = 2;
}
"Synthetics" is a section of a TextDocument that stores trees added by compilers that do not appear in the original source. Examples include inferred type arguments, implicit parameters, or desugarings of for loops.
`Synthetic` models one of these synthetics as a transformation of a piece of the original source file to a synthetic AST that may still use quotes of the original source. The piece of the source file is given as a Range, and the new synthetic AST is given as a Tree.
Tree
message Tree {
oneof sealed_value {
ApplyTree apply_tree = 1;
FunctionTree function_tree = 2;
IdTree id_tree = 3;
LiteralTree literal_tree = 4;
MacroExpansionTree macro_expansion_tree = 5;
OriginalTree original_tree = 6;
SelectTree select_tree = 7;
TypeApplyTree type_apply_tree = 8;
}
}
A `Tree` represents a typed abstract syntax tree. The trees are similar to Scalameta and Rsc trees, except for `OriginalTree`, which represents a quote of the original source file. We only support a small subset of Scala syntax necessary to model Synthetics. At the moment, we do not have plans to add more trees.
message ApplyTree {
Tree function = 1;
repeated Tree arguments = 2;
}
An `ApplyTree` represents a method application.
message FunctionTree {
repeated IdTree parameters = 1;
Tree body = 2;
}
A `FunctionTree` represents a function literal with parameter declarations and a body.
message IdTree {
string symbol = 1;
}
An `IdTree` represents a reference to a Symbol in an identifier.
message LiteralTree {
Constant constant = 1;
}
A `LiteralTree` represents a Constant literal.
message MacroExpansionTree {
Tree before_expansion = 1;
Type tpe = 2;
}
A `MacroExpansionTree` represents a macro expansion. The `before_expansion` can be an `OriginalTree` (expansion of original code) or any other `Tree` (expansion of synthetic code).
message OriginalTree {
Range range = 1;
}
An `OriginalTree` represents a quote from the text of the enclosing `TextDocument`, given as the range of that quote from the original text. These represent trees that have direct counterparts in the original source file.
message SelectTree {
Tree qualifier = 1;
IdTree id = 2;
}
A `SelectTree` represents a method or field selection on a qualifier.
message TypeApplyTree {
Tree function = 1;
repeated Type type_arguments = 2;
}
A `TypeApplyTree` represents the type application of a method, providing that method with type arguments.
Data Schemas
Protobuf
Languages
In this section, we describe language-dependent SemanticDB entities, i.e. symbols, types, symbol informations, annotations, access modifiers and symbol occurrences.
Notation
We use a concise notation to describe SemanticDB entities. In this notation, `M(v1, v2, ...)` corresponds to a Protocol Buffers message `M` with fields set to values `v1`, `v2`, etc. Literals correspond to scalar values, `List(x1, x2, ...)` corresponds to repeated values, `None` corresponds to missing optional values. Moreover, `<X>` corresponds to an entity that represents `X`.
Scala
In this section, we exhaustively map Scala language features onto SemanticDB. As a reference, we use the Scala Language Specification [17] (referred to as "SLS" in the text below), as well as additional resources [25, 40, 51, 56, 57] in the areas where SLS is incomplete or out of date.
Symbol
In this section, we describe the Scala symbol format, but don't cover the details of how Scala definitions map onto symbols (e.g. which symbols are created for which Scala definitions, what their metadata is, etc). See SymbolInformation for more information about that.
Symbols | Format |
Global symbols ↑ | |
Local symbols ↑ | Concatenation of `local` and an implementation-dependent suffix that doesn't contain slashes (`/`) and semicolons (`;`). |
Owner is:
- For root package, `None`.
- For empty package, root package.
- For top-level package, root package.
- For other package, parent package.
- For package object, its associated package.
- For other top-level definition, its package.
- For other global definition, the innermost enclosing definition, i.e. the definition whose Location in source code most tightly encloses the Location of the original definition.
- For other definition, `None`.
Descriptor is:
- For `LOCAL`, unsupported.
- For `PACKAGE`, concatenation of its symbol name and a forward slash (`/`).
- For `OBJECT` or `PACKAGE_OBJECT`, concatenation of its symbol name and a dot (`.`).
- Exceptionally, for `VAL` `METHOD`, concatenation of its symbol name and a dot (`.`).
- For other `METHOD`, `CONSTRUCTOR`, or `MACRO`, concatenation of its symbol name, a disambiguator and a dot (`.`).
- For `TYPE`, `CLASS` or `TRAIT`, concatenation of its symbol name and a pound sign (`#`).
- For `PARAMETER`, concatenation of a left parenthesis (`(`), its symbol name and a right parenthesis (`)`).
- For `SELF_PARAMETER`, unsupported.
- For `TYPE_PARAMETER`, concatenation of a left bracket (`[`), its symbol name and a right bracket (`]`).
- See SymbolInformation for details on which Scala definitions are modelled by which symbols.
Disambiguator is:
- Concatenation of a left parenthesis (`(`), a tag and a right parenthesis (`)`). If the definition is not overloaded, the tag is empty. Two definitions are overloaded if they have the same name and both require a disambiguator. If the definition is overloaded, the tag is computed from the order of appearance of overloads in the source code (see "Function declarations and definitions" below for an example):
  - Empty string for the definition that appears first.
  - `+1` for the definition that appears second.
  - `+2` for the definition that appears third.
  - ...
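The tag rule above amounts to numbering overloads by their order of appearance. A minimal sketch, with the hypothetical helper `disambiguators` taking the number of same-named definitions that require a disambiguator:

```python
def disambiguators(overload_count: int) -> list:
    """Disambiguators for same-named definitions, in source order."""
    # The first definition gets an empty tag, the n-th one (n >= 2) gets "+<n-1>".
    tags = ["" if i == 0 else f"+{i}" for i in range(overload_count)]
    return ["(" + tag + ")" for tag in tags]
```

A method that is not overloaded is the special case of a single definition, which yields the empty-tag disambiguator `()`.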
Symbol name is:
- For root package, `_root_`.
- For empty package, `_empty_`.
- For package object, `package`.
- For constructor, `<init>`.
- For anonymous definition, implementation-dependent name.
- For other definition, the name of the binding introduced by the definition [70]. If the name is not a Java identifier [22], it is wrapped in backticks.
For example, this is how some of the definitions from the Scala standard library must be modelled:
- The `scala` package: `scala/`
- The `Int` class: `scala/Int#`
- The `def implicitly[T](implicit e: T)` method: `scala/Predef.implicitly().`
- The `e` parameter of that method: `scala/Predef.implicitly().(e)`
- The `T` type parameter of that method: `scala/Predef.implicitly().[T]`
- The `def contains[A: Ordering](tree: Tree[A, _], x: A): Boolean` method: `scala/collection/immutable/RedBlackTree#contains().`
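The descriptor rules can be checked against these examples with a small sketch. The function `descriptor` is a hypothetical helper that takes kind names as strings; the sketch also assumes that the symbol of a nested definition is formed by appending its descriptor to the owner's symbol, which is consistent with the examples above but is not restated here as a rule.

```python
def descriptor(kind: str, name: str, disambiguator: str = "()") -> str:
    """Descriptor of a Scala definition, following the list of rules above."""
    if kind == "PACKAGE":
        return name + "/"
    if kind in ("OBJECT", "PACKAGE_OBJECT"):
        return name + "."
    if kind in ("METHOD", "CONSTRUCTOR", "MACRO"):
        # For a val method, pass disambiguator="" to get just name + ".".
        return name + disambiguator + "."
    if kind in ("TYPE", "CLASS", "TRAIT"):
        return name + "#"
    if kind == "PARAMETER":
        return "(" + name + ")"
    if kind == "TYPE_PARAMETER":
        return "[" + name + "]"
    # LOCAL and SELF_PARAMETER have no descriptor.
    raise ValueError("no descriptor for kind " + kind)


scala = descriptor("PACKAGE", "scala")
implicitly = scala + descriptor("OBJECT", "Predef") + descriptor("METHOD", "implicitly")
```

Appending `descriptor("PARAMETER", "e")` to `implicitly` then gives the symbol of the `e` parameter from the list above.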
Type
message Type {
reserved 1, 3, 4, 5, 6, 11, 12, 15, 16;
oneof sealed_value {
TypeRef type_ref = 2;
SingleType single_type = 20;
ThisType this_type = 21;
SuperType super_type = 22;
ConstantType constant_type = 23;
IntersectionType intersection_type = 17;
UnionType union_type = 18;
WithType with_type = 19;
StructuralType structural_type = 7;
AnnotatedType annotated_type = 8;
ExistentialType existential_type = 9;
UniversalType universal_type = 10;
ByNameType by_name_type = 13;
RepeatedType repeated_type = 14;
}
}
In Scala, Type represents value types [18]. Non-value types [18] are modelled with a separate data structure called Signature (see below for examples).
In the examples below:
- `E` is the lexically enclosing class of the location where the example types are defined or computed.
- `C` is a class that extends a trait `M`.
- `T`, `T1`, `T2`, etc are type aliases.
- `t` is a type parameter.
- `p` and `x` are local values.
- `scala` is the `scala` package from the Scala standard library.
- `Int`, `List`, `TupleN` and `FunctionN` are classes from the Scala standard library.
- `@ann1`, `@ann2`, etc are annotations.
- `M1`, `M2`, etc are members.
Category | Examples |
Singleton types [24, 25] |
|
Type projections [26] |
|
Type designators [27] |
|
Parameterized types [28] |
|
Tuple types [29] |
|
Annotated types [30] |
|
Compound types [31] |
|
Infix types [32] |
|
Function types [33] |
|
Existential types [34] |
|
Notes:
- We diverge from SLS by having different data structures to model value types (Type) and non-value types (Signature).
- We diverge from SLS on the matter of handling prefixes (see definitions of `TypeRef` and `SingleType` for more information).
  - In SLS, all types that can have a prefix must have it specified explicitly, even if the prefix is trivial. For example in Scalac, `Int` must be represented as `TypeRef(<scala.this.type>, <Int>, List())` [27].
  - In SemanticDB, all types that have a trivial prefix must not have it specified explicitly. For example in SemanticDB, `Int` must be represented as `TypeRef(None, <Int>, List())`. Moreover, even `scala/Int` must be represented as `TypeRef(None, <Int>, List())`.
  - By a trivial prefix, we mean either empty prefix (for definitions that aren't members of any other definition, e.g. parameters or type parameters) or `ThisType` of the enclosing class, trait, interface, object, package object or package (for all other definitions), as well as types equivalent to them.
- We leave the mapping between type syntax written in source code and `Type` entities deliberately unspecified. Some producers may transform types in unspecified ways (e.g. Scalac transforms all `this.type` types into qualified `X.this.type` types), and our experience [38] shows that reverse engineering these transformations is very hard. We may improve on this in the future, but this is highly unlikely. In the meanwhile, use Occurrences for figuring out semantics of syntax written in source code.
Signature
In Scala, Signature represents definition signatures, which also includes non-value types [18]. See below to learn which Scala definitions have which signatures.
SymbolInformation
message SymbolInformation {
reserved 2, 6, 7, 8, 9, 10, 11, 12, 14, 15;
string symbol = 1;
Language language = 16;
Kind kind = 3;
int32 properties = 4;
string display_name = 5;
Signature signature = 17;
repeated Annotation annotations = 13;
Access access = 18;
Documentation documentation = 20;
}
Field | Explanation |
symbol |
See Symbol. |
language |
SCALA . |
kind |
Explained below on per-definition basis. |
properties |
Explained below on per-definition basis. |
display_name |
Explained below on per-definition basis. |
signature |
Explained below on per-definition basis. |
annotations |
Explained below on per-definition basis. |
access |
Explained below on per-definition basis. |
overridden_symbols |
List of symbols this symbol overrides. See Overriding |
documentation |
Always empty. Not supported by semanticdb-scalac and metacp. |
Value declarations and definitions [39] are represented by multiple symbols, with the exact number of symbols, their kinds, properties, signatures and access modifiers dependent on the corresponding value:
- Local symbol of kind `LOCAL` is created for all local values.
- Getter symbol of kind `METHOD` is created for all member values to model the getter method associated with the corresponding member value.
- Parameter symbol of kind `PARAMETER` is created for `val` parameters of primary constructors to model the corresponding constructor parameter.
abstract class C(val xp: Int) {
val xm: Int = ???
val xam: Int
private[this] val xlm: Int = ???
def m = {
val xl: Int = ???
type S = { val xs: Int }
type E = xe.type forSome { val xe: AnyRef }
}
}
Definition | Symbol | Kind | Signature |
xp |
_empty_/C#xp(). |
METHOD |
MethodSignature(List(), List(), TypeRef(None, <Int>, List())) |
xp |
_empty_/C#`<init>`().(xp) |
PARAMETER |
ValueSignature(TypeRef(None, <Int>, List())) |
xm |
_empty_/C#xm(). |
METHOD |
MethodSignature(List(), List(), TypeRef(None, <Int>, List())) |
xam |
_empty_/C#xam(). |
METHOD |
MethodSignature(List(), List(), TypeRef(None, <Int>, List())) |
xlm |
_empty_/C#xlm(). |
METHOD |
MethodSignature(List(), List(), TypeRef(None, <Int>, List())) |
xl |
local0 |
LOCAL |
ValueSignature(TypeRef(None, <Int>, List())) |
xs |
local1 |
METHOD |
MethodSignature(List(), List(), TypeRef(None, <Int>, List())) |
xe |
local2 |
METHOD |
ValueSignature(TypeRef(None, <AnyRef>, List())) |
Notes:
- As described in SLS [39], there are some language constructs that are desugared into values. For these language constructs, symbols are created from desugared values.
  - `val p = e` (symbols are created for bound variables `x1`, ..., `xm` that are defined in `p` in order of their appearance in source code; symbols are NOT created for the synthetic value used in the desugaring).
  - `val x1, ..., xn: T` (symbols are created for `x1`, ..., `xn` in order of their appearance in source code).
  - `val p1, ..., pn = e` (symbols are created for bound variables `x1`, ..., `xm` that are defined in patterns `p1`, ..., `pn` in order of their appearance in source code).
  - `val p1, ..., pn: T = e` (symbols are created for bound variables `x1`, ..., `xm` that are defined in patterns `p1`, ..., `pn` in order of their appearance in source code).
- Supported properties for value symbols are:
  - `ABSTRACT`: set for all corresponding symbols of value declarations.
  - `FINAL`: set for all corresponding symbols of `final` values.
  - `IMPLICIT`:
    - If a corresponding parameter symbol exists, set for the parameter symbol.
    - If a corresponding getter symbol exists, set for the getter symbol.
    - If a corresponding local symbol exists, set for the local symbol.
  - `LAZY`: set for all corresponding symbols of `lazy` values.
  - `VAL`: set for all corresponding symbols.
- Display name for value symbols is equal to the name of the binding introduced by the definition [70].
- If the type of the value is not provided in source code, it is inferred from the right-hand side of the value according to the rules described in SLS [39]. Corresponding signature is computed from the inferred type as explained in Type.
- Depending on their meta annotations, value annotations may end up as `Annotation` entities associated with multiple corresponding symbols. See [40] for more information.
- Supported access modifiers for value symbols are:
  - `PrivateAccess`: set for getters of `private` values.
  - `PrivateThisAccess`: set for vals of value members.
  - `PrivateWithinAccess`: set for getters of `private[...]` values.
  - `ProtectedAccess`: set for getters of `protected` values.
  - `ProtectedThisAccess`: set for getters of `protected[this]` values.
  - `ProtectedWithinAccess`: set for getters of `protected[...]` values.
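As a rough illustration of the three rules at the start of this subsection, the sketch below maps the place where a `val` is defined to the kinds of symbols created for it. `ValLocation` and its cases are hypothetical names introduced only for this example. The result for `val` parameters of primary constructors follows the two `xp` rows of the table above, which list both a `METHOD` and a `PARAMETER` symbol.

```scala
object ValueSymbolsSketch {
  // Hypothetical classification of where a `val` is defined.
  sealed trait ValLocation
  case object LocalVal extends ValLocation
  case object MemberVal extends ValLocation
  case object PrimaryConstructorVal extends ValLocation

  // Kinds of the symbols created for a `val`, per the rules of this subsection.
  def symbolKinds(location: ValLocation): List[String] = location match {
    case LocalVal              => List("LOCAL")
    case MemberVal             => List("METHOD")
    // A `val` constructor parameter is also a member value, so it gets both symbols.
    case PrimaryConstructorVal => List("METHOD", "PARAMETER")
  }
}
```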
Variable declarations and definitions [41] are represented by multiple symbols, with the exact number of symbols, their kinds, properties, signatures and access modifiers dependent on the corresponding variable:
- Local symbol of kind `LOCAL` is created for all local variables.
- Getter and setter symbols of kind `METHOD` are created for all member variables to model the getter and setter methods associated with the corresponding member variable.
- Parameter symbol of kind `PARAMETER` is created for `var` parameters of primary constructors to model the corresponding constructor parameter.
Notes:
- Variable symbols are modelled exactly the same as value symbols (see "Value declarations and definitions"), with the exceptions described below.
- Setter symbols have the following metadata:
  - `kind`: `METHOD`.
  - `properties`: see below.
  - `display_name`: concatenation of the display name of the variable and `_=`.
  - `signature`: `MethodSignature(List(), List(List(<x$1>)), <Unit>)`, where `x$1` is a `PARAMETER` symbol having `signature` equal to the type of the variable.
  - `annotations` and `access`: same as value symbols.
- Supported properties for variable symbols are:
  - `ABSTRACT`: set for all corresponding symbols of variable declarations.
  - `FINAL`: set for all corresponding symbols of `final` variables.
  - `IMPLICIT`:
    - If a corresponding parameter symbol exists, set for the parameter symbol.
    - If a corresponding getter symbol exists, set for the getter symbol.
    - If a corresponding local symbol exists, set for the local symbol.
  - `LAZY`: never set for variable symbols, since variable declarations and definitions cannot be `lazy`.
  - `VAR`: set for all corresponding symbols.
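The setter naming rule above can be sketched as follows. The helper names are hypothetical. The symbol format (owner symbol, display name, `()` and a dot) is an assumption read off the `z_=` and `xvar_=` rows in the tables of this section, and it only covers a setter that is not overloaded.

```scala
object SetterSketch {
  // Display name of a setter: the display name of the variable followed by `_=`.
  def setterDisplayName(variableName: String): String = variableName + "_="

  // Symbol of a non-overloaded setter, e.g. `_empty_/C#z_=().` for `var z` in class C.
  // `ownerSymbol` is the symbol of the enclosing definition, e.g. `_empty_/C#`.
  def setterSymbol(ownerSymbol: String, variableName: String): String =
    ownerSymbol + setterDisplayName(variableName) + "()."
}
```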
Pattern variables [65] are represented differently depending on where they are defined:
- Local symbol is created for pattern variables in pattern matching expressions [66].
- A combination of local, field, getter and setter symbols is created for pattern variables in pattern definitions [39].
class C {
??? match { case List(x) => ??? }
val List(xval) = ???
var List(xvar) = ???
}
Definition | Symbol | Kind | Signature |
x |
local0 |
LOCAL |
ValueSignature(TypeRef(None, <Nothing>, List())) |
xval |
_empty_/C#xval(). |
METHOD |
MethodSignature(List(), List(), TypeRef(None, <Nothing>, List())) |
xvar |
_empty_/C#xvar(). |
METHOD |
MethodSignature(List(), List(), TypeRef(None, <Nothing>, List())) |
xvar |
_empty_/C#xvar_=(). |
METHOD |
MethodSignature(List(), List(List(<x$1>)), TypeRef(None, <Unit>, List())) |
Notes:
- In the future, we may decide to introduce a dedicated symbol kind for regular pattern variables, so that they can be distinguished from local value definitions.
- Pattern variable symbols don't support any properties.
- Display name for pattern variable symbols is equal to the name of the binding introduced by the definition [70].
- Pattern definitions [39] do not exist as a first-class language feature. Instead, they are desugared into zero or more synthetic value definitions and only then modelled as symbols as described in "Value declarations and definitions" and "Variable declarations and definitions".
- Pattern variable symbols don't support any access modifiers.
Type declarations and type aliases [42] are represented with `TYPE` symbols.
class C {
type T1 <: Hi
type T2 >: Lo
type T = Int
}
Definition | Symbol | Kind | Signature |
T1 |
_empty_/C#T1# |
TYPE |
TypeSignature(List(), None, <Hi>) |
T2 |
_empty_/C#T2# |
TYPE |
TypeSignature(List(), <Lo>, None) |
T |
_empty_/C#T# |
TYPE |
TypeSignature(List(), TypeRef(None, <Int>, List()), TypeRef(None, <Int>, List())) |
Notes:
- Supported properties for type symbols are:
  - `ABSTRACT`: set for type declarations.
  - `FINAL`: set for `final` type aliases.
- Display name for type symbols is equal to the name of the binding introduced by the definition [70].
- We leave the mapping between type syntax written in source code and `Type` entities deliberately unspecified. For example, a producer may represent the signature of `T1` as `TypeSignature(List(), <Nothing>, <Hi>)`. See Types for more information.
- If present, type parameters of type declarations and type aliases are represented as described below in order of their appearance in source code.
- Type symbols support all Scala access modifiers.
Type variables [67] are represented with `TYPE` symbols.
class C {
??? match { case _: List[t] => }
}
Definition | Symbol | Kind | Signature |
t |
local0 |
TYPE |
TypeSignature(List(), None, None) |
Notes:
- In the future, we may decide to introduce a dedicated symbol kind for type variables, so that they can be distinguished from local type definitions.
- Type variable symbols are always `ABSTRACT`.
- Display name for type variable symbols is equal to:
  - The name of the binding introduced by the definition [70].
  - Except, in case of anonymous type variables via the `_` syntax, `_`.
- We leave the mapping between type syntax written in source code and `Type` entities deliberately unspecified. For example, a producer may represent the signature of `t` as `TypeSignature(List(), <Nothing>, <Any>)`. See Types for more information.
- Type variable symbols don't support any access modifiers.
Self parameters [64] are represented with `SELF_PARAMETER` symbols.
class C1 {
self1 =>
}
class C2 {
self2: T =>
}
Definition | Symbol | Kind | Signature |
self1 |
local0 |
SELF_PARAMETER |
ValueSignature(TypeRef(None, <C1>, List())) |
self2 |
local0 |
SELF_PARAMETER |
ValueSignature(TypeRef(None, <T>, List())) |
Notes:
- Self parameters cannot be referenced outside the document where they are located, which means that they are represented by local symbols.
- Self parameter symbols don't support any properties.
- Display name for self parameter symbols is equal to:
  - The name of the binding introduced by the definition [70].
  - Except, in case of anonymous self parameters via `_: T =>`, `this: T =>` or corresponding typeless syntaxes, `_`.
- Self parameter symbols don't support any access modifiers.
Type parameters [43] are represented with `TYPE_PARAMETER` symbols.
class C[T1] {
def m[T2[T3] <: Hi] = ???
type T[T4 >: Lo] = ???
}
Definition | Symbol | Kind | Signature |
T1 |
_empty_/C#[T1] |
TYPE_PARAMETER |
TypeSignature(List(), None, None) |
T2 |
_empty_/C#m()[T2] |
TYPE_PARAMETER |
TypeSignature(List(), None, <Hi>) |
T3 |
_empty_/C#m()[T2][T3] |
TYPE_PARAMETER |
TypeSignature(List(), None, None) |
T4 |
_empty_/C#T#[T4] |
TYPE_PARAMETER |
TypeSignature(List(), <Lo>, None) |
Notes:
- Supported properties for type parameter symbols are:
  - `COVARIANT`: set for covariant type parameters.
  - `CONTRAVARIANT`: set for contravariant type parameters.
- If present, (higher-order) type parameters of type parameters are represented as described here in order of their appearance in source code.
- Display name for type parameter symbols is equal to:
  - The name of the binding introduced by the definition [70].
  - Except, in case of anonymous type parameters via the `_` syntax, `_`.
- We leave the mapping between type syntax written in source code and `Type` entities deliberately unspecified. For example, a producer may represent the signature of `T1` as `TypeSignature(List(), <Nothing>, <Any>)`. See Types for more information.
- If present, context bounds and view bounds of type parameters are desugared into parameters of the enclosing definition as described in [44] and are represented with corresponding `PARAMETER` symbols.
Parameters are represented with `PARAMETER` symbols. (There is no section in SLS dedicated to parameters, so we aggregate information about parameters from multiple sections).
class C(p1: Int) {
def m2(p2: Int) = ???
def m3(p3: Int = 42) = ???
def m4(p4: => Int) = ???
def m5(p5: Int*) = ???
def m6[T: C <% V] = ???
}
Definition | Symbol | Kind | Signature |
p1 |
_empty_/C#`<init>`().(p1) |
PARAMETER |
ValueSignature(TypeRef(None, <Int>, List())) |
p2 |
_empty_/C#m2().(p2) |
PARAMETER |
ValueSignature(TypeRef(None, <Int>, List())) |
p3 |
_empty_/C#m3().(p3) |
PARAMETER |
ValueSignature(TypeRef(None, <Int>, List())) |
m3$default$1 |
_empty_/C#m3$default$1(). |
METHOD |
MethodSignature(List(), List(), TypeRef(None, <Int>, List())) |
p4 |
_empty_/C#m4().(p4) |
PARAMETER |
ValueSignature(ByNameType(TypeRef(None, <Int>, List()))) |
p5 |
_empty_/C#m5().(p5) |
PARAMETER |
ValueSignature(RepeatedType(TypeRef(None, <Int>, List()))) |
Context bound | _empty_/C#m6().(x$1) |
PARAMETER |
ValueSignature(TypeRef(None, <C>, List(<T>))) |
View bound | _empty_/C#m6().(x$2) |
PARAMETER |
ValueSignature(TypeRef(None, <Function1>, List(<T>, <V>))) |
Notes:
- As described above, some values and variables are represented with multiple symbols, including parameter symbols. For more information, see "Value declarations and definitions" and "Variable declarations and definitions".
- Supported properties for parameter symbols are:
  - `IMPLICIT`: set for `implicit` parameters, as well as desugared context bounds and view bounds (see above).
  - `VAL`: set for `val` parameters of primary constructors.
  - `VAR`: set for `var` parameters of primary constructors.
  - `DEFAULT`: set for parameters with default values.
- Scalac semantic model does not distinguish parameters in `class C(x: Int)` and `class C(private[this] val x: Int)`. As a result, due to implementation restrictions `private[this] val` parameters currently don't have the `VAL` property.
- Display name for parameter symbols is equal to:
  - The name of the binding introduced by the definition [70].
  - Except, in case of anonymous parameters via the `_` syntax, `_`.
- Unlike some other metaprogramming systems for Scala, we do not distinguish regular parameters from parameters with default arguments [45]. However, we do create method symbols for synthetic methods that compute default arguments with names and signatures defined by [45].
- Signatures of by-name parameters [46] and repeated parameters [47] are represented with special types (`ByNameType` and `RepeatedType` correspondingly).
- According to [44], context bounds and view bounds are desugared as parameters of enclosing definitions. Since SLS does not specify the names for such parameters (only their signatures), we also leave the names unspecified.
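The naming of synthetic default-argument methods, as seen in the `m3$default$1` row of the table above, can be sketched like this. The helper names are hypothetical, the parameter index is assumed to be 1-based, and the symbol helper only covers a method that is not overloaded.

```scala
object DefaultArgumentSketch {
  // Name of the synthetic method that computes the default argument of the
  // parameter at the given 1-based position of `methodName`.
  def defaultGetterName(methodName: String, parameterIndex: Int): String =
    methodName + "$default$" + parameterIndex

  // Symbol of that synthetic method inside the owner, assuming no overloads.
  def defaultGetterSymbol(ownerSymbol: String, methodName: String, parameterIndex: Int): String =
    ownerSymbol + defaultGetterName(methodName, parameterIndex) + "()."
}
```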
Function declarations and definitions [48] are represented with `METHOD` symbols.
abstract class C {
def m1: Int = ???
def m2(): Int = ???
object m3
def m3(x: Int): Int = ???
def m3(x: org.Int): Int = ???
val m4: Int
def m4(x: Int)(y: Int): Int = ???
}
Definition | Symbol | Kind | Signature |
m1 |
_empty_/C#m1(). |
METHOD |
MethodSignature(List(), List(), TypeRef(None, <Int>, List())) |
m2 |
_empty_/C#m2(). |
METHOD |
MethodSignature(List(), List(List()), TypeRef(None, <Int>, List())) |
m3 |
_empty_/C#m3. |
OBJECT |
ClassSignature(List(), List(), None, List()) |
m3 |
_empty_/C#m3(). |
METHOD |
MethodSignature(List(), List(List(<x>)), TypeRef(None, <Int>, List())) |
m3 |
_empty_/C#m3(+1). |
METHOD |
MethodSignature(List(), List(List(<x>)), TypeRef(None, <Int>, List())) |
m4 |
_empty_/C#m4. |
METHOD |
ValueSignature(TypeRef(None, <Int>, List())) |
m4 |
_empty_/C#m4(). |
METHOD |
MethodSignature(List(), List(List(<x>), List(<y>)), TypeRef(None, <Int>, List())) |
Notes:
- According to SLS, some language features involve synthetic methods that are not written in source code. Symbols for synthetic methods must be included in SemanticDB payloads alongside normal methods. Detailed information about synthetic methods is provided in various subsections of SymbolInformation together with related language features, and here we provide a comprehensive list of such methods:
  - Getters for vals and vars.
  - Setters for vars.
  - Methods that compute default arguments.
  - Methods synthesized for `case` classes and objects.
  - Implicit methods synthesized for `implicit` classes.
  - Methods synthesized for value classes.
- Supported properties for method symbols are:
  - `ABSTRACT`: set for function declarations.
  - `FINAL`: set for `final` methods.
  - `IMPLICIT`: set for `implicit` methods.
- Display name for method symbols is equal to the name of the binding introduced by the definition [70].
- If present, type parameters of methods are represented as described above in order of their appearance in source code.
- If present, parameters of methods are represented as described above in order of their appearance in source code.
- For procedures [49], the return type is assumed to be `Unit`. Corresponding signature is computed using the assumed return type as explained in Type.
- If the return type is not provided in source code, it is inferred from the right-hand side of the method according to the rules described in SLS [50]. Corresponding signature is computed using the inferred return type as explained in Type.
- Method symbols support all Scala access modifiers.
- The `OBJECT` symbol `m3.` and `VAL METHOD` symbol `m4.` do not contribute to the disambiguator tag for the method symbols `m3().`, `m3(+1).` and `m4().` because `OBJECT` and (exceptionally) `VAL METHOD` symbols do not require a disambiguator.
Macro definitions [51] are represented with `MACRO` symbols similarly to function definitions (see above).
object M {
def m: Int = macro impl
}
Definition | Symbol | Kind | Signature |
m |
_empty_/M.m(). |
MACRO |
MethodSignature(List(), List(), TypeRef(None, <Int>, List())) |
Notes:
- Supported properties for macro symbols are the same as for method symbols, except for `ABSTRACT` because macros cannot be `abstract`.
- Display name for macro symbols is equal to the name of the binding introduced by the definition [70].
- Return type inference for macros is not supported.
- At the moment, `SymbolInformation` for macros does not contain information about corresponding macro implementations. We may improve this in the future.
- Macro symbols support all Scala access modifiers.
Constructors [52, 53] are represented with `CONSTRUCTOR` symbols similarly to function definitions (see above).
class C(x: Int) {
def this() = this(42)
}
Definition | Symbol | Kind | Signature |
Primary constructor | _empty_/C#`<init>`(). |
CONSTRUCTOR |
MethodSignature(List(), List(List(<x>)), None) |
Secondary constructor | _empty_/C#`<init>`(+1). |
CONSTRUCTOR |
MethodSignature(List(), List(), None) |
Notes:
- Unlike some other metaprogramming systems for Scala, we do not create synthetic constructor symbols for traits and objects.
- Supported properties for constructor symbols are:
  - `PRIMARY`: set for primary constructors.
- Display name for constructor symbols is equal to `<init>`.
- Constructors don't have type parameters and return types, but we still represent their signatures with `MethodSignature`. In these signatures, type parameters are equal to `List()` and the return type is `None`.
- Primary constructor parameters with `val` and `var` modifiers give rise to multiple different symbols as described above.
- Constructor symbols support all Scala access modifiers.
Class definitions [54] are represented with `CLASS` symbols.
class C[T](x: T, val y: T, var z: T) extends B with X { self: Y =>
def m: Int = ???
}
Definition | Symbol | Kind | Signature |
C |
_empty_/C# |
CLASS |
ClassSignature(List(<T>), List(<B>, <X>), <Y>, List(<x>, <y>, <y>, <z>, <z>, <z_=>, <<init>>, <m>)) |
T |
_empty_/C#[T] |
TYPE_PARAMETER |
TypeSignature(List(), None, None) |
x |
_empty_/C#x(). |
METHOD |
ValueSignature(TypeRef(None, <T>, List())) |
y |
_empty_/C#y(). |
METHOD |
MethodSignature(List(), List(), TypeRef(None, <T>, List())) |
z |
_empty_/C#z(). |
METHOD |
MethodSignature(List(), List(), TypeRef(None, <T>, List())) |
z |
_empty_/C#z_=(). |
METHOD |
MethodSignature(List(), List(List(<x$1>)), TypeRef(None, <Unit>, List())) |
z |
_empty_/C#z_=().(x$1) |
PARAMETER |
ValueSignature(TypeRef(None, <T>, List())) |
Primary constructor | _empty_/C#`<init>`(). |
CONSTRUCTOR |
MethodSignature(List(), List(List(<x>, <y>, <z>)), None) |
x |
_empty_/C#`<init>`().(x) |
PARAMETER |
ValueSignature(TypeRef(None, <T>, List())) |
y |
_empty_/C#`<init>`().(y) |
PARAMETER |
ValueSignature(TypeRef(None, <T>, List())) |
z |
_empty_/C#`<init>`().(z) |
PARAMETER |
ValueSignature(TypeRef(None, <T>, List())) |
m |
_empty_/C#m(). |
METHOD |
MethodSignature(List(), List(), TypeRef(None, <Int>, List())) |
Notes:
- Supported properties for class symbols are:
  - `ABSTRACT`: set for `abstract` classes.
  - `FINAL`: set for `final` classes.
  - `SEALED`: set for `sealed` classes.
  - `IMPLICIT`: set for `implicit` classes.
  - `CASE`: set for `case` classes.
- Display name for class symbols is equal to the name of the binding introduced by the definition [70].
- We leave the mapping between parent syntax written in source code and `ClassSignature.parents` deliberately unspecified. Some producers are known to insert `<AnyRef>` into `parents` under certain circumstances, so we can't guarantee a one-to-one mapping of parent clauses in source code and entities in `parents`. We may improve on this in the future.
- `ClassSignature.declarations` must be ordered as follows:
  - For every parameter of the primary constructor, its field symbol, then its getter symbol, then its setter symbol.
  - Symbol of the primary constructor.
  - Symbols of declared members in order of their appearance in source code. (Inherited members must not be part of `declarations`.)
  - Synthetic symbols in unspecified order and positions in the declarations list. We may provide more structure here in the future.
- In some cases, SLS and its extensions mandate generation of synthetic members and/or companions for certain classes. Symbols for such synthetic definitions must be included in SemanticDB payloads alongside normal definitions. For details, see:
- Class symbols support all Scala access modifiers.
Traits [58] are represented by `TRAIT` symbols similarly to class definitions (see above). Concretely, the differences between trait symbols and class symbols are:
- Trait symbols only support `SEALED` property.
- Traits don't have constructors.
Object definitions [59] are represented by `OBJECT` symbols similarly to class definitions (see above). Concretely, the differences between object symbols and class symbols are:
- Object symbols are always `FINAL`.
- Apart from `FINAL`, object symbols only support `CASE` and `IMPLICIT` properties.
- Display name for object symbols is equal to the name of the binding introduced by the definition [70].
- Objects don't have type parameters, but we still represent their signatures with `ClassSignature`. In these signatures, type parameters are equal to `List()`.
- Objects don't have constructors.
Package objects [60] are represented by `PACKAGE_OBJECT` symbols similarly to object definitions (see above). Concretely, the differences between package object symbols and object symbols are:
- Package object symbols are always `FINAL`.
- Apart from `FINAL`, package object symbols don't support any properties.
- Display name for package object symbols is equal to the name of the binding introduced by the definition [70].
- Package objects don't have annotations.
- Package objects don't support any access modifiers.
Packages [61] are not included in the "Symbols" section.
Annotation
message Annotation {
Type tpe = 1;
}
In Scala, Annotation represents annotations [23].
Value | Explanation |
Annotation(<ann>) |
Definition annotation, e.g. @ann def m: T . |
Annotation(<ann>) |
Type annotation, e.g. T @ann . |
Not supported | Expression annotation, e.g. e: @ann . |
- At the moment, `Annotation` can't represent annotation arguments, which means that the annotation in `@ann(x, y, z) def m: T` is represented as `Annotation(<ann>)`. We may improve on this in the future.
- At the moment, SemanticDB cannot represent expressions, which means that it cannot represent expression annotations as well. We do not plan to add support for expressions in SemanticDB, so it is highly unlikely that expression annotations will be supported in the future.
Access
message Access {
oneof sealed_value {
PrivateAccess privateAccess = 1;
PrivateThisAccess privateThisAccess = 2;
PrivateWithinAccess privateWithinAccess = 3;
ProtectedAccess protectedAccess = 4;
ProtectedThisAccess protectedThisAccess = 5;
ProtectedWithinAccess protectedWithinAccess = 6;
PublicAccess publicAccess = 7;
}
}
message PrivateAccess {
}
message PrivateThisAccess {
}
message PrivateWithinAccess {
string symbol = 1;
}
message ProtectedAccess {
}
message ProtectedThisAccess {
}
message ProtectedWithinAccess {
string symbol = 1;
}
message PublicAccess {
}
In Scala, Access represents access modifiers of definitions.
Access | Code | Explanation |
None |
|
Definitions that can't have access modifiers, i.e. `LOCAL`, `PARAMETER`, `SELF_PARAMETER`, `TYPE_PARAMETER`, `PACKAGE` and `PACKAGE_OBJECT`. Definitions that can have access modifiers, but don't have them will have `PublicAccess` as described below. |
PrivateAccess() |
private def x = ??? |
Can be accessed only from within the directly enclosing template and its companion object or companion class [62]. |
PrivateThisAccess() |
private[this] def x = ??? |
Can be accessed only from within the object in which the definition is defined [62]. |
PrivateWithinAccess( |
private[X] def x = ??? |
Can be accessed respectively only from code inside the package X or only from code inside the class X and its companion object [62]. |
ProtectedAccess() |
protected def x = ??? |
Can be accessed from within: 1) the template of the defining class, 2) all templates that have the defining class as a base class, 3) the companion object of any of those classes [63]. |
ProtectedThisAccess() |
protected[this] def x = ??? |
Can be accessed as protected AND only from within the object in which the definition is defined [63]. |
ProtectedWithinAccess( |
protected[X] def x = ??? |
Can be accessed as protected OR from code inside the package X or from code inside the class X and its companion object [63]. |
PublicAccess() |
def x = ??? |
None of the above. |
Notes:
- Not all kinds of symbols support all access modifiers. See SymbolInformation for more information.
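For consumers, the `Access` message maps naturally onto a sealed hierarchy. The sketch below is a hand-written, hypothetical model (not the generated protobuf classes) that renders each case back to the modifier syntax shown in the table above. The `displayName` function, which turns the `symbol` field into the name written between brackets, is an assumption and must be provided by the caller.

```scala
object AccessSketch {
  // Hypothetical hand-written mirror of the Access protobuf messages.
  sealed trait Access
  case object PrivateAccess extends Access
  case object PrivateThisAccess extends Access
  final case class PrivateWithinAccess(symbol: String) extends Access
  case object ProtectedAccess extends Access
  case object ProtectedThisAccess extends Access
  final case class ProtectedWithinAccess(symbol: String) extends Access
  case object PublicAccess extends Access

  // Renders the source-level modifier; public access has no modifier in Scala.
  def modifier(access: Access, displayName: String => String): String = access match {
    case PrivateAccess                 => "private"
    case PrivateThisAccess             => "private[this]"
    case PrivateWithinAccess(symbol)   => "private[" + displayName(symbol) + "]"
    case ProtectedAccess               => "protected"
    case ProtectedThisAccess           => "protected[this]"
    case ProtectedWithinAccess(symbol) => "protected[" + displayName(symbol) + "]"
    case PublicAccess                  => ""
  }
}
```

Because the trait is sealed, the compiler can warn when a match over `Access` misses a case, which is one reason to prefer this shape over raw integer tags.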
SymbolOccurrence
message SymbolOccurrence {
Range range = 1;
string symbol = 2;
Role role = 3;
}
There is a Scala compiler plugin that generates SymbolOccurrences for Scala code. The implementation is used at Twitter scale, and it works well, both in terms of handling sizeable codebases and understanding esoteric language constructs and idioms. However, we do not yet have a specification that comprehensively describes how Scala language features map onto symbol occurrences. We intend to improve on this in the future.
Synthetic
The following is an exhaustive list of the kinds of synthetics that are generated by the Scala compiler plugin, along with examples.
Synthetics generated from for loops are quite massive, so we use a shorthand notation for them. In that notation, `OriginalTree` trees are represented by `orig(<code>)` and other trees are represented by their syntax.
Category | Source | Synthetic |
Inferred method calls |
|
|
|
|
|
Inferred type applications |
|
|
|
|
|
Implicit parameters |
|
|
Implicit views/conversions |
|
|
Macro expansions |
|
|
For loop desugarings |
|
|
|
|
|
Java
In this section, we exhaustively map Java language features onto SemanticDB. As a reference, we use the Java Language Specification [85] (referred to as "JLS" in the text below) and Java Virtual Machine Specification [91] (referred to as "JVMS" in the text below).
Symbol
In this section, we describe the Java symbol format.
Symbols | Format |
Global symbols ↑ |
|
Local symbols ↑ |
Concatenation of `local` and an implementation-dependent suffix that doesn't contain slashes (`/`) and semicolons (`;`). |
Owner is:
- For root package, `None`.
- For unnamed package, root package.
- For top-level named package, root package.
- For other named package, parent package.
- For other top-level definition, its package.
- For other global definition, the innermost enclosing definition, i.e. the definition whose Location in source code most tightly encloses the Location of the original definition.
- For other declarations, `None`.
Descriptor is:
- For `LOCAL`, unsupported.
- For `PACKAGE`, concatenation of its symbol name and a forward slash (`/`).
- For `FIELD`, concatenation of its symbol name and a dot (`.`).
- For `METHOD` or `CONSTRUCTOR`, concatenation of its symbol name, a disambiguator and a dot (`.`).
- For `CLASS` or `INTERFACE`, concatenation of its symbol name and a pound sign (`#`).
- For `PARAMETER`, concatenation of a left parenthesis (`(`), its symbol name and a right parenthesis (`)`).
- For `TYPE_PARAMETER`, concatenation of a left bracket (`[`), its symbol name and a right bracket (`]`).
- See SymbolInformation for details on which Java definitions are modelled by which symbols.
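The descriptor rules above translate almost directly into a pattern match. The following is a minimal sketch, assuming kinds are passed as the strings used in this section and that `name` is an already encoded symbol name. The function name and the default `()` disambiguator for non-overloaded methods are choices made for this example only.

```scala
object JavaDescriptorSketch {
  // Builds the descriptor for a definition of the given kind.
  // `disambiguator` is only used for METHOD and CONSTRUCTOR.
  def descriptor(kind: String, name: String, disambiguator: String = "()"): String =
    kind match {
      case "PACKAGE"                => name + "/"
      case "FIELD"                  => name + "."
      case "METHOD" | "CONSTRUCTOR" => name + disambiguator + "."
      case "CLASS" | "INTERFACE"    => name + "#"
      case "PARAMETER"              => "(" + name + ")"
      case "TYPE_PARAMETER"         => "[" + name + "]"
      // LOCAL has no descriptor; unknown kinds are rejected as well.
      case other => throw new IllegalArgumentException("no descriptor for kind " + other)
    }
}
```

Concatenating the descriptors of a definition's owner chain, outermost first, then yields a global symbol such as `java/util/Arrays#asList().(a)`.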
Disambiguator is:
- Concatenation of a left parenthesis (`(`), a tag and a right parenthesis (`)`). If the definition is not overloaded, the tag is empty. If the definition is overloaded, the tag is computed depending on where the definition appears in the following order:
  - non-static overloads first, following the same order as they appear in the original source,
  - static overloads secondly, following the same order as they appear in the original source
- `FIELD` definitions are not included in the list of overloads.
- Given this order, the tag becomes
  - Empty string for the definition that appears first.
  - `+1` for the definition that appears second.
  - `+2` for the definition that appears third.
  - ...
- See "Class declarations" below for an example.
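The ordering rule for Java disambiguators can be sketched as below. `Overload` is a hypothetical, minimal description of a method or constructor that shares its name with the others in the list. The input is assumed to be in source order and to contain no `FIELD` definitions.

```scala
object JavaDisambiguatorSketch {
  // Hypothetical minimal view of one overload; `id` is any unique label.
  final case class Overload(id: String, isStatic: Boolean)

  // Maps each overload id to its disambiguator.
  // `overloads` must be in source order and share the same name.
  def disambiguators(overloads: List[Overload]): Map[String, String] = {
    // partition keeps the relative source order within each group
    val (static, nonStatic) = overloads.partition(_.isStatic)
    val ordered = nonStatic ++ static
    ordered.zipWithIndex.map { case (overload, index) =>
      val tag = if (index == 0) "" else "+" + index
      overload.id -> ("(" + tag + ")")
    }.toMap
  }
}
```

Applied to three overloads written in the order non-static, static, non-static, the first non-static one gets `()`, the second non-static one gets `(+1)` and the static one gets `(+2)`.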
Symbol name is:
- For root package, `_root_`.
- For unnamed package, `_empty_`.
- For constructor, `<init>`.
- For anonymous definition, implementation-dependent name.
- For other definition, the name of the binding introduced by the definition. If the name is not a Java identifier [22], it is wrapped in backticks.
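The backtick rule for symbol names can be approximated with the JDK's own character classification. This is a sketch under two assumptions: `java.lang.Character.isJavaIdentifierStart` and `isJavaIdentifierPart` are taken as the definition of a Java identifier, and reserved words are not checked. The object and method names are hypothetical.

```scala
object JavaSymbolNameSketch {
  // True if `name` is non-empty and made only of Java identifier characters.
  // Reserved words are deliberately not checked in this sketch.
  def isJavaIdentifier(name: String): Boolean =
    name.nonEmpty &&
      Character.isJavaIdentifierStart(name.head) &&
      name.tail.forall(c => Character.isJavaIdentifierPart(c))

  // Wraps the name in backticks unless it is a Java identifier.
  def encode(name: String): String =
    if (isJavaIdentifier(name)) name else "`" + name + "`"
}
```

With this helper, ordinary names such as `asList` pass through unchanged, while a name containing characters like `<` or `>` comes back wrapped in backticks.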
For example, this is how some of the definitions from the Java standard library must be modelled:
- The `java` package: `java/`
- The `Integer` class: `java/lang/Integer#`
- The `int` primitive: `scala/Int#`
- The `Arrays.asList` method: `java/util/Arrays#asList().`
- The `a` parameter of that method: `java/util/Arrays#asList().(a)`
- The `T` type parameter of that method: `java/util/Arrays#asList().[T]`
Type
message Type {
reserved 1, 3, 4, 5, 6, 11, 12, 15, 16;
oneof sealed_value {
TypeRef typeRef = 2;
SingleType singleType = 20;
ThisType thisType = 21;
SuperType superType = 22;
ConstantType constantType = 23;
IntersectionType intersectionType = 17;
UnionType unionType = 18;
WithType withType = 19;
StructuralType structuralType = 7;
AnnotatedType annotatedType = 8;
ExistentialType existentialType = 9;
UniversalType universalType = 10;
ByNameType byNameType = 13;
RepeatedType repeatedType = 14;
}
In Java, Type represents types [74].
In the examples below:
- `byte`, `short` and friends are standard primitive types [75].
- `A` is a top-level class.
- `B` is an inner class defined in `A`.
- `C` is a top-level class that has multiple type parameters.
- `T`, `T1`, ... `Tn` are type variables.
Category | Examples |
Primitive types [75] |
|
Reference types [96] |
|
Type variable [78] |
|
Parameterized types [79] |
|
Raw types [95] |
|
Array types [80] |
|
Intersection types [81] |
|
Notes:
- Primitive and array types are converted to their equivalent Scala type representations. We may improve on this in the future.
- Since Java doesn't support path-dependent types, prefixes in type refs are always empty.
Signature
In Java, Signature represents definition signatures. See below to learn which Java definitions have which signatures.
SymbolInformation
message SymbolInformation {
  reserved 2, 6, 7, 8, 9, 10, 11, 12, 14, 15;
  string symbol = 1;
  Language language = 16;
  Kind kind = 3;
  int32 properties = 4;
  string display_name = 5;
  Signature signature = 17;
  repeated Annotation annotations = 13;
  Access access = 18;
  Documentation documentation = 20;
}
Field | Explanation |
symbol | See Symbol. |
language | JAVA. |
kind | Explained below on per-definition basis. |
properties | Explained below on per-definition basis. |
display_name | Explained below on per-definition basis. |
signature | Explained below on per-definition basis. |
annotations | Explained below on per-definition basis. |
access | Explained below on per-definition basis. |
overridden_symbols | List of symbols this symbol overrides. See Overriding. |
documentation | Non-empty string and format kind for classes/fields/methods/constructors where a Javadoc comment is available. |
Class declarations [76] are represented by a single symbol with the CLASS kind.
package a;
class C extends S1 implements I {
  T1 m1;
  static T2 m2();
  zero.Overload m3;
  T3 m3(one.Overload e1);
  static T4 m3(two.Overload e2);
  T5 m3(three.Overload e3);
  static class D1<T6 extends S2 & S3, T7> { }
  class D2 { }
}
Definition | Symbol | Kind | Signature |
C | a/C# | CLASS | ClassSignature(List(), List(<S1>, <I>), None, List(<m1>, <m2>, <m3(Overload)>, <m3(Overload+2)>, <m3(Overload+1)>, <D1>, <D2>)) |
m1 | a/C#m1. | FIELD | ValueSignature(TypeRef(None, <T1>, List())) |
m2 | a/C#m2(). | METHOD | MethodSignature(List(), List(), TypeRef(None, <T2>, List())) |
m3 | a/C#m3. | FIELD | ValueSignature(TypeRef(None, <zero.Overload>, List())) |
m3 | a/C#m3(). | METHOD | MethodSignature(List(), List(<e1>), TypeRef(None, <T3>, List())) |
e1 | a/C#m3().(e1) | PARAMETER | ValueSignature(TypeRef(None, <one.Overload>, List())) |
m3 | a/C#m3(+2). | METHOD | MethodSignature(List(), List(<e2>), TypeRef(None, <T4>, List())) |
e2 | a/C#m3(+2).(e2) | PARAMETER | ValueSignature(TypeRef(None, <two.Overload>, List())) |
m3 | a/C#m3(+1). | METHOD | MethodSignature(List(), List(<e3>), TypeRef(None, <T5>, List())) |
e3 | a/C#m3(+1).(e3) | PARAMETER | ValueSignature(TypeRef(None, <three.Overload>, List())) |
T6 | a/C#D1#[T6] | TYPE_PARAMETER | TypeSignature(List(), None, Some(IntersectionType(List(<S2>, <S3>)))) |
T7 | a/C#D1#[T7] | TYPE_PARAMETER | TypeSignature(List(), None, None) |
D1 | a/C#D1# | CLASS | ClassSignature(List(<T6>, <T7>), List(<java/lang/Object#>), None, List()) |
D2 | a/C#D2# | CLASS | ClassSignature(List(), List(), None, List()) |
Notes:
- A Java class maps to a single symbol with type ClassSignature including all static and non-static members. This departs from the Scala compiler internal representation of Java classes where non-static members are grouped under a CLASS symbol and static members are grouped under an OBJECT symbol.
- The method m3(+2) has a higher disambiguator tag number than m3(+1) even if m3(+2) appears earlier in the source code. This is because m3(+2) is static and the disambiguator tag is computed from non-static members first and static members second. The reason for this required order is to ensure that it's possible to compute correct method symbols inside the Scala compiler, which groups static members under an OBJECT symbol and non-static members under a CLASS symbol.
- Supported properties for CLASS symbols are:
  - FINAL: set for all final classes.
  - ABSTRACT: set for all abstract classes.
  - STATIC: set for static inner classes.
  - ENUM: set for enum types.
- Display name for class symbols is equal to the name of the binding introduced by the definition.
- Class declarations support all Java access modifiers.
- Class members without explicit access modifiers have access PrivateWithinAccess within the enclosing package.
- The disambiguators for m3(), m3(+1) and m3(+2) do not take into account the overloaded field a/C#m3.
Enum declarations [84] are represented by a single symbol with the CLASS kind.
package a;
public enum Coin {
  PENNY, NICKEL
}
Definition | Symbol | Kind | Signature |
Coin | a/Coin# | CLASS | ClassSignature(List(), List(<Enum<Coin>>), None, List(<PENNY>, <NICKEL>)) |
PENNY | a/Coin#PENNY. | FIELD | ValueSignature(TypeRef(None, <Coin>, List())) |
NICKEL | a/Coin#NICKEL. | FIELD | ValueSignature(TypeRef(None, <Coin>, List())) |
 | a/Coin#values(). | METHOD | MethodSignature(List(), List(), TypeRef(None, <Array>, List(<Coin>))) |
 | a/Coin#valueOf(). | METHOD | MethodSignature(List(), List(), TypeRef(None, <Coin>, List())) |
Notes:
- Enum declarations follow the same rules as class declarations.
- Supported properties for enum declarations are:
  - FINAL: implicitly set for all enum declarations.
  - STATIC: implicitly set for all enum declarations.
  - ENUM: implicitly set for all enum declarations.
- Display name for enum symbols is equal to the name of the binding introduced by the definition.
- JLS mandates the following synthetic members for enum declarations [86]:
  - Enum fields have kind FIELD, properties FINAL, STATIC and ENUM, are named after the corresponding enum constants, have the type of the enum declaration and PublicAccess access.
  - valueOf has kind METHOD, property STATIC, has a method type that goes from a <String> parameter to the enum declaration and PublicAccess access.
  - values has kind METHOD, property STATIC, has a method type that goes from an empty parameter list to an array of the enum declaration and PublicAccess access.
- Enum declarations support all Java access modifiers.
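Since properties is an int32 field, the property sets listed for classes and enums can be pictured as a bitmask. The sketch below is illustrative only: ClassProperties and its methods are hypothetical names, and the bit values are assumed to mirror the Property enum of the SemanticDB schema, which should be consulted before relying on them.

```java
final class ClassProperties {
    // Assumed bit values; verify against the Property enum in the schema.
    static final int ABSTRACT = 0x4;
    static final int FINAL = 0x8;
    static final int STATIC = 0x1000;
    static final int ENUM = 0x4000;

    /** Properties of an ordinary class declaration, derived from its modifiers. */
    static int forClass(boolean isFinal, boolean isAbstract, boolean isStaticInner) {
        int properties = 0;
        if (isFinal) properties |= FINAL;
        if (isAbstract) properties |= ABSTRACT;
        if (isStaticInner) properties |= STATIC;
        return properties;
    }

    /** Enum declarations implicitly carry FINAL, STATIC and ENUM. */
    static int forEnum() {
        return FINAL | STATIC | ENUM;
    }

    /** Tests whether a single property bit is set. */
    static boolean has(int properties, int flag) {
        return (properties & flag) != 0;
    }

    public static void main(String[] args) {
        int coin = forEnum();
        System.out.println("ENUM set: " + has(coin, ENUM));
        System.out.println("ABSTRACT set: " + has(coin, ABSTRACT));
    }
}
```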
Interface declarations [77] are represented by a single symbol like classes but with the INTERFACE kind.
package a;
public interface List<T> extends I {
  T head();
}
Definition | Symbol | Kind | Signature |
List | a/List# | INTERFACE | ClassSignature(List(<T>), List(<I>), None, List(<head>)) |
head | a/List#head(). | METHOD | MethodSignature(List(), List(), TypeRef(None, <T>, List())) |
The differences between interface symbols and class symbols are:
- Interfaces do not have constructors.
- Supported properties for interface symbols are:
  - ABSTRACT: implicitly set for all interface symbols.
- Display name for interface symbols is equal to the name of the binding introduced by the definition.
- Interface declarations support all Java access modifiers.
- Interface members without explicit access modifiers have access PublicAccess by default instead of PrivateWithinAccess.
Method declarations [82] are represented by a single symbol with the METHOD kind and one symbol for each type parameter with kind TYPE_PARAMETER and formal parameter with kind PARAMETER.
package a;
class A {
  A m1();
  A m2(T1 t1) throws E;
  <T2> T2 m3(T2 t2);
}
Definition | Symbol | Kind | Signature |
A | a/A# | CLASS | ClassSignature(List(), List(), None, List(<m1>, <m2>, <m3>)) |
m1 | a/A#m1(). | METHOD | MethodSignature(List(), List(), TypeRef(None, <A>, List())) |
m2 | a/A#m2(). | METHOD | MethodSignature(List(), List(<t1>), TypeRef(None, <A>, List())) |
t1 | a/A#m2().(t1) | PARAMETER | ValueSignature(TypeRef(None, <T1>, List())) |
m3 | a/A#m3(). | METHOD | MethodSignature(List(<T2>), List(<t2>), TypeRef(None, <T2>, List())) |
T2 | a/A#m3().[T2] | TYPE_PARAMETER | TypeSignature(List(), None, None) |
t2 | a/A#m3().(t2) | PARAMETER | ValueSignature(TypeRef(None, <T2>, List())) |
Notes:
- For type bounds of type parameters, we leave the mapping between type syntax written in source code and Type entities deliberately unspecified. For example, a producer may represent the signature of T2 as TypeSignature(List(), None, <Object>) instead of TypeSignature(List(), None, None).
- When compiled with the compiler option -parameters, both display and symbol names of method parameters match their names written in source. Otherwise, parameters have both display and symbol names paramN where N is the index of that given parameter starting at index 0.
- Variable arity parameters have the type equal to RepeatedType(<tpe>), where <tpe> is their type as declared in original source.
- Method throws clauses are not modelled in SemanticDB. We may improve on this in the future.
- Supported properties for method symbols are:
  - FINAL: set for final methods.
  - STATIC: set for static methods.
  - ABSTRACT: set for abstract methods.
  - DEFAULT: set for default methods.
- Display name for method symbols is equal to the name of the binding introduced by the definition.
- Method declarations support all Java access modifiers, however method declarations in interfaces can only be PublicAccess.
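The parameter naming rule in the notes above can be sketched as follows. This is a hedged illustration rather than the behaviour of any particular producer: ParameterNames and its methods are hypothetical, and the flag compiledWithParameters stands in for whatever mechanism a producer uses to detect that parameter names were recorded.

```java
import java.util.ArrayList;
import java.util.List;

final class ParameterNames {
    /**
     * Returns the names used for both display and symbol names of parameters.
     * sourceNames holds the names as written in source, in declaration order.
     */
    static List<String> names(List<String> sourceNames, boolean compiledWithParameters) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < sourceNames.size(); i++) {
            // Without recorded names, fall back to paramN with N starting at 0.
            result.add(compiledWithParameters ? sourceNames.get(i) : "param" + i);
        }
        return result;
    }

    /** Appends a parameter descriptor to the symbol of the owning method. */
    static String parameterSymbol(String methodSymbol, String name) {
        return methodSymbol + "(" + name + ")";
    }

    public static void main(String[] args) {
        List<String> source = new ArrayList<>();
        source.add("t1");
        for (String name : names(source, false)) {
            System.out.println(parameterSymbol("a/A#m2().", name));
        }
    }
}
```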
Field declarations [83] are represented by a single symbol with the FIELD kind.
package a;
class A {
  A field;
}
Definition | Symbol | Kind | Signature |
A | a/A# | CLASS | ClassSignature(List(), List(), None, List(<field>)) |
field | a/A#field. | FIELD | ValueSignature(TypeRef(None, <A>, List())) |
Notes:
- Supported properties for field symbols are:
  - FINAL: set for final fields and interface fields.
  - STATIC: set for static fields and interface fields.
- Display name for field symbols is equal to the name of the binding introduced by the definition.
- Field declarations support all Java access modifiers. However, field declarations in interfaces can only be PublicAccess.
Constructor declarations [90] are represented by a single symbol with display and symbol name <init> and the CONSTRUCTOR kind. Constructor formal parameters are represented the same way as method declaration formal parameters.
package a;
class Outer {
  Outer() {}
  class Inner {
    Inner() {}
  }
}
Definition | Symbol | Kind | Signature |
Outer | a/Outer# | CLASS | ClassSignature(List(), List(), None, List(<a.Outer#<init>>, <Inner>)) |
Constructor of Outer | a/Outer#<init>(). | CONSTRUCTOR | MethodSignature(List(), List(), None) |
Inner | a/Outer#Inner# | CLASS | ClassSignature(List(), List(), None, List(<a.Outer#Inner#<init>>)) |
Constructor of Inner | a/Outer#Inner#<init>(). | CONSTRUCTOR | MethodSignature(List(), List(), None) |
Notes:
- Constructors don't have type parameters and return types, but we still represent their signatures with MethodSignature. In these signatures, type parameters are equal to List() and the return type is None.
- Constructor declarations support no properties.
- Display name for constructor symbols is equal to <init>.
- Constructor declarations support all Java access modifiers.
Packages [94] are not included in the "Symbols" section.
Root package
The root package is a synthetic package that does not exist in the JLS but has an equivalent in the SLS [20]. The root package is the owner of all unnamed and all top-level named packages. The motivation to define a root package for the Java language is to keep consistency with how package owners are modelled in Scala symbols.
Annotation
message Annotation {
  Type tpe = 1;
}
In Java, Annotation represents access_flags in the JVMS class file format [92] but not the actual annotations [93]. We may improve on this in the future.
Value | Explanation |
Annotation(TypeRef(None, <scala/annotation/strictfp>, List())) | Declared strictfp; floating-point mode is FP-strict, e.g. strictfp class MyClass. Note that this is the default mode as of JDK 17, so this deprecated annotation is no longer emitted for JDK 17+ bytecode. |
Not supported | JLS annotations [93] |
Access
message Access {
  oneof sealed_value {
    PrivateAccess privateAccess = 1;
    PrivateThisAccess privateThisAccess = 2;
    PrivateWithinAccess privateWithinAccess = 3;
    ProtectedAccess protectedAccess = 4;
    ProtectedThisAccess protectedThisAccess = 5;
    ProtectedWithinAccess protectedWithinAccess = 6;
    PublicAccess publicAccess = 7;
  }
}
message PrivateAccess {
}
message PrivateThisAccess {
}
message PrivateWithinAccess {
  string symbol = 1;
}
message ProtectedAccess {
}
message ProtectedThisAccess {
}
message ProtectedWithinAccess {
  string symbol = 1;
}
message PublicAccess {
}
In Java, Access represents access control [87] of names.
Access | Code | Explanation |
None | | Definitions that can't have access modifiers, i.e. `LOCAL`, `PARAMETER`, `TYPE_PARAMETER` and `PACKAGE`. Definitions that can have access modifiers, but don't have them will have `PrivateWithinAccess` or `PublicAccess` as described below. |
PrivateAccess | private F f; | Can be accessed only from within the directly enclosing class [88]. |
PrivateWithinAccess( | package x; class A {} | A class, interface, class member or constructor declared without an access modifier is implicitly private within the package in which it is declared [88]. |
ProtectedAccess() | protected F f; | A protected member or constructor of an object can be accessed from within: 1) the enclosing class, 2) all classes that are responsible for the implementation of that object [89]. |
PublicAccess() | public F f; | Can be accessed from any code provided that the compilation unit in which it is declared is observable. Packages are always implicitly public. Members of interfaces lacking interface modifiers are implicitly public. Other members are public only if explicitly declared `public` [88]. |
Notes:
- PrivateThisAccess, ProtectedThisAccess, ProtectedWithinAccess are not supported in the Java language.
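The mapping from Java access modifiers to Access values can be summarized in a short sketch. It is an illustration under assumptions: JavaAccess, the Modifier enum and the textual rendering of the Access values are hypothetical, and the symbol of the enclosing package is assumed to be supplied by the caller.

```java
final class JavaAccess {
    /** Access modifier as written in source; NONE means no modifier was given. */
    enum Modifier { PRIVATE, PROTECTED, PUBLIC, NONE }

    /**
     * Renders the Access value for a class, interface, member or constructor.
     * ownerIsInterface is true for members declared inside an interface.
     */
    static String access(Modifier modifier, boolean ownerIsInterface, String packageSymbol) {
        switch (modifier) {
            case PRIVATE:
                return "PrivateAccess()";
            case PROTECTED:
                return "ProtectedAccess()";
            case PUBLIC:
                return "PublicAccess()";
            default:
                // No modifier: interface members are implicitly public,
                // everything else is private within the enclosing package.
                return ownerIsInterface
                        ? "PublicAccess()"
                        : "PrivateWithinAccess(" + packageSymbol + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(access(Modifier.NONE, false, "x/"));
        System.out.println(access(Modifier.NONE, true, "x/"));
        System.out.println(access(Modifier.PROTECTED, false, "x/"));
    }
}
```

Definitions that cannot carry access modifiers, such as parameters and type parameters, are outside the scope of this sketch because their access is None.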
SymbolOccurrence
At this moment, there is no tool that supports SymbolOccurrences for the Java language. We intend to improve on this in the future.
Synthetic
At this moment, there is no tool that supports Synthetic for the Java language. We may improve on this in the future.