The Parsing Stage

In this chapter, we will look at how the scenario is parsed. Parsing a modern programming language is an incredibly complex task, and one that is far beyond the scope of this article. If you need to parse multiple languages and formats, the complexity can easily outweigh the joy of the project.

To avoid unnecessary complexity and to facilitate parsing, we will use a tool called ANTLR4[1]. What is especially good about this tool is that ready-made grammars are available for most modern languages[2]. Those grammars are contributed and maintained by people who are highly proficient with the languages in question.

We will employ two grammar files for the task at hand. One is for the Kotlin language[3], and the other is our custom grammar, which allows us to separate the code from the comments. If we want to support languages other than Kotlin in the future, we can always do so by expanding the pool of available grammars.

I will not discuss how to create grammars or how to use ANTLR4, because that is outside the scope of this article. For simplicity, we can think of it as a tool that converts the provided language constructs into classes available for consumption in the language of our choice, which in our case is Kotlin.

Let us start with the first step: parsing the scenario file passed in the input arguments.

com.blaster.business.InteractorParse::parseScenario

This call converts a scenario file into a list of nodes. The parameters are self-explanatory.

fun parseScenario(root: File, sourceUrl: URL, scenario: File): List<Node> {

The first operation of this method is to convert the text in the scenario file into distinct nodes. Paragraphs are separated by newlines

    val nodes = scenario.readText().lines().map { NodeText(it) }

The next task is to apply the common procedures to the nodes: identification and application of commands, and identification of the structures

    return renderNodes(root, sourceUrl, nodes)

com.blaster.business.InteractorParse::renderNodes

private fun renderNodes(root: File, sourceUrl: URL, nodes: List<Node>): List<Node> {
    val withCommands = interactorCommands.identifyCommands(root, sourceUrl, nodes)
    val commandsApplied = interactorCommands.applyCommands(root, sourceUrl, withCommands)
    return interactorStructs.identifyStructs(commandsApplied)
}
}
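The first step of parseScenario can be tried in isolation. The following is a minimal sketch, assuming a stripped-down stand-in for NodeText that holds only the text (the class in the project may carry more fields); the function name textToNodes is hypothetical.

```kotlin
// Hypothetical stand-in for the article's NodeText; the real class may hold more data.
data class NodeText(val text: String)

// Turns raw scenario text into one node per line.
// Note that blank lines are kept and become nodes with empty text.
fun textToNodes(scenarioText: String): List<NodeText> =
    scenarioText.lines().map { NodeText(it) }
```

Since lines() also splits on blank lines, a scenario with an empty line between two paragraphs yields three nodes, the middle one with empty text.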

The scenario will produce a list of nodes, including the commands that were identified. Let’s see how the identification of the commands works.

com.blaster.business.InteractorCommands::identifyCommand

The main command identification routine. It returns a command if one is identified, or null if nothing is found

private fun identifyCommand(root: File, sourceUrl: URL, command: String): NodeCommand? {

We prefix all of the commands in the text with a # symbol. If it is not found – it is not a command

    if (!command.startsWith(COMMAND_IDENTIFIER)) {
        return null
    }

Removing the prefix and converting a command into a stack of words

    val noPrefix = command.removePrefix(COMMAND_IDENTIFIER)
    val stack = noPrefix.splitCsv()
    val cmd = stack[0]
    val subcmd = stack.subList(1, stack.size)

Then we identify each command family by command name. We remove the head of the stack each time we go to the next level. Each command family is parsed in a similar fashion until nothing is left on the stack

    return when {
        cmd == COMMAND_INCLUDE -> identifyIncludeCommand(root, sourceUrl, subcmd)
        cmd == COMMAND_HEADER -> identifyHeaderCommand(subcmd)
        cmd == COMMAND_PICTURE -> identifyPictureCommand(subcmd)
        cmd == COMMAND_INLINE -> identifyInlineCommand(root, sourceUrl, subcmd)
        cmd == COMMAND_OMIT -> identifyOmitCommand()
        cmd == COMMAND_CITE -> identifyCiteCommand(subcmd)
        cmd == COMMAND_CONTENT -> identifyContentCommand(subcmd)
        else -> TODO()
    }
}
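To make the prefix check and the head-and-rest split concrete, here is a simplified, self-contained sketch. The Command class, the function name, and the comma separator are assumptions made for illustration; the real splitCsv() and NodeCommand may behave differently.

```kotlin
// Hypothetical, simplified command model; the article's NodeCommand carries more data.
data class Command(val name: String, val args: List<String>)

// The article states that commands start with a '#' symbol.
val commandPrefix = "#"

// Returns a Command for a line such as "#include, def, com.example.Foo::bar",
// or null when the line is ordinary text.
// Assumption: words are separated by commas; the real splitCsv() may differ.
fun identifyCommandSketch(line: String): Command? {
    if (!line.startsWith(commandPrefix)) return null
    val words = line.removePrefix(commandPrefix)
        .split(",")
        .map { it.trim() }
        .filter { it.isNotEmpty() }
    if (words.isEmpty()) return null
    // The head of the stack is the command name, the rest are its arguments.
    return Command(words.first(), words.drop(1))
}
```

Each nested level can repeat the same move: take the first word, dispatch on it, and pass the remaining words down.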

After the commands are identified, we can finally apply them.

com.blaster.business.InteractorCommands::applyCommands

The command application routine. It receives a source URL, a root, and a list of nodes as parameters. The result is a list of nodes modified by all of the commands in the original list.

fun applyCommands(root: File, sourceUrl: URL, nodes: List<Node>): List<Node> {

Since we will modify the list, we first copy it into a mutable one so that its structure can be changed

    val mutableList = ArrayList(nodes)
    val iterator = mutableList.listIterator()

We iterate over the list until we reach the end

    while (iterator.hasNext()) {
        val node = iterator.next()

If the next item in the list is a command

        if (node is NodeCommand) {
            when (node.cmdType) {

We apply the command accordingly

                CmdType.INCLUDE -> applyIncludeCommand(root, sourceUrl, iterator, node)
                CmdType.OMIT -> applyOmitCommand(iterator)
                CmdType.INLINE -> applyInlineCommand(root, sourceUrl, iterator, node)

Some commands have meaning only for printing, so we do nothing right now

                else -> {}
            }
        }
    }

Returning the modified result

    return mutableList
}
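The reason for using a list iterator is that it allows nodes to be removed and inserted while the list is being traversed. The sketch below illustrates the idea with hypothetical node types. It assumes that an include command is replaced by the nodes it expands to, and that an omit command drops itself and the node that follows it; the actual semantics in the project may differ.

```kotlin
// Hypothetical node types for illustration only.
sealed class SketchNode
data class TextNode(val text: String) : SketchNode()
data class IncludeNode(val lines: List<String>) : SketchNode()
object OmitNode : SketchNode()

fun applyCommandsSketch(nodes: List<SketchNode>): List<SketchNode> {
    val mutable = ArrayList(nodes)
    val iterator = mutable.listIterator()
    while (iterator.hasNext()) {
        when (val node = iterator.next()) {
            is IncludeNode -> {
                // Replace the command with the nodes it expands to.
                iterator.remove()
                node.lines.forEach { iterator.add(TextNode(it)) }
            }
            is OmitNode -> {
                // Assumed semantics: drop the command and the node right after it.
                iterator.remove()
                if (iterator.hasNext()) {
                    iterator.next()
                    iterator.remove()
                }
            }
            is TextNode -> Unit // Plain text is left untouched.
        }
    }
    return mutable
}
```

Because add() places new elements before the cursor, the iteration continues with the node that originally followed the command, so freshly inserted nodes are not visited again in the same pass.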

As we already know, some commands request that additional code be included to showcase the mechanics. Such a command has a command name, command arguments, and the path of the code to be included. Here is the code responsible for locating the snippets:

com.blaster.business.InteractorLocation::locate

This routine helps us locate the pieces of code pointed to by the path parameter. It returns a class that represents the location of the found snippet.

fun locate(root: File, sourceUrl: URL, path: String): Location {

First of all, we want to assert that the path is formatted properly. This allows us to catch errors early

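    // Safe calls make a missing match fail the check with the message below instead of a NullPointerException
    check(regexPath.find(path)?.value?.length == path.length) { "Wrong path for the location: $path" }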

We start by extracting module from path if we have one

    val (module, modulePath) = extractModule(path)

Then the class follows: we simply grab everything before :

    val clazz = extractClass(modulePath)

Now we want to extract the filepath, which helps us with both the file and the URL

    val filepath = extractFilepath(clazz)

Next we want to retrieve the actual file, containing the class. We do that by looking at the sources root and a package

    val file = locateFile(module, root, filepath)

We also want to assemble the URL of the location, based on the source URL on GitHub

    val url = constructUrl(sourceUrl, module, filepath)

If the path contains an exact member, extract it. If not, the identifier is the class itself

    val identifier = extractIdentifier(path)
    return Location(url, file, identifier)
}
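The string handling behind these steps can be sketched without touching the file system. The example below assumes paths shaped like the ones shown in this article (a dotted class name, optionally followed by :: and a member name) and leaves modules out entirely. PathParts, pathRegex, and splitPath are hypothetical names, and falling back to the simple class name when no member is given is an assumption as well.

```kotlin
// Hypothetical result type; the article's Location also holds a URL and a File.
data class PathParts(val clazz: String, val filepath: String, val identifier: String)

// Assumed path shape: a dotted class name, optionally followed by "::member".
val pathRegex = Regex("""[A-Za-z_][\w.]*(::\w+)?""")

fun splitPath(path: String): PathParts {
    // Fail early on malformed paths.
    require(pathRegex.matches(path)) { "Wrong path for the location: $path" }
    val clazz = path.substringBefore("::")
    // A class name maps onto a relative source path: dots become directory separators.
    val filepath = clazz.replace('.', '/') + ".kt"
    // Assumption: without an explicit member, the identifier is the simple class name.
    val identifier = if ("::" in path) path.substringAfter("::") else clazz.substringAfterLast('.')
    return PathParts(clazz, filepath, identifier)
}
```

The relative filepath can then be resolved against the sources root to find the file, and appended to the source URL to build the link.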

The author of the scenario can choose how to include the code. We can include either definitions or declarations of methods and classes. The difference is that sometimes you want to include the body of the method, and sometimes you just want to mention it. Let us see how parsing of the requested definitions works.

com.blaster.business.InteractorParse::parseDef

Routine for parsing of the definitions. Accepts the source URL, the root, and the location of the definition. Returns a list of nodes for this definition (commentaries, code, commands, etc.)

fun parseDef(root: File, sourceUrl: URL, location: Location): List<Node> {

When the definition is located, we extract the code with the help of ANTLR4

    val definition = kotlinManager.extractDefinition(location)

The next step is to split this text into commentaries and code snippets. We also format them, removing unused lines, spaces, etc.

    val withoutTabulation = definition.clearCode()
    val statements = statementsManager.extractStatements(withoutTabulation)
    return renderNodes(root, sourceUrl, statements)

}

Parsing of the requested declarations works very similarly to definitions.

com.blaster.business.InteractorParse::parseDecl

fun parseDecl(root: File, sourceUrl: URL, location: Location): List<Node> {
    val declaration = kotlinManager.extractDeclaration(location)
    val withoutTabulation = declaration.clearCode()
    val statements = statementsManager.extractStatements(withoutTabulation)
    return renderNodes(root, sourceUrl, statements)
}
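To give a feel for what splitting a snippet into commentaries and code means, here is a rough sketch. It does not use the custom ANTLR4 grammar mentioned earlier; it swaps in a plain line-based split that only recognizes // comments. All names in it are hypothetical.

```kotlin
// Hypothetical statement types; the article's nodes are richer.
sealed class Statement
data class Comment(val text: String) : Statement()
data class Code(val text: String) : Statement()

// Splits a snippet into alternating comment and code statements.
// Consecutive lines of the same kind are merged; blank lines are dropped.
fun splitStatements(snippet: String): List<Statement> {
    val result = mutableListOf<Statement>()
    // trimIndent() removes the common leading indentation of the snippet.
    for (line in snippet.trimIndent().lines()) {
        if (line.isBlank()) continue
        val trimmed = line.trim()
        val last = result.lastOrNull()
        if (trimmed.startsWith("//")) {
            val text = trimmed.removePrefix("//").trim()
            if (last is Comment) result[result.lastIndex] = Comment(last.text + " " + text)
            else result.add(Comment(text))
        } else {
            if (last is Code) result[result.lastIndex] = Code(last.text + "\n" + line)
            else result.add(Code(line))
        }
    }
    return result
}
```

A grammar-based splitter handles cases this sketch ignores, such as block comments or // appearing inside string literals.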

A few words should be said about identifying the structures within the text. After all of the text pieces are located, we want to check whether they contain any lists, tables, links, spans, etc.:

com.blaster.business.InteractorStructs::identifyStructs

This routine kickstarts the identification of formatting inside a NodeText. The identification is based on regular expression checks. If a certain regular expression matches, we extract the pieces of text with the appropriate formatting

fun identifyStructs(nodes: List<Node>): List<Node> = nodes.map {
    when (it) {
        is NodeText -> {
            val transformed = mutableListOf<Node>()

If the node is a NodeText, we try to identify list items in the text

            transformed.addAll(identifyListItems(it))
            it.copy(children = transformed)
        }
        else -> it
    }
}

We sequentially check whether the remaining text contains any of the structures mentioned above, in the following order:

com.blaster.business.InteractorStructs::identifyListItems

private fun identifyListItems(node: NodeText): List<Node> {
    val match = regexListItem.find(node.text)
    return if (match != null) {

We can have links inside of the list items

        listOf(StructListItem(identifyLinks(match.groups[1]!!.value)))
    } else {

Or inside of any text

        identifyLinks(node.text)
    }
}

com.blaster.business.InteractorStructs::identifyLinks

private fun identifyLinks(text: String): List<Node> {
    val result = mutableListOf<Node>()
    identifySpansInText(text, regexLink) { span: String, isInside: Boolean ->
        val node = if (isInside) {
            listOf(parseLink(span))
        } else {

If the text is not part of a link, it can still contain references to other sources

            identifyCites(span)
        }
        result.addAll(node)
    }
    return result
}

com.blaster.business.InteractorStructs::identifyCites

private fun identifyCites(text: String): List<Node> {
    val result = mutableListOf<Node>()
    identifySpansInText(text, regexCite) { span: String, isInside: Boolean ->
        val node = if (isInside) {
            StructCite(span)
        } else {

If we have just a piece of text, we want to identify spans, if any

            StructText(identifySpans(span))
        }
        result.add(node)
    }
    return result
}

com.blaster.business.InteractorStructs::identifySpans

private fun identifySpans(text: String): List<Node> {
    val result = mutableListOf<Node>()
    identifySpansInText(text, regexBold) { span: String, isInside: Boolean ->
        if (isInside) {

Currently, we support only bold or normal text

            result.add(SpanText(span, SpanText.Style.BOLD))
        } else {
            result.add(SpanText(span, SpanText.Style.NORMAL))
        }
    }
    return result
}
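The three routines above share the identifySpansInText helper, which is not shown in this chapter. One way such a helper could work is sketched below: it walks over the regex matches and reports each stretch of text together with a flag that says whether it came from inside a match. The function name, the bold pattern, and the convention that the first capture group holds the inner text are all assumptions.

```kotlin
// Hypothetical pattern for bold text written as **bold**.
val boldRegex = Regex("""\*\*(.+?)\*\*""")

// Walks over all matches of the regex and reports every stretch of text,
// flagging whether it came from inside a match or from the text around it.
// Assumption: the pattern has one capture group holding the inner text.
fun spansInText(text: String, regex: Regex, onSpan: (span: String, isInside: Boolean) -> Unit) {
    var cursor = 0
    for (match in regex.findAll(text)) {
        // Text between the previous match and this one lies outside.
        if (match.range.first > cursor) {
            onSpan(text.substring(cursor, match.range.first), false)
        }
        onSpan(match.groupValues[1], true)
        cursor = match.range.last + 1
    }
    // Whatever remains after the last match also lies outside.
    if (cursor < text.length) {
        onSpan(text.substring(cursor), false)
    }
}
```

With a helper of this shape, each level (links, cites, bold spans) only has to decide what to do with inside spans and can hand the outside spans to the next level.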

After all of the commands are executed and the code snippets are included from the repository, the commentaries in the code may contain additional commands. In this case, we recursively go back to the command stage and repeat the process.
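The recursion can be illustrated with a toy model. In the sketch below, an in-memory map stands in for the repository, and a line that starts with "#include " pulls in another snippet by name. The command syntax, the expand function, and the depth limit are assumptions made for the example and are not taken from the project.

```kotlin
// Hypothetical in-memory "repository": snippet name -> its lines.
// A line starting with "#include " pulls in another snippet, which may itself include more.
fun expand(line: String, snippets: Map<String, List<String>>, depth: Int = 0): List<String> {
    // Guard against runaway recursion, for example snippets that include each other.
    check(depth < 16) { "Includes are nested too deeply, possibly a cycle" }
    if (!line.startsWith("#include ")) return listOf(line)
    val name = line.removePrefix("#include ").trim()
    val body = snippets[name] ?: error("Unknown snippet: $name")
    // Every included line goes through the same command stage again.
    return body.flatMap { expand(it, snippets, depth + 1) }
}
```

Expanding a single include line therefore returns a flat list in which all nested includes have already been resolved.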

Finally, after all of that is done, we should have a result of the parsing stage – a list of nodes, which represents our article. This result will be passed to the next step, which is the printing of the article. We will have a look at the details of the printing in the next chapter.

References

