Why? What?

Many of us have a hobby, an art, or a craft of some sort. But how do you showcase it to a like-minded audience? There are many ways out there: you can write an article, create a video on YouTube, or maintain a page on Wikipedia. You can even create your own podcast series. Each of those approaches has its advantages and disadvantages. Some are a bit more entertaining, others are easier to produce. In my opinion, my craft – the art of programming – requires a unique approach.

Stanislaw Lem

I have seen many videos about programming on YouTube. Their main disadvantage is that they are quite hard to follow. Imagine you want to recreate or learn some specific detail from a video: you have to pause all the time, and you cannot copy code snippets from it. And sometimes, the details of algorithms or complex mechanics are not explained deeply enough due to the specifics of the medium.

Articles have disadvantages as well – especially older ones. A codebase tends to evolve over time: bugs are fixed, ideas mature. If you have ever read technical articles about ancient and established products, you have probably noticed what I am talking about – the code and the text diverge significantly.

One approach I liked was so-called literate programming, first introduced by Donald Knuth[1]. In this paradigm, the source code reads like an open book: the author explains the codebase in a proper order, with in-depth explanations available on the spot. I want to develop a tool that allows defining a codebase in this manner. Imagine an article that can be compiled as part of the codebase. This approach keeps the code and the text in sync – even as the project is modified over time.

One more influence is the example of the C programming language. When you port a C compiler to a different hardware platform, C becomes a self-compiling assembler: you start by porting basic functionality, then build the more advanced features of the language on top of those basics. This article is something along those lines. Over time, more features will be added to the tool, and the material will be updated. This article is both an example and the first use case for the tool.

I decided to call this tool Lem in memory of one of my favorite authors – Stanislaw Lem[2]. He was a Polish writer of science fiction, philosophy, and satire, and a trained physician. His books were a source of great inspiration during my childhood and early youth. If you have never read them – do yourself a favor.

Let us now start discussing the internal construction of the tool. We will dive into the application by starting from its foundation – the architecture.

The CLEAN architecture of the app

Every application should have some meaningful structure behind it. This structure, commonly known as the application architecture, is the primary reference when reading and understanding the code. A common fallacy of application architectures is existing just for the sake of existing. A telltale sign is a design so restrictive and rigid that it is hard to add use cases which were not evident during the design stage. To avoid this common pitfall, I usually try to use something as flexible as possible. The most important part of the design is not to restrict yourself. As we all know, premature optimization is the root of all evil.

That is why I often employ the so-called CLEAN design[3]. This approach was created and formalized by Robert "Uncle Bob" Martin based on several other architectural designs. I like the flexibility and the fundamental ideas behind it.

The main point behind the CLEAN design is the principle of Separation of Concerns[4]. From Wikipedia: Separation of Concerns is a design principle for separating a computer program into distinct sections so that each section addresses a separate concern. This principle lets you represent your application as a sort of layered pie, where each layer serves its purpose.

The main layers are the Presentational Layer, the Business Logic Layer, and the Data Access Layer. If done from the beginning, the application can be dissected into these three distinct aspects without too much hassle. That gives you the ability to work on each layer abstractly and independently, and brings enormous benefits during the testing stage.

I am not following the design too strictly to allow myself some slack. In my opinion, every tool can be adjusted for a job.

In the code of the application, each layer is represented by its own package: "platform" for the Presentational Layer, "business" for the Business Logic Layer, and "data" for the Data Access Layer. Each package is further divided according to its needs. For example, the Business Logic Layer package contains Interactors; each Interactor groups a set of Use Cases with similar responsibilities. The Data Access Layer is represented by Entities and the Managers that access those entities. And since we are not creating any UI, the Presentational Layer is straightforward – the application class represents it.
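As a rough sketch of how these layers might map onto code – only the package roles come from the text above; all class names here are illustrative, not the tool's actual classes:

```kotlin
// Data Access Layer: an entity and a manager that provides access to it.
data class Scenario(val path: String, val text: String)

class ScenarioManager {
    fun load(path: String): Scenario =
        Scenario(path, "text nodes and commands go here")
}

// Business Logic Layer: an Interactor groups related use cases.
class RenderInteractor(private val manager: ScenarioManager) {
    fun render(path: String): String = manager.load(path).text
}

// Presentational Layer: no UI, so the application class is enough.
class LemApp(
    private val interactor: RenderInteractor = RenderInteractor(ScenarioManager())
) {
    fun renderScenario(path: String) = println(interactor.render(path))
}
```

The point is the direction of the references: the application class knows about the Interactor, the Interactor knows about the Manager, and nothing points the other way.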

In the next chapter, we will talk about additional auxiliary facilities, which serve as a typical boilerplate for the rest of the code.

Dependencies boilerplate

The key to managing the complexity of a growing project is modularization. Modularization is the division of a system or product into physically and functionally distinct units to allow removal and replacement. Those units have to reference each other somehow, and we would like some control and flexibility over those references. Wouldn't it be cool to be able to remove and replace parts of the system without interrupting the others?

All of that is possible with a Dependency Injection system. In software engineering, Dependency Injection is a technique whereby one object supplies the dependencies of another object[5]. With such a system, you control the creation, the destruction, and the lifecycle of every component. For example, you can decide whether a component will have only one instance – and therefore be marked as a singleton – or be created every time it is required.

One more advantage of a good Dependency Injection framework is that it significantly reduces the amount of boilerplate code. Instead of dragging each dependency through a chain of constructors, you can simply inject a field through the DI at the creation stage.

This particular project uses Kodein to facilitate DI [6].
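To illustrate the singleton-versus-factory distinction without pulling in Kodein itself, here is a minimal hand-rolled container – this is not Kodein's API, just a sketch of the idea a DI framework provides:

```kotlin
// Minimal DI container sketch. A real framework like Kodein offers a
// typed DSL for this; the mechanics below only mirror the concept.
class Container {
    private val singletons = mutableMapOf<String, Any>()
    private val factories = mutableMapOf<String, () -> Any>()

    // Register a component that is created once and then reused.
    fun <T : Any> singleton(key: String, create: () -> T) {
        factories[key] = { singletons.getOrPut(key, create) }
    }

    // Register a component that is created anew on every request.
    fun <T : Any> factory(key: String, create: () -> T) {
        factories[key] = create
    }

    @Suppress("UNCHECKED_CAST")
    fun <T : Any> get(key: String): T = factories.getValue(key)() as T
}
```

With this container, `get` on a singleton binding always returns the same instance, while a factory binding produces a fresh one each time.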

In the next chapter we will discuss a general overview of the application pipeline.

The pipeline overview

Let us start by briefly defining what we want to achieve with this application. My original idea was to have a tool that processes the code into a readable article. Since this processing can be done every time the code is updated, we can maintain parity between the codebase and the material. The process can be dissected into distinct stages; in this chapter, we will overview those stages and define a goal for each of them.

The application starts its lifecycle, like most JVM-based applications, at the main entry point. The main task at this stage is to handle the parameters and parse the properties of the upcoming operation.


This is the application's main entry point. Here we define all the projects we want to process into articles.

fun main() {
    val lemApp = LemApp()
    lemApp.renderScenarios("./",         "/madeinsoviets/lem/blob/develop/")
    lemApp.renderScenarios("../blaster", "/madeinsoviets/blaster/blob/master/")
}

After the main classes are created, and the parameters are consumed, we can move to the Scenario Stage.

When I am writing articles, I usually start by outlining the table of contents, then follow up by adding a little bit of text for every point in the draft. In our application, this blueprint is represented by the article scenario. A scenario is a template which sets the stage for the rest of the article. The scenario file contains text nodes and command definitions to be executed. The Scenario Stage starts by parsing the scenario file; as a result, we receive a list of the sections and commands that were found.
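The Scenario Stage could be sketched like this. The article does not show the actual scenario syntax, so I am assuming here that a command is any line starting with a backslash – that detail is purely illustrative:

```kotlin
// Sketch of the Scenario Stage: split a scenario file into text nodes
// and command nodes. The '\' command prefix is an assumed syntax.
sealed class Node
data class TextNode(val text: String) : Node()
data class CommandNode(val name: String, val args: List<String>) : Node()

fun parseScenario(source: String): List<Node> =
    source.lines().filter { it.isNotBlank() }.map { line ->
        if (line.startsWith("\\")) {
            val parts = line.removePrefix("\\").split(" ")
            CommandNode(parts.first(), parts.drop(1))
        } else {
            TextNode(line)
        }
    }
```

Feeding it a scenario like `"Intro text\n\\include src/Main.kt"` would yield one text node and one `include` command node.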

Once we have the list of nodes and the first extracted commands at hand, we can move to the next stage – the Commands Stage. The Commands Stage takes the current result and applies the commands to it. A command can be something as simple as adding a header or changing a font, or something more complicated – for example, modifying the structure of the article or including snippets of code in it.

In most cases, executing a command will queue some code to be parsed. That is how we keep a permanent link between the article and the codebase. Parsing the code yields snippets and comments, and those comments may contain additional commands. These commands follow the same procedure: after being extracted from the comments, they are applied to the current result. You can immediately notice the recursive nature of this process.
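The recursion described above can be sketched with a work queue: applying a command may surface new commands (found in the comments of the code it pulled in), and those are processed the same way until none remain. The `Command` shape here is illustrative:

```kotlin
// Sketch of the recursive Commands Stage. 'produces' stands in for the
// commands discovered in the comments of code that a command included.
data class Command(val name: String, val produces: List<Command> = emptyList())

fun applyCommands(initial: List<Command>): List<String> {
    val applied = mutableListOf<String>()
    val queue = ArrayDeque(initial)
    while (queue.isNotEmpty()) {
        val cmd = queue.removeFirst()
        applied += cmd.name               // "apply" the command
        queue.addAll(cmd.produces)        // enqueue commands found in parsed code
    }
    return applied
}
```

An `include` command that pulls in code whose comments contain a `header` command would thus result in both being applied, in order.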

After all of the commands are applied, we can finally work with the text. That means we can identify structure and formatting in paragraphs. At this stage, for example, we can highlight lists and tables, different spans, links, etc.
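As one small example of this text stage, span detection can be done with a regular expression. The article does not specify its span syntax, so the backtick-wrapped inline-code convention below is an assumption borrowed from Markdown:

```kotlin
// Sketch of span detection in the text stage: inline code wrapped in
// backticks (an assumed syntax) is rewritten as an HTML <code> span.
val codeSpan = Regex("`([^`]+)`")

fun highlightCode(paragraph: String): String =
    codeSpan.replace(paragraph) { m -> "<code>${m.groupValues[1]}</code>" }
```

Links, lists, and other spans would each get a similar detection pass over the paragraph text.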

Those steps can repeat many times. When all of the preparations are finally done, and we have the final list of nodes, we can move on to the next stage.

The final stage in the lifecycle of our application is the Printing Stage. At this stage, we take the list of nodes from the previous steps and convert them into an HTML page with the help of templates.

Each paragraph goes through its own type of template, which applies different text and rendering properties – and ends up in the final HTML representation of the article.
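A minimal sketch of this Printing Stage, assuming a couple of illustrative node types – the real tool's node hierarchy and templates are not shown in the text:

```kotlin
// Sketch of the Printing Stage: each node type is rendered through its
// own template into the final HTML. Node types here are illustrative.
sealed class Node
data class Header(val text: String) : Node()
data class Paragraph(val text: String) : Node()

fun printNodes(nodes: List<Node>): String =
    nodes.joinToString("\n") { node ->
        when (node) {
            is Header -> "<h1>${node.text}</h1>"
            is Paragraph -> "<p>${node.text}</p>"
        }
    }
```

Because the `when` is exhaustive over the sealed hierarchy, adding a new node type forces every template dispatch to handle it.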

So, to summarize: the application starts at the main entry point, parses the scenario into text nodes and commands, recursively applies the commands (pulling in code snippets and the commands found in their comments), processes the text for structure and formatting, and finally prints the resulting nodes into an HTML page.

In the next chapter, we will have a closer look at how things are parsed.

