Why can a plain text configuration file be a good user interface for business software?

Wubaoqi
Feb 27, 2021

1. Web GUI

I worked on ETL (Extract, Transform, Load) software at my job. Since our target users are ordinary business users, we thought we had to build an easy-to-use, drag-and-drop web GUI for it.

The GUI works as follows.

We predefine reusable data components that users combine to construct a data processing pipeline. These components fall into three categories:

  • Source: where data comes from and flows into the pipeline. Example Source components include CSV/Excel input, JDBC input, and Parquet files on S3.
  • Processor: a data processing step, such as Group By, Pivot, Add/Remove Columns, or SQL.
  • Target: where the final data goes, such as the local file system, HDFS, JDBC, or an S3 target folder.
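As a rough illustration of how these three categories fit together, here is a minimal Python sketch. It assumes rows are plain dictionaries and uses in-memory stand-ins for real inputs and outputs; the class names `CsvSource`, `FilterProcessor` and `ListTarget` and the `run_pipeline` helper are hypothetical, not the actual API of our tool.

```python
import csv
import io
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]


class CsvSource:
    """Source: reads rows from CSV text (stand-in for CSV/Excel/JDBC/S3 inputs)."""

    def __init__(self, text: str) -> None:
        self.text = text

    def read(self) -> Iterable[Row]:
        return csv.DictReader(io.StringIO(self.text))


class FilterProcessor:
    """Processor: keeps only the rows that satisfy a predicate."""

    def __init__(self, predicate: Callable[[Row], bool]) -> None:
        self.predicate = predicate

    def process(self, rows: Iterable[Row]) -> Iterable[Row]:
        return (row for row in rows if self.predicate(row))


class ListTarget:
    """Target: collects the final rows in memory (stand-in for files, HDFS, JDBC)."""

    def __init__(self) -> None:
        self.rows: List[Row] = []

    def write(self, rows: Iterable[Row]) -> None:
        self.rows.extend(rows)


def run_pipeline(source, processors, target) -> None:
    """Feed the source through each processor in order, then into the target."""
    rows = source.read()
    for processor in processors:
        rows = processor.process(rows)
    target.write(rows)


# Example: keep only the "east" rows.
target = ListTarget()
run_pipeline(
    CsvSource("region,amount\neast,10\nwest,5\neast,7\n"),
    [FilterProcessor(lambda row: row["region"] == "east")],
    target,
)
print(target.rows)
```

In a drag-and-drop GUI, each node on the canvas would roughly correspond to one such object, and the connecting lines to the order in which `run_pipeline` chains them.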

To build a data pipeline, a user drags these components onto the web canvas and configures the necessary parameters for each one.

This approach worked well at first. Its low learning curve meant that more business users could build pipelines themselves instead of relying on IT staff.

But as more power users arrived, the GUI approach revealed several problems:

  • In real-world scenarios, a pipeline may contain hundreds of component nodes, which makes it increasingly hard for anyone to understand what the pipeline does.
  • Parts of a pipeline need a way to be packaged into a reusable unit.
  • Users demand separate development and production environments, and we need an easy way to move pipelines between those environments.
  • It is hard to test whether the logic is correct.
  • It is hard to track changes.

There must be a better way.

2. dbt gave me some insights

Later I learned about dbt, and it gave me some mind-blowing insights.

With dbt, you transform data in your warehouse simply by writing SELECT statements; dbt handles turning these SELECT statements into tables and views.

It made me think harder about what a good user interface is, and it showed me that plain text (mostly SQL, in dbt’s case) can work well for data transformation jobs.
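To make the dbt idea concrete: the user writes only a SELECT, and the tool wraps it in the DDL. The sketch below imitates that with Python’s built-in `sqlite3`; the `materialize` helper and the table names are hypothetical, and real dbt does much more (Jinja templating, `ref()`-based dependency ordering, several materialization strategies).

```python
import sqlite3


def materialize(conn: sqlite3.Connection, name: str, select_sql: str, as_table: bool = False) -> None:
    """Turn a user-written SELECT into a view (default) or a table, dbt-style.

    `name` is a trusted identifier here; never interpolate untrusted input into SQL.
    """
    kind = "TABLE" if as_table else "VIEW"
    conn.execute(f"DROP {kind} IF EXISTS {name}")
    conn.execute(f"CREATE {kind} {name} AS {select_sql}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 30), ("bob", 20), ("alice", 50)],
)

# The only thing the "user" writes is this SELECT statement.
materialize(
    conn,
    "customer_totals",
    "SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer",
)

rows = conn.execute("SELECT customer, total FROM customer_totals ORDER BY customer").fetchall()
print(rows)
```

The user-facing surface is a single SQL string, while the tool owns the boilerplate of creating and replacing the view.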

By describing a data pipeline explicitly in text:

  • it is more precise, and you can add comments and descriptions inline
  • the description can be split across different files and directories, so you only need to focus on one file at a time
  • you can write unit tests to ensure data quality
  • you can put your plain text files into git for version control, which opens the door to many software engineering best practices, such as pull requests, code review, and issue tracking
  • promoting a pipeline from the development environment to the production environment becomes a simple merge between git branches
  • you can use CI/CD
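Because the pipeline is just text, ordinary text tooling applies to it. As a small illustration of the change-tracking point, the Python sketch below uses the standard `difflib` module to produce the same kind of line-level diff a reviewer would see in a pull request. The HOCON-style pipeline layout (`source`, `steps`, `target`) is a hypothetical example, not the format of any particular tool.

```python
import difflib

# Two versions of the same pipeline definition (hypothetical HOCON-style layout).
old_version = """\
source { type = csv, path = "orders.csv" }
steps = [
  { type = filter, condition = "amount > 0" }
]
target { type = jdbc, table = "reporting.orders" }
"""

new_version = """\
source { type = csv, path = "orders.csv" }
steps = [
  { type = filter, condition = "amount > 0" }
  { type = group_by, columns = ["customer"] }
]
target { type = jdbc, table = "reporting.orders" }
"""

# Line-level unified diff, the same shape of output that `git diff` shows.
diff = list(
    difflib.unified_diff(
        old_version.splitlines(keepends=True),
        new_version.splitlines(keepends=True),
        fromfile="pipeline.conf (main)",
        tofile="pipeline.conf (feature branch)",
    )
)
print("".join(diff), end="")
```

The new `group_by` step shows up as a single `+` line, which is far easier to review than spotting a new node among hundreds on a canvas.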

Then a new idea came to mind: build a new ETL tool that uses plain text files to describe data pipelines.

3. The open source project “waterdrop” led me to the HOCON configuration format

Just as I was struggling to choose a text file format for my new ETL tool (“JSON or YAML?”), I found an interesting open source project: https://github.com/InterestingLab/waterdrop

It uses a configuration file format called HOCON (Human-Optimized Config Object Notation) to describe data pipelines, and waterdrop then translates the configuration file into Apache Spark jobs. Interesting.

Because I had used Scala and the Play Framework to build Business Intelligence software, I was very familiar with Play Framework’s configuration file, which is very convenient and far better than INI or XML configuration files. I just hadn’t realized that I could also use HOCON to describe pipelines!

On the JVM, HOCON is mainly implemented by https://github.com/lightbend/config.

An example Spark pipeline described in waterdrop looks like this:

HOCON is more human-friendly than YAML or JSON. It has built-in support for variable substitution, including reading values from environment variables, and built-in include support, so one HOCON file can import another.
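To show what variable substitution buys you, here is a toy Python resolver for HOCON-style `${path}` references over an already-parsed config tree. When a path is not found in the tree, it falls back to environment variables, similar to the fallback the HOCON specification describes. This is a simplified sketch and not how lightbend/config works: the helper names are hypothetical, values are plain Python strings, and optional `${?path}` substitutions, `include`, object merging, chained references, and cycle detection are all left out.

```python
import os
import re
from typing import Any, Dict, Mapping, Optional

# Matches ${some.dotted.path}; the optional ${?path} form is not handled here.
SUBSTITUTION = re.compile(r"\$\{([A-Za-z0-9_.\-]+)\}")


def lookup(tree: Mapping[str, Any], path: str) -> Optional[Any]:
    """Follow a dotted path like 'warehouse.host' through nested dicts; None if missing."""
    node: Any = tree
    for key in path.split("."):
        if not isinstance(node, Mapping) or key not in node:
            return None
        node = node[key]
    return node


def resolve(tree: Dict[str, Any], env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Return a copy of `tree` with ${...} references replaced by text.

    Each reference is looked up in the config tree first, then in `env`.
    """

    def substitute(match) -> str:
        path = match.group(1)
        value = lookup(tree, path)
        if value is None:
            value = env.get(path)
        if value is None:
            raise KeyError(f"unresolved substitution: ${{{path}}}")
        return str(value)

    def walk(node: Any) -> Any:
        if isinstance(node, dict):
            return {key: walk(value) for key, value in node.items()}
        if isinstance(node, list):
            return [walk(item) for item in node]
        if isinstance(node, str):
            return SUBSTITUTION.sub(substitute, node)
        return node

    return walk(tree)


config = {
    "warehouse": {"host": "db.internal", "port": 5432},
    "jdbc_url": "jdbc:postgresql://${warehouse.host}:${warehouse.port}/sales",
    "output_dir": "${OUTPUT_ROOT}/daily",
}
resolved = resolve(config, env={"OUTPUT_ROOT": "/data"})
print(resolved["jdbc_url"], resolved["output_dir"])
```

Connection details are written once and reused, and deployment-specific values such as the output root come from the environment. This is exactly what makes one pipeline file portable across development and production.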

4. Can we use configuration files for other business software?

Since using plain text configuration files to describe application logic has so many benefits, I don’t want to limit the idea to ETL tools. I think we can use plain text configuration files for other business software too! For example, configuration files could drive a Business Intelligence tool or in-house CRUD web pages.

Following this direction, I found another open source project, lowdefy, which describes itself as:

An open-source low-code framework to build web apps, admin panels, BI dashboards, workflows, and CRUD apps with YAML.

Yeah, it seems promising.

5. Don’t worry, configuration files don’t conflict with a good Web GUI

I know that using “plain text” configuration files may seem like a big NO for ordinary business users, who prefer beautiful, easy-to-use web pages. That’s OK: a Web GUI doesn’t actually conflict with plain text configuration files. We can first build an application driven by configuration files, which lays a solid foundation, and then build an easy-to-use Web GUI on top of that configuration layer. This way, we get the best of both worlds.
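One way to picture this layering: the GUI holds no pipeline logic of its own, and every drag-and-drop action just edits the same text document a power user could write by hand. The Python sketch below assumes JSON as the text format so it runs with the standard library only, and uses hypothetical helpers `gui_add_component` and `to_text`.

```python
import json
from typing import Any, Dict


def to_text(pipeline: Dict[str, Any]) -> str:
    """Serialize deterministically so the same pipeline always yields the same text."""
    return json.dumps(pipeline, indent=2, sort_keys=True) + "\n"


def gui_add_component(text: str, category: str, component: Dict[str, Any]) -> str:
    """What a drag-and-drop action does under the hood: parse, edit, re-serialize.

    `category` is one of "sources", "processors" or "targets" (hypothetical layout).
    """
    pipeline = json.loads(text)
    pipeline.setdefault(category, []).append(component)
    return to_text(pipeline)


# Start from an empty pipeline and "drag" three components onto the canvas.
text = to_text({"sources": [], "processors": [], "targets": []})
text = gui_add_component(text, "sources", {"type": "csv", "path": "orders.csv"})
text = gui_add_component(text, "processors", {"type": "group_by", "columns": ["customer"]})
text = gui_add_component(text, "targets", {"type": "jdbc", "table": "reporting.orders"})

# A power user could have typed the same document by hand; both paths agree.
hand_written = """
{"sources": [{"type": "csv", "path": "orders.csv"}],
 "processors": [{"type": "group_by", "columns": ["customer"]}],
 "targets": [{"type": "jdbc", "table": "reporting.orders"}]}
"""
print(text == to_text(json.loads(hand_written)))  # prints True
```

Serializing with sorted keys and fixed indentation keeps the GUI’s output stable, so saving from the GUI produces small, reviewable git diffs rather than reshuffled files.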

Adding a plain text configuration layer in the middle also makes it easier to reason about the program’s behavior, add more tests, build reusable shared components, and even create a public marketplace for sharing common, useful configuration scripts.

And if the logic is specified in plain text, our program becomes much better suited to serverless environments! For example, we can deploy the same binary as an AWS Lambda function, and to execute a data pipeline we simply send an HTTP POST request to the function with the contents of the configuration file as the request body.
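A minimal Python sketch of that serverless idea. It assumes the function is invoked through an HTTP front end (such as API Gateway or a Lambda function URL) that places the request body in `event["body"]` as a string, possibly base64-encoded. It uses JSON instead of HOCON so it needs only the standard library. The pipeline format and the tiny `run_pipeline` interpreter are hypothetical. Only the `handler(event, context)` entry point and the `statusCode`/`body` response shape follow the usual Lambda proxy-integration convention.

```python
import base64
import json
from typing import Any, Dict, List

Row = Dict[str, Any]


def run_pipeline(config: Dict[str, Any]) -> List[Row]:
    """Interpret a tiny, hypothetical pipeline format: inline source rows plus steps."""
    rows: List[Row] = list(config["source"]["rows"])
    for step in config.get("steps", []):
        if step["type"] == "filter":
            # Keep rows whose `column` value is at least `min`.
            rows = [r for r in rows if r[step["column"]] >= step["min"]]
        elif step["type"] == "select":
            # Keep only the listed columns.
            rows = [{c: r[c] for c in step["columns"]} for r in rows]
        else:
            raise ValueError(f"unknown step type: {step['type']}")
    return rows


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Lambda entry point: the HTTP POST body is the pipeline configuration."""
    body = event.get("body") or ""
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    try:
        config = json.loads(body)  # json.JSONDecodeError is a ValueError
        result = run_pipeline(config)
    except (ValueError, KeyError, TypeError) as err:
        return {"statusCode": 400, "body": json.dumps({"error": str(err)})}
    return {"statusCode": 200, "body": json.dumps({"rows": result})}


# Local smoke test: simulate the event an HTTP POST would produce.
event = {"body": json.dumps({
    "source": {"rows": [{"name": "a", "amount": 5}, {"name": "b", "amount": 50}]},
    "steps": [
        {"type": "filter", "column": "amount", "min": 10},
        {"type": "select", "columns": ["name"]},
    ],
})}
response = handler(event, None)
print(response)
```

The deployed code never changes between pipelines; only the request body does, which is what makes the same binary reusable for any pipeline.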

I think using plain text configuration to describe all kinds of software is a really great idea, and I will keep working on it.
