BYOD

Start of a new level

With the success of my previous level, building a webhook tester, I am setting out on the next one. I am a grown person now. I plan before grabbing my keyboard, I think more maturely about what I am building, and I have learnt from my mistakes. One of those mistakes was not documenting my webhook development journey at all. If you are wondering what I am talking about, that’s because I never wrote a damn thing about my previous project. It just sits there. And that’s what I want to change in this level. I want to document each and every decision I make in the process, so that I have this back-of-the-head feeling that maybe someday someone is gonna see this and judge me, so let’s think it through first.

Here I am with a new book, a new level but the same old thirst for building something. I am following the book ‘Build Your Own Database From Scratch’. It was suggested by someone very dignified in the field of Software Development and I find the title to be my type. Making things from scratch is something I yearn for. No boilerplate, no frameworks, just plain simple code and bugs.

Introduction

So basically, by following this book, I am going to be able to create a mini relational database. We are gonna start with a B-Tree, then a simple KV store (think Redis) and finally a mini relational DB. The book focuses on three important topics:

  • Persistence
    • We must be able to recover from a crash without losing data.
  • Indexing
    • We must be able to query and manipulate the data efficiently.
  • Concurrency
    • Handling multiple concurrent requests gracefully.

Looks like the book itself uses Golang, which happens to be the language I am going to use. Considering the amount of error handling and concurrency management we are about to do, I wouldn’t suggest anything else.

1. Persistence

Persistence is a concept developers take for granted. If we did not have any DBMS lying around to play with, we’d still be using CSVs: copying the file, updating the copy, renaming it over the original and, if the file is large, waiting a long time for all of that to finish, all the while wondering why our customers are stuck at the loading screen for hours and yet the bills are through the roof. Employ a database only if you hope your business is going to grow huge. Otherwise it is overkill and we are better off working with a spreadsheet in Excel or Google Sheets. Imagine having a file with 100000 records and updating one record. We’d have to copy over all 100000 records, change the required value and rename the file. All that compute to change 0.001% of the data. That is where databases come into the picture.

2. Indexing

I am not clear about what James means by indexing. Is it like creating a pointer to the memory location containing my data? That would make sense, but I am not really sure right now. He mentions there are two types of database queries:

  • OLAP (Analytical)
  • OLTP (Transactional)

And we are going to learn only OLTP in this book. The rest all makes complete sense. Anybody with even a little passion for Computer Science would know about Binary Search because it is so damn fast. Moreover, an algorithm like Binary Search runs in O(log(n)) time, so its advantage over a naive linear scan grows manifold as the data gets larger.

At small sizes like 4 tuples, we’d be making roughly the same number of comparisons. At n = 100 we would already be making 93% fewer comparisons than naive search (about 7 instead of 100). At larger sizes like n = 100000, we will make only about 0.017% of the naive search comparisons (roughly 17 instead of 100000). At this scale, IO operations will become our bottleneck.
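
To sanity-check the comparison counts, I wrote a small Go sketch that counts the steps of a naive scan and of a binary search over sorted integers. The function names and the choice to search for the last key (the worst case for the naive scan) are my own assumptions, and I count one step per loop iteration.

```go
package main

import "fmt"

// linearSearch scans keys left to right and returns the index of target
// (or -1) together with the number of keys it had to look at.
func linearSearch(keys []int, target int) (idx, steps int) {
	for i, k := range keys {
		steps++
		if k == target {
			return i, steps
		}
	}
	return -1, steps
}

// binarySearch assumes keys is sorted in ascending order. It halves the
// search range on every step and returns the index of target (or -1)
// together with the number of steps taken.
func binarySearch(keys []int, target int) (idx, steps int) {
	lo, hi := 0, len(keys)-1
	for lo <= hi {
		mid := lo + (hi-lo)/2
		steps++
		switch {
		case keys[mid] == target:
			return mid, steps
		case keys[mid] < target:
			lo = mid + 1
		default:
			hi = mid - 1
		}
	}
	return -1, steps
}

func main() {
	for _, n := range []int{4, 100, 100000} {
		keys := make([]int, n)
		for i := range keys {
			keys[i] = i // already sorted
		}
		target := n - 1 // last key: worst case for the naive scan
		_, lin := linearSearch(keys, target)
		_, bin := binarySearch(keys, target)
		fmt.Printf("n=%d naive=%d binary=%d (%.3f%% of naive)\n",
			n, lin, bin, 100*float64(bin)/float64(lin))
	}
}
```

For the last key, the binary search takes 3 steps at n = 4, 7 at n = 100 and 17 at n = 100000, while the naive scan takes n steps every time.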

This is wild, and when I first realised the benefits of this, I was blown away. And understandably so. Anyway, we’ll be using B-Trees here, I believe.

3. Concurrency

Here we are introduced to a new concept in databases: transactions.
Not that I understood too much about that either. Maybe a read-update cycle is a transaction. Let’s move ahead and see. Concurrency is a huge reason to move towards databases. Queueing up multiple of those inefficient read-copy-modify-rename cycles will definitely eat up all the resources and, indirectly, your money.

Chapter 0 ends

So far, I think I already knew a few concepts and I learnt a lot of new ones. On to the first section of the book, where we are going to learn to make a simple Key-Value store.