Databases

When you're building a new system, you need to decide what kind of "filing cabinet" to use. This "filing cabinet" is called a database. A database is a tool that helps you store and find information easily. Just like a toy box where you can put all your toys in different compartments and find them easily, a database helps you put all your information in different sections and find it easily.

There are different types of databases, and each one is good at different things. Choosing the right one matters because it affects how well your system performs and how easily it can grow. Imagine you have a toy box and you want to organize all your toys inside it. Some toys are small and you need to grab them quickly, like marbles; some are big and you want to keep them safe, like a large Lego build; and some you need to search through quickly, like finding one specific car in a big collection. In this scenario, the toy box is your database and the different kinds of toys are different kinds of data. Just as you would use different compartments or drawers in the toy box for different kinds of toys, you would use different types of databases for different types of data.

So, remember:

  • Databases are meant for data that needs to be queried
  • The data shouldn't be lost (persistence is important)
  • The choice of database affects non-functional requirements such as scalability, latency, and availability. For example, a storage system with a fixed capacity will not scale as your data grows
  • Different kinds of databases are optimized for different functional and non-functional requirements

A few examples of how different types of databases serve specific needs:

  1. Caching: in-memory stores like Redis and Memcached are good for information you need to read very quickly and often, like the live score of a game or a popular news article. Use them when you need to show the latest information in real time. They are not good for storing large amounts of data, like all of a company's sales records, because they keep data in memory (RAM), which is limited and expensive compared with disk. Because reads and writes are served from memory, Redis can handle a very large number of requests at the same time. When your system needs to scale, a cache helps by keeping the most frequently requested information in memory for quick access, which reduces the load on your other databases.
    Examples: Redis, Memcached

  2. File storage: object stores like S3 are good for storing big files such as pictures, GIFs, and videos, where the idea is to store and serve the data as-is. This type of storage is called blob storage. Use it when you need to store large multimedia files. It is not good for small, frequently updated pieces of information, like the score of a game, because it is not designed for that. S3 can store very large files and serve many users at the same time. When your system needs to scale, S3 helps by spreading the files across multiple servers, so more people can access them concurrently. It is often combined with a CDN (content delivery network) to deliver files all over the world: S3 acts as the primary store, and the CDN keeps copies of the same images in many geographic locations so users can fetch them from a nearby server, which makes access faster and gives a better user experience.
    Examples: Amazon S3, Google Cloud Storage, Azure Blob Storage

  3. Text searching: search engines like Elasticsearch are good for finding words or phrases in large amounts of text. Use them when you need to search through a lot of text and find what you're looking for quickly. They are not a good fit for data you don't need to search, like the score of a game, because they are not designed for that. Elasticsearch can search through a lot of text very quickly and serve many users at the same time. When your system needs to scale, it splits the indexed text into shards spread across multiple servers, so more people can search at the same time.
    Examples: Elasticsearch, Solr, Algolia

  4. Metric databases: time-series databases like InfluxDB or OpenTSDB are good for storing measurements of how a computer or system behaves over time, like how fast it's running or how hot it's getting. Use them when you need to store and analyze system performance data. They are not a good fit for records that are simply overwritten in place, like the current score of a game, because they are not designed for that. In simple terms, the data is a time-ordered stream of readings that rarely changes once written: you append new points sequentially and read them back with bulk queries over a large time range. InfluxDB, for example, can store and analyze performance data very quickly and serve many users at the same time. When your system needs to scale, it can handle a large number of data points, so you can analyze your system's performance in real time.
    Examples: InfluxDB, Prometheus, TimescaleDB, OpenTSDB

  5. Analytics databases: these store huge amounts of information about a product, company, etc. that you want to analyze or turn into offline reports. For example, you might take all of a company's sales data from the summers of the last 10 years and look for correlations with the IPL matches played during the same period (just an example). In short, this is a data warehouse: it can store petabytes of data and execute complex queries across distributed systems, enabling organization-wide analytics at scale.
    Examples: Snowflake, Google BigQuery, Amazon Redshift, Apache Hive

  6. Document databases: these excel at storing semi-structured data in flexible, JSON-like documents without requiring a fixed schema. Use them when your data structures vary or may evolve over time, or when your application data naturally maps to documents. Databases like MongoDB offer horizontal scaling through sharding, distributing data across multiple servers to handle increased load and storage requirements.
    Examples: MongoDB, Couchbase, Firebase Firestore
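
To make the caching idea from item 1 concrete, here is a minimal cache-aside sketch in Python. It does not use a real Redis or Memcached client: a plain dictionary with a per-entry expiry time stands in for the cache, and `TTLCache`, `get_score`, and the `match-42` key are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory key-value cache with per-entry expiry (a stand-in for Redis/Memcached)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: Dict[str, Tuple[float, object]] = {}

    def get(self, key: str) -> Optional[object]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._data[key]  # expired: evict so the next read goes to the database
            return None
        return value

    def set(self, key: str, value: object) -> None:
        self._data[key] = (self.clock() + self.ttl, value)


def get_score(game_id: str, cache: TTLCache, db: Dict[str, int], stats: Dict[str, int]) -> int:
    """Cache-aside read: try the cache first, fall back to the database on a miss."""
    cached = cache.get(game_id)
    if cached is not None:
        stats["hits"] += 1
        return cached
    stats["misses"] += 1
    value = db[game_id]        # slow path: read from the primary database
    cache.set(game_id, value)  # populate the cache for subsequent reads
    return value


# Usage: the first read misses and loads from the "database", the second is served from memory.
db = {"match-42": 187}
cache = TTLCache(ttl_seconds=5.0)
stats = {"hits": 0, "misses": 0}
get_score("match-42", cache, db, stats)
get_score("match-42", cache, db, stats)
print(stats)  # {'hits': 1, 'misses': 1}
```

In this pattern the database stays the source of truth; the TTL bounds how stale a cached score can get, at the cost of an extra database read after each expiry.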
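
The split described in item 2, where bytes live in blob storage and users fetch them through a CDN, can be sketched as follows. This is a hypothetical illustration, not the S3 API: `BlobStore` is an in-memory stand-in for a bucket, `cdn.example.com` is a placeholder CDN domain, and naming each object by the SHA-256 hash of its bytes is an assumed design choice, not something S3 requires.

```python
import hashlib
from typing import Dict

CDN_BASE_URL = "https://cdn.example.com"  # hypothetical CDN domain that fronts the bucket


class BlobStore:
    """Toy object store: opaque bytes under a string key (a stand-in for an S3-style bucket)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def upload_image(store: BlobStore, data: bytes, ext: str) -> str:
    """Store the file unchanged and return the CDN URL that the main database should save."""
    # Assumed naming scheme: key = hash of the content, so identical uploads share one object.
    key = f"images/{hashlib.sha256(data).hexdigest()}.{ext}"
    store.put(key, data)
    return f"{CDN_BASE_URL}/{key}"


bucket = BlobStore()
photo = b"\x89PNG\r\n\x1a\n fake image bytes"
url = upload_image(bucket, photo, "png")
print(url)  # https://cdn.example.com/images/<sha256 hex digest>.png
```

The main database then stores only the short URL string, while the large file itself is served as-is from blob storage via the nearest CDN location.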
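
A search engine like Elasticsearch (item 3) finds text quickly mainly because of an inverted index: a map from each word to the documents that contain it, which Elasticsearch builds through the Lucene library. The sketch below is a simplified, hypothetical version: it tokenizes naively, supports only AND queries, and does no relevance ranking, all of which a real search engine handles.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    def __init__(self) -> None:
        # term -> ids of the documents that contain it
        self.postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> List[int]:
        """Return ids of documents that contain every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return []
        # Copy the first posting set so intersecting does not modify the index.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return sorted(result)


index = InvertedIndex()
index.add(1, "Redis is an in-memory cache")
index.add(2, "Elasticsearch builds an inverted index")
index.add(3, "An inverted index maps words to documents")
print(index.search("inverted index"))  # [2, 3]
print(index.search("Cache"))           # [1]
```

A query never scans the documents themselves; it only looks up and intersects the posting sets for its terms, which is why search stays fast as the amount of text grows.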
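
The access pattern from item 4 (sequential appends, bulk reads over a time range) is sketched below. This is an illustrative in-memory structure, not how InfluxDB or OpenTSDB lay out data on disk; the `TimeSeries` class, the 15-second sample interval, and the choice to reject out-of-order points are assumptions made to keep it short.

```python
import bisect
from typing import Dict, List, Tuple


class TimeSeries:
    """Append-only series of (timestamp, value) points kept in time order."""

    def __init__(self) -> None:
        self.timestamps: List[int] = []
        self.values: List[float] = []

    def append(self, ts: int, value: float) -> None:
        # Simplification: reject out-of-order points so the lists stay sorted.
        if self.timestamps and ts < self.timestamps[-1]:
            raise ValueError("timestamps must be non-decreasing")
        self.timestamps.append(ts)
        self.values.append(value)

    def query_range(self, start: int, end: int) -> List[Tuple[int, float]]:
        """All points with start <= ts < end, located by binary search."""
        lo = bisect.bisect_left(self.timestamps, start)
        hi = bisect.bisect_left(self.timestamps, end)
        return list(zip(self.timestamps[lo:hi], self.values[lo:hi]))

    def downsample(self, start: int, end: int, bucket: int) -> List[Tuple[int, float]]:
        """Average the points in [start, end) into fixed-width buckets of `bucket` seconds."""
        sums: Dict[int, float] = {}
        counts: Dict[int, int] = {}
        for ts, value in self.query_range(start, end):
            bucket_start = start + (ts - start) // bucket * bucket
            sums[bucket_start] = sums.get(bucket_start, 0.0) + value
            counts[bucket_start] = counts.get(bucket_start, 0) + 1
        return [(b, sums[b] / counts[b]) for b in sorted(sums)]


# Hypothetical CPU temperature readings (degrees C) taken every 15 seconds.
cpu_temp = TimeSeries()
for ts, temp in [(0, 50.0), (15, 52.0), (30, 54.0), (45, 56.0), (60, 60.0), (75, 62.0)]:
    cpu_temp.append(ts, temp)

print(cpu_temp.query_range(30, 60))    # [(30, 54.0), (45, 56.0)]
print(cpu_temp.downsample(0, 90, 60))  # [(0, 53.0), (60, 61.0)]
```

Because points are kept sorted by time, a range query is two binary searches plus a slice, and downsampling (for example, turning 15-second readings into 1-minute averages) is a single pass over that range.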
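
The kind of offline report described in item 5 is usually written as a SQL aggregation. The sketch below uses Python's built-in `sqlite3` only to show the shape of such a query; a warehouse like BigQuery or Redshift would run a similar GROUP BY query, in its own SQL dialect, over columnar distributed storage at far larger scale. The `sales` table, its columns, the sample rows, and treating April to June as "summer" are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sold_on TEXT, region TEXT, amount REAL)")

# Hypothetical sample rows; a real warehouse would hold years of data across many machines.
rows = [
    ("2022-04-10", "north", 120.0),
    ("2022-05-02", "south", 80.0),
    ("2022-11-20", "north", 300.0),  # outside the assumed summer months, filtered out
    ("2023-06-15", "west", 200.0),
    ("2023-04-01", "north", 50.0),
]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

# Total "summer" (April to June) revenue per year.
SUMMER_BY_YEAR = """
SELECT strftime('%Y', sold_on) AS year, SUM(amount) AS summer_revenue
FROM sales
WHERE CAST(strftime('%m', sold_on) AS INTEGER) BETWEEN 4 AND 6
GROUP BY year
ORDER BY year
"""
result = conn.execute(SUMMER_BY_YEAR).fetchall()
print(result)  # [('2022', 200.0), ('2023', 250.0)]
```

The query reads a few columns across many rows and aggregates them, which is exactly the workload column-oriented warehouses are built to make fast.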
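
Item 6 combines two ideas: schema-free documents and horizontal scaling through sharding. The toy sketch below shows both with plain Python dictionaries. It is not the MongoDB API; `ShardedDocumentStore`, `shard_for`, and the three-shard setup are hypothetical, and routing by a hash of `_id` is just one strategy (MongoDB supports hashed sharding as well as range-based sharding).

```python
import hashlib
from typing import Any, Dict, List

NUM_SHARDS = 3  # hypothetical cluster size


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a document key to a shard with a stable hash, so the same key always lands on the same shard."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedDocumentStore:
    """Toy document store: schema-free dict documents spread across shards by their _id."""

    def __init__(self, num_shards: int = NUM_SHARDS) -> None:
        self.shards: List[Dict[str, Dict[str, Any]]] = [{} for _ in range(num_shards)]

    def insert(self, doc: Dict[str, Any]) -> int:
        shard = shard_for(doc["_id"], len(self.shards))
        self.shards[shard][doc["_id"]] = doc
        return shard

    def get(self, doc_id: str) -> Dict[str, Any]:
        # A lookup by the shard key touches exactly one shard.
        return self.shards[shard_for(doc_id, len(self.shards))][doc_id]

    def find(self, field: str, value: Any) -> List[Dict[str, Any]]:
        # A query on any other field must be sent to every shard and the results merged.
        return [doc for shard in self.shards for doc in shard.values() if doc.get(field) == value]


docs = ShardedDocumentStore()
# Documents in the same collection do not need identical fields.
docs.insert({"_id": "u1", "name": "Asha", "city": "Pune"})
docs.insert({"_id": "u2", "name": "Ravi", "city": "Pune", "tags": ["cricket"]})
docs.insert({"_id": "u3", "name": "Meera", "address": {"city": "Delhi", "pin": "110001"}})

print(docs.get("u2")["tags"])                               # ['cricket']
print(sorted(d["_id"] for d in docs.find("city", "Pune")))  # ['u1', 'u2']
```

The sketch makes one trade-off visible: lookups by the shard key go to a single shard, while queries on any other field have to ask every shard. A stable hash from `hashlib` is used instead of Python's built-in `hash()`, whose string hashing is randomized per process.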

So, the choice of database largely depends on:

  1. Data Structure: Does your data fit better in tables (relational), documents, key-value pairs, or another model?
  2. Query Patterns: What types of queries will your application perform most frequently? Different databases optimize for different query patterns.
  3. Scale Requirements: How much data will you need to store, and what performance characteristics are necessary as your system grows?
  4. Consistency Requirements: Do you need strong consistency for all operations, or can your application tolerate eventual consistency for better performance?
  5. Operational Complexity: Consider the expertise required to manage and maintain different database technologies.
