Introduction to Databases
A database (DB) is an organized collection of structured data stored electronically in a computer system, managed by a Database Management System (DBMS).
Let's Invent Database
Alright, so imagine you're building a movie collection app with Python. At first, you might think "I'll just use files!"
You create a file for each movie - titanic.txt, inception.txt, and so on. Inside each file, you write the title, director, year, rating. Simple enough!
But then problems start piling up. You want to find all movies from 2010? Now you're writing Python code to open every single file, read it, parse it, check the year. Slow and messy.
Your friend wants to update a movie's rating while you're reading it? Boom! File corruption or lost data because two programs can't safely write to the same file simultaneously.
You want to find all movies directed by Nolan AND released after 2010? Now your Python script is getting complex, looping through thousands of files, filtering multiple conditions.
What if the power goes out mid-write? Half-updated file, corrupted data.
This is where you start thinking, "there has to be a better way!" What if instead of scattered files, we had one organized system that could handle all this? A system designed from the ground up for concurrent access, fast searching, data integrity, and complex queries. That's the core idea behind what we'd call a database.
Database Management System
So you've realized you need a better system. Enter the DBMS, the Database Management System.
Instead of your Python code directly wrestling with files, the DBMS handles all the heavy lifting, managing storage, handling concurrent users, ensuring data doesn't get corrupted, and executing queries efficiently.
But here's the key question: how should we actually structure this data?
This is where the data model comes in. It's your blueprint for organizing information. For movies, you might think: "Every movie has attributes: title, director, year, rating." That's a relational model thinking, data organized in tables with rows and columns, like a spreadsheet but much more powerful.
Relational Model - Tables:
| movie_id | title | director | year | rating |
|---|---|---|---|---|
| 1 | Inception | Nolan | 2010 | 8.8 |
| 2 | Titanic | Cameron | 1997 | 7.9 |
| 3 | Interstellar | Nolan | 2014 | 8.7 |
Or maybe you think: "Movies are connected, directors make movies, actors star in them, movies belong to genres." That's more of a graph model, focusing on relationships between entities.
Graph Model - Nodes and Relationships:
(Movie: Inception)
|
|--[DIRECTED_BY]--> (Director: Nolan)
|
|--[RELEASED_IN]--> (Year: 2010)
|
|--[HAS_RATING]--> (Rating: 8.8)
(Movie: Interstellar)
|
|--[DIRECTED_BY]--> (Director: Nolan)
|
|--[RELEASED_IN]--> (Year: 2014)
The data model you choose shapes everything, how you store data, how you query it, how it performs. It's the fundamental architectural decision that defines your database.
What Is Schema ?
The schema is the blueprint (like class in Java or python) or structure of your database, it defines what can be stored and how it's organized, but not the actual data itself.
For our movie table, the schema would be:
Movies (
movie_id: INTEGER,
title: TEXT,
director: TEXT,
year: INTEGER,
rating: FLOAT
)
It specifies the table name, column names, and data types. It's like the architectural plan of a building, it shows the rooms and layout, but the furniture (actual data) comes later.
The schema enforces rules: you can't suddenly add a movie with a text value in the year field, or store a rating as a string. It keeps your data consistent and predictable.
Data Models
These are just example to know, but we will study only few, so it's ok if you they sounded complex, but they aren't.
Relational (SQL)
- Examples: PostgreSQL, MySQL, SQLite
- Use case: transactions. Need ACID guarantees, complex joins between related data.
Key-Value
- Examples: Redis, Memcached
- Use case: Session storage, user login tokens. Lightning-fast lookups by key, simple get/set operations.
Document/JSON (NoSQL)
- Examples: MongoDB, CouchDB
- Use case: Blog platform, each post is a JSON document with nested comments, tags, metadata. Flexible schema, easy to evolve.
Wide Column / Column Family
- Examples: Cassandra, HBase
- Use case: Time-series data like IoT sensors. Billions of writes per day, queried by device_id and timestamp range.
Array/Matrix/Vector
- Examples: PostgreSQL with pgvector, Pinecone, Weaviate
- Use case: AI embeddings for semantic search - store vectors representing documents, find similar items by vector distance.
Legacy Models:
- Hierarchical
- Network
- Semantic
- Entity-Relationship