2024-07-20
#computer-science
#data-structures
#algorithms

Data Structures: A Theoretical Digest

Deep dive into dynamic data structures: arrays, linked lists, trees, stacks, queues, and hash tables.

Dynamic Data Structures are data structures designed to store our data in RAM (Random Access Memory) so we can easily work with them, whether it's simple numbers or complex objects.


RAM (Random Access Memory) is memory that stores the program's machine code and all input, output, and temporary data that the processor works with.

RAM consists of cells that store data bits based on their electrical charge. Each cell loses its charge over time (due to current leakage), so they need constant refreshing - that's why it's called "dynamic" memory.

Dynamic Memory Allocation is a way to allocate computer memory for objects during program execution.

A Garbage Collector helps manage dynamic memory - it's a runtime program that tracks which allocated objects are still in use and frees the memory of those that no longer are.


------ Dynamic Array ------


Let's say we created a dynamic array with 5 elements allocated in dynamic memory.

Now we need to add a new element to this array.

  1. Create another array in dynamic memory, one element larger.
  2. Copy all elements from the old array to the new one.
  3. Write the new data to the remaining cell of the new array.
  4. Free the memory used by the old array and update the pointer to point to the new array.

The same process applies when removing elements.

Thankfully, this is automated. Still, if we constantly add or remove elements from a Dynamic Array (say, over 5000 iterations), the repeated allocation and copying creates quite high overhead.
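The four steps above can be sketched in Python (an illustrative model of the resize-and-copy process, not a real allocator):

```python
# Sketch of the resize-and-copy steps above (illustrative, not a real allocator).
def append(old: list, value):
    # 1. "Allocate" a new block one element larger.
    new = [None] * (len(old) + 1)
    # 2. Copy every element from the old block.
    for i, item in enumerate(old):
        new[i] = item
    # 3. Write the new value into the remaining cell.
    new[-1] = value
    # 4. The old block is freed once nothing references it;
    #    the caller now points at the new block.
    return new

arr = [10, 20, 30, 40, 50]
arr = append(arr, 60)
```

Real implementations avoid this per-element cost by growing capacity geometrically (e.g., doubling it), which makes appends amortized O(1).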

Array elements are located next to each other because an array is a continuous memory block. We only know the address of the first element. Access to subsequent elements is done using pointer arithmetic. To access the next array element, we need to offset from the starting memory address by a certain number of bytes (depending on the data type) and read the data after that offset.
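The offset calculation described above can be shown with plain integer arithmetic (the addresses below are made-up numbers for illustration):

```python
# Element address = base address + index * element size (the offset the text describes).
def element_address(base: int, index: int, element_size: int) -> int:
    return base + index * element_size

# Hypothetical numbers: an array of 4-byte integers starting at address 1000.
assert element_address(1000, 0, 4) == 1000   # first element sits at the base address
assert element_address(1000, 3, 4) == 1012   # fourth element, offset by 12 bytes
```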


---- Singly Linked List ----


Singly Linked List is a dynamic data structure - a collection of elements in dynamic memory where each element knows the address of the next element. It has two main characteristics:

  • Fast operations for changing the number of elements - adding or removing elements is really quick
  • Slow data access compared to regular dynamic arrays

Linked lists solve the problem of frequent adding and removing elements from your data structure. For example, with dynamic arrays, constant multiple iterations of adding or removing elements creates quite high overhead.

Array elements are located next to each other since an array is a continuous memory block. We can directly access memory cells using just one pointer to the array's beginning. Then we simply iterate through the array and access any element by its index.


A singly linked list is not a continuous memory block like an array, where elements sit next to each other and are accessible via one pointer to the first element.

A singly linked list is a more complex container whose elements can be scattered across memory in any order; instead of adjacency, each element stores the exact address of the next one.

Essentially, a singly linked list element (a Node) is a class containing two fields:

  • Memory address (pointer)
  • Data/information

We need the memory address as a separate field to know where the next element is located. This is a pointer to the next list element.

If there's no next element at the end of the list, the address field equals NULL (NULL POINTER). We traverse the list from beginning to end, where the end is an address pointing to NULL.
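A minimal Python sketch of the Node structure and the beginning-to-end traversal just described (Python's None plays the role of NULL):

```python
class Node:
    """A singly linked list element: data plus a pointer to the next node."""
    def __init__(self, data):
        self.data = data
        self.next = None  # None plays the role of NULL at the end of the list

def traverse(head):
    """Walk from the head until the next-pointer is None, collecting data."""
    items = []
    current = head
    while current is not None:
        items.append(current.data)
        current = current.next
    return items

# Build the chain 1 -> 2 -> 3 by hand.
head = Node(1)
head.next = Node(2)
head.next.next = Node(3)
```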


--------------- Advantages ---------------
  • Singly and doubly linked lists work faster than dynamic arrays when we frequently add or remove elements. It's very convenient to manipulate the address field when deleting and adding new elements
-------------- Disadvantages --------------
  • Slow access to specific elements by address. To get the last element, I need to go through all elements, checking whether each address field equals NULL. If I need the third element, I need to keep a counter! (In arrays it's instant - just offset by a certain number of bytes.)

    The further the tail, the slower it is to work with singly linked lists. Doubly linked lists partially solve this problem!


---- Doubly Linked List ----


Doubly Linked List

In a singly linked list, we always work from the beginning - Head. We can't access the last element - Tail - and then find other elements through it.

The difference between doubly and singly linked lists is how they store address data.

Doubly Linked List stores 3 fields in each element (Node):

  • pointer Next (NULL for the last element)
  • pointer Previous (NULL for the first element)
  • data

This way, a doubly linked list knows not only which element comes after it, but also which one was before it.

-------------- Advantages --------------

This gives doubly linked lists an advantage over singly linked lists when searching for elements near the beginning or end of the list (though you need to know the number of elements to decide whether to start searching from the beginning or from the end).

-------------- Disadvantages --------------

The extra pointer is also a cost when adding or removing elements: you now need to update connections in both directions, which puts more load on the system.
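A minimal sketch of the three-field Node and the backward traversal that a singly linked list cannot do:

```python
class DNode:
    """A doubly linked list element: data plus Next and Previous pointers."""
    def __init__(self, data):
        self.data = data
        self.next = None  # None (NULL) for the last element
        self.prev = None  # None (NULL) for the first element

def link(left: "DNode", right: "DNode"):
    # Linking requires updating pointers in BOTH directions -
    # the extra bookkeeping mentioned in the disadvantages above.
    left.next = right
    right.prev = left

head, mid, tail = DNode(1), DNode(2), DNode(3)
link(head, mid)
link(mid, tail)

# Traverse backwards from the tail - impossible in a singly linked list.
backwards = []
node = tail
while node is not None:
    backwards.append(node.data)
    node = node.prev
```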


------ Binary Tree ------


Binary Tree is also a dynamic data structure consisting of elements called nodes (vertices). Each node can be the parent of two other nodes (only two - that's why it's called binary), left and right, and each of those can in turn have its own left and right children.

The first element, which has no parent, is the root of our tree.

The last elements, which have no children - their left and right pointers are NULL - are the tree's leaves.


How Binary Tree Logic Works

A VERY important characteristic of this kind of binary tree - a binary search tree - is that it's an ordered data structure: it is always kept sorted.

Binary tree organization logic tells us that numbers smaller than a node's value should be added to the left as children, and larger numbers to the right.

In linked lists, we simply added elements to the next free cell regardless of previous values. In binary trees, adding an element doesn't just happen anywhere - we add elements in an ordered manner.

To delete an element that has children, we find the largest node in its left subtree - go left once, then keep going right - and replace the deleted element's value with that node's value.


-------------- Advantages --------------
  • Faster finding of needed elements because we know in advance that some tree branches simply cannot contain the searched value
  • Same applies to adding elements (compared to dynamic arrays, but not always compared to linked lists where adding to beginning/end is faster)
  • Tree traversal (displaying all elements) is very convenient using recursive algorithms
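The insertion rule and the recursive traversal mentioned above can be sketched together in Python (a minimal binary search tree, ignoring balancing and duplicates):

```python
class TreeNode:
    def __init__(self, value):
        self.value = value
        self.left = None   # subtree of smaller values
        self.right = None  # subtree of larger values

def insert(node, value):
    """Insert following the ordering rule: smaller to the left, larger to the right."""
    if node is None:
        return TreeNode(value)
    if value < node.value:
        node.left = insert(node.left, value)
    else:
        node.right = insert(node.right, value)
    return node

def in_order(node):
    """Recursive traversal: left subtree, node, right subtree - yields sorted order."""
    if node is None:
        return []
    return in_order(node.left) + [node.value] + in_order(node.right)

root = None
for v in [8, 3, 10, 1, 6]:
    root = insert(root, v)
```

Note how `in_order` returns the elements already sorted - the "always sorted" property described above.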

--------- Stack ---------


Stack is a dynamic data structure - essentially a collection of elements with a specific organization that determines how elements can be added and removed.

You can say that Stack is based on singly linked list logic, following the LIFO principle (Last In, First Out): the element that was added to the stack last must be removed first.

A stack has a bottom and a top. Think of a stack of plates - you can't take the bottom one or pull from the middle. Only the top one, same with adding. Or think of a gun magazine.
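The plate analogy translates directly into a minimal Python sketch (backed by a list here, though a linked list works just as well):

```python
class Stack:
    """LIFO container: push and pop only ever touch the top."""
    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)   # place a new plate on top

    def pop(self):
        return self._items.pop()   # take the top plate off

    def peek(self):
        return self._items[-1]     # look at the top plate without removing it

s = Stack()
s.push("a"); s.push("b"); s.push("c")
```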


--------- Queue ---------


Queue is one of the simplest dynamic data structures to understand, because we encounter queues in everyday life.

Main queue principle: FIFO (First In, First Out) - whoever enters the queue first exits first.

Queues are needed to process data and tasks in the same order they arrived. After data (or a list of actions) is processed, it's removed from the queue.

There are only two positions through which we can interact with a queue: new elements are added at the End of the queue and removed from the Beginning of the queue.

Queues are easily implemented using doubly linked lists with some restrictions - allowing addition only at the end, removal only from the beginning, and viewing the first and last elements.
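A minimal sketch using Python's standard `collections.deque` (itself built on linked blocks) restricted to the two queue positions:

```python
from collections import deque

queue = deque()
queue.append("first")    # new elements join at the end
queue.append("second")
queue.append("third")

served = queue.popleft()  # removal happens at the beginning: FIFO
```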


-- Circular Queue & Priority Queue --

Circular Queue - same queue, but we don't delete elements from the beginning when extracting them - instead we move them back to the end of the queue.
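The "move it back to the end" behavior can be sketched in a couple of lines (again using `collections.deque`):

```python
from collections import deque

# A "circular" pass: instead of discarding the front element after
# extracting it, we move it back to the end of the queue.
tasks = deque([1, 2, 3])
current = tasks.popleft()  # take from the beginning...
tasks.append(current)      # ...and re-queue it at the end
```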


Priority Queue - in this queue, the order of element extraction depends on processing priority. The criteria for evaluating processing priority (extraction order) can be anything.

Two ways to organize priority queues:

  • Priority Insertion Queue - elements must be ordered by priority when added to the queue
  • Priority Extraction Queue - elements are added in arrival order without sorting, but during extraction we choose the highest priority element for processing

For example, with integers, the numeric value itself can serve as the priority.
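A sketch of the second variant, a Priority Extraction Queue over integers (here I assume smaller numbers mean higher priority; the criterion could be anything):

```python
# Priority Extraction Queue sketch: elements are appended in arrival order,
# and extraction scans for the highest-priority one (here: the smallest integer).
queue = []
for value in [7, 2, 9, 4]:
    queue.append(value)   # arrival order, no sorting on insert

def extract(queue):
    best = min(queue)     # choose the highest-priority element...
    queue.remove(best)    # ...and remove it for processing
    return best
```

In practice Python's `heapq` module does this far more efficiently (O(log n) per operation instead of scanning the whole list).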


--------- Deque ---------


Deque (Double Ended Queue) is simply a double-ended queue.

The main queue rule states that the first element to enter the queue will be first for processing when extracting elements.

With Deque, since it's a double-ended queue, this rule works in both directions. We can add and extract elements in queue order, but from either side.

Since either side can serve as the beginning or the end, they're simply called the left end of the Deque and the right end of the Deque.

There are also subtypes: input-restricted Deque and output-restricted Deque.

(One difference from a plain doubly linked list: in some implementations - C++'s std::deque, for example - a deque is stored as fixed-size arrays connected to each other by references.)
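Python's standard `collections.deque` exposes exactly this both-ends interface:

```python
from collections import deque

d = deque()
d.append("right")        # add at the right end
d.appendleft("left")     # add at the left end

r = d.pop()              # extract from the right end
l = d.popleft()          # extract from the left end
```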


------- Hash Table -------


Hash Table is a data structure that allows fast information retrieval by key, regardless of the amount of data.

They're often used in Cache building, Database Indexes, and language processors. Many programming languages have associative arrays that use Hash Tables as their foundation.


Imagine we have a table with names and phone numbers. To find Sasha's number, we'd have to check every record - but what if there are a million records?

How to speed up search? - add identifiers for searched records.

We need to rebuild the table so that each element's index is calculated from its value. Now we add Sasha's record not to an arbitrary cell, but to a cell whose index is calculated from his name. Later, knowing his name (the key), we can recalculate that index and get the value immediately.

This conversion of the key "Sasha" into a numeric identifier is performed by a Hash Function.
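A toy hash function makes the key-to-index step concrete (the character-code sum and the phone number below are made up for illustration; real hash functions are far more sophisticated):

```python
# Toy hash function: sum the character codes of the key,
# then bound the result by the table size (the Boundedness property).
def hash_index(key: str, table_size: int) -> int:
    return sum(ord(ch) for ch in key) % table_size

table = [None] * 10
index = hash_index("Sasha", 10)   # same key always yields the same index
table[index] = "+1-555-0100"      # hypothetical phone number
```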

When two different keys produce the same identifier after the hash function transformation, it's called a Collision.

There are several methods to resolve collisions:

  • Open Addressing Method (save record in the next free cell, following a specific algorithm called Probing)
  • Chaining Method (the key "Sasha" passes through the Hash Function to get an identifier and we find the corresponding table cell, but the cell stores not Sasha's information itself, rather a reference to it in a separate memory area; if the cell is already occupied, the new record is attached to the existing element's chain of references)
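A common form of chaining keeps a small list of entries in each cell; a minimal sketch (the toy character-code hash and the phone numbers are illustrative assumptions):

```python
# Chaining sketch: each table cell holds a list (chain) of (key, value) pairs,
# so colliding keys simply extend the same cell's chain.
class ChainedHashTable:
    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]

    def _index(self, key: str) -> int:
        # Toy hash: character-code sum bounded by table size.
        return sum(ord(ch) for ch in key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: update in place
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # empty cell or collision: extend the chain

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None

phones = ChainedHashTable()
phones.put("Sasha", "+1-555-0100")   # hypothetical numbers
phones.put("Masha", "+1-555-0101")
```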

Overflow and deletion problems in Hash Tables are solved by Rehashing.


-------- Good Hash Function --------

A good hash function must have 4 properties:

  • Determinism - for the same key value, Hash Function must always return the same value
  • Uniformity - table data should be distributed evenly, avoiding situations where many keys produce the same index (fewer collisions)
  • Efficiency - hash function should compute quickly
  • Boundedness - indexes produced by hash function must be within table limits (to store values in the table)


Egor Zdioruc | Lead Full Stack Developer | Laravel & AI Solutions