INTRODUCTION
If you’re like me, you’ve most likely considered how best to arrange your wardrobe to easily retrieve a favourite wear when you have under one hour to get ready, or how best to avoid turning out your entire wardrobe as a heap on the floor just to reach your ideal combination for the day. This same idea appears in the world of computers and computation where the items to be stored are data and the usage is for computational processes to achieve a particular end or solve a particular problem.
Following the analogy above, it becomes pertinent to consider how data are stored for efficient retrieval and usage in computational processes, which is the very foundation of data structures.
In the same vein, algorithms are step-by-step rules to be followed to solve a particular problem or reach a particular goal. In light of the above analogy, it would refer to a step-by-step procedure I put in place to arrange and access particular clothes in my wardrobe to avoid turning everything into a heap on the floor, especially when I get to add more clothes, say after washing.
Data structures and algorithms are about finding efficient ways to store and retrieve data, to perform operations on data to solve a specific problem or perform a computation. While data structures are the different ways to organize and store data in a computer, algorithms are the operations you can perform on different data structures.
The idea of Data Structures and Algorithms implies three things fundamentally as building blocks:
Data: these are raw pieces of information that consist of basic facts and figures stored on a computer, such as numerical data, texts, and images;
Structures: the arrangement of and relation between parts in a definite pattern of organization. (Webster)
Algorithms: a step-by-step set of rules to be followed in calculations or other problem-solving operations, especially by a computer. (Oxford)
IMPORTANCE of DSA
Just as the storage space I have, and the speed in retrieving clothes from my wardrobe, become a concern as I add more and more clothes, efficiency in computational processes become more of a concern as the data inputs increase.
DSA are therefore crucial for optimizing memory usage and increasing speed of code execution.
A classic highlight is where a product evolves from serving a small user base to millions or billions of users. While searching for a user’s data stored in an unsuited data structure with a slow algorithm might work for 100 users’ records, but for a billion users, the inefficiency becomes apparent; taking more time than anticipated, or consuming more memory space than desired, which culminates in a poor user experience.
DSA Ensures:
Efficiency: not just a program that works (effectiveness), but in addition optimizes the resources (time and memory space) used to achieve a task;
Optimal Resource Utilization: efficient use of computational resources like CPU, memory, and storage;
Scalability and Concurrency: ensuring the system can handle increasing user demands, and process requests concurrently without slowing down
Reliability: poor choices in DSA can result in system crashes when input data (users’ requests) spikes.
DATA STRUCTURES
Data Structures can be classified by their characteristics as:
Linear DS: where elements are arranged in one dimension, such as arrays, lists, stacks, queues, etc.;
Non-Linear DS: where elements are arranged in one-many, many-one, and many-many dimensions, such as tree, graph, etc.;
Basic Data Structures
- ARRAY:
An array is a collection of elements/items stored in a contiguous (non-breaking) memory space.
From GeeksforGeeks - an array of characters in memory
Since the elements are located together, in a memory sequence, elements of an array can be accessed randomly at the same (constant) speed using the offset of each memory space. So in the image below, I can access the element 9 as quickly as the first element simply by accessing the offset of each memory space, which is at the offset of 12 bytes, or the 3rd index (arrays are 0-based indexed in many programming languages.)
From SubMain Software - an illustration of each item in an array, at a progressive offset of each memory size
However, because arrays use a chunk of memory to hold elements in a non-breaking sequence, other variables can be stored right after the chunk allocated to a particular array, thereby making the array locked in, and fixed in size. An attempt to add to the array creates a larger array in another memory location and copies all the elements of the original array into the larger array, one at a time, which can be an expensive operation. Similarly, to delete from an array is to create an empty memory slot within the contiguous memory space, then shift every other item to fill the space.
So, while an array DS might be great for use cases where we need to randomly access elements, it is not great where items will be frequently added or deleted.
- LINKED LISTS:
A linked list consists of a group of nodes in sequence, where each node holds two data: one is the value, and the other is the pointer to the address of the next element in the list. Each node points to or references the next element, hence the name-linked list.
By Harsh on Medium - an illustration of a linked list with each node containing the data and address to the next node.
Unlike an array that requires contiguous (non-breaking) memory space, a linked list does not require a contiguous memory allocation because the reference to the next node is contained in the previous node of a linked list. Therefore, each element can be stored at a different memory location, which allows size flexibility that can grow or shrink as required.
However, unlike an array where you can randomly access elements at the same speed (in constant time), accessing a random item in a linked list requires traversing all the nodes until you find the target element, hence the preference of an array for accessing a random element, and a linked list for operations that frequently require addition and removal.
Variations of the Linked List DS are singly linked list, doubly linked list, and circular linked list. Nodes in a singly linked list have only one pointer to the next element and the last node points to null, while a doubly linked list has a pointer to the next element and the previous element. In a circular linked list, the last node points back to the first node, forming a circle. So unlike a linear linked list that has a definite start and end, a circular linked list does not have a null reference at the end.
Practical use cases of linked lists include image viewer, where a doubly linked list allows viewing the next and previous images efficiently.
- STACKS:
A stack is an abstract data structure that can be implemented using an array or linked list. It is a linear DS in which operations are performed in a particular order on a set of data — just like a stack of plates, the last element inserted comes out first. Hence the alias LIFO — Last In First Out.
From GeeksforGeeks - an illustration of a stack, with push and pop operations
Common operations performed on a stack are push, pop, peek, size, and isEmpty. Push adds an element to the top of the stack, pop removes and returns the last inserted element at the top of the stack, peek returns the element at the top of the stack without removing it, size returns the total number of elements in a stack, and isEmpty checks if the stack is empty or otherwise.
A practical use case of a stack is the undo/redo operations where the latest action added to the stack of executed actions is removed or added.
- QUEUES:
A queue DS, like a stack, is an abstract DS that can be implemented using array or linked list. It is also a linear DS used for storing and managing data in a specific order — on a “First in First Out” (FIFO) principle, where the first element to be added to the queue is the first to be removed; just like your regular queue at the bank or in a mall.
By Harsh on Medium -an illustration of a queue, with dequeue and enqueue operations
Common operations on a queue are enqueue, dequeue, peek, isEmpty, isFull. Enqueue adds an element to the end of the queue, dequeue removes an element from the front of the queue, peek retrieves the first element without removing it from the queue, isEmpty checks if the queue is empty, and isFull checks if the queue is full.
- HASH TABLES:
An hash table is fundamentally an array of linked lists or arrays, designed for quick and efficient data storage and retrieval. It combines the merits of linked lists and arrays to enable efficient searching, addition, and deletion of data, even for large amounts of data.
Hash tables are data structures that map keys to values using a hash function. With a hash table, to get the location (called a bucket) in which to store or retrieve an element, the value is passed into the hash function which returns a hash code, which is the index or bucket in the hash table where the element should be stored in, retrieved, or removed from.
From Java point - an illustration of a hash table
This DS is not without its dip as there can be collisions — where the hash function returns the same hash code for two or more values/keys.
One way to solve the challenge of collision is through chaining, where each element at that index (otherwise called a bucket) is implemented as a linked list, that is, a node containing a value and a pointer to the next node assigned to that particular bucket. Another resolution technique is the open addressing strategy to find an alternate location for the colliding elements within the hash table, using techniques like: Linear probing, Quadratic probing, and Double hashing.
A quick observation from the chaining resolution technique is that it can become an absolute linked list, as a worst-case scenario, where every element is given the same hash code.
Practical applications of hash tables are caching for quick access to frequently used data; and databases for indexing and retrieving records.
TREES: Trees are non-linear data structures that allow hierarchical arrangements and parent-child relationships between data nodes. While the top node is called the root node, nodes with the same parent are called sibling nodes, then nodes without children are called leave nodes. A tree data structure is a hierarchical structure of nodes connected by edges, which show the relationship between one node (the parent) and another (the child).
GRAPHS: A graph is a data structure that consists of a set of nodes (also called vertices) and a set of edges that connect pairs of nodes. It is primarily used to represent relationships or connections between data items.
As a key feature, a graph can be directed or undirected. In a directed graph, the edges are unidirectional, depicting a one-way relationship, while in an undirected graph, the edges are bi-directional. Also, a graph may be weighted or otherwise, that is an attached cost to that connection, the edges merely represent the existence of a connection.
NOTE ⇒ You will observe that each DS has unique characteristics that makes it suitable for a particular task or otherwise. For instance, an Array is more suitable for operations like randomly accessing an element by its index but not for insertion or deletion, for which a Linked List will be preferred. Similarly, stacks and queues are to be considered where order matters, while Trees and Graphs are most suitable for modelling relationships between data points.
Your ability to choose or design the most suited data structure for a particular task can be the difference between a solution that’s functional and efficient and one that isn’t. (Roadmap.sh).
Aside from these basic data structures, other data structures including Tries, Advanced Lists, Binary Indexed Tree, etc.
In the sequel to this article, we will explore Algorithms and walk through a code implementation of some common algorithms.