Data Structure Notes

Classification of Data structure

  • Linear Data Structures: Arrays, Queue, Stack
    • Static: Fixed size
    • Dynamic: unfixed size
  • Non-Linear Data strcutures: Tree, Graph

Types of Data Strcutures with Related Algorithms:

Array DS

String

  • Trie Data Structure (Prefix Tree)
  • Manacher’s Algorithm
    • Used for palindrome related problems, generally resulting in O(n) runtime.
    • If left portion of a palindromic substring has already been explored, then no need to re-explore right portion of same palindromic substring as this is going to be mirror of already explored left portion.
  • TODO: Knuth–Morris–Pratt (KMP) algorithm for searching substring in string in O(n) time Longest proper prefix/suffix table is created using given string pattern to avoid comparison repetition.
String Problems

Heap

A complete binary tree data strcuture, where value of parent node >= children node. The comparison of values can be based on any kind of comparator.

  • Max heap: Value of parent node >= value of children node
  • Min heap: Value of parent node <= value of children node
  • Since it’s a complete binary tree, heap can be easily implemented on simple arrays, because zero based index of left and right children of any node having index \( i \) can be calculated as \( 2i+1 \) and \( 2i+2 \)
  • heapify() is the main function used in heaps, this is usually a recursive function

Stack

  • Supports push, pop, peek operations
  • Java has LinkedList<Object> class to perform stack operation. push, pop, and peek methods can be used to perform stack operations
Monotonic Stack
  • The stack stores element in sorted order (increasing/decreasing).
  • If any element to be pushed breaks the sorted order of stack, the stack is continuously popped until the element to be pushed no longer breaks the sorted order of the stack.
  • Use cases:
    • To determine the next/previouis smaller/larger element for each element in array. This can be achieved in O(n) operations using monotonic stack.
    • To determine the Minimum/Maximum of next/previous N elements for each element of array

Priority Queue

  • supports Queue/Deque (or Offer/Poll). Both have time complexity \( O\left(\log n\right) \).
  • Maintains elements in sorted form at all time, whenever deque is performed, the first element of sorted list is returned.
  • Access of sorted elements using index is not supported. For eg. accessing Kth smallest/largest element in the Priority Queue is not possible without deque-ing the first K-1 smallest/largest elements from queue.
  • Usually Implemented using Heaps.
  • Java has PriorityQueue<T> implemented in java.util package.
Indexed Priority Queue (IPQ)
  • Access of sorted elements using index is possible.
  • During implementation of IPQ, Indexes of elements inside IPQ are tracked manually, whenever any element is queued or dequeued, the indexes are updated.
  • Java doesn’t have built-in IPQ data structure. It has to be implemented manually (can be done using heap).
Removing objects from Prioryt Queue in O(logn) time
  • The remove(object) method removes the matching object from Priority queue, This typically executes in O(n) time
  • Changing the value of objects stored in priority queue with the help of some reference variable does NOT automatically re-arranges the priority queue.
  • To maintain the sorted order of objects in Priority queue while updating the value, we must first remove the older object value from PQ, then add the new updated value.
  • There are multiple workarounds to avoid performing remove operation and spending O(n) time:

Method 1: Add a boolean property named “obsolete” to each of the object. If an object has to be removed, instead of calling remove() method, just set the obsolete property to true. This way, whenever poll() or peek() is performed on priority queue, if the found object is set as obsolete, we poll that object from priority queue.

Method 2: If adding a new property to the object/class is not possible, then a Hash Set can be used to store the obsolete property of the object

Using this workaround avoids the call to remove() method hence increasing the time efficiency. However downside of such workaround is that it takes more space to store an additional property for each object. And since now the priority queue can also contain the obsolete objects along with new updated objects, this also adds up to the space complexity.

Method 3: Use Ordered Set, which works similar to priority queue, however ordered set can’t have multiple objects with equal values.

Graph

  • Collection of nodes connected together with edges.
    • Directed Graph: Direction of edges matter.
    • Undirected or Bidirected Graph: Direction of edges doesn’t matter.
    • Cyclic Graph: Graph that contains cycles (loops).
    • Acyclic Graph: Graph doesn’t contain cycles.
    • Direct Acyclic Graph: A directed graph which does not contain cycles when traversing through edges respecting the direction of edges. The graph however may contain cycles when direction of edges is not considered.
    • Connected Graph: All nodes are connected, there is only one group, every node is reachable from every other node in the graph.
    • Disconnected Graph: All nodes are not connected, there can be multiple groups of nodes.
    • Weighted graph: Each edge is associated with some value/cost.
    • Complete Graph: A graph in which each pair of vertices are connected with a single edge. (Every possible edge exists). A complete graph with \( n \) vertices has \[ \frac{n\left(n-1\right)}{2} \] edges.
    • Diameter of Graph: The greatest minimum distance between any pair of vertices in a graph. In other words, two find diameter of graph -> find shortest distance between each pair of vertices of graph -> the maximum of those distances is the diameter of graph.
    • Bipartite Graph: A graph in which nodes can be divided into two groups such that every edge connecting the nodes are also divided into to parts. I.e, graph is bipartite if it’s possible to divide graph in two groups, such that for each edge in the graph, the two endpoints of the edge lie in different groups.
      • All Acyclic graph are Bipartite graph
      • A cyclic graph is Bipartite if, each cycle of the graph has even length (even number of edges).
    • Isomorphic Graphs: Two graphs are Isomorphic if both of them are structurally the same. Position of nodes doesn’t matter, as long as structure of two graphs are same.
    • In-degree of a node: The number of incoming edges attached to the node.
    • Out-degree of a node: The number of outgoing edges attached to the node.
Graph Algorithms
  • Union Find algorithm: Also called Disjoint Set Union (DSU) algorithm
    • Used to find groups in graph. Each group is represented by it’s representative node.
    • Implementation contains two functions union() and find()
Topological Sort
Topological Sort
  • Topological Sorting of graph: Sorting the nodes of graph in a way such that, when nodes are placed linearly one after another, the direction of all edges are from left to right.
    • A single graph can be topologically sorted in multiple ways. (non-unique solution)
    • Use case: Program build dependency, Course prerequisite completion
    • Topological Sort Algorithms:
  • Single source shortest path (SSSP): For any given node, find shortest path/distance of all other nodes from given node.
    • Used with weighted direct acyclic graphs (DAG)
    • TODO: SSSP algorithms: Dijkstra, bellman ford, floyd warshall
  • Minimum Spanning Tree:
    • TODO: Krukshal algorithm, Prim’s algorithm
Graph Problems
  1. Check if graph is bipartite: Can be achived by checking if graph is acyclic. In case if graph is found to be cyclic, making sure that each cycle has even number of edges. This can be achived by DFS as well as BFS
  2. Find Center or Diameter of acyclic undirected graph (Tree): Center of undirected acyclic graph (Tree) is the center node which falls under the longest path in the Tree (undirected acyclic graph). The longest path is called diameter of the Tree.
  • TODO: Check if two graphs are isomorphic.

Tree

Trees are Acyclic connected Graph where any node of the graph can be represented as Root of the tree.

Rooted Tree: Directed acyclic connected graph, where direction of edges start from root of tree to the end/leaf node of tree.

Rooted Tree Problems
  • Find diameter of rooted tree. O(n). (Link)
  • TODO: Isomorphic trees
  • TODO: encoding and decoding Trees

Segment Tree

  • Binary (usually) tree based data structure used to store information of ranges as well as perform operations on the ranges of the arrays. Any array can be broken into multiple segments/ranges and information of these ranges can be stored in a binary tree where each node accounts for storing information of some range of the array in binary tree fashion.
  • A simple and general implementation of segment tree is used for querying sum/min/max of range of array in O(logN) and do the point updates of values of array in O(logN) time.
  • Segment tree works with ranges which are additive in nature, i.e. if \( f\left(a,b\right) \) is value of range \( a \) to \( b\), and \( f\left(c,d\right) \) is value of range \( c \) to \( d \), then \( f\left(a,d\right)=f\left(a,b\right)+f\left(c,d\right) \) should satisfy.
  • Segment tree is better when frequent query as well as updates are performed. Otherwise prefix sum can be used just to get range sum in constant time.
  • Segment tree are also used to calculate GCD/LCM of range of arrays.
Implementation of Segment Tree
Lazy Propagation in Segment Tree

Lazy propagation technique is used when we want to perform range updates on the the values of the array in O(LogN) operations.

Fenwick Tree (Binary Indexed Tree / BIT)

  • Fenwick Tree, also called Binary Indexed Tree (BIT) are used to store information of ranges of arrays as well as perform operations on those ranges of the arrays.
  • Fenwick trees work with the ranges where value of range can be deduced by subtracting two values of bigger ranges. I.e. if \( f\left(a,b\right) \) is value of range \( a \) to \( b\), and \( f\left(a,c\right) \) is value of range \( a \) to \( c \) then, the value of range b to c can be calculated based on the formula \( f\left(b,c\right)=f\left(a,c\right)-f\left(a,b\right) \)
  • Common use of Fenwick tree is range sum queries with point updates on the array. However Range minimum/maximum query (RMQ) can’t be implemented since information stored in ranges for these kind of queries don’t have subtractive nature.
  • General implementation of Fenwick tree works with 1 based indexes, however slight modification can be made to make it work with 0 based indexes.

A nice tutorial video on Fenwick Trees.

Sorted Set (Tree Set)

Maintains key -> value pair sorted by keys

supports operations like firtKey(), lastKey(), ceilingKey(), floorKey(), update/put/get

usually implemented using red-black tree (TODO)

Dynamic Programming with multiple state variables

Note: This is NOT multi-dimentional DP

In Multi-state DP problems, the solution of a problem in one state is derived from the solution of subproblem in another state. For eg, to find the maximum profit made by selling a stock at day \( d \) by trading stock can be derived by knowing the maximum avialable balance present after buying the stock in previous days, here buying and selling are two states of the problem.

Multi-State DP ProblemSolution
123. Best Time to Buy and Sell Stock IIIUsing 2-state DP
By partitioning array in two halfs

Boyer Moore’s Majority Voting Algorithm

Boyer Moore’s Majority Voting algorithm is used to find the majority element. A majority element is an element in array which repeats more than half the length of the array.

ProblemSolution
Majority ElementO(1) space, O(n) time
Majority Element IIO(1) space, O(n) time