PrevNext
Rare
 0/10

Persistent Data Structures

Authors: Andi Qu, Benjamin Qi

What if data structures could time travel?

Overview

A persistent data structure is a data structure that preserves the previous versions of itself when modified, allowing access to any historical version. In other words, once a change is made to the structure, both the original and modified versions remain accessible. This is particularly useful in scenarios where you need to keep track of the history of updates or backtrack to previous states of the data structure.

Persistent Array

Persistent arrays are one of the simplest persistent data structures. A persistent array should be able to access and update its elements at given times.

Fat Nodes

C++

In C++, one can implement this to run in O(logN)\mathcal O(\log N) time per query and O(1)\mathcal O(1) time per update by using an array of vectors.

vector<pair<int, int>> arr[100001]; // The persistent array
int get_item(int index, int time) {
// Gets the array item at a given index and time
auto ub =
upper_bound(arr[index].begin(), arr[index].end(), make_pair(time, INT_MAX));
return prev(ub)->second;
}
void update_item(int index, int value, int time) {

This approach (i.e. storing multiple values at each index without erasing old values) is known as fat nodes.

Although easy to implement, fat nodes are only partially persistent, meaning that only the latest version of the data structure can be modified.

For most competitive programming problems involving persistent data structures, we use path copying instead.

Path Copying

One can implement path copying to run in O(logN)\mathcal O(\log N) time per query and update by using a binary-tree-like structure where array elements are the leaves.

This is very similar to a sparse segment tree. The key differences are that we have multiple roots and every time we "update" a node, we actually create a new node in its place (hence persistence).

C++

struct Node {
int val;
Node *l, *r;
Node(ll x) : val(x), l(nullptr), r(nullptr) {}
Node(Node *ll, Node *rr) : val(0), l(ll), r(rr) {}
};
int n, a[100001]; // The initial array and its size
Node *roots[100001]; // The persistent array's roots

Path copying is fully persistent.

Persistent Segment Tree

Since persistent arrays with path copying are so similar to sparse segment trees, it's relatively straightforward to convert one into a persistent segment tree - just add range queries!

Focus Problem – try your best to solve this problem before continuing!

Resources

Resources
oml1111
Anudeep2011

not great formatting

This section is not complete.

Any help would be appreciated! Just submit a Pull Request on GitHub.

Solution

Since this problem involves range queries, we'll use some type of segment tree to solve it. (We can also use a Fenwick tree, but that's much harder to make persistent.)

When dealing with problems involving multiple dimensions, it's often helpful to view one of those dimensions as time. In this problem, we'll view the index of each array as its time dimension.

Using a persistent segment tree, we can then turn the problem into the following:

  • Type 1 queries involve a point update on the segment tree at some time.
  • Type 2 queries involve a range query on the segment tree at some time.
  • Type 3 queries involve copying the root of the segment tree at some time and appending it to the array of segment tree roots.

C++

#include <bits/stdc++.h>
typedef long long ll;
using namespace std;
struct Node {
ll val;
Node *l, *r;
Node(ll x) : val(x), l(nullptr), r(nullptr) {}
Node(Node *ll, Node *rr) {

Application 1 - Static 2D Range Sums on Large Grids

Persistent segment trees can be used for online 2D static range sum queries in O(logN)\mathcal O(\log N) time (think of it like prefix sums).

Note that 2D Fenwick trees with coordinate compression often also work for this (and are easier to implement), but it's still good to know this application.

Application 2 - Largest Interval Completely Inside a Range

Consider the following problem:

Given NN intervals on the number line, answer QQ queries of the form "what is the largest interval completely contained inside the range [x,y][x, y]?"

N,Q105N, Q \leq 10^5.

Since each interval has two dimensions (i.e. left and right endpoints lil_i and rir_i), we can view it as a point on the number line at lil_i with "value" rilir_i - l_i that was inserted at time rir_i.

Now, each query becomes "what is the most valuable point in the range [x,y][x, y] that was inserted at or before time yy?" This is much easier to handle, so we can solve this problem in O(QlogN)\mathcal O(Q \log N) time.

Problems

StatusSourceProblem NameDifficultyTags
CFEasy
Show TagsPersistent Segtree
COCINormal
Show TagsPersistent Segtree
NOINormal
Show TagsPersistent Segtree
CEOIHard
Show TagsPersistent Segtree, Sqrt
APIOHard
Show Tags2DRQ, Euler's Formula, Persistent Segtree
COCIVery Hard
Show TagsPersistent Segtree
IOIVery Hard
Show Tags2DRQ, Persistent Segtree

Persistent Heap

Resources
Benq
StatusSourceProblem NameDifficultyTags
Wesley's Anger ContestVery Hard
YSVery Hard

Module Progress:

Join the USACO Forum!

Stuck on a problem, or don't understand a module? Join the USACO Forum and get help from other competitive programmers!

PrevNext