Toll Roads

From programming_contest
Revision as of 07:44, 14 February 2023 by Kmk21 (talk | contribs)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

This problem defines a "toll" to traverse a road, and asks of a road network multiple queries of what is the minimum maximum toll to be paid to connect two cities, and how mayn cities can be connected with that toll.

The first observation (other than this is probably a graph problem), is that the problem space is too large for something like all pairs shortest path. It should also be apparent that even if we somehow were able to do dijstra, with 10^5 queries, its unlikely that we could do any serious processing. It is likely thus that:

  1. We'll have to process the input into some data structure
  2. There is likely a log in the runtime

The first intuition is to consider, isntead of going query by query, or city by city, what if we tried to process roads by toll cost? How would we, for instance, be able to generate all groupings of cities one could reach with weight 3? Naturally, one would go through each toll of length 3 and do a union-find to create the groupings of connected cities. How, then, would we extend this to tolls of weight 4, such that we might be able to do something useful? If we consider a toll of weight 4 might link up some of these weight 3 groups, we might see that using least common ancestor among pairs of nodes in this new super-group would tell us whether we have to traverse the "4" toll node. We can use this to form an algorithm outline:

  1. Order the tolls by weight
  2. For each weight, identify the groups of the two cities that are being joined via "find"
    1. If the tolls joining both the groups are the same weight as the new toll, join them
    2. If the tolls are not the same, create a new meta-node with the weight of the toll, and make the previous meta-nodes (or the city node itself) children of this meta-node
  3. When we are complete, we have a tree with the following properties
    • every meta-node has a toll-weight strictly greater than that of all its decendents
    • The actual city nodes are leaf nodes
    • The LCA defines the highest node in the tree, and thus highest toll, we must traverse to bridge two cities
    • The number of cities reachable from that node is the size of the subtree rooted at that LCA node

Aside from the LCA, this operation runs in NlogN time, which is sufficient for the size. And if we had the same speed from LCA, it would also suffice.

The traditional LCA algorithm involves walking from the two candidate nodes to the vertex, caching their ancestry, and finding the point at which it diverges. This is sufficient in balanced trees, as the length of the ancestry is logN. However, in this case, we have a worst-case scenario with a tree-depth O(N). This results from the merging of trees with differing tolls, which does not adhere to the union-find paradigm of join smaller to larger. This compromise is, of course, necessary, to get us in a situation where we have the necessary properties to utilize LCA in the first place.

Fortunately, there is a way to compute LCA in non-balanced trees using a form of binary search, called binary lifting. It comprises of two things:

Identifying if u is and ancestor of v

Without any pre-processing, we would have to search from u to find if v was a descendent, which is too slow. It is also too slow to cache ones entire ancestray at each node. If this were a binary search tree, we could do this easily by caching the lowest and highest values rooted at a given node. But we don't have a BST. We can, however, overlay one. We can perform a DFS traversal of the tree, and maintain a global order of nodes visited. When we enter and exit a node, we can cache the current counter representing the global order. Therefore, we can determine if a node is a decendent of us if their node value is between the values cached on our own entry and exit. Therefore, is u an ancestor of v is translated to "is v a descendent of u," which with this extra data, is a trivial check.

Jumping Around the Tree

As we are performing the DFS, we will build a pointer to each node that is a power of 2 above us. So each node will have a 1-ancestor, 2-ancestor, 4-, 8-, 16-, and so on. This is extra logN per node, so allowable. We can then binary search our ancestry as follows:

  1. check our furtherst N-ancestor to see if it meets the condition
    1. If it does, update current node to that N-ancestor
    2. If it does not, remain at current node
  2. update N to N/2, and repeat

Using this method, we can binary search through our ancetry to find a node which meets some condition. This is known as binary lifting. In this case, the condition we seek to meet is whether the ancestor is ALSO an ancestor of the other node. This gives an overall LCA algorithm as follows

  1. Check if u is an ancestor of V, or v is an ancestor of u. If so, mission accomplished
  2. Binary lift from u until a node which is an ancestor of v is found. mission accomplsihed

Since the binary lift is logN time, and the check for ancestry is constant, the overall query for LCA takes logN time, even in the unbalanced tree.

The last piece is to count the nodes rooted at that LCA, which can be construed from the two indices we cached to identify ancestry.

All phases are NlogN, and thus the algorithm is fast enough.