Bug 81806 - Split in pbds works in O(n) instead of O(log n)
Summary: Split in pbds works in O(n) instead of O(log n)
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++ (show other bugs)
Version: 6.4.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-08-10 17:37 UTC by Oleksandr Kulkov
Modified: 2020-01-08 19:56 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2019-08-27 00:00:00


Attachments
patch optimizing pb_ds tree split, RFC (1.37 KB, patch)
2020-01-08 19:25 UTC, Xi Ruoyao
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Oleksandr Kulkov 2017-08-10 17:37:46 UTC
In the end of the split in policy based data structures extension function split finish is called (https://code.woboq.org/gcc/libstdc++-v3/include/ext/pb_ds/detail/bin_search_tree_/split_join_fn_imps.hpp.html#__gnu_pbds::detail::PB_DS_BIN_TREE_NAME::split_finish) which works in O(n) due to call of std::distance of two iterators. In the official documentation it is said to be O(log n) though. This problem can be resolved by keeping subtree sizes in metadata like it is done in tree_order_statistics_node_update. Is it possible to fix the bug?
Comment 1 Jonathan Wakely 2019-07-01 08:20:09 UTC
Do you actually use these containers?

Feedback on https://gcc.gnu.org/ml/libstdc++/2019-05/msg00107.html would be useful.
Comment 2 Jonathan Wakely 2019-07-10 09:19:18 UTC
Feedback was provided: https://gcc.gnu.org/ml/gcc/2019-07/msg00082.html
Comment 3 Xi Ruoyao 2019-08-26 06:08:02 UTC
(In reply to Oleksandr Kulkov from comment #0)
> In the end of the split in policy based data structures extension function
> split finish is called
> (https://code.woboq.org/gcc/libstdc++-v3/include/ext/pb_ds/detail/
> bin_search_tree_/split_join_fn_imps.hpp.html#__gnu_pbds::detail::
> PB_DS_BIN_TREE_NAME::split_finish) which works in O(n) due to call of
> std::distance of two iterators. In the official documentation it is said to
> be O(log n) though. This problem can be resolved by keeping subtree sizes in
> metadata like it is done in tree_order_statistics_node_update. Is it
> possible to fix the bug?

Hi Zlobober,

I remember you because I performed very badly in two contests prepared by you :).

I can't see any way to fix it w/o maintenance of subtree sizes.  We can do that but then...

1.  Is it worthy?  But I think WE can answer "YES" because there are no guys other than competitive programmers use it!  So maybe we should ask: would 
Jonathan (and libstdc++ team) accept such a "fix"?
2.  If we maintain subtree sizes for all pb_ds BBSTs should we remove tree_order_statistics_node_update, or make it a nop?
3.  What if pb_ds is immediately removed after we fix it? (I remember one of my Cilk fix, which is removed before the next release and never backported so I just effectively wasted some time.)
4.  Even if the fix is done in GCC 10, the onsite contestants have to wait for
additional several years because Ubuntu used in contest won't use latest GCC.
And for online contestants I think the better way is to copy a BBST from the
personal code library.
Comment 4 Oleksandr Kulkov 2019-08-26 11:29:06 UTC
Hi. I'm not Zlobober, I'm adamant.

1. At least, Jonathan suggested to start with fixing this in https://gcc.gnu.org/ml/libstdc++/2019-07/msg00066.html , so it doesn't seem hopeless for now
2. I'm not sure one really needs to remove it or make it a nop. Because order_statistics_node_update has order_of_key and find_by_order functions which are not necessary in other policy tags
3. Well, Jonathan dropped his suggestion to deprecate pbds in https://gcc.gnu.org/ml/libstdc++/2019-07/msg00071.html so probably we should presume pb_ds won't be immediately removed after fix
4. Even so is better than having it non-fixed or even deprecated forever
Comment 5 Oleksandr Kulkov 2019-08-26 11:29:46 UTC
Hi. I'm not Zlobober, I'm adamant.

1. At least, Jonathan suggested to start with fixing this in https://gcc.gnu.org/ml/libstdc++/2019-07/msg00066.html , so it doesn't seem hopeless for now
2. I'm not sure one really needs to remove it or make it a nop. Because order_statistics_node_update has order_of_key and find_by_order functions which are not necessary in other policy tags
3. Well, Jonathan dropped his suggestion to deprecate pbds in https://gcc.gnu.org/ml/libstdc++/2019-07/msg00071.html so probably we should presume pb_ds won't be immediately removed after fix
4. Even so is better than having it non-fixed or even deprecated forever
Comment 6 Jonathan Wakely 2019-08-26 13:58:50 UTC
(In reply to Oleksandr Kulkov from comment #5)
> 1. At least, Jonathan suggested to start with fixing this in
> https://gcc.gnu.org/ml/libstdc++/2019-07/msg00066.html , so it doesn't seem
> hopeless for now

I think ABI breaks in pb_ds are OK. If necessary we can use an inline namespace to mangle the fixed types differently.

> 4. Even so is better than having it non-fixed or even deprecated forever

Right. If you only care about the Ubuntu version used in contests, that already has pb_ds, so you wouldn't care if we removed it upstream, right? ;-)
Comment 7 Xi Ruoyao 2019-08-26 14:52:51 UTC
(In reply to Jonathan Wakely from comment #6)
> (In reply to Oleksandr Kulkov from comment #5)
> > 1. At least, Jonathan suggested to start with fixing this in
> > https://gcc.gnu.org/ml/libstdc++/2019-07/msg00066.html , so it doesn't seem
> > hopeless for now
> 
> I think ABI breaks in pb_ds are OK. If necessary we can use an inline
> namespace to mangle the fixed types differently.

Oh I didn't expect an ABI breakage.  I thought "pb_ds is a source code library so there is no ABI in libstdc++.so".  Now I understand that we should not break old third-party libraries compiled with the old pb_ds headers (is there any? :)

> > 4. Even so is better than having it non-fixed or even deprecated forever
> 
> Right. If you only care about the Ubuntu version used in contests, that
> already has pb_ds, so you wouldn't care if we removed it upstream, right? ;-)

I don't only care about the Ubuntu in contests (actually I'm retired so I can only participant in online contests now).  If we want to keep pb_ds instead of deprecate it, this PR have to be fixed.

One LUG co-leader in our university is also an experienced competitive programmer.  I've discussed this issue with him.  Hopefully we can collaborate to make a fix.

Something may be off-topic:  ICPC is kind enough to use the latest Ubuntu LTS.  One local contest I've participated in was using Windoge and an ancient MinGW.  So even std::string didn't work :(.
Comment 8 Jonathan Wakely 2019-08-27 10:36:20 UTC
(In reply to Xi Ruoyao from comment #7)
> Oh I didn't expect an ABI breakage.  I thought "pb_ds is a source code
> library so there is no ABI in libstdc++.so".  Now I understand that we
> should not break old third-party libraries compiled with the old pb_ds
> headers (is there any? :)

The libstdc++ ABI is larger than the set of symbols exported from libstdc++.so

std::vector is defined entirely in headers, not exported from libstdc++.so, but we can't just change its layout. That would make it impossible to link objects compiled with different versions of GCC because they would disagree on the layout of std::vector.

So this isn't about third-party libraries, it's about all code that uses pb_ds. If you compile foo.o with one version of GCC and bar.o with another version of GCC, then you should be able to link them together.
Comment 9 Jonathan Wakely 2019-08-27 10:37:51 UTC
**But**, as I said, I think it's OK to change the pb_ds types. We can use inline namespaces to change the mangled names of the affected types, and the user base of pb_ds is probably so tiny that it won't actually hurt anybody.

Changing std::vector would not be OK, because everybody uses it.
Comment 10 Raihat Zaman Neloy 2020-01-08 07:31:19 UTC
Hello all,
I am also a user of pb_ds and recently dig into the split function's o(n) behaviour. A lot of contestants from my country (Bangladesh) uses these DS. If the split function behaves correctly (logn complexity), this DS's will have been vastly used for query processing. Please consider this before deprecating the pb_ds. It helps contestants a lot to code and solve quicker.
Comment 11 Jonathan Wakely 2020-01-08 11:32:12 UTC
Again, GCC's purpose is not to help people in programming contests.
Comment 12 Xi Ruoyao 2020-01-08 19:25:13 UTC
Created attachment 47615 [details]
patch optimizing pb_ds tree split, RFC

I made up a patch but I doubt if this is really useful in competitive programming.  (I just tried to find a problem to test the split functionality but I couldn't find one.)

(1) Even with this patch, splitting with rb_tree_tag is still stupidly slow.  pb_ds documentation says it is "poly-logarithmic complexity".  It's too loose - maybe $O(log n)^3$?  So the only useful case is with splay_tree_tag.

(2) It's impossible to implement any "batch updating" with pb_ds.  For example, considering a problem where we would maintain an integer set $S$, supporting several operations on it:

add a: $S \leftarrow \{x \in S | x + a\}$
query b: output $\min_{x \in S} |b - x|$
rm c: remove $c$ from $S$, if $c \in S$
ins d: insert $d$ into $S$

This is a splay problem, but requiring batch updating and lazy evaluation.  Unfortunately we just can't do it with pb_ds.  I believe this "defect" directly makes split/join useless.

Then should we add more functionality into pb_ds?  I think Jonathan won't agree since he is absolutely correct that GCC is not to help people in programming contests.  But if we not, pb_ds won't be very useful.  Only very naive splay problems can be solved with it.  In real contests my friend wang9897 can manually code a splay tree before I finish up all the tree_order_statistics_node_update stuff, for those naive problems.
Comment 13 Raihat Zaman Neloy 2020-01-08 19:56:07 UTC
(In reply to Xi Ruoyao from comment #12)
> Created attachment 47615 [details]
> patch optimizing pb_ds tree split, RFC
> 
> I made up a patch but I doubt if this is really useful in competitive
> programming.  (I just tried to find a problem to test the split
> functionality but I couldn't find one.)
> 
> (1) Even with this patch, splitting with rb_tree_tag is still stupidly slow.
> pb_ds documentation says it is "poly-logarithmic complexity".  It's too
> loose - maybe $O(log n)^3$?  So the only useful case is with splay_tree_tag.
> 
> (2) It's impossible to implement any "batch updating" with pb_ds.  For
> example, considering a problem where we would maintain an integer set $S$,
> supporting several operations on it:
> 
> add a: $S \leftarrow \{x \in S | x + a\}$
> query b: output $\min_{x \in S} |b - x|$
> rm c: remove $c$ from $S$, if $c \in S$
> ins d: insert $d$ into $S$
> 
> This is a splay problem, but requiring batch updating and lazy evaluation. 
> Unfortunately we just can't do it with pb_ds.  I believe this "defect"
> directly makes split/join useless.
> 
> Then should we add more functionality into pb_ds?  I think Jonathan won't
> agree since he is absolutely correct that GCC is not to help people in
> programming contests.  But if we not, pb_ds won't be very useful.  Only very
> naive splay problems can be solved with it.  In real contests my friend
> wang9897 can manually code a splay tree before I finish up all the
> tree_order_statistics_node_update stuff, for those naive problems.

Hi Xi,
Thanks a lot for your comment. 

(1) It's true that rb_tree_tag is slower. I used splay_tree_tag to solve few problems.

(2) It's also true that for batch-update / range-update / lazy propagation, pb_ds not gonna work. But if the split/join works as intended, then it would possible to get help of pb_ds where we need some kinds of 2D data-structures. Like - a Segtree where each node may need to contain another BIT/BST for some simple calculations.

It's also true that, as a versatile BST, I will code Treap/Splay. But for some simple calculations like I describe, writing the whole code is also too much costly (where we atleast have the feature, but not efficient).

On the other hand, today I tried to solve SPOJ-CITY2 with pb_ds to check if split or join works perfectly or not, I figured out a work-around for that (using what I got AC as well). Here is the code: https://paste.ubuntu.com/p/w3hWpRmJSP/
I just tried to override Standard namespace's distance function to work with that.

I also agree that GNU's purpose isn't helping people doing competitive programming but my opinion is something like, as the feature already exists, it can be efficient as well (which is the intended behaviour I would say) :)