Let's consider two ways to solve it that are two slow, and then merge them into one solution, that will be guaranteed to finish in milliseconds.

**Approach 1** (slow)

Allocate an array `v`

of size `2^16`

. Every time you add an element, do the following:

```
void add(int s) {
for (int i = 0; i < (1 << 16); ++ i) if ((s & i) == 0) {
v[s | i] ++;
}
}
```

(to delete do the same, but decrement instead of incrementing)

Then to answer `cnt s`

you just need to return the value of `v[s]`

. To see why, note that `v[s]`

is incremented exactly once for every number `a`

that is added such that `a & s == a`

(I will leave it is an exercise to figure out why this is the case).

**Approach 2** (slow)

Allocate an array `v`

of size `2^16`

. When you add an element `s`

, just increment `v[s]`

. To query the count, do the following:

```
int cnt(int s) {
int ret = 0;
for (int i = 0; i < (1 << 16); ++ i) if ((s | i) == s) {
ret += v[s & ~i];
}
return ret;
}
```

(`x & ~y`

is a number that has all the bits that are set in `x`

that are not set in `y`

)

This is a more straightforward approach, and is very similar to what you do, but is written in a slightly different fashion. You will see why I wrote it this way when we combine the two approaches.

Both these approaches are too slow, because in which of them one operation is constant, and one is `O(s)`

, so in the worst case, when the entire input consists of the slow operations, we spend `O(Q * s)`

, which is prohibitively slow. Now let's merge the two approaches using meet-in-the-middle to get a faster solution.

**Fast approach**

We will merge the two approaches in the following way: `add`

will work similarly to the first approach, but instead of considering every number `a`

such that `a & s == a`

, we will only consider numbers, that differ from `s`

only in the lowest 8 bits:

```
void add(int s) {
for (int i = 0; i < (1 << 8); ++ i) if ((i & s) == 0) {
v[s | i] ++;
}
}
```

For delete do the same, but instead of incrementing elements, decrement them.

For counts we will do something similar to the second approach, but we will account for the fact that each `v[a]`

is already accumulated for all combinations of the lowest 8 bits, so we only need to iterate over all the combinations of the higher 8 bits:

```
int cnt(int s) {
int ret = 0;
for (int i = 0; i < (1 << 8); ++ i) if ((s | (i << 8)) == s) {
ret += v[s & ~(i << 8)];
}
return ret;
}
```

Now both `add`

and `cnt`

work in `O(sqrt(s))`

, so the entire approach is `O(Q * sqrt(s))`

, which for your constraints should be milliseconds.

Pay extra attention to overflows -- you didn't provide the upper bound on `s`

, if it is too high, you might want to replace `int`

s with `long long`

s.

`add`

directive. It would seem to me it would be worth trying to brute-force this via`std::unordered_map<int, unsigned int>`

, mapping values to their instance counts. The`del`

directive would remove the key if the count falls to zero, and the`cnt`

command would iterate over the container and, for each hit, add in the mapped count to the running total. Or, I utterly misunderstood the question.9more comments