Closed-form solution for finding a root

Suppose I have a Pandas Series s whose values sum to 1 and whose values are all greater than or equal to 0. I need to subtract a constant from all values such that the sum of the new Series is equal to 0.6. The catch is that when I subtract this constant, values are clipped at zero so they never end up negative.

In math notation: given values x_i ≥ 0 with sum_i x_i = 1, I want to find k such that

    sum_i max(x_i − k, 0) = 0.6


import pandas as pd
import numpy as np
from string import ascii_uppercase

np.random.seed([3, 141592653])
# 10 random values indexed A..J, normalized to sum to 1
s = np.power(
    1000, pd.Series(np.random.rand(10), list(ascii_uppercase[:10]))
).pipe(lambda s: s / s.sum())


A    0.001352
B    0.163135
C    0.088365
D    0.010904
E    0.007615
F    0.407947
G    0.005856
H    0.198381
I    0.027455
J    0.088989
dtype: float64

The sum is 1



What I’ve tried

I can use Newton's method (among others) from SciPy's optimize module:

from scipy.optimize import newton

def f(k):
    return s.sub(k).clip(0).sum() - .6

Finding the root of this function will give me the k I need

initial_guess = .1
k = newton(f, x0=initial_guess)

Then subtract this k from s, clipping at zero:

new_s = s.sub(k).clip(0)

A    0.000000
B    0.093772
C    0.019002
D    0.000000
E    0.000000
F    0.338583
G    0.000000
H    0.129017
I    0.000000
J    0.019626
dtype: float64

And the new sum is 0.6, as required.

Can we find k without resorting to a solver?


Updated: Three different implementations – interestingly, the least sophisticated scales best.

All three rely on the same observation: once the set S of entries that stay positive after clipping is known, k has the closed form k = (sum of x_i over S − 0.6) / |S|, which follows from requiring sum over S of (x_i − k) = 0.6. The implementations below differ only in how they find S.

import numpy as np

def f_sort(A, target=0.6):
    # C[i] is the total mass removed if k equals the i-th smallest value B[i];
    # find where it crosses 1 - target, then interpolate linearly.
    N = len(A)
    B = np.sort(A)
    C = np.cumsum(np.r_[B[0], np.diff(B)] * np.arange(N, 0, -1))
    idx = np.searchsorted(C, 1 - target)
    return B[idx] + (1 - target - C[idx]) / (N - idx)

def f_partition(A, target=0.6):
    # Binary search with np.partition: repeatedly halve the candidate
    # window that must contain the threshold k.
    target, l = 1 - target, len(A)
    while len(A) > 1:
        m = len(A) // 2
        A = np.partition(A, m-1)
        ls = A[:m].sum()
        if ls + A[m-1] * (l-m) > target:
            # k lies in the lower half
            A = A[:m]
        else:
            # k lies in the upper half; the lower half is removed in full
            l -= m
            target -= ls
            A = A[m:]
    return target / l

def f_direct(A, target=0.6):
    # Repeatedly drop entries that would be clipped to zero; the leftover
    # mass spread evenly over the survivors is k.
    target = 1 - target
    while True:
        gt = A > target / len(A)
        if np.all(gt):
            return target / len(A)
        target -= A[~gt].sum()
        A = A[gt]

N = 10
A = np.random.random(N)
A /= A.sum()

print(f_sort(A), np.clip(A-f_sort(A), 0, None).sum())
print(f_partition(A), np.clip(A-f_partition(A), 0, None).sum())
print(f_direct(A), np.clip(A-f_direct(A), 0, None).sum())

from timeit import timeit
kwds = dict(globals=globals(), number=1000)

N = 100000
A = np.random.random(N)
A /= A.sum()

print(timeit('f_sort(A)', **kwds))
print(timeit('f_partition(A)', **kwds))
print(timeit('f_direct(A)', **kwds))

Sample run:

0.04813686999999732 0.5999999999999999
0.048136869999997306 0.6000000000000001
0.048136869999997306 0.6000000000000001
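As a sanity check (a sketch reusing the A defined above): the clipped-sum objective is 0.4 at k = 0 and −0.6 at k = max(A), so a bracketing root-finder gives a reliable reference value to compare against:

from scipy.optimize import brentq

# The sign change guarantees the bracket [0, A.max()] contains the root
f = lambda k: np.clip(A - k, 0, None).sum() - 0.6
k_ref = brentq(f, 0, A.max())
print(np.isclose(k_ref, f_direct(A)))  # expected: True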

Vectorized way of checking dataframe values (as key, value tuple) against a dictionary?

I'd like to create a column in my dataframe that checks whether each row's value in one column matches the dictionary value keyed by another column (which holds the dictionary keys), like so:

In [3]:
import pandas as pd

df = pd.DataFrame({'Model': ['Corolla', 'Civic', 'Accord', 'F-150'],
                   'Make': ['Toyota', 'Honda', 'Toyota', 'Ford']})
dic = {'Prius':'Toyota', 'Corolla':'Toyota', 'Civic':'Honda', 
       'Accord':'Honda', 'Odyssey':'Honda', 'F-150':'Ford', 
       'F-250':'Ford', 'F-350':'Ford'}

Out [3]:
     Model    Make
0  Corolla  Toyota
1    Civic   Honda
2   Accord  Toyota
3    F-150    Ford

And after applying a function, or whatever it takes, I’d like to see:

Out [10]:
     Model    Make   match
0  Corolla  Toyota    TRUE
1    Civic   Honda    TRUE
2   Accord  Toyota   FALSE
3    F-150    Ford    TRUE

Thanks in advance!

Edit: I tried making a function that is passed a tuple which would be the two columns, but I don’t think I’m passing the arguments correctly:

def is_match(make, model):
    try:
        has_item = dic[make] == model
    except KeyError:
        has_item = False
    return has_item

df[['Model', 'Make']].apply(is_match)

results in:
TypeError: ("is_match() missing 1 required positional 
argument: 'model'", 'occurred at index Model')


You can use map: look each Model up in the dictionary and compare the result with the Make column.
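A minimal sketch of that approach:

df['match'] = df['Model'].map(dic) == df['Make']
df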

     Make    Model  match
0  Toyota  Corolla   True
1   Honda    Civic   True
2  Toyota   Accord  False
3    Ford    F-150   True

How can you re-use a variable scope in TensorFlow without a new scope being created by default?

I have created a variable scope in one part of my graph, and later in another part of the graph I want to add OPs to an existing scope. That equates to this distilled example:

import tensorflow as tf

with tf.variable_scope('myscope'):
  tf.Variable(1.0, name='var1')

with tf.variable_scope('myscope', reuse=True):
  tf.Variable(2.0, name='var2')

print([n.name for n in tf.get_default_graph().as_graph_def().node])

Which yields (note that the second scope was uniquified to myscope_1):

['myscope/var1/initial_value', 'myscope/var1', 'myscope/var1/Assign', 'myscope/var1/read', 'myscope_1/var2/initial_value', 'myscope_1/var2', 'myscope_1/var2/Assign', 'myscope_1/var2/read']

My desired result is:

['myscope/var1/initial_value', 'myscope/var1', 'myscope/var1/Assign', 'myscope/var1/read', 'myscope/var2/initial_value', 'myscope/var2', 'myscope/var2/Assign', 'myscope/var2/read']

I saw this question which didn’t seem to have an answer that addressed the question directly: TensorFlow, how to reuse a variable scope name


Here is one straightforward way to do this: capture the scope object with as somename in the context manager. Through the somename.original_name_scope property you can then retrieve that exact scope and add more variables to it. Below is an illustration:

In [6]: with tf.variable_scope('myscope') as ms1:
   ...:   tf.Variable(1.0, name='var1')
   ...: with tf.variable_scope(ms1.original_name_scope) as ms2:
   ...:   tf.Variable(2.0, name='var2')
   ...: print([n.name for n in tf.get_default_graph().as_graph_def().node])

Please also note that setting reuse=True is optional; that is, even if you pass reuse=True, you'd still get the same result.

Another way (thanks to the OP himself!) is to simply append / to the scope name when reusing it, as in the following example:

In [13]: with tf.variable_scope('myscope'):
    ...:   tf.Variable(1.0, name='var1')
    ...: # reuse variable scope by appending `/` to the target variable scope
    ...: with tf.variable_scope('myscope/', reuse=True):
    ...:   tf.Variable(2.0, name='var2')
    ...: print([n.name for n in tf.get_default_graph().as_graph_def().node])

Again, please note that reuse=True is optional here; that is, even if you omit reuse=True, you'd still get the same result.
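As a side note (a small TF 1.x sketch, not from the original answer): the myscope_1 uniquification happens because tf.Variable only consults the name scope, while tf.get_variable goes through the variable scope itself, so reuse=True returns the existing variable without any tricks:

import tensorflow as tf  # TF 1.x API assumed

with tf.variable_scope('myscope'):
    v1 = tf.get_variable('var1', initializer=tf.constant(1.0))

# With reuse=True, get_variable returns the already-created variable
with tf.variable_scope('myscope', reuse=True):
    v1_again = tf.get_variable('var1')

print(v1 is v1_again)  # True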

Negatively updating a Python dict [NOT "key"]

I am looking for a way to update/access a Python dictionary by addressing all keys that do NOT match the key given.

That is, instead of the usual dict[key], I want to do something like dict[!key]. I found a workaround, but figured there must be a better way which I cannot figure out at the moment.

# I have a dictionary of counts
dicti = {"male": 1, "female": 200, "other": 0}

# Problem: I encounter a record (cannot reproduce here) that 
# requires me to add 1 to every key in dicti that is NOT "male", 
# i.e. dicti["female"], and  dicti["other"], 
# and other keys I might add later

# Here is what I am doing and I don't like it
dicti.update({k: v + 1 for k,v in dicti.items() if k != "male"})


If you have to perform this "add to others" operation more often, and if all the values are numeric, you could instead subtract the value from the given key and add it to a global offset that counts towards all values (including that same key). For example, as a wrapper class:

import collections
class Wrapper:
    def __init__(self, **values):
        self.d = collections.Counter(values)
        self.n = 0
    def add(self, key, value):
        self.d[key] += value
    def add_others(self, key, value):
        self.d[key] -= value
        self.n += value
    def get(self, key):
        return self.d[key] + self.n
    def to_dict(self):
        if self.n != 0:  # recompute dict and reset global offset
            self.d = {k: v + self.n for k, v in self.d.items()}
            self.n = 0
        return self.d


>>> dicti = Wrapper(**{"male": 1, "female": 200, "other": 0})
>>> dicti.add("male", 2)
>>> dicti.add_others("male", 5)
>>> dicti.get("male")
3
>>> dicti.to_dict()
{'other': 5, 'female': 205, 'male': 3}

The advantage is that both the add and the add_others operations are O(1); you only pay to fold the global offset back in when you actually need the full dict. The to_dict operation is still O(n), of course, but the updated dict can be cached and only needs recomputing when add_others has been called again in between.
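For illustration, a hypothetical usage pattern where this pays off (many "add to others" updates, few full reads):

dicti = Wrapper(**{"male": 1, "female": 200, "other": 0})
for _ in range(1000):
    dicti.add_others("male", 1)   # O(1) per call, no dict traversal
print(dicti.get("female"))        # 1200: reads fold in the global offset n
print(dicti.to_dict()["male"])    # 1: "male" itself is unaffected, as intended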

Create an indicator column based on one column being within +/- 5% of another column

I would like to populate the ‘Indicator’ column based on both charge columns. If ‘Charge1’ is within plus or minus 5% of the ‘Charge2’ value, set the ‘Indicator’ to RFP, otherwise leave it blank (see example below).

ID  Charge1  Charge2  Indicator
1   9.5      10       RFP
2   22       20 
3   41       40       RFP
4   65       80 
5   160      160      RFP
6   315      320      RFP
7   613      640      RFP
8   800      700    
9   759      800    
10  1480     1500     RFP

I tried using a .loc approach, but struggled to establish if ‘Charge1’ was within +/- 5% of ‘Charge2’.


df.eval supports chained comparisons, so the bounds check can be written directly:

In [190]: df.loc[df.eval("Charge2*0.95 <= Charge1 <= Charge2*1.05"), 'Indicator'] = 'RFP'

In [191]: df
   ID  Charge1  Charge2 Indicator
0   1      9.5       10       RFP
1   2     22.0       20       NaN
2   3     41.0       40       RFP
3   4     65.0       80       NaN
4   5    160.0      160       RFP
5   6    315.0      320       RFP
6   7    613.0      640       RFP
7   8    800.0      700       NaN
8   9    759.0      800       NaN
9  10   1480.0     1500       RFP
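If you prefer not to use eval, the same mask can be built with an ordinary vectorized comparison; a sketch using Series.between (inclusive on both ends by default) and np.where, which fills non-matches with an empty string instead of NaN:

import numpy as np

df['Indicator'] = np.where(
    df['Charge1'].between(df['Charge2'] * 0.95, df['Charge2'] * 1.05),
    'RFP', '')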

How to efficiently find the indices of matching elements in two lists

I am working on two large data sets, and my question is as follows.

Suppose I have two lists:

list1 = ['A', 'B', 'C', 'D']

list2 = ['B', 'D', 'A', 'G']

How can I efficiently find the matching indices in Python, without O(n²) pairwise searching? The result should look like:

matching_index(list1,list2) -> [(0,2),(1,0),(3,1)]


Without duplicates

If your objects are hashable and your lists have no duplicates, you can create an inverse index of the first list and then traverse the second list. This traverses each list only once and thus is O(n). Each pair is returned as (index in list1, index in list2), matching the question.

def find_matching_index(list1, list2):

    inverse_index = { element: index for index, element in enumerate(list1) }

    return [(inverse_index[element], index)
            for index, element in enumerate(list2) if element in inverse_index]

find_matching_index([1, 2, 3], [3, 2, 1]) # [(2, 0), (1, 1), (0, 2)]
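Applied to the question's example (with the letters quoted as strings):

list1 = ['A', 'B', 'C', 'D']
list2 = ['B', 'D', 'A', 'G']
find_matching_index(list1, list2) # [(1, 0), (3, 1), (0, 2)]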

With duplicates

You can extend the previous solution to account for duplicates by keeping track of multiple indices with a set.

def find_matching_index(list1, list2):

    # Create an inverse index which keys are now sets
    inverse_index = {}

    for index, element in enumerate(list1):

        if element not in inverse_index:
            inverse_index[element] = {index}
        else:
            inverse_index[element].add(index)

    # Traverse the second list    
    matching_index = []

    for index, element in enumerate(list2):

        # Create one pair per matching index stored in the inverse index
        if element in inverse_index:
            matching_index.extend([(x, index) for x in inverse_index[element]])

    return matching_index

find_matching_index([1, 1, 2], [2, 2, 1]) # [(2, 0), (2, 1), (0, 2), (1, 2)]

Unfortunately, this is no longer O(n). Consider the input [1, 1] and [1, 1]: the output is [(0, 0), (0, 1), (1, 0), (1, 1)], so by the size of the output alone the worst case is O(n²). This solution is still O(n) when there are no duplicates, though.
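As a side note, the same duplicate-aware index can be built a bit more compactly with collections.defaultdict; a sketch, behaviorally equivalent to the version above:

from collections import defaultdict

def find_matching_index(list1, list2):
    inverse_index = defaultdict(set)
    for index, element in enumerate(list1):
        inverse_index[element].add(index)
    # .get with a default avoids inserting empty sets for unmatched elements
    return [(x, index)
            for index, element in enumerate(list2)
            for x in inverse_index.get(element, ())]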

How can I apply a function to itself?

Suppose I have a function f which takes some value and returns a value of the same type. For simplicity, let's say

def f(x):
    return x/2+1

I’m interested in applying f to itself over and over. Something like f(f(f(...(f(x))...))).

I could do this like

s = f(x)
for i in range(100):
    s = f(s)

But I was wondering if there was a simpler, less verbose way of doing the same thing. I want to avoid for loops (just as a challenge to myself). Is there maybe some way of using map or a similar function to accomplish this?


Is there maybe some way of using map or a similar function to accomplish this?

Not map, but reduce. I wouldn't use it for this, but you could call reduce on an n-item sequence to cause f to be called n times (in Python 3, import it first with from functools import reduce). For example:

>>> def f(x):
...   return x+1
>>> reduce(lambda n,_: f(n), range(100), 42)
142


  • n is assigned each successive return value of f.
  • _ takes each number from range(100) in turn. These numbers are all ignored; all that matters is how many there are.
  • 42 is the starting value.

100 nested calls to f(f(f...(f(42))...)) results in 142.
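Wrapped up as a reusable helper, the same trick reads a little more clearly (a sketch; the helper name repeat_apply is my own, and Python 3 needs the functools import):

from functools import reduce

def f(x):
    return x / 2 + 1

def repeat_apply(f, x, n):
    # f(f(...f(x)...)) with n applications of f
    return reduce(lambda acc, _: f(acc), range(n), x)

print(repeat_apply(f, 100.0, 100))  # ~2.0, the fixed point of x/2 + 1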