Purpose of multiprocessing.Pool.apply and multiprocessing.Pool.apply_async

See example and execution result below:

#!/usr/bin/env python3.4
from multiprocessing import Pool
import time
import os

def initializer():
    print("In initializer pid is {} ppid is {}".format(os.getpid(),os.getppid()))

def f(x):
    print("In f pid is {} ppid is {}".format(os.getpid(),os.getppid()))
    return x*x

if __name__ == '__main__':
    print("In main pid is {} ppid is {}".format(os.getpid(), os.getppid()))
    with Pool(processes=4, initializer=initializer) as pool:  # start 4 worker processes
        result = pool.apply(f, (10,)) # evaluate "f(10)" in a single process
        print(result)

        #result = pool.apply_async(f, (10,)) # evaluate "f(10)" in a single process
        #print(result.get())

Gives:

$ ./pooleg.py
In main pid is 22783 ppid is 19542
In initializer pid is 22784 ppid is 22783
In initializer pid is 22785 ppid is 22783
In initializer pid is 22787 ppid is 22783
In f pid is 22784 ppid is 22783
In initializer pid is 22786 ppid is 22783
100

As is clear from the output, 4 processes were created, but only one of them actually did the work (called f).

Question: Why would I create a pool of more than one worker and call apply() when the work (f) is done by only one process? The same goes for apply_async(), because in that case the work is also done by only one worker.

I don’t understand the use cases in which these functions are useful.

Solution:

First off, both are meant to operate on argument tuples (single function calls), in contrast to the Pool.map variants, which operate on iterables. So it is not an error that you observe only one process doing work when you call these functions once.


You would use Pool.apply_async instead of one of the Pool.map versions when you need more fine-grained control over the individual tasks you want to distribute.

The Pool.map versions take an iterable and chunk it into tasks, where every task has the same (mapped) target function.
Pool.apply_async typically isn’t called only once with a pool of >1 workers. Since it’s asynchronous, you can iterate over manually pre-bundled tasks and submit them to several worker processes before any of them has completed. Your task list can even consist of different target functions, and apply_async also lets you register callbacks for results and errors.

These properties make Pool.apply_async quite versatile and a first-choice tool for unusual problem scenarios that you cannot cover with one of the Pool.map versions.
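For illustration, a minimal sketch of that pattern (the function names here are placeholders, not from the question): several independent tasks with different target functions are submitted before any of them completes, and results and errors are collected through callbacks.

from multiprocessing import Pool

def square(x):
    return x * x

def cube(x):
    return x ** 3

def on_result(value):
    print("got result:", value)

def on_error(exc):
    print("task failed:", exc)

if __name__ == '__main__':
    with Pool(processes=4) as pool:
        async_results = []
        # submit a mix of tasks; none of them has to finish before the next is queued
        for func, arg in [(square, 2), (cube, 3), (square, 5), (cube, 4)]:
            async_results.append(
                pool.apply_async(func, (arg,),
                                 callback=on_result, error_callback=on_error))
        # block until every task has finished
        for res in async_results:
            res.wait()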


Pool.apply indeed is not widely useful at first sight (or second). You could use it to synchronize control flow in a scenario where you start multiple tasks with apply_async, then have a task that must complete before you fire off another round of tasks with apply_async.

Using Pool.apply can also simply spare you from creating a single extra Process for an in-between task when you already have a pool that is currently idle.
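As a rough sketch of that control-flow use (again with illustrative names only): the blocking pool.apply call acts as a barrier between two rounds of apply_async submissions, reusing a worker from the pool instead of spawning a new Process.

from multiprocessing import Pool

def work(x):
    return x * 2

def must_finish_first(x):
    return x + 100

if __name__ == '__main__':
    with Pool(processes=4) as pool:
        # first round of asynchronous tasks
        first_round = [pool.apply_async(work, (i,)) for i in range(4)]
        print([r.get() for r in first_round])

        # blocking in-between step on an already idle worker
        print(pool.apply(must_finish_first, (10,)))

        # the second round only starts after the blocking call has returned
        second_round = [pool.apply_async(work, (i,)) for i in range(4, 8)]
        print([r.get() for r in second_round])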


How to convert a list of lists into a structured dict, Python3

I have a list of lists, the content of which should be read and stored in a structured dictionary.

my_list = [
    ['1', 'a1', 'b1'],
    ['',  'a2', 'b2'],
    ['',  'a3', 'b3'],
    ['2', 'c1', 'd1'],
    ['',  'c2', 'd2']]

The 1st, 2nd, and 3rd columns in each row represent 'id', 'attr1', and 'attr2'. If 'id' in a row is not empty, a new object starts with this 'id'. In the example above, there are two objects. The object with 'id' '1' has 3 elements in both 'attr1' and 'attr2', while the object with 'id' '2' has 2 elements in both 'attr1' and 'attr2'. In my real application, there can be more objects, and each object can have an arbitrary number of elements.

For this particular example, the outcome should be

my_dict = {
    'id': ['1', '2'],
    'attr1': [['a1', 'a2', 'a3'], ['c1', 'c2']],
    'attr2': [['b1', 'b2', 'b3'], ['d1', 'd2']]}
Could you please show me how to write a generic and efficient code to achieve it?

Thanks!

Solution:

Just build the appropriate dict in a loop with the right conditions:

d = {f: [] for f in ('id', 'attr1', 'attr2')}

for id, attr1, attr2 in my_list:
    if id:
        d['id'].append(id)
        d['attr1'].append([])
        d['attr2'].append([])
    d['attr1'][-1].append(attr1)
    d['attr2'][-1].append(attr2)
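Running this against the my_list from the question gives the expected structure; a quick check (continuing the snippet above):

print(d)
# {'id': ['1', '2'],
#  'attr1': [['a1', 'a2', 'a3'], ['c1', 'c2']],
#  'attr2': [['b1', 'b2', 'b3'], ['d1', 'd2']]}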

"or" boolean in an inline "if" statement

I have a program that begins by parsing several arguments, one of which is a “verbose” flag. However, I also have a “simulate” flag, and I would like the verbose flag to flip to “True” automatically whenever simulate is on.

Right now I have this working:

if args.verbose or simulate:  
    verbose = True

How can I get this onto one line? I was expecting to be able to do something like:

verbose = True if args.verbose or simulate

or like:

verbose = True if (args.verbose or simulate)

While searching here, I found a solution that fits on one line:

verbose = (False, True)[args.verbose or simulate]

However, I find that solution to be much less readable than the others that I was hoping would work. Is this possible, and I’m just missing something? Or is it not possible to use an “or” between two checks for “True” like this in one line?

Solution:

The problem isn’t with or, it’s that you need an else clause to specify what the value should be if the if statement fails. Otherwise, what is getting assigned if the condition is false?

verbose = True if args.verbose or simulate else False

There’s no need for the if at all, though. It’s even simpler if you just assign the result of the test to verbose directly:

verbose = args.verbose or simulate
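One caveat, in case it matters: or returns one of its operands rather than a strict boolean, so args.verbose or simulate may evaluate to, say, a non-empty string instead of True. If verbose must really be a bool, a minimal tweak is:

# bool() coerces the result of `or`, which otherwise returns one of its operands
verbose = bool(args.verbose or simulate)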

How to return the most frequent letters in a string and order them based on their frequency count

I have this string: s = "china construction bank". I want to create a function that returns the 3 most frequent characters, ordered by how many times they appear; if two characters appear the same number of times, they should be ordered alphabetically. I also want to print each character on a separate line.

I have built this code by now:

from collections import Counter
def ordered_letters(s, n=3):
    ctr = Counter(c for c in s if c.isalpha())
    print(''.join(sorted(x[0] for x in ctr.most_common(n)))[0], '\n',
          ''.join(sorted(x[0] for x in ctr.most_common(n)))[1], '\n',
          ''.join(sorted(x[0] for x in ctr.most_common(n)))[2])

This code applied to the above string will yield:

a 
c 
n

But this is not what I really want; what I would like as output is:

1st most frequent: 'n'. Appearances: 4
2nd most frequent: 'c'. Appearances: 3
3rd most frequent: 'a'. Appearances: 2

I’m stuck on the part where I have to print the characters which have the same frequency in alphabetical order. How could I do this?

Thank you very much in advance

Solution:

You can use heapq.nlargest with a custom sort key. We use -ord(letter) as a secondary sort value so that ties on the count are broken by ascending letter. Using a heap queue is preferable to sorted here because there is no need to sort all items in your Counter object.

from collections import Counter
from heapq import nlargest

def ordered_letters(s, n=3):
    ctr = Counter(c.lower() for c in s if c.isalpha())

    def sort_key(x):
        return (x[1], -ord(x[0]))

    for idx, (letter, count) in enumerate(nlargest(n, ctr.items(), key=sort_key), 1):
        print('#', idx, 'Most frequent:', letter, '.', 'Appearances:', count)

ordered_letters("china construction bank")

# 1 Most frequent: n . Appearances: 4
# 2 Most frequent: c . Appearances: 3
# 3 Most frequent: a . Appearances: 2
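If you want output that matches the exact wording asked for in the question (1st/2nd/3rd and quoted letters), the same nlargest call can be combined with a small formatting helper; the ordinals dict below is just an illustrative assumption, not part of the original answer.

from collections import Counter
from heapq import nlargest

def ordered_letters_verbose(s, n=3):
    # same selection logic as above, different output formatting
    ctr = Counter(c.lower() for c in s if c.isalpha())
    ordinals = {1: '1st', 2: '2nd', 3: '3rd'}  # enough for n=3; extend as needed
    top = nlargest(n, ctr.items(), key=lambda x: (x[1], -ord(x[0])))
    for idx, (letter, count) in enumerate(top, 1):
        print("{} most frequent: '{}'. Appearances: {}".format(
            ordinals.get(idx, '{}th'.format(idx)), letter, count))

ordered_letters_verbose("china construction bank")

# 1st most frequent: 'n'. Appearances: 4
# 2nd most frequent: 'c'. Appearances: 3
# 3rd most frequent: 'a'. Appearances: 2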

Python / Get unique tokens from a file with an exception

I want to find the number of unique tokens in a file. For this purpose I wrote the code below:

splittedWords = open('output.txt', encoding='windows-1252').read().lower().split()
uniqueValues = set(splittedWords)

print(uniqueValues)

The output.txt file is like this:

Türkiye+Noun ,+Punc terörizm+Noun+Gen ve+Conj kitle+Noun imha+Noun silah+Noun+A3pl+P3sg+Gen küresel+Adj düzey+Noun+Loc olus+Verb+Caus+PastPart+P3sg tehdit+Noun+Gen boyut+Noun+P3sg karsi+Adj+P3sg+Loc ,+Punc tüm+Det ülke+Noun+A3pl+Gen yay+Verb+Pass+Inf2+Gen önle+Verb+Pass+Inf2+P3sg hedef+Noun+A3pl+P3sg+Acc paylas+Verb+PastPart+P3pl ,+Punc daha+Noun güven+Noun+With ve+Conj istikrar+Noun+With bir+Num dünya+Noun düzen+Noun+P3sg için+PostpPCGen birlik+Noun+Loc çaba+Noun göster+Verb+PastPart+P3pl bir+Num asama+Noun+Dat gel+Verb+Pass+Inf2+P3sg+Acc samimi+Adj ol+Verb+ByDoingSo arzula+Verb+Prog2+Cop .+Punc 
Ab+Noun ile+PostpPCNom gümrük+Noun Alan+Noun+P3sg+Loc+Rel kurumsal+Adj iliski+Noun+A3pl 
club+Noun toplanti+Noun+A3pl+P3sg 
Türkiye+Noun+Gen -+Punc At+Noun gümrük+Noun isbirlik+Noun+P3sg komite+Noun+P3sg ,+Punc Ankara+Noun Anlasma+Noun+P3sg+Gen 6+Num madde+Noun+P3sg uyar+Verb+When ortaklik+Noun rejim+Noun+P3sg+Gen uygula+Verb+Pass+Inf2+P3sg+Acc ve+Conj gelis+Verb+Inf2+P3sg+Acc sagla+Verb+Inf1 üzere+PostpPCNom ortaklik+Noun Konsey+Noun+P3sg+Gen 2+Num /+Punc 69+Num sayili+Adj karar+Noun+P3sg ile+Conj teknik+Noun komite+Noun mahiyet+Noun+P3sg+Loc kur+Verb+Pass+Narr+Cop .+Punc 
nispi+Adj 
nisbi+Adj 
görece+Adj+With 
izafi+Adj 
obur+Adj 

With this code I can get unique tokens like Türkiye+Noun and Türkiye+Noun+Gen. But I want, for example, Türkiye+Noun and Türkiye+Noun+Gen to be reduced to only the part before the + sign; I only want the Türkiye part. In the end, the Türkiye+Noun and Türkiye+Noun+Gen tokens need to be the same and treated as a single unique token. I think I need to write a regex for this purpose.

Solution:

It seems the word you want is always the first in a list of '+'-joined words:

Split each whitespace-separated token at '+' and take the 0th part:

text = """Türkiye+Noun ,+Punc terörizm+Noun+Gen ve+Conj kitle+Noun imha+Noun silah+Noun+A3pl+P3sg+Gen küresel+Adj düzey+Noun+Loc olus+Verb+Caus+PastPart+P3sg tehdit+Noun+Gen boyut+Noun+P3sg karsi+Adj+P3sg+Loc ,+Punc tüm+Det ülke+Noun+A3pl+Gen yay+Verb+Pass+Inf2+Gen önle+Verb+Pass+Inf2+P3sg hedef+Noun+A3pl+P3sg+Acc paylas+Verb+PastPart+P3pl ,+Punc daha+Noun güven+Noun+With ve+Conj istikrar+Noun+With bir+Num dünya+Noun düzen+Noun+P3sg için+PostpPCGen birlik+Noun+Loc çaba+Noun göster+Verb+PastPart+P3pl bir+Num asama+Noun+Dat gel+Verb+Pass+Inf2+P3sg+Acc samimi+Adj ol+Verb+ByDoingSo arzula+Verb+Prog2+Cop .+Punc 
Ab+Noun ile+PostpPCNom gümrük+Noun Alan+Noun+P3sg+Loc+Rel kurumsal+Adj iliski+Noun+A3pl 
club+Noun toplanti+Noun+A3pl+P3sg 
Türkiye+Noun+Gen -+Punc At+Noun gümrük+Noun isbirlik+Noun+P3sg komite+Noun+P3sg ,+Punc Ankara+Noun Anlasma+Noun+P3sg+Gen 6+Num madde+Noun+P3sg uyar+Verb+When ortaklik+Noun rejim+Noun+P3sg+Gen uygula+Verb+Pass+Inf2+P3sg+Acc ve+Conj gelis+Verb+Inf2+P3sg+Acc sagla+Verb+Inf1 üzere+PostpPCNom ortaklik+Noun Konsey+Noun+P3sg+Gen 2+Num /+Punc 69+Num sayili+Adj karar+Noun+P3sg ile+Conj teknik+Noun komite+Noun mahiyet+Noun+P3sg+Loc kur+Verb+Pass+Narr+Cop .+Punc 
nispi+Adj 
nisbi+Adj 
görece+Adj+With 
izafi+Adj 
obur+Adj """

splittedWords = text.lower().replace("\n", " ").split()
uniqueValues = set(s.split("+")[0] for s in splittedWords)

print(uniqueValues)

Output:

{'imha', 'çaba', 'ülke', 'arzula', 'terörizm', 'olus', 'daha', 'istikrar', 'küresel', 
 'sagla', 'önle', 'üzere', 'nisbi', 'türkiye', 'gelis', 'bir', 'karar', 'hedef', '2', 
 've', 'silah', 'kur', 'alan', 'club', 'boyut', '-', 'anlasma', 'iliski', 
 'izafi', 'kurumsal', 'karsi', 'ankara', 'ortaklik', 'obur', 'kitle', 'güven', 
 'uygula', 'ol', 'düzey', 'konsey', 'teknik', 'rejim', 'komite', 'gümrük', 'samimi', 
  'gel', 'yay', 'toplanti', '.', 'asama', 'mahiyet', 'ab', '69', 'için', 
 'paylas', '6', '/', 'nispi', 'dünya', 'at', 'sayili', 'görece', 'isbirlik', 'birlik', 
 ',', 'tüm', 'ile', 'düzen', 'uyar', 'göster', 'tehdit', 'madde'}

You might need to do some additional cleanup to remove entries like

',' '6' '/'

Split as before and remove anything that's just numbers or punctuation:

from string import digits, punctuation

remove = set(digits + punctuation)

splittedWords = text.lower().split()
uniqueValues = set(s.split("+")[0] for s in splittedWords)

# remove from the set anything that consists only of digits or punctuation
uniqueValues = uniqueValues - set(x for x in uniqueValues if all(c in remove for c in x))
print(uniqueValues)

to get it as:

{'teknik', 'yay', 'göster','hedef', 'terörizm', 'ortaklik','ile', 'daha', 'ol', 'istikrar', 
 'paylas', 'nispi', 'üzere', 'sagla', 'tüm', 'önle', 'asama', 'uygula', 'güven', 'kur', 
 'türkiye', 'gel', 'dünya', 'gelis', 'sayili', 'ab', 'club', 'küresel', 'imha', 'çaba', 
 'olus', 'iliski', 'izafi', 'mahiyet', 've', 'düzey', 'anlasma', 'tehdit', 'bir', 'düzen', 
 'obur', 'samimi', 'boyut', 'ülke', 'arzula', 'rejim', 'gümrük', 'karar', 'at', 'karsi', 
 'nisbi', 'isbirlik', 'alan', 'toplanti', 'ankara', 'birlik', 'kurumsal', 'için', 'kitle', 
 'komite', 'silah', 'görece', 'uyar', 'madde', 'konsey'} 
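Putting it together with the file-based code from the question, a sketch (assuming the same output.txt and windows-1252 encoding as in the question):

from string import digits, punctuation

remove = set(digits + punctuation)

# read the file as in the question, keep only the part before the first '+'
splittedWords = open('output.txt', encoding='windows-1252').read().lower().split()
uniqueValues = set(s.split("+")[0] for s in splittedWords)

# drop tokens that consist only of digits or punctuation
uniqueValues = {x for x in uniqueValues if not all(c in remove for c in x)}

print(len(uniqueValues), "unique tokens")
print(uniqueValues)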

Pandas: replace numpy.nan cell with maximum of non-nan adjacent cells

test case:

import numpy as np
import pandas as pd

df = pd.DataFrame([[np.nan, 2, np.nan, 0],
                   [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, 5],
                   [np.nan, 3, np.nan, 4]],
                  columns=list('ABCD'))

Each NaN should be replaced with the maximum of its non-NaN adjacent cells, where A[i + 1, j], A[i - 1, j], A[i, j + 1], A[i, j - 1] are the set of entries adjacent to A[i, j].

In so many words, this:

     A    B   C  D
0  NaN  2.0 NaN  0
1  3.0  4.0 NaN  1
2  NaN  NaN NaN  5
3  NaN  3.0 NaN  4

should become this:

     A    B   C  D
0  3.0  2.0 2.0  0.0
1  3.0  4.0 4.0  1.0
2  3.0  4.0 5.0  5.0
3  3.0  3.0 4.0  4.0

Solution:

You can use the rolling method over both directions (down the columns and across the rows), take the element-wise maximum of the two results, and use that to fill in the missing values of the original. A centered window of 3 covers a cell and its two neighbours along that axis; since the cell being filled is NaN anyway, including it does not affect the max.

import numpy as np

# max over a centered 3-row window in each column (vertical neighbours; NaNs are ignored)
df1 = df.rolling(3, center=True, min_periods=1).max().fillna(-np.inf)
# same along the rows: transpose, roll, transpose back (horizontal neighbours)
df2 = df.T.rolling(3, center=True, min_periods=1).max().T.fillna(-np.inf)
# element-wise maximum of the two directions
fill = df1.where(df1 > df2).fillna(df2)
df.fillna(fill)

Output

     A    B    C  D
0  3.0  2.0  2.0  0
1  3.0  4.0  4.0  1
2  3.0  4.0  5.0  5
3  3.0  3.0  4.0  4
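If you prefer to look at strictly the four adjacent cells, an equivalent sketch using shift in each direction (with the same df) would be:

import pandas as pd

# stack the four shifted copies (up, down, left, right) and take the
# element-wise maximum across them, ignoring NaNs
neighbours = pd.concat([df.shift(1), df.shift(-1),
                        df.shift(1, axis=1), df.shift(-1, axis=1)])
fill = neighbours.groupby(level=0).max()
df.fillna(fill)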

Invalid syntax during reading of csv file in python

I am trying to read a file using csv.reader in python. I am new to Python and am using Python 2.7.15.

The example that I am trying to recreate is taken from the “Reading CSV Files With csv” section of this page. This is the code:

import csv

with open('employee_birthday.txt') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print(f'Column names are {", ".join(row)}')
            line_count += 1
        else:
            print(f'\t{row[0]} works in the {row[1]} department, and was born in {row[2]}.')
            line_count += 1
    print(f'Processed {line_count} lines.')

During execution of the code, I am getting the following error:

File "sidd_test2.py", line 11
  print(f'Column names are {", ".join(row)}')
                                         ^
SyntaxError: invalid syntax 

What am I doing wrong? How can I avoid this error? I would appreciate any help.

Solution:

Because f in front of strings (f-strings) was only introduced in Python 3.6, it is invalid syntax in Python 2.7. Try this instead:

print('Column names are',", ".join(row))

Or:

print('Column names are %s'%", ".join(row))

Or:

print('Column names are {}'.format(", ".join(row)))
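For completeness, a sketch of the whole loop from the question rewritten with str.format() so it runs on Python 2.7 (same employee_birthday.txt file as in the question):

import csv

with open('employee_birthday.txt') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print('Column names are {}'.format(", ".join(row)))
        else:
            print('\t{} works in the {} department, and was born in {}.'.format(
                row[0], row[1], row[2]))
        line_count += 1
    print('Processed {} lines.'.format(line_count))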